Use of this document

This is a study note for using \(igraph\) package for Network Data. We also look quickly at some of the unique challenges posed by the problem of visualizing large network graphs. For more details on the study material see:

Extra reading:

Prerequisites

library(igraph) #creating, decorating, and assessing basic properties of network graph
library(sand) # collection of data sets

1. Create Network Graphs

1.2 Undirected and Directed Graphs

Formally, a graph G = (V,E) is a mathematical structure consisting of a set V of vertices (also commonly called nodes) and a set E of edges (also commonly called links), where elements of E are unordered pairs {u, v} of distinct vertices u, v ∈ V.

g <- graph.formula(1-2, 1-3, 2-3, 2-4, 3-5, 4-5, 4-6, 4-7, 5-6, 6-7)
dg <- graph.formula(1-+2, 1-+3, 2++3);
par(mfrow=c(1,2))
plot(g)
plot(dg)

The number of vertices N_v = |V | and the number of edges N_e = |E | are sometimes called the order and size of the graph G, respectively.

  • Representations for Graphs
V(dg)
## + 3/3 vertices, named, from 69170c1:
## [1] 1 2 3
vcount(dg)
## [1] 3
E(dg)
## + 4/4 edges from 69170c1 (vertex names):
## [1] 1->2 1->3 2->3 3->2
ecount(dg)
## [1] 4
get.adjacency(dg) # adjacency matrices
## 3 x 3 sparse Matrix of class "dgCMatrix"
##   1 2 3
## 1 . 1 1
## 2 . . 1
## 3 . 1 .
list.vertex.attributes(dg)
## [1] "name"

1.2 Data Frames as Input

head(elist.lazega)
##   V1  V2
## 1 V1 V17
## 2 V2  V7
## 3 V2 V16
## 4 V2 V17
## 5 V2 V22
## 6 V2 V26
head(v.attr.lazega)
##   Name Seniority Status Gender Office Years Age Practice School
## 1   V1         1      1      1      1    31  64        1      1
## 2   V2         2      1      1      1    32  62        2      1
## 3   V3         3      1      1      2    13  67        1      1
## 4   V4         4      1      1      1    31  59        2      3
## 5   V5         5      1      1      2    31  59        1      2
## 6   V6         6      1      1      2    29  55        1      1
g.lazega <- graph.data.frame(elist.lazega,
                             directed="FALSE",
                             vertices=v.attr.lazega)
g.lazega$name <- "Lazega Lawyers"
summary(g.lazega)
## IGRAPH 69b1e5d UN-- 36 115 -- Lazega Lawyers
## + attr: name (g/c), name (v/c), Seniority (v/n), Status (v/n),
## | Gender (v/n), Office (v/n), Years (v/n), Age (v/n), Practice
## | (v/n), School (v/n)
list.vertex.attributes(g.lazega) # find appropriate the attr for coloring vertex
## [1] "name"      "Seniority" "Status"    "Gender"    "Office"    "Years"    
## [7] "Age"       "Practice"  "School"
vertex.c <- as.numeric(as.factor(V(g.lazega)$Practice)) # use attr Practice to coloring
plot(g.lazega, vertex.color=vertex.c)

2. Visualizing Network Data

2.1 Graph Layouts

# lattice data
g.l <- graph.lattice(c(5, 5, 5))
data(aidsblog)
# large netword data
data(fblog)
party.names <- sort(unique(V(fblog)$PolParty)); party.names
## [1] " Cap21"                   " Commentateurs Analystes"
## [3] " Les Verts"               " liberaux"               
## [5] " Parti Radical de Gauche" " PCF - LCR"              
## [7] " PS"                      " UDF"                    
## [9] " UMP"

2.1.1 Circular

Application: low level of edge-crossings through the center of the circle

Drawing methods: ordering by degree and grouping by common vertex attributes.

igraph.options(vertex.size=3, vertex.label=NA, edge.arrow.size=0.5)
par(mfrow=c(1, 2), mar=c(1, 1, 1, 1))
plot(g.l, layout=layout.circle) 
title("5x5x5 Lattice")
plot(aidsblog, layout=layout.circle) 
title("Blog Network")

2.1.2 Radial and Tree

Application: tree structure or radical structure.

Drawing methods: N/A

g.tree <- graph.formula(1-+2,1-+3,1-+4,2-+5,2-+6,2-+7, 3-+8,3-+9,4-+10)
par(mfrow=c(1, 2), mar=c(1, 1, 1, 1))
igraph.options(vertex.size=30, edge.arrow.size=0.5,
vertex.label=NULL)
plot(g.tree, layout=layout.reingold.tilford(g.tree, circular=T))
plot(g.tree, layout=layout.reingold.tilford)

2.1.3 Energy-placement method: Kamada and Kawai

Application: visually appealing and relaxed state since the system energy of the position is ther minimum. Methods based on multidimensional scaling (MDS), which have a long history in the social network literature, are of this type.

Drawing methods: Energy-placement methods of graph drawing define a noteion of energy as a function of vertex position. A vertex placement is chosen which minimizes the total system energy.

igraph.options(vertex.size=3, vertex.label=NA, edge.arrow.size=0.5)
par(mfrow=c(1, 2), mar=c(1, 1, 1, 1))
plot(g.l, layout=layout.kamada.kawai)
title("5x5x5 Lattice")
plot(aidsblog, layout=layout.kamada.kawai)
title("Blog Network")

2.1.4 Spring-embedder Method: Fruchterman and Reingold

Application: creating useful drawings to exploit analogies between the relational structure in graphs and the forces among elements in physical systems

Drawing methods: spring-embedder methods of graph drawing define a notion of force for each vertex in the graph depending, at the very least, on the positions of pairs of vertices and the distances between them, and seek to iteratively update the placement of vertices until a vector of net forces across vertices converges.

par(mfrow=c(1, 2), mar=c(1, 1, 1, 1))
plot(g.l, layout=layout.fruchterman.reingold)
title("5x5x5 Lattice")
plot(aidsblog, layout=layout.fruchterman.reingold)
title("Blog Network")

2.1.5 Spring-embedder Method: DrL for visualizing large networks

Despite their sophistication, for all of the methods described so far, the graph draw- ings will tend to look increasingly cluttered as the number of vertices Nv nears 100 or so—and simply unintelligible for thousands of vertices or more—due to the finite- ness of the available space and resolution.

Application: balance the detail with which both local and global structure by clustering for the purpose of visualizing large networks

Drawing methods: DrL is based on VxOrd, an enhanced version of the spring-embedder methodology that attempts to place vertices in clusters on the two-dimensional plane, with the help of sophisticated optimization methods to more efficiently search the space of possible graph drawings and a grid that helps reduce computation time from the typical O(Nv^2) down to O(Nv).

summary(fblog)
## IGRAPH NA UN-- 192 1431 -- 
## + attr: name (v/c), PolParty (v/c)
party.nums <- as.numeric(as.factor(V(fblog)$PolParty))
str(party.nums)
##  num [1:192] 3 3 3 3 3 3 3 9 9 9 ...
set.seed(42)
par(mfrow=c(1, 2))
l <- layout.drl(fblog)
plot(fblog, layout=l, vertex.size=5, vertex.label=NA,
     vertex.color=party.nums)
title("layout.drl")
l <- layout.kamada.kawai(fblog)
plot(fblog, layout=l, vertex.label=NA,
     vertex.color=party.nums, vertex.size=3)
title("layout.kamada.kawai")
mtext("Visualizations of the French political blog network", outer=TRUE,  cex=1, line=-1, font = 4)

fblog.c <- contract.vertices(fblog, party.nums) # Contract several vertices into a single one
E(fblog.c)$weight <- 1 # assign 1 to the weight of each edge
fblog.c <- simplify(fblog.c) # convert from multigraph to simple graph
# explore how much department communicate to each other
party.size <- as.vector(table(V(fblog)$PolParty)) 
plot(fblog.c, vertex.size=5*sqrt(party.size),
     vertex.label = party.names, 
     vertex.color=V(fblog.c), 
     edge.width=sqrt(E(fblog.c)$weight), 
     vertex.label.dist=1.5, 
     edge.arrow.size=0)

2.2 Interactive Network Visualization

To output the igraph class object for external uses:

write_graph(graph, file, format = c("edgelist", "pajek", "ncol", "lgl",
  "graphml", "dimacs", "gml", "dot", "leda"), ...)

In R, there are relatively few choices available for node and arrow shapes, and label placement is somewhat awkward, so that avoid- ing overlapping nodes and labels is not easy. Interactive editing of graph drawings is even more limited. However, if producing a network visualization of particularly high quality is a goal, there are certainly other software tools for this job.

  • Cytoscape, cross-platform network analysis and visualization tool: igraph::write_graph(graph, file, format = "gml"), then open in Cytoscape
  • Pajek, cross-platform network analysis and visualization tool: igraph::write_graph(graph, file, format = "pajek"), then open in Pajek
  • Gephi, cross-platform network analysis and visualization tool: rgexf::write.gexf(), then open in Gephi
  • Rgraphviz package: igraph::write_graph(graph, file, format = "dot "), then couple with Rgraphviz package
  • visNetwork package: see the example following
library(visNetwork)
library(dplyr)
# get g.lazega igraph
g.lazega <- graph.data.frame(elist.lazega,
                             directed="FALSE",
                             vertices=v.attr.lazega)
g.lazega$name <- "Lazega Lawyers"
summary(g.lazega)
## IGRAPH f31ba87 UN-- 36 115 -- Lazega Lawyers
## + attr: name (g/c), name (v/c), Seniority (v/n), Status (v/n),
## | Gender (v/n), Office (v/n), Years (v/n), Age (v/n), Practice
## | (v/n), School (v/n)
# convert igraph to VisNetwork Data
data <- toVisNetworkData(g.lazega)
# add characteristic
data$nodes <- data$nodes %>% 
  mutate(group = Office) %>%  # add grouping variable
  mutate(title = paste0("<p> Age <br>", Age, "</p>"), 
         stringsAsFactors = FALSE) #add tooltip (pop-up) variable
# ploting
visNetwork(nodes = data$nodes, 
           edges = data$edges,
           height = "500px") %>%  
  # grouping
  visGroups(groupname = "1", color = "blue", shape = "square", 
            shadow = list(enabled = TRUE)) %>% 
  visGroups(groupname = "2", color = "red", shape = "triangle") %>%
  visGroups(groupname = "3", color = "green", shape = "star") %>%
  # layout
  visIgraphLayout(layout = "layout.fruchterman.reingold") %>% 
  # options
  visOptions(highlightNearest = list(enabled = T, hover = T), 
             nodesIdSelection = T)

3. Network Graph Characteristics

3.1 Basic Graph Concepts

3.1.1 Weight

To transform a multigraph into a weighted graph, wherein each resulting proper edge is equipped with a weight equal to the multiplicity of that edge in the original multigraph. simple graph and multigraph:

  • Simple graphs: have their nodes connected by only one link type, such as road or rail links.
  • Multigraphs: contain more than one link type between the same two nodes. Or the multigraph is a combination of the two simple graphs.(source: The Geography of Transport Systems)
simple graph and multigraph
g <- graph.formula(1-2, 1-3, 2-3, 2-4, 3-5, 4-5, 4-6, 4-7, 5-6, 6-7)
is.simple(g)
## [1] TRUE
mg <- g + edge(2,3) # adding a duplicate edge
is.simple(mg)
## [1] FALSE
E(mg)$weight <- 1
wg2 <- simplify(mg)
E(wg2)$weight # by simplifying, the third edge has a weight of 2.
##  [1] 1 1 2 1 1 1 1 1 1 1
is.simple(wg2)
## [1] TRUE
par(mfrow=c(1,3), mar=c(1, 1, 1, 1))
plot(g); plot(mg); plot(wg2)

3.2 Descriptive Analysis

3.2.1 Vertex Degree

  • neighbors(): Show all neighbors of a vertex.
  • degree(): count the number of edges incident on a node for undirected graph, whereas count the number of edges pointing in towards and out from a v vertex for directed grqph
neighbors(g, 5)
## + 3/7 vertices, named, from c4d938f:
## [1] 3 4 6
# for undirected graph
degree(g)
## 1 2 3 4 5 6 7 
## 2 3 3 4 3 3 2
# for directed graph
degree(dg, mode="in"); degree(dg, mode="out")
## 1 2 3 
## 0 2 2
## 1 2 3 
## 2 1 1
  • degree() and graph.strength(): use with hist to plot degree distribution.
  • graph.knn(): Calculate the average nearest neighbor degree of the given vertices and the same quantity in the function of vertex degree
# has weight
data(karate)
E(karate)$weight 
##  [1] 4 5 3 3 3 3 2 2 2 3 1 3 2 2 2 2 6 3 4 5 1 2 2 2 3 4 5 1 3 2 2 2 3 3 3
## [36] 2 3 5 3 3 3 3 3 4 2 3 3 2 3 4 1 2 1 3 1 2 3 5 4 3 5 4 2 3 2 7 4 2 4 2
## [71] 2 4 2 3 3 4 4 5
par(mfrow=c(1,2))
# degree distribution
hist(igraph::degree(karate), col="lightblue", xlim=c(0, 50),
     xlab="Vertex Degree", ylab="Frequency", main="")
# weighted degree distribution
hist(graph.strength(karate), col="pink",
     xlab="Vertex Strength", ylab="Frequency", main="")

# no weight
data(yeast)
E(yeast)$weight 
## NULL
par(mfrow=c(1,3))

# degree distribution
hist(igraph::degree(yeast), col="blue", 
     xlab="Degree", ylab="Frequency", 
     main="Degree Distribution")

# Log-Log Degree Distribution
d.yeast <- igraph::degree(yeast) # the number of edge for each vertex
d <- 1:max(d.yeast)-1 # 1 to the max number of the edge
dd.yeast <- degree.distribution(yeast) # degree distribution by cut = 1
ind <- (dd.yeast != 0) # mask out the zero degree vertex
plot(x = d[ind], y = dd.yeast[ind], 
     log="xy", col="blue",
     xlab=c("Log-Degree"), ylab=c("Log-Frequency"), 
     main="Log-Log Degree Distribution")

# Average neighbor degree versus vertex degree (log–log scale) for the yeast data
a.nn.deg.yeast <- graph.knn(graph = yeast, vids = V(yeast))
plot(x = d.yeast, y = a.nn.deg.yeast$knn, 
     log="xy", col="goldenrod", 
     xlab=c("Log Vertex Degree"), ylab=c("Log Average Neighbor Degree"),
     ylim = range(c(1,110)),
     main = "Neighbor degree vs vertex degree")
par(new=TRUE)
plot(x = d, y = a.nn.deg.yeast$knnk,
     log="xy", col="red",
     xlab=c("Log Vertex Degree"), ylab=c("Log Average Neighbor Degree"),
     ylim = range(c(1,110)))

3.2.2 Vertex Centrality

Measures of centrality are designed to quantify important vertex in a network graph. The following is the three example of centrality:

  • Closeness centrality: measures attempt to capture the notion that a vertex is ‘central’ if it is close to many other vertices
  • Betweenness centrality: measures are aimed at summarizing the extent to which a vertex is located between (on the path) other pairs of vertices
  • eigenvector centrality: capture the status or prestige or rank that the more central the neighbors of a vertex are, the more central that vertex itself is.
data(karate)
A <- get.adjacency(karate, sparse=FALSE) 
library(network)
g <- network::as.network.matrix(A) 
# Note that igraph and sna use different formats for storing network data objects. Here we handle the conversion by first producing an adjacency matrix for the karate club object karate.
library(sna)
# The head teacher and instructor are indicated with blue and yellow, respectively
par(mfrow=c(2,2), mar=c(1, 1, 1, 1))
sna::gplot.target(g, degree(g), 
                  main="Degree", circ.col="skyblue",
                  usearrows = FALSE, circ.lab = FALSE,
                  vertex.col=c("blue", rep("red", 32), "yellow"), 
                  edge.col="darkgray")
sna::gplot.target(g, closeness(g), 
                  main="Closeness", circ.col="skyblue",
                  usearrows = FALSE, circ.lab = FALSE,
                  vertex.col=c("blue", rep("red", 32), "yellow"), 
                  edge.col="darkgray")
sna::gplot.target(g, betweenness(g), 
                  main="Betweenness", circ.col="skyblue",
                  usearrows = FALSE, circ.lab = FALSE,
                  vertex.col=c("blue", rep("red", 32), "yellow"), 
                  edge.col="darkgray")
sna::gplot.target(g, evcent(g), 
                  main="Eigenvector", circ.col="skyblue",
                  usearrows = FALSE, circ.lab = FALSE,
                  vertex.col=c("blue", rep("red", 32), "yellow"), 
                  edge.col="darkgray")

The HITS algorithm, based on the concept of ‘hubs and authorities,’ as introduced by Kleinberg in the context of the World Wide Web, characterizes the importance of the vertices according to the eigenvector centrality. Specifically, given an adjacency matrix A for a directed graph,

  • hub.score(): score the importance of so-called hub vertices by how many authority vertices they point to, according to the eigenvector centrality of the matrix M_hub = A A^T
  • authority.score(): scores the importance of so-called authority vertices by how many hubs point to them, according to the eigenvector centrality of the matrix M_auth = A^T A.
par(mfrow=c(1,2), mar=c(1, 1, 1, 1))
l <- layout.kamada.kawai(aidsblog)
plot(aidsblog, layout=l, main="Hubs", vertex.label="",
     vertex.size=10 * sqrt(hub.score(aidsblog)$vector)) 
plot(aidsblog, layout=l,
     main="Authorities", vertex.label="", 
     vertex.size=10 * sqrt(authority.score(aidsblog)$vector))

3.2.3 Characterizing Edges

Edge betweenness centrality: extends vertex betweenness centrality in a straightforward manner, by assigning to each edge a value that reflects the number of shortest paths traversing that edge.

# we are led to note that actor 20 plays a key role from this perspective in facilitating the direct flow of information between the head instructor (Mr Hi, vertex 1) and the administrator (John A, vertex 34).
eb <- edge.betweenness(karate)
E(karate)[order(eb, decreasing=T)[1:3]]
## + 3/78 edges from 4b458a1 (vertex names):
## [1] Actor 20--John A   Mr Hi   --Actor 20 Mr Hi   --Actor 32

3.2.4 Characterizing Network Cohesion

Netnork cohesion: the extent to which subsets of vertices are cohesive (or stuck together) with respect to the relation defining edges in the network graph. There are many ways that we can define network cohesion, depending on the context of the question being asked. Definitions differ:

  • in scale ranging from: local (e.g., triads) to global (e.g., giant components),
  • in the extent to which: they are specified explicitly (e.g., cliques) versus implicitly (e.g., ‘clusters’ or ‘communities’).

3.2.4.1 Clique, Maximal clique, Coreness

Cliques: are complete subgraphs and hence are subsets of vertices that are fully cohesive, in the sense that all vertices within the subset are connected by edges. In practice, large cliques are relatively rare, as they necessarily require that a graph G itself be fairly dense. The following is the subgraph of cliques of size range from 1 to 6:

par(mfrow= c(2,3), mar =c(1.5,1.5,1.5,1.5))
plot(make_full_graph(1))
title("Vertex: cliques of size 1")
plot(make_full_graph(2))
title("Twins: cliques of size 2")
plot(make_full_graph(3))
title("Triplets: cliques of size 3")
plot(make_full_graph(4))
title("Quadruplets: cliques of size 4")
plot(make_full_graph(5))
title("Quintuplets: cliques of size 5")
plot(make_full_graph(6))
title("Sextuplets: cliques of size 6")

# For the karate network a census of this sort reflects that there are 34 nodes (cliques of size one) and 78 edges (cliques of size two), followed by 45 triangles (cliques of size three).
table(sapply(cliques(karate), length))
## 
##  1  2  3  4  5 
## 34 78 45 11  2
# find and shown largest size of cliques
clique.number(karate); cliques(karate)[sapply(cliques(karate), length) == 5]
## [1] 5
## [[1]]
## + 5/34 vertices, named, from 4b458a1:
## [1] Mr Hi    Actor 2  Actor 3  Actor 4  Actor 14
## 
## [[2]]
## + 5/34 vertices, named, from 4b458a1:
## [1] Mr Hi   Actor 2 Actor 3 Actor 4 Actor 8

maximal clique: is a clique that is not a subset of a larger clique.

# It help to remove some redundancy in the analysis, which the cliques of larger sizes necessarily include cliques of smaller sizes.
table(sapply(maximal.cliques(karate), length))
## 
##  2  3  4  5 
## 11 21  2  2

cores: The k-core of graph is a maximal subgraph in which each vertex has at least degree k. The coreness of a vertex is k if it belongs to the k-core but not to the (k+1)-core. Weakened notions of cliques, particularly popular in visualization, as it provides a way of decomposing a network into ‘layers’, in the sense of an onion.

par(mar =c(1.5,1.5,1.5,1.5))
cores <- coreness(karate)
cores[cores == 4]
##    Mr Hi  Actor 2  Actor 3  Actor 4  Actor 8  Actor 9 Actor 14 Actor 31 
##        4        4        4        4        4        4        4        4 
## Actor 33   John A 
##        4        4
sna::gplot.target(g, cores, circ.lab = FALSE,
                  circ.col="skyblue", usearrows = FALSE,
                  vertex.col=cores, edge.col="darkgray") 

3.2.4.2 Density, clustering coefficient

  • Density of a graph: a measure of how close a graph is to being a clique
# the ego-centric networks around vertices 1 and 34 are noticeably more dense than the overall network.
graph.density(karate)
## [1] 0.1390374
sapply(make_ego_graph(graph = karate, order =1), graph.density)
##  [1] 0.2500000 0.4666667 0.3818182 0.7619048 0.8333333 0.7000000 0.7000000
##  [8] 1.0000000 0.6666667 0.6666667 0.8333333 1.0000000 1.0000000 0.7333333
## [15] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.6666667 1.0000000
## [22] 1.0000000 1.0000000 0.6000000 0.6666667 0.6666667 1.0000000 0.5000000
## [29] 0.6666667 0.8000000 0.7000000 0.4285714 0.3205128 0.2091503
sapply(make_ego_graph(graph = karate, order =1)[c(1,34)], graph.density)
## [1] 0.2500000 0.2091503
  • clustering coefficient (Relative frequency, transitivity): a measure of the degree to which nodes in a graph tend to cluster together. A standard quantity of interest in the social network literature, where it is also referred to as the fraction of transitive triples
    • The global clustering coefficient: is the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed).
    • The local clustering coefficient of a vertex (node): quantifies how close its neighbours are to being a clique (complete graph).
transitivity(karate, type = "global") # global clustering coefficient
## [1] 0.2556818
transitivity(karate, "local") # all local clustering coefficient
##  [1] 0.1500000 0.3333333 0.2444444 0.6666667 0.6666667 0.5000000 0.5000000
##  [8] 1.0000000 0.5000000 0.0000000 0.6666667       NaN 1.0000000 0.6000000
## [15] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.3333333 1.0000000
## [22] 1.0000000 1.0000000 0.4000000 0.3333333 0.3333333 1.0000000 0.1666667
## [29] 0.3333333 0.6666667 0.5000000 0.2000000 0.1969697 0.1102941
transitivity(karate, "local", vids=c(1,34)) # # specific local clustering coefficient
## [1] 0.1500000 0.1102941

reciprocity (concept unique to directed graphs): the extent to which there is reciprocation among ties in a directed network. check for extra reading for better explanation

reciprocity(aidsblog, mode="default")
## [1] 0.03243243
reciprocity(aidsblog, mode="ratio")
## [1] 0.01648352

3.2.4.3 Diameter, Connectivity, Cuts, and Flows

A common notion of distance between vertices on a graph is defined as the length of the shortest path(s) between the vertices (which we set equal to infinity if no such path exists). This distance is often referred to as geodesic distance, with ‘geodesic’ being another name for shortest paths.

  • diameter(): The value of the longest distance in a graph.
g <- graph.formula(1-2, 1-3, 2-3, 2-4, 3-5, 4-5, 4-6, 4-7, 5-6, 6-7)
dg <- graph.formula(1-+2, 1-+3, 2++3);
par(mfrow=c(1,2))
plot(g)
plot(dg)

diameter(g, weights=NA)
## [1] 3
  • is.connected(): The graph G is said to be connected if every vertex is reachable from every other.
  • clusters(): calculate the maximal (weakly or strongly) connected components of a graph.
# for undirected graph
igraph::is.connected(g)
## [1] TRUE
# for directed graph
igraph::is.connected(dg, mode="weak"); igraph::is.connected(dg, mode="strong")
## [1] TRUE
## [1] FALSE
igraph::clusters(g)
## $membership
## 1 2 3 4 5 6 7 
## 1 1 1 1 1 1 1 
## 
## $csize
## [1] 7
## 
## $no
## [1] 1

A basic question of interest is whether a given graph separates into distinct subgraphs. If it does not, we might seek to quantify how close to being able to do so it is. Intimately related to such issues are questions associated with the flow of ‘information’ in the graph.

Small world property: which refers to the situation wherein:

  1. the shortest-path distance between pairs of vertices is generally quite small,
  2. the clustering is relatively high.
  • decompose.graph(): decompose the graph from a igraph, which is not connected.
# flow1_Connectivity
igraph::is.connected(yeast) # check if connected
## [1] FALSE
comps <- decompose.graph(yeast) # Creates a separate graph for each component of a graph.
table(sapply(comps, vcount)) # check which subgraph consist of the most vertece
## 
##    2    3    4    5    6    7 2375 
##   63   13    5    6    1    3    1
yeast.gc <- decompose.graph(yeast)[[1]]
average.path.length(yeast.gc) # find the average path length
## [1] 5.09597
diameter(yeast.gc) # compare to average path length, longest distance is short
## [1] 15
transitivity(yeast.gc) # high clustering coefficient
## [1] 0.4686663
# flow2_Cuts
vertex.connectivity(yeast.gc); edge.connectivity(yeast.gc) # the vertex and edge connectivity are both equal to one.
## [1] 1
## [1] 1
yeast.cut.vertices <- articulation.points(yeast.gc) # vertex-cut on articulation point: removal of only a single well-chosen vertex or edge in order to break this subgraph into additional components
length(yeast.cut.vertices)
## [1] 350

3.2.5 Graph Partitioning

3.2.5.1 Hierarchical Clustering

  • cluster_fast_greedy(): agglomerative hierarchical clustering algorithm
  • cluster_walktrap(): communities in a graph via random walks
  • cluster_spinglass(): find communities in graphs via a spin-glass model and simulated annealing
  • cluster_leading_eigen(): find densely connected subgraphs in a graph by calculating the leading non-negative eigenvector of the modularity matrix of the graph.
  • cluster_edge_betweenness(): Many networks consist of modules which are densely connected themselves but sparsely connected to other modules.
kc <- cluster_fast_greedy(karate)
length(kc); sizes(kc)
## [1] 3
## Community sizes
##  1  2  3 
## 18 11  5
membership(kc)
##    Mr Hi  Actor 2  Actor 3  Actor 4  Actor 5  Actor 6  Actor 7  Actor 8 
##        2        2        2        2        3        3        3        2 
##  Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14 Actor 15 Actor 16 
##        1        1        3        2        2        2        1        1 
## Actor 17 Actor 18 Actor 19 Actor 20 Actor 21 Actor 22 Actor 23 Actor 24 
##        3        2        1        2        1        2        1        1 
## Actor 25 Actor 26 Actor 27 Actor 28 Actor 29 Actor 30 Actor 31 Actor 32 
##        1        1        1        1        1        1        1        1 
## Actor 33   John A 
##        1        1
par(mfrow=c(1,2), mar =c(1,1,1,1))
plot(kc, karate)
plot_dendrogram(kc)

3.2.5.2 Spectral Partitioning

3.2.5.3 Validation of Graph Partitioning

3.2.6 Assortativity and Mixing