This is a study note for using \(igraph\) package for Network Data. We also look quickly at some of the unique challenges posed by the problem of visualizing large network graphs. For more details on the study material see:
Extra reading:
library(igraph) #creating, decorating, and assessing basic properties of network graph
library(sand) # collection of data sets
Formally, a graph G = (V,E) is a mathematical structure consisting of a set V of vertices (also commonly called nodes) and a set E of edges (also commonly called links), where elements of E are unordered pairs {u, v} of distinct vertices u, v ∈ V.
g <- graph.formula(1-2, 1-3, 2-3, 2-4, 3-5, 4-5, 4-6, 4-7, 5-6, 6-7)
dg <- graph.formula(1-+2, 1-+3, 2++3);
par(mfrow=c(1,2))
plot(g)
plot(dg)
The number of vertices N_v = |V | and the number of edges N_e = |E | are sometimes called the order and size of the graph G, respectively.
V(dg)
## + 3/3 vertices, named, from 69170c1:
## [1] 1 2 3
vcount(dg)
## [1] 3
E(dg)
## + 4/4 edges from 69170c1 (vertex names):
## [1] 1->2 1->3 2->3 3->2
ecount(dg)
## [1] 4
get.adjacency(dg) # adjacency matrices
## 3 x 3 sparse Matrix of class "dgCMatrix"
## 1 2 3
## 1 . 1 1
## 2 . . 1
## 3 . 1 .
list.vertex.attributes(dg)
## [1] "name"
head(elist.lazega)
## V1 V2
## 1 V1 V17
## 2 V2 V7
## 3 V2 V16
## 4 V2 V17
## 5 V2 V22
## 6 V2 V26
head(v.attr.lazega)
## Name Seniority Status Gender Office Years Age Practice School
## 1 V1 1 1 1 1 31 64 1 1
## 2 V2 2 1 1 1 32 62 2 1
## 3 V3 3 1 1 2 13 67 1 1
## 4 V4 4 1 1 1 31 59 2 3
## 5 V5 5 1 1 2 31 59 1 2
## 6 V6 6 1 1 2 29 55 1 1
g.lazega <- graph.data.frame(elist.lazega,
directed="FALSE",
vertices=v.attr.lazega)
g.lazega$name <- "Lazega Lawyers"
summary(g.lazega)
## IGRAPH 69b1e5d UN-- 36 115 -- Lazega Lawyers
## + attr: name (g/c), name (v/c), Seniority (v/n), Status (v/n),
## | Gender (v/n), Office (v/n), Years (v/n), Age (v/n), Practice
## | (v/n), School (v/n)
list.vertex.attributes(g.lazega) # find appropriate the attr for coloring vertex
## [1] "name" "Seniority" "Status" "Gender" "Office" "Years"
## [7] "Age" "Practice" "School"
vertex.c <- as.numeric(as.factor(V(g.lazega)$Practice)) # use attr Practice to coloring
plot(g.lazega, vertex.color=vertex.c)
# lattice data
g.l <- graph.lattice(c(5, 5, 5))
data(aidsblog)
# large netword data
data(fblog)
party.names <- sort(unique(V(fblog)$PolParty)); party.names
## [1] " Cap21" " Commentateurs Analystes"
## [3] " Les Verts" " liberaux"
## [5] " Parti Radical de Gauche" " PCF - LCR"
## [7] " PS" " UDF"
## [9] " UMP"
Application: low level of edge-crossings through the center of the circle
Drawing methods: ordering by degree and grouping by common vertex attributes.
igraph.options(vertex.size=3, vertex.label=NA, edge.arrow.size=0.5)
par(mfrow=c(1, 2), mar=c(1, 1, 1, 1))
plot(g.l, layout=layout.circle)
title("5x5x5 Lattice")
plot(aidsblog, layout=layout.circle)
title("Blog Network")
Application: tree structure or radical structure.
Drawing methods: N/A
g.tree <- graph.formula(1-+2,1-+3,1-+4,2-+5,2-+6,2-+7, 3-+8,3-+9,4-+10)
par(mfrow=c(1, 2), mar=c(1, 1, 1, 1))
igraph.options(vertex.size=30, edge.arrow.size=0.5,
vertex.label=NULL)
plot(g.tree, layout=layout.reingold.tilford(g.tree, circular=T))
plot(g.tree, layout=layout.reingold.tilford)
Application: visually appealing and relaxed state since the system energy of the position is ther minimum. Methods based on multidimensional scaling (MDS), which have a long history in the social network literature, are of this type.
Drawing methods: Energy-placement methods of graph drawing define a noteion of energy as a function of vertex position. A vertex placement is chosen which minimizes the total system energy.
igraph.options(vertex.size=3, vertex.label=NA, edge.arrow.size=0.5)
par(mfrow=c(1, 2), mar=c(1, 1, 1, 1))
plot(g.l, layout=layout.kamada.kawai)
title("5x5x5 Lattice")
plot(aidsblog, layout=layout.kamada.kawai)
title("Blog Network")
Application: creating useful drawings to exploit analogies between the relational structure in graphs and the forces among elements in physical systems
Drawing methods: spring-embedder methods of graph drawing define a notion of force for each vertex in the graph depending, at the very least, on the positions of pairs of vertices and the distances between them, and seek to iteratively update the placement of vertices until a vector of net forces across vertices converges.
par(mfrow=c(1, 2), mar=c(1, 1, 1, 1))
plot(g.l, layout=layout.fruchterman.reingold)
title("5x5x5 Lattice")
plot(aidsblog, layout=layout.fruchterman.reingold)
title("Blog Network")
Despite their sophistication, for all of the methods described so far, the graph draw- ings will tend to look increasingly cluttered as the number of vertices Nv nears 100 or so—and simply unintelligible for thousands of vertices or more—due to the finite- ness of the available space and resolution.
Application: balance the detail with which both local and global structure by clustering for the purpose of visualizing large networks
Drawing methods: DrL is based on VxOrd, an enhanced version of the spring-embedder methodology that attempts to place vertices in clusters on the two-dimensional plane, with the help of sophisticated optimization methods to more efficiently search the space of possible graph drawings and a grid that helps reduce computation time from the typical O(Nv^2) down to O(Nv).
summary(fblog)
## IGRAPH NA UN-- 192 1431 --
## + attr: name (v/c), PolParty (v/c)
party.nums <- as.numeric(as.factor(V(fblog)$PolParty))
str(party.nums)
## num [1:192] 3 3 3 3 3 3 3 9 9 9 ...
set.seed(42)
par(mfrow=c(1, 2))
l <- layout.drl(fblog)
plot(fblog, layout=l, vertex.size=5, vertex.label=NA,
vertex.color=party.nums)
title("layout.drl")
l <- layout.kamada.kawai(fblog)
plot(fblog, layout=l, vertex.label=NA,
vertex.color=party.nums, vertex.size=3)
title("layout.kamada.kawai")
mtext("Visualizations of the French political blog network", outer=TRUE, cex=1, line=-1, font = 4)
fblog.c <- contract.vertices(fblog, party.nums) # Contract several vertices into a single one
E(fblog.c)$weight <- 1 # assign 1 to the weight of each edge
fblog.c <- simplify(fblog.c) # convert from multigraph to simple graph
# explore how much department communicate to each other
party.size <- as.vector(table(V(fblog)$PolParty))
plot(fblog.c, vertex.size=5*sqrt(party.size),
vertex.label = party.names,
vertex.color=V(fblog.c),
edge.width=sqrt(E(fblog.c)$weight),
vertex.label.dist=1.5,
edge.arrow.size=0)
To output the igraph class object for external uses:
write_graph(graph, file, format = c("edgelist", "pajek", "ncol", "lgl",
"graphml", "dimacs", "gml", "dot", "leda"), ...)
In R, there are relatively few choices available for node and arrow shapes, and label placement is somewhat awkward, so that avoid- ing overlapping nodes and labels is not easy. Interactive editing of graph drawings is even more limited. However, if producing a network visualization of particularly high quality is a goal, there are certainly other software tools for this job.
igraph::write_graph(graph, file, format = "gml"
), then open in Cytoscapeigraph::write_graph(graph, file, format = "pajek"
), then open in Pajekrgexf::write.gexf()
, then open in Gephiigraph::write_graph(graph, file, format = "dot "
), then couple with Rgraphviz packagelibrary(visNetwork)
library(dplyr)
# get g.lazega igraph
g.lazega <- graph.data.frame(elist.lazega,
directed="FALSE",
vertices=v.attr.lazega)
g.lazega$name <- "Lazega Lawyers"
summary(g.lazega)
## IGRAPH f31ba87 UN-- 36 115 -- Lazega Lawyers
## + attr: name (g/c), name (v/c), Seniority (v/n), Status (v/n),
## | Gender (v/n), Office (v/n), Years (v/n), Age (v/n), Practice
## | (v/n), School (v/n)
# convert igraph to VisNetwork Data
data <- toVisNetworkData(g.lazega)
# add characteristic
data$nodes <- data$nodes %>%
mutate(group = Office) %>% # add grouping variable
mutate(title = paste0("<p> Age <br>", Age, "</p>"),
stringsAsFactors = FALSE) #add tooltip (pop-up) variable
# ploting
visNetwork(nodes = data$nodes,
edges = data$edges,
height = "500px") %>%
# grouping
visGroups(groupname = "1", color = "blue", shape = "square",
shadow = list(enabled = TRUE)) %>%
visGroups(groupname = "2", color = "red", shape = "triangle") %>%
visGroups(groupname = "3", color = "green", shape = "star") %>%
# layout
visIgraphLayout(layout = "layout.fruchterman.reingold") %>%
# options
visOptions(highlightNearest = list(enabled = T, hover = T),
nodesIdSelection = T)
To transform a multigraph into a weighted graph, wherein each resulting proper edge is equipped with a weight equal to the multiplicity of that edge in the original multigraph. simple graph and multigraph:
g <- graph.formula(1-2, 1-3, 2-3, 2-4, 3-5, 4-5, 4-6, 4-7, 5-6, 6-7)
is.simple(g)
## [1] TRUE
mg <- g + edge(2,3) # adding a duplicate edge
is.simple(mg)
## [1] FALSE
E(mg)$weight <- 1
wg2 <- simplify(mg)
E(wg2)$weight # by simplifying, the third edge has a weight of 2.
## [1] 1 1 2 1 1 1 1 1 1 1
is.simple(wg2)
## [1] TRUE
par(mfrow=c(1,3), mar=c(1, 1, 1, 1))
plot(g); plot(mg); plot(wg2)
neighbors()
: Show all neighbors of a vertex.degree()
: count the number of edges incident on a node for undirected graph, whereas count the number of edges pointing in towards and out from a v vertex for directed grqphneighbors(g, 5)
## + 3/7 vertices, named, from c4d938f:
## [1] 3 4 6
# for undirected graph
degree(g)
## 1 2 3 4 5 6 7
## 2 3 3 4 3 3 2
# for directed graph
degree(dg, mode="in"); degree(dg, mode="out")
## 1 2 3
## 0 2 2
## 1 2 3
## 2 1 1
degree()
and graph.strength()
: use with hist
to plot degree distribution.graph.knn()
: Calculate the average nearest neighbor degree of the given vertices and the same quantity in the function of vertex degree# has weight
data(karate)
E(karate)$weight
## [1] 4 5 3 3 3 3 2 2 2 3 1 3 2 2 2 2 6 3 4 5 1 2 2 2 3 4 5 1 3 2 2 2 3 3 3
## [36] 2 3 5 3 3 3 3 3 4 2 3 3 2 3 4 1 2 1 3 1 2 3 5 4 3 5 4 2 3 2 7 4 2 4 2
## [71] 2 4 2 3 3 4 4 5
par(mfrow=c(1,2))
# degree distribution
hist(igraph::degree(karate), col="lightblue", xlim=c(0, 50),
xlab="Vertex Degree", ylab="Frequency", main="")
# weighted degree distribution
hist(graph.strength(karate), col="pink",
xlab="Vertex Strength", ylab="Frequency", main="")
# no weight
data(yeast)
E(yeast)$weight
## NULL
par(mfrow=c(1,3))
# degree distribution
hist(igraph::degree(yeast), col="blue",
xlab="Degree", ylab="Frequency",
main="Degree Distribution")
# Log-Log Degree Distribution
d.yeast <- igraph::degree(yeast) # the number of edge for each vertex
d <- 1:max(d.yeast)-1 # 1 to the max number of the edge
dd.yeast <- degree.distribution(yeast) # degree distribution by cut = 1
ind <- (dd.yeast != 0) # mask out the zero degree vertex
plot(x = d[ind], y = dd.yeast[ind],
log="xy", col="blue",
xlab=c("Log-Degree"), ylab=c("Log-Frequency"),
main="Log-Log Degree Distribution")
# Average neighbor degree versus vertex degree (log–log scale) for the yeast data
a.nn.deg.yeast <- graph.knn(graph = yeast, vids = V(yeast))
plot(x = d.yeast, y = a.nn.deg.yeast$knn,
log="xy", col="goldenrod",
xlab=c("Log Vertex Degree"), ylab=c("Log Average Neighbor Degree"),
ylim = range(c(1,110)),
main = "Neighbor degree vs vertex degree")
par(new=TRUE)
plot(x = d, y = a.nn.deg.yeast$knnk,
log="xy", col="red",
xlab=c("Log Vertex Degree"), ylab=c("Log Average Neighbor Degree"),
ylim = range(c(1,110)))
Measures of centrality are designed to quantify important vertex in a network graph. The following is the three example of centrality:
data(karate)
A <- get.adjacency(karate, sparse=FALSE)
library(network)
g <- network::as.network.matrix(A)
# Note that igraph and sna use different formats for storing network data objects. Here we handle the conversion by first producing an adjacency matrix for the karate club object karate.
library(sna)
# The head teacher and instructor are indicated with blue and yellow, respectively
par(mfrow=c(2,2), mar=c(1, 1, 1, 1))
sna::gplot.target(g, degree(g),
main="Degree", circ.col="skyblue",
usearrows = FALSE, circ.lab = FALSE,
vertex.col=c("blue", rep("red", 32), "yellow"),
edge.col="darkgray")
sna::gplot.target(g, closeness(g),
main="Closeness", circ.col="skyblue",
usearrows = FALSE, circ.lab = FALSE,
vertex.col=c("blue", rep("red", 32), "yellow"),
edge.col="darkgray")
sna::gplot.target(g, betweenness(g),
main="Betweenness", circ.col="skyblue",
usearrows = FALSE, circ.lab = FALSE,
vertex.col=c("blue", rep("red", 32), "yellow"),
edge.col="darkgray")
sna::gplot.target(g, evcent(g),
main="Eigenvector", circ.col="skyblue",
usearrows = FALSE, circ.lab = FALSE,
vertex.col=c("blue", rep("red", 32), "yellow"),
edge.col="darkgray")
The HITS algorithm, based on the concept of ‘hubs and authorities,’ as introduced by Kleinberg in the context of the World Wide Web, characterizes the importance of the vertices according to the eigenvector centrality. Specifically, given an adjacency matrix A for a directed graph,
hub.score()
: score the importance of so-called hub vertices by how many authority vertices they point to, according to the eigenvector centrality of the matrix M_hub = A A^Tauthority.score()
: scores the importance of so-called authority vertices by how many hubs point to them, according to the eigenvector centrality of the matrix M_auth = A^T A.par(mfrow=c(1,2), mar=c(1, 1, 1, 1))
l <- layout.kamada.kawai(aidsblog)
plot(aidsblog, layout=l, main="Hubs", vertex.label="",
vertex.size=10 * sqrt(hub.score(aidsblog)$vector))
plot(aidsblog, layout=l,
main="Authorities", vertex.label="",
vertex.size=10 * sqrt(authority.score(aidsblog)$vector))
Edge betweenness centrality: extends vertex betweenness centrality in a straightforward manner, by assigning to each edge a value that reflects the number of shortest paths traversing that edge.
# we are led to note that actor 20 plays a key role from this perspective in facilitating the direct flow of information between the head instructor (Mr Hi, vertex 1) and the administrator (John A, vertex 34).
eb <- edge.betweenness(karate)
E(karate)[order(eb, decreasing=T)[1:3]]
## + 3/78 edges from 4b458a1 (vertex names):
## [1] Actor 20--John A Mr Hi --Actor 20 Mr Hi --Actor 32
Netnork cohesion: the extent to which subsets of vertices are cohesive (or stuck together) with respect to the relation defining edges in the network graph. There are many ways that we can define network cohesion, depending on the context of the question being asked. Definitions differ:
Cliques: are complete subgraphs and hence are subsets of vertices that are fully cohesive, in the sense that all vertices within the subset are connected by edges. In practice, large cliques are relatively rare, as they necessarily require that a graph G itself be fairly dense. The following is the subgraph of cliques of size range from 1 to 6:
par(mfrow= c(2,3), mar =c(1.5,1.5,1.5,1.5))
plot(make_full_graph(1))
title("Vertex: cliques of size 1")
plot(make_full_graph(2))
title("Twins: cliques of size 2")
plot(make_full_graph(3))
title("Triplets: cliques of size 3")
plot(make_full_graph(4))
title("Quadruplets: cliques of size 4")
plot(make_full_graph(5))
title("Quintuplets: cliques of size 5")
plot(make_full_graph(6))
title("Sextuplets: cliques of size 6")
# For the karate network a census of this sort reflects that there are 34 nodes (cliques of size one) and 78 edges (cliques of size two), followed by 45 triangles (cliques of size three).
table(sapply(cliques(karate), length))
##
## 1 2 3 4 5
## 34 78 45 11 2
# find and shown largest size of cliques
clique.number(karate); cliques(karate)[sapply(cliques(karate), length) == 5]
## [1] 5
## [[1]]
## + 5/34 vertices, named, from 4b458a1:
## [1] Mr Hi Actor 2 Actor 3 Actor 4 Actor 14
##
## [[2]]
## + 5/34 vertices, named, from 4b458a1:
## [1] Mr Hi Actor 2 Actor 3 Actor 4 Actor 8
maximal clique: is a clique that is not a subset of a larger clique.
# It help to remove some redundancy in the analysis, which the cliques of larger sizes necessarily include cliques of smaller sizes.
table(sapply(maximal.cliques(karate), length))
##
## 2 3 4 5
## 11 21 2 2
cores: The k-core of graph is a maximal subgraph in which each vertex has at least degree k. The coreness of a vertex is k if it belongs to the k-core but not to the (k+1)-core. Weakened notions of cliques, particularly popular in visualization, as it provides a way of decomposing a network into ‘layers’, in the sense of an onion.
par(mar =c(1.5,1.5,1.5,1.5))
cores <- coreness(karate)
cores[cores == 4]
## Mr Hi Actor 2 Actor 3 Actor 4 Actor 8 Actor 9 Actor 14 Actor 31
## 4 4 4 4 4 4 4 4
## Actor 33 John A
## 4 4
sna::gplot.target(g, cores, circ.lab = FALSE,
circ.col="skyblue", usearrows = FALSE,
vertex.col=cores, edge.col="darkgray")
# the ego-centric networks around vertices 1 and 34 are noticeably more dense than the overall network.
graph.density(karate)
## [1] 0.1390374
sapply(make_ego_graph(graph = karate, order =1), graph.density)
## [1] 0.2500000 0.4666667 0.3818182 0.7619048 0.8333333 0.7000000 0.7000000
## [8] 1.0000000 0.6666667 0.6666667 0.8333333 1.0000000 1.0000000 0.7333333
## [15] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.6666667 1.0000000
## [22] 1.0000000 1.0000000 0.6000000 0.6666667 0.6666667 1.0000000 0.5000000
## [29] 0.6666667 0.8000000 0.7000000 0.4285714 0.3205128 0.2091503
sapply(make_ego_graph(graph = karate, order =1)[c(1,34)], graph.density)
## [1] 0.2500000 0.2091503
transitivity(karate, type = "global") # global clustering coefficient
## [1] 0.2556818
transitivity(karate, "local") # all local clustering coefficient
## [1] 0.1500000 0.3333333 0.2444444 0.6666667 0.6666667 0.5000000 0.5000000
## [8] 1.0000000 0.5000000 0.0000000 0.6666667 NaN 1.0000000 0.6000000
## [15] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.3333333 1.0000000
## [22] 1.0000000 1.0000000 0.4000000 0.3333333 0.3333333 1.0000000 0.1666667
## [29] 0.3333333 0.6666667 0.5000000 0.2000000 0.1969697 0.1102941
transitivity(karate, "local", vids=c(1,34)) # # specific local clustering coefficient
## [1] 0.1500000 0.1102941
reciprocity (concept unique to directed graphs): the extent to which there is reciprocation among ties in a directed network. check for extra reading for better explanation
reciprocity(aidsblog, mode="default")
## [1] 0.03243243
reciprocity(aidsblog, mode="ratio")
## [1] 0.01648352
A common notion of distance between vertices on a graph is defined as the length of the shortest path(s) between the vertices (which we set equal to infinity if no such path exists). This distance is often referred to as geodesic distance, with ‘geodesic’ being another name for shortest paths.
diameter()
: The value of the longest distance in a graph.g <- graph.formula(1-2, 1-3, 2-3, 2-4, 3-5, 4-5, 4-6, 4-7, 5-6, 6-7)
dg <- graph.formula(1-+2, 1-+3, 2++3);
par(mfrow=c(1,2))
plot(g)
plot(dg)
diameter(g, weights=NA)
## [1] 3
is.connected()
: The graph G is said to be connected if every vertex is reachable from every other.clusters()
: calculate the maximal (weakly or strongly) connected components of a graph.# for undirected graph
igraph::is.connected(g)
## [1] TRUE
# for directed graph
igraph::is.connected(dg, mode="weak"); igraph::is.connected(dg, mode="strong")
## [1] TRUE
## [1] FALSE
igraph::clusters(g)
## $membership
## 1 2 3 4 5 6 7
## 1 1 1 1 1 1 1
##
## $csize
## [1] 7
##
## $no
## [1] 1
A basic question of interest is whether a given graph separates into distinct subgraphs. If it does not, we might seek to quantify how close to being able to do so it is. Intimately related to such issues are questions associated with the flow of ‘information’ in the graph.
Small world property: which refers to the situation wherein:
decompose.graph()
: decompose the graph from a igraph, which is not connected.# flow1_Connectivity
igraph::is.connected(yeast) # check if connected
## [1] FALSE
comps <- decompose.graph(yeast) # Creates a separate graph for each component of a graph.
table(sapply(comps, vcount)) # check which subgraph consist of the most vertece
##
## 2 3 4 5 6 7 2375
## 63 13 5 6 1 3 1
yeast.gc <- decompose.graph(yeast)[[1]]
average.path.length(yeast.gc) # find the average path length
## [1] 5.09597
diameter(yeast.gc) # compare to average path length, longest distance is short
## [1] 15
transitivity(yeast.gc) # high clustering coefficient
## [1] 0.4686663
# flow2_Cuts
vertex.connectivity(yeast.gc); edge.connectivity(yeast.gc) # the vertex and edge connectivity are both equal to one.
## [1] 1
## [1] 1
yeast.cut.vertices <- articulation.points(yeast.gc) # vertex-cut on articulation point: removal of only a single well-chosen vertex or edge in order to break this subgraph into additional components
length(yeast.cut.vertices)
## [1] 350
cluster_fast_greedy()
: agglomerative hierarchical clustering algorithmcluster_walktrap()
: communities in a graph via random walkscluster_spinglass()
: find communities in graphs via a spin-glass model and simulated annealingcluster_leading_eigen()
: find densely connected subgraphs in a graph by calculating the leading non-negative eigenvector of the modularity matrix of the graph.cluster_edge_betweenness()
: Many networks consist of modules which are densely connected themselves but sparsely connected to other modules.kc <- cluster_fast_greedy(karate)
length(kc); sizes(kc)
## [1] 3
## Community sizes
## 1 2 3
## 18 11 5
membership(kc)
## Mr Hi Actor 2 Actor 3 Actor 4 Actor 5 Actor 6 Actor 7 Actor 8
## 2 2 2 2 3 3 3 2
## Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14 Actor 15 Actor 16
## 1 1 3 2 2 2 1 1
## Actor 17 Actor 18 Actor 19 Actor 20 Actor 21 Actor 22 Actor 23 Actor 24
## 3 2 1 2 1 2 1 1
## Actor 25 Actor 26 Actor 27 Actor 28 Actor 29 Actor 30 Actor 31 Actor 32
## 1 1 1 1 1 1 1 1
## Actor 33 John A
## 1 1
par(mfrow=c(1,2), mar =c(1,1,1,1))
plot(kc, karate)
plot_dendrogram(kc)