Background
In this lab, we used the network visualization program Gephi to analyze the citation connections between environmental classics (defined here as works that appear on the Powell’s or GoodReads list of the best environmental books and are cited by more than 100 sources). Gephi is a program which allows one to visually map connections of all sorts; a graph in Gephi is composed of “nodes” (the primary data points) connected by “edges,” and various algorithms within the program graphically order the data by pushing unconnected nodes apart and pulling nodes with shared edges together. The environmental classics and the works which cite them are both classified as nodes, while each citation of a classic is an edge. After ordering the data, we analyzed it conceptually to assess the relationships between the classics that we looked at, focusing particularly on the connectedness and clustering of the data.
Procedure
To create the class-sourced data for this lab, we each created one environmental classic node, adding its Author-Date label, full title, full author name, year published, and the number of citing works. For each classic node, we found the most frequently cited citing works and recorded the same data for each of these citing work nodes. We then linked the citing nodes to the classic nodes through a separate sheet containing all of the edge data. For the lab, we each converted the node and edge sheets into .csv files for import into Gephi. I first distinguished between classic and citing nodes by color, and then established two separate continuous size scales: the size of the node bubble is correlated with the total number of citations of the work, while the label size is correlated with the number of connections (or “degree”) of the node. Next, I ran the ForceAtlas 2 algorithm to disperse and order the randomly clustered data, and applied the Label Adjust algorithm to space out the nodes and prevent the labels from overlapping, rerunning and adjusting the algorithms until I was satisfied with the first, complete graph. To produce a second, simplified graph, I set a minimum degree parameter of 2, visually excluding all citing nodes which shared an edge with only one classic node, and reran the algorithms. Lastly, I manually adjusted this result, making all linear extensions go in the same direction and reordering the data to prevent edges from crossing, while attempting to hold the distance between nodes reasonably constant.
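The node and edge sheets and the minimum-degree filter can be illustrated with a minimal Python sketch. It is not the workflow used in the lab, which was done in a spreadsheet and in Gephi's interface. The node Ids, labels, the `Kind` column, and the helper names `to_csv` and `filter_min_degree` are all hypothetical; the `Id`, `Label`, `Source`, and `Target` headers follow the column names Gephi's spreadsheet import looks for.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data, not the class data set: (Id, Label, Kind).
nodes = [
    ("c1", "Carson 1962", "classic"),
    ("c2", "Leopold 1949", "classic"),
    ("w1", "Citing work A", "citing"),
    ("w2", "Citing work B", "citing"),
    ("w3", "Citing work C", "citing"),
]
# Each edge is one citation: (Source = citing work, Target = classic).
edges = [("w1", "c1"), ("w1", "c2"), ("w2", "c1"), ("w3", "c2")]


def to_csv(header, rows):
    """Serialize a header and rows as CSV text (one sheet per call)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


nodes_csv = to_csv(["Id", "Label", "Kind"], nodes)
edges_csv = to_csv(["Source", "Target"], edges)

# Degree = number of edges touching a node, counted over the full graph.
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1


def filter_min_degree(nodes, edges, degree, minimum=2):
    """Keep nodes whose degree is at least `minimum`, plus edges between them."""
    kept = {node_id for node_id, _, _ in nodes if degree[node_id] >= minimum}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges


kept, kept_edges = filter_min_degree(nodes, edges, degree)
print(sorted(kept))
print(kept_edges)
```

In this toy graph only the citing work that cites both classics survives the filter, which is the effect the simplified graph relies on: citing nodes attached to a single classic drop out, leaving the shared citing works that tie classics together.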
Results
My first graph shows a relatively large cluster of inter-cited environmental classics, including Limits to Growth by Meadows et al., The Population Bomb by Ehrlich, Silent Spring by Carson, and Sand County Almanac by Leopold. Several linear spurs extend from this agglomeration, seen below the graph’s center in this representation. Above the center lies an interconnected network of contemporary works, including Pollan’s The Omnivore’s Dilemma and Friedman’s Hot, Flat, and Crowded, tenuously connected to the hub through two citations of Thoreau’s Walden. Five of our environmental classics were completely unconnected to this greater network: The Lorax (Seuss 1971), The Hot Zone (Preston 1995), Fast Food Nation (Schlosser 2001), Gaia (Lovelock 1979), and The Diversity of Life (Wilson 1999).
The second graphic expresses these relationships more cleanly. Circled in green is the aforementioned “classics of the classics” agglomeration, with connections to contemporary clusters around Hot, Flat, and Crowded (circled in red) and The Omnivore’s Dilemma (circled in yellow). In total, thirty-one of the thirty-six classic nodes are connected together through citing works, an impressive figure considering the relative disconnectedness of our data; our network had 372 unique nodes, barely fewer than the 396 unique nodes a completely disconnected network would have. Yet the 14 shared citing nodes were enough to connect the vast majority of our classics into a single network.
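The comparison between the number of unique nodes and the number a completely disconnected network would have can be made concrete with a small Python sketch. The three classics and their citing works below are hypothetical toy data, and `node_counts` and `components` are illustrative helper names, not part of Gephi or of the lab's procedure.

```python
from collections import defaultdict, deque

# Hypothetical toy data: classic -> the citing works recorded for it.
citations = {
    "Classic A": ["w1", "w2", "w3"],
    "Classic B": ["w3", "w4", "w5"],  # w3 is shared with Classic A
    "Classic C": ["w6", "w7", "w8"],
}


def node_counts(citations):
    """Return (unique_nodes, nodes_if_fully_disconnected)."""
    citing = [work for works in citations.values() for work in works]
    unique = len(citations) + len(set(citing))
    disconnected = len(citations) + len(citing)
    return unique, disconnected


def components(citations):
    """Group all nodes into connected components with a breadth-first search."""
    neighbors = defaultdict(set)
    for classic, works in citations.items():
        neighbors[classic]  # make sure a classic with no works still appears
        for work in works:
            neighbors[classic].add(work)
            neighbors[work].add(classic)
    seen, groups = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        seen.add(start)
        group, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            group.add(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        groups.append(group)
    return groups


unique, disconnected = node_counts(citations)
groups = components(citations)
print(unique, disconnected, len(groups))
```

Here a single shared citing work lowers the node count by only one, yet it merges two of the three classics into one component. The same mechanism is at work in our data, where a small gap between the two counts coexists with a largely connected network.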
Examining the degree of the classic nodes in the second graphic can help us quantify the centrality of various nodes inside the clusters around them. By this measure, Hot, Flat, and Crowded and The Omnivore’s Dilemma are clearly the most central nodes of their clusters, with five shared citing nodes apiece. Limits to Growth, The Population Bomb, Sand County Almanac, Staying Alive (Shiva 1989), and The Monkey Wrench Gang (Abbey 1975) all feature prominently in the broader classics sphere, with four shared citing nodes apiece. This portion of the network appears far more polycentric than the newer hubs around Pollan and Friedman, with no single point dominating visually or numerically.
Discussion
Drilling into the composition of these clusters beyond their most recognizable names reveals both familiar categorical groupings and some data eccentricities. The cluster around Pollan 2006 does have some food-related classics, including Animal, Vegetable, Miracle (Kingsolver 2007) and In Defense of Food: An Eater’s Manifesto (Pollan 2008), though other works in the surrounding cluster are far more peripherally related to food (like Weisman’s The World Without Us and Louv’s The Last Child in the Woods). The cluster around Friedman 2008 is primarily related to climate change, with Kolbert 2006, Fagan 2008, and Lynas 2008 all directly about global warming, while MacKay 2009 is about sustainable energy and McDonough’s work, Cradle to Cradle, is about reshaping human processes to recycle all possible materials. Each of the books in this cluster focuses primarily on anthropogenic problems and solutions. Lastly, the large polycentric cluster around Ehrlich 1968 and Meadows et al. 1972 defies simple categorization, though it is generally composed of older works with an ecological and anti-anthropocentric bent.
A fundamental flaw of this data set is how temporally sensitive it is; since a work cannot cite a work which comes out after it, works which are thematically closely related can easily appear completely separated in our network of classics. For example, the food cluster around Pollan’s The Omnivore’s Dilemma includes only works later than 2006, excluding works like Fast Food Nation (Schlosser 2001), which is completely detached from the network, and the string of older food works found in a line at the bottom, such as Diet for a Small Planet (Moore Lappé 1971), The Unsettling of America: Culture and Agriculture (Berry 1978), and The Botany of Desire (Pollan 2001). Additionally, the network analysis turns up some wacky results that are harder to explain. An Inconvenient Truth lies far closer to the food cluster than to the climate change cluster, and the branch of nodes from The Monkey Wrench Gang is completely bizarre: this book about radical eco-sabotage is connected to a book on permaculture and an island biogeography of the dodo on one branch, and to a memoir about working as a Rocky Mountain ranger and The Sixth Extinction on the other (the books about extinction, despite their thematic similarities, have no common citing node). Such oddities are likely due to the limited sample size of this analysis; 372 nodes to map the network of even our selected environmental classics is evidently not enough to overcome the random variation that produces such results.