3 METHODOLOGY
3.5 Analytical model: actor-centered institutionalism
The findings articulated in this chapter came from three main forms of analysis, two quantitative and one qualitative, all mixed to varying degrees. With respect to the theory of methods and ways of knowing, these findings emerge from a trace ethnography that used several scalar devices to “compress” the enormous discursive space into a more manageable collection of traces. This process includes the data preparation, the topic model, the measure of entropy, and hierarchical clustering. Each provides a unique perspective on digital humanities blogs, foregrounding certain features (e.g., clusters of English topics) at the expense of others (e.g., the diffusion of digital history themes within other clusters).
114 French started her blog and website as a space for posting information about her professional life. There is a wealth of information, beyond the blog, on or linked from her website. It would be interesting to dig into web archives and see how scholars’ representations of their (professional) selves change over time. http://amandafrench.net/resume/
Figure (46). An example of a blog with average topic diversity.
The analysis of entropy provided a measure of the diversity of topics per blog. While digital humanities bloggers as a whole write about a variety of topics, as Figure (46) shows, at an individual level they typically write mostly about a single topic but touch upon a long tail of other topics. What is not determined in this analysis is the degree to which topic expression is correlated. If a digital humanities blogger writes about topic X, does topic Y have a higher likelihood of expression? To what extent do blogs themselves cluster around a set of themes? If anything, this chapter shows there is a wealth of questions and answers in these data.
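The diversity measure described above can be illustrated with a minimal sketch. It assumes the measure is Shannon entropy (in bits) computed over a single blog's topic proportions; the chapter does not state the exact formula here, and the function name and both example blogs are hypothetical.

```python
import math


def topic_entropy(proportions):
    """Shannon entropy (in bits) of one blog's topic proportions.

    Assumption: the proportions are non-negative weights, one per topic.
    They are normalized here, so raw counts work as well.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    entropy = 0.0
    for weight in proportions:
        if weight > 0:  # a topic with zero weight contributes nothing
            share = weight / total
            entropy -= share * math.log2(share)
    return entropy


# Hypothetical blogs over four topics (invented numbers, not corpus data).
focused_blog = [0.85, 0.05, 0.05, 0.05]  # one dominant topic, long tail
even_blog = [0.25, 0.25, 0.25, 0.25]     # topics spread evenly

print(topic_entropy(focused_blog))
print(topic_entropy(even_blog))
```

Under this reading, a blog dominated by one topic scores near zero, while a blog that spreads its writing evenly across all topics approaches the maximum of log2 of the number of topics.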
The analysis of topic clusters begins to explore the computationally generated topics using a distance measure and hierarchical clustering to produce a dendrogram of the topics’ relationships.115 The whole tree (Appendix B) shows the entire landscape of a hundred topics generated by MALLET. Using qualitative coding, labels were applied to the clusters based upon the top keywords: Technology, Digital, English, Libraries & Meta, Miscellaneous, and Non-English.
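The clustering step can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure used in the study: the chapter does not name its distance measure or linkage rule, so Jensen-Shannon divergence and average linkage are stand-ins, and the four topic labels and their word distributions are invented.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    midpoint = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, midpoint) + 0.5 * kl(q, midpoint)


def agglomerate(topics):
    """Average-linkage agglomerative clustering of topic distributions.

    topics: dict mapping a topic label to its word distribution.
    Returns the merges in order as (members_a, members_b, distance);
    read bottom-up, the merges describe the dendrogram.
    """
    labels = sorted(topics)
    dist = {
        (a, b): js_divergence(topics[a], topics[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }

    def pair_distance(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [pair_distance(a, b)
                         for a in clusters[i] for b in clusters[j]]
                average = sum(pairs) / len(pairs)
                if best is None or average < best[0]:
                    best = (average, i, j)
        average, i, j = best
        merges.append((clusters[i], clusters[j], average))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical topics over a four-word vocabulary (invented numbers).
topics = {
    "linux": [0.70, 0.20, 0.05, 0.05],
    "text_tools": [0.60, 0.30, 0.05, 0.05],
    "novel": [0.05, 0.05, 0.80, 0.10],
    "narrative": [0.05, 0.05, 0.50, 0.40],
}

merges = agglomerate(topics)
for left, right, distance in merges:
    print(left, right, round(distance, 4))
```

In this toy example the two technology-like topics merge first, the two literature-like topics merge second, and the final merge joins the two branches, which mirrors how labeled clusters can be read off a dendrogram.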
115 There are extensions to MALLET to support hierarchical topic models (Blei 2003), but those
haven’t been as extensively used as vanilla LDA. How to document and interpret the model, the results, and the significance using trace ethnography are opportunities for future work.
The technology cluster has two subgroups. The first gathers deep discussions of technological subject matter like configuring Linux or installing and using text analysis tools. The second subgroup brings together methodological discussions around data, quantitative analysis, and machine learning. The digital cluster is less semantically compact than the technology cluster. Digital topics focus on technology as an object of study, teaching with digital technology, digital writing, and the important “What is DH” topic. The English cluster features topics
relevant to the subject matter of English as a discipline, such as British literature, storytelling, and narrative. The prominence of the English cluster is an indicator of English’s dominant presence in the community (despite many arguments that DH is just as much about history as it is about English). However, the
significance of libraries in the digital humanities should not be ignored; those discourses were strong enough to form their own cluster (which is interestingly adjacent to the meta/administrative cluster). The library cluster features deeply technical discussions of digital libraries, copyright, and open access alongside more administrative themes like job postings and general talk about the academy. The miscellaneous cluster captures a variety of topics ranging from personal politics and food to high theory and knowledge work. This cluster collects many of the smaller themes distributed through the digital humanities blogosphere.116 Non-English topics are accurately relegated to their own outlying
branch of the tree. These topics indicate that while the community is mostly Anglo-centric, there are spaces in the community for non-English writers.
The four categories of informal scholarly communication (quasi-academic, meta-academic, para-academic, and extra-academic) come out of an ethnographic analysis of traces. Quasi-academic discussions evoke classic humanities themes and subject matter. Meta-academic discussions focus on the maintenance and administration of the digital humanities as a community. Para-academic discussions address themes that are vital to the digital humanities, but don’t (yet) have a place in disciplinary and formally published venues. Finally, extra-academic themes show how blogs enable scholars to write about anything.
116 Digging into these topics is an area for other researchers interested in exploring how other
Figure (47): The distribution of categories over 100 topics. Each topic was assigned to one of the four categories of informal scholarly communication or labeled as junk or non-English.
The chart in Figure (47) shows how the four categories of informal scholarly communication are distributed across the 100-topic model. Each topic was assigned to one of the four categories of informal scholarly communication based upon a coding of the keywords and top-ranked documents. This is a recursive mapping of the analytical themes and categories, derived from a content analysis, back onto the topics, i.e., the computational representation of the corpus. That is to say, it is an approximation of how the four categories are distributed across the corpus by seeing how they are distributed across the 100 topics of the model. Quasi-academic discourses are the largest portion, which is an indicator that digital humanities scholars focus most of their writing on academic themes. While the analysis above shows this category is not entirely composed of new ideas, the size of this category shows that scholars are using blogs as a space to do academic work in some capacity. The second largest category is para-academic, which is especially important for the digital humanities because it means blogs are supporting a sizable discourse around themes that don’t necessarily have a natural outlet in formal modes of scholarly communication. Writing about technical subject matter, how to use tools, or supplementary details of computational research is not an isolated interest of a small group; it is a significant chunk of how digital humanities scholars use blogs. The number of meta-academic topics reveals that the management and maintenance of the community is a tertiary, but still important, function of blogs. The extra-academic topics constitute a small, yet present, portion of the whole. What is missing from this synchronic representation is how the four categories are distributed across the corpus over time. The temporal allocation of the four categories in danah boyd’s blog showed an increase in quasi-academic and a decrease in extra-academic writing; might the same be true across the entire corpus?
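The mapping from coded topics to category shares can be sketched as a simple tally. This is a minimal illustration that assumes each topic carries exactly one label; the topic identifiers and label assignments below are invented toy data and do not reproduce the proportions in Figure (47).

```python
from collections import Counter

# The four categories plus the two residual labels named in the caption.
CATEGORIES = (
    "quasi-academic",
    "para-academic",
    "meta-academic",
    "extra-academic",
    "junk",
    "non-English",
)


def category_shares(topic_labels):
    """Return the share of topics assigned to each category.

    topic_labels: dict mapping a topic id to exactly one category label.
    """
    counts = Counter(topic_labels.values())
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unrecognized labels: {sorted(unknown)}")
    total = sum(counts.values())
    return {category: counts[category] / total for category in CATEGORIES}


# Hypothetical coding of ten topics (toy data, not the study's coding).
toy_labels = {
    0: "quasi-academic", 1: "quasi-academic", 2: "quasi-academic",
    3: "quasi-academic", 4: "para-academic", 5: "para-academic",
    6: "para-academic", 7: "meta-academic", 8: "extra-academic",
    9: "non-English",
}

shares = category_shares(toy_labels)
for category, share in shares.items():
    print(f"{category}: {share:.0%}")
```

Because every topic counts equally in this tally, the result approximates the corpus-level distribution only to the extent that topics carry similar weight in the corpus.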
Clearly the data have many more stories to tell, and only a few have been addressed in this chapter. The data-driven and empirical representations of the digital humanities put forth here may be somewhat alien to members of the community, which is why it is important to emphasize that these are representations to be interpreted and not declarations of authority or truth. Scalar devices, as a
theoretical construct, are useful for understanding this nuance. Topic modeling is a form of lossy compression, like an mp3, whereby certain features of a
phenomenon are emphasized, while others are deemphasized.117 However, like an
mp3, just because some information is lost doesn’t mean the resulting object is not useful or insightful.
117 For an amazing infrastructural inversion of the mp3 compression algorithm, watch Ryan Maguire’s videos of what gets removed: http://www.theverge.com/2015/2/19/8068923/mp3-compression-ghost-suzanne-vega-toms-diner