Gender and the DHPoco Open Thread: A Corpus Analysis

by Heather Froehlich (@heatherfro) and Michelle Moravec (@ProfessMoravec)

Participating in the #DHPoco OPEN THREAD: THE DIGITAL HUMANITIES AS A HISTORICAL “REFUGE” FROM RACE/CLASS/GENDER/SEXUALITY/DISABILITY was an enlightening experience for both of us. Adeline invited us to contribute a guest post based on an analysis of the open thread, which we offer in the hope of further productive dialogue. We are both particularly interested in gender-specific differences. We are also interested in other differences, such as sexuality and ethnicity, but these were impossible to code from our knowledge of the commenters, and without self-identification we felt uncomfortable attempting it. We recognize that our attribution relies on binary gender. We conducted our analysis using AntConc, a corpus analysis program, to approach the question “Do men and women in the DHPoco thread talk about DH and postcolonialism differently? If so, how?”

Michelle scraped the DHPoco thread off the internet and compiled it into files by author name, so that all words written by each individual who commented were grouped together. We marked each file as either male or female based on our knowledge of the commenters’ gender.[1] Of 38 individual commenters producing a total of 153 comments, we coded 26 commenters as male (68.4%) and 12 commenters as female (31.6%). 72% of all the comments were written by men, compared to 28% written by women. We have not anonymized the corpus.[2]
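For readers who want to try something similar, here is a minimal sketch of the grouping step in Python. It is not the script used for this post: the function names and the toy comments are invented for illustration, and only the commenter counts (26 and 12 of 38) come from the paragraph above.

```python
from collections import defaultdict

def group_by_author(comments):
    """Concatenate each author's comments into one text, keyed by author name."""
    grouped = defaultdict(list)
    for author, text in comments:
        grouped[author].append(text)
    return {author: "\n".join(texts) for author, texts in grouped.items()}

def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Toy comments (invented for illustration; not taken from the thread).
toy_comments = [
    ("commenter_a", "First remark."),
    ("commenter_b", "A reply."),
    ("commenter_a", "A follow-up remark."),
]
files = group_by_author(toy_comments)

# Commenter counts as reported in the post: 26 coded male, 12 coded female.
male_share = share(26, 38)
female_share = share(12, 38)
print(files)
print(male_share, female_share)
```

Each value in `files` plays the role of one per-author file; the gender label would be attached to each file by hand, as described above.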

We wanted to know how members of the DHPoco thread talk about what we are discussing. We investigated how participants on the thread wrote about postcolonialism and digital humanities using methods from corpus-driven critical discourse analysis (CDA). CDA, defined by Van Dijk as the study of “the way social power abuse, dominance, and inequality are enacted, reproduced, and resisted by text and talk in the social and political context,” offers insights into how we talk about what we discuss.[3]

Because the male comments were a much larger set than the female comments, Heather used the male comments as a reference corpus and the female comments as an analysis corpus to ask “how do the women’s comments compare to the men’s comments on the DHPoco thread?” Heather did this by calculating keyness in the corpus. This is done by compiling a word list of the analysis corpus and the reference corpus, and comparing the two using a log-likelihood calculation. Words that are comparatively more frequent in the analysis corpus than in the reference corpus are deemed “key”.
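The calculation described above can be sketched in a few lines of Python. This is an illustration of the standard two-corpus log-likelihood statistic, not a reproduction of AntConc's internals: the tokenizer, the function names, and the convention of giving negative scores to words that are relatively less frequent in the analysis corpus are all assumptions made for this sketch, and both corpora are assumed to be non-empty.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def log_likelihood(a, b, c, d):
    """Log-likelihood for a word seen a times in c tokens and b times in d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the analysis corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keyness(analysis_text, reference_text):
    """Return (word, signed score) pairs, most positively key first."""
    analysis = Counter(tokenize(analysis_text))
    reference = Counter(tokenize(reference_text))
    c, d = sum(analysis.values()), sum(reference.values())
    scores = {}
    for word in set(analysis) | set(reference):
        a, b = analysis[word], reference[word]
        score = log_likelihood(a, b, c, d)
        # Assumed convention: a negative sign marks words that are
        # relatively less frequent in the analysis corpus.
        if a / c < b / d:
            score = -score
        scores[word] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy corpora (invented for illustration; not taken from the thread).
ranked = keyness("race race race theory", "theory theory theory race")
print(ranked)
```

Words at the top of `ranked` are positively key; words at the bottom are the negative keywords discussed later in the post.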

Let’s look then at who was talking about what. Here are the most-key terms in the female-authored comments when compared to the male-authored comments:


The most key terms include, among others, “whiteness”, “color”, “white”, “engaged”, “invisible”, “queer”, “woman”, and “intersecting”. This suggests that in the DHPoco thread, women are talking about those things more than men are. For example, women were more likely to write about race (lexical variants of race included race, racial, racializing, racing, racism, racist), as well as issues of gender, sexuality, and disability.


Men were more likely to write about colonialism (which includes colonial, colonialist) as well as variants of imperial and global. Men tended to talk about postcolonialism as theory, object of study, critique, or category, whereas women’s commentary was more about the constituent parts of postcolonialism and the limits of acceptance or inclusion of these terms within the “digital humanities”.


These two examples are intriguing for what they suggest about relationships to power: the men’s comments were more likely to use “colonial”, “imperial”, and “global”, while women wrote more about the differential consequences of access to power in terms of identities, through a discussion of race, gender, sexuality, and disability, as evidenced by the KWIC view above. Postcolonialism appears to be a more abstracted discourse in the men’s comments when compared with the ways women related postcolonialism to enacted identities, as noted above in the positive keyword analysis for women’s comments.

Conducting a detailed discourse analysis on a corpus this small is difficult, but some other interesting gendered patterns emerged. For example, “poco” occurs only a small number of times (13 times in total in the corpus) and is starkly gendered in its use (5 men, 1 woman), as seen below:


Men use “poco” in different contexts, some of which seem to connote negativity or attempts to define it. Our one female commenter uses the term in five instances that were in general more reflective of the discourse on postcolonialism, linking postcolonialism to other discourse markers such as “digital”, “humanities”, “feminism”, “minority”.
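The keyword-in-context (KWIC) views used throughout this post can be approximated with a short Python function. This is a sketch only: the tokenizer, the `width` parameter, and the handling of a trailing `*` as a wildcard are assumptions for illustration and do not reproduce AntConc's concordancer, and the toy sentence is invented.

```python
import re

def kwic(text, pattern, width=4):
    """Return (left context, hit, right context) for every token matching pattern.

    A trailing * in pattern is treated as a wildcard, as in "technolog*".
    """
    tokens = re.findall(r"[\w'-]+", text.lower())
    pattern = pattern.lower()
    if pattern.endswith("*"):
        stem = pattern[:-1]
        matches = lambda tok: tok.startswith(stem)
    else:
        matches = lambda tok: tok == pattern
    lines = []
    for i, tok in enumerate(tokens):
        if matches(tok):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Toy sentence (invented for illustration; not taken from the thread).
toy = "Poco theory asks who builds the archive and who is left out of poco debates."
for left, hit, right in kwic(toy, "poco", width=2):
    print(f"{left:>12} | {hit} | {right}")
```

Running the same function once per gendered sub-corpus gives two concordances that can be read side by side.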

Heather also ran negative keywords – that is, keywords which were less likely to appear in the analysis corpus compared to the reference corpus (negative keyness is marked in blue):

What you see above are not the MOST negatively-key words. Those were largely function words (“the”, “of”, “i”, “it”, “but”, “on”, “from”, “at”, “by”, “my”, “you”, “which”, “can”, “all”, “was”, “most”, “see”, “because”, etc.), which are often interesting in combination with other words, but not particularly so on their own here. We instead highlight some of the other negatively-key words, including “discourse”, “technologies”, “technology”, “institutional”, “research”, “institutions” and “culture”, which are less likely to appear in the women’s comments than in the men’s comments.

So let’s zoom in on three of these negatively-key words: research, institutional*, and technolog*. The keyword analysis suggests that these are less likely to appear in the women’s comments, and indeed we see that these words are being used in remarkably different contexts between the male and female commenters:


Research, when used by male commenters, is more about the act of doing research (“my research”, “our research”, “for research”), whereas the female commenter above is discussing an institutional issue at hand.

Institution* is interesting because the sheer quantity (52 instances) of male comments using the term nearly drowns out the two instances in women’s comments:

Overall, it seems that commenters are interested in the ways the institution is engaged with digital projects, but one female commenter mentions the issue of receiving institutional support, whereas the other comments seem largely more concerned with the ways an institution deals with digital things (is markup a form of institution? what structures are in place at different universities, research centers, etc.? and in what have we been institutionalized?). Immediate collocates for institut* include “comparative”, “different”, “history”, “structures”, and “support”, suggesting that there’s a lot of discussion about what’s going on in different places and that no consensus has been reached yet.
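Immediate collocates of the kind listed above can be counted with a sketch like the following. It uses raw frequency of the words directly to the left and right of each hit, which may differ from the ranking used in the post; the function name, the tokenizer, and the toy sentence are assumptions for illustration.

```python
import re
from collections import Counter

def immediate_collocates(text, stem):
    """Count the words directly before and after each token starting with stem."""
    tokens = re.findall(r"[\w'-]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.startswith(stem):
            if i > 0:
                counts[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                counts[tokens[i + 1]] += 1
    return counts

# Toy sentence (invented for illustration; not taken from the thread).
toy = "Different institutions offer different institutional support."
collocates = immediate_collocates(toy, "institut")
print(collocates.most_common())
```

On a corpus this small, raw counts like these are best read alongside the concordance lines rather than on their own.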

What about technolog*?


There’s a recurring collocation of “about technology”, “technologies of”, and “and technologies”/“technologies and”, so it also appears that we’re not so sure which technologies are at issue: the commenters are largely trying to define variant forms of technology for their arguments here.

The sheer quantity of male commenters pretty much completely drowns out the female commenters here, suggesting that some of these negative keywords may be a little misleading: female commenters on the thread are discussing these topics, but the sample size is insufficient to see whether they’re discussing them all that differently. Overall, “digital humanities” discussion predominated over that of “postcolonialism” when measured by usage of those terms or variants of them. However, defining postcolonialism moves us into a complex debate: it is in many ways equal to, and older than, that around the question of what constitutes digital humanities.[3] The interplay between these two streams seems to have comprised the majority of our discussion thus far.

Michelle also analyzed the example of “thank*”, used by just over half the commenters, making it one of the more frequent words with a very positive connotation. The collocates for “thank*” include “for”, “you”, and “to”, as commenters in the thread expressed appreciation for other commenters’ posts, as shown below.


Commenters are using “thanks” here as a way of indicating a good faith effort to participate productively. Only a few took the “academic speak” of “thanks, but…”, which suggests that politeness formulae (cf. Brown and Levinson 1987) are very much enacted in this thread. It is interesting that the uses of “thank*” are gendered: 58% of women use it, compared to 35% of men. This is in accordance with prior work from feminist online communication theory and linguistic politeness theory.
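The percentage of commenters in each group who use a term at least once can be computed from the per-author files with a sketch like this one. The data structure (author name mapped to a gender label and text), the function name, and the toy entries are all invented for illustration; the toy output has nothing to do with the figures reported above.

```python
import re

def share_using(files, stem):
    """Percentage of authors in each group with at least one token starting with stem."""
    users, totals = {}, {}
    for group, text in files.values():
        tokens = re.findall(r"[\w'-]+", text.lower())
        totals[group] = totals.get(group, 0) + 1
        if any(tok.startswith(stem) for tok in tokens):
            users[group] = users.get(group, 0) + 1
    return {g: round(100 * users.get(g, 0) / n, 1) for g, n in totals.items()}

# Toy files (invented for illustration; not taken from the thread).
toy_files = {
    "commenter_a": ("female", "Thanks for starting this thread."),
    "commenter_b": ("male", "I want to define the term first."),
    "commenter_c": ("male", "Thank you, this is helpful."),
    "commenter_d": ("female", "Many thanks to everyone."),
}
shares = share_using(toy_files, "thank")
print(shares)
```

Counting authors rather than occurrences keeps one very talkative commenter from dominating the comparison.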

The largest strand of thanks went, deservedly, to Adeline and Roopika for starting the open thread, while a smaller number referred to the remarks of an individual commenter (“thank you X”, or “thanks for that” in a reply to a comment, or a general expression of thanks to “everyone”). Overall, participants on the thread did this as well: “thanks” occurs with a specific commenter’s name 57% of the time. Attentiveness to this sort of politeness, as well as the interaction and good faith it represents, is critical if we wish to encourage more people to participate in these online conversations in the future.[4]

We would like to suggest that, though a number of patterns emerge in a keyword-in-context analysis and a keyness analysis, they require a close reading of context and usage to address the issues more fully. In our brief analysis here we found that men and women were using the open-thread space in noticeably divergent ways. We suggest that male commenters are talking more about the process of doing digital humanities, whereas female commenters are talking more about the actual topic at hand (“Are digital humanities a refuge from racism/classism/sexism/ability?”). It appears, based on our analysis here, that women are being more critical of the structural and institutional issues at hand than the male commenters are.

In order to TransformDH we need to attend closely to how we participate in these kinds of online discussions. A more complex analysis based on multiple self-attributed identity factors of participants would be far more nuanced, but we offer this analysis as a starting point that could be done quickly, without making further assumptions or taking on the potentially impossible task of asking participants to self-identify for our purposes. We hope readers will be inspired to construct additional multifaceted readings of the thread using other tools, as we’d be very curious to see what other patterns emerge from such an important discussion space.

[1] This was done by intuition from names provided: we apologize to anyone we have imposed a (potentially false or inaccurate) gendered identity upon.
[2] We aim not to attach any names to any comments in our analysis, but rather to show some broad brush strokes of patterns in the corpus.
[3] Using definitions of postcolonialism and digital humanities from the website combined with terms for the framing questions of the open thread, Michelle represented the postcolonial stream and the digital humanities stream with defining words. Digital humanities terms still appeared more frequently. “Postcolonialism” occurred 19 times, 45 with variants (postcolonial, postcolonialism, postcoloniality, post-colonial, poco, post-colonialist, post-coloniality). “Digital humanities” appeared 37 times and “DH” 176 times, for a total of 213. Total occurrences of postcolonial terms: n=274; total occurrences of digital humanities terms: n=702. Spreadsheet of data here.
[4] A useful comparison might be between the open thread and the Twitter commentary about it: see #dhpoco tweets Michelle storified.


15 Comments on “Gender and the DHPoco Open Thread: A Corpus Analysis”
  1. Pingback: @hleman

  2. Fascinating analysis. I found these statements particularly interesting, “the uses of “thank*”are gendered: 58% of women use it, compared to 35% of men….” and “male commenters are talking more about the process of doing digital humanities, whereas female commenters are talking more about the actual topic at hand (“Are digital humanities a refuge from racism/classism/sexism/ability?”)” and “Research, when used by male commenters, is more about the act of doing research (“my research”, “our research”, “for research”), whereas the female commenter above is discussing an institutional issue at hand.”

    Does anyone have any theories as to this finding, “Institution* is interesting because the sheer quantity (52 instances) of male comments using the term nearly drown out the two instances of women’s comments…?”

    Well done!

  3. Thanks for commenting, Hope. Re the "thank*" analysis: as we said, this is completely consistent with findings in other fields, but it would be extremely interesting to see what a close reading revealed about the deeper context and content of the remarks that included "thank*".

    I'll leave it to Heather to answer the institution question, as that was her finding.

  4. Yikes!! I appeared as a keyword!!!!!! Thanks for doing the analysis Michelle and Heather. very interesting indeed. I am learning so much here about DH.

  5. Too cute about appearing as a keyword. That would be my life’s ambition.

    About “Thank”–I wonder if those who use “thank” tend to leave the discussion immediately thereafter or if they re-engage with even greater vigor.

    And I wonder if “institution” is supposed to suggest that women just don’t “get” how universities and other institutions work and need to get with the program.

  6. I do think that using “institution” could be interpreted as a distancing strategy, although there is a feminist critique of that for sure (institutions are people, who are those people though?)

    I would love an analysis of this thread that examines who mentioned who, in what order, who entered and left when.

  7. I think the reason for the male/female skew regarding the approach to “postcolonial/poco” and the approach to “research”, as well as the seeming avoidance of the topic at hand, is the result of a popularity mechanic that operates in these kinds of DH topics. When the community becomes aware of a popular topic–a blog post or open thread or other place where they can comment or contribute–there’s an honest attempt by people to do so, even if it’s not in their field or discipline (whether this is post-colonialism, information visualization, curation, literary analysis, or anything else in the Big Tent of DH). The result is that there’s a core of people who speak directly to the issue, and then a large number that are speaking to general issues in DH that they see intersecting with the issue. I would hypothesize that many open threads on specialized topics follow this trend.

    On the one hand, this promotes a superficial engagement with a wide variety of topics, and we all risk becoming a flâneur who one day is talking about decolonization, and the next talking about nuclear weapons, and the next Urdu soap operas, but each with “DH” prepended.

    But, and here I’m a hopeful optimist, I think this is actually reflective of a growing accessibility of these subjects, and that one digital humanities trend that is reflected across society is that growing accessibility. I don’t mean this in a generic way, but rather in the manner expressed by Benkler in “Coase’s Penguin, or Linux and the Nature of the Firm”. He noted that open source efficiencies result from increased accessibility, and you can see this at play in any commons-based peer production. My hope is that as we all become more code literate, and library science literate, and academic administration literate, that we are also growing more post-colonially literate.

  8. Interesting comment Elijah. I’m having a hard time understanding how a popularity mechanic accounts for a gender difference unless you are arguing that male commenters comprise the group “a large number that are speaking to general issues in DH that they see intersecting with the issue” while the women comprise “a core of people who speak directly to the issue.”

    • It’s my assumption that the interlopers are more male and focused on general issues in DH, though it’s not based on counting and categorizing the posters. I suppose one way to test this is to identify the male/female ratio of contributors who are in a poco discipline compared to the male/female ratio of contributors who are not. It would be useful, also, to have the number of comments and amount of text written by those two groups to compare.

      These interlopers, I think, skew the conversation toward more general DH, while the group that is more conversant with the topic/theory/practice of the issue being interloped upon (and, it should be noted, is actively inviting interlopation–if that’s a word) continues to respond to those more generalized comments with specialist response embedded in theory but inflected by the concerns of the generalist community.

      That’s been my sense when I’ve been involved in threads and posts like these, though this is the first time I’ve been critically engaged with the gender of the participants and how their commentary differs by gender (which says a lot about my own blind spots).

      I think the worry I would have with my own line of reasoning is that it presumes a sort of pragmatic reason for the commentary and that gender just correlates with the specialist/non-specialist demographics of this discipline.

  9. Pingback: David Golumbia (@dgolumbia)

  10. Hi everyone – thank you for your comments so far. I’m going to try to answer some of the questions posed above:

    I can’t offer any explanation about the “institution*” example beyond what’s written above – I was really surprised by that, and included it as an example in hopes that others could explain why/how that was happening.

    I would like to make a quick note that there are actually 3 female commenters using institution*, not 2 as we originally suggested – it seems my eyes are failing me – but the ratio of usage is still remarkably stark. Those of you with better eyes than me will probably have noticed this by now.

    I would love to see a more detailed analysis of thank* – ours is very cursory. I would guess that more people were using thank* as a way to join the discussion rather than leave it, especially if someone else creates a space for your perspective and potential comment. This was my reasoning for joining the open thread in the first place, and I suspect that it may hold true for others.

    You raise an interesting point about interlopers (who are the interlopers? How will we decide?) and I want to stress that we chose to compare female comments to male comments based on the sample we have. The concept of “An Open Thread” defines the space to be for whatever is deemed worthy of conversation by participants, so I have no doubt that this analysis could have been very different under a variety of different circumstances, including but certainly not limited to participants, the topics being discussed, and how these topics were discussed.

  11. Pingback: @Dr_Margrit

  12. This post exposes linguistic patterns that correlate pretty dramatically with gender. Those patterns probably could be interpreted in a range of different ways, but what’s striking (in the context of the original question) is how handily quantitative methods lend themselves to cultural critique here.

    Maybe that shouldn’t surprise us: social scientists have been using numbers to make political arguments for a long time, and corpus linguistics is arguably on the border of social science.

    You wouldn’t even have to assume that gender is a binary to use quantitative methods. Moravec and Froehlich rightly acknowledge that this is a simplifying assumption. But there’s good quantitative research on gender in Twitter where computer scientists have borrowed Judith Butler (and used clustering algorithms) to consider multiple styles of gender performance.

    Anyway, I’d like to thank everyone involved. Also, thanks whiteness color privilege engaged invisible intersecting stories. (Just messing with your algorithms.)

    • We were really struck by how dramatically the male and female comments differed – we had expected some variation but we certainly had not expected to see such a clear divide. When I ran the keyword analysis my jaw dropped a little!

      Thanks for linking to Bamman, Eisenstein and Schnoebelen’s paper. I’m very familiar with their work, and if we had more time we would have loved to conduct a similar study in addition to what we’ve done. We certainly invite further analysis of this open thread; we’d love to see what else can be done, and see this merely as a jumping-off point.
