Issue 9: Graduate Degree Research Showcase

Luis Villasmil, via unsplash.com

                        January 24, 2022

The Graduate Degree Research Showcase
How do you characterize a department? That is, if a group of scholars has any impact on the world, how would you go about describing that? One approach would be to look at the things they’ve written. You could collect bibliographies from every work in the field, and search for mentions (and how do you decide what a ‘field’ is, anyway?). You’d need enormous resources to do it, and even once you’d done it, there would still be issues (here’s looking at you, Google Scholar).
Perhaps the way to frame it might be to think that our impact on the world is through our students, yes? And so another approach might be to look at the work that the department’s students have done. Sometimes, this means trying to actively keep track of students’ post-university careers. But again, that requires a lot of time and resources. So why not take a look at the final pieces of work that students did, while they were here and part of this community of scholars? What questions did they ask, what answers did they come up with? What were they interested in?
That material, while not easy to wrangle, does exist. When a student completes their MA or PhD thesis, that work is deposited in the University’s institutional repository, CURVE. But only the theses. Major research essays, major research projects: these are not collected anywhere (but we’re trying to fix that). When students complete the corrections on an MRE, many of them do grant a license to Carleton to make copies available on request ‘in the interests of facilitating research at this institution and elsewhere’; having the title and abstract here can facilitate that. (In any event, the few MREs that we currently include are those that have been catalogued by the Library or have already been made freely available online by the student; see ‘lacunae’ below).
In this issue, we’d like to take a moment to walk you through a new project of the Communications Committee - The Graduate Degree Research Showcase. The goal is to bring all of that graduate research that led to the award of the degree into one place in a way that facilitates exploration, discovery, and communication. And maybe, we’ll be able to see, however dimly, the kind of impact we’ve been having.
Facilitating Discovery and Exploration
Think of this Showcase as a discovery layer on top of CURVE. It currently contains records/metadata for over 300 PhD and MA theses going back to 1958, and 25 MREs going back to the 1990s. The full abstracts for over 200 items can be searched, and very nearly every item has subject and location metadata which may also be searched. Each item has its own URL (for linking & embedding), and each item will also take you to the PDF or library record.

At the bottom of the home page, the data is also made available in two formats, as ‘csv’ files - familiar tables of information - and as ‘json’ files - text files with the information arranged to facilitate other kinds of computer-to-computer interaction. For instance, using json, one could create a new website that mashes up this information with newspapers, photography, archival or other records. 
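As a rough illustration of the kind of reuse the json files make possible, here is a minimal Python sketch that groups titles by the locations in their metadata. The sample records and the field names (`title`, `year`, `degree`, `locations`) are assumptions made up for this example; the actual files may be organized differently.

```python
import json
from collections import defaultdict

# Hypothetical sample standing in for the Showcase's json download.
# The records and field names below are invented for illustration.
SAMPLE_JSON = """
[
  {"title": "Example thesis A", "year": 1998, "degree": "MA", "locations": ["Ottawa", "Canada"]},
  {"title": "Example thesis B", "year": 2005, "degree": "PhD", "locations": ["Canada"]},
  {"title": "Example essay C", "year": 2012, "degree": "MA", "locations": []}
]
"""


def titles_by_location(records):
    """Group record titles under each location named in their metadata."""
    grouped = defaultdict(list)
    for record in records:
        for place in record.get("locations", []):
            grouped[place].append(record["title"])
    return dict(grouped)


records = json.loads(SAMPLE_JSON)
index = titles_by_location(records)
print(index["Canada"])
```

An index like this could then be joined against some other collection (newspapers, photographs, archival records) that shares the same place names.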

The site also allows us to build exhibitions; if a student’s work contained podcasts, for instance, or video clips, we could build pages that integrate those materials into a kind of story. Right now, the ‘about’ page demonstrates this in only the simplest of ways.

Gaps and Lacunae
The one obvious gap in all of this is that we do not have many of the MREs or MRPs that students wrote as their culminating project. Bound copies of MREs are held in the Underhill Reading Room, but of course, we can’t really go in there right now. If you have information about such a project - title, abstract, student - please do send that along to shawn.graham@carleton.ca; we’d like to know about it and see about how we might include it in this Showcase.
If a student published an article or newspaper op-ed; if there was some sort of public recognition of their work; if there was some other outcome from the research, let us know about that, too, and we can add it. Ultimately, this entire site is powered by a single spreadsheet of metadata, so adding more information is fairly straightforward.
Same with corrections. You might also notice odd spellings or weirdly formatted information. This is a function of the automatic image-to-text software we used. If you spot something that needs correcting, let us know; similarly, if there is better information we could use, don’t hesitate to get in touch.
Notice also that we are not putting the actual thesis or MRE online with this Showcase - we only link to copies if they’ve already been made available online. If you are supervising a student, it’s worth asking them to think about whether or not they want to make their MRE/MRP available online, and then discuss some of the possible places it might live - there’s Carleton’s Dataverse repository; there’s Figshare (and here’s a student’s MRE and its associated data lodged there); there’s Zenodo. This is especially important as student work is becoming more and more ‘multimodal’ - there might be supplementary files and folders of information; there might be audio, video, or high-resolution images; there might be some sort of interactive website. All of those sorts of things could go in a repository for longer term access and preservation; the Communications Committee can help with that sort of thing.
A Portrait Since 1995
We’ll demonstrate one quick example of the kind of things a person might do, having all of this information in one location.
Right now, every record from 2022 back to 1995 (nearly thirty years) has its full-text abstract. We could ask, what ideas persist across those thirty years? What ideas characterize that work? There is a technique called ‘topic modeling’ that lets us get a sense of an answer.
Imagine that when we write, we draw words in different proportions from a limited number of buckets to compose a text. Here, a bucket about World War II. There, a bucket about power. Another one about sports. A fourth about governance. Any given abstract, in this model, is composed of words that come from these different buckets in differing amounts. If we can accept this, then it is possible to ask a computer to decompose all of these abstracts back into their various buckets - and to then say, ‘this abstract is 30% World War II, 20% power, 10% governance…’.
We did this, and found that imagining 20 buckets gave us buckets that were more or less self-contained (if you want some more details about topic modeling, this short video might be helpful).
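To make the bucket picture a little more concrete, here is a minimal Python sketch of the assumption the model starts from: a text composed by drawing words from buckets in set proportions. It only runs the story forwards; it is not the topic modeling procedure itself, which works backwards from real abstracts to the buckets, and it is not the code we used. The bucket contents and the proportions are invented for illustration.

```python
import random

# Hypothetical buckets (topics); the words in each are invented for illustration.
BUCKETS = {
    "world war II": ["war", "soldiers", "front", "1939"],
    "power": ["state", "authority", "control", "elite"],
    "governance": ["policy", "parliament", "reform", "law"],
}


def compose_text(proportions, n_words, rng):
    """Draw n_words: pick a bucket according to its proportion, then a word from it."""
    names = list(proportions)
    weights = [proportions[name] for name in names]
    words = []
    for _ in range(n_words):
        bucket = rng.choices(names, weights=weights, k=1)[0]
        words.append(rng.choice(BUCKETS[bucket]))
    return words


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Assumed proportions for one imaginary abstract.
abstract = compose_text(
    {"world war II": 0.5, "power": 0.3, "governance": 0.2}, 12, rng
)
print(" ".join(abstract))
```

A topic model is handed only the jumble of words at the end and has to guess both the buckets and the proportions, which is why its labels deserve a careful second look.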

Each dot represents a year where the particular topic (the keywords on the left are the most used words in that topic) is prominent. The world wars are a perennial favourite, it seems! Now, the words that we use to ‘label’ a topic should be taken with a grain of salt. One ought to look at each topic’s top words - not just the first four or five - and from that, deduce what the topic is going on about. The ‘food’ topic also contains the word ‘carleton’, and that is curious; when we jump into the abstracts where this topic is prominent, we find a number of theses that are reflective on Carleton’s role in broader themes (food being one). We cycle from the distance view suggested by the model to a close read of the abstracts, and back again. The procedure for determining topics has flagged up this reflective approach as comparatively rare, and so bundled some things together that we might not have done ourselves - and so, if we did this again looking for more topics, we might see this one separate into different buckets. This kind of computational approach is never so much about finding the answer as it is about deforming our perspective and giving us new lenses to look through!
Let’s ask a different question now. What periods do we look at? Here, we simply searched through all of the abstracts for digits (to make up an example, “Food security in Canada 1923-1954”) and counted up the number of times a particular year was mentioned. ‘1960’ is mentioned in 23 theses; ‘1950’ in 20; then 1930 (16), 1939 (13), and 1935 (11) to round out the top 5. 

Crude, but effective. 
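For the curious, a count of this kind can be sketched in a few lines of Python. This is an illustration and not the script we used: the pattern for what counts as a year is an assumption, the abstracts are made up, and we assume each year is counted once per abstract, however often it appears there.

```python
import re
from collections import Counter

# Four-digit years from 1000 to 2099; the exact range is an assumption.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def count_year_mentions(abstracts):
    """Count, for each year, how many abstracts mention it at least once."""
    counts = Counter()
    for text in abstracts:
        # set() so a year repeated within one abstract is counted only once
        counts.update(set(YEAR_PATTERN.findall(text)))
    return counts


# Made-up abstracts, used only to exercise the function.
abstracts = [
    "Food security in Canada 1923-1954",
    "Labour politics in Ontario, 1923 to 1939",
    "A study of wartime rationing from 1939",
]
counts = count_year_mentions(abstracts)
print(counts["1923"], counts["1939"], counts["1954"])
```

Note that a range such as ‘1923-1954’ registers only its two endpoints, not the years in between, which is part of what makes the approach crude.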
Finally, we pulled the locations mentioned from the metadata, and Excel can turn that into a map (light grey are locations for which there are no mentions in the metadata):

Our obvious Canadian focus is apparent. Now, this was a very crude approach, and limited by Microsoft’s ability to recognize place names, especially for polities that no longer exist (‘East Germany’ caused it some trouble!).
We will leave it up to the reader to decide what all of this says about us as a department, but as a portrait, it will become sharper as more of the abstracts are turned into text (right now, badly scanned PDFs defeat an automated process) and more of the MREs and MRPs are found and their metadata added. But right now, we think what we are seeing is a fairly accurate representation of what we do around here, and where we could be putting more energy: obviously, the Global South, and eras other than the 19th and 20th centuries, are not as well represented.
But don’t take our word for it; go explore The Graduate Degree Research Showcase for yourself!
