To kick off the Digital Scholarship Module, a training programme for first-year PhD researchers at the Faculty of Arts, we at Artes Research hosted a session dedicated to research data workflows. Three researchers from the Faculty of Arts offered a behind-the-scenes look at their research workflows by outlining how they approach and structure their research, the tools they use, and the kinds of data they work with. The goal of the session was to give first-year PhD researchers examples of more advanced workflows as they embark on their research journey.
Elisa Nelissen: applying digital tools throughout the entire research workflow
Elisa is a PhD researcher under the supervision of Jack McMartin, working on the interdisciplinary project “The Circulation of Science News in the Coronavirus Era” in collaboration with the KU Leuven Institute for Media Studies. Her research focuses primarily on how science news about COVID-19 vaccines travels to and from Flanders, and the inter- and intralingual translations it is subject to.
Elisa started off the session by introducing us to the tools she applies during various steps of her research workflow, leaving us with plenty of food for thought.
To collect all the literature of potential relevance to her research, Elisa uses Zotero, which offers some very useful features: full-text search (making it easy to look up specific concepts), highlighting, color-coding interesting sections or terms, and more.
Reading literature and tracking progress
After gathering the relevant literature, reading all the collected material naturally follows. Here, Elisa had a very useful tip for those who, like her, easily lose focus when reading a text: why not try turning the text into an audio file? This helps Elisa follow the text more closely and take notes while listening. She also keeps close track of her reading progress with the productivity application Notion. Apart from creating reading lists, Notion also helps her maintain an overview of her project’s progress, upcoming tasks, and so on.
Collecting her data also required Elisa to acquaint herself with new digital tools. A first important data source for her research is news articles. As Elisa did not yet know how to code, she followed some online Python courses to learn the basic skills needed; she can now scrape websites for the metadata of news articles. Another important element in her data collection is conducting interviews, for which she stresses the importance of investing in proper recording systems and equipment to guarantee the usability of the material.
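Metadata scraping of this kind can be sketched with just Python’s standard library. The page content and field names below are invented for illustration; a real scraper would first download the HTML (e.g. with urllib or requests) and respect the site’s terms of use.

```python
from html.parser import HTMLParser

class MetaScraper(HTMLParser):
    """Collect <meta> name/property → content pairs from an HTML page."""
    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("name") or attrs.get("property")
        if key and "content" in attrs:
            self.metadata[key] = attrs["content"]

# In practice the HTML would come from a download; here we parse a
# small, made-up article page.
html = """
<html><head>
  <meta property="og:title" content="New COVID-19 vaccine approved">
  <meta name="author" content="Jane Doe">
  <meta property="article:published_time" content="2021-03-15">
</head><body>...</body></html>
"""
scraper = MetaScraper()
scraper.feed(html)
print(scraper.metadata["author"])  # prints "Jane Doe"
```

From here, the extracted title, author, and publication date can be written to a spreadsheet or database for later comparison.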
Besides interviews, she also conducts surveys with Qualtrics. Her best advice here is to test your surveys thoroughly: once a survey has been sent out, you can no longer change its questions, so you have to be sure the chosen questions will deliver the results you need.
First, to properly organize and analyze all the data collected from news articles, Elisa built a relational database in FileMaker. It helps her organize her data, compare texts, and keep track of her overall workflow.
Second, to transcribe the interviews she conducted, she uses Sonix, an automated transcription service that delivers good-quality transcriptions you can edit yourself afterwards. Elisa stresses the importance of anonymizing your interviews before uploading them, to make sure you do not unwittingly share any personal data! Lastly, she codes the interviews in NVivo.
To conclude her talk, Elisa left us with a useful tip: it might be worth trying out a different browser (in her case Sigma), as this can give you new perspectives on how to structure and manage your daily work.
Sara Cosemans: using digital research methods to deal with information overload
Sara is a Doctor Assistant in Cultural History at KU Leuven and a part-time Assistant Professor in the School of Social Sciences at UHasselt. The digital method discussed below was developed during her PhD at KU Leuven, together with data scientists Philip Grant, Ratan Sebastian, and computational linguist Marc Allassonnière-Tang. Learn more about her digital approach in this blog post.
Sara’s presentation was based on her PhD project entitled “The Internationalization of the Refugee Problem. Resettlement from the Global South during the 1970s”, which initially started off as a very analogue project. However, when she ran into some serious challenges, Sara started to explore digital methods. Her journey was one of trial and error, with a lot of investment in, on the one hand, educating herself in how to use digital tools and, on the other hand, building a network of digital experts to collaborate with.
Sara’s project required many archival visits in various countries. When going to the archives, she did not yet know exactly what she was looking for, making it necessary to capture every piece of information of potential relevance; analysis of the content would have to wait. By the end of her archival visits, however, she had an unimaginably large corpus of about 100,000 pages. She quickly realized she would never be able to read everything and needed a digital solution.
To photograph the archival documents, Sara used her iPad as this had a big enough storage capacity and rendered high quality pictures. By using ABBYY FineReader she could subsequently apply Optical Character Recognition (OCR), which converted these photographs into fully text-searchable documents.
The next question, however, was how to search through all these files. A first idea was to build a relational database in FileMaker, which would mean entering the metadata of the files from the different institutions into the database, with the ultimate goal of establishing relations between those files. Unfortunately, entering the metadata was so time-consuming that it could only be completed for one institution, so she needed another solution. Since all her photographs were now text-searchable documents, a first quick way to find information she was already expecting to find was simply using Ctrl + F. But how can you find what you don’t already know? Here, natural language processing (NLP) proved to be the solution.
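Once OCR output is saved as plain text, that Ctrl + F step can also be done programmatically across the whole corpus at once. The sketch below (with invented file names and contents) searches a folder of .txt files for a term, case-insensitively.

```python
import tempfile
from pathlib import Path

def search_corpus(folder, term):
    """Return (filename, line) pairs containing `term`, case-insensitively:
    a programmatic stand-in for Ctrl + F across OCR'd text files."""
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if term.lower() in line.lower():
                hits.append((path.name, line.strip()))
    return hits

# Demo on a tiny, made-up corpus written to a temporary folder.
with tempfile.TemporaryDirectory() as d:
    Path(d, "unhcr_1975.txt").write_text(
        "Resettlement of refugees from the Global South.", encoding="utf-8")
    Path(d, "icem_1976.txt").write_text(
        "Budget discussions for the coming year.", encoding="utf-8")
    results = search_corpus(d, "resettlement")

print(results)  # [('unhcr_1975.txt', 'Resettlement of refugees from the Global South.')]
```

This answers the “find what you expect” half of the problem; finding the unexpected is where NLP comes in.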
Since Sara did not have the time to learn natural language processing methods like topic modelling and clustering herself, she invested her energy in networking at DH conferences, which led her to researchers who were very eager to work with her data. They developed a Google Colaboratory notebook in Python to run topic modelling on all the files, determine topics, and create visualizations. They then compiled reading lists of the most important documents so that Sara could start by reading those. This close reading allowed Sara to discover new topics, which she could then explore further in other documents using her Ctrl + F method.
Sara concluded by saying that while she needed digital methods to make her research manageable and to help her find relevant connections, the analysis of the material still depended completely on her. The computer will never fully replace the close-reading, deep-thinking historian.
Marianna Montes: reproducibility and versioning as two important keys to a successful coding project
Marianna’s main research interests lie in corpus linguistics and cognitive semantics. The goal of her PhD project is methodological triangulation of distributional methods (namely, comparing vector space semantics, behavioral profiles, and traditional lexicographical analysis), with case studies in English and Dutch. Some of the tools developed and used within the project can be found on her personal webpage. She recently also started working at ICTS, where she supports research data management.
Marianna’s interest in digital methods and tools was sparked while studying languages, which required her to acquaint herself with statistics and programming. During her talk she therefore stressed the importance of challenging yourself to learn new skills and use new digital tools. Over the past years, she has actively helped fellow researchers try out new methods to achieve greater efficiency in their work.
Her main expertise is in R. She showed us how R can be used in multiple ways throughout your research: creating plots, making interactive reports, presenting slides, coding workflows, and so forth. On her blog, Marianna wrote an interesting piece about how you can implement R-project tools in your workflow.
Marianna also underlined that your work should be reproducible, both for yourself and for other people. During her research, Marianna experimented a lot with running various pieces of code, trying out different clustering algorithms, and so on. She ended up forgetting how she had reached her results, making it necessary to double- or triple-check everything. She therefore started to carefully register every step in her workflow, putting into words the reasoning behind her coding. This way, she could answer questions like “What decisions did I make, and why?”. Marianna has written more extensively about how your old, current, and future self might not understand your decisions in this insightful blog post.
In the same vein, Marianna highlighted how versioning can be a true life-saver. For this, she uses Git. Git lets you control versions, keep track of the differences between files, recover files that were removed, and take a snapshot of the state of your files at a given moment. Combined with a remote host, this also gives you an online backup that you can share with other people.
To conclude with an important message shared across all the presentations: doing a PhD, despite popular belief, should not be done in isolation. Instead, you should look for ways to connect with other researchers. A willingness to make the process of developing your dissertation visible can only improve the project and stimulate collaborations, which might help solve the problems you are facing, open up new research avenues, and generate new perspectives.