The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Fien Messens, BiblioTech Hackathon participant. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Fien’s group, “God Save the Tweets!,” worked on the contemporary news media dataset featuring tweets including the hashtags #queueforthequeen, #abolishthemonarchy, and #queenelizabeth during a short span of time around the death of Queen Elizabeth II in September 2022. You can learn more about the team’s work by having a look at their project poster in the BiblioTech Zenodo community. You can also find data related to the technical aspects of the project in the God Save the Tweets! GitHub repository. To read more about the hackathon, view the full photo album, and discover the teams’ results, you can visit the BiblioTech website.
What first interested you in the hackathon? Have you done one before? What is your background?
When I first learned about the hackathon, I was immediately drawn to the opportunity to tackle a challenging problem within a limited time frame and to collaborate with a team of people with specific expertise from a variety of backgrounds. I have a background in art history and digital humanities, especially with born-digital collections in the GLAM scene. I can archive and preserve a tweet in a stable way, but I have little knowledge of how to analyze the output. So this hackathon gave me a platform where I could learn how to do that through trial and error.
What was your primary concern when beginning the project? Interface? Usability?
We analyzed tweets from the first 10 days after the passing of Queen Elizabeth II. Our primary concern at the outset was the sheer volume of the data. We were confronted with a vast corpus in multiple languages, but we swiftly narrowed it down to English-language tweets only. Even after this refinement, we still faced a substantial dataset of 300,000 posts. Our main objective became understanding and effectively managing this extensive stream of tweets with suitable methods.
What kind of audience did you have in mind?
That’s a tricky question.
The project posed a complex challenge because it combined text analysis, sentiment analysis, and data mining techniques. In addition, we created a tweet generator that used our existing corpus to craft new tweets. The primary objective of the tweet generator was to cater to users who wanted to share opinions and take part in engaging discussions on Twitter, fostering a sense of community and/or collaboration. Our target audience encompassed not only researchers but also the wider public: we aimed to provide a platform for diverse individuals to express themselves and to facilitate meaningful (or not so meaningful) interactions.
How did you establish your methodology and approach to the data set? Were you inspired by any other platforms or projects?
The methodology revolved around text analysis, also known as text mining. To ensure relevance, we filtered the English-language tweets using specific keywords and then applied sentiment analysis techniques to the data. In addition to drawing inspiration from other sentiment analysis projects, we benefited from the expertise of one team member who had previously explored sentiment analysis methodologies during a digital humanities course.
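To make the filtering step concrete, here is a minimal Python sketch. It is not the team's actual code. It assumes each tweet is a dictionary with hypothetical `lang` and `text` fields, and the keyword list is illustrative only, because the interview does not say which keywords the team used.

```python
# Minimal sketch: keep English-language tweets that mention at least one keyword.
# Assumptions: each tweet is a dict with hypothetical "lang" and "text" fields;
# ILLUSTRATIVE_KEYWORDS is made up for this example, not the team's actual list.
ILLUSTRATIVE_KEYWORDS = ("queen", "monarchy", "funeral")


def filter_tweets(tweets, lang="en", keywords=ILLUSTRATIVE_KEYWORDS):
    """Return tweets in `lang` whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        t for t in tweets
        if t.get("lang") == lang
        and any(k in t.get("text", "").lower() for k in lowered)
    ]


if __name__ == "__main__":
    sample = [
        {"lang": "en", "text": "Queueing for the Queen since dawn"},
        {"lang": "nl", "text": "De koningin is overleden"},
        {"lang": "en", "text": "Unrelated giveaway, click here"},
    ]
    for tweet in filter_tweets(sample):
        print(tweet["text"])
```

Plain substring matching is the simplest option. It also matches longer words such as "queens", which may or may not be wanted.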
When it came to the tweet generator, we found inspiration in various existing Twitter bots. We especially liked the @TorfsBot account, associated with the former rector of KU Leuven, which had publicly available source code that proved immensely helpful. While we drew insights from existing bots, we primarily relied on our own experiences and creatively adapted concepts from other bots to shape our own implementation.
Why did you choose to focus on the most active users as opposed to a larger pool? Was it just to reduce the size of the data, or was there another reason?
We focused on the most active users because it helped us get a grasp of who was actively participating in the online discussions. We also observed that specific hashtags were being overused, with certain users exploiting them to promote unrelated content. Looking at the top 10 or so most active users allowed us to better understand the overall sentiment surrounding the Queen’s passing. This approach helped us filter out the excessive use of certain hashtags and reduce its impact, so we could focus on the overall sentiment analysis without being skewed by the promotion of unrelated tweets or ideas that were going viral at the time.
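Selecting the most active users amounts to counting tweets per author and keeping the top N. The minimal Python sketch below illustrates this. The `user` field and both function names are hypothetical, and the default cut-off of 10 simply mirrors the "top 10 or so" mentioned in the answer.

```python
from collections import Counter


def top_active_users(tweets, n=10):
    """Return the n users with the most tweets, most active first.

    Assumes each tweet is a dict with a hypothetical "user" field.
    """
    counts = Counter(t["user"] for t in tweets)
    return [user for user, _ in counts.most_common(n)]


def tweets_from_users(tweets, users):
    """Keep only the tweets written by the given users."""
    wanted = set(users)  # set lookup keeps the filter fast on large corpora
    return [t for t in tweets if t["user"] in wanted]


if __name__ == "__main__":
    sample = [{"user": "a"}, {"user": "b"}, {"user": "a"},
              {"user": "c"}, {"user": "a"}, {"user": "b"}]
    top = top_active_users(sample, n=2)
    print(top)
    print(len(tweets_from_users(sample, top)))
```

`Counter.most_common` already returns users sorted by tweet count, so no separate sorting step is needed.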
How did you classify words/users on a -1.00 to 1.00 scale?
We utilized a sentiment analysis algorithm that employed a numerical scale ranging from -1.00 to 1.00. A score closer to -1.00 indicated a negative sentiment, while a score closer to 1.00 indicated a positive sentiment. The algorithm took into consideration various linguistic and contextual factors, such as keywords, to determine the polarity of sentiments expressed in the text. It is important to highlight that one of our team members had expertise in sentiment analysis, which significantly enriched our project.
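The interview does not name the algorithm the team used. As an illustration only, here is a toy lexicon-based scorer in Python that produces scores on the same -1.00 to 1.00 scale. Each known word carries a polarity in that range, and a text's score is the average of its known words, so the result cannot leave the range. The lexicon is hypothetical and far too small for real use.

```python
import re

# Hypothetical toy lexicon: word -> polarity between -1.0 (negative) and 1.0 (positive).
TOY_LEXICON = {
    "love": 0.8, "grateful": 0.7, "beautiful": 0.6,
    "sad": -0.6, "abolish": -0.5, "shame": -0.7,
}


def polarity(text, lexicon=TOY_LEXICON):
    """Average the polarity of words found in the lexicon; 0.0 (neutral) if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return 0.0
    # The mean of values in [-1, 1] stays in [-1, 1]; round to two decimals.
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for tweet in [
        "So sad today",
        "Grateful for her service, what a beautiful tribute",
        "Abolish the monarchy, what a shame",
    ]:
        print(f"{polarity(tweet):+.2f}  {tweet}")
```

Averaging keeps scores within the scale regardless of tweet length. Established sentiment tools typically also account for negation and intensifiers, which this toy version ignores.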
Given the size of our team, we were able to allocate tasks efficiently. I primarily focused on developing the website, while other team members handled tasks such as presentation creation, analysis, and more. This division of responsibilities within the team proved to be highly beneficial, ensuring a smooth workflow and allowing each member to contribute their expertise to the project.
Was there any main motivator/goal that encouraged the team when things didn’t go as expected?
Digital humanities is all about embracing the adventure of trial and error, and boy, did we dive headfirst into it! Our incredible team leader, Leen Sevens, was our ultimate cheerleader, always keeping our spirits high with her GIFs and emojis. But it didn’t stop there – the organizing team of the hackathon also rocked our chat with their infectious motivation. Knowing that we had this amazing support system, cheering us on and having our backs throughout the process, made the journey all the more exciting and rewarding. It’s safe to say we had a real dream team!
What were some of the roadblocks you faced?
Our initial plan was to generate the tweets for the tweet generator in the backend, but this proved more complex than anticipated. Not only would it have required purchasing additional storage, but it simply didn’t work as intended. Thankfully, we had a solid plan B in place: we manually created the tweets for our generator and seamlessly integrated them into the HTML and JavaScript code.
Timing also presented its own challenges. With numerous tasks and responsibilities on our plate, the 10-day timeframe seemed like a whirlwind. However, there was a silver lining to this intense rush. It brought the entire team together, all of us uniting our efforts to meet the deadline. In a way, it fostered a sense of community among us, reinforcing our shared commitment to achieving our common goal.
What kind of tips would you give to a team doing their first hackathon?
I’ve already written some down [laughs].
- Plan, divide, and conquer. Let folks explore what tickles their brain and make it a learning experience for everyone.
- Communication is key. Regularly touch base, share ideas, and foster an environment of open dialogue.
- Embrace the beautiful chaos of errors, because that’s where the magic happens!
- Have fun and enjoy the process. It’s a great opportunity to learn from each other and work towards one shared goal.