Santa-fy Yourself

For the last three years, the UC Arts Digital Lab has been sharing a floor with the College of Arts Office staff and has been invited to enter their Christmas-door decorating competition. This is an invitation that we do not take lightly, mostly due to an over-zealous thirst for competition, but also because we love decorating things, and (of course) we love Christmas.

This year, we decided that as a Digital Lab our door should reflect the digital skills that we have on offer. So armed with Python and a webcam, we began the process of creating a script that would turn people walking down the hallway into Santa! The idea was to detect people’s faces using face-detection software and then superimpose a beard and a hat onto each face and project the result onto a screen.

As it happens, there is a fantastic library called OpenCV which can be used for face detection. OpenCV provides an infrastructure for object detection, which can be trained to detect any kind of object. It also comes with a number of ready-to-use detectors, such as face, mouth, and nose detectors, which can be used to build face-detecting programs. If you are interested in how these work, you can read about it in this blogpost by Engin Kurutepe.

A quick Google search revealed that there are many tutorials and scripts available online for OpenCV. I started with a tutorial which shows you how to attach a mustache to people’s faces:

While I liked the mustache, our goal had always been to add a beard. However, Jennifer and I quickly found that adapting scripts in OpenCV is incredibly difficult, since each face is described by just four values (the x and y coordinates of its top-left corner, plus its width and height). Moving the beard down the face wasn’t too hard, but adjusting its shape and proportions was incredibly difficult. What’s more, if the image of the beard moved beyond the limits of the screen, the programme would crash. We got very sick of seeing this message:

[Screenshot of the error message]

With some help from our volunteers Brad and Aidan, I did manage to get a beard working at one point, but it was very buggy. If you moved too close to the edges or the corners of the screen, the programme would stop running. I decided to abandon this idea.
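For anyone attempting something similar, those edge-of-screen crashes can usually be avoided by clipping the overlay rectangle to the frame before copying any pixels into it. A minimal sketch (the function name and signature are my own, not from the tutorials):

```python
def clamp_overlay(x, y, w, h, frame_w, frame_h):
    """Clip an overlay rectangle (e.g. the beard image) to the frame.

    Returns (x1, y1, x2, y2): the region of the frame that is safe to
    draw into. The region collapses to zero width/height when the
    overlay sits entirely off-screen, instead of crashing."""
    x1 = min(max(0, x), frame_w)
    y1 = min(max(0, y), frame_h)
    x2 = min(max(x1, x + w), frame_w)
    y2 = min(max(y1, y + h), frame_h)
    return x1, y1, x2, y2
```

The matching slice of the beard image is then `(x1 - x)` to `(x2 - x)` horizontally and `(y1 - y)` to `(y2 - y)` vertically, so both arrays stay the same shape when you blend them.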

Lucky for me, there was another tutorial online specifically for a Santa hat! After adjusting the size of the frame, I at least had the hat portion of our program sorted:

Not happy with merely a hat, I tracked down a video of falling snow and overlaid each of its frames on the webcam feed. I also played around with the idea of audio, specifically a clip which would play “ho ho ho” whenever a face was detected. I quickly realised that this was not only extremely irritating, but also slowed our webcam feed to a standstill. Snow and a Santa hat were enough anyway…right???

While I was working on the software component of the project, Jennifer and Rosalee created the “hardware”. Thanks to Rosalee’s talented flatmate Riaan, we had a screen with a metal bracket on the back that could be slotted over our door. We decided that we should construct a 3D structure around this that would look like a Gameboy (Christmas themed, of course):


Gameboy 1

Gameboy 2

As you can see, we ran cords under the door so that we could power and connect to the screen. We also spent some time setting the code up on a Raspberry Pi, but unfortunately the bit rate from the webcam was far too low. There are ways to improve this, but we were running out of time. Instead we ran the programme off a laptop on a chair behind the door (with just enough room to open it comfortably).

Are you ready to see the finished result? Without further ado, I present to you “Santa-fy Yourself” by the UC Arts Digital Lab:




Breaking the Silicon Ceiling: Empowering women with technology

Breaking the Silicon Ceiling

On Saturday, Lucy-Jane presented at the 3rd Annual FemSoc Feminist Conference on Digital Humanities and empowering women with technology. The talk was well attended but we thought we would share it here for others.

Breaking the Silicon Ceiling (1)

Despite early involvement in the development of computers, women have largely been absent from the field of Computer Science for decades. This is something that most people are aware of, but I thought I would share some statistics to show just how large this gap is. If we look at education statistics from the U.S., there has been a rapid decline in Computer Science degrees awarded to women since the mid-1980s, despite a steady increase in Mathematics, Engineering, and Physics degrees and an increase in the percentage of degrees awarded to women as a whole.

It is quite hard to find statistics about Computer Science in NZ, but using data from Education Counts I was able to work out that women earned just 32% of Computer Science qualifications at all levels in 2006 (Data gathered from Provider-based Enrolments: Predominant Field of Study.xlsx). Another website I read claimed that this figure is closer to 20%, though it did not cite a source for this statistic (absoluteIt (2015), Is NZ’s gender gap in tech as bad as we think?).

Breaking the Silicon Ceiling (2) Statistics from 2001 and 2006 show that in New Zealand women hold just 25–26% of ‘professional’ computing roles and 29–37% of ‘technical’ computing roles. Because women are more likely to work in technical roles, they are also likely to earn less than men, with professional roles earning $45–48,000 compared with $30–37,000 for technical roles (Hunter, A (2012), Locating women in the New Zealand computing industry).
Breaking the Silicon Ceiling (3) With increasing need for technological skills in the workforce, there are obvious employment advantages to working in the IT industry, but the importance of gender diversity in computer science goes beyond employment. As technology becomes more integral in how we interact with each other and live our lives, it is important that technological developments meet the diverse range of needs that are present in society. If women are not part of the development of technology, it is likely that technological products will not meet their needs. Furthermore, as technology continues to wield influence over people and to shape culture and society, the power that comes from developing technology will remain in the hands of men, reinforcing the current system.
Breaking the Silicon Ceiling (4) There is much debate about what exactly it is that prevents women from studying Computer Science or working in IT. Issues that people cite include: the lack of computer science related toys for girls; the geek factor present in high school which seems to affect girls more than boys; and the absence of female role-models for girls in Computer Science. I have personally found taking Computer Science courses daunting as they often involve walking into rooms full of people that don’t seem to look or act like me (especially since I like to wear heels and glittery dresses). It becomes very hard to ask for help or guidance when you feel that you need to prove you have a right to be there in the first place and it can be very hard to make friends when you see yourself as an outsider.
Breaking the Silicon Ceiling (5)

Increasingly, education providers and IT companies are attempting to encourage more women to enter the IT industry. Google, for example, has poured $50 million into the Made with Code programme, which provides mentors, tools, and resources for girls in high school to learn to code. They also created the Anita Borg Scholarship, which provides funding and opportunities for women to study Computer Science.

IT companies have also begun to change their practices in order to become more welcoming to women. Again, another Google example, but they have done a lot of work to examine the kinds of unconscious biases that exist in their teams and products and have offered workshops to their staff in order to educate them about bias and how it can affect the decisions they make.

These are wonderful initiatives but they tend to target the next generation of women, ignoring women who are currently outside of the tech industry who could still be empowered by technology.

Breaking the Silicon Ceiling (6)

My own journey into coding is reflective of the statistics I mentioned earlier in this talk. Despite being surrounded by computers from an early age, and having a father and brother with an avid interest in computing, I managed to pass through both primary school and high school without ever studying computing. At university, I studied Arts, with a particular focus on English, while my brother undertook a degree in Computer Engineering.

It wasn’t until I picked up a Digital Humanities paper in my third year that I discovered a passion for technology. I began by learning the TEI, a set of guidelines for how to create electronic scholarly texts. As part of the course, I scanned and transcribed letters from a Cantabrian soldier in Gallipoli, and then worked with my classmates to mark them up with the TEI (You can visit the website for the project here: http://editions.canterbury.ac.nz).

Breaking the Silicon Ceiling (7)

Due to my experience in my Digital Humanities course, I landed a job at CEISMIC when I graduated with my honours degree. Some of you may have heard of CEISMIC, but just in case, CEISMIC is the Digital Archive for the Canterbury earthquakes. Since 2012 we have been gathering social and cultural data about the quakes in an effort to provide a long-term resource for researchers and future generations to learn about the 2010 and 2011 Canterbury earthquakes. Today the archive has over 100,000 items from a range of different cultural heritage organisations around New Zealand, as well as from our own University repository of earthquake data.

When I started at CEISMIC my role was primarily focused on content gathering – approaching people around Canterbury to gather earthquake-related material, before describing it, organising it and adding it to the archive. But as time went on I began to be exposed to the more technical side of archiving. The admin tools we had available for the archive were never very good so I began to write queries which would pull data from the back-end of the system using documentation our IT people had given us.

Breaking the Silicon Ceiling

Encouraged by my manager, I started learning Python and Javascript through online tutorials so that I could help him with a web app that he had written. The app included several tools: an address search which searched through addresses in our system; a history page which outlined recent updates to the archive; and a map showing the spread of our content across Christchurch (all created by Christopher Thomson). Over the next year I added two more tools: a manifest creator and a manifest checker. These allowed us to automatically create and check the spreadsheets that we used to ingest material into the archive (which we call manifests), a task that was previously time-consuming and prone to errors.

I also worked with my colleague Jennifer Middendorf to create a simple photo-describing app which is now used by our volunteers to add captions to folders of photographs without having to use the complex and often confusing manifests (You can download a copy here). This app has been vastly improved by our two volunteers, Aidan Millow and Brad McNeur.

Last year, as part of the UC Staff Tertiary Study Assistance Scheme, I took my first Computer Science course in Relational Databases, and this year I am studying Artificial Intelligence as part of COSC 367. Studying these courses alongside my full-time job has been hard work but hugely rewarding.

Breaking the Silicon Ceiling (8)

This year, my job has undergone a massive transition as our office has evolved from CEISMIC to the UC Arts Digital Lab (of which CEISMIC is a major project). This has partly been a move towards longevity, as the Canterbury earthquakes are obviously not going to be relevant forever, but it is also because of the expertise and knowledge that we have built in the office. Essentially the aims of the Digital Lab are to enable digital research in the College of Arts by pairing the digital expertise in the lab with the subject knowledge and research experience of UC Arts academics and students.

In my new role I have begun to collaborate with researchers on papers, to develop software and tools for student projects, and to supervise interns and summer scholars. With the number of new programming languages, tools, and research projects that I have taken up I have come to confidently call myself a Digital Humanist and to feel that if I ever wanted to, I could transition from Arts into an IT field.

Breaking the Silicon Ceiling (9)

When I think about my journey to computing, I sometimes wonder if I would have been better off studying Computer Science, instead of undertaking the Humanities education that I did. But the truth is that there were things about English that I was drawn to that I don’t think Computer Science would have ever satisfied. This is where Digital Humanities comes into the picture.

I’ve thrown around the term Digital Humanities a few times now so I should probably explain what it is. Digital Humanities is a relatively new field of study in the Humanities and Social Sciences. It is considered an interdisciplinary subject as it tends to work in collaboration with other subjects, almost like an umbrella over the Arts. Digital Humanists use computational methods to answer existing Humanities questions and to pioneer new approaches in social and cultural research. The goal of Digital Humanities is to realise the possibilities that technology poses for the arts and to fully integrate technology into the activities of humanities researchers.

Breaking the Silicon Ceiling (10) As an intersection between Arts and technology, I think that Digital Humanities is well suited to women. I feel that women gravitate towards things that allow them to communicate and connect with the world which is why so many women study the Arts. With Digital Humanities, technology is seen as more of a tool than the focus of our study. We are interested in culture, people, and society, and use technology to investigate these interests.
Breaking the Silicon Ceiling (11) Furthermore, Digital Humanities has a hacking culture. This does not mean we like to take down banking systems, but rather that we are happy to use tools or practices we are familiar with or have access to. Generally, Digital Humanists have come from Humanities and Social Science backgrounds and have cobbled together digital skills from online tutorials or osmosis. This makes the field more accessible to people who don’t have backgrounds in IT but who may want to learn.
Breaking the Silicon Ceiling (12) And lastly, because DH is still a relatively new field there is still plenty of room for different kinds of people with new and interesting ideas. The University of Canterbury’s Digital Humanities teaching programme, for example, has only been around for three or four years and it is the first of its kind in New Zealand. Similarly, the UC Arts Digital Lab is breaking new ground in New Zealand and we are doing it not as trained IT professionals, but as people with a passion for technology and a willingness to learn.
Breaking the Silicon Ceiling (13)

I am proud to say that our office is made up primarily of women and that most of our interns and students are women too. Because we have nearly all come from Humanities and Social Science backgrounds, there is a vibe of collaboration and support in the office – we do not expect those who enter to know much about computers but hope to share our expertise and knowledge so they know a little more when they leave.

Almost unintentionally, I feel that we have created a female-friendly space on campus to learn about, develop, and experiment with technology. I hope we can continue to diversify the kinds of people we have in the office so that it can be a hub for other underrepresented voices in Computer Science too.





Interview with Dan

Over the last eight months, the UC Arts Digital Lab has been lucky to have student Dan Bartlett working in the office as part of the Voices Against War project. Dan came on-board as part of the Summer Scholarship programme to help gather, describe, and prepare material for the project and has quickly become a source of wisdom and humour in the office. Unfortunately for us, others have spotted his talents and he begins his next journey this week at Te Rūnanga o Ngāi Tahu, working on their World War I website. Before he left, Lucy-Jane took the time to sit down and talk to him about his work on Voices Against War and his experiences in the Digital Lab.


Dan Bartlett and the rest of the Voices Against War team


Tell us about the Voices Against War website
Ok. Well, Voices Against War is a website about pacifists, conscientious objectors, and seditious Cantabrians in the First World War. They were speaking out against, first: compulsory military training, and second: conscription. And quite a few of them got sent to gaol for their views, either for speaking out, which was sedition, or for refusing to go when they were conscripted, and that was conscientious objection. The website is telling the stories of quite a few different men, women and their families that were involved in that movement.

Great. And what are you studying?
I’m currently studying Honours in History. New Zealand history is a particular passion of mine. I’ve been really interested lately in the reforms of the 1990s under the National government, from the 1991 budget, in terms of welfare and health and how they affected people – closing down hospitals, things like that. I’m particularly interested in history when you can use it to see how it affected people and how we could learn lessons from it and not repeat it.

What have you learned working in the UC Arts Digital Lab?
Well I didn’t really have digital skills! I came to the project with research and writing skills and not, for example, the ability to upload to a website. And I’ve learnt about metadata, and how websites work. You showed me the matrix code behind things {laughs} *Editor note: He means HTML* I have way more of an understanding than I had before.

So digital archiving of information and all those things that go along with it?
Yeah, like copyright; how to accurately describe things; metadata.

And terms like ‘metadata’ you probably hadn’t come across before?
Yeah, no, I hadn’t. I might have seen it but I didn’t know what it was called.

So what is metadata (for those who don’t know)?
It’s all the data that’s extra from the item or file itself, I guess. Like the date; type of image; rights, who it came from; who gave it to the archive. So I guess it’s the holistic version of saying, “This is so-and-so in 1918.”

I always think of it like bibliographical data – data that explains the origins of…
Provenance. Yeah. But also how to find it, if you want to find it and use it for research.

Would you see yourself now as a Digital Humanist?
It’s definitely going that way. I’d hope to develop more skills. So I applied for this job with Ngāi Tahu which is going to be doing the same work – the research and writing – for their World War I website, which will be looking at Ngāi Tahu soldiers. And I think the main reason I got the job was because I have those website skills.

What I think is interesting, and what you’ve sort of talked about before, is you’ve never really taken to the digital. You’ve gone into this role, but really the role is highly focussed on the historic passions that you have. The digital is just a way for you to get there.
Yeah and I’ve discovered that using digital platforms is a really good democratic way to do public history. It’s a way to get information out there that’s more than just at one museum in a physical place in Christchurch, for example. One of the descendants’ families, they’re all over the world, and they are able to access this website and look at their Grandfather and Great-Grandfather’s profile that we’ve put up because it is digital. And I wouldn’t have really thought about it before. It’s made me appreciate websites such as Te Ara a whole lot more.

And I think you understand a little bit more about the work that goes into creating those resources. As a student I was just like, “These resources exist!”
“Thanks!” Yeah, but it takes heaps.

So would you encourage other students to take DIGI papers or to work in the lab?
Yes, definitely. It’s funny – I think I told you, but I had a couple of appointments in the Library about resources, because I was wanting to use old issues of the Times and the Guardian for British history because I was looking at the miners’ strike. And they sent me this thing saying, “You might be interested in this Digital Humanities.” But it was really quite foreign to me. I was like, “That sounds really weird!” And now I get it and it’s really helpful.
Even if you were just able to take a DIGI paper as part of your degree, I think that would be really helpful.

I think that a large problem with Digital Humanities is translating what it means to people. Because I think the term in itself is not very clear. A lot of people point out that there is no ‘Digital Humanities’, that all of the Humanities should be integrating the digital. And I think the Voices Against War website is a good example of that. The project isn’t successful because of the digital element. It is successful because it is an interesting topic that there aren’t enough resources on and that has real-world applicability. The stories that make up the project are what makes it compelling – the digital element is just what makes it available to people.
Yeah! The stories are driving the project but the digital element provides the tool to share them.

So have you been converted to the digital?
In terms of public history and archiving – yeah, big time. Because we’ve had to use all those resources for Voices Against War, like Papers Past and things like that. They’re just amazing things. But I still don’t want a cell phone or a smart phone. I hope nobody makes me get one because it just stresses me out. So converted in some ways, other ways not so much.

So there you have it – Dan Bartlett, the Digital Humanist without a cell phone. We would like to thank Dan for his work in the office, but also for putting a smile on our faces every day. We know that he will be cherished at Ngāi Tahu and that he has a bright future in public history.

Tēnā rawa atu koe! (Thank you very much!)




Digital Humanities Infrastructure Workshop: Part Four

Today I finish my series on the Digital Humanities Infrastructure Workshops held in November last year, by discussing my own perspective on cyberinfrastructure. But before I do this, I thought that I should outline my background in Digital Humanities and my role at CEISMIC in order to put my thoughts in context.

I was introduced to the Digital Humanities in the third year of my English degree when I took the University’s first DH paper, Electronic Scholarly Editing. This paper was run by Prof. Paul Millar, with the help of Dr Christopher Thomson as a tutor. It aimed to critically examine digital texts and equip students with the skills to create their own, namely through the TEI (a set of guidelines which specify methods for encoding machine-readable texts). Over the next two years, I worked on two projects digitising manuscripts using the TEI. The first is a collection of World War I letters from a member of the Canterbury Mounted Rifles (which you can view at http://editions.canterbury.ac.nz), and the other a memoir in letter form from New Zealand doctor Stanley Aylward. As well as teaching me how to encode texts with the TEI, these projects opened my eyes to the opportunities that digitisation offers the Humanities, and the intensive work that goes into it. In both projects, the manuscripts required more complex and nuanced analysis than computers were capable of giving and had to be encoded by hand – a common requirement for many Digital Humanities projects.

My position at CEISMIC has further highlighted this requirement, as I work daily to gather, organise, and describe large quantities of earthquake-related data. CEISMIC’s focus has always been on social data, with an aim to collect as many stories and documents about the earthquakes as possible before they are forgotten or lost. Today we have over 100,000 items in the archive – a fantastic achievement, but not an easy one since we described and annotated every item by hand. On average, our team estimates that it takes us six minutes to describe and geolocate a photograph, a number which doesn’t sound too bad until you extrapolate it over the 46,447 photographs we currently hold in QuakeStudies (adding up to 278,682 minutes, or roughly 4,645 hours, or 580 eight-hour working days). And that’s just the photographs – we have also archived hundreds of stories (such as with our QuakeBox project), academic research, community data (such as newsletters and artworks), newspapers, and much, much more.
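For the curious, a quick check of that back-of-envelope arithmetic (assuming eight-hour working days, which is how the day count works out):

```python
photos = 46_447          # photographs currently in QuakeStudies
minutes_each = 6         # average time to describe and geolocate one

total_minutes = photos * minutes_each   # 278,682 minutes
total_hours = total_minutes / 60        # ~4,645 hours
working_days = total_hours / 8          # ~580 eight-hour days
```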

Given my experiences, it would be easy for me to agree with Paul Arthur that investing in the digitisation (or in our case archiving) of social data may be the most valuable form of infrastructure for the Humanities. However, I would argue that this process is not possible without people with the skills and knowledge required. Often I hear Humanists and people from the GLAM sector comment that they need more people who have skills both in the Humanities and the digital, and yet there are very few programmes in existence training people in both. As readers of this blog are likely aware, the University of Canterbury offers Digital Humanities courses at honours and masters level, and is offering a Digital Arts, Social Sciences and Humanities minor to undergraduate students for the first time next year. However, we are the only university in New Zealand that has a Digital Humanities programme, and the impetus for this came from within the College of Arts and the University. As yet, there is no national strategy in New Zealand for the training of Digital Humanists.

Moreover, if the Digital Humanities, as Alan Liu argues, is tasked with critiquing academic infrastructure and its relation to larger society, this critique needs to represent the diversity in society. My problem with the current ‘lightly antifoundationalist’ model and ‘hacking’ is that it can only be achieved by Humanists who have digital skills. As I have already discussed, people with these skills are usually in the minority, but they also tend to come from certain groups in society – e.g. men and people from high socio-economic backgrounds who have had access to computers from a young age. The problem with this is that these tools are potentially being created by one group in society, and any critique that they allow is potentially coming from one perspective. If we want our cyberinfrastructure to reflect the diverse needs and values of society, then I would argue that we need to ensure that a wide range of people are participating in the field.

Perhaps this is naive, but if, as Liu claims, the shaping of academic infrastructure can have a bearing on other organisations and the community at large, then perhaps training more people in the Digital Humanities will have an effect too. I personally would love to see a world where the tech industry held equal numbers of women and men, where there was more ethnic diversity, and where the average Humanities student graduated with some technical nous. In some ways this could be seen as a form of infrastructure – training people with the skills and sensibilities to critique digital culture both in their work, but also in their wider environs. It’s my hope that doing so would widen the pool of ideas, revealing new and innovative solutions, and more nuanced critiques of infrastructure.




Digital Humanities Infrastructure Workshop: Part Three

Digital Content Analyst Lucy-Jane Walsh continues her discussion of the UCDH Cyberinfrastructure with a summary of Alan Liu’s talk:

Against the Cultural Singularity: Digital Humanities & Cultural Infrastructure Studies – Alan Liu

Alan Liu began his talk with a quote from Finnish architect Eliel Saarinen: “Always design a thing by considering it in its next larger context”.  With this in mind, he chose to focus on Digital Humanities cyberinfrastructure as a sub-domain of Humanities infrastructure, and to look at how Digital Humanities can support traditional Humanities fields.

Liu argued that the Digital Humanities has a tradition of critiquing infrastructure, which is not only unique to the field, but the best mechanism for supporting traditional modes of criticism. This is because infrastructure has the same impact on individuals and communities as culture – it makes up our environment and how we interact with each other. Liu used dystopian films as an example, pointing out that whole cultures in these films are dominated by the infrastructure that is available to them. In Blade Runner, for example, flying cars make up the environment, whereas in Mad Max the world is driven by fuel. Today culture could be said to be shaped by smart phones, social networking, and big data. By critiquing these systems, Digital Humanists can add to the larger debates surrounding culture while remaining in the digital sphere.

According to Liu, the current style of Digital Humanities critique is “lightly anti-foundationalist”. He cited James Smithies, Michael Dieter, Bruno Latour, Ackbar Abbas, and David Theo Goldberg as examples of this, arguing that while Digital Humanists believe in the potential for known and trusted digital tools and methodologies to provide new insights in the field of humanities, they are also distrustful of them. This is evident in Digital Humanists’ tendencies to ‘hack’ – where hacking in this context means using the skills and tools one understands and has at hand rather than investing in more formal forms of infrastructure. To Liu ‘hacking’ gives the Digital Humanities a unique perspective: it allows the field to be efficient and flexible, and to get close enough to systems to understand their weaknesses without being vulnerable to them.

In order to move forward, Liu suggested that Digital Humanities should adopt what he calls ‘Critical Infrastructure Studies’, the formal study of academic infrastructure in its relation to larger society, which he sees as the Digital Humanities’ mode of cultural studies.  Liu suggested two approaches to Critical Infrastructure Studies: the Neoinstitutionalist approach to organizations in sociology, which explores how institutional structures and norms influence the decisions and actions of individuals in the institutions; or Social Constructionist (especially Adaptive Structuration) approaches to organizational infrastructure in sociology and information science, which would investigate how the interactions and connections between people can construct beliefs and understandings of the world, and how these interactions can affect our perceptions and use of particular technologies. Liu believes that these approaches would help Digital Humanists to create new academic programmes and roles, and to advocate for the creation of national collaborative infrastructures, opening up research data to wider audiences.

Revisiting the quote from the beginning, Liu suggested that the work that Digital Humanists put into shaping academic infrastructure will have a bearing on other organisations and the community at large. This is where Liu’s title for this talk – Against the Cultural Singularity – comes into focus, for he argues that the current neoliberal capitalist thinking is creating a ‘cultural singularity’. He defines this as an environment where all parts of culture are capitalized and brought under a corporate framework. Liu argues that society would be stronger if institutions adopted their own metrics of value and success, and used these metrics to make decisions about infrastructure. He believes that by critiquing infrastructure, Digital Humanists can resist the neoliberal model and offer alternatives.




Digital Humanities Infrastructure Workshop: Part Two

Digital Content Analyst Lucy-Jane Walsh continues her discussion of the UCDH Cyberinfrastructure workshops in November 2015:

Last week I began the blog post series by summarising James Smithies’ talk on global systems analysis of Digital Humanities infrastructure. Today I plan to move swiftly onto Paul Arthur, who is Professor and Chair in Digital Humanities at Western Sydney University, and has been involved in conversations about the future of research infrastructure in Australia for many years.

Smart Infrastructure for Cultural and Social Research – Paul Arthur

Arthur began his talk by explaining that the Humanities were less engaged with infrastructure planning in the past and that the dominant conception of infrastructure was about facilities and machines. Today, people are beginning to think about infrastructure less as tools for particular disciplines and more as a complex problem which can be viewed from many different perspectives. This has enabled the Humanities to engage more in the discussions about infrastructure and to help develop national strategies in Australia.

One example of this is the 2011 Strategic Roadmap for Australian Research Infrastructure which was developed by the Australian government through extensive consultation with the research sector. The aim of the document was to identify the priorities for national, collaborative infrastructure planning and investment from 2011 to 2016. According to Arthur, the difference between the 2011 Strategic Roadmap and previous infrastructure planning was that it included a dedicated section for the humanities and the arts, it placed more value on data sharing and collaboration, and it took a more distributed approach to infrastructure planning and investment – creating infrastructure that multiple disciplines could tap into, rather than discipline-specific infrastructure. This plan was never fully implemented but is still used as a road map today.

One of the key debates generated by this road map is whether we should have one infrastructure for all researchers, or a collection of interlocking resources for multiple disciplines. The argument for having one central infrastructure is that many different resources can cause silos of knowledge and skills. It can also be difficult to generate funding for more than one infrastructure, particularly in the Humanities, leading many governments to opt for a centralised infrastructure instead. Australia has attempted to create a model somewhere in between these two approaches with their online infrastructure project, Nectar. Short for the National eResearch Collaboration Tools and Resources Project, Nectar hosts virtual laboratories where researchers can share ideas and collaborate. Nectar also supports tools for individual projects, such as HuNi (Humanities Networked Infrastructure) which combines data from many Australian cultural websites. According to Arthur, the combination of broad and specific resources that Nectar provides has been a successful model for Australia.

To Arthur, humanities infrastructure is not just information systems and laboratories, but digitised texts such as newspaper articles, records, and stories. In this talk, he argued that Humanities researchers use texts, not machines, to build knowledge, experiment, and draw conclusions. Databases such as Paperspast or Trove, he argued, are successful because of their wealth of historic data, not the computers or information systems working behind the scenes. From this perspective, the challenge for Digital Humanists becomes less about advocating for computers and more about digitising and making available large collections of social and cultural data.

As the Deputy General Editor of the Australian Dictionary of Biography (ADB) from 2010 to 2013, Arthur has a strong interest in biography, which he believes is particularly suited to digital research. This is because biographies can be studied at both the micro and the macro levels – as isolated stories that shed light on individuals, or aggregated collections providing insights on much larger movements.  Much of this macro analysis is made possible by digitising collections of biography, as this offers researchers an overview of the data, better access to the collection, and the ability to analyse the data computationally. Once ADB was digitised, for example, it became clear that there were few stories about women and Aborigines, and that many vocations were missing – an observation that would have been difficult to come by when the many thousands of biographies were only in print.

Arthur discussed his experiences at the ADB when they came to digitise the biographies. Previously, the editing process was analogue in nature: pen and paper, with a lot of face-to-face communication between members of the team. Arthur’s attempts to map this workflow resulted in a confusion of circles and lines, revealing the complex nature of analogue processes. In contrast, digital workflows need to be fairly rigid to work, since computers and information systems struggle to match the complexity of human interaction. For volume 18, Arthur experimented with Windows Live (now known as OneDrive) and created a folder for each person in the dictionary. Within this folder were the biography and a file for notes or any additional information. Each time the biography was edited, a new version was saved on the drive, ensuring that changes could be reverted and versions compared. Using this method, the ADB was able to create their first digital volume.

Initially the digitised version of the ADB replicated the print version, with the stories laid out alphabetically and grouped in accordance with their subject’s time of influence or death. However, as Arthur pointed out, digital environments are not restricted by the linear structure of the printed form and can offer many different modes of storytelling. Today the entries in the ADB can be searched by name, gender, birth, death, ethnicity, religion, occupations, author name, and printed volume. The dictionary also offers a faceted browse which allows repeated filtering of the stories by a list of predefined categories. Much of this functionality has been enabled by the additional metadata that the ADB team has been adding to the stories. This metadata is intended to show the interconnections between stories in the dictionary – for example, where the subjects are friends, enemies, or family, or they have related religions, won similar awards, or attended the same events.

In addition to adding more metadata, the ADB have also made their data available to projects such as Trove and HuNi, and each story has been linked to the corresponding obituary in the Obituaries Australia digital repository. Linking data in this way can unveil more information about individuals – for example, when and where they died and who came to their funeral. Moreover, it provides humanities researchers with larger, more diverse collections of linked cultural data from which they can investigate larger questions about culture and heritage. Unfortunately, there are barriers to a larger international infrastructure of interconnected biographical data, with resources such as the Oxford Dictionary of National Biography behind a subscription wall. However, projects like HuNi have revealed that, in Australia at least, this aggregation is possible.

Arthur finished his talk by pointing out that while cultural data is extremely laborious to collect, once collected its value does not depreciate over time. This suggests to me that investing in the digitisation of texts, such as biographies and newspaper articles, may be more valuable in the long run to the Humanities than information systems and computers.

Walsh will continue her discussion on these workshops in the new year.




Digital Humanities Infrastructure Workshop: Part One

Today we have a guest blogger, Lucy-Jane Walsh, Digital Content Analyst at the CEISMIC Programme, talking about her impressions of a recent seminar held by the UC Digital Humanities Programme:

A few weeks ago I attended an afternoon of short seminars about Digital Humanities cyberinfrastructure held by the Digital Humanities Programme at the University of Canterbury. Speakers included Dr James Smithies, Director of the UC Digital Humanities Programme and Co-Director of the UC CEISMIC Programme; Dr Alan Liu, Professor in the English Department at the University of California, Santa Barbara, and an affiliated faculty member of UCSB’s Media Arts & Technology graduate program; and Paul Arthur, Professor and Chair in Digital Humanities at Western Sydney University. The aim of the workshop was to begin an informal discussion on national and international Digital Humanities cyberinfrastructure – what tools and resources exist presently; how can we better leverage and improve them; and how can we advocate for their funding and development?

I must admit that I had not come across the notion of ‘cyberinfrastructure’ before this seminar series and I tend to associate the term ‘infrastructure’ with Engineering (buildings, roads, power lines). However, the need for people, funding, computers, and software in the Humanities – particularly in regard to digital research and project development – is not news to me. As a Digital Content Analyst at the UC CEISMIC Programme, I not only rely on this infrastructure every day, but am also in the business of creating it. Over the next few weeks, I intend to summarise the points made by Smithies, Liu, and Arthur during the cyberinfrastructure workshop in a series of blog posts, before adding my own thoughts to the conversation. I begin with James Smithies’ talk today:

Towards a Global Systems Analysis of the Humanities – James Smithies

James Smithies was actually the last to speak at the event, but I felt that his talk was a good introduction to the topic of Digital Humanities cyberinfrastructure, so I have decided to reverse the order in my blog posts. His talk was drawn from the first chapter of his upcoming book for Palgrave Macmillan, The Digital Modern: Humanities and New Media.

Smithies began the talk by discussing the politics of cyberinfrastructure. He identified Our Cultural Commonwealth – a report by the American Council of Learned Societies (ACLS) Commission on Cyberinfrastructure for the Humanities and Social Sciences – as one of the initial attempts to chart opportunities for computationally intensive Humanities research. This report, like many early models for DH infrastructure, borrowed much of its mode of thinking from the STEM fields. It stated that, “computers should be used by scholars in the Humanities, just as microscopes should be used by scientists” (Our Cultural Commonwealth, 2006: i). In other words, it is as important to invest in infrastructure in the Humanities as it is in Engineering, Maths, and Science.

Smithies argued that this STEM-based model caused tension in the Humanities, as many digital projects were given large amounts of money over more traditional projects. When these digital projects failed to deliver on their promises, this infrastructure model began to generate criticism. Patrick Svensson, for example, argued that the allocation of space and the ability to collaborate with people in and outside the Humanities department are as important to Digital Humanists as computers and information systems. Feminists also called for more inclusive data models which would take into account gender and ethnic inequalities. Susan Leigh Star argued that infrastructure should be evaluated in ethnographic terms, in that it does not only represent tools or resources that we can use, but also the values and norms of the culture that created it. She argued that infrastructure is created to serve particular types of people and practices – in essence, infrastructure is political in nature and it is the task of Digital Humanists to challenge preconceived notions of what infrastructure is and can be.

The problem with challenging the status quo is that the Digital Humanities community does not currently have a strong concept of what that is. Smithies suggested that the first step in analysing and critiquing Digital Humanities infrastructure would be to identify the cyberinfrastructure that already exists. He suggested using a systems analysis approach, borrowed from the STEM fields, to provide an initial overview of the current state of global cyberinfrastructure.

Smithies further argued that Humanists’ investigation of infrastructure should go right down to how the tools are made and whether they mirror Digital Humanities values such as openness and net neutrality. Eventually, he hopes that systems analysis will move from a model to a genre – a collection of approaches for analysing systems which reflect a multitude of values and perspectives.

Walsh will continue her discussion on these workshops next week.




2015 Kiwi Pycon

A few weeks ago, CEISMIC Digital Content Analyst Lucy-Jane Walsh attended the first day of the Kiwi PyCon Conference. This year the conference was held at the University of Canterbury with Catalyst IT as a Platinum Sponsor. Lucy-Jane discusses her experiences below:

Being a little late to the game, I was only able to attend the first day of Kiwi PyCon, a day mostly consisting of sprints and tutorials, with the usual format of talks left to the Saturday and Sunday. This suited me well – as a bit of a Python fanatic, I was itching to sit down and write some code, to learn new tricks, and perfect my old ones.

To put things in context, I learnt to code with Python, transferring from JavaScript after several muddled attempts. After JavaScript, Python seemed like a dream: no clumps of brackets and semi-colons, no need to define the conditions of a loop. What I like about Python is that it emphasizes code readability – indentation is used instead of curly brackets, and English words instead of punctuation. As summarized by the principles laid out in PEP 20 (The Zen of Python): ‘beautiful is better than ugly’ and ‘simple is better than complex’.
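That emphasis on readability is easy to see in even a trivial loop (and running the statement import this in an interpreter prints the Zen in full):

```python
# Blocks are marked by indentation alone: no braces, no semicolons,
# and no explicit loop counter or end condition to manage.
names = ["Graham", "John", "Terry"]
selected = []
for name in names:
    if name.startswith("T"):
        selected.append(name)
print(selected)  # ['Terry']
```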

My favourite tutorial was with Yuriy Ackermann, system administrator at Mount Maunganui College and JS game developer at SLSNZ. Ackermann taught us how to scrape the web using Python, leading us through a script he had written to gather information about games on the digital game store, Steam. He broke the problem into three key steps – connecting, parsing, and parallelising – explaining why each step was necessary and the libraries and tools he used to carry them out. I have summarised each step below:

Connect

Ackermann used the urllib library to handle URLs in Python. Using urllib.request (in Python 3), he showed us how to open a URL and decode and read the contents. He also showed us a cool trick for convincing Google that you are not a robot. This is necessary for sites like Google that reject requests made from outside a browser (encouraging developers to use their API instead). One way to get around this is to place a ‘User-Agent’ in the header of the request which mimics browser behaviour. The value for the User-Agent can be found in the developer tools in your browser when you load a URL (under the Network tab).

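A minimal sketch of the connect step, using only the standard library (the User-Agent string below is copied from a browser and is just an example value):

```python
import urllib.request

# Build a request with a browser-style User-Agent so sites that reject
# script-like requests will answer it.
url = "https://www.google.com/search?q=monty+python"
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0 Safari/537.36"
    )
}
request = urllib.request.Request(url, headers=headers)

# urllib normalises header names internally, hence "User-agent" here
print(request.get_header("User-agent"))

# Opening and decoding the response looks like this (needs a network connection):
# with urllib.request.urlopen(request) as response:
#     html = response.read().decode("utf-8")
```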

Parse

Once the HTML content has been retrieved, the next step is to parse it. This means finding the parts of a string (in this case a string of HTML) that we are interested in and organising them into a useful structure. Ackermann used BeautifulSoup4 for this step of the scraping – a Python library built for pulling data out of HTML and XML pages. In particular, the .find() and .find_all() methods are incredibly useful: the first retrieves the first instance of a tag, while the second retrieves every instance of a tag and stores them in a list.

Ackermann used both of these methods to create a function for parsing pages from Steam. This function takes a string, such as the HTML from the page of one of Steam’s games, and finds the name, price, currency, tags, and rating for that game. He also added some exception handling to deal with 404s, timeouts, and pages missing a price, tags, or name, and to clean up the data.
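Ackermann’s actual function targeted Steam’s live markup; the sketch below shows the same find()/find_all() pattern on a toy HTML snippet, with made-up class names standing in for Steam’s real ones:

```python
from bs4 import BeautifulSoup

# A toy stand-in for the HTML of a Steam store page; the tag and class
# names here are invented for illustration, not Steam's actual markup.
html = """
<div class="game">
  <div class="apphub_AppName">Monty Python Quest</div>
  <div class="game_purchase_price">NZ$ 14.99</div>
  <a class="app_tag">Adventure</a>
  <a class="app_tag">Comedy</a>
</div>
"""

def parse_game(html):
    soup = BeautifulSoup(html, "html.parser")
    # .find() returns the first matching tag, or None if it is absent
    name_tag = soup.find("div", class_="apphub_AppName")
    price_tag = soup.find("div", class_="game_purchase_price")
    if name_tag is None or price_tag is None:
        return None  # page is missing the fields we need
    # .find_all() returns every matching tag as a list
    tags = [a.get_text(strip=True) for a in soup.find_all("a", class_="app_tag")]
    return {
        "name": name_tag.get_text(strip=True),
        "price": price_tag.get_text(strip=True),
        "tags": tags,
    }

print(parse_game(html))
```

Returning None for incomplete pages mirrors the spirit of Ackermann’s error handling: a missing price or name is expected input, not a crash.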

Parallelising

Once we had a script that could parse the data for a single Steam game, it was time to run it across all of the games. The simplest way of achieving this would be to write a for-loop, but this would require a lot of requests (around 100,000) and a lot of time (about 7 hours at 250ms per request). On top of this, most websites check their logs and will ban IPs that make too many requests. Ackermann’s solution was to move to a parallel process.

This part was somewhat harder for me to understand, having never tried parallel computing myself, and unfortunately we ran out of time. Essentially, Ackermann set up a server and created a bunch of online virtual machines (VMs). The server sent unique Steam URLs to the VMs, which retrieved and parsed the information before sending the results back to the server through a POST request. This allowed him to run hundreds of requests at once, cutting the time from hours to minutes.
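Setting up a server and a fleet of VMs is beyond a blog post, but the core idea (overlapping many slow requests instead of waiting on each one) can be sketched locally with a thread pool. This is a simpler stand-in for Ackermann’s distributed setup, and fetch_and_parse here is a dummy rather than a real network call:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_and_parse(url):
    # In the real script this would be urllib.request.urlopen(url)
    # followed by the BeautifulSoup parsing step.
    return {"url": url, "ok": True}

# Hypothetical Steam app URLs, just to give the pool something to chew on
urls = ["http://store.steampowered.com/app/%d" % app_id for app_id in range(10)]

# Five workers run fetch_and_parse concurrently; with real network calls,
# slow responses overlap instead of queueing one after another.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch_and_parse, urls))

print(len(results))  # 10
```

Threads suit this job because scraping is I/O-bound: each worker spends most of its time waiting on the network, not computing.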

For a step-by-step guide to this tutorial, check out Ackermann’s slides.

A huge thanks to Catalyst for sponsoring this year’s Kiwi PyCon. I really enjoyed the tutorials and meeting all the other Python fans and developers. I hope I can attend again next year and get to hear the talks this time.