Digital Humanities Infrastructure Workshop: Part Two

Digital Content Analyst Lucy-Jane Walsh continues her discussion of the UCDH Cyberinfrastructure workshops in November 2015:

Last week I began the blog post series by summarising James Smithies’ talk on global systems analysis of Digital Humanities infrastructure. Today I plan to move swiftly on to Paul Arthur, who is Professor and Chair in Digital Humanities at Western Sydney University and has been involved in conversations about the future of research infrastructure in Australia for many years.

Smart Infrastructure for Cultural and Social Research – Paul Arthur

Arthur began his talk by explaining that the Humanities were less engaged with infrastructure planning in the past and that the dominant conception of infrastructure was about facilities and machines. Today, people are beginning to think about infrastructure less as tools for particular disciplines and more as a complex problem which can be viewed from many different perspectives. This has enabled the Humanities to engage more in the discussions about infrastructure and to help develop national strategies in Australia.

One example of this is the 2011 Strategic Roadmap for Australian Research Infrastructure, which was developed by the Australian government through extensive consultation with the research sector. The aim of the document was to identify the priorities for national, collaborative infrastructure planning and investment from 2011 to 2016. According to Arthur, the 2011 Strategic Roadmap differed from previous infrastructure planning in three ways: it included a dedicated section for the humanities and the arts, it placed more value on data sharing and collaboration, and it took a more distributed approach to infrastructure planning and investment – creating infrastructure that multiple disciplines could tap into, rather than discipline-specific infrastructure. This plan was never fully implemented but is still used as a road map today.

One of the key debates generated by this road map is whether we should have one infrastructure for all researchers, or a collection of interlocking resources for multiple disciplines. The argument for having one central infrastructure is that many different resources can cause silos of knowledge and skills. It can also be difficult to generate funding for more than one infrastructure, particularly in the Humanities, leading many governments to opt for a centralised infrastructure instead. Australia has attempted to create a model somewhere in between these two approaches with its online infrastructure project, Nectar. Short for the National eResearch Collaboration Tools and Resources Project, Nectar hosts virtual laboratories where researchers can share ideas and collaborate. Nectar also supports tools for individual projects, such as HuNi (Humanities Networked Infrastructure), which combines data from many Australian cultural websites. According to Arthur, the combination of broad and specific resources that Nectar provides has been a successful model for Australia.

To Arthur, humanities infrastructure is not just information systems and laboratories, but digitised texts such as newspaper articles, records, and stories. In this talk, he argued that Humanities researchers use texts, not machines, to build knowledge, experiment, and draw conclusions. Databases such as Papers Past or Trove, he argued, are successful because of their wealth of historic data, not the computers or information systems working behind the scenes. From this perspective, the challenge for Digital Humanists becomes less about advocating for computers and more about digitising and making available large collections of social and cultural data.

As the Deputy General Editor of the Australian Dictionary of Biography (ADB) from 2010 to 2013, Arthur has a strong interest in biography, which he believes is particularly suited to digital research. This is because biographies can be studied at both the micro and the macro levels – as isolated stories that shed light on individuals, or as aggregated collections that provide insights into much larger movements. Much of this macro analysis is made possible by digitising collections of biography, as this offers researchers an overview of the data, better access to the collection, and the ability to analyse the data computationally. Once the ADB was digitised, for example, it became clear that there were few stories about women and Aborigines, and that many vocations were missing – an observation that would have been difficult to come by when the many thousands of biographies were only in print.
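To make the idea of macro analysis a little more concrete, here is a minimal sketch of how gaps in a digitised collection might surface once each biography carries a few structured fields. The records, field names, and the `share_by_field` helper are all invented for illustration; they are not real ADB data or the ADB's actual method.

```python
from collections import Counter

# Hypothetical records: names, fields, and values are illustrative only,
# not taken from the Australian Dictionary of Biography.
biographies = [
    {"name": "Person A", "gender": "male", "occupation": "politician"},
    {"name": "Person B", "gender": "male", "occupation": "grazier"},
    {"name": "Person C", "gender": "female", "occupation": "teacher"},
    {"name": "Person D", "gender": "male", "occupation": "politician"},
]


def share_by_field(records, field):
    """Return the proportion of records holding each value of `field`."""
    counts = Counter(record[field] for record in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}


# Proportions by gender show at a glance which groups are under-represented.
gender_share = share_by_field(biographies, "gender")

# Raw counts by occupation show which vocations dominate or are missing.
occupation_counts = Counter(record["occupation"] for record in biographies)

print(gender_share)
print(occupation_counts.most_common())
```

With thousands of print entries, a tally like this would mean reading every volume; with structured data it is a few lines of code, which is the kind of overview Arthur was describing.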

Arthur discussed his experiences at the ADB when the team came to digitise the biographies. Previously, the editing process was analogue in nature: done with pen and paper, with a lot of face-to-face communication between members of the team. Arthur’s attempts to map this workflow resulted in a confusion of circles and lines, revealing the complex nature of analogue processes. In contrast, digital workflows need to be fairly rigid to work, since computers and information systems struggle to match the complexity of human interaction. For volume 18, Arthur experimented with Windows Live (now known as OneDrive) and created a folder for each person in the dictionary. Within this folder were the biography and a file for notes or any additional information. Each time the biography was edited, a new version was saved on the drive, ensuring that changes could be reverted and versions compared. Using this method, the ADB was able to create its first digital volume.

Initially the digitised version of the ADB replicated the print version, with the stories laid out alphabetically and grouped in accordance with their subject’s time of influence or death. However, as Arthur pointed out, digital environments are not restricted by the linear structure of the printed form and can offer many different modes of storytelling. Today the entries in the ADB can be searched by name, gender, birth, death, ethnicity, religion, occupations, author name, and printed volume. The dictionary also offers a faceted browse which allows repeated filtering of the stories by a list of predefined categories. Much of this functionality has been enabled by the additional metadata that the ADB team has been adding to the stories. This metadata is intended to show the interconnections between stories in the dictionary – for example, where the subjects are friends, enemies, or family, or they have related religions, won similar awards, or attended the same events.
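For readers unfamiliar with faceted browsing, the following is a rough sketch of the general technique: each filter narrows the current result set, and the counts for the remaining categories are recalculated from whatever is left. The entries, field names, and helper functions are hypothetical and say nothing about how the ADB website is actually built.

```python
from collections import Counter

# Hypothetical entries with a few metadata fields; not real ADB records.
entries = [
    {"name": "Person A", "gender": "female", "occupation": "teacher", "religion": "Anglican"},
    {"name": "Person B", "gender": "female", "occupation": "writer", "religion": "Catholic"},
    {"name": "Person C", "gender": "male", "occupation": "teacher", "religion": "Anglican"},
    {"name": "Person D", "gender": "female", "occupation": "teacher", "religion": "Catholic"},
]


def apply_facet(records, field, value):
    """Keep only the records whose `field` equals `value`."""
    return [record for record in records if record.get(field) == value]


def facet_counts(records, field):
    """Count how many of the remaining records fall under each value of `field`."""
    return Counter(record[field] for record in records if field in record)


# First filter: gender. The occupation counts are then computed on the
# narrowed set, so the reader sees only categories that still have results.
women = apply_facet(entries, "gender", "female")
occupations_among_women = facet_counts(women, "occupation")

# Second filter, applied on top of the first.
women_teachers = apply_facet(women, "occupation", "teacher")

print(occupations_among_women)
print([record["name"] for record in women_teachers])
```

The point of the design is that filters compose: the output of one becomes the input of the next, which is what allows the repeated filtering described above.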

In addition to adding more metadata, the ADB team has made its data available to projects such as Trove and HuNi, and each story has been linked to the corresponding obituary in the Obituaries Australia digital repository. Linking data in this way can unveil more information about individuals – for example when and where they died and who came to their funeral. Moreover, it provides humanities researchers with larger, more diverse collections of linked cultural data from which they can investigate larger questions about culture and heritage. Unfortunately there are barriers to a larger international infrastructure of interconnected biographical data, with resources such as the Oxford Dictionary of National Biography behind a subscription wall. However, projects like HuNi have revealed that, in Australia at least, this aggregation is possible.

Arthur finished his talk by pointing out that while cultural data is extremely laborious to collect, once collected its value does not depreciate over time. This suggests to me that investing in the digitisation of texts, such as biographies and newspaper articles, may be more valuable in the long run to the Humanities than information systems and computers.

Walsh will continue her discussion on these workshops in the new year.