All posts by Lucy-Jane Walsh


Santa-fy Yourself


For the last three years, the UC Arts Digital Lab has been sharing a floor with the College of Arts Office staff and has been invited to enter their Christmas-door decorating competition. This is an invitation that we do not take lightly, mostly due to an over-zealous thirst for competition, but also because we love decorating things, and (of course) we love Christmas.

This year, we decided that as a Digital Lab our door should reflect the digital skills that we have on offer. So, armed with Python and a webcam, we began writing a script that would turn people walking down the hallway into Santa! The idea was to detect people’s faces using facial recognition software, then superimpose a beard and a hat onto each face and project the result onto a screen.


Interview with Dan


Over the last eight months, the UC Arts Digital Lab has been lucky to have student Dan Bartlett working in the office as part of the Voices Against War project. Dan came on board as part of the Summer Scholarship programme to help gather, describe, and prepare material for the project and has quickly become a source of wisdom and humour in the office. Unfortunately for us, others have spotted his talents and he begins his next journey this week at Te Rūnanga o Ngāi Tahu, working on their World War I website. Before he left, Lucy-Jane took the time to sit down and talk to him about his work on Voices Against War and his experiences in the Digital Lab.


Dan Bartlett and the rest of the Voices Against War team


Tell us about the Voices Against War website
Ok. Well, Voices Against War is a website about pacifists, conscientious objectors, and seditious Cantabrians in the First World War. They were speaking out against, first: compulsory military training, and second: conscription. And quite a few of them got sent to gaol for their views, either for speaking out, which was sedition, or for refusing to go when they were conscripted, and that was conscientious objection. The website is telling the stories of quite a few different men, women and their families that were involved in that movement.

Great. And what are you studying?
I’m currently studying Honours in History. New Zealand history is a particular passion of mine. I’ve been really interested lately in reforms in the ’90s under the National government, from the 1991 budget, in terms of welfare and health and how it affected people – closing down hospitals, things like that. I’m particularly interested in history when you can use it to see how it affected people and how we could learn lessons from it rather than repeat it.

What have you learned working in the UC Arts Digital Lab?
Well I didn’t really have digital skills! I came to the project with research and writing skills and not, for example, the ability to upload to a website. And I’ve learnt about metadata, and how websites work. You showed me the matrix code behind things {laughs} *Editor note: He means HTML* I have way more of an understanding than I had before.

So digital archiving of information and all those things that go along with it?
Yeah, like copyright; how to accurately describe things; metadata.

And terms like ‘metadata’ you probably hadn’t come across before?
Yeah, no, I hadn’t. I might have seen it but I didn’t know what it was called.

So what is metadata (for those who don’t know)?
It’s all the data that’s extra from the item or file itself, I guess. Like the date; type of image; rights, who it came from; who gave it to the archive. So I guess it’s the holistic version of saying, “This is so-and-so in 1918.”

I always think of it like bibliographical data – data that explains the origins of…
Provenance. Yeah. But also how to find it, if you want to find it and use it for research.

Would you see yourself now as a Digital Humanist?
It’s definitely going that way. I’d hope to develop more skills. So I applied for this job with Ngāi Tahu which is going to be doing the same work – the research and writing – for their World War I website, which will be looking at Ngāi Tahu soldiers. And I think the main reason I got the job was because I have those website skills.

What I think is interesting, and what you’ve sort of talked about before, is you’ve never really taken to the digital. You’ve gone into this role, but really the role is highly focussed on the historic passions that you have. The digital is just a way for you to get there.
Yeah and I’ve discovered that using digital platforms is a really good democratic way to do public history. It’s a way to get information out there that’s more than just at one museum in a physical place in Christchurch, for example. One of the descendants’ families, they’re all over the world, and they are able to access this website and look at their Grandfather and Great-Grandfather’s profile that we’ve put up because it is digital. And I wouldn’t have really thought about it before. It’s made me appreciate websites such as Te Ara a whole lot more.

And I think you understand a little bit more about the work that goes into creating those resources. As a student I was just like, “These resources exist!”
“Thanks!” Yeah, but it takes heaps.

So would you encourage other students to take DIGI papers or to work in the lab?
Yes, definitely. It’s funny – I think I told you, but I had a couple of appointments in the Library about resources, because I was wanting to use old issues of the Times and the Guardian for British history because I was looking at the miners’ strike. And they sent me this thing saying, “You might be interested in this Digital Humanities.” But it was really quite foreign to me. I was like, “That sounds really weird!” And now I get it and it’s really helpful.
Even if you were just able to take a DIGI paper as part of your degree, I think that would be really helpful.

I think that a large problem with Digital Humanities is translating what it means to people. Because I think the term in itself is not very clear. A lot of people point out that there is no ‘Digital Humanities’, that all of the Humanities should be integrating the digital. And I think the Voices Against War website is a good example of that. The project isn’t successful because of the digital element. It is successful because it is an interesting topic that there aren’t enough resources on and that has real-world applicability. The stories that make up the project are what makes it compelling – the digital element is just what makes it available to people.
Yeah! The stories are driving the project but the digital element provides the tool to share them.

So have you been converted to the digital?
In terms of public history and archiving – yeah, big time. Because we’ve had to use all those resources for Voices Against War, like Papers Past and things like that. They’re just amazing things. But I still don’t want a cell phone or a smart phone. I hope nobody makes me get one because it just stresses me out. So converted in some ways, other ways not so much.

So there you have it – Dan Bartlett, the Digital Humanist without a cell phone. We would like to thank Dan for his work in the office, but also for putting a smile on our faces every day. We know that he will be cherished at Ngāi Tahu and that he has a bright future in public history.

Tēnā rawa atu koe (thank you very much)!

Digital Humanities Infrastructure Workshop: Part Four


Today I finish my series on the Digital Humanities Infrastructure Workshops held in November last year, by discussing my own perspective on cyberinfrastructure. But before I do this, I thought that I should outline my background in Digital Humanities and my role at CEISMIC in order to put my thoughts in context.

I was introduced to the Digital Humanities in the third year of my English degree when I took the University’s first DH paper, Electronic Scholarly Editing. This paper was run by Prof. Paul Millar, with the help of Dr Christopher Thomson as a tutor. It aimed to critically examine digital texts and equip students with the skills to create their own, namely through the TEI (the Text Encoding Initiative, a set of guidelines which specify methods for encoding machine-readable texts). Over the next two years, I worked on two projects digitising manuscripts using the TEI: the first a collection of World War I letters from a member of the Canterbury Mounted Rifles, and the other a memoir in letter form from New Zealand doctor Stanley Aylward. As well as teaching me how to encode texts with the TEI, these projects opened my eyes to the opportunities that digitisation offers the Humanities, and the intensive work that goes into it. In both projects, the manuscripts required more complex and nuanced analysis than computers were capable of giving and had to be encoded by hand – a common requirement for many Digital Humanities projects.

My position at CEISMIC has further highlighted this requirement, as I work daily to gather, organise, and describe large quantities of earthquake-related data. CEISMIC’s focus has always been on social data, with an aim to collect as many stories and documents about the earthquakes as possible before they are forgotten or lost. Today we have over 100,000 items in the archive – a fantastic achievement, but not an easy one, since we described and annotated every item by hand. On average, our team estimates that it takes us six minutes to describe and geolocate a photograph, a number which doesn’t sound too bad until you extrapolate it over the 46,447 photographs we currently hold in QuakeStudies (adding up to 278,682 minutes, or roughly 4,645 hours, or 580 eight-hour days). And that’s just the photographs – we have also archived hundreds of stories (such as with our QuakeBox project), academic research, community data (such as newsletters and artworks), newspapers, and much, much more.

Given my experiences, it would be easy for me to agree with Paul Arthur that investing in the digitisation (or in our case archiving) of social data may be the most valuable form of infrastructure for the Humanities. However, I would argue that this process is not possible without people with the skills and knowledge required. Often I hear Humanists and people from the GLAM sector comment that they need more people with skills in both the Humanities and the digital, and yet there are very few programmes in existence training people in both. As readers of this blog are likely aware, the University of Canterbury offers Digital Humanities courses at Honours and Masters level, and is offering a Digital Arts, Social Sciences and Humanities minor to undergraduate students for the first time next year. However, we are the only university in New Zealand that has a Digital Humanities programme, and the impetus for this came from within the College of Arts and the University. As yet, there is no national strategy in New Zealand for training Digital Humanists.

Moreover, if the Digital Humanities, as Alan Liu argues, is tasked with critiquing academic infrastructure and its relation to larger society, this critique needs to represent the diversity in society. My problem with the current ‘lightly anti-foundationalist’ model and ‘hacking’ is that it can only be achieved by Humanists who have digital skills. As I have already discussed, people with these skills are usually in the minority, but they also tend to come from certain groups in society – e.g. men, and people from high socio-economic backgrounds who have had access to computers from a young age. The problem with this is that these tools are potentially being created by one group in society, and any critique that they allow is potentially coming from one perspective. If we want our cyberinfrastructure to reflect the diverse needs and values of society, then I would argue that we need to ensure that a wide range of people are participating in the field.

Perhaps this is naive, but if, as Liu claims, the shaping of academic infrastructure can have a bearing on other organisations and the community at large, then perhaps training more people in the Digital Humanities will have a bearing too. I personally would love to see a world where the tech industry held equal numbers of women and men, where there was more ethnic diversity, and where the average Humanities student graduated with some technical nous. In some ways this could be seen as a form of infrastructure – training people with the skills and sensibilities to critique digital culture, both in their work and in their wider environs. It’s my hope that doing so would widen the pool of ideas, revealing new and innovative solutions and more nuanced critiques of infrastructure.

Digital Humanities Infrastructure Workshop: Part Three


Digital Content Analyst Lucy-Jane Walsh continues her discussion of the UCDH Cyberinfrastructure workshops with a summary of Alan Liu’s talk:

Against the Cultural Singularity: Digital Humanities & Cultural Infrastructure Studies – Alan Liu

Alan Liu began his talk with a quote from Finnish architect Eliel Saarinen: “Always design a thing by considering it in its next larger context”.  With this in mind, he chose to focus on Digital Humanities cyberinfrastructure as a sub-domain of Humanities infrastructure, and to look at how Digital Humanities can support traditional Humanities fields.

Liu argued that the Digital Humanities has a tradition of critiquing infrastructure, which is not only unique to the field, but the best mechanism for supporting traditional modes of criticism. This is because infrastructure has the same impact on individuals and communities as culture – it makes up our environment and shapes how we interact with each other. Liu used dystopian films as an example, pointing out that whole cultures in these films are dominated by the infrastructure that is available to them. In Blade Runner, for example, flying cars make up the environment, whereas in Mad Max the world is driven by fuel. Today, culture could be said to be shaped by smart phones, social networking, and big data. By critiquing these systems, Digital Humanists can add to the larger debates surrounding culture while remaining in the digital sphere.

According to Liu, the current style of Digital Humanities critique is “lightly anti-foundationalist”. He cited James Smithies, Michael Dieter, Bruno Latour, Ackbar Abbas, and David Theo Goldberg as examples of this, arguing that while Digital Humanists believe in the potential for known and trusted digital tools and methodologies to provide new insights in the field of humanities, they are also distrustful of them. This is evident in Digital Humanists’ tendencies to ‘hack’ – where hacking in this context means using the skills and tools one understands and has at hand rather than investing in more formal forms of infrastructure. To Liu ‘hacking’ gives the Digital Humanities a unique perspective: it allows the field to be efficient and flexible, and to get close enough to systems to understand their weaknesses without being vulnerable to them.

In order to move forward, Liu suggested that Digital Humanities should adopt what he calls ‘Critical Infrastructure Studies’, the formal study of academic infrastructure in its relation to larger society, which he sees as the Digital Humanities’ mode of cultural studies.  Liu suggested two approaches to Critical Infrastructure Studies: the Neoinstitutionalist approach to organizations in sociology, which explores how institutional structures and norms influence the decisions and actions of individuals in the institutions; or Social Constructionist (especially Adaptive Structuration) approaches to organizational infrastructure in sociology and information science, which would investigate how the interactions and connections between people can construct beliefs and understandings of the world, and how these interactions can affect our perceptions and use of particular technologies. Liu believes that these approaches would help Digital Humanists to create new academic programmes and roles, and to advocate for the creation of national collaborative infrastructures, opening up research data to wider audiences.

Revisiting the quote from the beginning, Liu suggested that the work Digital Humanists put into shaping academic infrastructure will have a bearing on other organisations and the community at large. This is where Liu’s title for this talk – Against the Cultural Singularity – comes into focus, for he argues that current neoliberal capitalist thinking is creating a ‘cultural singularity’, which he defines as an environment where all parts of culture are capitalized and brought under a corporate framework. Liu argues that society would be stronger if institutions adopted their own metrics of value and success, and used these metrics to make decisions about infrastructure. He believes that by critiquing infrastructure, Digital Humanists can resist the neoliberal model and offer alternatives.

Digital Humanities Infrastructure Workshop: Part Two


Digital Content Analyst Lucy-Jane Walsh continues her discussion of the UCDH Cyberinfrastructure workshops held in November 2015:

Last week I began this blog post series by summarising James Smithies’ talk on a global systems analysis of Digital Humanities infrastructure. Today I move on to Paul Arthur, who is Professor and Chair in Digital Humanities at Western Sydney University and has been involved in conversations about the future of research infrastructure in Australia for many years.

Smart Infrastructure for Cultural and Social Research – Paul Arthur

Arthur began his talk by explaining that the Humanities were less engaged with infrastructure planning in the past and that the dominant conception of infrastructure was about facilities and machines. Today, people are beginning to think about infrastructure less as tools for particular disciplines and more as a complex problem which can be viewed from many different perspectives. This has enabled the Humanities to engage more in the discussions about infrastructure and to help develop national strategies in Australia.

One example of this is the 2011 Strategic Roadmap for Australian Research Infrastructure, which was developed by the Australian government through extensive consultation with the research sector. The aim of the document was to identify the priorities for national, collaborative infrastructure planning and investment from 2011 to 2016. According to Arthur, the difference between the 2011 Strategic Roadmap and previous infrastructure planning was that it included a dedicated section for the humanities and the arts, it placed more value on data sharing and collaboration, and it took a more distributed approach to infrastructure planning and investment – creating infrastructure that multiple disciplines could tap into, rather than discipline-specific infrastructure. This plan was never fully implemented but is still used as a road map today.

One of the key debates generated by this road map is whether we should have one infrastructure for all researchers, or a collection of interlocking resources for multiple disciplines. The argument for having one central infrastructure is that many different resources can create silos of knowledge and skills. It can also be difficult to generate funding for more than one infrastructure, particularly in the Humanities, leading many governments to opt for a centralised infrastructure instead. Australia has attempted to create a model somewhere in between these two approaches with its online infrastructure project, Nectar. Short for the National eResearch Collaboration Tools and Resources Project, Nectar hosts virtual laboratories where researchers can share ideas and collaborate. Nectar also supports tools for individual projects, such as HuNI (Humanities Networked Infrastructure), which combines data from many Australian cultural websites. According to Arthur, the combination of broad and specific resources that Nectar provides has been a successful model for Australia.

To Arthur, humanities infrastructure is not just information systems and laboratories, but digitised texts such as newspaper articles, records, and stories. In this talk, he argued that Humanities researchers use texts, not machines, to build knowledge, experiment, and draw conclusions. Databases such as Papers Past or Trove, he argued, are successful because of their wealth of historic data, not the computers or information systems working behind the scenes. From this perspective, the challenge for Digital Humanists becomes less about advocating for computers and more about digitising and making available large collections of social and cultural data.

As the Deputy General Editor of the Australian Dictionary of Biography (ADB) from 2010 to 2013, Arthur has a strong interest in biography, which he believes is particularly suited to digital research. This is because biographies can be studied at both the micro and the macro levels – as isolated stories that shed light on individuals, or aggregated collections providing insights on much larger movements.  Much of this macro analysis is made possible by digitising collections of biography, as this offers researchers an overview of the data, better access to the collection, and the ability to analyse the data computationally. Once ADB was digitised, for example, it became clear that there were few stories about women and Aborigines, and that many vocations were missing – an observation that would have been difficult to come by when the many thousands of biographies were only in print.

Arthur discussed his experiences at the ADB when they came to digitise the biographies. Previously, the editing process was analogue in nature: pen and paper, with a lot of face-to-face communication between members of the team. Arthur’s attempts to map this workflow resulted in a confusion of circles and lines, revealing the complex nature of analogue processes. In contrast, digital workflows need to be fairly rigid to work, since computers and information systems struggle to match the complexity of human interaction. For volume 18, Arthur experimented with Windows Live (now known as OneDrive) and created a folder for each person in the dictionary. Within this folder were the biography and a file for notes or any additional information. Each time the biography was edited, a new version was saved on the drive, ensuring that changes could be reverted and versions compared. Using this method, the ADB was able to create its first digital volume.

Initially the digitised version of the ADB replicated the print version, with the stories laid out alphabetically and grouped in accordance with their subject’s time of influence or death. However, as Arthur pointed out, digital environments are not restricted by the linear structure of the printed form and can offer many different modes of storytelling. Today the entries in the ADB can be searched by name, gender, birth, death, ethnicity, religion, occupations, author name, and printed volume. The dictionary also offers a faceted browse which allows repeated filtering of the stories by a list of predefined categories. Much of this functionality has been enabled by the additional metadata that the ADB team has been adding to the stories. This metadata is intended to show the interconnections between stories in the dictionary – for example, where the subjects are friends, enemies, or family, or they have related religions, won similar awards, or attended the same events.

In addition to adding more metadata, the ADB has also made its data available to projects such as Trove and HuNI, and each story has been linked to the corresponding obituary in the Obituaries Australia digital repository. Linking data in this way can unveil more information about individuals – for example, when and where they died and who came to their funeral. Moreover, it provides humanities researchers with larger, more diverse collections of linked cultural data from which they can investigate larger questions about culture and heritage. Unfortunately, there are barriers to a larger international infrastructure of interconnected biographical data, with resources such as the Oxford Dictionary of National Biography behind a subscription wall. However, projects like HuNI have shown that, in Australia at least, this aggregation is possible.

Arthur finished his talk by pointing out that while cultural data is extremely laborious to collect, once collected its value does not depreciate over time. This suggests to me that investing in the digitisation of texts, such as biographies and newspaper articles, may be more valuable in the long run to the Humanities than information systems and computers.

Walsh will continue her discussion on these workshops in the new year.

Digital Humanities Infrastructure Workshop: Part One


Today we have a guest blogger, Lucy-Jane Walsh, Digital Content Analyst at the CEISMIC Programme, talking about her impressions of a recent seminar held by the UC Digital Humanities Programme:

A few weeks ago I attended an afternoon of short seminars about Digital Humanities cyberinfrastructure held by the Digital Humanities Programme at the University of Canterbury. Speakers included Dr James Smithies, Director of the UC Digital Humanities Programme and Co-Director of the UC CEISMIC Programme; Dr Alan Liu, Professor in the English Department at the University of California, Santa Barbara, and an affiliated faculty member of UCSB’s Media Arts & Technology graduate program; and Paul Arthur, Professor and Chair in Digital Humanities at Western Sydney University. The aim of the workshop was to begin an informal discussion on national and international Digital Humanities cyberinfrastructure – what tools and resources exist presently; how can we better leverage and improve them; and how can we advocate for their funding and development?

I must admit that I had not come across the notion of ‘cyberinfrastructure’ before this seminar series, and I tend to associate the term ‘infrastructure’ with Engineering (buildings, roads, power lines). However, the need for people, funding, computers, and software in the Humanities – particularly in regards to digital research and project development – is not news to me. As a Digital Content Analyst at the UC CEISMIC Programme, I not only rely on this infrastructure every day, but am also in the business of creating it. Over the next few weeks, I intend to summarise the points made by Smithies, Liu, and Arthur during the cyberinfrastructure workshop in a series of blog posts, before adding my own thoughts to the conversation. I begin with James Smithies’ talk today:

Towards a Global Systems Analysis of the Humanities – James Smithies

James Smithies was actually the last to speak at the event, but I felt that his talk was a good introduction to the topic of Digital Humanities cyberinfrastructure, so I have decided to reverse the order in my blog posts. His talk was drawn from the first chapter of his upcoming book for Palgrave Macmillan, The Digital Modern: Humanities and New Media.

Smithies began the talk by discussing the politics of cyberinfrastructure. He identified Our Cultural Commonwealth – a report by the American Council of Learned Societies (ACLS) on Cyberinfrastructure for the Humanities and Social Sciences – as one of the initial attempts to chart opportunities for computationally intensive Humanities research. This report, like many early models for DH infrastructure, borrowed much of its mode of thinking from the STEM fields. It stated that “computers should be used by scholars in the Humanities, just as microscopes should be used by scientists” (Our Cultural Commonwealth, 2006: i). In other words, it is as important to invest in infrastructure in the Humanities as it is in Engineering, Maths, and Science.

Smithies argued that this STEM-based model caused tension in the Humanities, as many digital projects were given large amounts of money over more traditional projects. When these digital projects failed to deliver on their promises, this infrastructure model began to generate criticism. Patrick Svensson, for example, argued that the allocation of space and the ability to collaborate with people in and outside the Humanities department are as important to Digital Humanists as computers and information systems. Feminists also called for more inclusive data models which would take into account gender and ethnic inequalities. Susan Leigh Star argued that infrastructure should be evaluated in ethnographic terms, in that it does not only represent tools or resources that we can use, but also the values and norms of the culture that created it. She argued that infrastructure is created to serve particular types of people and practices – in essence, infrastructure is political in nature, and it is the task of Digital Humanists to challenge preconceived notions of what infrastructure is and can be.

The problem with challenging the status quo is that the Digital Humanities community does not currently have a strong concept of what that is. Smithies suggested that the first step in analysing and critiquing Digital Humanities infrastructure would be to identify the cyberinfrastructure that already exists. He suggested using a systems analysis approach, borrowed from the STEM fields, to provide an initial overview of the current state of global cyberinfrastructure.

Smithies further argued that Humanists’ investigation of infrastructure should go right down to how the tools are made and whether they mirror Digital Humanities values such as openness and net neutrality. Eventually, he hopes that systems analysis will move from a model to a genre – a collection of approaches for analysing systems which reflect a multitude of values and perspectives.

Walsh will continue her discussion on these workshops next week.

2015 Kiwi Pycon


A few weeks ago, CEISMIC Digital Content Analyst Lucy-Jane Walsh attended the first day of the Kiwi PyCon Conference. This year the conference was held at the University of Canterbury with Catalyst IT as a Platinum Sponsor. Lucy-Jane discusses her experiences below:

Being a little late to the game, I was only able to attend the first day of Kiwi PyCon, a day mostly consisting of sprints and tutorials, with the usual format of talks left to the Saturday and Sunday. This suited me well – as a bit of a Python fanatic, I was itching to sit down and write some code, to learn new tricks, and perfect my old ones.

To put things in context, I learnt to code with Python, transferring from JavaScript after several muddled attempts. After JavaScript, Python seemed like a dream: no clumps of brackets and semi-colons, no need to define the conditions of a loop. What I like about Python is that it emphasizes code readability – indentation is used instead of curly brackets, and English words instead of punctuation. As summarized by the principles laid out in PEP 20 (The Zen of Python): ‘beautiful is better than ugly’ and ‘simple is better than complex’.
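For example, a loop that in JavaScript would need braces and semicolons reads in Python as plain indented English:

```python
# Collect the squares of the even numbers below ten. The indentation
# alone marks the loop and if bodies - no braces, no semicolons.
squares = []
for n in range(10):
    if n % 2 == 0:
        squares.append(n * n)

print(squares)  # [0, 4, 16, 36, 64]
```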

My favourite tutorial was with Yuriy Ackermann, system administrator at Mount Maunganui College and JS game developer at SLSNZ. Ackermann taught us how to scrape the web using Python, leading us through a script he had written to gather information about games on the digital game store, Steam. He broke the problem into three key steps – connecting, parsing, and parallelising – explaining why each step was necessary and which libraries and tools he used for it. I have summarised each step below:


Connecting

Ackermann used the urllib library to handle URLs in Python. Using urllib.request (in Python 3), he showed us how to open a URL and decode and read its contents. He also showed us a cool trick for convincing Google that you are not a robot. This is necessary for sites like Google that reject requests made from outside a browser (encouraging developers to use their API instead). One way to get around this is to place a ‘User-Agent’ in the header of the request which mimics browser behaviour. The value for the User-Agent can be found in your browser’s developer tools when you load a URL (under the network tab):

[Screenshot: the User-Agent value shown under the network tab of the browser’s developer tools]
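A minimal sketch of this trick – the User-Agent string below is illustrative, so in practice you would copy the one your own browser sends:

```python
import urllib.request

# Illustrative only - substitute the User-Agent string from your own
# browser's developer tools (network tab).
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/45.0.2454.85 Safari/537.36")

def fetch(url):
    """Open a URL with a browser-like User-Agent and return the decoded HTML."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```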


Parsing

Once the HTML content has been retrieved, the next step is to parse it. This means finding the parts of a string (in this case a string of HTML) that we are interested in and organising them into a useful structure. Ackermann used BeautifulSoup4 for this step of the scraping – a Python library built for pulling data out of HTML and XML pages. In particular, the .find() and .find_all() methods are incredibly useful: the first retrieves the first instance of a tag, and the second retrieves every instance of a tag and stores them in a list.
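As a quick illustration of the two methods (the markup here is made up, not Steam’s actual HTML):

```python
from bs4 import BeautifulSoup

html = """
<div class="game">
  <h1 class="title">Example Game</h1>
  <span class="price">$9.99</span>
  <a class="tag">Indie</a>
  <a class="tag">Strategy</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# .find() returns the first matching tag (or None if there is no match).
title = soup.find("h1", class_="title").get_text(strip=True)

# .find_all() returns every matching tag in a list.
tags = [a.get_text(strip=True) for a in soup.find_all("a", class_="tag")]

print(title)  # Example Game
print(tags)   # ['Indie', 'Strategy']
```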

Ackermann used both of these methods to create a function for parsing pages from Steam. This function takes a string, such as the HTML from the page of one of Steam’s games, and finds the name, price, currency, tags, and rating for that game. He also added some exception handling to deal with 404s, timeouts, and pages missing a price, tags, or name, and to clean up the data.
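A hypothetical reconstruction of that kind of function – the class names are invented rather than Steam’s real markup, and the error handling is reduced to the basics:

```python
from bs4 import BeautifulSoup

def parse_game(html):
    """Pull name, price, and tags out of a game page's HTML.

    A sketch of the kind of function Ackermann demonstrated; the
    selectors are illustrative, not Steam's actual markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    game = {}
    try:
        game["name"] = soup.find("h1", class_="title").get_text(strip=True)
    except AttributeError:           # page has no title element
        game["name"] = None
    try:
        price_text = soup.find("span", class_="price").get_text(strip=True)
        game["price"] = float(price_text.lstrip("$"))   # clean up the data
    except (AttributeError, ValueError):  # missing or unparseable price
        game["price"] = None
    game["tags"] = [a.get_text(strip=True)
                    for a in soup.find_all("a", class_="tag")]
    return game
```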


Parallelising

Once we had a script that could parse data for one Steam game, it was time to run it across all of the games. The simplest way of achieving this would be to write a for-loop, but this would require a lot of requests (around 100,000) and a lot of time (about 7 hours at 250ms per request). On top of this, most websites check their logs and will ban IPs that make too many requests. Ackermann’s solution was to move to a parallel process.

This part was somewhat harder for me to understand, having never tried parallel computing myself, and unfortunately we ran out of time. Basically, Ackermann set up a server and created a bunch of online virtual machines (VMs). The server sent unique URLs from Steam to the VMs, which retrieved and parsed the information and then sent it back to his server through a POST request. This allowed him to run hundreds of requests at once, cutting the time from hours to minutes.
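We didn’t get to reproduce that setup in the tutorial, but the same idea on a single machine can be sketched with a thread pool, which at least overlaps the waiting time of many requests. Here fetch_and_parse and the example URLs are stand-ins for the connecting and parsing steps above:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_and_parse(url):
    # Stand-in for: download the page, then parse out the game's details.
    return {"url": url}

# Hypothetical URLs standing in for the real list of game pages.
urls = ["https://store.example.com/app/%d" % i for i in range(100)]

# 20 workers issue requests concurrently instead of one after another;
# map() preserves the input order in its results.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(fetch_and_parse, urls))

print(len(results))  # 100
```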

For a step-by-step guide to this tutorial, check out Ackermann’s slides.

A huge thanks to Catalyst for sponsoring this year’s Kiwi PyCon. I really enjoyed the tutorials and meeting all the other Python fans and developers. I hope I can attend again next year and get to hear the talks this time.