Mar 14 2016
 

Back on 2nd December 2015 I attended a Digital Scholarship event arranged by Anouk Lang, lecturer in Digital Humanities at the University of Edinburgh.

The event ran in two parts: the first section enabled those interested in digital humanities to hear about events, training opportunities, and experiences of others, mainly those based within the College of Humanities and Social Sciences; the second half of the event involved short presentations and group discussions on practical needs and resources available. My colleague Lisa Otty and I had been asked to present at the second half of the day, sharing the range of services, skills and expertise EDINA offer for digital scholarship (do contact us if you’d like to know more), and were delighted to be able to attend the full half day event.

My notes were captured live, so all the usual caveats about typos, corrections, additions, etc. apply despite the delay in me setting this live. 

The event is opening, after a wee intro from Anouk Lang, with a review of various events and sessions around Digital Humanities, starting with those who had attended the DHOxSS: Digital Humanities Summer School at Oxford in summer 2015.

Introduction to Digital Humanities – Rhona Alcock

Attended with bursary from College. Annual event for a range of interests and levels. Was an introductory strand on DH. Then another ~9 that were much more specialist. The introductory session was for those new to the field, or keen to broaden understanding. Gave an overview of other strands (some quite technical). Learnt a huge amount in that week. Came back and wrote up notes – 25 pages of them and I’ve referred back to them a huge amount. Main topics I benefited from were planning DH research projects and lightweight usability testing (and QA). Great session on crowdsourcing and comining in your research area. Also some great sessions on knowledge exchange and engagement. More than anything what I had was the sense of connecting with others interested in DH and what they could get from it and do with it. And loads of new ideas for what I do, new audiences to take it to. Practical advice to take it forward, tools etc. So, highly recommended.

David Oulton, College Web Team

I went to the summer school too. Found a big gap in the range of tools and resources (WordPress, Drupal, ESRI, ArcGIS, etc). There are so many tools that are just out there and can be used for all sorts of things, that can be downloaded, etc.

I’ve created a list on pbworks (dhresourcesforprojectbuilding.pbworks.com). And there are lots of resources listed there. I’m keen to encourage academics to just go and try these things, that can be set up quickly and easily.

Digital Medieval – Jeremy Piercy

Worked with new digital Bodleian interactive tools. Interactive scans of images that allow you to see hidden writing with a variety of light – often can’t do on an artefact. Also learned a lot about ArcGIS, and how useful that can be. Also curation issues – many of our documents will eventually cease to be. This allows ongoing curation of those items. We have various high quality images and scans that will “always be accessible”. Portability of data is a key issue – not tying your work to a particular interface that may later fail/change/etc. Need to be able to move information around. I wasn’t the only person at that…

Gavin Willshaw, Digital Curator at the Library

I also went to Digital approaches to medieval and renaissance studies. Was quite specialist in many ways. Lots of hard work but learned a lot. I was also keen to understand how DH researchers work and what the library can do to support that with collections and tools. There were also some sessions on DIY digitisation – mini projects around doing that, managing that data. Tools such as Retro Mobile – ways to see what lies beneath images. Also some quite good introductory overviews of areas like TEI and IIIF for interoperability etc. Also several workshops to try stuff out directly. Really enjoyed seeing new stuff – e.g. hyperspectral imaging. Also a session on how museums can use wifi signals to track visitors’ movements and tailor what they do to that experience. And that fed into discussions we’ve been having about Beacons and Bluetooth in the library. From my point of view it was a really interesting mixture of tools and skills and experience. And I’m now looking for how the library can get more involved. Again, would highly recommend.

Humanities Data: Curation, Analysis, Access and Reuse – Rocio von Jungenfeld

I work in the data library, but am also finishing my PhD at ECA. I was looking at data tools and analysis tools. To compare what we do to what others do. And the recommendations from other places. Had very interesting speakers. Combination of HathiTrust, and also OxII. They gave really practical advice on software out there, schemas, metadata, methodology, great insights to data tools and analysis. New tools to me, e.g. Gephi, and was very useful. Good experience overall – was a big Edinburgh contingent there, stuff done together. Interesting people and good lectures. Would recommend. My notes are available.

Harriet Cornell, Edinburgh Law School

I’m a post doc in HCA, and project officer for the political settlements programme. It was a complex workshop. Went for this one not introduction as I felt up to date. But this was at the sharp end in terms of the technical stuff. Galloped through technical stuff, would have liked more time on software. But reflecting afterwards it was great. Tried OpenRefine – which I didn’t know about – but also how we label and tag data and research. Really useful. Four things I took away:

  1. The capacity for DH projects – having the director of the HathiTrust there was great, to see that that could happen
  2. Tools, particularly OpenRefine – so useful. I know that Anouk and Anna have run workshops on this but it’s brilliant.
  3. Labelling and tagging. I do lots of blogging on WordPress and thinking about SEO, and just thinking about that in a different way was great.
  4. Design, Curation, Research and Longevity – thinking about the time and cost of planning and making things properly sustainable, after e.g. 10 years.

If I did it again I’d have done the introductory workshop. But this was great if you were happy to get down with OWL, ontologies, Python etc. I was tweeting from the session with the #dhoxss tag.

Linked Data for the Humanities – Anouk Lang, School of Literature, Languages and Culture

When you are a researcher your data is your baby. My lovingly curated research database with rich information about which historical figures were writing to whom, from which places, at which times. The problem is that if you want to share that data, your stuff is hard for others to use. The solution is ontologies and linked open data.

So what is an ontology? It is a structured way of understanding the context of an object – e.g. for the British Museum it might be where it was from, who acquired it and where and when, when it is from, where it is located, what it is made from. So we have linked data. Which we express as a “triple” – a subject, predicate, and object. So for a James Joyce letter then you have a lot of known individuals already – James Joyce is out there. And then locations wise there are lists of locations – you want an authority list (someone else’s ontology of places). And then the predicate (e.g. when James Joyce was born) is also already available…

So, data is stored in a “triplestore” and you can query it using SPARQL, which lets you ask about name, place, location etc. There is a structure for SPARQL queries. That lets you query stuff within others’ databases.
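To make the triple idea concrete, here is a minimal sketch in plain Python rather than a real triplestore. The `ex:` identifiers and the letter itself are made up for illustration, and the `match` helper is a hypothetical stand-in for the pattern matching that a SPARQL query performs.

```python
# Each triple is (subject, predicate, object). The "ex:" identifiers are
# invented for this example; real data would reuse URIs from an authority list.
TRIPLES = [
    ("ex:letter42", "ex:writtenBy", "ex:JamesJoyce"),
    ("ex:letter42", "ex:sentFrom", "ex:Paris"),
    ("ex:JamesJoyce", "ex:bornIn", "ex:Dublin"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Roughly the question that a SPARQL query such as
#   SELECT ?place WHERE { ex:letter42 ex:sentFrom ?place . }
# would put to a real triplestore.
places = [o for (_, _, o) in match(TRIPLES, subject="ex:letter42", predicate="ex:sentFrom")]
print(places)
```

The wildcard positions play the role of SPARQL variables such as `?place`; a real triplestore does the same matching over far larger and shared datasets.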

So, if you are interested in using stuff in others’ databases or how to share your data with others, then you want to learn about Linked Open Data and SPARQL.

DHOxSS – Jim Mistiff

PhD student in English Lit. Went to the summer school and did the Linked Open Data and RDF course. Before PhD I was a Drupal developer, so had an interest from that so interesting to see it from another angle. I’ll be looking at specific application of LOD. Specifically for a text heavy usage for my own PhD. Not a perfect solution but an interesting starting point for a conversation.

Getting some definitions out of the way. LOD is a scary term for literary humanists. Any text I’m using is a data object really. Breaking a long text up is a more useful way to think of it. Open Data helps you share data with others (and vice versa). The Linked part lets you interlink stuff. So if, say, there is a DB at Edinburgh (modernist data), and one at UVic (who have modernist correspondence) you can link those together to form one complete set.

One of the results of doing LOD is that your dataset or database becomes part of a version of the internet that is easier for machines to read. Right now the internet is a set of dumb links – LOD allows text data objects to be more machine readable. I think it will be easier to explain that through an example.

So, I’m writing my PhD on Hugh MacDiarmid. I’m interested in later stuff, particularly a poem called In Memoriam James Joyce. I’m interested in the construction. He wrote the poem through borrowing/plagiarising from other sources which are not credited. E.g. from an ad in the Writers’ and Artists’ Yearbook 1949. I’ve gone through that poem and found about 60% of the sources. That is a very interlinked text that maps nicely onto the idea of LOD. So, what I’d like to do is to datafy MacDiarmid. What I propose in terms of what Anouk covered is to take the text, word by word… and take the two lines from the poem, break them into triples… Can do it character by character… can automate… Can then apply Stanford’s NLP to it… and then identify automatically when you’ve got a name of a real person in there, the name of a text… A very vague linked data model here… Bits in boxes (in his diagram) that are in other sources. So a mention of Pape – means Capt A.G. Pape and we could grab info from DBpedia. And then a text by that author could be pulled in/connected to. Etc. This makes assumptions about what else is out there. But what it gives us is a rich version of the poem. We tend to read many texts in these ways – looking up definitions, references etc. Not suggesting change in core tasks, but using technology to enhance what we do. We could turn a LOD version of the poem into a hyperlinked version that pops up those obscure references etc. Much of what we already do, but in an automated way.
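As a rough illustration of breaking a line into word-level triples and then linking a name, here is a small Python sketch. It is not the speaker's model: the predicate names, the sample line, and the `ex:CaptAGPape` identifier are all invented placeholders, and the lookup table stands in for a real named-entity step (such as Stanford's NLP tools) plus a DBpedia lookup.

```python
def line_to_triples(line_id, text):
    """Turn one line of text into (subject, predicate, object) triples, word by word."""
    triples = [(line_id, "hasText", text)]
    for position, word in enumerate(text.split(), start=1):
        token_id = f"{line_id}/word{position}"
        triples.append((line_id, "hasToken", token_id))
        triples.append((token_id, "hasValue", word.strip(".,;:!?")))
        triples.append((token_id, "hasPosition", position))
    return triples


# Hypothetical lookup: in practice an entity recogniser would propose names
# and an external source would supply the identifier to link to.
KNOWN_NAMES = {"Pape": "ex:CaptAGPape"}


def link_names(triples, known_names):
    """Add a 'refersTo' triple for every token whose value is a known name."""
    return [
        (s, "refersTo", known_names[o])
        for (s, p, o) in triples
        if p == "hasValue" and o in known_names
    ]


# Placeholder text, not a quotation from the poem.
sample = "placeholder line mentioning Pape, for illustration"
triples = line_to_triples("poem/line1", sample)
print(link_names(triples, KNOWN_NAMES))
```

Each `refersTo` triple is the hook a hyperlinked edition could use to pop up information about the person or text being referenced.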

My suggestion expects and relies on other people doing parts of the work. What we can all do is get it closer to LOD than it is. There are five steps (see 5stardata.info by Tim Berners-Lee). Step one is get it online – e.g. a PDF on the web. The next step is to structure that data – a table rather than an image of a table for instance. Next step is open format – so that table in CSV rather than Excel. Next step is using RDF to point to things. Final step is LOD – the stuff of linking from proper names and quotations to other names and data stores. And then we have LOD.
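The middle steps can be sketched in a few lines of Python: a small table is written out as CSV (the open format step) and then turned into triples that use URIs to point at things. The `http://example.org/` namespace and the `wrote` predicate are assumptions for illustration only; real RDF would reuse established vocabularies and a proper RDF library.

```python
import csv
import io

# A structured table held as rows rather than as an image of a table.
rows = [
    {"name": "Hugh MacDiarmid", "work": "In Memoriam James Joyce"},
]


def to_csv(rows):
    """Serialise the table in an open format (CSV) instead of a spreadsheet file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "work"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


BASE = "http://example.org/"  # hypothetical namespace, not a real vocabulary


def to_triples(rows):
    """Give each thing a URI so that other datasets can point at it."""
    triples = []
    for row in rows:
        author = BASE + row["name"].replace(" ", "_")
        work = BASE + row["work"].replace(" ", "_")
        triples.append((author, BASE + "wrote", work))
    return triples


print(to_csv(rows))
print(to_triples(rows))
```

The final step, linking, would replace the invented URIs with ones that other data stores already use for the same people and texts.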

DHSI, University of Victoria, Canada – Anouk Lang

This is the premier DH summer school but costly to get to. Cheaper with student membership of computation in humanities(?).

I did three workshops – there for 3 weeks. I’m going to talk about Programming for Human|ist|s. My area uses R, Python etc. I did an intense week long Python course. So you start with a spec. For us we decided to construct a script that will visit a website and pull out certain bits of information relating to discussion posts (username, date of posting, content) and write those to a spreadsheet so they can be used for analysis. So, a web scraper.

Then you write Pseudocode. And that includes pulling in other people’s stuff – loads of Googling Stack Exchange for code and libraries.
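The spec above (pull username, date and content out of discussion posts, then write them to a spreadsheet) might look something like the sketch below. It uses Python's standard-library `html.parser` so that it runs without installing anything, which is a swap for Beautiful Soup rather than the code written on the course. The sample HTML and its class names (`post`, `username`, `date`, `content`) are invented; a real page would need its own selectors and a step to download it.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up page fragment standing in for a downloaded forum page.
SAMPLE_HTML = """
<div class="post">
  <span class="username">reader_one</span>
  <span class="date">2015-06-01</span>
  <p class="content">First made-up post.</p>
</div>
<div class="post">
  <span class="username">reader_two</span>
  <span class="date">2015-06-02</span>
  <p class="content">Second made-up post.</p>
</div>
"""

FIELDS = ("username", "date", "content")


class PostParser(HTMLParser):
    """Collect one dict per post, keyed by the class names in FIELDS."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "post":
            self.posts.append({})
        elif css_class in FIELDS and self.posts:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            post = self.posts[-1]
            post[self._field] = post.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        self._field = None


def scrape(html):
    parser = PostParser()
    parser.feed(html)
    return parser.posts


def write_csv(posts, out):
    """Write the posts as CSV rows that a spreadsheet can open."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(posts)


posts = scrape(SAMPLE_HTML)
out = io.StringIO()
write_csv(posts, out)
print(out.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; passing an opened file to `write_csv` instead would produce the spreadsheet on disk.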

So, Beautiful Soup does loads of web scraping. Then wanted it to write to a file. So then, you can build the code and run it. Then creates a file. Now when I did that I did pull out the appropriate text. It looks like a mess, but that’s a starting point. I spent the rest of the summer consolidating things, and doing some other things. So that included building something fun using a Markov Chain Generator – to feed text in, and produce lovely parodies. Can then use Python to automate Twitter posts. So we did a fun PatrickTwite Twitterbot (to go with a book launch).
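For the Markov chain generator, a minimal word-level sketch is below. It is an assumption about the general technique, not the code behind the Twitterbot: each word is mapped to the words seen after it, and new text is produced by repeatedly picking one of those followers at random. The sample sentence and function names are illustrative.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the source text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain from `start`, choosing a random follower at each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


chain = build_chain("the cat sat on the mat and the cat slept")
print(generate(chain, "the", 8, seed=1))
```

Feeding in a longer source text gives more varied output, since each word then has more possible followers to choose from.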

Programming Historian Live – Anna Groundwater, History, Classics and Archaeology

I’m a historian. I’m here to talk about Programming Historian Live, which I attended in London in October. My first encounter was with the website. And that is a fantastic website (programminghistorian.org) – free, open, very enabling. It takes you through tutorials in lots of software you can use. Great range from Zotero, to AntConc (a corpus analysis tool), to Python. Stuff on data cleaning, network analysis. Using Omeka to do online exhibits – will be used for a masters renaissance course. Comes from best DH ethos. I recommend W3C for LOD and Open Data. Also clear tutorial on SPARQL on Programming Historian. Also has web scraping tutorials.

My interest is network analysis. Marten Düring uses Palladio – free online network analysis tool which visualizes those networks. The site takes you through step by step. Gives you working examples with data from the website so that you do it yourself. Actually doing stuff is a great way to learn. Now recommended to several dissertation students who will use Palladio and then move on to Gephi. One thing to add is that these amazing tools are compelling and exciting but there is theoretical underpinning that you need to understand if you use them. And Programming Historian also covers areas of that.

So, fantastic resources here (and Anouk also gave a lovely plug for EDINA’s geospatial tools and expertise, including my colleagues’ QGIS training).

The session at the BL was led by James Baker (now at Sussex, was at BL). He’s also an SSI fellow – gives you £3k for DH work and profile there. I learned a lot on AntConc, on TEI, shell and widget for web scraping.

AntConc was with Anouk, and the tutorial written by Anouk and Heather Froehlich at Sterling. AntConc helps you analyse a corpus. It’s free to download to look at your corpus. So, for example a set of movie reviews (test data from Programming Historian) that has “shot” marked up across the reviews. I like this because it combines the patterns of big sets of data, with the ability to see the exact context. Combines distant and close reading concepts. And here “shot” is shown in its many definitions and uses. Concordance is seeing that word with surrounding words (and you can choose the number of surrounding words). I’m interested in this because of cognitive geography of James I and VI using text analysis with AntConc. Some work already done on Agatha Christie using AntConc (in a paper on language and dementia by Ian Lancashire [PDF]).
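The concordance idea (a keyword shown with a chosen number of words either side) can be sketched in a few lines of Python. This is only an illustration of the concept, not how AntConc itself is implemented; the sample sentences and the `concordance` function are made up for the example.

```python
import re


def concordance(text, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Invented sentences using "shot" in different senses.
reviews = "The opening shot is superb. He took a shot of whisky, then shot the villain."
for left, word, right in concordance(reviews, "shot", window=2):
    print(f"{left:>15} | {word} | {right}")
```

Lining the hits up around the keyword is what lets you scan many uses at once while still reading each one in its immediate context.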

Other useful resources: James Baker did a great reflective post on his blog, Cradled in Caricature. Also there are a range of people I recommend following on Twitter around digital scholarship, digital humanities, etc. including: @heatherfro, @williamjturkel, @adam_crymble, @ruthahnert, @melissaterras.

And that brought Section 1 to a close. Section 2 of this event – which I was presenting at – was collaboratively noted. See: https://docs.google.com/document/d/1U1CTcTi7hC-1gVq5HD762k2MCgSXaXDfPp5KsDNfOlk/