Jul 05 2016

This afternoon I’m at UCL for the “If you give a historian code: Adventures in Digital Humanities” seminar from Jean Bauer of the Center for Digital Humanities at Princeton University, who is being hosted by Melissa Terras of the UCL Centre for Digital Humanities. I’ll be liveblogging so, as usual, any corrections and additions are very much welcomed.

Melissa is introducing Jean, who is in London en route to DH 2016 in Kraków next week. Over to Jean:

I’m delighted to be here with all of the wonderful work Melissa has been doing here. I’m going to talk a bit about how I got into digital humanities, but also about how scholars in library and information sciences, and scholars in other areas of the humanities might find these approaches useful.

So, this image (American Commissioners of the Preliminary Peace Negotiations with Great Britain. By Benjamin West, London, England; 1783 (begun). Oil on canvas. (Unframed) Height: 28 ½” (72.3 cm); Width: 36 ¼” (92.7 cm). 1957.856) is by Benjamin West, the Treaty of Paris, 1783. This is the era that I research and what I am interested in. In particular I am interested in John Adams, the first minister of the United States – he even gets one line in Hamilton: the musical. He’s really interesting as he was very concerned with getting thinking and processes on paper. And on the work he did with Europe, where there hadn’t really been American foreign consuls before. And he was also working on areas of North America, making changes that locked the British out of particular trading blocs through adjustments brought about by that peace treaty – and I might add that this is a weird time to give this talk in England!

Now, the foreign service at this time kind of lost contact once they reached Europe and left the US. So the correspondence is really important and useful to understand these changes. There are only 12 diplomats in Europe from 1775-1788, but that grows and grows with consuls and diplomats increasing steadily. And most of those consuls are unpaid as the US had no money to support them. When people talk about the diplomats of this time they tend to focus on future presidents etc. and I was interested in this much wider group of consuls and diplomats. So I had a dataset of letters, sent to John Jay, as he was negotiating the treaty. To use that I needed to put this into some sort of data structure – so, this is it. And this is essentially the world of 1820 as expressed in code. So we have locations, residences, assignments, letters, people, etc. Within that data structure we have letters – sent to or from individuals, to or from locations, they have dates assigned to them. And there are linkages here. Databases don’t handle fuzzy dates well, and I don’t want invalid dates, so I have a Boolean logic here. And also a process for handling enclosures – right now that’s letters but people did enclose books, shoes, statuettes – all sorts of things! And when you look at locations these connect to “in states” and states and location information… This data set occurs within the Napoleonic wars so none of the boundaries are stable in these times so the same location shifts in meaning/state depending on the date.
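To make the shape of that data structure concrete, here is a minimal sketch in Python dataclasses. To be clear, the field names here are my own illustration of the ideas described (fuzzy-date booleans, free-form enclosures, date-dependent locations), not Project Quincy’s actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Location:
    # A location's state is resolved per date rather than stored here:
    # boundaries shifted constantly during the Napoleonic wars, so the
    # same place can belong to different states at different times.
    name: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None

@dataclass
class Letter:
    sender: str
    recipient: str
    origin: Location
    destination: Location
    # Fuzzy-date handling: store a best-guess date plus booleans
    # recording which parts are actually known, rather than inventing
    # an invalid or falsely precise date in the database.
    year: int
    month: int = 1
    day: int = 1
    month_known: bool = True
    day_known: bool = True
    # Enclosures were usually letters, but could be books, shoes,
    # statuettes -- so model them as free-form descriptions.
    enclosures: list = field(default_factory=list)

london = Location("London", 51.5074, -0.1278)
philadelphia = Location("Philadelphia", 39.9526, -75.1652)
letter = Letter("John Jay", "Edmund Randolph", london, philadelphia,
                year=1794, month=6, day_known=False)
```

The point of the booleans is that a query can still sort and filter on the stored date while knowing exactly how much of it to trust.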

So, John Jay has all this correspondence between May 27 and Nov 19, 1794, going from Europe to North America, and between the West Indies and North America. Many of these letters are reporting on trouble. From the West Indies it is ship seizures… And there are debts owed to Britain… And none of these issues get resolved in that treaty. Instead John Jay and Lord Grenville set up a series of committees – and this is the historical precedent for mediation. Which is why I was keen to understand what information John Jay had available. None of this correspondence got to him early enough in time. There wasn’t information there to resolve the issue, but enough to understand it. But there were delays for safety, for practical issues – the State Department was 6 people at this time – but the information was being collected in Philadelphia. So you have a centre collecting data from across the continent, but not able to push it out quickly enough…

And if you look at the people in these letters you see John Jay, and you see Edmund Jennings Randolph mentioned most regularly. So, I have this elaborate database (The Early American Foreign Service Database – EAFSD) and lots of ways to visualise this… Which enables us to see connections, linkages, and places where different comparisons highlight different areas of interest. And this is one of the reasons I got into the Digital Humanities. There are all these papers – usually for famous historical men – and they get digitised, also the enclosures… In a single file(!), parsing that with a partial typescript, you start to see patterns. You see not summaries of information being shared, not aggregation and analysis, but the letters being bundled up and sent off – like a repeater note. So, building up all of this stuff… Letters are objects, they have relationships to each other, they move across space and time. You look at the papers of John Adams, or of any political leader, and they are just in order of date sent… Requiring us to flip back and forth. Databases and networks allow us to follow those conversations, to understand new orders to read those letters in.

Now, I had a background in code before I was a graduate student. What I do now at Princeton (as Associate Director of the Center for Digital Humanities) is to work with librarians and students to build new projects. We use a lot of relational databases, and network analysis… And that means a student like one I have at the moment can have a fully described, fully structured data set on a Vagrant machine that she can engage with, query, analyse, and convey to her examiners etc. Now this student was an Excel junkie but approaching the data as a database allows us to structure the data, to think about information, the nature of sources and citation practices, and also to get major demographic data on her group and the things she’s working on.

Another thing we do at Princeton is to work with libraries and with catalogue data – thinking about data in MARC, MODS, or METS records, and thinking about the extraction and reformatting of that data to query and rethink it. And we work with librarians on information retrieval, and how that could be translated to research – book history perhaps. Princeton University Library bought the personal library of philosopher Jacques Derrida – close to 19,000 volumes (it was thought to be about 15,000 until they were unpacked), so two projects are happening simultaneously. One is at the Centre for Digital Humanities, looking at how Derrida marked up the texts, and then went on to use and cite them in Of Grammatology. The other is with BibFrame – a Linked Open Data standard for library catalogues – and they are looking at books sent to Derrida, with dedications to him. Now there won’t be much overlap of those projects just now – Of Grammatology was his first book, so it predates most of the books dedicated or gifted to him. But we are building our databases for both projects as Linked Open Data, all being added a book at a time, so the hope is that we’ll be able to look at any relationships between the books that he owned and the way that he was using and being gifted items. And this is an experiment to explore those connections, and to expose that via the library catalogue… But the library wants to catalogue all works, not just those with research interest. And it can be hard to connect research work, with depth and challenge, back to the catalogue but that’s what we are trying to do. And we want to be able to encourage more use and access to the works, without the library having to stand behind the work or analyse the work of a particular scholar.

So, you can take a data structure like this, then set up your system with appropriate constraints and affordances – these need to be thought about as they will shape what you can and will do with your data later on. Continents have particular locations, boundaries, shapefiles. But you can’t mark out the boundaries for empires and states. The Western boundary at this time is a very contested thing indeed. In my system states are merely groups of locations, so that I can follow mercantile power, and think from a political viewpoint. But I wanted a tool with broader use, hence that other data. Locations seem very safe and neutral but they really are not; they are complex and disputed. Now for that reason I wanted this tool – Project Quincy – to have others using it, but that hasn’t happened yet… Because this was very much created for my research and research question… It’s my own little Mind Palace for my needs… But I have heard from a researcher looking to catalogue those letters, and that would be very useful. Systems like this can have interesting afterlives, even if they don’t have the uptake we want Open Source Digital Humanities tools to have. The biggest impact of this project has been that I have the schema online. Some people do use the American Foreign Correspondents database – I am one of the few places you can find this information, especially about consuls. But that schema being shared online has been helping others to make their own systems… In that sense the more open documentation we can do, the better all of our projects could be.

I also created those diagrams that you were seeing – with DAVILA, a programme that allows you to create easy-to-read, easy-to-follow, annotated, colour-coded visuals. They are prettier than most database diagrams. I hope that when documentation is appealing and more transparent, it will get used more… That additional step helps people understand what you’ve made available for them… And you can use documentation to help teach someone how to make a project. So when my student was creating her schema, it was an example I could share or reference. Having something more designed was very helpful.


Q1) Can you say more about the Derrida project and that holy grail of hanging that other stuff on the catalogue record?

A1) So the BibFrame schema is not as flexible as you’d like, it’s based on MARC, but it’s Linked Open Data, it can be expressed in RDF or JSON… And that lets us link records up. And we are working in the same library so we can link up on people, locations, maybe also major terms, and on the accession ID number too. We haven’t tried it yet but…

Q1) And how do you make the distinction between authoritative records and other data?

A1) Jill Benson(?) team are creating authoritative linked open data records for all of the catalogue. And we are creating Linked Open Data, we’ll put it in a relational database with an API and an endpoint to query to generate that data. Once we have something we’ll look at offering a Triple Store on an ongoing basis. So, basically it is two independent data structures growing side by side with an awareness of each other. You can connect via API but we are also hoping for a demo of the Derrida library in BibFrame in the next year or two. At least a couple of the books there will be annotated, so you can see data from under the catalogue.

Q1) What about the commentary or research outputs from that…

A1) So, once we have our data, we’ll make a link to the catalogue and pull in from the researcher system. The link back to the catalogue is the harder bit.

Q2) I had a suggestion for a geographic system you might be interested in called Pelagios… And I don’t know if you could feed into that – it maps historical locations, fictional locations etc.

A2) There is a historical location atlas held by the Newberry so there are shapefiles. Last I looked at Pelagios it was concerned more with the ancient world.

Comment) Latest iteration of funding takes it to Medieval and Arabic… It’s getting closer to your period.

A2) One thing that I really like about Pelagios is that they have split locations from their names, which accommodates multiple names, multiple imaginings and understandings etc. It’s a really neat data model. My model is more of a hack – so in mine “London” is at the centre of modern London… That doesn’t make much sense for London but I do similar for Paris, where it probably makes more sense. So you could go in deeper… There was a time when I was really interested in where all of Jay’s London correspondents were… That was what put me into thinking about network analysis… 60 letters are within London alone. I thought about disambiguating it more… But I was more interested in the people. So I went down a Royal Mail in London 1794 rabbit hole… And that was interesting, thinking about letters as a unit of information… Diplomatic notes fix conversations into a piece of paper you can refer to later – capturing the information and decisions. They go back and forth… So the ways letters came and went across London – sometimes several per day, sometimes over a week within the city… is really interesting… London was and is extremely complicated.

Q3) I was going to ask about different letters. Those letters in London sound more like memos than a letter. But the others being sent are more precarious, at more time delay… My background is classics so there you tend to see a single letter – and you’d commission someone like Cicero to write a letter to you to stick up somewhere – but these letters are part of a conversation… So what is the difference in these transatlantic letters?

A3) There are lots of letters. I treat letters capaciously… If there is a “to” or “from” it’s in. So there are diplomatic notes between John Jay and George Hammond – a minister, not an ambassador, as the US didn’t warrant that. Hammond was bad at his job – he saw a war coming and therefore didn’t see value in negotiating. They exchange notes, forward conversations back and forth. My data set for my research was all the letters sent to Jay, not those sent by Jay. I wanted to see what information Jay had available. With Hammond he kept a copy of all his letters to Jay, as evidence for very petty disputes. The letters from the West Indies were from Nathaniel Cabot Dickinson, who was sent as an information collector for the US government. Jay was sent to Europe on the treaty…. So the kick-off for Jay’s treaty is the change that sees food supplies to the British West Indies being stopped. Hammond actually couldn’t find a ship to take evidence against admiralty courts… They had to go through Philadelphia, then through London. So that cluster of letters includes older letters. Letters from the coast include complaints from angry American consuls…. There are urgent cries for help from the US. There is every possible genre… One of the things I love about American history is that Jay needs all the information he can get. When you map letters – like the Republic of Letters project at Stanford – you have this issue of someone writing to their tailor, not just important political texts. But for diplomats all information matters… Now you could say that a letter to a tailor is important, but you could also say you are looking to map the boundaries of intellectual history here… Now in my system I map duplicates sent transatlantically, as those really matter, not all arrived, etc. I don’t map duplicates within London, as that isn’t as notable and is more about after-the-fact archiving.

Q4) Did John Jay keep diaries that put this correspondence in context?

A4) He did keep diaries… I do have analysis of how John Quincy Adams wrote letters in his time. He created subject headings, he analysed them, he recreated a filing system and way of managing his letters – he’d docket his letters, noting date received. He was like a human database… Hence naming my database after him.

Q5) There are a couple of different types of a tool like this. There is your use and then there is reuse of the engineering. I have correspondence earlier than Jay’s, mainly centred on London… Could I download the system and input my own letters?

A5) Yes, if you go to eafsd.org you’ll find more information there and you can try out the system. The database is Project Quincy and that’s on GitHub (GPL 3.0) and you can fire it up in Django. It comes with a nice interface. And do get in touch and I’ll update you on the system etc. It runs in the Django framework, can use any database underneath it. And there may be a smaller tractable letter database running underneath it.

Comment) On BibFrame… We have a Library and Information Studies programme in which we teach BibFrame. We set up a project with a teaching tool which is also on GitHub – it’s linked from my staff page.

A quick note as follow up:

If you have research software that you have created for your work, and which you are making available under open source license, then I would recommend looking at some of the dedicated metajournals that will help you raise awareness of your project and ensure it is well documented for others to reuse. I would particularly recommend the Journal of Open Research Software (which, for full disclosure, I sit on the Editorial Advisory Board for), or the Journal of Open Source Software (as recommended by the lovely Daniel S. Katz in response to my post).


Feb 26 2016

Today I am at the British Library (BL) Labs Roadshow 2016 event in Edinburgh. I’m liveblogging so, as usual, all comments, corrections and additions are very much welcomed.

Introduction – Dr Beatrice Alex, Research Fellow at the School of Informatics, University of Edinburgh

I am delighted to welcome the team from the British Library Labs today, this is one of their roadshows. And today we have a liveblogger (that’s me) and we are encouraging you to tweet to the hashtag #bldigital.

Doing digital research at the British Library – Nora McGregor, Digital Curator at the British Library

Nora is starting with a brief video on the British Library – to a wonderful soundtrack, made from the collections by DJ Yoda. If you read 5 items a day it would take you 80,000 years to get through the collections. One of the oldest things we have in the collection are oracle bones – 3000 years old. Some of the newest items are the UK Web Archive – contemporaneous websites.

Today we are here to talk about the digital Research Team. We support the curation and use of the BL’s Digital collections. And Ben and Mahendra, talking today, are part of our Carnegie Funded digital research labs.

We help researchers by working with those operating at the intersection of academic research, cultural heritage and technology to support new ways of exploring and accessing the BL collections. This is through getting content into digital forms, supporting skills development, including the skills of BL staff.

In terms of getting digital content online we curate collections to be digitised and catalogued. Within digitisation projects we now have a digital curation role dedicated to that project, who can support scholars to get the most out of these projects. For instance we have a Hebrew Manuscripts digitisation project – with over 3000 manuscripts spanning 1000 years digitised. That collection includes rare scrolls and our curator for this project, Adi, has also done things like creating 3D models of artefacts like those scrolls. So these curators really ensure scholars get the most from digitised materials.

You can find this and all of our digitisation projects on our website: http://bl.uk/subjects/digital-scholarship where you can find out about all of our curators and get in touch with them.

We are also supporting different departments to get paper-based catalogues into digital form. So we had a project called Collect e-Card. You won’t find this on our website but our cards, which include some in, for instance, Chinese scripts or Urdu, are being crowdsourced so that we can make materials more accessible. Do take a look: http://libcrowds.com/project/urducardcatalogue_d1.

One of the things we initially set up for our staff as a two year programme was a Digital Research Support and Guidance programme. That kicked off in 2012 and we’ve created 19 bespoke one-day courses for staff covering the basics of Digital Scholarship which is delivered on a rolling basis. So far we have delivered 88 courses to nearly 400 staff members. Those courses mean that staff understand the implications of requests for images at specific qualities, to understand text mining requests and questions, etc.

These courses are intended to build capacity. The materials from these courses are also available online for scholars. And we are also here to help if you want to email a question we will be happy to point you in the right direction.

So, in terms of the value of these courses… A curator came to a course on cleaning up data and she went on to get a grant of over £70k for Big Data History of Music – a project with Royal Holloway to undertake analysis as a proof of concept around patterns in the history of music – trends in printing for instance.

We also have events, competitions and awards. One of these is “Off the Map”, a very cool endeavour, now in its fourth year. I’m going to show you a video on The Wondering Lands of Alice, our most recent winner. We digitise materials for this competition, teams compete to build video games, and this one is actually in our current Alice in Wonderland exhibition. This uses digitised content from our collection and you can see the calibre of these is very high.

There is a new competition open now. The new one is for any kind of digital media based on our digital collections. So do take a look at this.

So, if you want to get in touch with us you can find us at http://bl.uk/digital or tweet #bldigital.

British Library Labs – Mahendra Mahey, Project Manager of British Library Labs.

You can find my slides online (link to follow).

I manage a project called British Library Labs, based in the Digital Research team, who we work closely with. What we are trying to do is to get researchers, artists, entrepreneurs, educators, and anyone really to experiment with our digital collections. We are especially interested in people finding new things from our collections, especially things that would be very difficult to do with our physical collections.

To show you the space the project occupies, here is some work from a researcher called Adam Crymble, King’s College London (a video called Big Data + Old History). Adam entered a competition to explain his research in visual/comic book format (we are now watching the video, which talks about using digital texts for distant reading and computational approaches to selecting relevant material; to quantify the importance of key factors).

Other examples of the kinds of methods we hope researchers will use with our data span text mining, georeferencing, as well as creative reuses.

Just to give you a sense of our scale… The British Library says we are the world’s largest library by number of items. 180 million (or so) items, with only about 1-2% digitised. Now new acquisitions do increasingly come in digital form, including the UK Web Archive, but it is still a small proportion of the whole.

What we are hoping to do with our digital scholarship site is to launch data.bl.uk (soon) where you can directly access data. But as I did last year I have also brought a network drive so you can access some of our data today. We have some challenges around sharing data, we sometimes literally have to shift hard drives… But soon there will be a platform for downloading some of this.

So, imagine 20 years from now… I saw a presentation on technology and how we use “digital”… Well we won’t use “digital” in front of scholarship or humanities, it will just be part of the mainstream methodologies.

But back to the present… The reason I am here is to engage people like you, to encourage you to use our stuff, our content. One way to do this is through our BL Labs Competition, the deadline for which is 11th April 2016. And, to get you thinking, the best idea pitched to me during the coffee break gets a goodie bag – you have 30 seconds in that break!

Once ideas are (formally) submitted to the BL there will be 2 finalists announced in late May 2016. They then get a residency with some financial (up to £3600) and technical and curational support from June to October 2016. And a winner is then announced later in the year.

We also have the BL Labs Awards. This is for work already done with our content in interesting and innovative ways. You can submit projects – previous and new – by 5th September 2016. We have four categories: Artistic; Commercial; Research; and Learning/Teaching. Those categories reflect the increasingly diverse range of those engaging with our content. Winners are announced at a symposium on 7th November 2016 when prizes are given out!

So today is all about projects and ideas. Today is really the start of the conversation. What we have learned so far is that the kinds of ideas that people have will change quite radically once you try and access, examine and use the data. You can really tell the difference between someone who has tried to use the data and someone who has not when you look at their ideas/competition entries. So, do look at our data, do talk to us about your ideas. Aside from those competitions and awards we also collaborate in projects so we want to listen to you, to work with you on ideas, to help you with your work (capacity permitting – we are a small team).

Why are we doing this? We want to understand who wants to use our material, and more importantly why. We will try and give some examples to inspire you, to give you an idea of what we are doing. You will see some information on your seat (sorry blog followers, I only have the paper copy to hand) with more examples. We really want to learn how to support digital experiments better, what we can do, how we can enable your work. I would say the number one lesson we have learned – not new but important – is that it’s ok to make mistakes and to learn from these (cue a Jimmy Wales Fail Faster video).

So, I’m going to talk about the competition. One of our two finalists last year was Adam Crymble – the same one whose PhD project was highlighted earlier – and he’s now a lecturer in Digital History. He wanted to crowdsource tagging of historical images through Crowdsource Arcade – harnessing the appeal of 80s video games to improve the metadata and usefulness of historical images. So we needed to find an arcade machine, and then set up games on it – like Tag Attack – created by collaborators across the world. Tag Attack used a fox character trotting out images which you had to tag into one of four categories before he left the screen.

I also want to talk about our Awards last year. Our Artistic award winner last year was Mario Klingemann – Quasimondo. He found images of 44 men who look 44 using Flickr images – a bit of code he wrote for his birthday! He also found Tragic Looking Women, etc. All of this done computationally.

In Commercial, our entrant used images to cross-stitch ties that she sold on Etsy.

The winner last year, from the Research category, was Spatial Humanities at Lancaster, looking for disease patterns and mapping those.

And we had a Special Jury prize for James Heald, who did tremendous work with Flickr images from the BL, making them more available on Wikimedia, particularly map data.

Finally, loads of other projects I could show… One of my favourites is a former Pixar animator who developed some software to animate some of our images (The British Library Art Project).

So, some lessons we have learned is that there is huge appetite to use BL digital content and data (see Flickr Commons stats later). And we are a route to finding that content – someone called us a “human API for the BL content”!

We want to make sure you get the most from our collections, we want to help your projects… So get in touch.

And now I just want to introduce Katrina Navickas who will talk about her project.

Political Meetings Mapper – Katrina Navickas

I am part of the Digital History Research Centre at the University of Hertfordshire. My focus right now is on Chartism, the big movement in the 19th Century campaigning for the vote. I am especially interested in the meetings they held, where and when they met and gathered.

The Chartists held big public meetings, but also weekly local meetings advertised in the national and local press. The BL holds huge amounts of those newspapers. So my challenge was to find out more about those meetings – how many were advertised in the Northern Star newspaper from 1838 to 1850. The data is well structured for this… Now that may seem like a simple computational challenge but I come from a traditional research background, used to doing things by hand. I wanted to do this more automatically, at a much larger scale than previously possible. My mission was to find out how many meetings there were, where they were held, and how we could find those meetings automatically in the newspapers. We also wanted to make connections between papers, georeferenced historical maps, and also any that appear in playbills as some meetings were in theatres (though most were in pubs).

But this wasn’t that simple to do… Just finding the right files is tricky. The XML is some years old so is quite poor really. The OCR was quite inaccurate, hard to search. And we needed to find maps from the right period.

So, the first stage was to redo the OCR of the original image files. Initially we thought we’d need to do what Bob Nicholson did with Historic Jokes, which was getting volunteers to re-do them. But actually newer OCR software (ABBYY FineReader 12) did a much better job and we just needed a volunteer student to check the text – mainly about punctuation, not spelling. Then we needed to geocode places using a gazetteer. And then we needed to use Python code with regular expressions to extract dates, using some basic NLP to calculate the dates from words like “tomorrow” – easier as the paper always came out on a Saturday.

So, in terms of building a historical gazetteer: we extracted place names and ran them through http://sandbox.idre.ucla.edu/tools/geocoder, with latitude and longitude parameters to check locations. But we still needed to do some geocoding by hand. The areas we were looking at have changed a lot through slum clearances. We therefore needed to geolocate some of the historical places using detailed georeferenced 1840s maps of Manchester, and geocode those.

In the end, in the scale of this project, we looked at only 1841-1844. From that we extracted 5519 meetings (and counting) – and identifying text and dates. And that coverage spanned 462 towns and villages (and counting). In that data we found 200+ lecture tours – Chartist lecturers were paid to go on tours.

So, you can find all of our work so far here: http://politicalmeetingsmapper.co.uk. The website is still a bit rough and ready, and we’d love feedback. It’s built on the Omeka platform – designed for showing collections – which also means we have some limitations, but it does what we wanted it to.

Our historical maps are with thanks to the NLS whose brilliant historical mapping tiles – albeit from a slightly later map – were easier to use than the BL georeferenced map when it came to plot our data.

Interestingly, although this was a Manchester paper, we were able to see meeting locations in London – which let us compare to Charles Booth’s poverty maps. Also to do some heatmapping of that data. Basically we are experimenting with this data… Some of this stuff is totally new to me, including trialling a Machine Learning approach to understand the texts of a meeting advertisement – using an IPython Notebook to make a classifier to try to identify meeting texts.

So, what next? Well we want to refine our NLP parsing for more dates and other data. And I also want to connect “forthcoming meetings” to reports from the same meeting in the next issue of the paper. Also we need to do more machine learning to identify columns and types of texts in the unreconstructed XML of the newspapers in the BL Digital Collections.

Now that’s one side of our work, but we also did some creative engagement around this too. We got dressed up in Victorian costume, building on our London data analysis and did a walking tour of meetings ending in recreating a Chartist meeting in a London Pub.


Q1) I’m looking at Data mining for my own research. I was wondering how much coding you knew before this project – and after?

A1) My training had only been in GIS, and I’d done a little introduction to coding, but I basically spent the summer learning how to do this Python coding. Having a clear project gave me the focus and opportunity to do that. I still don’t consider myself a digital historian, I guess, but I’m getting there. So, no matter what coding skills you already have, don’t be scared – do enter the competition. You get help, support, and pointed in the right direction to learn the skills you need.

Farces and Failures: an overview of projects that have used the British Library’s digital content and data – Ben O’Steen, Technical Lead of British Library Labs.

My title isn’t because our work is farce and failure… It’s intentionally to reference the idea that it can be really important early in the process to ensure we have a shared understanding of terminology as that can cause all manner of confusion. The names and labels we choose shape the questions that people will ask and the assumptions we make. For instance “Labs” might make you imagine test tubes… or puppies… In fact we are based in the BL building in St Pancras, in offices, with curators.

Our main purpose is to make the collections available to you, to help you find the paths to walk through, where to go, what you can find, where to look. We work with researchers on their specific problems, and although that work is specific we are also trying to assess how widely this problem is felt. Much of our work is to feed back to the library what researchers really want and need to do their work.

There is also this notion that people tell us things that they think we need to hear in order to help them. As if you need secret passwords to access the content, people can see us as gatekeepers. But that isn’t how BL Labs works. We are trying to develop things that avoid the expected model of scholarship – of coming in, getting one thing, and leaving. That’s not what we see. We see scholars looking for 10,000 things to work with. People ask us “Give me all of collection X” but is that useful? Collections are often collected and named for administrative reasons – the naming associated with a particular digitisation funder, or with a source collection. So the Dead Sea Scrolls are scanned in a music collection because the settings were the same for digitising them… That means the “collection” isn’t always that helpful.

So farce… If we think Fork handles/4 Candles…

We have some common farce-inducing words:

  • Collection (see above)
  • Access – that has many meanings; sometimes “access” is “on-site only” and without download, etc.
  • Content – we have so much, that isn’t a useful term. We have personal archives, computers, archives, UK Web domain trawl, pictures of manuscripts, OCR, derived data. Content can be anything. We have to be specific.
  • Metadata – one person’s metadata is another’s data. Not helpful except in a very defined context.
  • Crowdsourced – means different things to different people. You must understand how the data was collected – what was the community, how did they do it, what was the QA process. That applies to any collaborative research data collection, not just crowdsourcing.

An example of complex provenance…

Microsoft Books digitisation project. It started in 2007 but stopped in 2009 when the MS Book Search project was cancelled. This digitised 49K works (~65k volumes). It has been online since 2012 via a “standard” page-turning interface, but we have very low usage statistics. That collection is quite random – items were picked shelf by shelf, with books missing. People do data analysis of those works and draw conclusions that don’t make sense if you don’t understand that provenance.

So we had a competition entry in 2013 that wanted to analyse that collection… and that actually led to a project called the Sample Generator, by Pieter Francois. This compared physical to digital collections to highlight how unrepresentative that sample is for drawing any conclusions.

Allen B Riddell looked at the HathiTrust corpus called “Where are the novels?” in 2012 which similarly looked at the bias in digitised resources.

We have really big gaps in our knowledge. In fact librarians may recognise the square brackets of the soul… The data in records that isn’t actually confirmed – inferred information within metadata. If you look at the Microsoft Books project it’s about half inferred information. A lot of the peaks the Sample Generator shows in what has been digitised come from inferred years of publication based on content – guesswork rather than reliable dates.

But we can use this data. So Bob Nicholson’s competition entry on Victorian Jokes led to the Mechanical Comedian Twitter account. We didn’t have a good way into these texts, we had to improvise around these ideas. And we did find some good jokes… If you search for “My Mother in-law” and “Victorian Humour” you’ll see a great video for this.

That project looked for patterns of words. That’s the same technique applied to Political Meetings Mapper.

So “Access” again… These newspapers were accessible but we didn’t have access to them… Keyword search fails miserably and bulk access is an issue. But that issue is useful to know about. Research and genealogical needs are different, and these papers were digitised partly for those more lucrative genealogical needs to browse and search.

There are over 600 digital archives, and we can only spend so long characterising each of them. The Microsoft Books digitisation project was public domain, so that let us experiment richly and quickly. We identified images of people, we found image details, and we started to post images to Twitter and Tumblr (via the Mechanical Curator)… There was demand we weren’t set up to deliver, so we used Flickr Commons – 1 TB for free – with only limited awareness of what page an image was from, what region. We had minimal metadata but others started tagging and adding to our knowledge. Nora did a great job of collating these images as they started to be tagged (by people and machines). And usage of the images has been huge: 13-20 million hits on average every month, over 330M hits to date.

Is this Iterative Crowdsourcing (Mia Ridge)? We crowdsource broad facts and subcollections of related items will emerge. There is no one-size-fits-all; it has to be project-based. We start with no knowledge but build from there. But these have to be purposefully contextless: presenting them on Flickr removed the illustrations’ context. The sheer amount of data is huge. David Foster Wallace has a great comment that “if your fidelity to perfectionism is too high, you never do anything”. We have a fear of imperfection in all universities, and we need to have the space to experiment. We can re-represent content in new forms; it might work, it might not. Metaphors don’t translate between media – like turning pages on a screen, or scrolling a book forever.

With our map collection we ran a tagathon and found nearly 30,000 maps. 10,000 were tagged by hand, 20,000 were found by machine. We have that nice combination of human and machine. We are now trying to georeference our maps and you can help with that.

But it’s not just research… We encourage people to do new things – make colouring books for kids, make collages – like David Normal’s Burning Man installation (also shown at St Pancras). That stuff is part of playing around.

Now, I’ve talked about “crowdsourcing” several times. There can be lots of bad assumptions in that term. It’s assumed to be about a crowd of people all doing a small thing, about special software, that if you build it they will come, it’s easy, it’s cheap, it’s totally untrustworthy… These aren’t right. It’s about being part of a community, not just using it. When you look at Zooniverse data you see a common pattern – that 1-2% of your community will do the majority of the work. You have to nurture the expert group within your community. This means you can crowdsource starting with that expert group – something we are also doing in a variety of those groups. You have to take care of all your participants, but that core crowd really matters.

So, for crowdsourcing you don’t need special software. If you build something they don’t necessarily come – they often don’t. And something we like to flag up is the idea of playing games, trying the unusual… Can we avoid keyboard and mouse? That arcade game does that, asking whether we can make use of casual interaction to get useful data. That experiment is based on a Raspberry Pi and loads of great ideas from others using our collections. It’s about the game dynamic… and about how we deal with the data – how to understand how the game dynamics impact on the information you can extract.

So, in summary…

Don’t be scared of using words like “collection” and “access” with us… But understand that there will be a dialogue… that helps avoid disappointment, helps avoid misunderstanding or wasted time. We want to be clear and make sure we are all on the same page early on. I’m there to be your technical guide and lead on a project. There is space to experiment, to not be scared to fail and learn from that failure when it happens. We are there to have fun, to experiment.

Questions & Discussion

Q1) I’m a historian at the National Library of Scotland. You talked about that Microsoft Books project and the randomness of that collection. Then you talked about the Flickr metadata – isn’t that the same issue… Is that suitable for data mining? What do you do with that metadata?

A1) A good point. Part of what we have talked about is that those images just tell you about part of one page in a book. The mapping data is one of the ways we can get started on that. So if we geotag an image or a map with Aberdeen then you can perhaps find that book via that additional metadata, even if Aberdeen would not be part of the catalogue record, the title etc. There are big data approaches we can take but there is work on OCR etc. that we can do.

Q2) A question for Ben about tweeting – the Mechanical Curator and the Mechanical Comedian. For the Curator… they come out quite regularly… How are they generated?

A2) That is mechanical… There are about 1200 lines of code that roam the collection looking for similar stuff… The text is generated from book metadata… It is looking at data on the hard drive – it has access to everything, so it’s quite random. If there’s no match it finds another random image.

Q2) And the Mechanical Comedian?

A2) That is run by Bob. The jokes are mechanically harvested, but he adds the images. He does that himself – with a bit of curation in terms of the badness of jokes – and adds images with help of a keen volunteer.

Q3) I work at the National Library of Scotland. You said to have fun and experiment. What is your response to the news of job cuts at Trove, at the National Library of Australia?

A3 – Ben) Trove is a leader in this space and I know a lot of people are incredibly upset about that.

A3 – Nora) The thing with digital collections is that they are global. Our own curators love Trove and I know there is a Facebook group to support Trove so, who knows, perhaps that global response might lead to a reversal?

Mahendra: I just wanted to say again that learning about the stories and provenance of a collection is so important – talking about the back stories of collections. Sometimes the reasons content is not made available have nothing to do with legality… Those personal connections are so important.

Q4) I’m interested in your use of the IPython Notebook. You are using that to access content on BL servers and website? So you didn’t have to download lots of data? Is that right?

A4) I mainly use it as a communication tool between myself and Ben… I type ideas into the notebook, Ben helps me turn that into code… It seemed the best tool to do that.

Q4) That’s very interesting… The Human API in action! As a researcher is that how it should be?

A4) I think so. As a researcher I’m not really a coder. For learning, these spaces are great; they act as a sandbox.

Q4) And your code was written for your project, should that be shared with others?

A4) All the code is on a GitHub page. It isn’t perfect. That extract, code, geocode idea would be applicable to many other projects.

Mahendra: There is a balance that we work with. There are projects that are fantastic partnerships of domain experts working with technical experts wanting problems to solve. But we also see domain experts wanting to develop technical skills for their projects. We’ve seen both. Not sure of the answer… We did an event at Oxford, who run a critical coding course where they team humanities scholars and computer scientists… It gives computer scientists experience of really insanely difficult problems, and the academics get experience of framing questions in precise ways…

Ben: And by understanding coding and

Comment (me): I just wanted to encourage anyone creating research software to consider submitting papers on that to the Journal of Open Research Software, a metajournal for sharing and finding software specifically created for research.

Q5) It seemed like the Political Meetings Mapper and the Palimpsest project had similar goals, so I wondered why they selected different workflows.

A5 – Bea Alex) The project came about because I spoke to Miranda Anderson who had the idea at the Digital Scholarship Day of Ideas. At that time we were geocoding historical trading documents and we chatted about automating that idea of georeferencing texts. That is how that project came about… There was a large manual aspect as well as the automated aspects. But the idea was to reduce that manual effort.

A5 – Katrina) Our project was a much smaller team. This was very much a pilot project to meet a particular research issue. The outcomes may seem similar but we worked on a smaller scale, seeing what one researcher could do. As a traditional academic historian I don’t usually work in groups, let alone big teams. I know other projects work at a larger scale though – like Ian Gregory’s Lakes project.

A5 – Mahendra) Time was a really important aspect in decisions we took in Katrina’s project, and of focusing the scope of that work.

A5 – Katrina) Absolutely. It was about what could be done in a limited time.

A5 – Bea) One of the aspects from our work is that we sourced data from many collections, and the structure could be different for each mention. Whereas there is probably a more consistent structure because of the single newspaper used in Katrina’s project, which lends itself better to a regular expressions approach.

And next we moved to coffee and networking. We return at 3.30 for more excellent presentations (details below). 

BL Labs Awards: Research runner up project: “Palimpsest: Telling Edinburgh’s Stories with Maps” – Professor James Loxley, Palimpsest, University of Edinburgh

I am going to talk about a project which I led in collaboration with colleagues in English Literature, with Informatics here, with visualisation experts at St Andrews, and with EDINA.

The idea came from Miranda Anderson, in 2012, who wanted to explore how people imagine Edinburgh in a literary sense, how the place is imagined and described. And one of the reasons for being interested in doing this is the fact that Edinburgh was the world’s first UNESCO City of Literature. The City of Literature Trust in Edinburgh is also keen to promote that rich literary heritage.

We received funding from the AHRC from January 2014 to March 2015. And the name came from the concept of the Palimpsest, the text that is rewritten and erased and layered upon – and of the city as a Palimpsest, changing and layering over time. The original website was to have the same name but as that wasn’t quite as accessible, we called that LitLong in the end.

We had some key aims for this project. There are particular ways literature is packaged for tourists etc. We weren’t interested in where authors were born or died. Or the authors that live here. What we were interested in was how the city is imagined in the work of authors, from Robert Louis Stevenson to Muriel Spark or Irvine Welsh.

And we wanted to do that in a different way. Our initial pilot in 2012 was all done manually. We had to extract locations from texts. We had a very small data set and it offered us things we already knew – relying on well known Edinburgh books, working with the familiar. The kind of map produced there told us what we already knew. And we wanted to do something new. And this is where we realised that the digital methods we were thinking about really gave us an opportunity to think of the literary cityscape in a different mode.

So, we planned to textmine large collections of digital text to identify narrative works set in Edinburgh. We weren’t constrained to novels, we included short stories, memoirs… Imaginative narrative writing. We excluded poetry as that was too difficult a processing challenge for the scale of the project. And we were very lucky to have the support and access to British library works, as well as material from the HathiTrust, and the National Library of Scotland. We mainly worked with out of copyright works. But we did specifically get permission from some publishers for in-copyright works. Not all publishers were forthcoming, and happy for work to be text mined. We were text mining work – not making them freely available – but for some publishers full text for text mining wasn’t possible.

So we had large collections of works, mainly but not exclusively out of copyright. And we set about textmining those collections to find those set in Edinburgh. And then we georeferenced the Edinburgh place names in those works to make mapping possible. And then finally we created visualisations offering different viewpoints into the data.

The best way to talk about this is to refer to text from our website:

Our aim in creating LitLong was to find out what the topography of a literary city such as Edinburgh would look like if we allowed digital reading to work on a very large body of texts. Edinburgh has a justly well-known literary history, cumulatively curated down the years by its many writers and readers. This history is visible in books, maps, walking tours and the city’s many literary sites and sights. But might there be other voices to hear in the chorus? Other, less familiar stories? By letting the computer do the reading, we’ve tried to set that familiar narrative of Edinburgh’s literary history in the less familiar context of hundreds of other works. We also want our maps and our app to illustrate old connections, and forge new ones, among the hundreds of literary works we’ve been able to capture.

That’s the kind of aims we had, what we were after.

So our method started with identifying texts with a clear Edinburgh connection or, as we called it “Edinburghyness“. Then, within those works to actually try and understand just how relevant they were. And that proved tricky. Some of the best stuff about this project came from close collaboration between literary scholars and informatics researchers. The back and forth was enormously helpful.

We came across some seemingly obvious issues. The first thing we saw was that there was a huge number of theological works… Which was odd… And it turned out to be because the Edinburgh placename “Trinity” was in there. Then “Haymarket” is a place in London as well as Edinburgh. So we needed to rank placenames, and part of that was the ambiguity of names, and understanding that some places are more likely to specifically be Edinburgh than others.

From there, with selected works, we wanted to draw out snippets – of varying lengths but usually a sensible syntactic shape – with those mentions of specific placenames.

At the end of that process we had a dataset of 550 published works, across a range of narrative genres. They contain over 1600 Edinburgh place names of lots of different types, since literary engagement with a city might be with a street, a building, open spaces, areas, monuments etc. In mapping terms you can be more exact; in literature you have these areas and diverse types of “place”, so our gazetteer needed to be flexible to that. And what that all gave us in total was 47,000 extracts from literary works, all focused on a place name mention.

That was the work itself, but we also wanted to engage people in our work. So we brought Sir Walter Scott back to life. He came along to the Edinburgh International Book Festival in 2014. He kind of got away from us and took on a life of his own… He ended up being part of the celebrations of the 200th anniversary of Waverley. And popped up again last year on the Borders Railway when that launched! That was fun!

We did another event at EIBF in 2015 with James Robertson who was exploring LitLong and data there. And you can download that as a podcast.

So, we were very very focused on making this project work, but we were also thinking about the users.

The resource itself you can visit at LitLong.org. I will talk a little about the two forms of visualisation. The first is a location visualiser largely built and developed by Uta Hinrichs at St Andrews. That allows you to explore the map, to look at keywords associated with locations – which indicate a degree of qualitative engagement. We also have a searchable database where you can see the extracts. And we have an app version which allows you to wander in among the extracts, rather than see from above – our visualisation colleagues call this the “Frog’s Eye View”. You can wander between extracts, browse the range of them. It works quite well on the bus!

We were obviously delighted to be able to do this! Some of the obstacles seemed tough but we found workable solutions… But we hope it is not the end of the story. We are keen to explore new ways to make the resource explorable. Right now there isn’t a way for interaction to leave a trace – other people’s routes through the city, other people’s understanding of the topography. There is scope for more analysis of the texts themselves. For instance we considered doing a mood map of the city – we weren’t able to do that in this project, but there is scope to. And as part of building on the project we have a bit of funding from the AHRC, so lots of interesting lines of enquiry there. And if you want to explore the resource do take a look, get in touch etc.


Q1) Do you think someone could run sentiment analysis over your text?

A1) That is entirely plausible. The data is there and tagged so that you could do that.

A1 – Bea) We did have an MSc project just starting to explore that in fact.

A1) One of our buttons on the homepage is “LitLong Lab” where we share experiments in various ways.

Q2) Some science fiction authors have imagined near future Edinburgh, how could that be mapped?

A2) We did have some science fiction in the texts, including the winner of our writing competition. We have texts from a range of ages of work but a contemporary map, so there is scope for keying data to historic maps, and those exist thanks to the NLS. As to the future… The not-yet-Edinburgh… Something I’d like to do… It is not uncommon that fictional places exist in real places – like 221B Baker Street or 44 Scotland Street – and I thought it would be fun to see the linguistic qualities associated with a fictional place, and compare to real places with the same sort of profile. So, perhaps for futuristic places that would work – using linguistic profile to do that.

Q3) I was going to ask about chronology – but you just answered that. So instead I will ask about crowd sourcing.

A3) Yes! As an editor I am most concerned about potential effort. For this scale and speed we had to let go of issues of mistakes, we know they are there… Places that move, some false positives, and some books that used Edinburgh placenames but are other places (e.g. some Glasgow texts). At the moment we don’t have a full report function or similar. We weren’t able to create it to enable corrections in that sort of way. What we decided to do is make a feature of a bug – celebrating those as worm holes! But I would like to fine tune and correct, with user interactions as part of that.

Q4) Is the data set available?

A4) Yes, through an API created by EDINA. Open for out of copyright work.

Palimpsest seeks to find new ways to present and explore Edinburgh’s literary cityscape, through interfaces showcasing extracts from a wide range of celebrated and lesser known narrative texts set in the city. In this talk, James will set out some of the project’s challenges, and some of the possibilities for the use of cultural data that it has helped to unearth.

Geoparsing Jisc Historical Texts – Dr Claire Grover, Senior Research Fellow, School of Informatics, University of Edinburgh

I’ll be talking about a current project, a very rapid project to geoparse all of the Jisc Historical Texts. So I’ll talk about the Geoparser and then more about that project.

The Edinburgh Geoparser has been developed over a number of years in collaboration with EDINA. It has been deployed in various projects and places, mainly also in collaboration with EDINA. And it has these main steps:

  • Use named entity recognition to identify place names in texts
  • Find matching records in a gazetteer
  • In cases of ambiguity (e.g. Paris, Springfield), resolve using contextual information from the document
  • Assign coordinates of preferred reading to the placename
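A toy caricature of those steps might look like this (the two-entry gazetteer and the shared-country heuristic are invented stand-ins – the real Edinburgh Geoparser’s disambiguation is far more sophisticated):

```python
# Toy caricature of the geoparsing steps: look a recognised place
# name up in a gazetteer, then resolve ambiguity using the other
# place names mentioned in the same document. Entries are invented.

GAZETTEER = {
    "Paris": [
        {"lat": 48.86, "lon": 2.35, "country": "France"},
        {"lat": 33.66, "lon": -95.56, "country": "USA"},
    ],
    "Versailles": [{"lat": 48.80, "lon": 2.13, "country": "France"}],
}

def resolve(place, context_places):
    """Pick the candidate whose country matches other places in the text."""
    candidates = GAZETTEER.get(place, [])
    if not candidates:
        return None  # not in the gazetteer: cannot be grounded
    context_countries = {
        c["country"]
        for p in context_places
        for c in GAZETTEER.get(p, [])
    }
    for cand in candidates:
        if cand["country"] in context_countries:
            return cand
    return candidates[0]  # fall back to the preferred (first) reading

print(resolve("Paris", ["Versailles"]))  # the French Paris wins
```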

So, you can use the Geoparser either via EDINA’s Unlock Text, or you can download it, or you can try a demonstrator online (links to follow).

To give you an example, I have a news piece on the burial of Richard III. You can see the Geoparser looks for entities of all types – people as well as places – as that helps with disambiguation later on. Then, using that text, the parser ranks the likelihood of possible locations.

A quick word on gazetteers. The knowledge of possible interpretations comes from a gazetteer, which pairs place names with lat/long coordinates. So, if you know your data you can choose a gazetteer relevant to it (e.g. just the UK). The Edinburgh Geoparser is configured to provide a choice of gazetteers and can be configured to use others.

If a place is not in a gazetteer it cannot be grounded. If the correct interpretation of a place name is not in the gazetteer, it cannot be grounded correctly. Modern gazetteers are not ideal for historical documents, so historical gazetteers need to be used or developed. For instance the DEEP (Directory of English Place Names) and PELAGIOS (ancient world) gazetteers have been useful in our current work.

The current Jisc Historical Texts (http://historicaltexts.jisc.ac.uk/) project has been working with EEBO and ECCO texts as well as the BL Nineteenth Century Books collection. These are large and highly varied data sets. So, for instance, yesterday I did a random sample of writers and texts… The collection is so large we’ve only seen a tiny portion of it. We can process it all, but we can’t look at it all.

So, what is involved in us georeferencing this text? Well we have to get all the data through the Edinburgh Geoparser pipeline. And that requires adapting the geoparser pipeline to recognise place names to work as accurately as possible on historical text. And we need to adjust the georeferencing strategy to be more detailed.

Adapting our place name recognition relies a lot on lexicons. The standard Edinburgh Geoparser has three lexicons, derived from the Alexandria Gazetteer (global, very detailed), Ordnance Survey (Great Britain, quite detailed) and DEEP. We’ve also added more lexicons from more gazetteers – including larger place names in GeoNames (population over 10,000), populated places from Natural Earth, and only the larger places from DEEP – and we score recognised place names based on how many and which lexicons they occur in. Low-scoring placenames are removed – we reckon people’s tolerance for missing a place is higher than their tolerance for false positives.
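That scoring heuristic might be sketched like this (the lexicon contents and the threshold below are invented for illustration, not the Geoparser’s actual data):

```python
# Sketch of scoring candidate place names by how many lexicons they
# appear in, dropping low-scoring names to favour precision over
# recall. Lexicon contents and the threshold are invented examples.

LEXICONS = {
    "geonames_large": {"London", "Edinburgh", "Lyon"},
    "natural_earth": {"London", "Edinburgh"},
    "deep_large": {"London", "Haymarket"},
}

def score(name):
    """Number of lexicons a candidate place name is attested in."""
    return sum(1 for lex in LEXICONS.values() if name in lex)

def keep(names, threshold=2):
    """Keep only names attested in at least `threshold` lexicons."""
    return [n for n in names if score(n) >= threshold]

print(keep(["London", "Lyon", "Haymarket"]))  # → ['London']
```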

Working with old texts also means huge variation in spellings… These cause a lot of missed placenames/false negatives (e.g. Maldauia, Demnarke, Saxonie, Spayne). They also result in false positives (Grasse, Hamme, Lyon, Penne, Sunne, Haue, Ayr). So we have tried to remove the false positives, to remove bad placenames.

When it comes to actually georeferencing these places we need coordinates for place names from gazetteers. We used three gazetteers in succession: Pleiades++, GeoNames and then DEEP. In addition to using those gazetteers we can weight the results based on location in the world, using bounding boxes. So we can prefer locations in the UK and Europe, then those in the East, not extending to the West as much… and excluding Australia and New Zealand (unknown at that time).
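The bounding-box weighting could look roughly like this (the boxes and weights below are invented, not the project’s actual values):

```python
# Sketch of weighting gazetteer candidates by region: candidates
# inside a preferred bounding box get a higher weight. Boxes and
# weights are invented for illustration.

# (min_lat, max_lat, min_lon, max_lon, weight)
REGIONS = [
    (49.0, 61.0, -11.0, 2.0, 3.0),     # UK and Ireland: strongly preferred
    (35.0, 72.0, -11.0, 40.0, 2.0),    # Europe: preferred
    (-90.0, 90.0, -180.0, 180.0, 1.0), # everywhere else
]

def weight(lat, lon):
    """Weight of the first (most preferred) region containing the point."""
    for min_lat, max_lat, min_lon, max_lon, w in REGIONS:
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return w
    return 0.0

def best_candidate(candidates):
    """candidates: list of (name, lat, lon); pick the best-weighted one."""
    return max(candidates, key=lambda c: weight(c[1], c[2]))

springfields = [("Springfield, USA", 39.80, -89.64),
                ("Springfield, UK", 53.62, -2.31)]
print(best_candidate(springfields))  # the UK reading is preferred
```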

So looking at EEBO and ECCO we can see some frequent place names from each gazetteer – which shows how different they are. In terms of how many terms we have found, there are over 3 million locations in EEBO and over 250k in ECCO (a much smaller collection). The early EEBO collections have a lot of locations in Israel, Italy and France. The early books are more concerned with the ancient world and Biblical texts, so these statistics suggest that we are doing the right thing here.

These are really old texts, we have huge volumes of them, and there is huge variety in the data – all of which makes this a hard task. We still don’t know how the work will be received, but we think Jisc will put it in a sandbox area and we should get some feedback on it.

Find out more:

  • http://historicaltexts.jisc.ac.uk/
  • https://www.ltg.ed.ac.uk/software/geoparser
  • http://edina.ac.uk/unlock/
  • http://placenames.org.uk/
  • https://googleancientplaces.wordpress.com/


Q1) What about historical Gaelic place names?

A1) I’m not sure these texts have these. But we did apply a language tag on a paragraph level. These are supposed to be English texts but there is lots of Latin, Welsh, Spanish, French and German. We only georeferenced texts thought to be English. If Gaelic names then, if in Ordnance Survey, they may have been picked up…

Claire will talk about work the Edinburgh Language Technology Group have been doing for Jisc on geoparsing historical texts such as the British Library’s Nineteenth Century Books and Early English Books Online Text Creation Partnership which is creating standardized, accurate XML/SGML encoded electronic text editions of early print books.

Pitches – Mahendra and co

Can the people who pitched me

Lorna: I’m interested in open education and I’d love to get some of the BL content out there. I’ve been worked on the new HECoS coding schema for different subjects. And I thought that it would be great to classify the BL content with HECoS.

Karen: I’ve been looking at Copyright music collections at St Andrews. There are gaps in legal deposit music from late 18th and 19th century as we know publishers deposited less in Scottish versus BL. So we could compare and see what reached outer reaches of the UK.

Nina: My idea was a digital Pilgrims Progress where you can have a virtual tour of a journey with all sorts of resources.. To see why some places are most popular in texts etc.

David: I think my idea has been done.. It was going to be iPython – Katrina is already doing this! But to make it more unique… It’s quite hard work for Ben to support scholars in that way so I think researchers should be encouraged to approach Ben etc. but also get non-programmers to craft complex queries, make the good ones reusable by others… and have those reused be marked up as of particular quality. And to make it more fun… Could have a sort of treasure hunt jam with people using that facility to have a treasure hunt on a theme… share interesting information… Have researchers see tweets or shared things… A group treasure hunt to encourage people by helping them share queries…

Mahendra: So we are supposed to decide the winners now… But I think we’ll get all our pitchers to share the bag – all great ideas… The idea was to start conversations. You should all have an email from me so, if you have found this inspiring or interesting, we’ll continue that conversation.

And with that we are done! Thanks to all for a really excellent session!

Feb 252016
Today we have our second eLearning@ed/LTW Showcase and Network event. I’m liveblogging so, as usual, corrections and updates are welcome. 
Jo Spiller is welcoming us along and introducing our first speaker…
Dr. Chris Harlow – “Using WordPress and Wikipedia in Undergraduate Medical & Honours Teaching: Creating outward facing OERs”
I’m just going to briefly tell you about some novel ways of teaching medical students and undergraduate biomedical students using WordPress and platforms like Wikipedia. So I will be talking about our use of WordPress websites in the MBChB curriculum. Then I’ll tell you about how we’ve used the same model in Reproductive Biology Honours. And then how we are using Wikipedia in Reproductive Biology courses.
We use WordPress websites in the MBChB curriculum during Year 2 student selected components. Students work in groups of 6 to 9 with a facilitator. They work with a provided WordPress template – the idea being that the focus is on the content rather than the look and feel. In the first semester the topics are chosen by the group’s facilitator. In semester two the topics and facilitators are selected by the students.
So, looking at example websites you can see that the students have created rich websites, with content, appendices. It’s all produced online, marked online and assessed online. And once that has happened the sites are made available on the web as open educational resources that anyone can explore and use here: http://studentblogs.med.ed.ac.uk/
The students don’t have any problem at all building these websites and they create these wonderful resources that others can use.
In terms of assessing these resources there is a 50% group mark on the website by an independent marker, a 25% group mark on the website from a facilitator, and (at the students’ request) a 25% individual mark on student performance and contribution which is also given by the facilitator.
In terms of how we have used this model with Reproductive Biology Honours it is a similar idea. We have 4-6 students per group. This work counts for 30% of their Semester 1 course “Reproductive Systems” marks, and assessment is along the same lines as the MBChB. Again, we can view examples here (e.g. “The Quest for Artificial Gametes”). Worth noting that there is a maximum word count of 6000 words (excluding appendices).
So, now onto the Wikipedia idea. This was something which Mark Wetton encouraged me to do. Students are often told not to use or rely on Wikipedia but, speaking as a biomedical scientist, I use it all the time. You have to use it judiciously but it can be an invaluable tool for engaging with unfamiliar terminology or concepts.
The context for the Wikipedia work is that we have 29 Reproductive Biology Honours students (50% Biomedical Sciences, 50% intercalating medics), and they are split into groups of 4-5 students. We did this in Semester 1, week 1, as part of the core “Research Skills in Reproductive Biology” course. And we benefited from expert staff including two Wikipedians in Residence (at different Scottish organisations), a librarian, and a learning, teaching and web colleague.
So the students had an introduction to Wikipedia, then some literature searching examples. We went on to groupwork sessions to find papers on particular topics, looking for differences in definitions, spellings, terminology. We discussed findings. This led onto groupwork where each group defined their own aspect to research. And from there they looked to create Wikipedia edits/pages.
The groups really valued trying out different library resources and search engines, and seeing the varying content that was returned by them.
The students then, in the following week, developed their Wikipedia editing skills so that they could combine their work into a new page for Neuroangiogenesis. Getting that online in an afternoon was incredibly exciting. And actually that page was high in the search rankings immediately. Looking at the traffic statistics that page seemed to be getting 3 hits per day – a lot more reads than the papers I’ve published!
So, we will run the exercise again with our new students. I’ve already identified some terms which are not already out there on Wikipedia. This time we’ll be looking to add to or improve High Grade Serous Carcinoma, and Fetal Programming. But we have further terms that need more work.
Q1) Did anyone edit the page after the students were finished?
A1) A number of small corrections and one querying of whether a PhD thesis was a suitable reference – whether a primary or secondary reference. What needs done more than anything else is building more links into that page from other pages.
Q2) With the WordPress blogs you presumably want some QA as these are becoming OERs. What would happen if a project got, say, a low C?
A2) Happily that hasn’t happened yet. That would be down to the tutor I think… But I think people would be quite forgiving of undergraduate work, which it is clearly presented at.
Q3) Did you consider peer marking?
A3) An interesting question. Students are concerned that there are peers in their groups who do not contribute equally, or let peers carry them.
Comment) There is a tool called PeerAim where peer input weights the marks of students.
Q3) Do all of those blog projects have the same model? I’m sure I saw something on peer marking?
A3) There is peer feedback but not peer marking at present.
Dr. Anouk Lang – “Structuring Data in the Humanities Classroom: Mapping literary texts using open geodata”
I am a digital humanities scholar in the school of Languages and Linguistics. One of the courses I teach is digital humanities for literature, which is a lovely class and I’m going to talk about projects in that course.
The first MSc project the students looked at was to explore Robert Louis Stevenson’s The Dynamiter. Although we were mapping the text, the key aim was to understand who wrote which part of the text.
So the reason we use mapping in this course is that these are brilliant analytical students but they are not used to working with structured data, and this is an opportunity to do so. So, using CartoDB – a brilliant tool that will draw data from Google Sheets – they needed to identify locations in the text, but I also asked students to give passages an “emotion rating”. That is a rating of intensity of emotion based on the work of Ian Gregory – a spatial historian who has worked with Lake District texts on their emotional intensity.
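To make the idea of structuring data concrete, here is a minimal sketch of the kind of table students might hand-code into a spreadsheet for CartoDB, plus a simple aggregate over it. The column names and sample rows are invented for illustration, not the actual class data:

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the shape students might hand-code into a Google Sheet
# for CartoDB: place name, latitude, longitude, and a 1-5 "emotion rating".
rows = """place,lat,lon,emotion_rating
Leicester Square,51.5103,-0.1301,4
Leicester Square,51.5103,-0.1301,2
St James's Park,51.5027,-0.1345,5
"""

reader = csv.DictReader(io.StringIO(rows))
ratings = defaultdict(list)
for r in reader:
    ratings[r["place"]].append(int(r["emotion_rating"]))

# Average emotional intensity per place - the kind of value a heatmap weights by.
averages = {place: sum(v) / len(v) for place, v in ratings.items()}
print(averages)
```

Even a toy table like this forces the decisions the students face: one row per mention or per place, and what scale the emotion rating uses.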
So, the students build this database by hand. And then loaded into CartoDB you get all sorts of nice ways to visualise the data. So, looking at a map of London you can see where the story occurs. The Dynamiter is a very weird text with a central story in London but side stories about the planting of bombs, which is kind of played as comedy. The view I’m showing here is a heatmap. So for this text you can see the scope of the text. Robert Louis Stevenson was British, but his wife was American, and you see that this book brings in American references, including unexpected places like Utah.
So, within CartoDB you can try different ways to display your data. You can view a “Torque Map” that shows chronology of mentions – for this text, which is a short story, that isn’t the most helpful perhaps.
Now we do get issues of anachronism. OpenStreetMap – on which CartoDB is based – is a contemporary map, and the geography and locations on the map change over time. And so another open data source was hugely useful in this project. Over at the National Library of Scotland there is a wonderful maps librarian called Chris Fleet who has made huge numbers of historical maps available not only as scanned images but as map tiles through a historical open maps API, so you can zoom into detailed historical maps. That means that when mapping a text from, say, the late 19th century, it’s incredibly useful to view a contemporaneous map alongside the text.
You can view the Robert Louis Stevenson map here: http://edin.ac/20ooW0s.
So, moving to this year’s project… We have been looking at Jean Rhys. Rhys was a white Creole born in Dominica who lived mainly in Europe. She is a really located author, with place central to her work. For this project, rather than hand coding texts, I used the wonderful Edinburgh Geoparser (https://www.ltg.ed.ac.uk/software/geoparser/) – a tool I recommend, and a new version is imminent from Claire Grover and colleagues in LTG, Informatics.
So, the Geoparser goes through the text and picks out text that looks like places, then tells you which it thinks is the most likely location for that place – based on aspects like nearby words in the text etc. That produces XML, and Claire has created me an XSLT stylesheet, so all the students have had to do is manually clean up that data. The Geoparser gives you a GeoNames reference that enables you to check latitude and longitude. Now this sort of data cleaning, the concept of gazetteers, these are bread and butter tools of the digital humanities. These are tools which are very unfamiliar to many of us working in the humanities. This is open, shared, and the opposite of the scholar secretly working in the library.
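As an illustration of that cleaning step, here is a sketch that pulls place names, gazetteer references and coordinates out of geoparser-style XML. The element and attribute names below are simplified assumptions for the example, not the Edinburgh Geoparser's actual output schema:

```python
import xml.etree.ElementTree as ET

# A simplified stand-in for geoparser-style XML output (invented schema).
xml_output = """<doc>
  <place name="Halifax" gazref="geonames:2647632" lat="53.7168" long="-1.8578"/>
  <place name="Paris" gazref="geonames:2988507" lat="48.8534" long="2.3488"/>
</doc>"""

root = ET.fromstring(xml_output)
places = [
    {"name": p.get("name"),
     "gazref": p.get("gazref"),
     "lat": float(p.get("lat")),
     "lon": float(p.get("long"))}
    for p in root.findall("place")
]
for p in places:
    print(p["name"], p["gazref"], p["lat"], p["lon"])
```

With the data in this shape, a student can eyeball each candidate resolution – exactly how the Halifax (Nova Scotia vs. England) mix-up described below gets caught.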
We do websites in class to benefit from that publicness – and the meaning of public scholarship. When students are doing work in public they really rise to the challenge. They know it will connect to their real world identities. I insist students show their name, their information, their image because this is part of their digital scholarly identities. I want people who Google them to find this lovely site with its scholarship.
So, for our Jean Rhys work I will show you a mock up preview of our data. One of the great things about visualising your data in these ways is that you can spot errors in your data. So, for instance, checking a point in Canada we see that the Geoparser has picked Halifax, Nova Scotia when the text indicates Halifax in England. When I raised this issue in class today the student got a wee bit embarrassed and made immediate changes… Which again is a perk of working in public.
Next week my students will be trying out QGIS  with Tom Armitage of EDINA, that’s a full on GIS system so that will be really exciting.
For me there are real pedagogical benefits to these tools. Students have to really think hard about structuring their data, which is really important. As humanists we have to put the data in our work into computational form. Taking this kind of class means they are more questioning of data, of what it means, of what accuracy is. They are critically engaged with data and they are prepared to collaborate in a gentle kind of way. They also get to think about place in a literary sense, in a way they haven’t before.
We like to think that we have it all figured out in terms of understanding place in literature. But when you put a text into a spreadsheet you really have to understand what is being said about place in a whole different way than a close reading. So, if you take a sentence like “He found them a hotel in Rue Lamartine, near Gare du Nord, in Montmartre” – is that one location or three? The Edinburgh Geoparser maps two points but not Rue Lamartine… So you have to use Google Maps for that… And is the accuracy correct? And you have to discuss whether those two map points are distorting. The discussion there is richer than any other discussion you would have around close reading. We are so confident about close readings… We assume it as a research method… This is a different way to close read – to shoehorn the text into a different structure.
So, I really like Michel de Certeau’s “Spatial Stories” in The Practice of Everyday Life (de Certeau 1984), where he talks about structured space and the ambiguous realities of use and engagement in that space. And that’s what that Rue Lamartine type example is all about.
Q1) What about looking at distance between points, how length of discussion varies in comparison to real distance
A1) That’s an interesting thing. And that CartoDB Torque display is crude but exciting to me – a great way to explore that sort of question.
OER as Assessment – Stuart Nichol, LTW
I’m going to be talking about OER as assessment from a student’s perspective. I study part time on the MSc in Digital Education and a few years ago I took a module called Digital Futures for Learning, a course co-created by participants and where assessment is built around developing an Open Educational Resource. The purpose is to “facilitate learning for the whole group”. This requires a pedagogical approach (to running the module) which is quite structured to enable that flexibility.
So, for this course, the assessment structure is 30% position paper (the basis of content for the OER), then 40% of the mark for the OER (30% peer-assessed and tutor moderated / 10% self-assessed), and then the final 30% of the marks come from an analysis paper that reflects on the peer assessment. You could then resubmit the OER along with that paper reflecting on that process.
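As a rough arithmetic sketch, the weighting above combines into a final mark like this (the component marks here are invented for illustration):

```python
# Weighting described above: 30% position paper, 40% OER
# (30% peer-assessed + 10% self-assessed), 30% analysis paper.
weights = {
    "position_paper": 0.30,
    "oer_peer": 0.30,
    "oer_self": 0.10,
    "analysis_paper": 0.30,
}
marks = {  # each component marked out of 100 (sample values, invented)
    "position_paper": 65,
    "oer_peer": 70,
    "oer_self": 60,
    "analysis_paper": 68,
}

# Weighted sum of the four components.
final = sum(weights[k] * marks[k] for k in weights)
print(round(final, 1))  # → 66.9
```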
I took this module a few years ago, before the University’s adoption of an open educational resource policy, but I was really interested in this. So I ended up building a course on open accreditation and Open Badges, using Weebly: http://openaccreditation.weebly.com/.
This was really useful as a route to learn about Open Educational Resources generally but that artefact has also become part of my professional portfolio now. It’s a really different type of assignment and experience. And, looking at my stats from this site I can see it is still in use, still getting hits. And Hamish (Macleod) points to that course in his Game Based Learning module now. My contact information is on that site and I get tweets and feedback about the resource which is great. It is such a different experience to the traditional essay type idea. And, as a learning technologist, this was quite an authentic experience. The course structure and process felt like professional practice.
This type of process, and use of open assessment, is in use elsewhere. In Geosciences there are undergraduate students working with local schools and preparing open educational resources around that. There are other courses too. We support that with advice on copyright and licensing. There are also real opportunities for this in the SLICCs (Student Led Individually Created Courses). If you are considering going down this route then there is support at the University from the IS OER Service – we have a workshop at KB on 3rd March. We also have the new Open.Ed website, about Open Educational Resources which has information on workshops, guidance, and showcases of University work as well as blogs from practitioners. And we now have an approved OER policy for learning and teaching.
The new OER policy relates directly to assessment: it is clear that OERs are created by both staff and students.
And finally, fresh from the ILW Editathon this week, we have Ewan MacAndrew, our new Wikimedian in residence, who will introduce us to Histropedia (Interactive timelines for Wikipedia: http://histropedia.com) and run through a practical introduction to Wikipedia editing.
Wikimedian in Residence – University of Edinburgh – Ewan MacAndrew
Ewan is starting by introducing us to “Listen to Wikipedia”, which turns live edits on Wikipedia right now into melodic music. And that site colour codes for logged in, anonymous, and clean up bots all making edits.
My new role, as Wikimedian in Residence, comes about from a collaboration between the University of Edinburgh and the Wikimedia Foundation. And my role fits into their complementary missions, which fit around the broad vision of imagining a world where all knowledge is openly available. My role is to enhance teaching and the curriculum, but also to help highlight the rich heritage and culture around the university, and to help raise awareness of its commitment to open knowledge. But this isn’t a new collaboration – it is part of an ongoing relationship of events and activities.
It’s also important to note that I am a Wikimedian in Residence, rather than a Wikipedian in Residence. Wikimedia is the charitable foundation behind Wikipedia, but they have a huge family of projects including Wikibooks, MediaWiki, Wikispecies, etc. That includes Wikidata, a database of knowledge that both humans and machines can read, which is completely language independent – the model Wikipedia is trying to work towards.
So, what is Wikipedia and how does it work? Well we have over 5 million articles, 38 million pages, over 800 million edits, and over 130k active users.
There has been past work by the University with Wikimedia. There was the Women, Science and Scottish editathon for ILW 2015, Chris Harlow already spoke about his work, there was an Ada Lovelace editathon from October 2015, Gavin Willshaw was part of #1Lib1Ref day for Wikipedia’s 15th Birthday in January 2016. Then last week we had the History of Medicine editathon for ILW 2016 which generated 4 new articles, improved 56 articles, uploaded over 500 images to Wikicommons. Those images, for instance, have huge impact as they are of University buildings and articles with images are far more likely to be clicked on and explored.
You can explore that recent editathon in a Storify I made of our work…

View the story “University of Edinburgh Innovative Learning Week 2016 – History of Medicine Wikipedia editathon” on Storify

We are now looking at new and upcoming events, our next editathon is for International Women’s Day. In terms of ideas for events we are considering:

  • Edinburgh Gothic week – cross curricular event with art, literature, film, architecture, history, music and crime
  • Robert Louis Stevenson Day
  • Scottish Enlightenment
  • Scottish photographers and Image-a-thons
  • Day of the Dead
  • Scotland in WWI Editathon – zeppelin raids, Craiglockhart, etc.
  • Translationathons…

Really open to any ideas here. Do take a look at reports and updates on the University of Edinburgh Wikimedian in Residence activities here: https://en.wikipedia.org/wiki/Wikipedia:University_of_Edinburgh

So, I’m going to now quickly run through the five pillars of Wikipedia, which are:

  1. An encyclopedia – not a gossip column, or blog, etc. So we take an academic, rigorous approach to the articles we are putting in.
  2. Neutral point of view – trying to avoid “peacock terms”. Only saying things that are certain, backed up by reliable published sources.
  3. Free content that anyone can use, edit and distribute.
  4. Respect and civility – when I run sessions I ask people to note that they are new users so that others in the community treat you with kindness and respect.
  5. No firm rules – for every firm rule there has to be flexibility to work with subjects that may be tricky, might not quite work. If you can argue the case, and that is accepted, there is the freedom to accept exceptions.

People can get bogged down in the detail of Wikipedia. Really the only rule is to “Be bold, not reckless!”.

When we talk of Wikipedia and what a reliable source is: Wikipedia is based on reliable published sources with a reputation for fact-checking and accuracy. Academic and peer-reviewed scholarly material is often used (barring the no-original-research restriction). High quality mainstream publications too. Blogs are not generally seen as reliable, but sites like the BBC and CNN are. And you need several independent sources for a new article – generally we look for 250 words and 3 reliable sources for a new Wikipedia article.
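The "250 words and 3 reliable sources" rule of thumb can be sketched as a trivial check. The draft text and source list below are invented examples, and the thresholds are just the guideline figures from the talk:

```python
def meets_threshold(text: str, sources: list[str],
                    min_words: int = 250, min_sources: int = 3) -> bool:
    """Rough check: does a draft meet the ~250 words / 3 sources guideline?"""
    return len(text.split()) >= min_words and len(sources) >= min_sources

draft = "word " * 300  # stand-in for a 300-word draft article
sources = ["BBC", "CNN", "Peer-reviewed journal article"]
print(meets_threshold(draft, sources))  # → True
```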

Ewan is now giving us a quick tour through enabling the new (fantastic!) visual editor, which you can do by editing your settings as a registered user. He’s also encouraging us to edit our own profile page (you can say hello to Ewan via his page here), formatting and linking our profiles to make them more relevant and useful. Ewan is also showing how to use Wikimedia Commons images in profiles and pages. 

So, before I finish I wanted to show you Histropedia, which allows you to create timelines from Wikipedia categories.

Ewan is now demonstrating how to create timelines, to edit them, to make changes. And showing how the timelines understand “important articles” – which is based on high visibility through linking to other pages. 

If you create a timeline you can save these either as a personal timeline, or as a public timeline for others to explore. The other thing to be aware of is that Wikidata can be queried for more specialised topics – for instance looking at descendants of Robert the Bruce. Or even as specific as female descendants of Robert the Bruce born in Denmark. That just uses Robert the Bruce and a Wikidata term called “child of”, and from those two fields you can build a very specific timeline. Histropedia uses both categories and Wikidata terms… So here it is using both of those.
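The sort of Wikidata query behind such a timeline can be sketched in SPARQL. Below, P40 (“child”) and P21/Q6581072 (“sex or gender”: female) are real Wikidata identifiers to the best of my knowledge, but `ROBERT_THE_BRUCE_QID` is a placeholder assumption – look up the actual Q-number on Wikidata before running this against the query service:

```python
# Placeholder Q-number - NOT the verified identifier for Robert the Bruce.
ROBERT_THE_BRUCE_QID = "Q0000000"

# Build a SPARQL query for female children of the given person.
query = f"""
SELECT ?child ?childLabel WHERE {{
  wd:{ROBERT_THE_BRUCE_QID} wdt:P40 ?child .   # P40: child
  ?child wdt:P21 wd:Q6581072 .                 # P21: sex or gender = female
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""
print(query)
```

Chaining further properties (e.g. place of birth) is how you narrow down to something as specific as “female descendants born in Denmark”.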


Q1) Does Wikidata draw on structured text in articles?

A1) It’s based on “an instance of”… “place of education” or “created on” etc. That’s one of the limitations of Histropedia right now… It can’t differentiate between birth and death date versus dates of reign. So limited to birth and death, foundation dates etc.

Q2) How is Wikipedia “language independent”?

A2) Wikipedia is language dependent. Wikidata is language independent. So, no matter what tool Wikidata uses, it functions in every single language. Wikipedia doesn’t work that way, we have to transfer or translate texts between different language versions of Wikipedia. Wikidata uses a q code that is neutral to all languages that gets round that issue of language.

Q3) Are you holding any introductory events?

A3) Yes, trying to find best ways to do that. There are articles from last week’s editathon which we could work on.

And with that we are done – and off to support our colleague Jeremy Knox’s launch of his new book: Posthumanism and the Massive Open Online Course: Contaminating the Subject of Global Education.

Thanks to all our fantastic speakers, and our lovely organisers for this month’s event: Stuart Nicol and Jo Spiller.

Apr 032014

Today I am at the Digital Humanities: What does it mean? session at Teviot Debating Hall. I will be running two workshops later but will liveblog the other talks taking place today.

We are starting with an introduction from Jessica from Forum, who is explaining the background to today’s event, in exploring what digital humanities are and what it means to be a digital only journal.

The first speaker today is Lisa Otty

Lisa Otty – Digital Humanities or How I Learned to stop worrying and love the computer

I’m going to take “digital humanities, what does it mean?” in two ways. Firstly thinking about literal definitions, but also thinking more rhetorically about what this means.

Digital humanities generate many strong opinions and anxieties – hence my title, borrowed from Dr Strangelove. So I want to move beyond the polemic to what digital humanities actually means to practitioners.

I want to ask you about the technologies you use… From word processing to Google Books, to blogs, Twitter, to Python and Raspberry Pis (by show of hands most use the former, two code, one uses a Raspberry Pi to build). There is a full spectrum here.

Wikipedia is probably the most widely used encyclopedia but I suspect most academics would still be sceptical about it… Can we trust crowdsourced information? Well, its definition of digital humanities is really useful. What we should particularly take from this definition is that it is a methodology, computational methods. Like critical theory it cross-cuts different disciplines, which is why it is hard to slot into university structures.

Chris Forster, on the HASTAC blog (9/8/2010), talks about digital humanities as being about the direct practical use of computational methods for research, about media studies/new media, about using technology in the classroom, and about the way new technology is rescaling research and the profession – academic publishing, social media, and alt-ac (those in academic-like roles but outside traditional structures, e.g. based in support services).

So I’ve recrafted this a bit. Digital humanities is about:

Research that uses computational methods and tools. Probably the most famous proponent of this is Franco Moretti, who uses quantitative computational methods in his area of literature. This is work at large scale – often called scalable reading or distant reading. So for instance, looking at British novelistic genres 1740-1900, he has created a visual representation of how these genres appear and disappear – frequently in clusters. Moretti says that this maps out the expectations of genres over time.

Similarly Moretti has visualised the characters in Hamlet and their deaths, mapping out that if a character is closely related to both the king and to Polonius then they are toast. Now you could find that out by reading Hamlet, but with this approach you can go and explore other texts.

Research that studies digital objects/culture. Lev Manovich has founded the concept of cultural analytics. For instance a recent project looks at people’s self portraits online, how they present themselves, how they describe themselves. They found women take more selfies than men, women take them in their early twenties, men in their thirties, and people in São Paulo like to recline in their selfies – not sure what that part tells us!

Research that builds digital objects/tools. For instance the Carnegie Mellon Docuscope, which looks for linguistic markers and rhetorical patterns. Interestingly, colleagues at Strathclyde using this tool found that structurally Othello is a comedy.

So you may be building tools for your discipline or area of research. We also see tools built around digitised texts, such as the Codex Sinaiticus. This has been digitised using a process which photographs the texts under many different light levels and conditions, including ultraviolet light. This allows scholars to work with texts in new ways, to read previously inaccessible or delicate texts. And there are 3D imaging techniques too. So digital images have really important implications for humanities scholars, particularly in areas such as archaeology.

This computation research fits into four key fields:
– digitisation and TEI, the latter a metadata markup language which it is really scholarly best practice to use. Whole projects are based around encoding texts in TEI.
– mapping and data visualisation – like Moretti, georeferencing etc.
– text mining/topic modelling
– physical computing – a catch all for digital imaging and similar technologies
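As a toy illustration of the text-mining end of this list – counting features across several texts at once rather than close reading any single one – here is a minimal sketch using only the standard library; the text snippets are invented:

```python
import re
from collections import Counter

# Distant-reading sketch: word frequencies across a small corpus.
texts = {
    "text_a": "The city at night. The city never sleeps.",
    "text_b": "A quiet village morning; the village wakes slowly.",
}

# Lowercase, tokenise crudely, and count per text.
counts = {
    name: Counter(re.findall(r"[a-z']+", body.lower()))
    for name, body in texts.items()
}
print(counts["text_a"]["city"], counts["text_b"]["village"])  # → 2 2
```

Real projects like Moretti's work at vastly larger scale with far more sophisticated models (e.g. topic modelling), but the principle – turning texts into countable, comparable structures – is the same.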

I wanted to now focus on some projects with a close association with this university.

– digitisation and TEI – the Modernist Versions Project
– mapping and data visualisation – Pleiades, which extracts georeferenced places from ancient classical texts
– text mining – Palimpsest uses text mining to georeference mentions of places in texts, to allow exploration in situ using mobile phones.
– physical computing – the digital imaging unit at Edinburgh University Library is brilliant, has a fantastic blog, a rich resource.

So to the rhetorical aspects of DH.

Roberto Busa undertook a visionary project with IBM, the Index Thomisticus (1949-2005). He was really the first person to connect text to the internet. The world of 2005, when that project went live, was very different to 1949.

The term digital humanities was coined in 2001. Computing was already about teaching, publishing, convergent practices… The definition of DH which relates the field to a three-ring circus really connects to Chris Forster’s definition.

By 2009 we reached a pivotal moment for digital humanities: it moved from emergent to established (Christine ?, UCLA). Some enthusiasts saw this as the future. But it generated a kind of equal and opposite reaction… Not everyone wants borders reshaped and challenged; they were already invested in their own methods. New methods can be daunting. What seemed most worrying was what digital humanities might bring with it. Anxieties arose from very real concerns…

There has been an encroachment of science into the precariousness of the humanities, with medical humanities, cognitive humanities, neuro humanities, digital humanities. Here the rhetoric sees scientific methods as more valid than humanities ones. People like Franco Moretti don’t help here. And to what extent do we use these scientific approaches to validate humanities work? I don’t think the humanities would be any less precarious if we all used such approaches.

And there are managerial and financial issues. Daniel Allington, himself a digital humanities scholar, describes humanities research as cheap, and disadvantageous from two perspectives, both funders and universities. Sometimes these projects can be about impact or trendiness, not always about the research itself. Matthew Kirschenbaum describes it more tactfully, with DH as a “tactical coinage”, acknowledging the reality of circumstances in which DH allows us to get things done, to put it simply.

And who is in DH? Generally it is a very gendered and a very white group. Typically teenage boys are the people who teach themselves to code. The terms can be inaccessible. It can be ageist. It can seem to enforce privilege. There are groups that are seeking to change this, but we have to be aware of the implications.

And those tools I showed before… Those are mainly from commercial companies and, as we all know, if you do not pay for a service, you are the product – even the British Newspaper Archive is about digitising in order to charge via genealogy websites. DH has a really different relationship to business, to digital infrastructure. I want to tell you about this to explain the polemical responses to DH, and so that you understand the social, cultural and professional implications.

Geoffrey Harpham, in an NEH bulletin (winter 2014), talks about research as being about knowledge but also the processes by which it is brought into being. We are all using digital tools. We just have to be conscious of what we are doing, what we are privileging, what we are excluding. Digital humanities scholars have put this well in a recent MIT publication. They point to questions raised:
– what happens when anyone can speak and publish? What happens when knowledge credentialing is no longer controlled solely by institutions of higher learning?
– who can create knowledge?

I liken this time to the building of the great libraries in the nineteenth century. We have to be involved and we really have to think about what it means to become digital. We need to shape this space in critical ways, shaping the tools we need.

Matthew Kirschenbaum talks about digital humanities as a mobile and tactical signifier. He talks about the field as a network topology. DH, the keyword, the tag, constantly changes, is constantly redefined.

And in a way this is why Wikipedia is the perfect place to seek a definition, it is flexible and dynamic.

Digital Humanities has to also be flexible, it is up to all of us to make it what we want it to be.


Q1) Is this an attempt for the humanities to redefine themselves to survive?
A1) It’s an important question. The digital humanist does work collaboratively with the sciences. The wrong approach is to stake out your space and defend it; collaborative work is tactical. So many post-PhD roles are temporary contracts around projects. We can’t just maintain the status quo, but we do have to think strategically about what we do, and be critical in thinking about what that means.

Q2) coming back to your Wikipedia comment, and the reinforcement of traditional privilege… I’ve become increasingly aware that Wikipedia can also be replicating traditional structures. Wikipedian in residence legitimises Wikipedia, but does it not also potentially threaten the radical nature of the space?
A2) you’ve put your finger on the problem, I think we are all aware of the gender bias in Wikipedia. And those radical possibilities, and threats are important to stay on top of, and that includes understanding what takes place behind the scenes, in order to understand what that means.

Q3) I wanted to ask about the separate nature of some of those big digital humanities centres.
A3) in the USA there are some huge specialist centres at UCLA, the University of Victoria, Stanford; they create hugely specialist tools which are freely available but which attract projects and expertise to their organisation. In a way the lack of big centres here does make us think more consciously about what digital humanities is. I was speaking to Andrew Prescott about this recently and he thinks the big DH centres in the UK will disappear and that DH will be dispersed across humanities departments. But it’s all highly political and we have to be aware of the politics of these tools and organisations when we use and engage with them.

Q4) given we all have to put food on the table, how can we work with what is out there already – the Googles of the world, who do hire humanities experts, for instance?
A4) I didn’t mean to suggest Google is bad; they are good in many ways. But DH as a tactical term is something that you can use for your benefit. It is a way to get into a job! That’s perfectly legitimate. There are very positive aspects to the term in terms of deployment and opportunities.

Q5) how do you get started with DH?
A5) a lot of people teach themselves… There are lots of resources and how-to guides online. There is Stanford’s “Tooling Up for Digital Humanities”; the Roy Rosenzweig Center has lists of DH tools. Or for your data you can use things like Voyant Tools. Lots of e-resources online. Experiment. And follow DH people on Twitter. Start reading blogs, read tutorials on how to do things. Watch and learn!

Q6) are there any things coming up you can recommend?
A6) yes, we have an event coming up on 9th June. Information coming soon. You can sign up for that to see presentations, speak to scholars about DH, and there will be a bidding process for a small amount of money to try these tools. And there is also a DH network being established by institutions across Scotland, so look out for news on that soon!

And with that we ran two workshops…

Panel Session

We have Jo Shaw chairing, with Ally Crockford, Anna Groundwater, James Loxley, Louise Settle and Greg Walker.

Greg: My project is not very digital, and largely inhumane! I think I’m here to show you what not to do! My project is theatrical: the only 16th century play from Scotland to survive. It had never been performed since 1554. We kind of showed why that was! It is five and a half hours long… We got a director, actors, etc., and funding to do all this – which is why it is so hard to do financially. So we set up a website, Staging and Representing the Scottish Renaissance Court, with HD video that can be edited and manipulated. Endless blogging, tweeting, and loads of resources for teachers etc. And we have local dramatic groups who are taking the play up – the Linlithgow Town Players are performing it all next year, for instance.


Ally: This is incomplete, but my project grew out of an AHRC project with Surgeons’ Hall in Edinburgh. The city is the first UNESCO City of Literature, but medically it is also historically one of the most important cities in the world, so it makes some sense to look at those two factors together. So my site – a mock-up at this stage – is Dissecting Edinburgh. It’s a digital project based on Omeka, which is designed for non-IT specialists, though it’s still pretty tough to use actually. They have plugins and extensions; it’s a bit like WordPress but designed more for academic curation. For instance, there is an extension that has been used to map literary connections between real locations and H.P. Lovecraft’s work. And you can link sources back to full text. And you can design “exhibitions” based on keywords or themes, looking for similarities in sources, etc. That is the hope of what it will look like… Hopefully!
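The Omeka-style model described above – items with metadata, and “exhibitions” as themed selections over them – can be sketched in miniature. The items and tags below are invented for illustration, not from the actual project data:

```python
# A minimal, invented sketch of Omeka-style curation: each "item"
# carries a title and keyword tags, and an "exhibition" is simply
# a themed selection over the whole collection.
items = [
    {"title": "Surgical casebook, 1847", "tags": {"surgery", "anaesthesia"}},
    {"title": "Letter to R. L. Stevenson", "tags": {"literature"}},
    {"title": "Anatomy lecture notes", "tags": {"surgery", "literature"}},
]

def exhibition(theme):
    """Return the titles of all items matching a keyword theme."""
    return [item["title"] for item in items if theme in item["tags"]]

surgery_exhibit = exhibition("surgery")
```

The point is that the curation layer is just metadata plus selection; the hard work, as Omeka users find, is getting the metadata consistent in the first place.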

Louise: My IASH project uses historical GIS to map crime from 1900 to 1939, looking at women’s experiences and at policing. Geography became important, which is how I came to use GIS. I used Edinburgh Map Builder… although if you aren’t looking just at Edinburgh you can use Digimap, which has full UK coverage. I wasn’t technically minded but I came to use these tools because of my research. So I got my data from court records and archives… and put that into GIS, plotted the points on the map, and saw what changes and patterns occurred. Changes appear… and suggest new questions… like plotting entertainment venues etc. And I’ve used that in papers, online etc. I’m also working with MESH: Mapping Edinburgh’s Social History, a huge project looking at living, dying, making, feeding, drinking… a huge-scale project on Edinburgh.
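The workflow here – transcribe records from the archive, attach coordinates via a gazetteer, then aggregate the points for a map layer – can be sketched as below. The records and place coordinates are invented for illustration; in practice the geocoding step is where a tool like Digimap comes in:

```python
from collections import Counter

# Invented records in the spirit of the project: each entry is
# (year, offence, place) as transcribed from a court register.
records = [
    (1903, "solicitation", "Leith Walk"),
    (1911, "theft", "Grassmarket"),
    (1911, "solicitation", "Leith Walk"),
    (1924, "solicitation", "Princes Street"),
    (1931, "theft", "Grassmarket"),
]

# A tiny gazetteer: place name -> (latitude, longitude).
gazetteer = {
    "Leith Walk": (55.9650, -3.1767),
    "Grassmarket": (55.9473, -3.1968),
    "Princes Street": (55.9520, -3.1966),
}

def plot_points(records, gazetteer):
    """Return (lat, lon, count) tuples, ready for a GIS map layer."""
    counts = Counter(place for _, _, place in records)
    return [(*gazetteer[place], n) for place, n in counts.items()]

points = plot_points(records, gazetteer)
```

Once the data is in this shape, layering on other point sets (entertainment venues, say) is just another call to the same function, which is exactly how new questions start to suggest themselves.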

James: This is a blog site plus, I suppose. This was a project Anna and I were working on from 2011-2013, based on a very long walk that Ben Jonson took. I was lucky enough to turn up a manuscript by his travelling companion. I was exploring a text, annotating it, summarising it, and creating a book… but Anna had other ideas and we found new digital tools to draw out elements of the account… Despite being about a writer and a poet, it’s much more a documentary account of the journey itself. So within the blog we were able to create a map of the walk itself… with each point a place that Jonson and his companion visited. This was all manually created with Google Maps. It was fun but time consuming. Then we created a database used for this map, and there are markers for horses or coaches. We worked with Dave in our college web team, who was great at bringing this stuff together. For each place you could find the place, the dates visited, distance from the last point, details of food and drink etc. – we sort of tabulated the walk… and that plays to the strengths of the text. And we could calculate Jonson’s preferred walking speed… which seemed to be three miles per hour – which seems unlikely, as he was in his forties and 20 stone according to other accounts at the time…
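The walking-speed figure falls out of exactly the kind of arithmetic the tabulated walk makes possible: total distance over total walking time across the legs. A sketch with invented leg figures (the real numbers came from the tabulated distances and dates):

```python
# Invented legs of the walk: (miles covered, hours spent walking).
legs = [(12.0, 4.0), (15.0, 5.0), (9.0, 3.0)]

total_miles = sum(miles for miles, _ in legs)
total_hours = sum(hours for _, hours in legs)

# Average preferred walking speed in miles per hour.
avg_mph = total_miles / total_hours
```

With these made-up figures the average comes out at three miles per hour; the real calculation works the same way, leg by leg over the whole tabulated journey.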

Anyway, in addition we used the blog to track the walk, each post going live relative to the point that Jonson and his companion had reached. And the points on the map appeared to the same schedule – it gave people a reason to go back and revisit…

The most fun was the other bit…


Anna: I’m going to talk a bit about how we did that in real time. We wanted it to be creative… because we didn’t want to do the walk! And so we tweeted in real time, using a modernised version (and spelling) of the text, in the voice of the travelling companion, and chunked up into the appropriate portions of the day. It felt more convincing and authentic because it was so fixed in terms of timing. (See @benjonsonswalk). We did it on Facebook as well. And tweets showed on the blog so you could follow from tweet to blog… It unfolded in real time and always linked back to more detail about Ben Jonson’s walk on the blog.

Now… it was an add-on to the project, not in the original AHRC bid – we just built it in. It was 788 tweets. It was unbelievably time consuming! We preloaded the tweets on Hootsuite, so they were scheduled in advance but we could then interact as needed. It took a month to set up. And once up and running you have to maintain it; between us we did that, but it was 24/7. You have to reply, you have to thank people for following. We got over 1200 followers engaging. The fun bit was adding photos to tweets and the blog of, say, buildings from that time that still stand. What I wasn’t expecting was what we got back from the public… People tweeted or commented with information that we didn’t know… and that made it into the book and is acknowledged. It was real Knowledge Exchange in practice!
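The preloading step – chunking a day’s modernised text into tweet-sized pieces and assigning each a posting time for bulk upload into a scheduler like Hootsuite – can be sketched like this. The passage, the 140-character limit of the era, and the timings are all illustrative:

```python
import textwrap
from datetime import datetime, timedelta

def schedule_tweets(passage, start, chars_per_tweet=140, gap_minutes=90):
    """Split a day's passage into tweet-sized chunks and assign each a
    posting time, mimicking a bulk preload into a scheduling tool."""
    chunks = textwrap.wrap(passage, width=chars_per_tweet)
    return [(start + timedelta(minutes=i * gap_minutes), chunk)
            for i, chunk in enumerate(chunks)]

# An invented day's entry, in the companion's modernised voice.
day = ("We set out from Tottenham in the morning and dined at Waltham, "
       "where my gossip was welcomed by the host of the inn. In the "
       "afternoon we walked on to Hoddesdon through heavy rain, and "
       "were met at the town's edge by a crowd that had heard of our coming.")

queue = schedule_tweets(day, datetime(2013, 7, 8, 8, 0))
```

Even automated like this, the preload only covers the broadcast side; the replying and thanking that made the account feel alive still had to be done by hand, which is where the month of setup and the 24/7 maintenance came in.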

James: the Twitter factor got us major media interest from all the major newspapers, radio etc. It made a big impact.

Anna: Although more and more projects will be doing these things, we did have a novelty factor.

Jo: what was the best thing and the worst thing about what happened?

Greg: the best thing wasn’t digital, it was working with actors. We learned so much working together. Worst thing was… never work with trained ravens!

Ally: the best thing is that I’m quite a nerd, so I love finding little links and connections… I found out that Robert Louis Stevenson was friends with the daughter of James Young Simpson (?), who had discovered chloroform… There are lovely comments in her texts about Stevenson as a child, watching her father at work from his window. The worst thing is that I’m a stickler and a nerd: I want to start from scratch and learn everything and how it all works… The time commitment is huge.

Louise: the best thing was that I didn’t know I was interested in maps before, so that’s been brilliant. The worst part was having to get up to speed with that and make data fit the right format… but using existing tools can be a real time saver.

James: the best thing was the enthusiasm of people out there – I’m a massive nerd and Ben Jonson fan… seeing others’ interest was brilliant, particularly when you got flare-ups of interest as Ben Jonson went through their home town… The worst bit was being heckled by an incredibly rude William Shakespeare on Twitter!

Anna: the other connection with Shakespeare was that Jonson stayed at The George at Huntingdon. You have to hashtag everything, so we hashtagged the place. When we got there… the manager at The George wrote back to say that they stage a Shakespeare play every year in the courtyard. They didn’t know Jonson had stayed there… I love this posthumous meeting!

Q: what’s come across is how much you’ve learned and come to understand the tools you’ve been using. I wondered how that has changed your thinking, and perhaps future projects…

Anna: we were Luddites (nerdy, geeky Luddites) but we learned so, so much! A huge learning process. The best way to learn is by doing it; it’s the best way to learn those capabilities. You don’t have to do it all. Spot what you can, then go to the right person for help. As to the future… we were down in Yorkshire yesterday talking about a big digital platform across many universities working on Ben Jonson. Huge potential. The collaboration potential is exciting – possibly Europe-wide, even the US.

Ally: it can change the project… I looked at Omeka… I wanted to use everything, but you have to focus in on what you need to do… Be pragmatic, do what you can in the time; you can build on it later…

Jo: you are working on your own – would co-working work better?

Ally: it would be better with cross-pollination, multiple researchers working together. Initially I wanted to see what I could do, whether I could generate some interest. It started off with just me. I spoke to people at the NLS, who are quite interested in directing digitisation in helpful ways. Now I’m identifying others to work with… but I wanted to figure out what I could do as a starting point…

Louise: MESH is quite good for that. They are approaching people to do just part of what’s needed… so, plotting brothel locations – I’d already done that… but there were snippets of data to bring in. Working with a bigger team is really useful. Linda, who was at IASH last year, is doing a project in Sweden, and working on those projects has given me the confidence to potentially be part of that…

Greg: I was talking about big data with someone, and they said the key thing is when you move from the technology doing what you already could do, to it raising new questions, bringing something new… So we are thinking about how to make a miracle play with some real-looking miracles, in virtual ways…

Jo: isn’t plotting your way through a form of big data…?

Greg: it’s visualising something we had in our heads… Stage one is getting the play better known. When we get to stage two we can get to hearing their responses to it too…

Anna: interactions and crowdsourcing coming into the research process – that’s where we are going… building engagement into the project… Social media is very much part of the research process… There are some good English literature people doing stuff; some of Lisa Otty’s work is amazing. I’m developing a digital literature course… I’ve been following Lisa, also Elliot Lang (?) at Strathclyde… Us historians are maybe behind the crowd…

Ally: libraries are typically one step ahead of academics in terms of integrating academic tools and resources in accessible formats. So the Duncan Street caller lets you flick through floor plans of the John Murray Archive. It’s stunning. It’s a place to want to get to…

James Loxley: working on some of these projects has led to my working on a project with colleagues from Informatics, with St Andrews and with EDINA, to explore and understand how Edinburgh’s cityscape has evolved through literature. Big data, visualisation… partly about finding non-linear, non-traditional ways into the data. This really came from understanding the Ben Jonson’s walk text in a different structure, as a non-linear thing.

Q: what would you have done differently?

Louise: if I’d known up front how the data had to be cleaned and structured, I’d have done it that way to start with… knowing how I’d use it.

Ally: I would have made a more realistic assessment of what I needed to do, and done more research about the work involved. It would have been good to spend a few months looking at other opportunities and at people working in similar ways, rather than reinventing the wheel.

Greg: in a previous project we performed a play at Hampton Court – our only choice. We chose to make the central character not funny… in a comedy… A huge mistake. Always try to be funny…

Anna: I don’t think we messed up too badly…

James: I’d have folded funding into the original bid…

Anna: we managed to get some funding for the web team as pilot project thankfully. But yes, build it in. Factor it in early. I think it should be integral, not an add on.

Q: you mentioned using databases… What kinds have you used? And you mentioned Storify… I wondered how you used it? And what is the immersive environment for the drama?

Greg: I don’t think it exists yet. There is a discussion at Brunel between engineers, developers and my collaborator…

Ally: I think there is a project looking in this area…

Louise: I used access for my database…

James: to curate the map data we started in Excel… then Dave did magic to make it a Google Map. Storify was to archive those tweets, to have a store of that work basically…

Anna: there are courses out there – take them. I went on Digimap courses, ArcGIS courses, social media courses, which were really helpful. Just really embrace this stuff. And things change so fast…

And with that we draw to a close with thank yous to our speakers….


April 3, 2014, 1:57 pm