Jul 05 2016

This afternoon I’m at UCL for the “If you give a historian code: Adventures in Digital Humanities” seminar from Jean Bauer of the Center for Digital Humanities at Princeton University, who is being hosted by Melissa Terras of the UCL Centre for Digital Humanities. I’ll be liveblogging so, as usual, any corrections and additions are very much welcomed.

Melissa is introducing Jean, who is in London en route to DH 2016 in Kraków next week. Over to Jean:

I’m delighted to be here with all of the wonderful work Melissa has been doing here. I’m going to talk a bit about how I got into digital humanities, but also about how scholars in library and information sciences, and scholars in other areas of the humanities might find these approaches useful.

So, this image (American Commissioners of the Preliminary Peace Negotiations with Great Britain. By Benjamin West, London, England; 1783 (begun). Oil on canvas. (Unframed) Height: 28 ½” (72.3 cm); Width: 36 ¼” (92.7 cm). 1957.856) is by Benjamin West, the Treaty of Paris, 1783. This is the era that I research and what I am interested in. In particular I am interested in John Adams, the first minister of the United States – he even gets one line in Hamilton: the musical. He’s really interesting as he was very concerned with getting thinking and processes on paper, and for the work he did with Europe, where there hadn’t really been American foreign consuls before. He was also working on areas of North America, making changes that locked the British out of particular trading blocks through adjustments brought about by that peace treaty – and I might add that this is a weird time to give this talk in England!

Now, the foreign service at this time kind of lost contact once they reached Europe and left the US. So the correspondence is really important and useful to understand these changes. There are only 12 diplomats in Europe from 1775-1788, but that grows and grows with consuls and diplomats increasing steadily. And most of those consuls are unpaid as the US had no money to support them. When people talk about the diplomats of this time they tend to focus on future presidents etc. and I was interested in this much wider group of consuls and diplomats. So I had a dataset of letters, sent to John Jay, as he was negotiating the treaty. To use that I needed to put this into some sort of data structure – so, this is it. And this is essentially the world of 1820 as expressed in code. So we have locations, residences, assignments, letters, people, etc. Within that data structure we have letters – sent to or from individuals, to or from locations, and with dates assigned to them. And there are linkages here. Databases don’t handle fuzzy dates well, and I don’t want invalid dates, so I have Boolean logic here to flag which parts of a date are known. And also a process for handling enclosures – right now that’s letters but people did enclose books, shoes, statuettes – all sorts of things! And when you look at locations these connect to “in states”, and to states and location information… This data set falls within the Napoleonic wars, so none of the boundaries are stable, and the same location shifts in meaning/state depending on the date.
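As a rough illustration of the two ideas in that last paragraph (Boolean flags alongside a valid date, and locations whose state depends on the date), here is a minimal Python sketch. It is not the EAFSD schema: the class names, field names, and the place and state names are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FuzzyDate:
    """A valid calendar date plus Boolean flags for the parts actually known."""
    value: date
    month_known: bool = True
    day_known: bool = True

    def label(self) -> str:
        # Only show the parts of the date the source really gives us.
        if not self.month_known:
            return f"{self.value.year:04d}"
        if not self.day_known:
            return f"{self.value.year:04d}-{self.value.month:02d}"
        return self.value.isoformat()


@dataclass
class StateMembership:
    """Says that a location belonged to a state between two dates (hypothetical model)."""
    location: str
    state: str
    start: date
    end: Optional[date] = None  # None means the membership is open-ended


def state_on(location: str, when: date, memberships: List[StateMembership]) -> Optional[str]:
    """Return the state a location belonged to on a given date, if any."""
    for m in memberships:
        if m.location == location and m.start <= when and (m.end is None or when <= m.end):
            return m.state
    return None


# Placeholder data, invented purely for illustration.
memberships = [
    StateMembership("Harbourtown", "Old Republic", date(1780, 1, 1), date(1794, 12, 31)),
    StateMembership("Harbourtown", "New Republic", date(1795, 1, 1)),
]

print(FuzzyDate(date(1794, 5, 1), day_known=False).label())
print(state_on("Harbourtown", date(1794, 6, 1), memberships))
print(state_on("Harbourtown", date(1796, 6, 1), memberships))
```

The stored date is always a real calendar date, so the database never holds an invalid value; the flags record how much of it to trust.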

So, John Jay has all this correspondence between May 27 and Nov 19, 1794, and the letters are going from Europe to North America, and between the West Indies and North America. Many of these are reporting on trouble. From the West Indies there are ship seizures… And there are debts to Britain… And none of these issues get resolved in that treaty. Instead John Jay and Lord Grenville set up a series of committees – and this is the historical precedent for mediation. Which is why I was keen to understand what information John Jay had available. None of this correspondence got to him early enough. There wasn’t information there to resolve the issue, but enough to understand it. But there were delays for safety, for practical issues – the State Department was 6 people at this time – but the information was being collected in Philadelphia. So you have a centre collecting data from across the continent, but not able to push it out quickly enough…

And if you look at the people in these letters you see John Jay, and you see Edmund Jennings Randolph mentioned most regularly. So, I have this elaborate database (The Early American Foreign Service Database – EAFSD) and lots of ways to visualise this… Which enables us to see connections, linkages, and places where different comparisons highlight different areas of interest. And this is one of the reasons I got into the Humanities. There are all these papers – usually for famous historical men – and they get digitised, also the enclosures… In a single file(!), parsing that with a partial typescript, you start to see patterns. You see not summaries of information being shared, not aggregation and analysis, but the letters being bundled up and sent off – like a repeater note. So, building up all of this stuff… Letters are objects, they have relationships to each other, they move across space and time. You look at the papers of John Adams, or of any political leader, and they are just in order of date sent… Requiring us to flip back and forth. Databases and networks allow us to follow those conversations, to discover new orders in which to read those letters.
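To make the point about reading orders concrete, here is a small Python sketch that regroups a date-ordered list of letters into conversations between pairs of correspondents. It is only an illustration of the idea, not how EAFSD or Project Quincy does it; the field names and the letters themselves are invented placeholders.

```python
from collections import defaultdict
from datetime import date

# Placeholder letters in "published papers" order, i.e. by date sent.
letters = [
    {"sender": "Writer A", "recipient": "Writer B", "sent": date(1794, 6, 1)},
    {"sender": "Writer C", "recipient": "Writer B", "sent": date(1794, 6, 3)},
    {"sender": "Writer B", "recipient": "Writer A", "sent": date(1794, 6, 10)},
    {"sender": "Writer B", "recipient": "Writer C", "sent": date(1794, 6, 12)},
    {"sender": "Writer A", "recipient": "Writer B", "sent": date(1794, 6, 20)},
]


def conversations(letters):
    """Group letters by the unordered pair of correspondents, each thread in date order."""
    threads = defaultdict(list)
    for letter in letters:
        # frozenset ignores direction, so replies land in the same thread.
        pair = frozenset((letter["sender"], letter["recipient"]))
        threads[pair].append(letter)
    for thread in threads.values():
        thread.sort(key=lambda item: item["sent"])
    return dict(threads)


for pair, thread in conversations(letters).items():
    print(sorted(pair), [item["sent"].isoformat() for item in thread])
```

Each thread can then be read as a back-and-forth exchange, rather than flipping through one chronological sequence.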

Now, I had a background in code before I was a graduate student. What I do now at Princeton (as Associate Director of the Center for Digital Humanities) is to work with librarians and students to build new projects. We use a lot of relational databases, and network analysis… And that means a student like one I have at the moment can have a fully described, fully structured data set on a Vagrant machine that she can engage with, query, analyse, and convey to her examiners etc. Now this student was an Excel junkie but approaching the data as a database allows us to structure the data, to think about information, the nature of sources and citation practices, and also to get major demographic data on her group and the things she’s working on.

Another thing we do at Princeton is to work with libraries and with catalogue data – thinking about data in MARC, MODS, or METS records, and thinking about extracting and reformatting that data in order to query and rethink it. And we work with librarians on information retrieval, and how that could be translated to research – book history perhaps. Princeton University Library bought the personal library of philosopher Jacques Derrida – close to 19,000 volumes (they thought it was about 15,000 until they were unpacked), so two projects are happening simultaneously. One is at the Center for Digital Humanities, looking at how Derrida marked up the texts, and then went on to use and cite them in Of Grammatology. The other is with BibFrame – a Linked Open Data standard for library catalogues – and they are looking at books sent to Derrida, with dedications to him. Now there won’t be much overlap between those projects just now – Of Grammatology was his first book, so the books dedicated or gifted to him mostly come later. But we are building our databases for both projects as Linked Open Data, all being added a book at a time, so the hope is that we’ll be able to look at any relationships between the books that he owned and the way that he was using and being gifted items. And this is an experiment to explore those connections, and to expose that via the library catalogue… But the library wants to catalogue all works, not just those with research interest. And it can be hard to connect research work, with depth and challenge, back to the catalogue but that’s what we are trying to do. And we want to be able to encourage more use of and access to the works, without the library having to stand behind the work or analyse the work of a particular scholar.

So, you can take a data structure like this, then set up your system with appropriate constraints and affordances; these need to be thought about as they will shape what you can and will do with your data later on. Continents have particular locations, boundaries, shape files. But you can’t mark out the boundaries for empires and states. The Western boundary at this time is a very contested thing indeed. In my system states are merely groups of locations, so that I can follow mercantile power, and think from a political viewpoint. But I wanted a tool with broader use, hence that other data. Locations seem very safe and neutral but they really are not, they are complex and disputed. Now for that reason I wanted this tool – Project Quincy – to have others using it, but that hasn’t happened yet… Because this was very much created for my research and research question… It’s my own little Mind Palace for my needs… But I have heard from a researcher looking to catalogue those letters, and that would be very useful. Systems like this can have interesting afterlives, even if they don’t have the uptake we want Open Source Digital Humanities tools to have. The biggest impact of this project has been that I have the schema online. Some people do use the American Foreign Correspondents databases – it is one of the few places you can find this information, especially about consuls. But that schema being shared online has been helping others to make their own systems… In that sense the more open documentation we can do, the better all of our projects could be.

I also created those diagrams that you were seeing – with DAVILA, a programme that allows you to create easy to read, easy to follow, annotated, colour coded visuals. They are prettier than most database diagrams. I hope that when documentation is appealing and more transparent, it will get used more… That additional step helps people understand what you’ve made available for them… And you can use documentation to help teach someone how to make a project. So when my student was creating her schema, it was an example I could share or reference. Having something more designed was very helpful.


Q1) Can you say more about the Derrida project and that holy grail of hanging that other stuff on the catalogue record?

A1) So the BibFrame schema is not as flexible as you’d like, it’s based on MARC, but it’s Linked Open Data, it can be expressed in RDF or JSON… And that lets us link records up. And we are working in the same library so we can link up on people, locations, maybe also major terms, and on the accession id number too. We haven’t tried it yet but…

Q1) And how do you make the distinction between authoritative record and other data?

A1) Jill Benson(?)’s team are creating authoritative linked open data records for all of the catalogue. And we are creating Linked Open Data, we’ll put it in a relational database with an API and an endpoint to query to generate that data. Once we have something we’ll look at offering a Triple Store on an ongoing basis. So, basically it is two independent data structures growing side by side with an awareness of each other. You can connect via API but we are also hoping for a demo of the Derrida library in BibFrame in the next year or two. At least a couple of the books there will be annotated, so you can see data from under the catalogue.

Q1) What about the commentary or research outputs from that…

A1) So, once we have our data, we’ll make a link to the catalogue and pull in from the researcher system. The link back to the catalogue is the harder bit.
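The idea in these answers, two independent data structures linked on a shared identifier such as the accession id, could be sketched in Python along the following lines. This is an assumption-laden illustration only: the field names and records are invented, and it does not use BibFrame's actual vocabulary or either project's real data.

```python
# Hypothetical catalogue-side records (the authoritative data).
catalogue_records = [
    {"accession_id": "ACC-001", "title": "Example Title One"},
    {"accession_id": "ACC-002", "title": "Example Title Two"},
]

# Hypothetical research-side records, kept in a separate system.
research_records = [
    {"accession_id": "ACC-001", "annotation_count": 12},
    {"accession_id": "ACC-003", "annotation_count": 3},
]


def link_on_accession(catalogue, research):
    """Attach research data to catalogue records without merging the two sources."""
    by_id = {record["accession_id"]: record for record in research}
    linked = []
    for entry in catalogue:
        # The catalogue record stays as it is; research data hangs off it (or is None).
        linked.append({**entry, "research": by_id.get(entry["accession_id"])})
    return linked


for row in link_on_accession(catalogue_records, research_records):
    print(row["accession_id"], row["title"], row["research"])
```

Keeping the research data under its own key reflects the distinction raised in the question: the authoritative record is never overwritten by the research layer.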

Q2) I had a suggestion for a geographic system you might be interested in called Pelagios… And I don’t know if you could feed into that – it maps historical locations, fictional locations etc.

A2) There is a historical location atlas held by the Newberry so there are shapefiles. Last I looked at Pelagios it was concerned more with the ancient world.

Comment) Latest iteration of funding takes it to Medieval and Arabic… It’s getting closer to your period.

A2) One thing that I really like about Pelagios is that they have split locations from their names, which accommodates multiple names, multiple imaginings and understandings etc. It’s a really neat data model. My model is more of a hack – so in mine “London” is at the centre of modern London… Doesn’t make much sense for London but I do similar for Paris, where that probably makes more sense. So you could go in deeper… There was a time when I was really interested in where all of Jay’s London correspondents were… That was what got me thinking about network analysis… 60 letters are within London alone. I thought about disambiguating it more… But I was more interested in the people. So I went down a Royal Mail in London 1794 rabbit hole… And that was interesting, thinking about letters as a unit of information… Diplomatic notes fix conversations into a piece of paper you can refer to later – capturing the information and decisions. They go back and forth… So the ways letters came and went across London – sometimes several per day, sometimes over a week within the city – are really interesting… London was and is extremely complicated.
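The separation of a place from its names can be sketched in a few lines of Python. This is only an illustration of the general idea described in the answer above, not Pelagios's actual data model; the class names, identifiers, and fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlaceName:
    """One name a place has been known by (hypothetical fields)."""
    text: str
    language: str = ""


@dataclass
class Place:
    """A location identified independently of any one name."""
    place_id: str
    lat: float
    lon: float
    names: List[PlaceName] = field(default_factory=list)


def find_by_name(places: List[Place], text: str) -> List[Place]:
    """Return every place that has been known by the given name (case-insensitive)."""
    wanted = text.casefold()
    return [p for p in places if any(n.text.casefold() == wanted for n in p.names)]


# Illustrative data: a single point standing in for the whole city.
places = [
    Place("place-1", 51.5074, -0.1278, [
        PlaceName("London", "en"),
        PlaceName("Londres", "fr"),
        PlaceName("Londinium", "la"),
    ]),
]

print([p.place_id for p in find_by_name(places, "londres")])
```

Because names hang off the place rather than identifying it, adding another name or spelling never requires a second location record.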

Q3) I was going to ask about different letters. Those letters in London sound more like memos than a letter. But the others being sent are more precarious, at more time delay… My background is classics so there you tend to see a single letter – and you’d commission someone like Cicero to write a letter to you to stick up somewhere – but these letters are part of a conversation… So what is the difference in these transatlantic letters?

A3) There are lots of letters. I treat letters capaciously… If there is a “to” or “from” it’s in. So there are diplomatic notes between John Jay and George Hammond – a minister not an ambassador as the US didn’t warrant that. Hammond was bad at his job – he saw a war coming and therefore didn’t see value in negotiating. They exchange notes, forward conversations back and forth. My data set for my research was all the letters sent to Jay, not those sent by Jay. I wanted to see what information Jay had available. With Hammond, he kept a copy of all his letters to Jay, as evidence for very petty disputes. The letters from the West Indies were from Nathanial Cabbot Dickinson, who was sent as an information collector for the US government. Jay was sent to Europe on the treaty… So the kick off for Jay’s treaty is changes that see food supplies to the British West Indies being stopped. Hammond actually couldn’t find a ship to take evidence against admiralty courts… They had to go through Philadelphia, then through London. So that cluster of letters includes older letters. Letters from the coast include complaints from angry American consuls… There are urgent cries for help from the US. There is every possible genre… One of the things I love about American history is that Jay needs all the information he can get. When you map letters – like the Republic of Letters project at Stanford – you have this issue of someone writing to their tailor, not just important political texts. But for diplomats all information matters… Now you could say that a letter to a tailor is important but you could also say you are looking to map the boundaries of intellectual history here… Now in my system I map duplicates sent transatlantically, as those really matter, not all arrived, etc. I don’t map duplicates within London, as that isn’t as notable and is more about after the fact archiving.

Q4) Did John Jay keep diaries that put this correspondence in context?

A4) He did keep diaries… I do have analysis of how John Quincy Adams wrote letters in his time. He created subject headings, he analysed them, he recreated a filing system and way of managing his letters – he’d docket his letters, noting date received. He was like a human database… Hence naming my database after him.

Q5) There are a couple of different types of use for a tool like this. There is your use and then there is reuse of the engineering. I have correspondence earlier than Jay’s, mainly centred on London… Could I download the system and input my own letters?

A5) Yes, if you go to eafsd.org you’ll find more information there and you can try out the system. The database is Project Quincy and that’s on GitHub (GPL 3.0) and you can fire it up in Django. It comes with a nice interface. And do get in touch and I’ll update you on the system etc. It runs in the Django framework, can use any database underneath it. And there may be a smaller tractable letter database running underneath it.

Comment) On BibFrame… We have a Library and Information Studies programme in which we teach BibFrame. We set up a project with a teaching tool which is also on GitHub – it’s linked from my staff page.

A quick note as follow up:

If you have research software that you have created for your work, and which you are making available under an open source license, then I would recommend looking at some of the dedicated metajournals that will help you raise awareness of your project and ensure it is well documented for others to reuse. I would particularly recommend the Journal of Open Research Software (which, for full disclosure, I sit on the Editorial Advisory Board for), or the Journal of Open Source Software (as recommended by the lovely Daniel S. Katz in response to my post).

