Today I’m at the Cataloguing and Indexing Group Scotland event – their 7th Metadata & Web 2.0 event – Somewhere over the Rainbow: our metadata online, past, present & future. I’m blogging live so, as usual, all comments, corrections, additions, etc. are welcome.
Paul Cunnea, CIGS Chair is introducing the day noting that this is the 10th year of these events: we don’t have one every year but we thought we’d return to our Wizard of Oz theme.
On a practical note, Paul notes that if we have a fire alarm today we’d normally assemble outside St Giles Cathedral but as they are filming The Avengers today, we’ll be assembling elsewhere!
There is also a cupcake competition today – expect many baked goods to appear on the hashtag for the day #cigsweb2. The winner takes home a copy of Managing Metadata in Web-scale Discovery Systems / edited by Louise F Spiteri. London : Facet Publishing, 2016 (list price £55).
Engaging the crowd: old hands, modern minds. Evolving an on-line manuscript transcription project / Steve Rigden with Ines Byrne (not here today) (National Library of Scotland)
Ines has led the development of our crowdsourcing side. My role has been on the manuscripts side. Any transcription is about discovery. For the manuscripts team we have to prioritise digitisation so that we can deliver digital surrogates that open up access. Transcription hugely opens up texts but it is time consuming and that time may be better spent on other digitisation tasks.
OCR has issues but works relatively well for printed texts. Manuscripts are a different matter – handwriting, ink density, paper, all vary wildly. The REED(?) project is looking at what may be possible but until something better comes along we rely on human effort. Generally the manuscript team do not undertake manual transcription, but do so for special exhibitions or very high priority items. We also have the challenge that so much of our material is still under copyright so cannot be done remotely (but can be accessed on site). The expected user community generally can be expected to have the skill to read the manuscript – so a digital surrogate replicates that experience. That being said, new possibilities shape expectations. So we need to explore possibilities for transcription – and that’s where crowd sourcing comes in.
Crowd sourcing can resolve transcription, but issues with copyright and data protection still have to be resolved. It has taken time to select suitable candidates for transcription. In developing this transcription project we looked to other projects – like Transcribe Bentham which was highly specialised, through to projects with much broader audiences. We also looked at transcription undertaken for the John Murray Archive, aimed at non specialists.
The selection criteria we decided upon were:
- Hands that are not too troublesome.
- Manuscripts that have not been re-worked excessively with scoring through, corrections and additions.
- Documents that are structurally simple – no tables or columns for example where more complex mark-up (tagging) would be required.
- Subject areas with broad appeal: genealogies, recipe books (in the old crafts of all kinds sense), mountaineering.
Based on our previous John Murray Archive work we also want the crowd to provide us with structured text, so that it can be easily used, by tagging the text. That’s an approach that is borrowed from Transcribe Bentham, but we want our community to be self-correcting rather than doing QA of everything going through. If something is marked as finalised and completed, it will be released with the tool to a wider public – otherwise it is only available within the tool.
The approach could be summed up as keep it simple – and that requires feedback to ensure it really is simple (something we did through a survey). We did user testing on our tool, which particularly confirmed that users just want to go in and use it, and want it to be intuitive – that’s a problem with transcription and mark up so there are challenges in making that usable. We have a great team who are creative and have come up with solutions for us… But meanwhile other projects have emerged. If the REED project is successful in getting machines to read manuscripts then perhaps these tools will become redundant. Right now there is nothing out there or in scope for transcribing manuscripts at scale.
So, let’s take a look at Transcribe NLS…
You have to log in to use the system. That’s mainly to help limit malicious or erroneous data. Once you log into the tool you can browse manuscripts, and you can also filter by the completeness of the transcription and the grade of the transcription – we ummed and ahhed about including that but we thought it was important to include.
Once you pick a text you click the button to begin transcribing – you can enter text, special characters, etc. You can indicate if text is above/below the line. You can mark up where a figure is. You can tag whether the text is not in English. You can mark up gaps. And you can mark that an area is a table. It’s all quite straightforward.
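To make the tagging options concrete, here is a minimal sketch of what one marked-up line might look like. The talk did not show the tool's actual tag set, so this is an assumption: it borrows three standard TEI elements (`add`, `gap`, `foreign`), and the sample recipe text is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

def mark_up_line() -> ET.Element:
    """Build one transcribed line using a handful of TEI-style elements.

    The element names are standard TEI; whether Transcribe NLS uses
    exactly these is an assumption, and the text itself is made up.
    """
    line = ET.Element("p")
    line.text = "Boil the roots "
    # Text written above the line (an insertion by the writer).
    add = ET.SubElement(line, "add", {"place": "above"})
    add.text = "gently"
    add.tail = " for "
    # A gap where the transcriber could not read the hand.
    gap = ET.SubElement(line, "gap", {"reason": "illegible"})
    gap.tail = " hours, "
    # A phrase that is not in English (Latin in this invented example).
    foreign = ET.SubElement(line, "foreign", {"xml:lang": "la"})
    foreign.text = "quantum satis"
    return line

tei_line = ET.tostring(mark_up_line(), encoding="unicode")
print(tei_line)
```

Keeping the tag set this small is what makes it plausible for non-specialist volunteers: each button in the editor would map to one element.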
Q1) Do you pick the transcribers, or do they pick you?
A1) Anyone can take part but they have to sign up. And they can indicate a query – which comes to our team. We do want to engage with people… As the project evolves we are looking at the resources required to monitor the tool.
Q2) It’s interesting what you were saying about copyright…
A2) The issue of copyright here is about sharing off site. A lot of our manuscripts are unpublished. We use exceptions such as the 1956 Copyright Act for old works whose authors had died. The selection process has been difficult, working out what can go in there. We’ve also cheated a wee bit…
Q3) What has the uptake of this been like?
A3) The tool is not yet live. We think it will build quite quickly – people like a challenge. Transcription is quite addictive.
Q4) Are there enough people with palaeography skills?
A4) I think that most of the content is C19th, where handwriting is the main challenge. For much older materials we’d hit that concern and would need to think about how best to do that.
Q5) You are creating these documents that people are reading. What is your plan for archiving these?
A5) We do have a colleague considering and looking at digital preservation – longer term storage being more the challenge. It would be part of our normal digital preservation scheme.
Q6) Are you going for a Project Gutenberg model? Or have you spoken to them?
A6) It’s all very localised right now, just seeing what happens and what uptake looks like.
Q7) How will this move back into the catalogue?
A7) Totally manual for now. It has been the source of discussion. There was discussion of pushing things through automatically once transcribed to a particular level but we are quite cautious and we want to see what the results start to look like.
Q8) What about tagging with TEI? Is this tool a subset of that?
A8) There was a John Murray Archive transcription project, including mark up and tagging. There was a handbook for that. TEI is huge but there is also TEI Lite – the JMA used a subset of the latter. I would say this approach – that subset of TEI Lite – is essentially TEI Very Lite.
Q9) Have other places used similar approaches?
A9) Transcribe Bentham is similar in terms of tagging. The University of Iowa Civil War Archive has also had a similar transcription and tagging approach.
Q10) The metadata behind this – how significant is that work?
A10) We have basic metadata for these. We have items in our digital object database and simple metadata goes in there – we don’t replicate the catalogue record but ensure it is identifiable, log date of creation, etc. And this transcription tool is intentionally very basic at the moment.
Can web archiving the Olympics be an international team effort? Running the Rio Olympics and Paralympics project / Helena Byrne (British Library)
I am based at the UK Web Archive, which is based at the British Library. The British Library is one of the six legal deposit libraries. The BL are also a member of the International Internet Preservation Consortium – as are the National Library of Scotland. The Content Development Group works on any project with international relevance and a number of interested organisations.
Last year I was lucky enough to be lead curator on the Olympics 2016 Web Archiving project. We wanted to get a good range of content. Historically our archives for Olympics have been about the events and official information only. This time we wanted the wider debate, controversy, fandom, and the “e-Olympics”.
We received a lot of nominations for sites. This is one of the biggest projects we have been involved in. There were 18 IIPC members involved in the project, but nominations also came from beyond the membership. We think this will be a really good resource for those researching the events in Rio. We had material in 34 languages in total. English was the top language collected – reflecting IIPC memberships to some extent. In terms of what we collected it included Official IOC materials – but few as we have a separate archive across Games for these. But subjects included athletes, teams, gender, doping, etc. There were a large number of website types submitted. Not all material nominated was collected – some had incomplete metadata, unsuccessful crawls, or duplicate nominations, and the web is quite fragile still and some links were already dead when we reached them.
There were four people involved here, myself, my line manager, the two IIPC chairs, and the IIPC communications person (also based at BL). We designed a collection strategy to build engagement as well as content. The Olympics is something with very wide appeal and lots of media coverage around the political and Zika situation so we did widen the scope of collection.
Thinking about our users we had collaborative tools that worked with contributors’ contexts: Webex, Google Drive and Maps, and Slack (free for many contexts), which was really useful. Chapter 8 in “Altmetrics” is great for alternatives to Google – it is important to have those as it’s simply not accessible in some locations.
We used mostly Google Sheets for IIPC member nominations – 15 fields, 6 of which were obligatory. For non members we used a (simplified) Google Form – shared through social media. Some non IIPC member organisations used this approach – for instance a librarian in Hawaii submitted lots of pacific islands content.
In terms of communicating the strategy we developed instructional videos (with free tools – Screencastomatic and Windows Movie Maker) with text and audio commentary, print summaries, emails, and public blog posts. Resources were shared via Google Drive so that IIPC members could download and redistribute them.
No matter whether IIPC member or through the nomination form, we wanted six key fields:
- URL – free form
- Event – drop down option
- Title – free form (and English translation option if relevant)
- Olympic/Paralympic sport – drop down option
- Country – free form
- Contributing organisation – free form (for admin rather than archive purposes)
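As a rough illustration of how those six key fields could be checked before a nomination is accepted, here is a small sketch. The field names and the dictionary layout are assumptions made for the example; the project itself used Google Sheets and a Google Form rather than code like this.

```python
# Hypothetical internal names for the six key fields listed above.
REQUIRED_FIELDS = (
    "url",
    "event",
    "title",
    "sport",
    "country",
    "contributing_organisation",
)

def missing_fields(nomination: dict) -> list:
    """Return the required fields that are absent or blank in a nomination."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(nomination.get(name, "")).strip()
    ]

# An invented nomination with two of the six fields left empty.
example = {
    "url": "https://example.org/rio/",
    "event": "Olympics",
    "title": "Example fan site",
    "sport": "",
    "country": "Brazil",
}
print(missing_fields(example))
```

A check like this only catches blank fields; it cannot tell whether a free-form value such as the country is written consistently.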
There are no international standards for cataloguing web archive data. OCLC have a working group looking at this just now – they are due to report this year. One issue that has been raised is the context of those doing the cataloguing – cataloguing versus archiving.
Communications are essential on a regular basis – there was quite a long window of nomination and collection across the summer. We had several pre-event crawl dates, then also dates during and after both the Olympics and the Paralympics. I would remind folk about this, and provide updates on that, on what was collected, to share that map of content collected. We also blogged the project to engage and promote what we were doing. The participants enjoyed the updates – it helped them justify time spent on the project to their own managers and organisations.
There were some issues along the way…
- The trailing slash is required for the crawler – so if there is no trailing slash the crawler takes everything it can find – attempting all of BBC or Twitter is a problem.
- Not tracking the date of nomination – e.g. organisations adding to the spreadsheet without updating date of nomination – that was essential to avoid duplication so that’s a tip for Google forms.
- Some people did not fill in all of the six mandatory fields (or didn’t fill them in completely).
- Country name vs Olympic team name. That is unexpectedly complex. Team GB includes England, Scotland, Wales and Northern Ireland… But Northern Irish athletes can also compete for Ireland. Palestine isn’t recognised as a country in all places, but it is in the Olympics. And there was a Refugee Team as well – with no country to tie to. Similar issues of complexity came out of organisation names – there are lots of ways to write the name of the British Library for instance.
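Two of these issues, the missing closing "/" on a nominated URL and duplicate nominations, lend themselves to a simple pre-crawl check. The sketch below is an assumption about how that could be done, not the project's actual workflow; the heuristics (treating a last path segment without a dot as a directory, ignoring "www." and the scheme when comparing) are choices made for the example.

```python
from urllib.parse import urlsplit

def needs_trailing_slash(url: str) -> bool:
    """Flag URLs that look like a directory but lack the closing '/'."""
    path = urlsplit(url).path
    if path.endswith("/"):
        return False
    last_segment = path.rsplit("/", 1)[-1]
    # A dot suggests a file such as index.html, which needs no slash.
    return "." not in last_segment

def dedupe_key(url: str) -> str:
    """Reduce a URL to a rough key so near-identical nominations match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

nominations = [
    "http://www.bbc.co.uk/sport/olympics",
    "https://example.org/rio/",
    "http://WWW.Example.org/rio",
]
flagged = [u for u in nominations if needs_trailing_slash(u)]
print(flagged)
```

Recording the nomination date alongside each key would then show which of two matching entries came first.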
We promoted the project with four blog posts sharing key updates and news. We had limited direct contact – mostly through email and Slack/messaging. We also had a unique hashtag for the collection #Rio2016WA – not catchy but avoids confusion with Wario (Nintendo game) – and Twitter chat, a small but international chat.
Ethically we only crawl public sites but the IIPC also have a take down policy so that anyone can request their site be removed.
Conclusions… Be aware of any cultural differences with collaborators. Know who your users are. Have a clear project plan, available in different mediums. And communicate regularly – to keep enthusiasm going. And, most importantly, don’t assume anything!
Finally… Web Archiving Week is in London in June, 12th-16th 2017. There is a “Datathon” but the deadline is Friday! Find out more at http://netpreserve.org/general-assembly/2017/overview. And you can find out more about the UK Web Archive via our website and blog: webarchive.org.uk/blog. You can also follow us and the IIPC on Twitter.
Explore the Olympics archive at: https://archive-it.org/collections/7235
Q1) For British Library etc… Did you use a controlled vocabulary?
A1) No but we probably will next time. There were suggestions/autocomplete. Similarly for countries. For Northern Irish sites I had to put them in as Irish and Team GB at the same time.
Q2) Any interest from researchers yet? And/or any connection to those undertaking research – I know internet researchers will have been collecting tweets…
A2) Colleagues in Rio identified a PhD project researching the tweets – very dynamic content so hard to capture. Not huge amount of work yet. I want to look at the research projects that took place after the London 2012 Olympics – to see if the sites are still available.
Q3) Anything you were unable to collect?
A3) In some cases articles are only open for short periods of time – we’d do more regular crawls of those nominations next time I think.
Q4) What about Zika content?
A4) We didn’t have a tag for Zika, but we did have one for corruption, doping, etc. Lots of corruption post event after the chair of the Irish Olympic Committee was arrested!
Statistical Accounts of Scotland / Vivienne Mayo (EDINA)
I’m based at EDINA and we run various digital services and projects, primarily for the education sector. Today I’m going to talk about the Statistical Accounts of Scotland. These are a hugely rich and valuable collection of statistical data that span both the agricultural and industrial revolutions in Scotland. The online service launched in 2001 but was thoroughly refreshed and relaunched next year.
There are two accounts. The first set was created (1791-1799) by Sir John Sinclair of Ulbster. He had a real zeal for agricultural data. There had been attempts to collect data in the 16th and 17th centuries. So Sir John set about a plan to get every minister in Scotland to collect data on their parishes. He was inspired by German surveys but also had his own ideas for his project:
“an inquiry into the state of a country, for the purpose of ascertaining the quantum of happiness enjoyed by its inhabitants, and the means of its future improvement”
He also used the word “Statistics” as a kind of novel, interesting term – it wasn’t in wide use. And the statistics in the accounts are more qualitative than the quantitative data we associate with the word today.
Sir John sent ministers 160 questions, then another 6, then another set a year later so that there were 171 in total. So you can imagine how delighted they were to receive that. And the questions (you can access them all in the service) were hard to answer – asking about the wellbeing of parishioners, how their circumstances could be ameliorated… But ministers were paid by the landowners who employed their parishioners so that data also has to be understood in context. There were also more factual questions on crops, pricing, etc.
It took a long time – 8 years – to collect the data. But it was a major achievement. And these accounts were part of a “pyramid” of data for the agricultural reports. He had county reports, but also higher level reports. This was at the time of the Enlightenment and the idea was that with this data you could improve the condition of life.
Even though the ministers did complete their returns, for some it was a struggle – and certainly hard to be accurate. Population tables were hard to get correct, especially in the context of scepticism that this data might be used to collect taxes or other non-beneficial purposes.
The Old Account was a real success. And the Church of Scotland commissioned a New Account from 1834-45 as a follow up to that set of accounts.
The online service was part of one of the biggest digitisation projects in Scotland in the late 1990s, with the accounts going live in 2001. But much had changed since then in terms of functionality that any user might expect. In this new updated service we have added the ability to tag, to annotate, to save… Transcriptions have been improved, the interface has been improved. We have also made it easier to find associated resources – selected by our editorial board drawn from libraries, archives, specialists on this data.
When Sir John published the Old Accounts he printed them in volumes as they were received – that makes it difficult to browse and explore those. And there can be multiple accounts for the same parish. So we have added a way to browse each of the 21 volumes so that it is easier to find what you need. Place is key for our users and we wanted to make the service more accessible. Page numbers were an issue too – our engineers provide numbering of sections – so if you look for Portpatrick – you can find all of the sections and volumes where that area occurs. Typically sections are a parish report, but it can be other types of content too – title pages, etc.
Each section is associated with a Parish – which is part of a county. And there may be images (illustrations such as coal seams, elevations of notable buildings in the parish, etc.). Each section is also associated with pages – including images of the pages – as well as transcripts and indexed data used to enable searching.
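The relationships described here (a section belongs to a parish, a parish to a county, and a section carries pages with images and transcripts) can be sketched as a small data model. The class and field names below are hypothetical, as are the volume and page values; this is not the service's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    number: int
    image_file: str   # scanned page image
    transcript: str   # text used for reading and for search indexing

@dataclass
class Section:
    account: str      # "Old" or "New"
    volume: int
    parish: str
    county: str
    pages: list = field(default_factory=list)
    illustrations: list = field(default_factory=list)

def sections_for_parish(sections: list, parish: str) -> list:
    """Find every section for a parish, across accounts and volumes."""
    wanted = parish.lower()
    return [s for s in sections if s.parish.lower() == wanted]

# Placeholder records: the volume and page numbers are invented.
sections = [
    Section("Old", 1, "Portpatrick", "Wigtown",
            pages=[Page(1, "placeholder-1.jpg", "placeholder transcript")]),
    Section("New", 2, "Portpatrick", "Wigtown"),
    Section("Old", 3, "Dumfries", "Dumfries"),
]
print(len(sections_for_parish(sections, "portpatrick")))
```

Keying everything on the section rather than the printed page number is what lets a place-based search pull together reports scattered across volumes.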
So, if I search for tea drinking… Described as a moral menace in some of the earlier accounts! A search like this identifies associated sections, the related resources, and associated words – those words that often occur with the search term. For tea-drinking “twopenny” is often associated… Following that thread I found an account from the County of Forfar from 1793… And this turns out to be the slightly alarming sounding home brew…
“They make their own malt, and brew it into that kind of drink called Two-penny which, till debased in consequence of multiplied taxes, was long the favourite liquor of all ranks of people in Dundee.”
When you do look at a page like this you can view the transcription – which tends to be easier to read than the scanned pages with their flourishes and “f” instead of “s”. You can tag, annotate, and share the pages. There are lots of ways to explore and engage with the text.
There are lots of options to search the service – simple search, advanced search, and new interactive maps of areas and parishes – these use historic maps from the NLS collections and are brand new to the service.
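The “associated words” shown with search results could be produced in several ways, and the talk did not say how the service does it. As one possible approach, this sketch counts which other words appear in the same sections as the search term; the sample texts and the stopword list are invented for the example.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = frozenset({"and", "the", "was", "were", "both", "to", "of"})

def associated_words(texts: list, term: str, top: int = 3) -> list:
    """Return the words that most often share a section with `term`."""
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if term in words:
            # Count each co-occurring word once per section.
            counts.update(w for w in words if w != term and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top)]

sample_texts = [
    "Tea and twopenny were both drunk",
    "Twopenny was preferred to tea",
    "The harvest was poor",
]
print(associated_words(sample_texts, "tea", top=1))
```

Counting per section rather than per occurrence stops one long parish report from dominating the results.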
With all these new features we’d love to hear your feedback when you do take a look at the service – do let us know how you find it.
I wanted to show an example of change and illustration here. The Old Accounts of Dumfries (Vol 5, p. 119) talk about the positive improvements to housing and the idea of “improvement” as a very positive thing. We also see an illustration from the New Accounts of old habitations and new modern house of the small tenants – but that was from a Parish owned by the Duke of Sutherland who had a notorious reputation as a brutal landlord for clearing land and murdering tenants to make these “improvements”. So, again one has to understand the context of this content.
Looking at Dumfries in the Old Accounts things looked good, some receiving poor support. The increase in industry means that by the New Accounts the population has substantially grown, as has poverty. The minister also comments on the impact of the three inns in town, the increase in poaching. Transitory population can also affect health – there is a vivid account of a cholera outbreak from 15th Sept – 27th Nov in 1832. That seems relatively recent but at that point they thought transmission was through the air, they didn’t realise it was water borne until some time later.
Some accounts, like that one, are highly descriptive. But many are briefer or less richly engaging. Deaths are often carefully captured. The minister for Dumfries put together a whole table of deaths – causes of which include, surprisingly, teething. And there are also records of healthcare and healthcare costs – including one individual paying for several thousand children to be inoculated against smallpox.
Looking at the schools near us here in central Edinburgh there was free education for some poor children. But schooling mostly wasn’t free. For a farm labourer, the cost of teaching one child reading and writing would be a 12th of your salary. To climb the social ladder with e.g. French, Latin, etc. the teaching was far more expensive. And indeed there is a chilling quote in the New Accounts from Cadder, County of Lanark (Vol 8, P. 481) which spoke of attitudes that education was corrupting for the poor. This was before education became mandatory (in 1872).
There is also some colourful stuff in the Accounts. There is a lot of witchcraft, local stories, and folk stories. One of my colleagues found a lovely story about a tradition that the last person buried in one area “manned the gates” until the next one arrived. Then one day two people died and there were fisticuffs!
I was looking for something else entirely and found, in Fife, a story of a girl who set sail from Greenock, was captured by pirates, was sold into a harem, and became a princess in Morocco – there’s a book called The Fourth Queen based on that story.
There is an anvil known as the “Reformation Cloth” – pre-reformation there was a blacksmith who thought the Catholic priest was having an affair with his wife… And took his revenge by attacking the offending part of the minister on that anvil. I suspect that there may have been some ministerial stuff at play here too – the parish minister notes that “no other catholic minister replaced him” – but it is certainly colourful.
And that’s all I wanted to share today. Hopefully I’ve piqued your interest. You can browse the accounts for free and then some of the richer features are part of our subscription service. Explore the Statistical Accounts of Scotland at: http://stataccscot.edina.ac.uk/. You can also follow us on Twitter, Facebook, etc.
Q1) Solr indexing and subject headings – can you say more?
A1) They used subject headings from original transcriptions. And then there were some additions made based on those.
Comment) The Accounts are also great for Wikipedia editing! I found references to Christian Shaw, a thread pioneer I was looking to build a page about. She is in the Accounts as she was mentioned in a witchcraft trial that is included there. It can be a really useful way to find details that aren’t documented elsewhere.
Q2) You said it was free to browse – how about those related resources?
A2) Those related resources are part of the subscription services.
Q3) Any references to sports and leisure?
A3) Definitely to festivals, competitions, events etc. As well as some regular activities in the parish.
Beyond bibliographic description: emotional metadata on YouTube / Diane Pennington (University of Strathclyde)
I want to start with this picture of a dog in a dress…. How do you feel when you see this picture? How do you think she was feeling? [people in the room guess the pup might be embarrassed].
So, this is Tina, she’s my dog. She’s wearing a dress we had made for her when we got married… And when she wears it she always looks so happy… And people, when I shared it on social media, also thought she looked happy. And that got me curious about emotion and emotional responses… That isn’t accommodated in bibliographic metadata. As a community we need to think about how this material makes us feel, how else can we describe things? When you search for music online mood is something you might want to see… But usually it’s recommendations like “this band is similar to…”. My favourite band is U2 and I get recommended Coldplay… And that makes me mad, they aren’t similar!
So, when we teach and practice ILS, we think about information as text that sits in a database, waiting for a user to write a query and get a match. The problem is that there are so many other ways that people also want to look for information – not just bibliographic information, full text, but in other areas too, like bodily – what pain means (Yates 2015); photographs, videos, music (Rasmussen Neal, 2012) – where the full text doesn’t include the search terms or keywords inherently; “matter and energy” (Bates, 2006) – that there is information everywhere and the need to think more broadly to describe this.
I’ve been working in this area for a while and I started looking at Flickr, at pictures that are tagged “happy”. Those tend to include smiling people, holiday photos, sunny days, babies, cute animals. Relevance rankings showed “happy” more often, people engaged and liked more with happy photos… But music is different. We often want music that matches our mood… There were differences to tags and understanding music… Heavy metal sounds angry, slower or minor key music sounds sad…
So, the work I’m talking about you can also find in an article published last year.
My work was based on the U2 song, Song for Someone. And there are over 150 fan videos created for this song. And if I show you this one (by Dimas Fletcher) you’ll see it has high production values… The song was written by Bono for his wife – they’ve been together since they were teenagers, and it’s very slow and emotional, and reminisces about being together. So this video is a really different interpretation.
Background to this work, and theoretical framework for it, includes:
- “Basic emotions” from cognition, psychology, music therapy (Ekman, 1992)
- Emotional Information Retrieval
- Domains of fandom and aca-fandom (Stein & Busse, 2009; Bennett, 2014)
- Online participatory culture, such as writing fan fiction or making cover versions of videos for love songs (Jenkins, 2013)
- U2 academic study – and u2conference.com
- Intertextuality as a practice in online participatory culture (Varmacelli 2013?)
So I wanted to do a discourse analysis (Budd & Raber 1996, Iedema 2003) applied to intertextuality. And I wanted to analyse the emotional information conveyed in 150 YouTube cover videos of U2’s Song for Someone. And also a quantitative view of views, comments, likes and dislikes – indicating response to them.
The producers of these videos created lots of different types of videos. Some were cover versions. Some were original versions of the song with new visual content. Some were tutorials on how to play the song. And then there were videos exhibiting really deep personal connections with the song.
So the cover versions are often very emotional – a comment says that. That emotion level is metadata. There are videos in context – background details, kids dancing, etc. But then some are filmed out of a plane window. The tutorials include people, some annotated “karaoke piano” tutorials…
Intertextuality… You need to understand your context. So one of the videos shows a guy in a yellow cape who is reaching and touching the Achtung Baby album cover before starting to sing. In another video a person is in the dark, in shadow… But there are the Song for Someone lyrics and title on the wall, and then they play it mashed up with another song. In another video the producer and his friend try to look like U2.
Then we have the producers’ comments and descriptions that add greatly to understanding those videos. Responses from consumers – more likes than dislikes; almost all positive comments – this is very different from some Justin Bieber YouTube work I did a while back. You see comments on the quality of the cover, on the emotion of the song.
The discussion is an expression of emotion. The producers show tenderness, facial expressions, surrounds, music elements. And you see social construction here…
And we can link this to something like FRBR… U2 as authoritative version, and FRBR relationships… Is there a way we can show the relationship between Songs of Innocence by William Blake, Songs of Innocence as an album, cover versions, etc.
As we move forward there is so much more we need to do when we design systems for description that accommodate more than just keywords/bibliographic records. There is no full text inherent in a video or other non-textual document – an indexing problem. And we need to account for not only emotion, but also socially constructed and individually experienced emotional responses to items. Ultimate goal – help people to find things in meaningful ways to even potentially be useful in therapies (Hanser 2010).
Q1) Comment more than a question… I work with film materials in the archive, and we struggle to bring that alive, but you do have some response from the cataloguer and their reactions – and reactions at the access centre – and that could be part of the record.
A1) That’s part of archives – do we need it in every case… Some of the stuff I study gets taken down… Do we need to archive (some of) them?
Q1) Also a danger that you lose content because catalogue records are not exciting enough… Often stuff has to go on YouTube to get seen and accessed – but then you lose that additional metadata…
A1) We do need to go where our audience is… Maybe we do need to be on YouTube more… And maybe we can use Linked Data to make things more findable. Catalogue records rarely come up high enough in search results…
Q2) This is a really subjective way to mark something up… So, for instance, Songs of Innocence was imposed on my iPhone and I respond quite negatively to that… How do you catalogue emotion with that much subjectivity at play?
A2) This is where we have happy songs versus individual perspectives… The Beatles’ Here Comes the Sun is mostly seen as happy… But if someone broke up with you during it… How do we build algorithms that tune into those different opinions?
Q3) How do producers choose to tag things – the lyrics, the tune, their reaction… But you kind of answered that… I mean people have Every Breath You Take by the Police as their first song at a wedding but it’s about a jilted lover stalking his ex…
A3) We need to think about how we provide access, and how we can move forward with this… My first job was in a record store and people would come in and ask “can I buy this record that was on the radio at about 3pm” and that was all they could offer… We need those facets, those emotions…
Q4) I had the experience of seeing quite a neutral painting but then with more context that painting meant something else entirely… So how do we account for that, that issue of context and understanding of the same songs in different ways…
A4) There isn’t one good solution to that but part of the web 2.0 approach is about giving space for the collective and the individual perspective.
Q5) How about musical language?
A5) Yeah… I took an elective on musical librarianship. My tutor there showed me the tetrachords in Dido and Aeneas as a good example of an opera that people respond to in very particular ways. There are musical styles that map to particular emotions.
Our 5Rights: digital rights of children and young people / Dev Kornish, Dan Dickson, Bethany Wilson (5Rights Youth Commission)
We are from Young Scot.
1 in 5 young people have missed food or sleep because of the internet.
How many unemployed young people struggle with entering work due to the lack of digital skills? It’s 1 in 10 who struggle with CVs, online applications, and jobs requiring digital skills.
How young do people start building their digital footprint? Before birth – an EU study found that 80% of mothers had shared images, including scans, of their children.
Bethany: We are passionate about our rights and how our rights can be maintained in a digital world. When it comes to protecting young people online it can be scary… But that doesn’t mean we shouldn’t use the internet or technology, when used critically. The 5Rights campaign aims to ensure we have that understanding.
Dan: The UNCRC outlines rights and these are: the right to remove; the right to know – who has your data and what they are doing with it; the right to safety and support; the right to informed and conscious use – we should be able to opt out or remove ourselves if we want to; right to digital literacy – to use and to create.
Bethany: Under the right to remove, we do sometimes post things we shouldn’t but we should be able to remove things if we want to. In terms of the right to know – we don’t read the terms and conditions but we have the right to be informed, we need support. The right to safety and support requires respect – dismissing our online life can make us not want to talk about it openly with you. If you speak to us openly and individually then we will appreciate your support but restrictions cannot be too restrictive. Technology is designed to be addictive and that’s a reality we need to engage with. Technology is a part of most aspects of our lives, teaching and curriculum should reflect that. It’s not just about coding, it’s about finding information, and to understand what is reliable, what sources we can trust. And finally you need to listen to us, to our needs, to be able to support us.
And a question for us: What challenges have you encountered when supporting young people online? [a good question]
And a second question: What can you do in your work to realise young people’s rights in the digital world?
Q1) What digital literacy is being taught in schools right now?
A1) It’s school to school, depends on the educational authority. Education Scotland have it as a priority but only over the last year… It depends…
Q2) My kid’s 5 and she has library cards…
Comment) The perception is that kids are experts by default
A2 – Dan) That’s not the case but there is that perception of “digital natives” knowing everything. And that isn’t the case…
Dan: Do you want to share what you’ve been discussing?
Comment: It’s not just an age thing… Some love technology, some hate it… But it’s hard to be totally safe online… How do you protect people from that…
Dan: It is incredibly difficult, especially in education.
Comment [me]: There is a real challenge when the internet is filtered and restricted – it is hard to teach real world information literacy and digital literacy when you are doing that in an artificial school set up. That was something that came up in the Royal Society of Edinburgh Digital Participation Inquiry I was involved in a few years ago. I also wanted to add that we have a new MOOC on Digital Footprints that is particularly aimed at those leaving school/coming into university.
Bethany: When we use our right to remove, we really want that deletion to be a proper deletion. We really want to know where our data is held. And we want everyone to have access to quality information online and offline. And we want the right to disengage when we want to. And we want digital literacy to be about more than just coding, but also what we do and can do online.
Dan: We invite you all to join our 5Rights Coalition to show your support and engagement with this work. We are now in the final stages of this work and will be publishing our report soon. We’ve spoken to Google, Facebook, Education Scotland, mental health organisations, etc. We hope our report will provide great guidance for implementing the 5Rights.
You can find out more and contact us: 5Rights@young.scot, #5RightsYC, http://young.scot/5rights.
Q1) Has your organisation written any guidance for librarians in putting these rights into action?
A1) Not yet but that report should include some of that guidance.
Playing with metadata / Gavin Willshaw and Scott Renton (University of Edinburgh)
Gavin: Scott and I will be talking about our metadata games project which we’ve been working on for the last few years. My current focus is on PhD digitisation but I’m also involved in this work. I’ll give an overview, what we’ve learned… And then Scott will give more of an idea of the technical side of things.
A few years ago we had 2 full time photographers working on high quality digital images. Now there are three photographers, 5 scanning assistants, and several specialists all working in digitisation. And that means we have a lot more digital content. A few years ago we launched collections.ed.ac.uk which is the one stop shop into our digital collections. You can access the images at: http://images.is.ed.ac.uk/. We have around 30k images, and most are CC BY licensed at high resolution.
Looking at the individual images we tend to have really good information on the volume the image comes from, but prior to this project we had little information on what was actually in the image. That made them hard to find. We didn’t really have anyone to catalogue this. A lot of these images are as much as 10 years old – taken for projects but not necessarily intended to go online. So, we decided to create this game to improve the description of our collections…
The game has a really retro theme – we didn’t want to spend too long on the design side of things, just keep it simple. And the game is open to everyone.
So, stage 1: tag. You harvest initial tags, it’s an open text box, there is no quality review, and there are points for tags entered. We do have some safety measures to avoid swear or stop words.
Stage 2: vote. You vote on the quality of others’ tags. It’s a closed system – good/bad/don’t know. That filters out any initial gobbledegook. You get points…
The tags are QAed and imported into our image management system. We make a distinction between formal metadata and crowdsourced tags. We show that on the record and include a link to the tool – so others can go and play.
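[A rough sketch of how a two-stage tag-then-vote flow like the one described might work. This is not the project's code (theirs is PHP, as Scott explains later); the block list, the vote thresholds and the function names are all my own assumptions for illustration.]

```python
from collections import Counter

# Hypothetical block list; the game's real swear/stop word list was not shown.
BLOCKED_WORDS = {"the", "and", "a", "of"}


def clean_tags(raw_text):
    """Stage 1: split free text on commas, normalise case, drop blocked words."""
    tags = [t.strip().lower() for t in raw_text.split(",")]
    return [t for t in tags if t and t not in BLOCKED_WORDS]


def accepted_tags(votes, min_votes=3, min_good_share=0.6):
    """Stage 2: keep tags that enough voters rated 'good'.

    votes maps each tag to a list of "good" / "bad" / "dont_know" ballots.
    "dont_know" ballots are ignored when working out the share of good votes.
    Both thresholds are assumed values, not the project's.
    """
    accepted = []
    for tag, ballots in votes.items():
        counts = Counter(ballots)
        decisive = counts["good"] + counts["bad"]
        if decisive >= min_votes and counts["good"] / decisive >= min_good_share:
            accepted.append(tag)
    return sorted(accepted)


stage_one = clean_tags("Skeleton, horse, the, Scythe, ")
stage_two = accepted_tags({
    "skeleton": ["good", "good", "good"],
    "asdf": ["bad", "bad", "good"],
    "horse": ["good", "dont_know"],
})
```

In this sketch a tag with too few decisive votes simply waits rather than being rejected, which is one way a moderation backlog could build up.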
We don’t see crowdsourcing as being just about free labour, but about communities of people with an interest and knowledge. We see it as a way to engage and connect with people beyond the usual groups – members of the public, educators, anyone really. People playing the game range from 7 to their 70s and we are interested in having the widest audience possible. And obviously the more people use the system, the more tags and participation we get. We also get feedback for improvements – some features in the game came from feedback. In theory it frees up staff time, but it takes time to run. But it lets us reach languages, collections, special knowledge that may not be in our team.
To engage our communities we took the games on tour across our sites. We’ve also brought the activity into other events – Innovative Learning Week/Festival of Creative Learning; Ada Lovelace Day; exhibitions – e.g. the Where’s Dolly game that coincided with the Towards Dolly exhibition. Those events are vital to get interest – it doesn’t work to expect people to just find it themselves.
In terms of motivation people like to do something good, some like to share their skills, and some just enjoy it because it is fun and a wee bit competitive. We’ve had a few (small) prizes… We also display real time high scores at events which gets people in competitive mode.
This also fits into an emerging culture of play in Library and Information Services… Looking at play in learning – it being ok to try things whether or not they succeed. These have included Board Game Jam sessions using images from the collections, learning about copyright and IP in a fun context. Ada Lovelace Day I’ve mentioned – designing your own Raspberry Pi case out of LEGO, making music… And also Wikipedia Editathons – also fun events.
There is also an organisation called Tiltfactor who have their own metadata games looking at tagging and gaming. They have Zen Tag – like ours. But also Nextag for video and audio. And also Guess What!, a multiplayer game of description. We put about 2000 images into the metadatagames platform Tiltfactor run and got huge numbers of tags quickly. They are at quite a different scale.
We’ve also experimented with Lady Grange’s correspondence in the Zooniverse platform, where you have to underline or indicate names and titles etc.
We’ve also put some of our images into Crowdcrafting to see if we can learn more about the content of images.
There are Pros and Cons here…
Pros:
- Hosted service
- Easy to create an account
- Easy to set up and play
- Range of options – not just tagging
- Easy to load in images from Dropbox/Flickr
Cons:
- Some limitations of what you can do
- Technical expertise needed for best value – especially in platforms like Crowdcrafting.
What we’ve learned so far is that it is difficult to create an engaging platform, but combining it with events and activities – with a target theme and collections – works well. Incentives and prizes help. Considerable staff time is needed. And crowdsourced tags are a complement rather than an alternative to the official record.
Scott: So I’ll give the more technical side of what we’ve done. Why we needed them, how we built them, how we got on, and what we’ve learned.
I’ve been hacking away at workflows for a good 7 years. We have a reader who sees something they want, and they request the photograph of the page. They don’t provide much information – just about what is needed. These make for skeleton records – and we now have about 30k of these. It also used to be the case that a high end piece of kit could be easier to buy in for a project than a low level cataloguer… That means we end up with data being copied and pasted in by photographers rather than good records.
We have all these skeletons… But we need some meat on our bones… If we take an image from the Incunabula we want to know that there’s a skeleton on a horse with a scythe. Now the image platform we have does let us annotate an image – but it’s hidden away and hard to use. We needed something better and easier. That’s where we came up with an initial front end. When I came in it was a module for us to use. It was Gavin that said “hey, this should be a game”. So the nostalgic computer games thing is weirdly appealing (like the Google Maps Pacman April Fool’s!). So it’s super simple, you put in a few words…
And it is truly lo-fi. It’s LAMP (Linux, Apache, MySQL, PHP) – not cool! Front end design retrofit. Authentication added to let students and staff login. In terms of design decisions we have a moderation module, we have a voting module, we have a scoreboard, we have stars for high contributors. And now more complex games: set number of items, clock, featured items, and Easter Eggs within the game. For instance in the Dolly the Sheep game we hid a few images with hideous Comic Sans that you could stumble upon if you tagged enough images!
We do have moderation, a voting module, thresholds, demarcation… Tiltfactor told us we’re the only library putting data back in from the crowd to our system – people are really nervous about this but we demarcate it really carefully.
We now have a codebase we can clone. We skin it up differently for particular events or exhibitions – like Dolly – but it’s all the same idea with different design and collections. This all connects up through (authenticated) APIs back into the image management system (Luna).
So, how have we gotten on?
- 283 users
- 34070 tags in system
- 15616 tags from our game
- 18454 tags from Tiltfactor metadata games pushed in
- 6212 tags pushed back into our system – that’s because of backlog in the moderation (upvotes may be good enough).
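[A quick check on those figures: the two tag sources do add up to the system total, and the moderated tags pushed back are a little under a fifth of it. The variable names are mine; the numbers are the ones from the slide.]

```python
# Figures as presented; variable names are illustrative only.
tags_in_system = 34070
tags_from_game = 15616
tags_from_tiltfactor = 18454
tags_pushed_back = 6212

# The two sources should account for every tag in the system.
combined_sources = tags_from_game + tags_from_tiltfactor

# Share of all tags that have cleared moderation and gone back into Luna.
percent_pushed_back = round(100 * tags_pushed_back / tags_in_system, 1)

print(combined_sources == tags_in_system, percent_pushed_back)
```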
So, what next? Well we have MSc projects coming up. We are having a revamp with an intern signed up for the summer – responsiveness, links to social media, more gamification, more incentives, authentication for non UoE users, etc.
And also we are excited about IIIF – about beautification of websites with embedded viewers, streamlining (thumbnails through URL; photoshopping through URL etc) and annotations. You can do deep zoom into images without having to link out to do that with an image.
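[For anyone wondering what “thumbnails through URL” means: the IIIF Image API encodes the crop, size, rotation and format in the path of the image URL itself, in the order identifier/region/size/rotation/quality.format. A minimal sketch, assuming a made-up server at example.org and a made-up identifier; the helper function is mine, not part of any library or of the Edinburgh setup.]

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path segments.

    size="max" is the Image API 3.0 keyword for the largest available
    size; servers on version 2 conventionally use "full" here instead.
    """
    encoded_id = quote(identifier, safe="")  # identifiers must be URL-encoded
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


BASE = "https://example.org/iiif"  # hypothetical server

# A 200-pixel-wide thumbnail of the whole image.
thumbnail = iiif_image_url(BASE, "page-001", size="200,")

# A crop: the region is x,y,width,height in pixels of the full image.
detail = iiif_image_url(BASE, "page-001", region="100,100,400,300")
```

Deep zoom viewers work the same way, requesting many small regions as you pan and zoom.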
We also have the Polyglot Project – coming soon – which is a paleography project for manuscripts in our collections of any age, in any language. We asked an intern to find a transcription and translation module using IIIF. She’s come up with something fantastic… Ways to draw around text, for users to add in annotations, to discuss annotations, etc. She’s got 50-60 keyboards so almost all languages supported. Not sure how to bring back into core systems but really excited about this.
That’s basically where we’ve gotten to. And if you want to try the games, come and have a play.
Q1) That example you showed for IIIF tagging has words written in widely varied spellings… You wouldn’t key it in as written in the document.
A1 – Scott) We do have a project looking at this. We have a girl looking for dictionaries to find variants and different spellings.
A1 – Gavin) There are projects like Transcribe Bentham who will have faced that issue…
Comment – Paul C) It’s a common issue… Methods like fuzzy searching help with that…
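[To illustrate the fuzzy searching idea: rather than requiring an exact match, you rank index terms by how similar they are to the query. A small sketch using Python's standard difflib module; the function name, the sample terms and the 0.75 cutoff are my assumptions, and a real discovery system would use its search engine's own fuzzy matching.]

```python
import difflib


def find_variants(query, index_terms, cutoff=0.75):
    """Return index terms whose spelling is close to the query, best first.

    Matching is case-insensitive. The cutoff (0 to 1) is an assumed value;
    lower it to catch more distant spellings at the cost of false hits.
    """
    by_lowercase = {term.lower(): term for term in index_terms}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=5, cutoff=cutoff
    )
    return [by_lowercase[m] for m in matches]


# A manuscript spelling matched against modern index terms.
results = find_variants("Edenburgh", ["Edinburgh", "Glasgow", "Aberdeen"])
```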
Q2) I’m quite interested about how you identify parts of images, and how you feed that back to the catalogue?
A2 – Scott) Right now I think the scope of the project is… Well it will be interesting to see how best to feed into catalogue records. Still to be addressed.
Q3 – Paul C) You built this in-house… How open is it? Can others use it?
A3 – Gavin) It is using Luna image management system…
A3 – Scott) It’s based on Luna for derivatives and data. It’s on Github and it is open. The website is open to everyone. You login through EASE – you can join as an “EASE Friend” if you aren’t part of the University. Others can use the code if they want it…
And finally it was me up to present…
Managing your Digital Footprint : Taking control of the metadata and tracks and traces that define us online / Nicola Osborne (EDINA)
Obviously I didn’t take notes on my session, but you can explore the slides below:
Look out for a new blogpost very soon on some of the background to our new Digital Footprint MOOC, which launched on Monday 3rd April. You can join the course now, or sign up to join the next run of the course next month, here: https://goo.gl/jgHLQs
And with that the event drew to a close with thank yous to all of the organisers, speakers, and attendees!