Jun 14 2017
 

Following on from Day One of IIPC/RESAW I’m at the British Library for a connected Web Archiving Week 2017 event: Digital Conversations @BL, Web Archives: truth, lies and politics in the 21st century. This is a panel session chaired by Elaine Glaser (EG) with Jane Winters (JW), Valerie Schafer (VS), Jefferson Bailey (JB) and Andrew Jackson (AJ). 

As usual, this is a liveblog so corrections, additions, etc. are welcomed. 

EG: Really excited to be chairing this session. I’ll let everyone speak for a few minutes, then ask some questions, then open it out…

JB: I thought I’d talk a bit about our archiving strategy at Internet Archive. We don’t archive the whole of the internet, but we aim to collect a lot of it. The approach is multi-pronged: to take entire web domains in a shallow but broad strategy; to work with other libraries and archives to focus on particular subjects or areas or collections; and then to work with researchers who are mining or scraping the web, but who do not necessarily have preservation strategies. So, when we talk about political archiving or web archiving, it’s about getting as much as possible, with different volumes and frequencies. I think we know we can’t collect everything, but we collect important things frequently and less important things less frequently. And we work with national governments, with national libraries…

The other thing I wanted to raise is T.R. Schellenberg, who was an important archivist at the National Archives in the US. He had an idea about archival strategies: that there is a primary documentation strategy, and a secondary strategy. The primary is for a government and agencies to do for their own use, the secondary is for future use in unknown ways… And including documentary and evidentiary material (the latter being how and why things are done). Those evidentiary elements become much more meaningful on the web, which has emerged and become more meaningful in the context of our current political environment.

AJ: My role is to build a Web Archive for the United Kingdom. So I want to ask a question that comes out of this… “Can a web archive lie?”. Even putting to one side that it isn’t possible to archive the whole web.. There is confusion because we can’t get every version of everything we capture… Then there are biases from our work. We choose all UK sites, but some are captured more than others… And our team isn’t as diverse as it could be. And what we collect is also constrained by technology capability. And we are limited by time issues… We don’t normally know when material is created… The crawler often finds things only when they become popular… So the academic paper is picked up after a BBC News item – they are out of order. We would like to use more structured data, such as Twitter which has clear publication date…

But can the archive lie? Well, it is much easier to make an untraceable change to digital material than to print. As digital is increasingly predominant we need to be aware that our archive could be hacked… So we have to protect against that, and have evidence that we haven’t been hacked… And we have to build systems that are secure and can maintain that trust. Libraries will have to take care of each other.
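As a rough illustration of what evidence of an untampered archive could look like (this is my own sketch, not something AJ described, and not how the British Library's systems necessarily work): record a cryptographic digest of each capture at crawl time, keep that manifest somewhere separate, and re-check it later. The capture names, contents and helper functions below are all hypothetical.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Return the SHA-256 digest of a captured payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(records: dict[str, bytes]) -> dict[str, str]:
    """Record one digest per capture at the time it is archived."""
    return {name: fingerprint(data) for name, data in records.items()}


def find_altered(records: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """List captures that are missing or no longer match their recorded digest."""
    return sorted(
        name
        for name, digest in manifest.items()
        if name not in records or fingerprint(records[name]) != digest
    )


# Hypothetical captures, keyed by an invented "url@date" label.
captures = {
    "example.org/pledge@2016-05-01": b"<html>original claim</html>",
    "example.org/about@2016-05-01": b"<html>about us</html>",
}
manifest = build_manifest(captures)

# Simulate a silent change to one stored capture.
captures["example.org/pledge@2016-05-01"] = b"<html>revised claim</html>"

altered = find_altered(captures, manifest)
print(altered)
```

The check only helps if the manifest is held where an attacker who can change the archive cannot also change the digests, for example with a partner institution, which is one reading of libraries taking care of each other.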

JW: The Oxford Dictionary word of the year in 2016 was “post truth” whilst the Australian dictionary went for “Fake News”. Fake News for them is disinformation on websites for either political purposes or commercial benefit. Merriam-Webster went for “surreal” – their most searched-for word. It feels like we live in very strange times… There aren’t calls for resignation where there once were… Hasn’t it always been thus though…? For all the good citizens who point out the errors of a fake image circulated on Twitter, for many the truth never catches the lie. Fakes, lies and forgeries have helped change human history…

But modern fake news is different to that which existed before. Firstly there is the speed of fake news… Mainstream media only counteracts or addresses this. Some newspapers and websites do public corrections, but that isn’t the norm. Once publishing took time and means. Social media has made it much easier to self-publish. One can create, but also one can check accuracy and integrity – reverse image searching to see when a photo has been photoshopped or shows events of two things before…

And we have politicians making claims that they believe can be deleted and disappear from our memory… We have web archives – on both sides of the Atlantic. The European Referendum NHS pledge claim is archived and lasts long beyond the bus – which was bought by Greenpeace and repainted. The archives have also been capturing political parties’ websites throughout our endless election cycle… The DUP website crashed after the announcement of the election results because of demand… But the archive copy was available throughout. There was also a rumour that a hacker was creating an Irish-language version of the DUP website… But that wasn’t a new story, it was from 2011… And again the archive shows that, and archives of news websites do that.

Social Networks Responses to Terrorist Attacks in France – Valerie Schafer. 

Before 9/11 we had some digital archives of terrorist materials on the web. But this event challenged archivists and researchers. Charlie Hebdo, Paris Bataclan and Nice attacks are archived… People can search at the BNF to explore these archives, to provide users a way to see what has been said. And at the INA you can also explore the archive, including Twitter archives. You can search, see keywords, explore timelines crossing key hashtags… And you can search for images… including the emojis used in discussion of Charlie Hebdo and Bataclan.

We also have Archive-It collections for Charlie Hebdo. This raises some questions of what should and should not be collected… We did not normally collect newspapers and audiovisual sites, but decided to in this case as we faced a special event. But we still face challenges – it is easier to collect data from Twitter than from Facebook. It is free to collect Twitter data in real time, but the archived/older data is charged for, so you have to capture it in the moment. And there are limits on API collection… INA captured more than 12 million tweets for Charlie Hebdo, for instance; it is very complete but not exhaustive.

We continue to collect for #jesuischarlie and #bataclan… They are continually used and added to, in similar or related attacks, etc. There is a time for exploring and reflecting on this data, and space for critics too…

But we also see that content gets deleted… It is hard to find fake news on social media, unless you are looking for it… Looking for #fakenews just won’t cut it… So, we had a study on fake news… And we recommend that authorities are cautious about material they share. But also there is a need for cross checking – the kinds of projects with Facebook and Twitter. Web archives are full of fake news, but also full of others’ attempts to correct and check fake news as well…

EG: I wanted to go back in time to the idea of the term “fake news”… In order to understand what “Fake News” actually is, we have to understand how it differs from previous lies and mistruths… I’m from outside the web world… We are often looking at tactics to fight fire with fire, to use an unfortunate metaphor… How new is it? And who is to blame and why?

JW: Talking about it as a web problem, or a social media issue isn’t right. It’s about humans making decisions to critique or not that content. But it is about algorithmic sharing and visibility of that information.

JB: I agree. What is new is the way media is produced, disseminated and consumed – those have technological underpinnings. And they have been disruptive of publication and interpretation in a web world.

EG: Shouldn’t we be talking about a culture, not just technology… It’s not just the “vessel”… Doesn’t the dissemination have more of a role than perhaps we are suggesting…

AJ: When you build a social network or any digital space you build in different affordances… So Facebook and Twitter are different. And you can create automated accounts, with Twitter especially offering an affordance for robots etc which allows you to give the impression of a movement. There are ways to change those affordances, but there will also always be fake news and issues…

EG: There are degrees of agency in fake news.. from bots to deliberate posts…

JW: I think there is also the aspect of performing your popularity – creating content for likes and shares, regardless of whether what you share is true or not.

VS: I know terrorism is different… But for any tweet sharing fake news you get 4 retweets denying it… You have more tweets denying than sharing fake news…

AJ: One wonders about the filter bubble impact here… Facebook encourages inward-looking discussion… Social media has helped like-minded people find each other, and perhaps they can be clipped off more easily from the wider discussion…

VS: I think also what is interesting is the game between social media and traditional media… You have questions and relationships there…

EG: All the internet can do is reflect the crooked timber of reality… We know that people have confirmation bias, that we are quite tolerant of untruths, and less tolerant of information that contradicts our perceptions, even if untrue. You have people and the net being equally tolerant of lies and mistruths… But isn’t there another factor here… The people demonised as gatekeepers… By putting in place structures of authority – which were journalism and academics… Their resources are reduced now… So what role do you see for those traditional gatekeepers…

VS: These gatekeepers are no longer the traditional gatekeepers that they were… They work in 24 hour news cycles and have to work to that. In France they are trying to rethink that role, there were a lot of questions about this… Whether that’s about how you react to changing events, and what happens during an election… People are thinking about that…

JB: There is an authority and responsibility for media still, but has the web changed that? Looking back it’s surprising now how few organisations controlled most of the media… But is that that different now?

EG: I still think you are being too easy on the internet… We’ve had investigative journalism by Carole Cadwalladr and others on Cambridge Analytica and others who deliberately manipulate reality… You talked about witness testimony in relation to terrorism… Isn’t there an immediacy and authenticity challenge there… Donald Trump’s tweets… They are transparent but not accountable… Haven’t we created a problem that we are now trying to fix?

AJ: Yes. But there are two things going on… It seems to be that people care less about lying… People see Trump lying, and they don’t care, and media organisations don’t care as long as advertising money comes in… A parallel for that in social media – the flow of content and ads takes priority over truth. There is an economic driver common to both mediums that is warping that…

JW: There is an unpopularity aspect too… a (nameless) newspaper here that shares content to generate “I can’t believe this!” reactions, and then sharing, and so advertising income… But on a positive note, there is scope and appetite for strong investigative journalism… and that is facilitated by the web and digital methods…

VS: Citizens do use different media and cross media… Colleagues are working on how TV is used… And different channels, to compare… Mainstream and social media are strongly crossed together…

EG: I did want to talk about temporal element… Twitter exists in the moment, making it easy to make people accountable… Do you see Twitter doing what newspapers did?

AJ: Yes… A substrate…

JB: It’s amazing how much of the web is archived… With “Save Page Now” we see all kinds of things archived – including pages that exposed the whole Russian downing of a Ukrainian plane… Citizen action, spotting the need to capture data whilst it is still there, and that happens all the time…

EG: I am still sceptical about citizen journalism… It’s a small group of people from a narrow demographic, it’s time consuming… Perhaps there is still a need for journalist roles… We did talk about filter bubbles… We hear about newspapers and media as biased… But isn’t the issue that communities of misinformation are not penetrated by the other side, but by the truth…

JW: I think bias in newspapers is quite interesting and different to unacknowledged bias… Most papers are explicit in their perspective… So you know what you will get…

AJ: I think so, but bias can be quite subtle… Different perspectives on a common issue allows comparison… But other stories only appear in one type of paper… That selection case is harder to compare…

EG: This really is a key point… There is a difference between facts and truth, and explicitly framed interpretation or commentary… Those things are different… That’s where I wonder about web archives… When I look at Wikipedia… It’s almost better to go to a source with an explicit bias where I can see a take on something, unlike Wikipedia which tries to focus on fact. Talking about politicians lying misses the point… It should be about a specific rhetorical position… That definition of truth comes up when we think of the role of the archive… How do you deal with that slightly differing definition of what truth is…

JB: I talked about different complementary collecting strategies… The Archivist as a thing has some political power in deciding what goes in the historical record… The volume of the web does undercut that power in a way that I think is good – archives have historically been about the rich and the powerful… So making archives non-exclusive somewhat addresses that… But there will be fake news in the archive…

JW: But that’s great! Archives aren’t about collecting truth. Things will be in there that are not true, partially true, or factual… It’s for researchers to sort that out later…

VS: Your comment on Wikipedia… They do try to be factual, neutral… But not truth… And to have a good balance of power… For us as researchers we can be surprised by the neutral point of view… Fortunately the web archive does capture a mixture of opinions…

EG: Yeah, so that captures what people believed at a point of time – true or not… So I would like to talk about the archive itself… Do you see your role as being successors to journalists… Or as being able to harvest the world’s record in a different way…

JB: I am an archivist with that training and background, as are a lot of people working on web archives and interesting spaces. Certainly historic preservation drives a lot of collecting aspects… But also engineering and technological aspects. So it’s people interested in archiving, preservation, but also technology… And software engineers interested in web archiving.

AJ: I’m a physicist but I’m now running web archives. And for us it’s an extension of the legal deposit role… Anything made public on the web should go into the legal deposit… That’s the theory, in practice there are questions of scope, and where we expend quality assurance energy. That’s the source of possible collection bias. And I want tools to support archivists… And also to prompt for challenging bias – if we can recognise that taking place.

JW: There are also questions of what you foreground in Special Collections. There are decisions being made about collections that will be archived and catalogued more deeply…

VS: At the BNF my colleagues work in an area with a tradition, with legal deposit responsibility… There are politics of heritage and what it should be. I think that is the case for many places where that activity sits with other archivists and librarians.

EG: You do have this huge responsibility to curate the record of human history… How do you match the top down requirements with the bottom up nature of the web as we now talk about it?

JW: One way is to have others come in to your department to curate particular collections…

JB: We do have special collections – people can choose their own, public suggestions, feeds from researchers, all sorts of projects to get the tools in place for building web archives for their own communities… I think for the sake of longevity and use going forward, the curated collections will probably have more value… Even if they seem more narrow now.

VS: Also interesting that archives did not select bottom-up curation. In Switzerland they went top down – there are a variety of approaches across Europe.

JW: We heard about the 1916 Easter Rising archive earlier, which was through public nominations… Which is really interesting…

AJ: And social media can help us – by seeing links and hashtags. When we looked at this 4-5 years ago everyone linked to the BBC, but now we have more fake news sites etc…

VS: We do have this question of what should be archived… We see capture of the vernacular web – kitten or unicorn gifs etc… !

EG: I have a dystopian scenario in my head… Could you see a time years from now when newspapers are dead, public broadcasters are more or less dead… And we have flotsam and jetsam… We have all this data out there… And kinds of data who use all this social media data… Can you reassure me?

AJ: No…

JW: I think academics are always ready to pick holes in things, I hope that that continues…

JB: I think more interesting is the idea that there may not be a web… Apps, walled gardens… Facebook is pretty hard to web archive – they make it intentionally more challenging than it should be. There are lots of communication tools that disappeared… So I worry more about loss of a web that allows the positive affordances of participation and engagement…

EG: There is the issue of privatising and sequestering the web… I am becoming increasingly aware of the importance of organisations – like the BL and Internet Archive… Those roles used to be taken on by publicly appointed organisations and bodies… How are they impacted by commercial privatisation… And how are those roles changing… How do you envisage that public sphere of collecting…

JW: For me more money for organisations like the British Library is important. Trust is crucial, and I trust that they will continue to do that in a trustworthy way. Commercial entities cannot be trusted to protect our cultural heritage…

AJ: A lot of people know what we do with physical material, but are surprised by our digital work. We have to advocate for ourselves. We are also constrained by the legal framework we operate within, and we have to challenge that over time…

JB: It’s super exciting to see libraries and archives recognised for their responsibility and trust… But that also puts them at higher risk by those who they hold accountable, and being recognised as bastions of accountability makes them more vulnerable.

VS: Recently we had 20th birthday of the Internet Archive, and 10 years of the French internet archiving… This is all so fast moving… People are more and more aware of web archiving… We will see new developments, ways to make things open… How to find and search and explore the archive more easily…

EG: The question then is how we access this data… The new masters of the universe will be those emerging gatekeepers who can explore the data… What is the role between them and the public’s ability to access data…

VS: It is not easy to explain everything around web archives but people will demand access…

JW: There are different levels of access… Most people will be able to access what they want. But there is also a great deal of expertise in organisations – it isn’t just commercial data work. And working with the Alan Turing Institute and cutting edge research helps here…

EG: One of the founders of the internet, Vint Cerf, says that “if you want to keep your treasured family pictures, print them out”. Are we overly optimistic about the permanence of the record?

AJ: We believe we have the skills and capabilities to maintain most if not all of it over time… There is an aspect of benign neglect… But if you are active about your digital archive you could have a copy in every continent… Digital allows you to protect content from different types of risk… I’m confident that the library can do this as part of its mission.

Q&A

Q1) Coming back to fake news and journalists… There is a changing role between the web as a communications media, and web archiving… Web archives are about documenting this stuff for journalists for research as a source, they don’t build the discussion… They are not the journalism itself.

Q2) I wanted to come back to the idea of the Filter Bubble, in the sense that it mediates the experience of the web now… It is important to capture that in some way, but how do we archive that… And changes from one year to the next?

Q3) It’s kind of ironic to have nostalgia about journalism and traditional media as gatekeepers, in a country where Rupert Murdoch is traditionally that gatekeeper. Global funding for web archiving is tens of millions; the budget for the web is tens of billions… The challenges are getting harder – right now you can use robots.txt but we have DRM coming and that will make it illegal to archive the web – and the budgets have to increase to match that to keep archives doing their job.

AJ: To respond to Q3… Under the legislation it will not be illegal for us to archive that data… But it will make it more expensive and difficult to do, especially at scale. So your point stands, even with that. In terms of the Filter Bubble, they are out of our scope, but we know they are important… It would be good to partner with an organisation where the modern experience of media is explicitly part of its role.

JW: I think that idea of the data not being the only thing that matters is important. Ethnography is important for understanding that context around all that other stuff…  To help you with supplementary research. On the expense side, it is increasingly important to demonstrate the value of that archiving… Need to think in terms of financial return to digital and creative economies, which is why researchers have to engage with this.

VS: Regarding the first two questions… Archives reflect reality, so there will be lies there… Of course web archives must be crossed and compared with other archives… And contextualisation matters, the digital environment in which the web was living… Contextualisation of web environment is important… And with terrorist archive we tried to document the process of how we selected content, and archive that too for future researchers to have in mind and understand what is there and why…

JB: I was interested in the first question, this idea of what happens and preserving the conversation… That timeline was sometimes decades before but is now weeks or days or less… In terms of experience websites are now personalised and our ability to capture that is impossible on a broad question. So we need to capture that experience, and the emergent personalisation… The web wasn’t public before, as ARPAnet, then it became public, but it seems to be ebbing a bit…

JW: With a longer term view… I wonder if the open stuff which is easier to archive may survive beyond the gated stuff that traditionally was more likely to survive.

Q4) Today we are 24 years into advertising on the web. We take ad-driven models as a given, and we see fake news as a consequence of that… So, my question is, Minitel was a large system that ran on a different model… Are there different ways to change the revenue model to change fake or true news and how it is shared…

Q5) Theresa May has been outspoken on fake news and wants a crackdown… The way I interpret that is censorship and banning of sites she does not like… Jefferson said that he’s been archiving sites that she won’t like… What will you do if she asks you to delete parts of your archive…

JB: In the US?!

Q6) Do you think we have sufficient web literacy amongst policy makers, researchers and citizens?

JW: On that last question… Absolutely not. I do feel sorry for politicians who have to appear on the news to answer questions but… Some of the responses and comments, especially on encryption and cybersecurity have been shocking. It should matter, but it doesn’t seem to matter enough yet… 

JB: We have a tactic of “geopolitical redundancy” to ensure our collections are shielded from political endangerment by making copies – which is easy to do – and locate them in different political and geographical contexts. 

AJ: We can suppress content by access. But not deletion. We don’t do that… 

EG: Is there a further risk of data manipulation… Of Trump and Farage and data… a covert threat… 

AJ: We do have to understand and learn how to cope with potential attack… Any one domain is a single point of failure… so we need to share metadata, content where possible… But web archives are fortunate to have the strong social framework to build that on… 

Q7) Going back to that idea of what kinds of responsibilities we have to enable a broader range of people to engage in a rich way with the digital archive… 

Q8) I was thinking about questions in context, and trust in content in the archive… And realising that web archives are fairly young… Generally researchers are close to the resource they are studying… Can we imagine projects in 50-100 years time where we are more separate from what we should be trusting in the archive… 

Q9) My perspective comes from building a web archive for European institutions… And can the archive live… Do we need legal notice on the archive, disclaimers, our method… How do we ensure people do not misinterpret what we do. How do we make the process of archiving more transparent. 

JB: That question of who has resources to access web archives is important. It is a responsibility of institutions like ours… To ensure even small collections can be accessed, that researchers and citizens are empowered with skills to query the archive, and things like APIs to enable that too… The other question on evidencing curatorial decisions – we are notoriously poor at that historically… But there is a lot of technological mystery there that we should demystify for users… All sorts of complexity there… The web archiving community needs to work on that provenance information over the next few years… 

AJ: We do try to record this but as Jefferson said much of this is computational and algorithmic… So we maybe need to describe that better for wider audiences… That’s a bigger issue anyway, that understanding of algorithmic process. At the British Library we are fortunate to have capacity for text mining our own archives… We will be doing more than that… It will be small at first… But as it’s hard to bring data to the queries, we must bring queries to the archive. 

JW: I think it is so hard to think ahead to the long term… You’ll never pre-empt all usage… You just have to do the best that you can. 

VS: You won’t collect everything, every time… The web archive is not an exact mirror… It is “reborn digital heritage”… We have to document everything, but we can try to give some digital literacy to students so they have a way to access the web archive and engage with it… 

EG: Time is up. Thank you to our panellists for this fantastic session. 
