Today I’m at the Digital Day of Ideas 2016, taking place at the University of Edinburgh Business School. You can read more about the event here. I’ll be running a workshop with Lorna Campbell later today on Tweeting and Blogging for Academics but, until then, I’ll be liveblogging the day here.
As usual, this is a liveblog and there may therefore be some small errors or typos – do let me know if there are any questions, comments or corrections.
Welcome from Professor Dorothy Miell, Vice-Principal and Head of the College of Humanities & Social Science
I’m really delighted to welcome you to our fifth Day of Ideas, and to see that we have not only staff and postgrad students from across the College of Humanities and Social Sciences, but also others from across Edinburgh and beyond. We are at a point now where digital scholarship no longer sits separately from other research, but is embedded in work across the college. The fact that the college is engaged in initiatives such as the Data Science Institute and the Alan Turing Institute is a great success for those who have been working in this space. Under our Digital Scholarship banner we now have training and support infrastructure to enable digital scholarship to thrive.
We have three main ways of communicating Digital Scholarship: a website, a mailing list, and a Twitter account and hashtag – do use those today to find each other, to share ideas, to make connections.
Today we’ll see some of the best digital scholarship from across the humanities and social sciences. You will also have the opportunity to try some tools and approaches out yourself in our workshop sessions. We also have lots of breaks and times to network and meet and discuss so do make use of these. I’d like to thank all who have helped put today together, but particularly Anouk Lang and Cath O’Shea.
We have three excellent keynotes today: Ted Underwood, who joins us from the University of Illinois, Karen Gregory from University of Edinburgh, and Lorna Hughes from University of Glasgow. And I’d like to thank the workshop leaders who will be leading those hands on sessions later on.
Anouk: I am delighted to introduce Ted Underwood, our first keynote for today. He has undertaken some of the most creative work in digital scholarship, including our image for today – a visualisation of topic modelling all articles published in the journal PMLA. His work talks about the infrastructure and literacies that underpin digital scholarship, and he is always there engaging in public, lending insightful critical comments and I am delighted to welcome him to open our Digital Day of Ideas…
Keynote 1: Ted Underwood, “Predicting the Past” (Chair: Anouk Lang)
I’d like to thank Anouk for that very embarrassing introduction! And to thank all those behind organising today.
So, today I’m going to talk about digital research opportunities that feel like gleaming castles in the cloud. Especially when we talk about really large scale research projects: they don’t fit into your ordinary research agenda, they are at a different scale, and they change and shift as you approach, just as you feel you are getting there.
So I’ve started with this strange metaphor to show that digital scholarship and humanities are often tied up with ideas of scale. Humanities scholars are used to reading one book at a time, but when we have, say, HathiTrust (which I’ll be talking about later), we are looking at 14 million volumes. So what does that mean? As scholars we are good at reading specific authors, movements, finite time periods… But that bigger scale allows you to see large trends that are not apparent when we look at that level of detail. What do I mean? Well, if you look at references to the body in HathiTrust texts and count the percentage over time, it steadily increases (until the data set ends at 1922). Now I’m expanding on a finding from a student pamphlet (Heuser and Le-Khac, 2012), but this has been extended in terms of timescale, number of works, and also to compare with biography – which does not show this change over time. So, what is behind this? Is it an increase in writing on sex? Is it about, say, detective/PI fiction where a fist hits a jaw? But this idea of looking for parts of the body doesn’t neatly fit into the disciplines we work in. This is one of our golden eggs… But how do we then interpret this data so that it is convincing and significant for a disciplinary audience?
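A quick aside from me (my own toy sketch, not Ted’s actual code): the trend he describes – the share of words referring to the body, counted over time – boils down to a simple relative-frequency calculation. The word list and mini corpus below are made up for illustration:

```python
from collections import Counter

# Hypothetical word list standing in for "references to the body".
BODY_WORDS = {"hand", "eyes", "heart", "jaw", "chest", "face"}

def body_share(text):
    """Return the fraction of a text's words that refer to the body."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[w] for w in BODY_WORDS) / len(words)

# Toy "corpus by year"; a real study would aggregate millions of volumes.
corpus = {
    1850: "he was born and educated in the town",
    1900: "her eyes met his and her heart raced",
}
for year, text in sorted(corpus.items()):
    print(year, round(body_share(text), 3))
```

At HathiTrust scale the counting is the easy part; as Ted says, the hard part is interpreting the resulting trend for a disciplinary audience.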
This scale of data is interesting but hard to interpret, and not yet right for a disciplinary audience. So I want to talk about how we can take digital libraries and start from specific disciplinary questions. If I return to those references to the body… We can use words as known data and pose a question about something we already discuss – say, the difference between biographies and fiction. I am taking a social sciences move and applying it to the humanities. And that difference between biographies and fiction is a very blurry continuum… The scale provided by a digital library allows us to explore that boundary, and how it changes over time.
If I look at my references-to-the-body graph it looks like biography and fiction are moving apart… But that is just one arbitrary factor. We need to look at that data in different ways, and to do that, consider a thought experiment. If you were to take a book off the shelf and read a few pages, would you know whether it is historical fiction or biography? Having tried this I can tell you it is not that easy to do. We use our knowledge of real or fictional names to tell… We bring a lot of our own expectations and understandings. But what if we had a naïve reader who could read just materials from the 1850s and, based on works known to be fiction or known to be biography, infer the type of some unknown texts? Now, it’s hard to do that with real people, but we can use computing to create a naïve reader.
So, we can take a known set of 1850s novels and a known set of 1850s biographies, and get the computer to organise works into these categories. Learning from those labelled examples then allows the model to approach a new data set. This is a “supervised model”, which learns from labelled examples; unsupervised models use no examples at all. When we are thinking about supervised and unsupervised models, I would argue that a supervised model gives us just the amount of novelty that is useful. We know there is already blurriness there, we know what should be in some of these data sets, and that is a good approach.
So, using this data, to what extent can we predict what is taking place? Well, we can get between about 87–88% and 98% accuracy in distinguishing fiction from biography, depending on the thirty-year span. So here I have some confidence that this is a useful model, partly because I am applying an existing approach to a larger data set and new tools. I have work coming out next week which looks at what else can be detected with these models: detective fiction (1829–1989) can be picked out by frequency of word use (e.g. use of “whoever”, lack of references to education) with an accuracy of 91%; science fiction (1770–1989) can be detected at an accuracy of 88%. But it turns out that gothic fiction is harder to detect, with accuracy at 76%.
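A note from me: the kind of supervised model Ted describes – he mentions later in the Q&A that he uses regularised logistic regression over word frequencies – can be sketched in a few lines with scikit-learn. The tiny “corpus” and labels below are my own illustrative stand-ins, not his HathiTrust data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Labelled training examples: texts known to be fiction or biography.
train_texts = [
    "she turned suddenly her heart pounding in the dark hallway",
    "his hand trembled as he opened the mysterious letter",
    "he was born in 1812 and educated at the parish school",
    "she served as headmistress of the college until her death",
]
train_labels = ["fiction", "fiction", "biography", "biography"]

# Represent each text as a vector of word frequencies.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# L2-regularised logistic regression learns a weight for each word.
model = LogisticRegression(C=1.0)
model.fit(X_train, train_labels)

# The trained "naive reader" can now label previously unseen texts.
unseen = ["her heart raced as she crept down the dark hallway"]
print(model.predict(vectorizer.transform(unseen)))
```

The per-word weights (`model.coef_`) are what let you ask which words correlate positively or negatively with fiction, as in the body-parts finding.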
Now, in that graph predicting the difference between biography and fiction we see that biography and fiction look more different over time. But the distinction is blurry and difficult, and these statistical models give us ways to explore the data. The genres differentiating is also an indication of the model becoming more confident in detecting difference. If you are an 18th century literary scholar you know that fiction starts by borrowing style from, or copying, biography – we know that already… But we didn’t know that that continues to be a trend over three centuries; that is new.
So, I’ve shown how established approaches can be applied to larger data sets through supervised models. Now, these models use the top 3200 words… It is hard to understand the significance of each factor, but we can see some words that correlate positively and negatively with fiction. So, to come back to references to the body: we can see that reflects a wider trend, since parts of the body are correlated with fiction and not with non-fiction. We don’t know exactly why that is the case, but we can understand that change – that it relates to increased use of physical description, etc.
These models aren’t limited to things like genre – in many fields genre isn’t that interesting to scholars. We could look at social factors, for instance gender. We could look at how people are described, understand what is said about them, and the social stereotypes embedded in those conventions, those descriptions. And we would expect that to alter over time. So, for gender we will take characters labelled with the “he” and “she” pronouns, see how easy they are to distinguish, and then see if we can infer the gender from the way that they are described. Now, this work uses Bamman, Underwood and Smith (2014)’s “A Bayesian Model of…“ (?) / Book NLP to understand parts of sentences, and see which words and descriptions map to which characters…
So how easy is it to infer grammatical gender? Well, it changes over time: it is easiest in the mid nineteenth century, but then gets harder… If we look at the 19th century words associated with gender, we see minds and consciousness etc. associated with men, and heart associated with women. And we also see changes in the way men are described, with jaw and chest increasingly used to describe men – the chest has never been so prevalent in descriptions of men as it is right now!
And with that I shall thank my funders and hand over to you for questions.
Q1) Are you planning to use syntax in the same way that you’ve used frequency of words?
A1) There’s a little of that in Book NLP, but word frequency is actually really good, syntax doesn’t add a huge amount.
Q2) Obviously this is English language texts, but what about other language texts?
A2) Well, texts in translation still work pretty well; genre is a pretty translatable pattern. If I show you some work on science fiction – which is interesting as you first have to establish what science fiction is – then you will see Jules Verne is detectable as science fiction, even though his is work in translation.
Q2) But is that because those texts are typical of the genre, or are key texts in the genre?
A2) That’s a good question and hard to know for sure.
Q3) Is there any difference in how male and female authors gender their characters?
A3) A great question and one I want to look at next. I’ve seen some work from Marcus Hewitt on recognising the gender of the author; he notes that you may be able to detect differences between men’s and women’s writing, but he observes that actually women authors just let their female characters speak more – word frequency isn’t perhaps as significant as that.
Q4) [I’m afraid I didn’t hear this question]
A4) There are lots of fun things you can try when you look at texts. I’m using a simple algorithm – regularised logistic regression – that is well worn, well tested. For the most accurate classification every tweak would be useful, but when I’m presenting work to others those small tweaks for accuracy add a little blurriness… I’ve avoided doing the super complicated stuff.
Q5) Have you tried to apply this approach to the corpus of a single author?
A5) When I started doing this, the thing was that authors have very distinct styles and you can cluster all of, say, Jane Austen’s work. I haven’t then looked within a single author’s work but others may well have. Using those models at that scale can be useful if there is work we need at that scale, and can be interpreted more readily. There are scholars who do that, though, analysing choice of words for characters etc.
Q6) As someone who works on earlier texts I was wondering about the librarian classification of works that you used, as for the era I work on there are real problems in the accuracy of labelling texts as fiction or non fiction.
A6) Of course that can be an issue – including in the texts that I looked at. But does it matter enough? When I look at contested genres, e.g. science fiction, I take that supervised model – the texts librarians have marked up; I look at other lists defining the genre, and at words associated with the genre… And then apply predictive models that use all of those definitions, to confirm OR break up those categories. For science fiction it confirms that model; for gothic it breaks down.
Q7) Have you tried using character N-grams in your work?
A7) I haven’t myself. I know it is used in the language community… But there are lots of things you could use. Scholars have tried using character social networks but that isn’t hugely helpful. Lexical models based on word frequencies, though, are hugely useful in identifying genres etc. We tend to take words for granted but they have huge usefulness.
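For anyone else wondering what character n-grams (Q7) actually are: they slice text into overlapping runs of n characters, capturing sub-word patterns – spelling, morphology, style – that whole-word frequencies miss. A minimal sketch of my own:

```python
def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# "whoever" – one of the words Ted flags as marking detective fiction.
print(char_ngrams("whoever", 3))
# ['who', 'hoe', 'oev', 'eve', 'ver']
```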
Q8) Have you used this to look at differences between works that are/will be more popular or successful, and those that are not?
A8) I have a paper coming out that looks at reviewed poems and obscure poems, which show definite differences. That tells us something about the expectations and interests of the literary community – values change, but you can get 90% accuracy.
Q9) A controversial question: can you predict accepted papers for publications/conferences?
A9) That is a controversial question! Now, I’ve just said that effectively I’d be reifying the prevailing judgement, but we are already meeting this…
Q10) Could you say something about errors here?
A10) Yes… So there is a novel by Roy Friedman called The Insurrection of Hippolytus Brandenburg, which looks like biography because it is constructed as documents of a fictional revolution… And Henry Fielding’s Tom Jones is also on the borderline between fiction and biography, but Fielding calls himself a biographer, and the form of that book reflects that. So these outliers are often telling us something interesting.
Q11) How do these types of approach fit with some of the ways we would usually critique, debate, and disagree within literary analysis?
A11) I’m not doing this in a quest for unarguable truths. We can have disagreement here and I welcome that, and those traditional methods complement this approach. And the thing with these methods is that you can do a form of replication, reanalysis, to ask different questions and interrogate the same data in different ways. That’s a version of disagreement that we are less familiar with, but it brings new opportunities for debate and analysis.
Keynote 2: Lorna Hughes, “Content, Co-Curation and Innovation: Digital Humanities and Cultural Heritage Collaboration” (Chair: Anna Groundwater)
Anna Groundwater is introducing Lorna’s work and welcoming her to Edinburgh.
Thanks to Anna for that lovely introduction and to all of the organisers today for inviting me along. I’m going to talk today about digital humanities and some opportunities for taking this forward in the future. But I want to talk particularly about cultural collaboration. But I want to start with a bit of a rant…
There has been a lot of discussion about digital humanities, including some really unedifying below-the-line discussion… Even more so regarding social media, where there has been a real stramash. But it’s an interesting background noise to this event – which has a much more enlightened perspective. These ideas of digital humanities do raise issues of identity… For me digital humanities methods are about doing, with practice a way to understand what is possible.
I will be talking about how digital heritage collections can transform research in the humanities, about understanding the place of this work in the lifecycle of the materials, and about engagement with the GLAM sector. I think both aspects have implications for sharing and managing these materials. And I’ll be talking more about cultural engagement and engagement with primary research.
Digital Humanities, since the term was coined, has been a growing area, but there have been a lot of assumptions both in academia and in mainstream media. And that is, I think, because “digital humanities” has become what Matthew Kirschenbaum calls a “free floating signifier”. We engage with digital materials, with digital tools, with sharing and dialogue. For many this is a sea change in the academic process. For some the call to “Join Us” in the Manifesto for the Digital Humanities can seem a touch evangelical. But DH is a rich and complex field, and reductive presentation of the field does it a disservice.
If I were to define it, I would say that digital humanities includes digital content (digital collections, and projects with digital outputs), methods (discovering, annotating, comparing, referring, sampling, illustrating and representing digital content), and tools (for processing and analysis). The scale and availability of content changes, and the tools certainly change. But DH can achieve two things: existing research processes can be conducted better and/or faster; and researchers can conceptualise completely new research questions. I think that latter aspect is where things get really interesting, generating whole new research questions and approaches. And to do this DH is highly interdisciplinary and reliant upon appropriate research infrastructures and networks.
Alan Liu and others have critiqued Digital Humanities and the comparison with maker culture, etc., but I don’t think that these things are so far apart. By building digital materials and content we can have impact on the whole humanities research lifecycle, with disruptive and transformative interventions. At the core of that is the content, the digital stuff that has been generated in large scale digitisation projects. I think developing deeper understanding of the resources is a good place to start. Much of what is categorised as DH is about consumption of content. There are huge opportunities but also real challenges. Tim Hitchcock talks about the limitations of OCR and metadata, which are not easily found in digital surrogates of archive resources. “Most digital resources hide more than they reveal” says [?].
What we have available is limited by funding, by fixed term projects, by complexities in generating the digitised content. For instance the Welsh Newspapers online archive has gaps in it, but it is not always immediately apparent where these gaps are, and it can be easy (and very risky) for a scholar to assume they are engaging with a full set of content when they are viewing a partial collection of content.
Meanwhile we have more to do to use, link and enable analysis of digital heritage within the scholarly research lifecycle; we still need tools for analysis and reuse to be more closely coupled with the resource, with the content. Right now you need to extract the data and then run those tools separately to generate, e.g., an n-gram view. Very few libraries embed those kinds of tools with the resources (the KB, the Netherlands’ national library, does). And I’d like to take this content and link out to other public collections that contextualise and enrich that material. We also need to think about design, reuse, repurposing. We can’t always imagine what others will want to do with our content. Generally these resources have been developed by preservation agencies, and they have focused on access above other aspects. To make effective use and reuse of digital content for research we also need to look at co-production of resources.
An example of co-production was Cymru1914.org: The Welsh Experience of the First World War. This was a collaboration with the National Library of Wales, local archives, BBC Wales, and academic researchers. This collection is based upon an open archive, and I focused on use and reuse from the outset, partly because materials that are well used are more sustainable. We curated materials with the greatest impact for research, for teaching, for the public. That meant identifying popular content, and content on topics of interest for future research. We used a participatory design approach to ensure we had a bilingual and accessible interface, and that it would be easy to navigate and understand the context of the content being viewed.
I also wanted to talk about “The snows of yesteryear: narrating extreme weather”, a project in partnership with the University of Wales, Aberystwyth University, the UK Met Office, performer Eddie Ladd, and the National Library of Wales. We collected narratives of extreme weather events from a wide range of archival sources. We also gathered commentary from the community on local experience of extreme weather, and worked with climate scientists to explore the representation and integration of narrative and documentary sources for research. This culminated in a performance that explored dramaturgical responses to historical occurrences informing contemporary experiences.
This collaboration of disciplines and data types is an act of curation, but how can we replicate and preserve these relationships – this curated combination of resources and collections – over time? That’s a real challenge, and experience we will need to build as we preserve increasing amounts of born digital content.
So, both of these projects were highly collaborative, but what makes DH so much fun also makes it hard to document and replicate. There is, however, a resource that allows you to show relationships and articulate those connections: NeMO, the Methods Ontology for Digital Humanities (Nemo.dcu.gr), a conceptual framework that explicitly articulates connections, resources, tools, and dependencies within the digital humanities process. This should make the fragmented, difficult-to-define process of DH somewhat easier to discern. Some of the debate on the nature of DH comes from a lack of transparency in methodologies, and from differences between disciplines, which can limit the opportunities for reuse, for review, for critique.
If I were to summarise some lessons learned, I would say that “Digital Transformations” offers a significant new type of disruption, enabling new modes of collaboration and communication, changing paradigms of understanding, the creation of new knowledge, and research that is otherwise impossible. It depends on, and adds value to, digital collections. And digital collections used for scholarship are more likely to be sustained. Digital heritage for research also involves the opportunity for partnership, for involving extended communities of practice as part of fluid collaborative work.
Sheila Clyde (?), UCL, notes that libraries often produce digital resources at scale but with restrictions reflecting the funding of digitisation; by contrast, scholars create tools well designed for their own work but which do not scale up for others’ use, and co-creation can bridge that gap to an extent. And that brings us back to the importance of curation, and the meaning of curation as being close to the content. Digital research enables different ways to frame content, to understand that content in diverse contexts. The experimental processes of digital humanities enable and encourage new ways to interpret and bring narratives to the content. The practical nature of digital humanities is particularly important. And these benefits are mutual. Marconi has written about the approaches and their value for museums, but this has relevance to others. He speaks about the fact that delivering services to the public is a core part of their purpose, and digital materials can again bridge the gap between the preservation and curation roles of the museum and this need to deliver real value for the public, for museum users.
Now, I am contractually obliged to mention the new Kelvin Hall redevelopment, involving Glasgow University, Glasgow Life, and the National Library of Scotland – a space that will be a forum for taking forward collaboration and co-creation in cultural heritage.
So, in terms of digital humanities: where will it flourish, and where will it go? There has been a lot of emphasis on where DH should be placed within an organisation. Until recently I would have argued that the library is an excellent place for DH, but I increasingly think that this is not financially sustainable. The open access revolution means that libraries are now much more involved in opening up research and publications, fulfilling funder and mandate requirements, which is taking significant contribution from the library. So the libraries still have a core role here, but other spaces are important too, and the bringing together of different organisations in cultural heritage is important. And a framework for DH is important for understanding why we do these things and how we do them – what is the role of infrastructure and environments that allow better use, reuse and linking of digital content, and how do we ensure we have enough attention on the human infrastructure too – the scholarly ecosystem around digital research and communication?
Q1) I was very taken with your description of some of the challenges that libraries currently face around digitisation at the moment, do you have any thoughts on how we might change that situation?
A1) A great question. There really isn’t funding for digitisation on its own anymore. The Commission won’t fund it, few funders will support it, and ad hoc funding isn’t enough to meet the needs. So my approach has tended to be about embedding digitisation into other research projects. But the long term sustainability of digitisation projects can be a real challenge there… And that is an issue that has to be thought about up front, to ensure assets are not hosted locally without being shared across the institution. We have to work with researchers throughout the organisation, making that information visible across the organisation, or more widely.
Q2) I really liked your focus on the importance of using and reusing digital cultural heritage resources, and the opportunities for research arising from that. I was wondering about reuse of these resources for teaching and learning?
A2) I think that’s a good question. I think really it comes down to anticipating that the widest possible array of communities will be interested in, and engaging in, reuse of these materials. The success of a project isn’t about me using them – it’s about others finding and using them. And it is therefore important for them to be openly and freely reusable.
Q3) I’m interested in the distinction between primary and secondary materials. I’m not sure that I entirely agree with the criticism of primary materials – maybe for secondary materials. For my own organisation we have too much demand for primary materials and we cannot focus on a single community; maybe we can work with and host some secondary materials…
A3) I agree… That second layer of curation is something that is hard to maintain, doesn’t work well, and is perhaps best left to other organisations/stakeholders.
Q4) Do you think there is any value in some resources not being available digitally for research?
A4) There are some resources you need to work with physically. Even newspapers – those are very visual resources which it is tough to revisit digitally. There are types of research that still rely on the physical, and those that work with the digital or the digital and the physical. There are things you can only do at scale digitally. You cannot engage with thousands of papers physically… But looking at a stack of papers can show you trends, changes, interest levels (e.g. in WWII and the way the stories run through a single issue), there are digital methods emerging there too. And we need the critical framework to know what to focus on, so it is important we support and develop that.
Workshop Sessions
- Analysing Online Discussion Data with Google Sheets and Google Analytics (Martin Hawksey, ALT)
- Make Your Own Chat Bot (Sian Bayne & Kathrin Haag, Edinburgh)
- Data Visualisation with D3.js (Uta Hinrichs, St Andrews)
- Drupal (Jim Benstead, Edinburgh)
- The Edinburgh Geoparser (Bea Alex, Edinburgh)
- QGIS (Tom Armitage, EDINA)
- Tweeting and Blogging for Academics (Nicola Osborne and Lorna Campbell, Edinburgh/EDINA)
- WordPress (James Loxley, Edinburgh)
Thanks to all who came to our workshop! Slides and such will be available shortly!
Keynote 3: Karen Gregory, “Conceptualizing Digital Sociology as Critical, Interdisciplinary Practice” (Chair: Sian Bayne)
Sian: Karen joined us at the University of Edinburgh last year and she is a lecturer in Digital Sociology and founded a group on digital labor. She is developing the MSc in Digital Sociology. She has written some really nice feminist interpretations of algorithms, she’s written on Uber, and there’s pretty much nothing interesting she hasn’t written on.
Karen: Thank you so much. It is so wonderful to be here, and the fact that I am here is testament to that – I was at the eLearning@ed conference, and Edinburgh is really becoming this epicentre for digital work and digital education. It is so collaborative and also so outward facing; it is just a scholar’s dream!
How many of us are on the internet? How many have Twitter or Facebook? How many have a cell phone turned on? How many have their FitBit? How many have been photographed by CCTV? How many of you have used your bank card today? And how many of you wish email would just go away?
Now, all of those things emphasise that we are increasingly living in a datafied world. This datafication of everyday life has mostly taken place outside the scope of democratic participation. Most of us are outside the interest of the NSA, and yet the US and UK governments can monitor pretty much all that we do. Business use of our data is so commonplace that we have the much used phrase “if you aren’t paying for it, you are the product”.
But what are these data lives that we are living? What sort of social relations are being engendered through technology? We are in need of a Digital Sociology that can theoretically and methodologically tell us about these digital lives. But do we need a new word here? Is it really different from data science? Or internet research? I’m going to talk about three areas of my work and make the argument that we need to see the digital as, itself, an actor. And we also need to think about what we mean by the social… In sociology we need to move from thinking only about the who, to also thinking about the what. That additional “what” may be a small word but it’s important.
Thinking of the what, we have to consider “privacy, the private self, human and civil rights, governance and citizenship, finance, health and medicine and even, of course, education”. I want you to take these definitions and this information with you today, but also to question how our labour, as academics, contributes to this data society.
Digital Sociology was coined as a term in 2007 by a former colleague in New York; the first book on this was from Edinburgh colleagues Kate Orton-Johnson and Nick Prior. For them the magnitude of digital intrusion is a prompt for skepticism – they are not convinced that these tools and technologies actually change our lives. Is Uber really changing our relationship to transport? Are we really thinking of ourselves as brands? And this type of perspective really brings together sociology with a long history that includes internet studies, information and communication studies, etc. We are magpies building upon this work, in the context of sociology.
Digital Sociology (see Deborah Lupton) intentionally includes the vast range of settings, spaces and contexts involved, not just what we do online, not just what we once labelled “cyber”. And that includes the settings that bring about new technologies. In fact digital sociology wants to draw attention to the very social conditions that make these things take place, that enable these “assemblies of social actors” (Latour). The emphasis on sociological questions is an attempt to understand the push and pull between social lives and the bigger macro forces. To what extent are we making the digital, and to what extent is the digital making us? And this is an extension of (?) Mills’ work – we remain interested in understanding the rather complicated nature of human agency in an increasingly technocratic society.
So, whilst Digital Sociology is concerned with datafied society, we are not necessarily heading enthusiastically towards big data. Digital Sociology allows new ways to consider data and its acquisition – from social media, from wearables, etc. Noortje Marres (What is Digital Sociology?, 2013) talks about the way in which digitisation affects or alters the relations between researchers and researched, and those between the objects, methods and techniques of social research. Just because you can envisage the data doesn't mean you can actually work with it, and it doesn't mean that you have the data or tools available to do so. Interdisciplinary work sounds great, but when you start to critically engage with how a platform like Facebook works, you immediately run into the asymmetrical nature of privacy that comes with their insulated protection of income.
Evelyn Ruppert states that such methods do not just describe society but "help to create it anew". It's not just an ontological issue, it's also an ethical one. And it's not possible to address this entirely through research. When we translate a click, a gesture, an interaction, we have to be clear about what we are doing: how do we change that data, and what am I doing when I take your data and translate and interpret it? Instead of taking data that's just out there, we have to ask "what kind of data am I?" and what am I putting out there in the world. And that issue is only getting harder to define as we increasingly engage with wearables.
So some work we’ll look at:
- Gregory, Karen (2014) Enchanted Entrepreneurs: the labor of esoteric practitioners in New York City. Dissertation. City University of New York.
I didn't start out as a digital scholar; I am an academic concerned with labor. But I followed the trail researching those who make their money from spirituality, from Tarot… They were engaged in this labor, often pushed out of traditional labor markets, making their living but also aspiring and reaching for enlightenment – some of my subjects spoke of themselves as the Oprah generation of Tarot readers. In attending conferences and being part of this community, they are engaging with the internet and social media; they are becoming creators of digital brands. These are assemblages of people, spirits, personal brands, etc., and this digital labor provides a potential model for how people are creating these brands and engaging in different kinds of labor. If you are interested in people, you will find yourself a digital scholar.
- Lupton, Deborah (2016) The Quantified Self. Polity Press
Deborah Lupton writes about self tracking: the use of FitBits and wearables that extend existing practices (e.g. diaries), changing that activity into data, into outputs, into reports. Wearables seem niche right now but they are becoming mainstream. It's about wristbands, headbands, patches in fabric that can monitor physical status and upload it instantly. We have connected devices for blood pressure, weighing scales, etc. Doctors routinely ask patients to monitor moods. Some committed self trackers regularly send stool samples for analysis. They use off-the-shelf genetic testing products. This data is co-produced… It is of the human but it is also produced. And this data is valuable – not just for the individual, for self-improvement, but also for others. The quantified self reflects many surveillance practices, informing e.g. employers, insurers, etc. Frank Pasquale, in The Black Box Society, talks about a shift in the responsibility for risk, and of this data having a life beyond our lives. Pasquale argues that we need to engage with this data. Again and again sociologists caution that data without context is not useful.
- Skeggs, Bev and Yuill, Simon (2015). Values and Value project
This project is really going for it. They are looking at how value and capitalisation are influencing ideas of value and behaviour in social media. Specifically they are looking at Facebook, and what happens when economic value is attached to things that are not traditionally economic. Facebook assigns financial value to friendships and likes, and in this way monetises friendship. This is a data project, based on a browser plugin that tracks participants' use of social media and highlights to them what is occurring behind the scenes, in order to understand how that data is monetised.
Skeggs and Yuill talk about not making assumptions, but understanding how Facebook works and how it makes decisions. Studying this, though, relies upon the tools that Facebook itself makes available. They speak about the importance of having our own research tools, not just relying on commercial ones. They have done this, but "this process is a complex one". They document how they worked together to gather data from the API… But they were also curious about how Facebook tracks the behaviour of users when they are not on the site, tracking their other browsing. The team had to build two tools, found themselves overwhelmed with two strands of data, and needed a website to capture it all. So, this isn't just about tracking that data; it is also a reflexive process of building the tool. This isn't seamless data science – it is deeply critical and reflective, and wants to bring to life the social life of their own research methods.
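To make concrete the kind of analysis such a research plugin performs, here is a minimal, hypothetical sketch (not the Values and Value project's actual tool): given a log of the URLs a browser requested while a participant read a non-Facebook page, it flags the requests that flow to Facebook's domains – i.e. the off-site tracking traffic the team was curious about. The domain list and example URLs are illustrative assumptions.

```python
# Hypothetical sketch, not the Values and Value project's actual plugin:
# flag browser requests that go to Facebook domains while the user is
# visiting an unrelated site (i.e. third-party tracking requests).
from urllib.parse import urlparse

# Illustrative list of domains associated with Facebook tracking
FACEBOOK_DOMAINS = {"facebook.com", "facebook.net", "fbcdn.net"}

def is_facebook_tracker(url: str, visited_site: str) -> bool:
    """Return True if `url` points at a Facebook domain while the user
    is browsing a different site."""
    host = urlparse(url).hostname or ""
    # Match the registered domain, including subdomains such as
    # connect.facebook.net
    to_facebook = any(host == d or host.endswith("." + d)
                      for d in FACEBOOK_DOMAINS)
    return to_facebook and visited_site not in FACEBOOK_DOMAINS

# Example request log captured while reading a (fictional) news site
requests_log = [
    "https://www.example-news.com/article.html",
    "https://connect.facebook.net/en_US/fbevents.js",
    "https://www.facebook.com/tr?id=123&ev=PageView",
]
trackers = [u for u in requests_log
            if is_facebook_tracker(u, "example-news.com")]
```

A real research extension would intercept live network requests rather than read a static log, but the classification step – deciding which traffic constitutes third-party tracking – would look much like this.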
Whilst some of the challenges Skeggs and Yuill faced were technical teething problems, they also faced social challenges around what can and cannot be studied. Doing digital research means studying the data and the tools themselves: not just the results or findings, but also the social conditions and power relations that inherently underpin those tools. If we just take the data, we risk understanding the data as the social, rather than understanding the social as the data.
So, this brings me to my conclusion. Increasingly – and disturbingly for me – we are choosing to just live with this datafication. But more concerning still, "increasingly, we are now seeing digital technologies and data-aggregating apparatuses developing in the service of a positivism that desires not only to represent social life, but to predict its very outcomes". That's emerging in predictive policing, predictive learning analytics… This dystopia – and I don't use that term lightly – raises issues of privacy and of human rights that we haven't begun to think about. This is turning us into a database society… This leap puts us into new populations, and that should trigger conversations, for instance, about race and racism.
As academics, does our digital labor resist these changes, or are we helping to create them? And how do we shape our students to engage with this world, to engage critically? That is more important than our own research, I think.
Q1) I have been wondering about how I situate myself… And I was wondering if you could say something about the connection between your work and the sort of digital humanities work we saw this morning.
A1) Yes… I think one of the things we have to do with Digital Humanities… There is all this intensive, awesome work, but it's so hard to pick out where the question came from, where the ideas come from. In sociology we play with data in different ways. We need to make that sort of connection. But I think those working with data will likely have come through the digital humanities route. We have to work together to understand what data means – collaboration is essential.
A2) Digital humanities shows what is possible when you intentionally go out to create data, to create a database. Martin Hawksey, talking about Twitter in his workshop, said that you sign away a lot of permissions when you share data with the platform. He showed an example of tracking that can seem scary… And we don't know how our data is being used. How do we reconfigure social trust, and what does that mean?
Q3) You talked about the giving away of your digital self… The Data Protection rules coming from the EU are a social response, in the EU Human Rights context, to those pressures. But it is unclear how this plays out in the future.
A3) I agree that it doesn’t solve the problem, it adds to the assemblage, it raises issues of who lobbies for those changes, what those mean…
Q3) And those changes change what you can access and research… I was at a conference in the US where a researcher said he wanted all my data – in the US he would have been shouted down I think…
A3) Nothing we are talking about here is easy. There is such complexity here. Too much data is an issue. Too much trust in data can also be a huge issue.
Q4) In that vein of complex questions, how do we mesh that top-down EU regulation with our very human desire to just try things, have a go, and sign up for things for pragmatic reasons?
A4) One part of this is social trust… Becky Trubeti(?) writes about how we like that connection; we don't want to give up those opportunities. But – long answer – there must be a better labor movement around digital platforms. We have a language of immateriality, but we need a more comprehensive movement. Platform movements are trying to co-opt communities and make new platforms… Is that easy work? Can you get VC funding for that? We don't know. There are big fights on the table here. If you took away these digital tools and technologies, though, it would be like losing a limb! That isn't a good solution either.
Q5) How do we empower students to know what sources to trust and not to trust? And to be aware of all these subjects, to make these better platforms?
A5) Had I stayed in New York, at a public university, I would have been working on a cooperative code school. We can educate students to be digitally savvy users, but we also need to inform the next generation of makers and developers. The language of entrepreneurialism is problematic, and takes us down a path that won't resolve anything.
Q6) Is that a call for a new philanthropy, in the way that the 19th century Quaker movement led to places like Port Sunlight, etc., in the UK?
A6) Everything I'm saying sounds great to those in a liberal society, but none of it will work without an eye on capital, without money flowing towards different types of digital platforms.