Jul 01 2011

Today I am again at the FIFTH BLOOMSBURY CONFERENCE ON E-PUBLISHING AND E-PUBLICATIONS, SOCIAL MEDIA AND THE ACADEMY: Enhancing and enabling scholarly communication. I’ll be liveblogging but the usual health warnings apply: expect typos etc. And if you’d like to read about Day One it’s here.

Anthony Watkinson is introducing today by saying that yesterday he didn’t attempt to define “social media” but no-one seems to be worried about that so he’s not going to define it again! And so to our first presentation…

Anthony introduces Jason Hoyt by mentioning that Mendeley was one of the tools mentioned by researchers in the CIBER surveys (see yesterday's notes) on social media.

Adopting and Adapting Tools

Description of Session from organisers: Research has shown that scholars often adapt existing social media tools. The speakers in this session, Steve Scott (Macmillan Digital Science), Jason Hoyt (Mendeley), Robert Simpson (Galaxy Zoo) and Daniel Mietchen (Jena) will describe what tools are provided and what tools are used seriously including crowdsourcing and wikis.

Research Gone Social: Mendeley – The World’s largest research collaboration platform and database – Jason Hoyt, Chief Scientist at Mendeley (www.mendeley.com)

Jason said that they are based at Clerkenwell Road and there is an office in New York – everyone is welcome to drop by, they’re always happy to see visitors and every Friday evening there tends to be “Beer O’Clock”. Only about two people have not heard of Mendeley before.

It was interesting that Anthony talked about not defining social media. I think it can be useful to think of many of these tools including Mendeley as “online collaborative tools” – that may get more buy in from researchers.

Mendeley set out to fix the problem of people worldwide needing to manage and share research, and to do it more efficiently. Mendeley is built around PDFs – uploading a document extracts metadata and indexes it, and there are citation plug-ins etc. It's a reference manager with a collaborative social layer: groups in which you can share these documents with sticky notes, highlights, comments etc. You can sync materials to iPhone or iPad or between different computers.

Mendeley are big fans of science. They have crowdsourced the world's biggest research database and turned it into a Big Data app platform – 110 million+ crowdsourced documents, 300k added per day in real time – and the Mendeley API is being used in various apps, surfacing the most popular papers, trends, etc.

Mendeley deduplicates documents – there are about 44 million unique documents. The Thomson Reuters Web of Knowledge database has around 40 million unique documents (but took 50 years to build, rather than 2 years).
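Mendeley haven't published how their deduplication works, but the general idea behind collapsing crowd-uploaded copies of the same paper can be sketched like this: normalise the noisy extracted metadata, then hash it so near-identical records collide. The function and field choices below are purely illustrative assumptions, not Mendeley's actual pipeline.

```python
import hashlib
import re

def fingerprint(title, year, first_author):
    """Build a crude fingerprint for a paper from extracted metadata.

    Hypothetical sketch only: strip punctuation, case and spacing from
    the title so that differently-formatted uploads of the same paper
    map to the same key, then hash the combination of fields.
    """
    norm_title = re.sub(r"[^a-z0-9]", "", title.lower())
    key = f"{norm_title}|{year}|{first_author.lower()}"
    return hashlib.sha1(key.encode()).hexdigest()

# Two uploads of the same paper with slightly different formatting
a = fingerprint("Galaxy Zoo: Morphologies", 2008, "Lintott")
b = fingerprint("galaxy zoo - morphologies", 2008, "lintott")
assert a == b  # both collapse to one unique document
```

A real system would also handle typos and missing fields (e.g. with fuzzy matching or DOIs where available), but the normalise-then-match step is the core of turning 110 million uploads into 44 million unique records.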

The technology behind the site is robust and scalable. Tagging, a recommendation engine and real-time reading stats all help make materials manageable and findable. The Thomson Reuters impact factor has a delay of about 2 years; the real-time reading stats on Mendeley let you see an article's impact in real time.

The 15 largest userbases include the University of Cambridge, Stanford, MIT, the University of Michigan, UCL etc. And there is a network of 550 "Mendeley Advisors" who support Mendeley at their institutions. There are 88k groups (linear growth in groups, though membership growth has been exponential).

Jason’s Three Myths of Social Media:

#1 Build it and they will come

Building stuff doesn't attract people. You have to fit into people's workflow – people don't have time to try out new things that don't fit them. It takes time and focus – it can't be a side project for a government or publisher, it has to be your core business focus.

#2 Scientists are on Facebook, so they’ll want to join an academic network

But people inherently separate their personal life from their career. You have to understand the different motivations people have for using these things.

#3 Academics do NOT want to socialize online

Socializing needs immediate and obvious value – there has to be some aspect that works for the individual but also benefits group collaborations. Adding social features, like commenting, can fail. Where is the immediate value add? That doesn't mean commenting can't be useful, but we haven't yet hit upon the incentives that make commenting useful to people using these sites. Surveys of academics using social media haven't identified those incentives yet either.


Mendeley is still growing and has a long way to go, but they have hit upon something: they have taken a very boring thing like a PDF and made it a social object.

So how can "social media" help institutional repositories? Jason sees four barriers to better IR uptake:

  • Not understanding copyright laws or lack of confidence in understanding
  • Rekeying bibliographic information
  • Finding full text versions to deposit
  • Time

And that means the worldwide deposit rate is less than 10%. So Mendeley, Symplectic and CARET put in a bid to JISC to help deposit rates for universities – http://jisc-dura.blogspot.com. The pilot is at the University of Cambridge – they already have a partnership with Symplectic. Deposit is hooked up to DSpace; the tool helps locate the full text and licence, makes the deposit, and then shows it in the desktop and online Mendeley feed. [A similar tool, OA-RJ, that works with a variety of repositories runs at EDINA – more info on the OA-RJ website.] The IR tool is free and JISC-funded – it makes sense in business terms to make this free and available.

How can social media help librarians? We have lots of statistics on usage, so why not start exposing those to librarians? Librarians or university admins can track the stats of their groups, benchmark them, and see readership levels. And authors and university admins can see how researchers in other organisations are using their papers – this hasn't launched yet but is looking to be free at the moment.

Zooniverse – Robert Simpson (rob@zooniverse.org)

Zooniverse was born out of a project called Galaxy Zoo, which emerged because researchers at Oxford University were faced with a huge database. One researcher, Kevin Schawinski, did a week of intense work – he classified 50k galaxies in a week, but there were a million in the database. A friend of his had heard of Stardust@home – which uses your own idle cycles (yours, not your computer's, unlike SETI@home) and where you had to pass a test to take part – and it worked. So the idea was to build a website presenting galaxies for classifying, where you had to take a test to take part. It was a huge hit: they got huge uptake and classified more galaxies in an hour than Kevin had in a week.

A group of projects have come out of this project!

There is Ice Hunters, Ancient Texts, Galaxy Zoo Mergers (uses the JavaScript engine in the browser), Galaxy Zoo Supernova (real-time info to spot when a supernova has gone off – you get email alerts for this and they are looking at mobile apps), Old Weather, The Milky Way Project, Galaxy Zoo Hubble, Moon Zoo, Solar Stormwatch (the engine is learning and can predict forthcoming storms – it has a Twitter stream! – and can therefore also flag up aurorae), and Planet Hunters (which searches for exoplanets).

430k people are taking part in Zooniverse projects! 320k people have taken part in Galaxy Zoo to date, processing 40 million classifications – they do a lot of error checking and redundancy.
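The redundancy idea – many independent volunteers classify the same galaxy, and agreement filters out individual error – can be sketched as a simple majority vote. The thresholds and function below are illustrative assumptions; Zooniverse's real pipelines are more sophisticated (e.g. weighting users by past accuracy), but this is the core of how redundant classifications become reliable labels.

```python
from collections import Counter

def consensus(classifications, min_votes=5, threshold=0.7):
    """Aggregate redundant volunteer classifications of one object.

    Illustrative sketch: require a minimum number of independent
    votes, then accept the majority label only if enough voters
    agree. Returns None when no confident label exists yet.
    """
    if len(classifications) < min_votes:
        return None  # not enough redundancy yet
    label, count = Counter(classifications).most_common(1)[0]
    if count / len(classifications) >= threshold:
        return label
    return None

votes = ["spiral", "spiral", "elliptical", "spiral", "spiral"]
assert consensus(votes) == "spiral"   # 4/5 agree, above threshold
assert consensus(votes[:3]) is None   # too few votes so far
```

Objects that never reach consensus can be flagged for expert review – which is also, incidentally, how genuinely odd objects get noticed.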

Moon Zoo – high-resolution images of the moon (www.moonzoo.org). You do crater counting, sizing etc., and you can do a lot of research with this. 51k square miles of the moon have been done so far – three Switzerlands!

Old Weather uses logs of ships from the First World War. The officers had to log the weather, and their location, every four hours. This is hugely important for climate scientists – all sorts of measures from places we don't have monitoring for. And the data is reliable, coming from trained people working with calibrated kit. In addition to the weather, volunteers transcribe the history of the ship, info on those on board etc. – daily life, illness and so on were all recorded. They have gone through half a million pages – two thirds of the way through so far – and are planning to add more logs.

The Milky Way Project – we ask people to draw bubbles around forming stars. They map other phenomena as well – we ask people to map and measure our own galaxy in various ways. This was only launched in December.

Planet Hunters – a NASA probe called Kepler just logs star brightness in one location. The site graphs the changing brightness of stars. Stars do move and change, but sometimes hidden in the data you see the tell-tale signs of a planet going round a star (like spotting a mosquito in front of your car headlight). 33k participants. So far they have found 69 potential planets that NASA hadn't spotted yet. And they can let people know which planets have been discovered!
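What Planet Hunters volunteers spot by eye – the mosquito in front of the headlight – is a shallow dip in an otherwise flat light curve. A heavily simplified sketch of that signal (the function name, threshold and data below are invented for illustration, not Kepler's actual detection pipeline):

```python
def find_transits(brightness, dip=0.99):
    """Flag candidate transit points in a star's light curve.

    Illustrative sketch: compute the mean brightness as a baseline
    and flag any sample that falls noticeably below it, as a
    crossing planet would cause.
    """
    baseline = sum(brightness) / len(brightness)
    return [i for i, b in enumerate(brightness) if b < baseline * dip]

# A flat light curve with two shallow dips
curve = [1.00, 1.00, 0.97, 1.00, 1.00, 1.00, 0.97, 1.00]
assert find_transits(curve) == [2, 6]
```

The reason humans still beat simple code here is exactly what this sketch ignores: real curves are noisy, stars vary intrinsically, and the interesting dips are often the ones an automated threshold misses.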

We’ve developed some rules and ethics here. Crowdsourcing isn’t really what we do – we do crowd analysis. So we have rules and ethics for doing this:

  • Be open about what you're doing – don't try to play games, and don't test things on people (they HATE that!). Be clear on what you are doing and what it is for.
  • Treat the public as collaborators – they are not users, and they are not customers. They are helping us. Take them seriously when they ask questions – when you do, it can go really well.
  • Don't waste people's time – this is key. We have rules for what projects we take on at the Zooniverse; "research will get done" is a core rule for us. There have been "citizen science" projects that have made people do loads of work without actually creating any results that justify it.
  • Everyone is important – which relates to this… Quite a lot of people do a few galaxies and wander off. Others do hundreds (a very few did ALL 900k in the original Galaxy Zoo!). And there are people doing between 5 and 25 galaxies, and some between 25 and 100. The two lower groups are most populous, and the most classifications come from the 5-to-25-galaxies group (30-45% of classifications). So designing the task well is super important, as those first 5 to 25 clicks are crucial!

We did ask our users why they take part and got various answers – people like the scale of the project, some are scientists, people think it is beautiful, astronomy was popular – but by far the biggest answer was "I want to contribute to research". They were really into the project and wanted to help. That really helps us understand the motivation.

A quick example from the Milky Way Project – it is easy to spot very popular/frequently identified phenomena. All the projects reach a point where the data has to be dealt with and translated into computer-understandable catalogues. The Galaxy Zoo paper explains how we do this at Zooniverse in general.

So, back to Old Weather – you can see that people are very good at this stuff if you design the task well. We've now developed interfaces to view and explore our data. If you go to oldweather.org/voyages you can follow a ship and its annotations, with the colour of the line changing to match sea temperature. And you can see animations of ships and barometric pressure. We have projects with the National Maritime Museum across Europe, as it would be great to have other ships in here as well. All those transcriptions are now searchable, so we can look at the people who were sick on board. You can see a sudden peak in the data – probably Spanish Flu. Looking at the whole fleet we see other spikes as well. Cool stuff can be extracted.

We have a forum for all our projects for discussion – some lovely snippets from the logs have been identified. And there are nice data visuals: 31 happy posts, 500+ unhappy ones. People also talk about bad weather more than good. And what do folks do on board? Dance is more popular than boxing, though cricket and football are most popular. And we have a treasure map – we looked for incidences of the word "overboard" – stuff lost overboard including hand buckets etc. But some are dead bodies as well. There is lots of family history interest in Old Weather too – lots of missing information or gaps that may be covered in the data.

A Dutch schoolteacher spotted something in Galaxy Zoo and asked about it; it was observed further, and Hubble observed it too. Basically a quasar had turned off and we are seeing a light echo from it. Hanny's Voorwerp is the name of that object now – after that teacher – and this is why people are so much better than computers!

And another example – users found a new type of galaxy, now called Galaxy Zoo Peas!

Random fact: there are only 11 armies in the world larger than the Zooniverse!

And finally, time put in vs. time got out… For a project at huge scale it's important to set time in against work done. Based on work done, there is the equivalent of 128 FTE people working on Zooniverse projects (23 FTE on Old Weather). They prove the concept can work!

And a David McCandless image: a box representing the 200 billion hours of TV watched by US citizens per year; the small blue box is the 100 million hours it took to create Wikipedia. Clay Shirky talks about the cognitive surplus – which we can do something amazing with!

Wikis as a tool for publishing scholarly Workflows – Daniel Mietchen

All of Daniel’s notes and slides etc. are online here: http://species-id.net/wiki/User:Daniel_Mietchen/Talks/e-publishing_2011/Start see also his Twitter feed: @evoMRI. And you can add questions directly to wiki page.

Daniel has a slide about how to add video, images, links and content to PDFs. What I find interesting for e-publishing discussions is the editorial note on that paper, which said it was a catalyst for publishers to discuss sharing multimedia content. Multimedia is usually treated as additional material, an extra that can be linked to; some journals just don't accept that material at all. But multimedia use is on the rise, especially for communicating science.

Others have realised that we have to go to other tools – see the workshop Beyond the PDF (Daniel's presentation is on that wiki). Several publishers have advanced features: Elsevier has sidebars with extra features, and others have similar functionality – you can grab metadata to download or use in Mendeley. But publishers really only imagine minor differences. So Daniel started thinking about this and Googling, and noticed a post in German giving criteria for a Journal of the Future. He translated it and added it to Wikiversity, where it has been evolved and added to, so do have a look. For instance:

  • Dynamics – publishing pretends that an article happens in ten pages once a year, but researchers work all the time and this could be reported. Most researchers have a lab notebook – Daniel would love to see an Old Weather for lab notebooks, but secrecy probably prevents that! The journal of the future needs to recognise that research is ongoing, not a static thing.
  • Scope – very few people read new articles in the paper version. People use RSS, email alerts, Google, PubMed or similar – effectively all one source. If in practice there is only one source, why have journals with a single focus? On paper that made sense, but we are in the age of electronic publishing and the article as object, not the journal. There is no need to limit the scope of journals.
  • Access – if you can't access journals they aren't useful, so journals need to be open and accessible, including to the public – the researchers of the future.
  • Replicability – publishing presents barriers to replicability: the length of papers, the data not being shared, the workflow not being shared. We really have to share all of that, and the journal of the future should include it.
  • Review – this shouldn't happen over months but should be ongoing. If you think in wiki terms, it is good for the page to be reviewed, to check whether improvements are good; review could go on permanently. Wikipedia does this, but doesn't limit edits to experts.
  • Presentation – interlink with visualisations and other tools to enhance the presentation, analysis and understanding of data.
  • Transparency – disclosure of current relationships is now required in journals. Transparency could also be useful in a more long-term and broader way – how has the data been acquired, what are the ethical issues, etc. Most ethical review boards don't publish their paperwork, but it can be informative even if you don't publish every bit of detail.
  • Sustainability – if we do share data, or make data ongoing, then how will we manage that and make it work sustainably and economically? What is the deletion policy, etc.?

So there are some journals that do link to materials. RNA Biology requires authors to submit a publicly understandable synopsis to Wikipedia as well as to the journal, so that new families of RNA are added to Wikipedia. That requires authors to share some of their content, since Wikipedia is CC-BY-SA. And the journal gets exposure through references back from Wikipedia. Interestingly the journal is published by Landes Bioscience, which is not especially known for open access, so it's a really interesting model.

There is a similar model around new species discovery at species-id.net, which encourages page creators to add metadata, citations, version IDs etc. Meanwhile with Jmol you can interact with molecule models and visualisations, and you can see videos here as well (including a gross toad vs. bug one). This sort of multimedia really enhances the paper – the paper itself was very dry but the video is incredibly engaging.

Daniel has a big comparison table of different wiki tools and e-journals – worth a look (see the wiki link above). He also suggests that peer reviews should be public – they are important parts of the publication process and, although some journals do include them, most do not. They are valuable records, including important notes on papers.

So, consider the crowdsourcing/crowd-analysis aspects of wikis. Think of the world as one giant lab! Many people want to be researchers in their spare time, and many projects have stages where those performing the research may not be the best qualified to do it. Having ways to crowdsource research, attribute it appropriately, and then let experts analyse it is worth considering.

The Wikimedia approach splits material by type/format of concept. Scholars would prefer splits by topic – WikiLab, WikiEvents, WikiJobs etc. [sounds like Nature's models] – and those latter could be revenue streams. If we had wikis rather than journals it would be easy to see where the gaps in knowledge are and what has been added by a research project. Many articles include only a few elements of new content – on a wiki it would make sense to add just the paragraphs that show new material and updates, rather than repeating 80% of a paper. And you could have calls for funders rather than calls for projects!

Daniel is currently working on the Encyclopedia of Original Research – drafting proposal in public for a wiki of open access articles.

The major exception to public information is funding – explaining why proposals have or have not been funded. Examples do already exist of all steps of the research cycle being captured in blogs, but not yet on this sort of wiki. There are also examples of things like GitHub – collaborative coding. In software you can always fork a project; that's not so easy in wikis.

Panel Session / Q&A

Q1) Tula: In Daniel’s vision where would copyright lie? With the author? Could there be duplicate locations for articles?

A1) Daniel: My vision is for science in general to be in the public domain. But CC-BY-SA licence is not bad. We have legal and norm requirements to attribute – think we should do away with the legal copyright requirement. But for this project we are taking CC-BY-SA content from PubMed, add to wiki, do Linked Data annotation etc.

Robert: Didn’t you just describe Mendeley to a point? Obviously there is a theme?

Daniel: Yes, we are in a theme, but they cannot cope with multiple versions?

Jason: Yup, so far we don’t show multiple versions.

Q2) The final article is often far from what reviewers read, so adding review notes would be a problem.

A2) Daniel: Yes but you could show the version they saw – you can link to multiple versions. And you can show interaction that helps clarification etc. At Frontiers journal you have that interaction but not in public yet.

Q3) Anthony: For Robert – when I heard Chris Lintott talk about these projects I was very interested in how the “training” of volunteers takes place. And they meet up quite indirectly.

A3) Robert: For StormWatch there is an online test, but otherwise volunteers are self-trained. But there are forums for all projects – we’ve started to build our own forum software – and that discussion is how the Galaxy Zoo…

Q4) Bess: I know that science projects around space are typically underfunded

A4) Robert: As a sideline these projects do serve a public outreach function. Participants do tend to be geeks and nerds though. But because it is kind of different you do get media attention, and that’s nice. If you’ve tried using Facebook for anything, it doesn’t work for reaching out to people – the traffic we get from there isn’t useful, people don’t actually take part in our sites when they are referred from there.

Comment) Nicola (me): It depends on your specific audience though. For astronomy geeks they tend to be expert amateurs and perhaps Facebook isn’t where they hang out. For a project I’ve been working on (AddressingHistory) we’ve found a number of genealogists and local historians do hang out in Facebook as it’s a personal interest, it’s a space they hang out in, so we get useful traffic there.

Robert: I don’t think our audiences are that different but we don’t find Facebook traffic does anything, it’s not useful as something weird happens after they click away from Facebook…

Claire: For events I find Facebook can work very well, but you do use it in a slightly different way to other social networks.

Q5) Anthony: Most journals come from some sort of community?

A5) Daniel: Envision federation of wikis. There is the NMR wiki for MRI work for instance – most journals require you to send in data in non editable forms but on the wiki you can graph but also edit data into graphs.

Q6) Tula: Do you see a visualisation tool on all these wikis?

A6) Daniel: Not for each wiki but a separate space with an API to feed into wikis

Robert: Personally I don’t find wikis very usable. My concern would be that you are having to educate people in using wikis – I’m not sure the time and energy wouldn’t be better spent finding the right tools for the job. You need standards.

Daniel: I am not talking about one tool. Any public, editable, commentable, collaborative space I am thinking of as a wiki.

Q7) Caroline ?: I am concerned about the Journal Galactica type idea – I don’t want to shop in a mall, I want a store with fewer, tailored/curated items. These wonderful tools and all this data are out there, but we don’t follow and evolve that fast.

A7) Daniel: Yes, but there are other tools that let you discover what you want elsewhere.

Jason: You get cross-discipline serendipity. So we do Amazon-style recommendations. You can focus on your own little world but still have a choice of all sorts of materials.

Q8) Claire Bower, BMJ: As we don’t have all open access journals – how do you deal with PDFs behind the paywall?

A8) Jason: Like other sharing tools on the web it’s up to the end user to decide what to share. In the US we follow DMCA. There haven’t been many take downs requested so far.

And then we had some tea… And now back for:

Scholars at work (1)
Description of Session from organisers: The central sessions of the second day bring together scholars from different disciplines who explain how they use social media in the research process. In the first session Jeremy Frey (Southampton) explores e-science and especially chemistry 2.0 to explain how collaboration can progress science.

Claire Ross, PhD Student at UCL

Claire is researching in digital humanities.

Social media is not new, but many academics do not use these tools for work or research at all. A study we did found three main barriers:

  • Time – they are very busy and are not convinced of the value of these tools.
  • Trust – can’t attribute authorship in a clear way in many cases.
  • Authority – many do not like the fact that anyone can comment or interact “our knowledge is truth!” – which is a worrying thing for those in the humanities to say.

The research cycle – people are using social media at all stages, but these are mainly early adopters. However, these tools can be invaluable for finding new research opportunities.

DH Now – follows everyone who claims to be a Digital Humanist on Twitter and grabs every post and link and compiles in real time an aggregation of papers and information – it shows what goes on in the community now.

DH Answers – you can post questions here, discuss them and explore opportunities.

Identification of knowledge, creation, dissemination – the most popular tools for humanists are Google Docs, Skype and Twitter. Claire’s most recent work, with the British Museum, came from a tweet on her first day at work – their head of digital replied to her. A year of statistics, two papers etc. have all come from a single tweet! Claire posts work to Twitter and Facebook – it really does widen your reach!

Social media & Claire – Claire isn’t a digital native; she didn’t have a computer until 18, and didn’t have the internet properly until 23, so she is learning just as much as everyone else. It isn’t age that makes the difference in social media. She truly believes that it enhances research capacity, that it improves the quality of work through an enhanced ability to find, use and disseminate information, and that it is a key communication network. She can discover, filter and discuss, and can meet people she would never speak to at a conference – it’s amazing for making new connections if you are a shy person.

One of Claire’s main ways of getting her voice across is her blog (Clairey Ross: Digital Nerdosaurus). This is how she talks to people, her primary method of dissemination, a log of her PhD, and where she shares her papers and work. It shows that she is research active. She has had conference paper invitations she would never have received without her blog. Is blogging damaging her academic career? Heck no. Claire is also a member of the HASTAC community – the Humanities, Arts, Science, and Technology Advanced Collaboratory – which is classed as academic citizen journalism. Through it she has connections with over 150 different scholars discussing the same issues from different viewpoints.

Moving on to Twitter – it’s where Claire gets the majority of her information. One screen at her desk is for Twitter, the other for everything else. Twitter is not just for young people – most users are over 35. There is a Bieber factor (teen girls), but for Claire it connects her to a huge academic network and community, and it provides instant feedback. There are at least 60 departments and libraries at UCL on Twitter – you get a better connection even to your home institution.

At UCLDH – the UCL Digital Humanities – we do a lot of research into this. We look at how social media changes how researchers interact. Primarily we share via the DH Blog – it’s our main publishing tool and it is open and informal.

There was a discussion on Twitter about what presentation tools are best and Prezi came up – so here’s one she made earlier, on how Twitter can be used at conferences. We took tweets from conferences, open coded and analysed them, put them through textual analysis and found 300+ unique users. Twitter did become an enabling backchannel – sharing links, following up questions, having side discussions etc. Twitter changes that one-to-many relationship hugely, and that’s what Claire and the DH community like. Does it provide a more participatory conference culture? Yes! “Instead of zoning out I can be an active participant in more than just the panel sessions,” says one tweeter.

QRator (qrator.org) – a collaborative project between UCLDH, CASA and UCL Museums and Collections – the idea is to create new kinds of content, co-curated by members of the public. It allows people to interpret the items in the museum, add comments etc. Very, very social. The project is based around the idea of the Internet of Things; QRator came about after a presentation on Tales of Things (ToT). ToT has also worked with Oxfam to share the tales behind donations and items, which is really interesting.

Transcribe Bentham – Jeremy Bentham is the founding father of UCL and there is a myth that he still gets wheeled into meetings! UCL holds 60k of his manuscripts; 10,000 images of these have been digitised and added to the transcription desk, where users can transcribe the text into a plain text box. It’s a complex and time-consuming process: users are asked to transcribe papers of between 100 and 1,000 words, and it’s a tough set of documents to transcribe. The project launched in June last year; as of this year there are nearly 2,000 volunteers signed up, and 83% of the manuscripts have been completely transcribed. The site is built on MediaWiki with a simple front end and has recently won an award.

Their next project is on digital identity on social media. There have been various discussions about Facebook being bad for scholarly work, and about keeping separate personal and professional personas. Claire doesn’t – hers are mixed. But that has to be a conscious choice, and many are not aware of what it could mean. It’s interesting to see how people perceive themselves online and how they think they can manipulate their profile, when it’s actually all about how others perceive them. Is non-work discussion professional? Can you actually separate personal and professional? How does social media fit in? They’ll be exploring these areas…

How do you manage information overload? You filter, with RSS and Google Reader – easy to aggregate and filter. A Twitter client is a must for filtering, storing and discarding. And, crucially, you need to create the right network – follow people you know and trust and make sure they are useful for you.
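The aggregate-and-filter step Claire describes can be sketched in a few lines. This is a minimal stdlib example with a made-up feed and a hypothetical `filter_feed` helper, not any particular reader's implementation: parse an RSS feed and keep only the items whose titles match a keyword you care about.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical RSS feed, inlined for illustration
FEED = """<rss><channel>
<item><title>New Digital Humanities post</title><link>http://example.org/dh</link></item>
<item><title>Unrelated news</title><link>http://example.org/other</link></item>
</channel></rss>"""

def filter_feed(xml_text, keyword):
    """Return titles of feed items mentioning a keyword.

    Sketch of the filtering step only; a real setup would fetch
    many feeds over HTTP and filter in the reader itself.
    """
    root = ET.fromstring(xml_text)
    return [item.findtext("title")
            for item in root.iter("item")
            if keyword.lower() in item.findtext("title").lower()]

assert filter_feed(FEED, "humanities") == ["New Digital Humanities post"]
```

Tools like Google Reader did exactly this at scale; the point is that the filter, not the firehose, is what keeps overload manageable.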

And as Claire finishes, our chair asks why the humanities is separated from other areas. Claire says that it’s all research, so the separation is artificial. After yesterday’s presentation I think it is teaching and research.

Question from Ian Cornelius: a few years ago I worked on the Jeremy Bentham papers and found them very tricky to work with. You talked about the authority of social media, but is “confidence” a better word? If I looked at that Transcribe Bentham work and knew that I was only viewing the best bet for that transcription, I would be concerned. It’s about having the confidence to know what you are looking at and what context it comes with.

Claire responds: there are about 15 core participants and many others drifting in and out. I would suggest that you look at the partially transcribed papers to follow on from what has been volunteered. In Transcribe Bentham there are two researchers dedicated to the project who check each transcription for quality and, when sure of its accuracy, lock it off to new edits.

Question: About quality control – for you and Robert – it seems that at Galaxy Zoo you do the quality control by having many people do the same data. But I was wondering about interaction – leaving comments and discussing what is and is not good quality data. In Transcribe Bentham I’m not sure how you would have that discussion, exactly, other than with the researchers.

Claire: Transcribe Bentham has a forum, but it’s not connected to a specific edit/item/page. That might be interesting to do though.

Robert: The difference between the projects is the length of the text – our data is very numeric, so it’s relatively easy to see what likelihood there is of inaccuracy. But with Bentham, do multiple people edit the same thing or do they collaboratively transcribe?

Claire: They are collaboratively transcribed with a version history.

Comment: Overlaying multiple texts can be great for error correction. But I heard someone say something about as many as 10 transcripts – did I imagine that?

Alexandra: It was Old Weather – they did have 5 copies of each transcription, then moved to 3 versions for each.

Robert: Yes, and when they pay people to do this stuff it’s only 2 people who transcribe.
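The multiple-copies idea can be sketched in code as a word-by-word majority vote across independent transcriptions of the same line. This is a toy illustration only – none of these projects’ actual pipelines are described here, and the assumption that the copies align word-for-word is mine:

```python
from collections import Counter

def consensus(transcriptions):
    """Majority vote, word by word, across independent transcriptions
    of the same line. Assumes the copies align word-for-word, which
    real transcription pipelines cannot take for granted."""
    agreed = []
    for words in zip(*(t.split() for t in transcriptions)):
        word, votes = Counter(words).most_common(1)[0]
        # Flag words where no more than half the transcribers agree.
        if votes <= len(words) // 2:
            word = f"[{word}?]"
        agreed.append(word)
    return " ".join(agreed)

copies = [
    "the barometer read 29.8 at noon",
    "the barometer read 29.3 at noon",
    "the barometer read 29.8 at noon",
]
print(consensus(copies))  # -> the barometer read 29.8 at noon
```

With three copies a single mistranscription is outvoted; with only two, any disagreement gets flagged for an expert to check – which is roughly why paid double-keying can get away with two transcribers.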

Michael: I’ve been involved in Digital Humanities for maybe 20 years and a lot of what we do is creating sources using text ei (?) – but what is the methodological infrastructure that you are moving towards? What I always bring up with my doctoral students is that you need a systematic method for looking at these sources – I think you have it but you didn’t say it.

Claire: I do what is classified as second wave Digital Humanities – we don’t have a pure methodological structure for what we do. I use grounded theory – you look at the data from the ground up to develop theories.

Michael: From sociology but doesn’t use sociological concepts.

Claire: particularly looking at objects and letting people define themselves. We aren’t trying to test a hypothesis but seeing what comes out of it.

Jason: An idea you may have thought about for the Bentham papers: a ReCaptcha-type approach?

Robert: Our data was already fragmented so we got round the length issue. But you lose context and serendipity.

Claire: Most of our transcribers are interested in Bentham and his work. It could be really interesting to try a ReCaptcha approach, but perhaps in combination, as our participants find the material itself so interesting.

Anthony: People find that crowdsourcing is more effective and cheaper than just using experts or research students to do all of this work.

Claire: We thought Transcribe Bentham would change the world. Two RAs working over the same period would transcribe more data, BUT we have widened awareness of Jeremy Bentham, particularly among A-Level students. So it may not be as efficient, but it does far more for public engagement, outreach and impact.

Anthony: Any final points before lunch?

Robert: I’m up for an argument about Facebook!

Anthony: I don’t want my relatives in the same space as my colleagues

Claire: When I joined the department my Facebook profile was just for personal stuff, and then my boss friended me. I was nervous about it, but I can see just as many of her drunken photos as she can see of mine, and I’m fine with that now. I think too many scholars just write off Facebook as a personal space.

Alex Murphy: Does blogging affect what can/cannot go into publication?

Claire: I’m an early career researcher and a lot of my blog posts feed into articles and publications. The difference is in timing – the blog is instant but publications can take a year.

And so, it’s off for lunch… And we’re back…

Scholars at work (2)
Description of Session from organisers: Scholars from a range of disciplines will explain their ways of working. Acceptances include Tom Coates (Imperial), Alex Murphy (Edinburgh), Claire Ross (UCL) and Alun Salt (Leicester).

Our first speaker has given us a great handout – will try and take a picture to share here later!

The Fanosearch Project – Tom Coates, Imperial College London

This is a project at Imperial with colleagues in Moscow, Sydney and Tokyo. By training Tom is a geometer: he studies the mathematics of shapes. Traditionally we can break up shapes into simpler pieces – but that’s not a very useful approach unless I can tell you what the atomic pieces are. So the hope is that you can understand shapes almost by doing chemistry for shapes – breaking them down and understanding the atomic pieces.

Fano Varieties are named after Gino Fano, who studied them in the 1930s:

  • 1 in dimension 1 – basically a straight line
  • 10 in dimension 2 (del Pezzo, 1880s) – shapes with a curved surface or shapes splittable into curved surfaces – so a plane and “the blow-up of the plane at a point” – like a spiral shape, say. [I think]
  • 105 in dimension 3 (Mori-Mukai, 1990s) – part of amazing work by Mori that won him the Fields Medal
  • Almost nothing is known in dimensions 4, 5, … – there must be a trillion trillion but we know of about 250 or so.

So an example of a geometrical shape: a hyperboloid, which is basically the shape of a cooling tower. The grids we see on those towers show it can be broken into straight lines – it’s a molecule but not an “atom”, with the atoms being the straight lines.

So why might Fano Varieties be a problem that we can solve? We want to combine ideas from different fields – geometry, maths (and string theory), computer technology (which has improved hugely). We think we can turn this problem into a process of sorting through 476 million shapes – totally plausible with a UCL- or Imperial-sized computer cluster. An entirely tractable machine computation problem.
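That kind of exhaustive sort-through is embarrassingly parallel, and could look something like the sketch below – the predicate and the numbers are invented stand-ins for the project’s real per-shape computation:

```python
def chunks(candidates, size):
    """Split the candidate list into batches, e.g. one per cluster job."""
    for i in range(0, len(candidates), size):
        yield candidates[i:i + size]

def scan(batch, predicate):
    """Run the (in reality expensive) per-shape test over one batch."""
    return [c for c in batch if predicate(c)]

# Toy stand-in for the real computation on each candidate shape.
def is_interesting(n):
    return n % 97 == 0

space = list(range(1000))  # stand-in for ~476 million candidates
hits = [h for batch in chunks(space, 250) for h in scan(batch, is_interesting)]
print(len(hits))  # -> 11
```

Because each batch is independent, the batches can be farmed out across a cluster and the hits merged afterwards – which is why a search this size is tractable on a university machine room.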

The programme is international – collaboration across four countries – but Tom points out this would be a typical collaborative project in some disciplines, theoretical physics for instance, though unusual for mathematics. And we have wider collaborators who use our data and/or we use their tools. So let’s say a bit about what we do:

Most of our work is in “meatspace” – face to face meetings, arguing in front of a whiteboard, pencil and paper calculations, writing code, analysing data etc. The bulk is offline. But managing/gluing together input from different locations does require technological tools: email, phone, Skype (a lot), IM (brilliant for coding, even when collaborating with nearby colleagues), DropBox (for writing stuff), www.arXiv.org (pretty much all papers are there), StackOverflow (you are writing some piece of Python code and get an error – if you Google it, StackOverflow usually ranks highly and is full of other programmers’ arguments about the same things, tips, techie help – they have a sort of karmic status thing. As a consequence, most computing answers are there).

Also using some tools:

  • collaborative research blog: http://coates.ma.ic.ac.uk/fanosearch
  • sharing data – via the blog which is great for our less formal collaborators
  • sharing ideas
  • On Twitter as @fanosearch – was being used internally for sharing ideas, new shapes and similar. It got written up in New Scientist, and now the results of computer calculations being shared there are part of our public outreach – so we add images and updates there too.


Q1) Claire Ross: Do you want to say anything about how you are using Twitter for this project compared, say, to digital humanists?

A1) Tom: I’m not sure we use it the same way. In the last week we’ve been running the first 4 million calculations – so there were lots of progress report tweets. Occasionally I’d remember to post a picture. But also, because we’d messed up some code and made the computers run slower, I had to tweet to explain the reason for the slowness, and that we’d be switching everything off and fixing it. So there are manual status updates for the team, but mainly it’s the computer tweeting automatically when data is found.
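A progress bot of this kind might look something like the sketch below – the message wording and the `post` hook are my inventions; the talk doesn’t describe how the real @fanosearch bot works:

```python
def make_status(done, total, note=""):
    """Compose a short progress message of the kind the bot might
    post (the wording here is invented)."""
    msg = f"progress: {done}/{total} calculations"
    if note:
        msg += f" - {note}"
    return msg[:140]  # keep it tweet-sized

def run(calculations, post, every=1000):
    """Run through the work, calling `post` (whatever actually sends
    the tweet) after every `every` completed calculations."""
    total = len(calculations)
    for done, calc in enumerate(calculations, start=1):
        calc()
        if done % every == 0:
            post(make_status(done, total))

# Collect "tweets" in a list instead of sending them anywhere.
sent = []
run([lambda: None] * 2500, sent.append, every=1000)
print(sent)  # -> ['progress: 1000/2500 calculations', 'progress: 2000/2500 calculations']
```

Keeping `post` as a plain callable is the useful design point: the same loop can print to a log during testing and hand off to a real Twitter client in production.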

Claire: Interesting how you are using automatic tweets – for Tales of Things our objects tweet when they are scanned, they share updates etc. It’s an interesting thing…

Q2) I was wondering if the collaborative tools make up for meeting the team and collaborating in person? How did you meet your collaborators?

A2) Tom: No, somehow this is glue, or tools to smooth or streamline something that existed before. It wasn’t that I started a blog to start blogging about maths – I got annoyed with sharing data via email, and the time differences caused problems. The blog solved a problem rather than being used for the sake of it. I met the team through some social links, some student/mentor links, and some serendipity of senior people who have visionary ideas about how to combine fields. So a series of individuals, and MAGMA was a connection through other personal links – standard/traditional networking, conferences etc.

Chair: I’ve worked with mathematicians and they often work alone, so I was interested in how differently mathematicians might be using social media.

Tom: Yes, we tend to work out of each other’s sight lines even, but there are different ways of working for some projects. And in terms of social media, Tim Gowers and co are doing some serious mathematics blogging that has really taken off.

Q3) Tula: Does this have real application?

A3) Tom: Yes. We imagine ways in which physics works in particular dimensions that you find in shapes [note: I suspect this might be a bit muddled!]. We use this hard problem to understand mathematics for string theory, and that might strengthen our understanding of how things work at very small scales.

Dark Matter and the use of Social Media in an Unsocial Environment – Alex Murphy

So, first some science. It starts with Fritz Zwicky – an important but unpleasant man. He is most famous for looking at a series of galaxies. In 1933 he looked at galaxy clusters, observed their motion (with fairly innovative techniques for the time), tried to apply the laws of physics that we know, and deduced that there must be more mass present than is seen (off by a factor of 400). So, here is a picture from Galaxy Zoo of M101 – a beautiful galaxy of spinning materials. And, thanks to a demo with a duck, we see the effects of mass and attraction on the size of the area in which things spin.

So what is Dark Matter? Well, the answers seem to be in particle physics. The “standard model” is about fundamental particles – electrons, quarks, gluons – and about fundamental forces. But it tells us nothing about why things do what they do.

But SuperSymmetry does help – it explains why we see the range of particles and forces that we do, and predicts particles we have yet to see. The lightest of these, the WIMP, has just the right properties to be dark matter. It would have been made in the big bang and would be stable and a big influence on cosmology. It’s particularly nice that this is an independent prediction!

Models with dark matter and dark energy do work; without them the simulations do not: it appears to be necessary. Last week the FT listed the top 10 hottest fields, and the composition of the universe and dark matter was in that ten – this stuff matters.

The challenge is this: the WIMP-like Dark Matter hypothesis can be tested on Earth by creating tools that detect only dark matter. So we go to Boulby Mine – a very dark place producing potash and salt, a £900M/yr commercial operation. It’s a huge place, so we’d never be able to fund the infrastructure just for physics. The lab is down the mine shaft, 1100 m underground, where the natural radioactive backgrounds are significantly reduced. So what’s it like underground? A fantastic wee video is telling us that it is a lot dark. A lot. Quite a walk to work.

And now for a video animating the mine and the set-up – a very light clean room with data acquisition, wireless networks etc. They are measuring reactions in liquid xenon – there is only a range of 4 degrees at which it is a liquid, and that must be kept stable for many months. If it does heat up there is potential for explosion. There is a high voltage in use (up to 17kV – another experiment there runs at 33kV). There are around 30 slow control readings monitored and logged. About 120 channels of data read out 20 µs timelines at 10 ns sampling – 100s of Terabytes of data. It’s a harsh environment and this experiment has run continuously for 320 days. In the cold snap in February there were real challenges to keep the experiment safe. It’s 4 hours’ travel, access to the lab takes an hour, the environment is hot, dry, dusty, tiring (detectors are in a separate clean room), it costs about £100/day; they mitigate as much as possible, but mines are inherently dangerous places.

There is a live, though fairly slow, data feed for a related project (http://www.hep.shef.ac.uk/research/dm/driftwatch). If any variables go wrong the person on duty gets a text from the detector to alert them. The ZEPLIN project (now completed) also had stats online – it was hard to notice problems, though, so we changed the screen background colour if the connection to the mine was lost – that made a big difference.
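A text-on-alarm scheme like that boils down to threshold checks over the slow-control readings. In this sketch the channel names and limits are invented, and `alert` stands in for whatever actually sends the SMS:

```python
# Hypothetical limits for a couple of slow-control channels - the real
# experiment logs around 30; the names and ranges here are invented.
LIMITS = {
    "xenon_temp_C": (-111.0, -107.0),  # liquid over only a ~4 degree window
    "hv_kV": (0.0, 17.0),
}

def out_of_range(readings, limits=LIMITS):
    """Return the channels whose latest reading falls outside its limits."""
    return [ch for ch, value in readings.items()
            if not (limits[ch][0] <= value <= limits[ch][1])]

def check(readings, alert):
    """Call `alert` (standing in for the SMS sender) once per bad channel."""
    for ch in out_of_range(readings):
        alert(f"ALARM: {ch} = {readings[ch]}")

msgs = []
check({"xenon_temp_C": -105.0, "hv_kV": 12.0}, msgs.append)
print(msgs)  # -> ['ALARM: xenon_temp_C = -105.0']
```

The same loop works whether `alert` sends a text, posts a protected tweet, or changes a screen background colour – the ZEPLIN trick mentioned above.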

We like to do as much of this remotely as possible: Xdisplay, remote desktops, ssh, etc.; we can control voltage, temperature, data acquisition. Site-wide power failures happen about once a month, and that’s a problem! We have UPS power for about 8 hours and safety measures for switching pumps and power on safely. There are network speed issues – the link between surface and underground is fast but has a high failure rate, so we have a phone-line modem backup (the “shutdown” signal). Sometimes the link between the mine and civilisation is slow, so low-bandwidth applications such as Twitter feeds and non-video Skype are vital. Data are sent to the surface on discs and backed up etc.

So, @zeplin3 tweeted updates (a protected Twitter feed) of what was happening. It was there as a tool; it was useful and easy. It allows the key facts to get anywhere in a fast, reliable, useful format. We didn’t want our competitors to see where we were getting with results, hence the protected tweets.

Contrary to what Claire said, I do have a problem with the supervisor/student relationship. One student, who had lots of issues, found it impossible not to share images and comments from the mine. Her Majesty’s mines inspectorate likes to ensure that mines are safe. If you take random pictures you are bound to spot things that might be an issue for health and safety, or something confidential that shouldn’t be shared. The pressure came from the mine, who were concerned to make sure no photo indicated any breach of health and safety. They want a safe space, but extra safety inspections are expensive.

Robert comments: the problem isn’t the pictures but the lack of adherence to standards.

Yes, this is true. But paranoid concerns are an issue even if there is nothing inappropriate being photographed.

Comment: but this isn’t student supervisor relationship issue?

I defriended the student, but someone else saw it. The rule is that you DO NOT share your research, and this student did not really understand that yet.

Comment, Anna: At CERN they have an internal 3 month process to check all results and verify them and they are terrified that someone will leak the discovery of the Higgs.

Tom: They have dealt with that to some extent because they have aggressively structured it so that media know to expect verification rather than any result.

To be fair, Twitter and blogs are fabulous as a source of news and gossip on the field, and essential and fantastic in teaching.

How did they do?

A successful 320 day run. Fully remote operation. 24 hour monitoring throughout from somewhere in the world. Stable operation within operating margins – 1 excursion due to a 2 day site-wide power outage, fantastically 0 health and safety occurrences, and lots of publications including a world class publication coming in the next few days.

Research Blogging and the Annals of Botany – Alun Salt (e.g: http://bit.ly/Archvhist)

Alun works for the Annals of Botany as weblogs editor, but is not a life scientist – actually an ancient historian; his thesis was on ancient astronomy. Through to 2005 he was doing regular conferences etc. with interdisciplinary papers – you generally have to be a specialist and a member of a society to give a paper and get value from conferences. For personal reasons he had to rein in conference attendance, so he set up a social network to connect to others’ work and related fields. There’ll be some good and some bad stuff to hear about here!

So, here is Alun’s research and social network – with two-way channels in green (Twitter, blogs, email), input streams in blue (Flipboard, mailing lists, Google Reader, the library), and outputs in red (Zotero, Mendeley, Facebook). And I’ve erroneously missed off Google Scholar for finding stuff – my field isn’t big enough for finding stuff in Mendeley; if I were a biologist I would find great stuff there though.

So mailing lists are still very popular with ancient historians and archaeologists, although I’m increasingly using them less. On Twitter you can create bespoke lists, but individual tweets might be off topic. You do get spammy messages on lists – on Twitter you’re more likely to get LOLcats! But it is easy to miss tweets – how many people can you usefully follow? Then again, it’s easy to miss emails, especially if the subject line is dull. It’s swings and roundabouts. Yes, on Twitter nuance can be lost in brevity, but with emails, “for God’s sake could you try brevity”. One positive: a guaranteed audience for email messages versus an uncertain audience for tweets, which depends on RTs.

So Becky Goetz at Rice sent out an obscure question on Twitter/FB about a moon in 1625 – and instantly had an answer. People have misdated this information because different calendars were in use in different locations in the 1600s. This sort of dialogue helps prove the case for Twitter – a great way to get quick answers.

The other thing is filters – Flipboard (pulls out links from tweets), Google Reader, Email, Library – all filtering content and letting you save some stuff as you find it.

You see a lot of offline connections replicated online. But as an interdisciplinarian it’s great to curate your own groups of contacts and colleagues. Works well when integrated – arrange things online then meet up offline.

So Alun is filtering the universe of stuff into what might be or is useful. The more active my network is, the more effective that filter is.

So I also work for the Annals of Botany, and they wanted connections between the journal and the botanists – to bypass all the filters with information on genetics, nutrition, pollination, ecology, biofuels. A general feed is fine for undergraduate botanists but not for most researchers, so something too general leaves you in the spam bin. You need to be filtered and be a useful source for people. So push the journal to RSS (not dead, just not obvious – it’s a glue for building other things) and to citation tools; and RSS can lead to Tweets and Facebook. And then we have a weblog that is fed by the journal and feeds into other places. The journal is at http://annbot.org, FB is http://facebook.com/AnnalsOfBotany, Twitter is @annbot, Feedburner is AobBlog, plus http://aobblog.com/ and http://bit.ly/poln8n <– for pollen, obviously.
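The RSS-as-glue idea – one feed fanned out to many channels – can be sketched like this. The feed content is a made-up sample, and each “channel” (Twitter, Facebook, the blog…) is reduced to a plain callable:

```python
import xml.etree.ElementTree as ET

# A made-up miniature feed standing in for the journal's real RSS.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Annals of Botany (sample)</title>
  <item><title>Pollination networks</title><link>http://example.org/1</link></item>
  <item><title>Biofuel grasses</title><link>http://example.org/2</link></item>
</channel></rss>"""

def items(rss_text):
    """Pull (title, link) pairs out of an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [(i.findtext("title"), i.findtext("link")) for i in root.iter("item")]

def fan_out(rss_text, channels):
    """Push every item to every channel."""
    for title, link in items(rss_text):
        for post in channels.values():
            post(f"{title} {link}")

tweets = []
fan_out(SAMPLE_RSS, {"twitter": tweets.append})
print(tweets)  # -> ['Pollination networks http://example.org/1', 'Biofuel grasses http://example.org/2']
```

This is why RSS works as glue: the journal publishes once, and each downstream channel is just another consumer of the same feed – readers pick the channel, not the publisher.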

So there’s a lot of information pushing out, but we don’t all use the same channels – most readers just use one or a few. They can pick and choose where and when they want to get the information. Our readers are not passive receivers. But we need to make it easy for people to find our journal’s content.

We also add other people’s work into our feed – it’s useful content even if it’s not from our journal. Being the referrer to good information makes us useful as a participant in the research process, rather than just pushing our ownership of stuff.

Alun also uses Mendeley for related research recommendations on a page – very helpful. This is on Then Dig (http://bit.ly/thendig/), where we’ve come up with a call for posts themed on distance, though posts are cross-disciplinary. The idea behind the blog is that you can do a peer reviewed publication – Alun is sceptical, but it actually works for conference abstracts: if Alun had blogged what he said today and it was reviewed, no one could scoop what he’d said – he’d have witnesses! There are also these ideas of blogs as proceedings of online conferences, discussion etc.

University of Leicester: The Centre for Interdisciplinary Science (http://www.le.ac.uk/iscience) – they teach social media in 2 hours. A bit of an issue – what do you teach? The need to teach this arises because others in the department don’t use social media, which isn’t a good example. Students already have a filter – syllabuses are the focus for undergraduates, who tend to be focused on grades. Only students who plan to go on to do research find the big value in this.

Everyone is the centre of their own research universe. You can’t dominate a space. You just need to participate and help yourself by being useful for others.

Find Alun Salt online: http://alunsalt.com and @alun.

Chair also recommends another Alan/un: “Social Media: A Guide for Researchers” by Alan Cann – available from the RIN website.

Panel Session / Q&A

Chair asks Anna Kenway (Edinburgh) to open with some questions.

Comments from Anna: I was very taken yesterday with the comment that you shouldn’t tell scholars which tools to use, but let the research drive the solutions, not the other way around. My main concern about social media is the interaction between the private space and the professional space; they are quite tricky to untangle unless you want entirely separate identities. How do you deal with that?

Alun: I wouldn’t be on Facebook if it wasn’t for the Annals of Botany. I tweet just to a limited list of friends first and foremost, often automatically from RSS.

Anna: There is a lot of noise about. I’m a big fan of wikis as a collaboration space in the School of Physics and with the eScience Institute. Even though we initially set the wiki up as a service, the Science Board expected to see outcomes from research themes. It came back to bite us a bit, as not all research themes were using it. Going back to RSS: I wanted to do something useful for researchers – a feed for funding. I did an aggregated feed from funding bodies, but you couldn’t filter it at all. In the end it was someone at Edinburgh Research and Innovation (ERI) blogging that generated a useful feed – a human filter.

Chair: Do any of you have questions for your fellow speakers?

Tom: How do you use Skype?

Alex: We use a camera on the laptop to show what’s happening and get advice for instance.

Chair: Are mathematicians doing real research on their blogs?

Tom: Some people are quite consciously doing research on their blogs, like Tim Gowers. Some fields can do that – it’s totally conceivable that you can do some mathematics without hardcore computing power. Terry Tao blogs hugely, commenting on mathematical research he is seeing – it’s less research but hugely important.

Chair: Is blogging replacing publication?

Tom: I think that’s just conspicuously bogus. Similar things have been said about the arXiv – people upload papers before publication, but it hasn’t killed peer reviewed journals (unless you are being secretive about what you are sharing for competitive reasons).

Q1) Robert: When you construct your Annals of Botany landscape there, you put Facebook and Twitter to the side and feed your RSS in. And that’s really important: you make your content available in lots of places. Maybe that’s very important!

A1) Alun: By the time I get home I suspect I’ll have an email saying “why aren’t we on Google Plus” – RSS is the glue that pulls things together.

Q2) Sarah: Tom, you said that your collaborators are global but that you do most of your work face to face. Interesting, as social media is often promoted as the solution for globally spread partners. Could you design an ultimate social media space, and what would it be?

A2) Tom: My dream tool would include a virtual whiteboard client that doesn’t suck. This will differ person to person. I make most progress in mathematics when debating it with someone – but we’re not all the same; many are the reverse. A face to face meeting with an online whiteboard just isn’t good right now. For people who share completed thoughts, the social media stuff may be better.

Alex: Maybe I’m being defeatist here, but we discussed at the workshop with Anna and co how we’d written our own tools but had stopped, as commercial tools are so good and quicker to use.

Q3) Adam Cornelius: I get a sense that using social media as a channel of communication for your projects is important, but is there a system for understanding that communication, analysing stuff in your field etc.? At one time we thought about citation analysis to find gaps in fields – could social media do that?

A3) Tom: Not enough mathematicians are using social media for meaningful results

Alex: I feel underqualified to answer that

Alun: I don’t think so, in archaeology.

Anna: For mathematically based subjects there is no good way to write equations – blackboards are important.

Q4) How does what you currently do relate to later data sharing efforts?

A4) Alun: There’s nothing terribly social about it. I don’t tend to publish data before it’s gone into a paper. Other people could analyse it faster than me if I did share my results earlier.

Alex: I would love to easily get my hands on old Skype chats where things were all worked out but I’ve forgotten the decisions.

Tom: They’ve broken the /htmlhistory command recently!

Q5) Filtering – I constantly filter email, tweets etc. There is always a human element, in that your eye catches different things. Secondly, how do you gather and store your data from social media? It’s so much about what is there now.

A5) Tom: I can look back at the blog – it’s on my server so that’s easy. I wouldn’t want to put my data where it might disappear.

Alex: I do worry that we are not future-proof against changes in software. And hacking – if Googlemail got hacked or went offline I’d be screwed. I’m stuffed without Gmail.

Alun: In terms of what you’re missing, there is an idea that Twitter is an echo chamber – if you miss something you may see it retweeted. I use trunk.ly at the moment – it grabs links, reshares them and feeds other tools. And that’s tied into Pinboard – you grab data in multiple places.
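At its core, a trunk.ly-style link grabber is URL extraction plus de-duplication – a minimal sketch, not trunk.ly’s actual behaviour:

```python
import re

URL = re.compile(r"https?://\S+")

def grab_links(tweets):
    """Collect links from a stream of tweets, in first-seen order,
    with no duplicates - the echo chamber means the same link comes
    round several times via retweets."""
    seen, links = set(), []
    for tweet in tweets:
        for url in URL.findall(tweet):
            if url not in seen:
                seen.add(url)
                links.append(url)
    return links

stream = [
    "worth a read http://a.example/x",
    "RT @someone: worth a read http://a.example/x",
    "new post: http://b.example/y",
]
print(grab_links(stream))  # -> ['http://a.example/x', 'http://b.example/y']
```

The de-duplication is what turns an echo chamber from noise into a feature: repetition raises the odds you see a link at all, while the collector stores it only once.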

And now… for a teabreak…

Future hopes and visions.

Description of Session from organisers: The concluding session of the conference will end with predictions of how social media will/might change the nature of scholarly communication. What will be lost and what will be gained? The list of speakers will begin with Cameron Neylon (STFC) and Geoffrey Bilder (Crossref).

First up it’s Cameron Neylon. He is not a social media guru, thank you. He has been and is a bioscience researcher; he works with scientists to help them with their research, but has also been interested over the last 5 or 6 years in tools that help researchers track and record their work. He came at this as someone who wanted an excuse to write a scheme to fit into a grant to get more money. They built a lab notebook based on a blogging system, got involved in wikis and with people in the open access community, and the potential for collaboration. Once you soak this up you do become a person who talks about social media.

So what I thought would be interesting to start with would be to talk about what is possible – what the technology of 5 or 6 years ago could enable – and why we are not using it. And then to look at communities that indicate which possibilities may actually become reality. I think Geoff will tell the same story from a different direction.

My background is in the sciences, but there are enough commonalities across disciplines. At bottom, what we need to do as researchers is to communicate. If someone doesn’t see it, use it, think about it, get benefit from it, then it’s not research and not justifying public expenditure. In the nineteenth century paper and print were the technologies; journals and books were probably the most sensible business and technical model for sharing research. But some things have changed the fundamental economics of all of this, particularly since the early 1990s. We can share research outputs with the whole world as they occur, in real time. I do it – I share my results live in my lab book. We can communicate the research easily and efficiently. We can easily create narrative documents online; they can be reviewed and shared online. It’s not clear to me what the purpose of a journal is anymore. The technology is there – filtering through search engines to help people find what they are looking for. Social connections between people are also highly effective filters.

I don’t search for literature in the various areas I work in. I rely on people in networks, forums, Twitter to do that for me. And it works – they are very effective ways to find new information. We can share data, we can use it etc. We are not great at wrapping that data up in ways that are easily findable and discoverable, but it’s not that bad and it’s cheap. The technical problems can be tackled – we have different views and approaches, but there are angles of attack for improvement. We can share the narrative, the raw materials; we can create catalogues of materials (physical research objects) to enable people to have access and ask questions. Most of our tools are from the consumer web, not the research community. And there is another thing we can do as researchers: connect out to a far wider community outside the lab, outside the field etc. That can go all the way from SETI@home – just a download, and passive – through Galaxy Zoo, where people are engaged, through the Open Dinosaur Project, which is getting people to read reports and share info on a Google Doc, through to things like the Climate Code project – Fortran updated by experts to make a useful modern resource. The tools are there to enable people to feed in, to make information available. We spend around $5 billion on communications using 18th-century technology, and we could save that spending.

But we could have changed 7 years ago. Open Access is just getting traction, but we could have made research openly available in the early-to-mid 1990s. We could have shared data since 2005/7-ish. We could be embedding engagement with the wider public in the research process all the time, but right now that’s the exception, not the rule. Why do we choose not to do that? Well, it’s the issue of collective action. These things only work when everybody does them. Social media sites for researchers and scholars tend to be tumbleweed spaces unless an existing community transfers to the online space.

The question really isn’t technical anymore. Most technical challenges are trivial (especially if you re-spend that journal money). So the question is: where are there relatively large communities doing these things, and which tools will become part of the mainstream? Will journals go away over the next 5 to 10 years? Probably not – they are very much part of the research stream, part of reward schemes, and a useful place to tell stories about research. Even most quantitative scientists share through telling stories.

But what we are starting to see is a much wider adoption of blogging as a form of notetaking, for describing and critiquing the literature. It’s not clear to me that reviews and review journals will survive for a long period. A review is only valuable when it’s specific to you. People create reviews regularly, often in blogs, and those are often found in Google Scholar (as well as Google). So that’s one place where you see things starting to shift.

And there are other projects where things are starting to change. Mathematicians at the top of their field are changing small portions of the wider mathematical community. In theoretical physics some papers never make it to journals. Looking at citation patterns it’s pretty clear that people read the arXiv rather than the journals – the peak of reading and citations there is at the date of publication; people read the deposit copy and cite it when published. There is a big enough community that it is changing practice there. Research evaluation in theoretical physics does reward citation in the arXiv.

In citizen science various projects have shown the potential to get a lot of people or the right people involved in their research. The movement is happening.

Sharing data is becoming more popular – people get more highly cited when they make their data available. Communities sharing data effectively are doing the best in terms of maintaining their science, their research in the eyes of their funders.

The other thing that is changing at the moment is the process of research evaluation. For a long time the research community has done more or less its own thing, but that’s changing. My personal view is that’s actually a good thing, because as researchers we need to be better embedded in the wider community, and we do need to think about how our research makes a difference, why, and how to start maximising that difference. It’s one thing to argue that research has a long-term impact on the competitiveness of a country, or on other science. But you cannot make that argument if your results are not available for others to use, if your process isn’t shared.

So we need to maximise the impact of our research so that it can be found, used, and not repeated again and again. Governments, as research funders, do take the view that a wide variety of research is important: applied research must be relevant and reach out to the community, but interest-driven, research-led work is important for innovation, for competitiveness and for creating a viable research community. People talk about impact. We can talk about citation and research impact, we can talk about economic impact. We could also talk about social impact, health impact, the impact of research-led policy making. Perhaps we need to measure evidence of reuse as a way to measure the impact of research. This is not exclusive to science but applies to the humanities too. It’s a way of talking about research in a way that makes sense. If research has slow income you can at least ask the researchers whether they have thought about maximising impact and evaluating that impact.

That impact is an agenda, a political agenda, a policy agenda. You have to engage with it; it’s not optional. If we talk about that agenda being about reuse, exploitation and discoverability, then we are talking about effective communication: ensuring that work is fully indexed by Google, ensuring that a random member of the public has at least some route back into the research and can contribute back to it in some way. There are examples of all of these things happening today. Not all will become reality, for all sorts of reasons. But in the end the question is how we optimise communication. We have a lot of tools and a lot of people using them in interesting ways, but we need to think about which tools to use, how to use them most effectively, and how to create the best opportunities to communicate effectively in the long term.

Communication Conflation: Anti-Patterns in Academic Social Media – Geoffrey Bilder, Director of Strategic Initiatives at CrossRef

I was asked to come here to predict things. I am often asked to predict things and I’m pretty bad at it – but I’ll pull out patterns of bad predictions to help you think about social media and new technology, so you are better equipped to know what may succeed and when it may succeed. As Cameron says, there are loads of things we could have done but have not, and that’s important.

When I talk about anti-patterns this is an idea from architecture, where patterns describe how things come together in complex ways to make a whole. Software takes this up to describe techniques below the programme level and above the algorithm level. Anti-patterns are where people follow patterns that do not work despite knowing they do not work.

One of the things that pushed me to think about predictions and the anti-pattern business was a conference in 2009 – Science Online London. It’s a great conference. In 2009 it was nominally about blogging, but really we were all excited about Google Wave – particularly pertinent today as we scramble for Google Plus accounts. At the time I really resented those without accounts. I really thought it would radically transform the way scholars work together. We gave this a good shot. The 2010 conference lacked any mention of Wave at all. It was dead. The only mention was me, asking why we weren’t mentioning Wave.

We do this a lot – we get excited and things don’t quite work out, and we don’t reflect on why we fail to predict them properly. And obviously part of it is that I am a geek. Anthony always says these people are real researchers, real scientists, with real data – but I’m not, I’m a computer geek. We talked about shambrarians yesterday; I am a Foublisher – a Fake Publisher.

My first computer was a Commodore PET and I loved programming, I was amazed. I was thrilled that I could programme the machine to print text. Other kids said “huh?”. I said look how fast it can do these things and got very excited – and scared them away! But I did do some programming, which was good. I went to Brown and moved quickly away from what I was supposed to be doing towards computing. At the time we formed a scholarly computing club and we were all excited about the SGML handbook. SGML was like a forerunner of XML, only impossibly complicated. But the response to it was “huh?”. I did a lot of pushing of SGML.

Brown, at the time, was doing loads of research into hypertext, led by Ted Nelson. I wrote my own hypertext tool using SGML, called Abulafia. The response was “huh?” again – “what are people going to use it for?”.

Steve Jobs got kicked out of Apple and went to NeXT. They had a relationship with Brown and I was one of the first programmers for the NeXT. The NeXT was seen as the “Academic Workstation” – it ran on UNIX, it had built-in TCP/IP networking so it could be on the internet easily right away, SGML support, email, visual programming. And, yes, again “huh?” and occasionally “how does it print?”.

This seems really sad – all these great things that people don’t get. But I did it a bit myself…

This guy, Tim Berners-Lee, came up with the web – it used markup (HTML) and was written on the NeXT. It should have been right up my street. But I was like “Meh! Where are the bi-directional links? Why doesn’t it use SGML?”. And you have me here to predict? I was a Newton developer! I have a bad track record!

But here are some graphs that explain 90% of IT dysfunction.

There’s the Gartner Hype Cycle, which maps visibility against maturity. The joke is that you peak when you hit the front cover of Wired! NeXTStep was the right horse – it’s now part of OS X and iOS, so it’s on loads of your smartphones. XML is the new SGML, and is everywhere along with CSS. Everyone is looking at text and data mining now. DRM and Second Life have dipped. PDAs disappeared but they came back as phones. Things change name as they move through the curve.
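As an aside, the hype-cycle shape Geoffrey describes can be caricatured as a toy function (my own parametrisation, not Gartner’s): an early spike of inflated expectations, a trough of disillusionment, and a slower climb to a plateau of productivity.

```python
import math

# Toy hype-cycle curve (my own parametrisation, purely illustrative):
# a sharp early expectation spike plus a slow logistic climb to productivity.
def hype_visibility(t):
    """Visibility of a technology at time t (arbitrary units)."""
    inflated_expectations = math.exp(-((t - 1.0) ** 2) / 0.1)        # sharp early peak
    plateau_of_productivity = 0.6 / (1 + math.exp(-2 * (t - 3.0)))   # slow recovery
    return inflated_expectations + plateau_of_productivity

# Peak (t=1) overshoots, trough (t=2) dips, plateau (t=5) settles in between.
for t in (1.0, 2.0, 5.0):
    print(t, round(hype_visibility(t), 3))
```

The specific constants are arbitrary; the point is only the shape – technologies like the NeXT or SGML/XML reappear on the far side of the trough under new names.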

And a graph of the perceived benefit of technologies against the expertise of the person. The totally naive are thrilled. The semi-experts are jaded and used to supporting stuff. But the super experts get thrilled again! They see the potential.

By contrast, perceived risk is the opposite curve – low risk perception by naive users, very cautious experienced semi-experts, but the super experts see low risk because they see workarounds, ways to cope. So Group A are naive and excitable – investors, bankers, CEOs etc. Group C are super-expert and excitable (propeller heads, geeks), and Group A types seek us out for advice. Group A decides to move on things. But Group B, these cautious jaded people, freak out – this is the IT helpdesk. This is the problem and the inertia in your organisation. So everything we say should be taken with a pinch of salt.
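The two curves can be sketched as toy functions (my own illustration, not from the talk’s slides): perceived benefit is U-shaped in expertise, and perceived risk is the mirror image.

```python
# Toy model (my own illustration): perceived benefit of a new technology is
# U-shaped in expertise, perceived risk is the inverted curve.
def perceived_benefit(expertise):
    """expertise in [0, 1]: 0 = naive user, 1 = super expert."""
    # High at both ends (Groups A and C), lowest for jaded semi-experts (Group B).
    return 4 * (expertise - 0.5) ** 2

def perceived_risk(expertise):
    """The opposite curve: the semi-experts see the most risk."""
    return 1 - 4 * (expertise - 0.5) ** 2

groups = [("Group A (naive)", 0.0), ("Group B (semi-expert)", 0.5),
          ("Group C (super expert)", 1.0)]
for label, e in groups:
    print(f"{label}: benefit={perceived_benefit(e):.1f}, risk={perceived_risk(e):.1f}")
```

The quadratic is arbitrary; what matters is that Groups A and C sit at the excitable ends while Group B sits in the cautious middle.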

A quick catalogue of anti-patterns – the full list was absurdly long so this is a random selection:

  • The Listening to “Silverbacks” anti-pattern – this is about talking to people at the top of their game, experts in their field. I talked about geeks but it’s also senior scientists. They say they don’t need to be in a journal, perhaps because they have gotten so far – we heard yesterday about Kroto not publishing once he had his Nobel. These people are the brand. Where you are in your career really matters. But we forget about that: we have so many grad students, we have users who don’t know what the signposts for credibility are, we have undergraduates who don’t know the heuristics, we have bureaucrats needing cues, business people the same, and citizen scientists who don’t have the cues for this stuff.
  • The Internet Trust anti-pattern – this is another obsession of mine. It goes like this: a system is started by a self-selecting core of high-trust technologists and specialists. It is touted as authority-less and non-hierarchical (not true). The broader public start using it and the system nearly breaks under the strain of untrustworthy users, so regulatory systems are put into place to restore order (sometimes automated, sometimes not). The system is no longer touted as authority-less, and people drift away (e.g. IRC, email scams, phishing).
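That lifecycle can be caricatured as a toy simulation (my own sketch, with made-up numbers): average trustworthiness falls as the user base broadens, until a moderation layer is introduced and the system stops being “authority-less”.

```python
# Toy simulation of the "Internet Trust" anti-pattern (illustrative numbers only):
# a small high-trust core, dilution as the broader public joins, then a
# regulatory layer added once average trust falls too far.
def simulate(rounds=10, newcomer_trust=0.3, moderation_threshold=0.5):
    users, total_trust = 100, 100 * 0.9   # self-selecting core, high trust
    moderated = False
    history = []
    for _ in range(rounds):
        joining = users                   # user base doubles each round
        total_trust += joining * (0.7 if moderated else newcomer_trust)
        users += joining
        avg = total_trust / users
        if avg < moderation_threshold and not moderated:
            moderated = True              # order restored; no longer authority-less
        history.append(round(avg, 2))
    return history, moderated

history, moderated = simulate()
print(history)    # average trust falls, then recovers after moderation kicks in
print(moderated)
```

The numbers are invented; the point is the shape of the curve – early high trust, dilution, then regulation – which is the pattern the talk describes for IRC, email and the rest.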

There are technologies to organise – flashmobs, maps, some pointless things that can be awesomely subversive: naked train rides, zombie marches, pillow fights! Not just used trivially but also for protests; hugely powerful. We get excited that social media can mobilise people. Until we don’t like the people it mobilises.

Astroturfing: fake grass roots. The fake “Brooks Brothers riot” at the recounts – these were all Republican operatives at polling stations shouting against recounts. A totally manufactured protest. They probably threw an election – the media went for it for long enough to interfere. This is happening all the time – Tea Party campaigns, Patients First, Recess Rally, FreedomWorks – these claim to be grassroots but they are fabricated by corporate interests. Our tech can be used by the opposition.

And people call publishers on this stuff – astroturfing around publishers.

You see interesting things happening with social media tools and it is powerful. But you also hear that in Iran a lot of people posed as dissidents, sending others to servers they controlled. Governments can entrap protestors; they can use the tools just as the protestors do. Cory Doctorow said ages ago that “all complex systems have parasites”. The problem we have is best put in Umberto Eco’s Travels in Hyperreality: “semiological guerrilla warfare” – the heuristics we use for understanding trust are being undermined.

  • The Trust Us anti-pattern – the worst thing you can ever say is “Trust me”.
  • The Radical Transparency anti-pattern – a phrase from David Brin (The Transparent Society): all the same things we achieve through privacy can be achieved through the opposite. But it’s a boil-the-ocean solution – everyone has to be in, and that’s not really possible. Every time I think privacy cannot be an issue I am surprised. I work on ORCID and researchers say that they want to hide things in their profile – they want to hide their name, they want to hide their publications (say, if involved in animal testing). And in DataONE, which I’m working on with Carol Tenopir – it’s environmental data, so why would you want privacy? Well, to protect species, for instance. You do have to hide some stuff sometimes! And we do have situations like subpoenas over environmental emails – if you had your enemies trawling through your email archives, think of what they could piece together and construct from that. Anyone would be afraid of that.
  • The Distributed System anti-pattern – “distributed” is a magical technical term, but all distributed systems need a centralised system to make them usable again. We think it’s harder to co-opt or go rogue with stuff that is distributed. But if we put that energy into creating organisations that are guaranteed responsible to their constituencies, and controlled by those constituencies, then we’d maybe be OK.
  • The scholar workstations / humanities toolkits / data lifecycles / scientific workflows / common ontologies anti-pattern – researchers will ask obscure questions like “can it mix Aramaic and Korean?” The missing stuff is exactly what they need. Researchers work on edge cases; it’s what they do!
  • The Computers Should Be As Easy As Toasters anti-pattern – why are computers so complex? Well, they do so many different things. Shouldn’t they be simpler? No! We should become more complex! We need to think about computational thinking – look at Jeannette Wing’s paper on Computational Thinking – until people understand their computers they cannot make the best use of them!
  • The If We Make Programming “Visual” It Will Be Easier anti-pattern – whether it’s Yahoo! Pipes or other systems, this is the most common anti-pattern we go through. It can help to think computationally, but see a scrolling list of visual languages – long and unpromising!
  • The 4P anti-pattern – this is a quote from our chair, and I read Ian’s report. Why don’t we consider email social media? We use it even if we loathe it. What makes me apprehensive about calling it social media? If you categorise technologies that can be easily used by academics, we are very reluctant to mix the personal with the professional. Academics will say that the best stories are best told over a glass of wine, and there is what is said at the bar. Twitter freaks me out as it’s hard to keep personal and professional personas separate there. Facebook and LinkedIn seem easier – rare breaches, but…
  • The Synecdoche anti-pattern – this is where we talk about the “record industry”, the “library”, the “newspaper industry” when we mean music, journalism and… well, what do we mean by libraries? Scholarly communication.
  • The Amplification Without Attenuation anti-pattern – there is too much to read; almost everything that makes our life easier as an author makes our life as a reader five times as hard. Ian put up a graph of how Twitter is used – I picked up that Twitter is used to post but not to listen. It’s about disseminating, getting more out. I want less information (see the article by Carole Palmer and Allen Renear). Help researchers practice “reading avoidance”.
  • The Anti-Pattern anti-pattern – I’ve been quite negative here, but the worst thing you can say is “we tried that, it didn’t work”. Actually you have to reconsider ideas, you have to consider timing, you have to try again.

Panel Session / Q&A

Q1) Chair: I don’t think the general public access papers as they are written, but research councils want us to work on making them publicly available.

A1) Cameron: Philip Lord said that no one should be able to publish anything with a reading age higher than 13 – the general public includes the people in the discipline next door. The public do look at papers where they are available: 40% of unique IPs on PubMed come from domestic network suppliers.

Geoffrey: I actually agree. Publishers have logs where 90% of the traffic comes from people they do not recognise as clients. In any other industry they’d go “who are they and how do I get a piece of that?” – publishers are weird in that respect. But I am more radical than you on articles. They tell stories so badly, so tersely, so curtly. If you go to publishers that have blogs or podcasts associated with articles, you ask why they have researchers explaining their article in a podcast – and these are hugely popular because they’re in plain English. Abstracts are horrible to read – bad narrative and bad data.

Cameron: If we paid more attention to what people are using we could improve what’s provided.

Geoffrey: We can say, let’s look at the people who may be interested in this. Less than 23% of people with higher science degrees go into those professions, so there are loads of qualified “general public” readers out there.

Chair: I’m writing a history of journals in the 20th century – we have 4000 words to do it in. I don’t know when we moved away from the narrative stuff.

Geoffrey: The old science writing is gross and bizarre but engagingly narrative.

Chair: The people in the most obscure fields are the most into opening up their research. You cannot think of a journal of less interest to the general public than the open access Nucleic Acids Research.

Q1) Damien: In a lot of clinical research the methods are not very interesting to read – a bit like the Highway Code… but very useful.
A1) Cameron: I think we should publish 90% fewer papers, because most of these should just be the data – the paper cites a data discovery.

Geoffrey: The one thing that I find surprising in the open access world is that they have challenged everything except this critical thing about citations. They subscribe to the value of citation. That process, and the rewards for citations, is what drives all those articles.

Cameron: We’ve moved from researchers being judged on the impact factor of the high-profile journals they publish in towards being judged on the articles themselves – that’s a move in the right direction.

Geoffrey: No party in the current system – not even the big publishers – likes this system. They hate it. But I know who is enforcing it: it’s lawyers. I think it’s university evaluation committees being so scared of being prosecuted for not being objective that drives this.

Cameron: I think it’s fear but also inertia. “We don’t understand this person’s work, we can’t judge that work.” There is so much fear that you have not appointed the right person, that they have not done the right thing, that they have been really unusual and exceptional and dangerous in very specific ways. It’s fundamentally inertia.

Q2) Tula: I like the anti-pattern idea – social media seems positive, about projecting yourself into the world. But on the negative side you can be tracked, you can be monitored – sort of forensic. Kind of scary. People can track all of that.

A2) Geoffrey: It’s worse than that. People say don’t just be corporate about an organisation’s tweets, make them personal. That’s awkward for the person behind them: on the one hand they are told to tweet for an organisation, but if they say something wrong, what happens then?

Q3) Ian: You talked about citations for…

A3) Cameron: The citations are from other papers in arXiv – from arXiv to arXiv. Interest drops off at publication.

Q4) Why is social media different?

A4) It can take things out of context – in short runs of words on Twitter. I find myself in the middle of a complex discussion: I can’t speak in short sentences, tweeting is bloody hard, and these discussions can go on for hours. When I’m here people can talk to me, there is high bandwidth, and I know people will amplify this. I’m kind of the industry gadfly and they pack me off to speak. I criticise the industry.

Cameron: I don’t make those distinctions but I do think carefully about what I say in any space – the discoverability and permanence of statements. We don’t culturally know how to do that yet.

Tula: Even in a bar you can record that stuff.

Cameron: Yes, as a society we have to deal with this stuff. The first generation of those with college photos on Facebook are no longer the ones applying for jobs, they are interviewing you – the world is changing – you should…

Anna: People don’t understand that email is a semi-public document, open to FOI.

Geoffrey: People didn’t realise that until recently; yes, they should have. I grew up in Puerto Rico, so I have a concern about people who are annoyed about getting good value for money from Twitter – I like the noise and the personal stuff. That interests me. We miss the social in social media if we look only for business info in the space.

Robert: They have missed the point. We filter ever more and more, but you need that serendipitous stuff – you need it to avoid being in your own bubble.

Cameron: We need to get into building networks that challenge you.

Geoffrey: A comment about Google Plus – you see that people like that you can more easily define communities, and automatically that appeals to me; I like the idea of separating stuff out. They might be onto something there.

Comment: A whole bunch of people don’t use email or phones – they are young and they are coming

Cameron: We are reaching a really interesting point where interfaces are really changing, and that will make a huge difference. I’ve been talking to people in South Africa. We are used to 25-year-olds or so who have had access to the internet all their lives. In South Africa people are coming into university who first accessed the internet on a phone, and they drag their lecturers into the future.

Comment: People are saying the watch is dead as we have cell phones but we are in the minority of the world

Geoffrey: The context is what researchers actually use. We talk about what they use. They all use email. Did they stop using it when their brains were full up? No – it’s useful, so they use it!

Me: But it’s also something they have been obliged and forced to use in their day-to-day administration.

Chair: I was interested in the idea of a network that challenges you

Cameron: Yes, I use the network as a filter, to focus my mind on useful things. In social media – which is part research, part politics, part hand-waving – it’s easy to get reinforcement from people who agree with me. So I intentionally follow people who annoy me or disagree with me. We have tools to find people who are like you; we don’t have tools to find relevant information that makes important counter-arguments. That’s a real need. It’s easy to build echo chambers. If we are to be rational, responsible citizens we have to find ways to improve the quality and breadth of the information we bring in for ourselves.

Chair: I strongly believe in this; it’s the most important thing I’ve heard in the last few days. If we live in little bubbles we’re in trouble.

Anna: Is it true that search engines steer the results you get, and that reinforces the bubble even more?

Geoffrey: If you have an iPad go and use Flipboard or Zite – they tailor content to you. It’s quite scary. If I spent my life reading this I’d spend my life thinking everyone agrees with me. You need other groups.

Tula: Isn’t it part of that anti-pattern thing?

Cameron: The reason this is in the news is because someone is promoting their book, The Filter Bubble. Google filters for location, who you are, etc.

Geoffrey: Here is the thing I don’t understand about Google: you go to France, you plug in, and the page is in French – why would it change language?!

Sarah: There was a TED talk about the filter bubble where he had two friends searching for the same thing – totally different results, as they were of different political persuasions.

Geoffrey: Sometimes I show screenshots of CNN in the US and the UK – they can be contradictory pages, e.g. on H1N1: it soars in the UK version, and is not as bad as expected in the US one.

July 1, 2011, 9:38 am. Posted in Events Attended, LiveBlogs.
