Jun 19 2013
 

Today EDINA is hosting a talk by Martin Hawksey on data visualisation. He has posted a whole blog post on this, which includes his slides, so I won’t be blogging verbatim but hoping to catch key aspects of his talk.

Martin will be talking about achievable and effective ways to visualise data. He’s starting with John Snow’s 1850s map of cholera deaths, which identified the epicentre of the outbreak by mapping where the deaths occurred. And on an information literacy note you do need to know how to find the story in the graphics. Visualisation takes data, takes stories, and turns them into something of a narrative, explaining and enabling others to explore that data.

Robin Wilson georeferenced that original Snow data, then Simon Rogers (formerly of the Guardian, latterly of Twitter) put the data into CartoDB. This reinterpretation of the data really makes the infected pump jump out at you; the different ways of visualising that data make the story even clearer.

Not all visualisations work; you may need narration. Graphics may not be meaningful to all people in the same way, e.g. the location of the pumps on these two maps. So this is where we get into theory. Jacques Bertin, a French cartographer, came up with his own system of points, lines, symbols etc., but it was not based on research; it was his own cheat-sheet system. If you look at Gestalt psychology you get more research-based principles for visualisations – laws of similarity, proximity, continuity. There is something natural about where the eye is drawn but there is theory behind that too.

John Snow’s map was about explaining and investigating the data. His maps were explanatory visualisations, and we have that same idea in Simon Rogers’ map, but that is also an exploratory visualisation: the reader/viewer can interact with and interrogate it. But there are limitations to both approaches. Both maps are essentially heat maps, showing more of something (in this case deaths). And you often see heat maps in visualisations that actually map population rather than trends. Tony Hirst says “all charts are lies”. They are always an interpretation of the data from the creator’s point of view…

So going back to Simon Rogers’ map we see that the radius of each dot is based on the number of deaths. Note from the crowd: “How to Lie with Statistics”. Yes, a real issue is that a lot of the work to get to that map is hidden, so there is lots of room for error and confusion.
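As an aside on the dot-size point: if a count is mapped straight onto a circle's radius, the circle's area grows with the square of the count, which can exaggerate differences. A minimal sketch of the usual alternative, scaling so that area is proportional to the value. The counts and the `radius_for_value` helper are made up for illustration; they are not Snow's figures or anything from the CartoDB map.

```python
import math

def radius_for_value(value, max_value, max_radius=20.0):
    """Return a radius so that circle AREA (not radius) is proportional to value."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return max_radius * math.sqrt(value / max_value)

# Hypothetical counts, for illustration only (not Snow's figures).
deaths = {"address A": 1, "address B": 4, "address C": 16}
largest = max(deaths.values())
radii = {place: radius_for_value(n, largest) for place, n in deaths.items()}
```

With this scaling a location with four times the deaths gets twice the radius, so four times the area.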

So having flagged up some examples and pitfalls I want to move on to the process of making data visualisations. Tools include Excel, CartoDB, Gephi, IBM Many Eyes, etc. but in addition to those tools and services you can also draw. Even now so many visualisations are made via drawing, if only for final tweaking. Sometimes a sketch of a visualisation is the way to prototype ideas too. There are also code options: D3.js, Sigma.js, R, ggplot, etc.

Some issues around data: data access can be an issue, as data can be hard to find, source data hard to identify, etc. Tony Hirst really recommends digging around for feeds, for RSS; find the stuff that feeds and powers pages. There are tools for reshaping feeds and data, places like Yahoo Pipes, which lets you do drag and drop programming with input data. And I’ve started touching upon data shapes. Data may be provided in certain ways or shapes, but it may not suit your use. So a core skill is transforming data to reshape it, with tools like Yahoo Pipes and OpenRefine, which also lets you clean up data. I’ve tried OpenRefine with public JISCMail lists, to normalise for those with multiple user names.
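To make the "normalise for multiple user names" idea concrete, here is a minimal sketch in Python. It is a simplification and not how OpenRefine works internally: it assumes each sender string contains an email address and treats the lowercased address as the person's key. The sender strings and the `normalise_sender` helper are hypothetical.

```python
import re
from collections import Counter

def normalise_sender(raw):
    """Reduce a 'Name <email>' style sender string to a lowercase email key."""
    match = re.search(r"<([^>]+)>", raw)
    address = match.group(1) if match else raw
    return address.strip().lower()

# Hypothetical senders, for illustration only.
senders = [
    "Jo Bloggs <jo.bloggs@example.ac.uk>",
    "J. Bloggs <Jo.Bloggs@example.ac.uk>",
    "sam@example.org",
]
posts_per_person = Counter(normalise_sender(s) for s in senders)
```

Here the two spellings of the same person collapse into one key, so counts per person are no longer split across display names.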

So now the fun stuff…

For the Cultural Olympiad in Scotland last year we had #citizenrelay tracking the progress of the Olympic torch. So lots of data to play with. First up, a Twitter (Topsy) media timeline. It uses Timeline by Verite plus Topsy data. This was really easy to do. Data access was via Topsy, which pulls in data from Twitter to make its own archive. It has an API to allow access to that data, which makes it easy to query for media against a hashtag. It can return data in XML but I grabbed it in JSON. Then the output was created with TimelineJS. You can also use the Google Spreadsheet template from TimelineJS (filled in manually or automatically). I used a spreadsheet here, with Yahoo Pipes to manipulate the data. You can pull data in with Google Spreadsheets and, once you’ve created the formula, it will constantly refresh and update. So it self-updates when published.
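The reshaping step described here (JSON in, timeline-ready rows out) can be sketched roughly as follows. This is an illustration only: the JSON shape and the field names `posted_at`, `text` and `media_url` are invented, not Topsy's actual response format, and the column names are assumed from the TimelineJS spreadsheet template, so check them against the template you use.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical response shape; real field names depend on the API used.
sample = json.loads("""
{"items": [
  {"posted_at": 1339837200, "text": "Day two #citizenrelay",
   "media_url": "http://example.org/photo2.jpg"},
  {"posted_at": 1339750800, "text": "Torch arrives #citizenrelay",
   "media_url": "http://example.org/photo1.jpg"}
]}
""")

def to_timeline_rows(items):
    """Turn media items into rows ordered by time, one row per timeline entry."""
    rows = []
    for item in sorted(items, key=lambda i: i["posted_at"]):
        when = datetime.fromtimestamp(item["posted_at"], tz=timezone.utc)
        rows.append({
            "Start Date": when.strftime("%Y-%m-%d %H:%M"),
            "Headline": item["text"][:40],
            "Text": item["text"],
            "Media": item["media_url"],
        })
    return rows

rows = to_timeline_rows(sample["items"])

# Write the rows as CSV text that could be pasted or imported into a spreadsheet.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Start Date", "Headline", "Text", "Media"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```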

Originally Topsy allowed data access without an API key but now they require it. Google Apps Script, which is JavaScript based (see the big Stack Overflow community), has a similar curl-like function for fetching URLs and dumping the results back into a spreadsheet. I have also done this with Yahoo Pipes (use the Rate module for the API key aspect).

Next, as the relay went around the country they used AudioBoo. When you upload, AudioBoo geolocates your Boos. AudioBoo has an API (without key) and you can filter for a tag. You can get the data out as XML, JSON or CSV, but they also produce KML. If you can access a public KML file and paste its URL into the Google Maps search box then it just gives you the map. You can then embed it, or share a link to that file. So super easy visualisation there. But disappointingly it didn’t embed audio in the map pins. That’s a Google Maps limitation. Google Earth does let you do that though…

So using Google Earth we only have a bit of work to do. We need to work out the embed code. Google now provides a template that lets you bring in placemark data (placemark templates). You can easily make changes here, and you can choose how to format variables. You can fill it in manually but it can also be done automatically, so I use Google Apps Script here. It goes to the AudioBoo API, grabs the data as JSON, then parses it. Then each item is pushed to the spreadsheet. So for partial geodata these Google templates are really useful. Something else to mention: Google Spreadsheets are great, they sit in the cloud. But recently I was using Kasabi and it went down… and everything relying on it went down too. Sometimes it is useful to take a flat capture as a spreadsheet for back up.
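For readers who want to see what "placemark data" amounts to, here is a rough Python sketch that builds a small KML document from geotagged clips. It is not the Google template or the Apps Script used in the talk; the clip fields (`title`, `audio_url`, `lat`, `lon`) and the `build_kml` helper are hypothetical stand-ins for whatever the audio API returns.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def build_kml(clips):
    """Build a KML document string with one Placemark per geotagged clip."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for clip in clips:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = clip["title"]
        # The description can carry HTML; here it is just a link to the audio.
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = (
            f'<a href="{clip["audio_url"]}">Listen</a>'
        )
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are written longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f'{clip["lon"]},{clip["lat"]}'
        )
    return ET.tostring(kml, encoding="unicode")

# Hypothetical clip, for illustration only.
clips = [
    {"title": "Torch in Edinburgh", "audio_url": "http://example.org/clip1.mp3",
     "lat": 55.9533, "lon": -3.1883},
]
kml_text = build_kml(clips)
```

The longitude-then-latitude order is the detail most likely to trip people up when building placemarks by hand.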

So the next visualisation… I used NodeXL (SNA). This is an open source plug-in for Excel. It has a number of data importers, including for Twitter, Facebook, MediaWiki, etc. (just from the menu). And it has lots of room for reformatting etc., then a grid view from that.

And this is where we start chaining tools together. So I had Twitter data, I had NodeXL to identify community (who follows who, who is friends with who), so I used Gephi, which lets you start using network graphs. A great way to see how nodes relate to each other. It is often used for Social Network Analysis but people have also used it for cocktail recipes (there’s an academic paper on it). There is a recipe site that lets you reform recipes using the same approach. Gephi is another tool where you spend an hour playing… and then wonder how to convey it to others, and you can end up with a flat graphic. So I created something called TAGSExplorer to let anyone interact, and there are others who have done similar.
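To give a feel for the data a network graph needs, here is a small Python sketch that turns tweets into a weighted edge list. Note the swap: the talk describes follower/friend connections, whereas this sketch uses @mentions because they can be read straight from tweet text. The tweets are invented, and the `Source`, `Target`, `Weight` headers are assumed to match what Gephi's spreadsheet import expects, so check against your version.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Count (author -> mentioned user) pairs across (author, text) tweets."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            # Skip self-mentions so they do not appear as loops in the graph.
            if mentioned.lower() != author.lower():
                edges[(author.lower(), mentioned.lower())] += 1
    return edges

# Hypothetical tweets, for illustration only.
tweets = [
    ("alice", "Great session with @bob and @carol #ukoer"),
    ("bob", "Thanks @alice! #ukoer"),
    ("alice", "@bob slides are up #ukoer"),
]
edges = mention_edges(tweets)

# Write a weighted edge list as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight"])
for (source, target), weight in sorted(edges.items()):
    writer.writerow([source, target, weight])
edge_csv = buffer.getvalue()
```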

Another example here. A network of those using the #ukoer hashtag, looking for bridges in the community, the key people. This is an early visualisation I created. It was generated from Twitter connections and tag use with Gephi, but then combined and finished in a drawing package.

This is another example looking at different sources. A bubble chart for click-throughs of tweets. You can get a degree of that info from bit.ly. But if you use another service it’s hard to get click-throughs; however, you can see referrals in Google Analytics. Each Twitter URL is unique to each person who tweets it, so you can see the click-through rate for an individual tweet. This is created in a Google Spreadsheet. You can explore it interactively and reshape it for your own exploration. So this spreadsheet goes and uses the Google Analytics API and the Twitter API, then combines them with some reshaping. One thing to be aware of is that spreadsheets have a duality of value and formulae. So when you call on APIs etc. it can get confusing. So sometimes it is good to use two sheets, the second for manipulation. There’s a great blog post on this duality – “spreadsheet addiction”. If you are at IWMW next week I’m doing a whole session on Google Analytics data and reshaping.
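The combining step here is essentially a join on the per-tweet link. A minimal Python sketch under stated assumptions: the tweets, the referral counts, and the idea of dividing visits by follower count to get a "rate" are all illustrative choices, not the formulae in the actual spreadsheet, and `click_through` is a hypothetical helper.

```python
# Hypothetical data: one short link per tweet, and visit counts per referring link.
tweets = [
    {"user": "alice", "short_url": "http://t.co/aaa111", "followers": 500},
    {"user": "bob", "short_url": "http://t.co/bbb222", "followers": 2000},
    {"user": "carol", "short_url": "http://t.co/ccc333", "followers": 100},
]
referrals = {"http://t.co/aaa111": 25, "http://t.co/bbb222": 40}

def click_through(tweets, referrals):
    """Join tweets to referral counts by link and rank by visits per follower."""
    rows = []
    for tweet in tweets:
        # Links with no recorded referrals count as zero visits.
        visits = referrals.get(tweet["short_url"], 0)
        rate = visits / tweet["followers"] if tweet["followers"] else 0.0
        rows.append({"user": tweet["user"], "visits": visits, "rate": rate})
    return sorted(rows, key=lambda r: r["rate"], reverse=True)

table = click_through(tweets, referrals)
```

In a bubble chart, values like `visits` and `followers` could then drive bubble position and size.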

Q&A

Comment: study/working group on social network analysis; some of these techniques could be built into our community of expertise here.

Comment: would have to slow way down for me but hopefully we can devise materials and workshops to make these step by step.

Martin: But there are some really easy wins, like that Google Maps one. And there is a good community of support around these tools. But for instance R, if I ask on Stack Overflow then I will get an answer back.

Q: is there a risk that if you start trying to visualise data you might miss out on proper statistical processes and rigour?

Martin: yes, that is a risk. People tend to be specialists in one area rather than all of them. Manchester Metropolitan use R as part of analysis of student surveys, recruitment etc. This was from an idea of Mark Stubbs, head of eLearning, raised by speaking to specialist in Teridon flight. R is widely used in the sciences and increasingly in big data analysis. So there it started with an expert who did know what he was doing.

Q: have you done much with data mining or analysis, like Google N Gram?

Martin: not really. Done some work on sentiment analysis and social network data though.

June 19, 2013. Posted at 3:34 pm in Events Attended, LiveBlogs.
Oct 12 2012
 

Note: If you reached this post through the Guardian Higher Education Network you may want to browse the panel above this post, or explore the main page of this blog for the latest posts and updates.

I sometimes receive quite specific requests about social media, new tech or other slightly more tangential things. A few weeks ago I was asked for advice on visualisation tools for a research project and thought that others here might be interested in the tools, sites and resources that came to mind.

The links and recommendations come from a mixture of angles: some I’ve looked at or been aware of through specific work projects; some come recommended by colleagues as new, interesting, or well crafted; and some came from looking for visualisation options for my MSc in eLearning dissertation. Do let me know what you think of any of these tools or the list itself and I’ll be very happy to update the list if you have others to recommend!

Tools 

This section generally focuses on online tools (with varying policies over data use/retention) that allow you to visualise your data one way or another:

Wordle is about the simplest visualisation tool but can be effective if you want a word/tag cloud: http://www.wordle.net/

Image of the Closing Session at OR2012 with Wordle by Adam Field shown in the background.


Textal is a new and more academically-targeted and mobile-friendly alternative to Wordle, specifically designed for use with text research data sets. I think it should be due out soon… : http://www.textal.org/.

FigShare is a site for sharing academic data, particularly scientific data. It includes some automatic visualisation functionality as well as inspiration via other people’s shared resources, graphs, visualisations: http://figshare.com/

ManyEyes is an IBM tool for visualising data – very useful and once data is uploaded it can be re-visualised: http://www-958.ibm.com/software/data/cognos/manyeyes/

Visual.ly is a consumer web 2.0 tool for visualising data – generally social media related data – and is probably primarily useful as a source of inspiration for other visualisations: http://visual.ly/

Google Apps/Drive includes a series of pretty good visualisation tools that can be accessed from any spreadsheet. Standard Excel type charts can be accessed via:

Insert>Chart

You can also access more sophisticated visualisations from

Insert > Gadget

There are various examples of these being used well on the web but they really come into their own when you hook up a data collection form to a spreadsheet and then visualise it – it all connects up rather nicely.

Voyant Tools offers a number of approaches to large cohorts of prepared text-based data. It’s worth noting that, as with all of these tools really, you should anonymise and edit the text before submitting it. That’s particularly important for Voyant Tools as you can’t edit the data once it’s up and you can’t delete it easily either. But it does clever stuff in a simple way and for free: http://voyant-tools.org/.

Data-Driven Documents is a site focusing on D3.js, a JavaScript library for working with data – lots of very practical but very technical materials and ideas here: http://d3js.org/

SIMILE Widgets are a great wee set of visualisation tools from a project at MIT that are relatively easy to reuse and widely used on websites to make swishy looking previews etc.: http://simile-widgets.org/

Timeline JS is a flexible way to create timeline visualisations – useful if that type of visual is what you’re after: http://timeline.verite.co/

Tableau is a free data visualisation tool and is rather less techie to handle than many of those mentioned above. I haven’t had much experience of using it but have heard good things: http://www.tableausoftware.com/public/

SourceMap is a web service that lets you create one type of visualisation – maps visualising “where things come from” whether those be sources, commodities, trade routes, etc. Very useful but only if that’s the visualisation you actually want to create: http://sourcemap.com/ You can find some good examples of these visualisations over on my Trading Consequences colleague Jim Clifford’s blog.

British Tallow trade map by Jim Clifford (click through to see his full blog post about these maps).


Gource is a specific version control visualisation codebase – again it’s very niche but nice if that’s your niche: https://code.google.com/p/gource/

Logstalgia is, similarly, a specific visualisation codebase for access log visualisation: https://code.google.com/p/logstalgia/

Dedoose is also worth noting. This is a text analysis tool and isn’t really a visualisation tool but there are visual aspects and it does help you reimagine and reinterpret text data by colour coding, tagging, grouping and viewing trends as you mark up your data: http://www.dedoose.com/

 

Useful Lists of Visualisation Tools and Resources

These are some articles and listings I’ve found useful in the past – I suspect there are many more to add…

The Next Web did a great guide to visualisation tools in May 2012 (some of which have already been mentioned): http://thenextweb.com/dd/2012/05/10/want-to-make-your-own-data-visualizations-check-out-this-awesome-set-of-tools/

ComputerWorld also shared a very useful post on good free data visualisation tools. The article is here: http://www.computerworld.com/s/article/9215504/22_free_tools_for_data_visualization_and_analysis and you can view a chart of all of the tools featured here: http://www.computerworld.com/s/article/9214755/Chart_and_image_gallery_30_free_tools_for_data_visualization_and_analysis

GoGeo (http://www.gogeo.ac.uk/) includes a visualisation software area where you can find several useful tools: http://www.gogeo.ac.uk/gogeo-java/resources.htm?&searchcat=Software&search=visualisation. There are also a number of useful collections of geographically related visualisation tools featured in the news section: http://www.gogeo.ac.uk/gogeo-java/resources.htm?&searchcat=News&search=visualisation

Downloadable Software

I must note two fabulous blogs for finding out about these: Tony Hirst’s OUseful blog; Martin Hawksey’s MASHe blog. Both are brilliant resources and contain many many more recommendations for software for visualisation and data analysis.

R – Free software for visualisation: http://www.r-project.org/

Gephi – Powerful – but complex to start out with – open source tool for data visualisation: http://gephi.org/

Expertise – Technical

These are useful websites:

Visualizing.org is a site dedicated to visualisation and includes a wealth of examples and useful links – very worthwhile browsing this for ideas, practical solutions etc: http://www.visualizing.org/

Visual Complexity is a collection of best practice visualisations which can be searched, browsed, etc: http://www.visualcomplexity.com/vc/

FlowingData is a blog collecting best practice visualisations and usually also indicating technology used: http://flowingdata.com/

Visualisation of Facebook photo virality featured on Flowing Data. Click through to read the full article.


There are also some individuals whose blogs are always well worth a read:

Steven Gray specialises in working with data and geospatial data visualisation with several very interesting current projects (including Textal). His Big Data Toolkit website http://bigdatatoolkit.org/ includes updates on his research, links to useful resources, discussion of ideas, etc.

Melissa Terras is co-director of the Centre for Digital Humanities at UCL and has worked on a variety of visualisation, research and interaction projects around Digital Humanities, including Textal, which can be found on her website: http://www.ucl.ac.uk/dis/people/melissaterras

Martin Hawksey (already mentioned above) of JISC CETIS blogs at MASHe (http://mashe.hawksey.info/) and often examines data analysis and visualisation including some superb work on Twitter data and visualisation. A search or browse of his blog for visualisations should find some interesting examples using web and downloadable data visualisation tools. As with any of these notable folks he is likely to respond to comments or questions so do comment on his blog!

Visualisation of UK University Twitter Following patterns by Martin Hawksey. Click through to read more about this visualisation and view his and Tony Hirst's IWMW 2012 presentation on Data Visualisation.


Tony Hirst (already mentioned above) of the Open University blogs at OUseful (http://blog.ouseful.info/) and his posts often revolve around visualisation of data, particularly social data. I would recommend having a browse around his site (e.g. http://blog.ouseful.info/?s=visualisation) and leaving comments/questions.

Aaron Quigley of St Andrews University (http://www.cs.st-andrews.ac.uk/~aquigley/) is an expert on Human Computer Interaction and shares great resources and ideas around HCI and visualisation regularly. Aaron is also working on the Trading Consequences project and occasionally blogs about visualisation plans/issues related to that project here: http://tradingconsequences.blogs.edina.ac.uk/

The giCentre at City University London looks at geographic information and visualisation is a major part of that work. Their projects – which have included special commissions for the BBC and others – and related materials can be found here: http://www.soi.city.ac.uk/organisation/is/research/giCentre/

Patrick McSweeney of University of Southampton has worked on a couple of nice visualisation projects and hacks – notably his OR2012 Developer Challenge winning concept of provenanced visualisation within/connected to the repository – and usually shares the technologies behind them. You can browse recent projects here: http://users.ecs.soton.ac.uk/pm5/portfolio/projects/

 

Expertise – Artistic/Creative/Inspirational

This section focuses on those who offer visual inspiration and expertise. I had hoped to include Douglas Coupland who worked on a very creative data visualisation project a few years back but I can’t recall the name of the project nor find the link – do let me know if you can help me out with a link here.

Hint.fm is a site collating new ways to visualise data of various sorts. This is about novel artistic rather than automated approaches: http://hint.fm/

Information is Beautiful, which I’m sure you’ve all seen before, is the home of David McCandless’s work and is really useful for inspiration/artistic visualisation and interpretation of data: http://www.informationisbeautiful.net/

Pinterest includes a number of visualisation boards that may be useful as inspiration/a connecting point to further websites and technical details: http://pinterest.com/search/boards/?q=visualisation

Culture Hack Scotland has included some fantastic visualisation and interpretation work in the past – and I’m sure the same is true for other hackdays working with large data sets. For previous projects in 2012 and 2011 have a look here: http://www.welcometosync.com/hack/

And finally…

Ellie Harrison is a visual artist based in Glasgow who specialises in interpreting data, including some lovely visualisation work. Her website is here: http://www.ellieharrison.com/ and her internet projects can be found here: http://www.ellieharrison.com/index.php?pagecolor=2&pageId=menu-internet

Screenshot from Ellie Harrison's most recent web project Trajectories. Click through to access this art project which uses visualisation to explore self comparison.


 

Hopefully some of the above will be of interest/useful to you as well as the person who originally asked the question. As I’ve already said I’d appreciate any comments, additions, etc. you may have. Visualisations aren’t the core thing I spend my time on but images and visual aspects are so important to making an impact on social media that they are, of course, an area of great interest.

May 10 2011
 

This weekend my colleague Gavin and I decided it would be useful (and fun!) to head along to Culture Hack Scotland, a 24 hour hackday organised by the Edinburgh Festivals Innovation Lab and themed around both the festivals and the wider Scottish cultural scene.

The Edinburgh Festivals Innovation Lab is a new(ish) initiative which has emerged from Edinburgh Festivals, the organisation that is jointly funded by all 12 of the official Edinburgh Festivals to enable them to work together throughout the year, promote initiatives and festival content etc. The idea for the Innovation Lab apparently emerged out of discussions with all of the festivals about their use of or interest in digital technology: there were lots of ideas and potential for projects but they didn’t necessarily have the time or skills to take these forward. Last year the Lab hired their inaugural geek-in-residence Ben Werdmuller (he of Elgg fame) and the Culture Hack Day was a significant outcome of the work he has been doing over the last few months.

Jan 25 2011
 

Today I will be live blogging Haggis and Mash, a mashed library event taking place at the National eScience Centre in Edinburgh. This means this post will appear incomplete/updated all day. Formatting tweaks and the odd fix of typos and names coming soon. Apologies for picture quality – largely taken on my phone at handy opportunities.

The day kicked off with a lot of hellos and registration.

I’ll be blogging all day – for more information on the programme have a look at the mashed library wiki here: http://www.mashedlibrary.com/wiki/index.php?title=Haggis_and_Mash.
