Mar 22 2018
 

Today I am at the Data Fest Data Summit 2018, two days of data presentations, showcases, and exhibitors. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too! 

I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly. 

Intro to the Data Lab – Gillian Docherty, The Data Lab CEO

Welcome to Data Summit 2018. It’s great to be back. Last year we had 25 events with 2000 people, but this year we’ve had 50 events and hope to reach over 3500 people. We’ve had kids downloading data from the space station, we’ve had events on smart meters, on city data… Our theme this year is “Data Warrior” – a data warrior is someone with a passion and a drive to make value from data. You are data warriors. And you’ll see some of our data warriors on screen here and across the venue.

Our whole event is made possible by our sponsors, by Scottish Enterprise and Scottish Government. So, let’s get on with it!

Our host for the next two days is the wonderful and amazing Maggie Philbin, who you may remember from Tomorrow’s World. She has had an amazing career in media, and she is also chair of UK Digital Skills and CEO of TeenTech, which encourages young people to engage with technology.

Intro to the Data Summit – Maggie Philbin

Maggie is starting by talking to people in the audience to find out who they are and what they are here for… 

It will be a fantastic event. We have some very diverse speakers who will be talking about the impact of data on society. We have built in lots of opportunities for questions – so don’t hesitate! For any more information do look at the app or use the hashtag #datafest18 or #datasummit18.

I am delighted to introduce our speaker who is back by popular demand. She is going to talk about her new BBC Four series Contagion, which starts tonight.

The Pandemic – Hannah Fry

Last year I talked about data for social good. This year I’m going to talk about a project we’ve been doing to look at pandemics and how disease spreads. When we first started to think about this, we wanted to see how much pandemic disease is in people’s minds. And it turns out… Not much.

Hannah’s talk was redacted from this post yesterday but, as Contagion! has now been broadcast, here we go: 

Influenza killed 100 million people in the 20th Century. The Spanish Flu killed more people in one year than both World Wars. Which seems surprising, but that may be partly because Pandemic Flu is very different from Seasonal Flu. Pandemic Flu is where a strain of flu jumps from animals to humans and spreads so fast that we can’t vaccinate fast enough. For that reason Pandemic Flu is at the top of the UK Government’s Risk Register.

So, what we decided to do was essentially a TV stunt with a real purpose. We built a simple smart phone app. The App captures where people are, and how many people they are with. That allows us to see how disease might spread. Firstly to do that for TV of course, but secondly this is proper citizen science for real research. So, I spent a year calling in lots of favours, getting on all sorts of media, asking people to download an app.

But we also needed a patient zero, and we also needed a ground zero. We picked Haslemere in Surrey, which is a sort of Goldilocks town: just big enough, well connected, a beautiful English town… Just the type you’d like to destroy with an imaginary virus. And I was patient zero… So I went there, went to the gym, went to the shops, went to the pub… But unknown to me I also walked past others with the app… When I stood next to one of them for enough time, I infected that person… And so now there were two people, and then many more… A pharmacist got infected early on and kept spreading the infection…

These patterns are based on our best mathematical models for infection… And you can quickly see pockets of infection developing and growing. Spreading quickly to a whole town. But those dots on a map are all real people…

Looking at some real infection sites… In Petersfield there is a school where a few kids from Haslemere attend, commuting by train. Three kids were running our app… By day three, two were infected, one wasn’t. They went to the break room, and outside, and the third person got infected… And then infected their family…

I also wanted to talk about people from Haslemere who worked in London on Day Two. Two people from the town didn’t know each other, but they took the train home, and one infected the other…

Now, this is just the Haslemere experiment, but we also did a nationwide experiment…

We persuaded 30,000 people to download the app and take part… Again, it starts with me walking around Haslemere. By a month in, London is swamped. Two months in it sweeps Scotland. By three months it’s in Northern Ireland. Really, by then only the north of Scotland was safe! What is startling isn’t just the speed of the spread, but how many people get infected… This is the most accurate model we have to date. The most accurate estimate for a Spanish Flu type virus is a staggering 43,343,849 infected. A conservative fatality rate of 2% would be 886,877 deaths. But that’s the worst case scenario… That’s no interventions… Which is why this data and this model are so important, as they allow you to understand and trial interventions. Generally most people infect the same small number of people, but some super spreaders have a much bigger impact. Targeting super spreaders with early vaccination – just vaccinating a targeted 10% – makes a huge difference. It really slows the spread, giving you a fighting chance of overcoming the infection.
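The super-spreader effect described here can be illustrated with a toy network simulation (my own sketch, not the programme’s actual model): on a contact network where a few people have many more contacts than most, vaccinating the best-connected 10% suppresses an outbreak far more than vaccinating a random 10%.

```python
import random

def make_network(n=200, hubs=10, seed=1):
    """Toy contact network: most people have a handful of contacts,
    a few 'super spreaders' (hubs) have many. Returns adjacency sets."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    def link(a, b):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    for i in range(n):                 # sparse background contacts
        for _ in range(3):
            link(i, rng.randrange(n))
    for h in range(hubs):              # hubs touch many more people
        for _ in range(40):
            link(h, rng.randrange(n))
    return adj

def epidemic_size(adj, vaccinated, p=0.1, steps=8, seed=0):
    """Simple susceptible-infected spread: each step, every infected
    person infects each unvaccinated contact with probability p."""
    rng = random.Random(seed)
    patient_zero = max(i for i in adj if i not in vaccinated)
    infected = {patient_zero}
    for _ in range(steps):
        new = {nb for i in infected for nb in adj[i]
               if nb not in infected and nb not in vaccinated
               and rng.random() < p}
        if not new:
            break
        infected |= new
    return len(infected)

adj = make_network()
n = len(adj)
by_contacts = sorted(adj, key=lambda i: len(adj[i]), reverse=True)
targeted = set(by_contacts[: n // 10])     # vaccinate best-connected 10%
rng = random.Random(2)

random_avg = sum(
    epidemic_size(adj, set(rng.sample(range(n), n // 10)), seed=t)
    for t in range(100)) / 100
targeted_avg = sum(
    epidemic_size(adj, targeted, seed=t) for t in range(100)) / 100
print(f"random 10% vaccinated:   {random_avg:.0f} infected on average")
print(f"targeted 10% vaccinated: {targeted_avg:.0f} infected on average")
```

Averaged over 100 runs, the targeted strategy leaves far fewer people infected, which is the intuition behind prioritising super spreaders.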

We know these pandemics can and will happen, but it’s about what you plan for and how you intervene. The only way to answer those big questions and to know how to intervene, is to understand that data, to understand that spread. So we are anonymising this data set and releasing it to the academic community – as a new gold standard for understanding infection. Data really does save lives.

Q&A

Q1) So, Shetland is safe…. Unless the infection started there.

A1) When we spoke to one person about what they’d do in a pandemic, they said they’d get in a car with their kids and just…

Q2) I’m from the NHS and there has been a lot of work of super spreaders, closing schools… Has there been work on the most efficient, mathematically effective patterns to minimise infection.

A2) Schools are an interesting one… Closing schools sounds like it makes everything simple. But sometimes shutting schools means kids mix in an unpredictable manner, as they will go places too. And then you reopen schools and potentially reinfect… And that’s without the economic impact. These are all questions we are thinking about.

Q3) That’s awesome and scary. What about people developing immunity?

A3) Our model is no immunity, and no-one recovers. But you can build that in later, adding richer assumptions. And some of the team working on this are looking at infection transmitted through the air – some viruses can stick around for a few hours.

Q4) I remember the SARS book. I’m very paranoid… Bought suits, gloves, bleach… In New Zealand you need a two week supply of stuff in your house… If we did that, how would that make a difference?

A4) Yes… So for instance the government always pushes messages about hand washing whenever flu is circulating. It doesn’t feel like that would make a big difference… But at a population level it really does…

Q5) My question is whether you will make the data available for other people – for epidemiology but also for transport, for infrastructure.

A5) Yes, absolutely. We wanted to make this as scientifically rigorous as possible. The BBC gives us the scale to get this work done. But we are now in the process of cleaning the data to share it. Julia Gog at Cambridge is the lead here so look out for this.

Q6) What about data privacy here?

A6) At a national level the data is accurate to 1 km squared, with one pin every 24 hours. Part of the work to clean the data is checking if it can be reverse engineered to make sure that privacy is assured. For Haselmere there is more detail… We are looking at skewing location, at just sharing distance apart rather than location, and at whether there is any way you can reverse engineer the dataset if you’ve seen the TV programme, so we are being really careful here.
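That kind of spatial coarsening – one pin per 24 hours at roughly 1 km² resolution – can be sketched as a grid-snapping step (a hypothetical illustration, not the project’s actual anonymisation code):

```python
import math

def snap_to_grid(lat, lon, cell_km=1.0):
    """Coarsen a GPS fix to the centre of a roughly cell_km-sized grid
    square. One degree of latitude is ~111 km; longitude degrees shrink
    by cos(latitude), approximated here at the nearest whole degree so
    that nearby fixes always share the same cell boundaries."""
    lat_step = cell_km / 111.0
    lon_step = cell_km / (111.0 * math.cos(math.radians(round(lat))))
    snap = lambda v, step: (math.floor(v / step) + 0.5) * step
    return round(snap(lat, lat_step), 6), round(snap(lon, lon_step), 6)

# Two fixes a few hundred metres apart (made-up coordinates near
# Haslemere) land on the same coarsened point:
a = snap_to_grid(51.0885, -0.7124)
b = snap_to_grid(51.0889, -0.7131)
```

Coarsening alone is not sufficient for anonymity, which is why the team also considers distance-only release and reverse-engineering checks.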

Business Transformation: using the analytics value chain – Warwick Beresford-Jones, Merkle Aquila

I’ll be talking about the value chain. This is:

Data > Insight > Action > Value (and repeat)

Those first two aspects are “generation” and the latter two are “deployment”. We are good at the first two, but not so much at the action and value aspects. So we take a different approach, thinking right to left, which allows faster changes. Businesses don’t always start with an end in mind, but we do have accessible data, transformative insights, organisational action, and integrated technology. In many businesses much of the spend is on technology, rather than on the stage where change takes place, where value is generated for the business. The aim is that a business understands why they are investing and what the purpose of it is.

I want to talk more about that, but first I want to talk about the NBA and the three point line, and how moving that changed the game by changing basket attempts… That was a tactical decision of whether to score more points, or concede fewer points, enabling teams to find the benefit in taking the long shot. Cricket and Football similarly use the value chain to drive benefit, but the maths works differently in terms of interpreting that data into actions and tactics.

Moving back to business… That right to left idea is about thinking about the value you want to derive, the action required to do that, and the insights required to inform those actions, then the data that enables that insight to be generated.

Sony looked at data and customer satisfaction and wanted to reduce their range from 15 handsets down to 4. But the data showed the importance of camera technology – and many of you will now have Sony technology in the cameras in your phones – and they have built huge value for their business in that rationalisation.

BA wanted to improve check in experiences. They found business customers were frustrated at the wait, but also families didn’t feel well catered for. And they decided to trial a family check in at Heathrow – that made families happier, it streamlined business customers’ experience, and staff feedback has also been really positive. So a great example of using data to make change.

So, what questions you should be asking?

  • What are the big things that can change our business and drive value?
  • Can data analytics help?
  • How easy will it be to implement the findings?
  • How quickly can we do it?

Q&A
Q1) In light of the scandal with Facebook and Cambridge Analytica, do you think that will impact people sharing their data, how their data can be used?

A1) I knew that was coming! It’s really difficult… And everyone is also looking at the impact of GDPR right now. With Facebook and LinkedIn there is an exchange there in terms of people, their data, and the service. If you didn’t have that you’d get generic broadcast advertising… So it depends whether people would rather see targeted and relevant advertising. But then some of what Facebook and Cambridge Analytica did is not so good…

Q2) How important is it for the analysts in an organisation to be able to explain analytics to a wider audience?

A2) Communication is critical, and I’d say equally important as the technical work.

Q3) What are the classic things people think they can do with data for their business, but actually is really hard and unrealistic?

A3) A few years ago I was meeting with a company, and they gave an example of when Manchester United had a bad run, and Paddy Power had put up a statue of Alex Ferguson with a “do not break glass” sign, and they asked how you can create that game changing moment. And that is really hard to do.

Q4) You started your business at your kitchen table… And now you have 120 people working for you. How do you do that growth?

A4) It’s not as hard as you think, but you have to find the right blend of raw talent with experience – lots of tricky learning.

Project Showcase

How will you make a difference? I’m going to talk about how I’ve made major change for one of Scotland’s biggest organisations. I was working for Aggreko, the leader in mobile modular power and temperature solutions. They provide power for the Olympics, the World Cup, the Superbowl… A huge range of events across the world.
We are now watching a short video on how Aggreko supplied large scale mobile power (30 MW set up in 17 days) to cover local demand in Machu Picchu when a hydroelectric plant had to be shut down for maintenance.
In the dark old days Aggreko was a reactive organisation. A customer would ring with an issue, then Aggreko would send an engineer out. Then they moved to monitoring the mobile power kit, helping to monitor equipment across the world on a 24/7 basis. My team built the software to undertake that monitoring, to respond to every alert, alarm, and any issue customers might face. And in fact in many cases to fix an issue before a customer ever became aware of it. That meant far greater reliability and efficiency. And doing that, we wondered how we might be able to predict issues, to predict how equipment might fail. We didn’t know how to do that and we weren’t afraid to ask…
So we went to the Data Lab, took my idea to their board, and they funded a year long pilot to work with the University of Strathclyde and Microsoft, as well as building a team of engineers, technicians and specialists to take this forward. This was a massively smart group, but with some big egos… A lot of what I had to do was to ensure there was good collaboration across those teams. The collaboration is really what made this project a success. We created an advanced analytics team which allowed us to put models into use, some of which could predict an issue two weeks ahead, and to manage those issues for our customers.
The guys at Data Lab helped me to make a difference, they were brilliant and all that help is available to you too. So what are you waiting for?  
There are various ways to resolve this, but they are not easy. There are solutions for the 1% of large companies, but that leaves SMEs out. And 50k SMEs go out of business every year in the UK. So, what is the solution? Well, let me tell you about Previse and what we do. We think we have a unique solution. David Brown, one of our co-founders, had experience in the sector, and he didn’t want to accept the status quo. Accounting is one of the oldest processes and data sets that a company has, but no-one is using it in this sort of way. So what do we do?
Previse finds data, engages with data, pulls in other data… and looks at what can work. We can look at all the data on every invoice from every supplier. We then determine a score, and a threshold… so that when invoices come in they can be prioritised and mostly approved and paid immediately. The process is the same for the buyer but it makes a huge difference for the supplier. Placing an invoice through Previse, you can have invoices approved very swiftly, without chasing and additional work. That is a huge difference in cost and time. The large corporates we’ve been talking with – including 70% of large FTSE companies – are really enthusiastic and want us to help them.
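The score-and-threshold step described here can be sketched as follows (an illustrative toy with a made-up threshold, supplier names and scores – not Previse’s actual model or API):

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    supplier: str
    amount: float
    score: float  # model-estimated probability the buyer will approve

# Illustrative cut-off only — not Previse's real threshold.
APPROVE_THRESHOLD = 0.85

def triage(invoices):
    """Pay high-confidence invoices immediately; queue the rest for
    review — the prioritisation step described above."""
    pay_now, review = [], []
    for inv in invoices:
        (pay_now if inv.score >= APPROVE_THRESHOLD else review).append(inv)
    return pay_now, review

pay_now, review = triage([
    Invoice("Acme Ltd", 1200.0, 0.97),   # hypothetical suppliers/scores
    Invoice("Bravo Co", 540.0, 0.60),
])
```

The buyer’s process is unchanged; the split simply means high-confidence suppliers get paid without waiting for manual approval.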
And our experience in Scotland has been incredible. The Data Lab helped us throughout, finding the right universities to work with. We work with Heriot-Watt (Mike Chantler) and with MBN to find the right resources, and Scottish Enterprise have helped us make Scotland our hub for data science and software engineers. We’ve employed 5 people in the last 6 months, and we’ll double that by the end of the year. We can generate growth, but it’s also about making real change with data.
If SMEs are paid on time, that allows them to thrive and grow. It’s a huge problem and we think it can be resolved.
Our platform consists of four modules: sustainability; mapping; reporting and advanced. But I’ll talk about our mapping module and some projects we’ve worked on:
  • Mapping the water footprint of your crops – a project with the University of Edinburgh, funded by Data Lab. This brings together a wide range of crop data layers. We have an overlay based on water for crop growing, and overlays of grey water, or erosion potential – for instance there is high erosion potential on the west coast of Scotland, and mostly low erosion in the east of Scotland.
  • Forests 2020 is a Mexican application supported by the UK Space Agency, and we work with University of Edinburgh, University of Leicester, and Carbomap. Here we can see deforestation patterns, and particular crop areas.
  • Innovate UK: farm data, which is a collaboration with Rothamsted Research, Environment Systems, and Innovate UK – this is at an early stage looking at crop rotation data for UK and export markets. And you can also see the soil you are growing on, what can be planted, what sort of fertilisers to use.
  • Sustainability risk – supports  understanding of risks such as water depletion, and the various factors impacting and shifting that.
  • We also have tools to help government plan what type of power plants to build, and where to build them.

So, in conclusion, layering data allows us to gain new insights and understanding.

After a good lunch and networking session we are now back in the main hall, starting with a video on the use of data in the Heineken production process. And an introduction to Stefaan Verhulst, a Glasgow graduate now based in New York.

Data Driven Public Innovation In Partnership With The Private Sector: The Emerging Practice Of Data Collaboratives – Stefaan Verhulst, Co-founder and Chief Research and Development Officer, The Gov Lab

I’m delighted to be back in Scotland for this event looking at how data can help society. That is also the focus of The Gov Lab in New York, where we look at how we can unleash data for good.

An example I want to give you is the earthquake in Nepal a few years ago. It was a terrible event but it was also inspiring too, because Ncell, a cell phone operator, and Flowminder (based in Sweden and the UK) worked together to map the flow of people to intervene, to save lives. It is a great example of using data for the public good. And it’s an example of the growth of available data, including web crawling/scraping/search analysis; social media; retail data etc., all collected by the private sector. But we also have new data science to address this data, to gain meaning from this data. And often that expertise to extract meaning is sitting in the private sector.

So, the real question is how we extract value and engage with the private sector around the data they collect. That’s a whole different ballgame from open government data. It’s not just about data sharing, but about new kinds of public-private sharing around data for the public good. So we have set up a new programme of Data Collaboratives, and the Data Collaboratives Explorer, which allows you to explore the collaborations taking place – there are over 100 in there already. From that collaborative work we have gained some insights that I will share today.

So, firstly, data collaboratives are important across the policy lifecycle:

  • That starts with situation analysis. Corporations in the US have worked together to understand the scale of the opioid epidemic, for instance.
  • Our second value proposition is about knowledge creation. For instance, post hurricane season, how does the mosquito population change and how does that change mosquito-borne diseases?
  • Our third value proposition is prediction, for instance projects to predict suicide risk from search results – a project in Canada and also in India.
  • And then we have evaluation and impact assessment. An example here is Vision Zero Labs looking at traffic safety and experiments in spatial composition to influence and reduce risk of accidents.

In those collaboratives we see different models in use. These include: data pooling – enabling sharing and analysis across the collaboration; prizes and challenges – opening some data as a source of generating new insights through innovative ideas and projects that benefit both public and private sector, e.g. BBVA’s Innova challenge; research partnerships – with collaboration across private sector and public or academic sector – such as work on fake news on Twitter; intelligence products – JP Morgan Chase has an institute to extract insights from their own data and actually that can be hugely detailed and valuable; API – for instance Zillow allows you to access real time mortgage and housing market data; trusted intermediary – for instance Dalberg who acts between telecommunications companies and others.

So, there are many ways to set up a data collaborative. But why would the private sector want to do this? Well, they may be motivated by: reciprocity – sharing data may lead to access to specialist expertise; research and insights; revenue; regulatory compliance; reputation and retention of talent – often corporations need to retain talent by offering harder or more interesting problems; and responsibility.

But there are challenges too. For instance, the taxi and limousine agency in New York regulates all taxi operations, including Uber. In their wisdom they shared the data… But that exposed some celebrity locations (and less salubrious locations). The harm here wasn’t huge, but that data in a different cultural context could present a much higher risk. So, some of the concerns around sharing data include:

  • privacy and security
  • generalisability and data quality (e.g. not everyone has a cell phone)
  • competitive concerns
  • cultural challenges – there is something of a culture of hoarding data within organisations.

So, to move towards data responsibility we really need risk and value assessment that recognises data as a process, and part of a wider value chain. We need fair information practices and processes – our principles are about 30 years out of date and we urgently need new principles and processes. GDPR helps, but not with all the challenges we may have. We need new methods and approaches. And that means having a decision tree across the data cycle.

There are risks in sharing data, but there are also risks in not sharing the data. If we had not used the Ncell data in Nepal, we would have had more deaths. So we have to respond not just to risks, but also to the opportunity cost of not sharing data. What is your responsibility as a corporation?

I’ve given lots of examples here… But how do we make data driven public innovation systemic? We need data stewards in organisations, so there is someone who can sign off on data collaboratives; we need that profession in place to enable work with the public sector. We need methods – like the Unicef collaboratory around childhood obesity, that’s a new methodology. We also need new evidence of how data can be used and what impact it will have. And finally we need a movement – this won’t happen without a movement to establish data collaboratives, and I’m delighted to be here today as part of this movement, to ultimately use data to improve people’s lives.

Q&A

Q1) In light of Cambridge Analytica and Trump, aren’t we misusing data?

A1) I think use is part of that value chain and we have to have a debate about what kind of use we are comfortable with, and which we are not. And that case also raises questions about freedom of expression, and a need to regulate against deceptive behaviours.

Q2) Several years ago hashtags brought down governments in the Middle East, and now we have governments in those countries controlling the public through hashtags. It’s scary.

A2) I’ve been working in privacy for many years, and I really encourage a comparison of risks and value, and doing a cost-benefit analysis. We need to rebalance that.

Gillian is introducing our special guest… Minister Derek Mackay

Message from the Scottish Government – Derek Mackay, MSP, Cabinet Secretary for Finance & Constitution, the Scottish Government

I’m not sure that I’ve thought of myself as a data warrior before, but I did teach the Social Security Minister how to use Instagram the other week! I say that partly as I have an appeal and a plea for you… The First Minister has a huge set of followers on Twitter, but I’m stuck just below 18k… Maybe you are the audience to take me over that line!

There’s a lot I want to cover in terms of the excitement of this event. We have a strong reputation and record in Scotland. With responsibility for the budget and internationalisation, this is really exciting. I’m particularly enthused by the international representation including Brazil, Singapore, USA, and Ireland too. This event allows us to put the spotlight on data science in Scotland. It is a natural place for people to come and do business. And this is a great event with business leaders here, with experience to share with others.

Our government, Scottish Enterprise and The Data Lab are working together to build innovation and business in Scotland. We are fortunate in Scotland to have world class data resources. Scotland has universities, 5 of which are in the top 100, and we have 70% of research rated as excellent in the last REF. We can feel this growth. Data Driven Innovation has the potential to deliver £20bn of value to Scotland in the next five years. This buzz can be harnessed to make Scotland the Data Capital of Europe. I particularly support the growth in FinTech. Many people describe themselves as disruptors – that would once have been seen as a negative but is now a real positive, about opening new opportunities. And data helps us deliver our work, one example of which is the Cancer Challenge, which is helping us understand how best to use our resources for the best outcomes.

The Scottish Government Innovation Action Plan seeks to build a sustainable economy, with skills crucial to that, including funding for business growth, innovation, etc. We’ve also launched the Scottish Digital Academy and the Data Science Accelerator to look at how things are changing, to innovate working methods – such as CivTech’s innovative models. We are really serious about business growth, the economy and skills. We have invested in innovation, education and internationalisation. We are the strongest part of the UK outside London and the South East.

So, the Scottish Government supports your enthusiasm for data, for what can be done with data. High tech, low carbon is the future as we see it, and we want to be a country that is welcoming to Europe and the rest of the world – we don’t support the UK government’s view on Europe.

I commend your work and hope that you have a fruitful and enjoyable time here. And we hope the collaboration of our agencies helps to bear fruit now and in the future.

Improving Transparency In The Extractives Industry Using Data Science – Erin Akred, Lead Data Scientist, DataKind

I am a data scientist from DataKind, where we harness data for the improvement of humanity. We exist to use data to build the kind of world we want to see. The challenge we face is that many not for profits, charities, government agencies etc. do not have the resources to do the kinds of data science that the private sector (e.g. Netflix) can. So we link pro bono data scientists with organisations with a social mission.

Last year we did a project looking at automated detection of mines from earth observation imagery. We are used to using this data for other purposes, but this is a challenging problem. I will come back to this, but first I want to talk more about DataKind.

Our founder, Jake, was working at the New York Times on data science, and saw people volunteering and attending hack events at the weekend, giving back with their talents… So he thought: could I partner with a mission driven organisation, organise a similar event and make this happen? He started DataKind and we’ve been developing what we can offer these mission-driven organisations who also want to benefit from data science. So we now pair data scientists with mission driven projects. We have over 18k community members worldwide, 6 chapters in 5 countries (Bangalore, Singapore, Dublin, London, San Francisco, Washington DC), chapter applicants in 40+ global cities, 228 events worldwide, and over 250 projects generating about $20m of value in volunteer effort.

One example project has been with the Omidyar Network, looking at data science solutions that might enable social actors to operate more effectively and efficiently in their efforts to combat corruption in the extractives industry. Now, we don’t start with the data that is out there. Our funders really want impact, and we think of that as impact per dollar. So, anyway, the context of this work was illegal mining, which can cause conflict in the eastern Democratic Republic of Congo, along with poor environmental outcomes and social challenges. As data scientists we partner with other organisations to ensure we know how to get value out of data insights.

To understand illegal mining we have to know where it is taking place. So we did work on machine learning from images. We worked with Global Forest Watch and IPIS.

Now, not all of our projects are successful… Usually projects fail because of issues in:

  • Problem statement – a well thought through problem statement is really important.
  • Datasets
  • Data Scientists
  • Funding
  • Subject Matter Expertise
  • Social Actors

Now, I spoke to someone last night who has run lots of Kaggle projects – crowdsourced data science challenges. In those projects you have data and data scientists, but you don’t have subject matter experts – and that’s crucial knowledge to have on board. For instance, when looking at malaria, there was a presumption that mosquito nets would be helpful, but the way they were set up looked like a shrine, like death… and people didn’t want to sleep in them. So they used them as fishing nets.

When we work with an organisation we do want a data set, but we also want an organisation open to seeing what the data reveals, not trying to push a particular agenda. And we have subject matter experts that add crucial context and understanding of the data, of any risks or concerns with the data as well.

We start with, e.g.:

We want to create image classification models

Using publicly available earth satellite imagery

So that those working in the transparency sector can be made aware of irregular mining activity

So that they can improve environmental and conflict issues due to mining. 

Some of the data we use is open – and a lot of data I’ve worked with is open – but we also use closed data, data generated by mission-driven organisations’ apps, etc.

And the data scientists on these projects are at the top of their game – people these organisations could not otherwise afford to recruit or work with.

So, for this project we used a random forest analyser on the data, to find mine locations. We had generated training data for this project, and found that we can pick out where illegal mining has occurred with good accuracy.
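The pipeline described – labelled training tiles feeding a random forest – can be sketched with synthetic data (an illustration of the technique only, using made-up spectral features rather than the project’s real imagery):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical per-tile features: red and near-infrared reflectance,
# plus a vegetation index. None of this is DataKind's real training set.
red = rng.uniform(0, 1, n)
nir = rng.uniform(0, 1, n)
ndvi = (nir - red) / (nir + red + 1e-9)   # NDVI: low over bare ground

# Synthetic label: treat bare, low-vegetation, high-red tiles as "mine".
is_mine = ((ndvi < 0.1) & (red > 0.5)).astype(int)

X = np.column_stack([red, nir, ndvi])
X_train, X_test, y_train, y_test = train_test_split(
    X, is_mine, test_size=0.25, random_state=0)

# Fit a forest on the labelled tiles and score held-out tiles.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

With real satellite tiles the features would come from the imagery bands themselves, but the train-on-labels, score-new-tiles shape is the same.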

To find out more and get involved – and I’d encourage you to do that – go to: datakind.org/getinvolved

Q&A

Q1) Where do you see DataKind going?

A1) We do a lot with not a lot of money. I had assumed that DataKind was 100 people when I joined, it was less than 10. I would love to see this model replicated in other countries. And conferences… Bringing volunteer data scientists together with providers enables us to increase the opportunity for these things to happen. Bringing these people together, those conferences are rich experiences that amplify the impact of what we are doing.

Q2) For the mining project, can you access the data online?

A2) Yes, the US Federal Government is hosting the data, and we used Google Earth Engine in this work.

From Analytics To AI: Where Next For Government Use Of Data? – Eddie Copeland, Director of Government Innovation, Nesta

I’ve been talking to anyone who will listen over the last 5 years about the benefits of public sector data. We have been huge proponents of using open data, but often data has been released in a vague hope that someone else might do something with it. And we have the smart cities agenda, generating even more data that often we have no idea how to use. But there is a missing link there… The idea that public organisations should be the main consumer of their own data, for improving their own practice.

Now you’ll have read all those articles asking if data is the new “oil”, the new “fuel”, the new “soil”! I don’t much care about the analogy but the key thing is that data is valuable. Data enables the public sector to work better; it enables many of the tried and tested ways of working better – doing more and better with less. But that’s hard to do. If I’m a public sector organisation with lots of amazing data on opportunities and challenges in my own area, but not the next-door area, how can I understand the bigger picture? We can target resources to the most vulnerable areas, but we need data to tell us where those are. And without visibility across different organisations and parts of the public sector (e.g. in family and child services), how can that data be used to understand appropriate support and intervention?

Why do we focus on data issues? Well, there is a technology challenge, as so many public sector organisations have different IT services. And you have outrageous private sector organisations who charge the public sector to access their own data – they should be named and shamed. Even when you get the data out, the format can be inconsistent and hard to use. Then there is what we can do with the data – we often err on the side of caution, not on the side of what is useful. Historically the main data person in public sector organisations was the “data protection officer” – the clue is in the title! It takes an organisational leap to collaborate on issues where that makes sense.

I used to work for a think tank and I got bored of that, I really wanted to be part of a “do tank”, to actually put things into action. And I found this great organisation called Nesta and we have set up the London Office of Data Analytics:

  • an impactful problem – it takes time, backing and support, so you have to have a problem that matters
  • a clearly defined intervention – what would you do differently if you had all the information you could want about the problem you want to solve (data science is not the innovation)
  • what is the information asset you would need to undertake that intervention?
  • what intervention do you need to undertake to solve that issue?

So when we looked at London the issue that seemed to fit these criteria was unlicensed Houses of Multiple Occupancy, and how we might predict that. We asked housing officers how they identified these properties, we looked at what was already known, we looked at available information around those indicators. And then developing machine learning to predict those unlicensed HMOs – we are now on the third version of that.
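The talk doesn’t describe the HMO model itself, but the workflow it implies – turn model-predicted probabilities into a prioritised list for housing officers to investigate – can be sketched like this (the addresses, scores and 0.5 threshold are all invented for illustration):

```python
# Hypothetical sketch: rank model scores so housing officers visit the
# likeliest unlicensed HMOs first. Scores here are invented.
predicted = {
    "12 Example Road": 0.91,
    "3 Sample Street": 0.15,
    "78 Placeholder Ave": 0.67,
}

THRESHOLD = 0.5  # only flag properties above an agreed probability

flagged = sorted(
    (addr for addr, p in predicted.items() if p >= THRESHOLD),
    key=lambda addr: predicted[addr],
    reverse=True,
)
```

As the talk stresses, these flags are a starting point: front line staff add their own knowledge of local landlords before any action is taken.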

We have also worked on a North East Data Pilot to join up data across the region to better understand alcohol harms. But we didn’t know what intervention might be used, which has made this harder to generate value from.

And we are now working on the Essex Centre for Data Analytics, looking at the issue of modern slavery.

Having now worked through many of these examples, we’ve found that data is the gateway drug to better collaboration between organisations. Just getting all the different players in the room, talking about the same problem in the same way, is hugely valuable. And we see collaborations being set up across the place.

So, things we have learned:

  1. Public sector leaders need to create the space and culture for data to make a difference – there is no excuse for not analysing the data, and you’ll have staff who know that data and just need the excuse to focus and work on this.
  2. Local authorities need to be able to link their own data – place based and person based data.
  3. We need consistent legal advice across the public sector. Right now lots of organisations are all separately getting advice on GDPR when they face common issues…

So, what’s next? Nesta is an innovation organisation. There is excitement about technologies of all types. For this audience AI probably is overhyped, but nonetheless it has big potential, particularly algorithmic decision making out in the field. Policy makers talk about evidence based decision making, but AI can enable us to take that out into the field. Of course algorithms could do great things, but we also have examples that are bad… Companies hiring based on credit records is not OK. Public sector bodies not understanding algorithmic bias is not OK. For my own part I published 10 principles for a code of conduct for public sector use of data science – I’d love your feedback at bit.ly/NestaCode.

It is not OK to use AI to inform a decision if the person using it could not reasonably understand its basic objectives, function and limitations. We would face a total collapse of trust that could set us back a decade. And we’ve seen over the last week what that could mean.

Q&A

Q1) Aren’t the problems you are talking about are surely people problems?

A1) Public organisations are being asked to do more with less, and that makes it difficult for time to be carved out to focus on these challenges – that’s part of why you need buy in and commitment at senior level. There is a real challenge here about finding the right people… The front line workers have so much knowledge but you have organisations who…

Q2) Your comment that you have to understand the AI, GDPR require a right to explanation to use of data and that’s very hard to do unless automated.

A2) Yes, that’s a really untested part of GDPR. If local authorities buy in data they have to understand where that data is from, what data is being used and what that means. In the HMO example local front line staff can look at those flags from the prediction and add their own knowledge of the context of, for instance, a local landlord’s prior record. But that understanding of how to use and action that data is key.

Data Driven Business. It’s Not That Hard. – Alex Depledge, Founder, Resi.co.uk, Former CEO, Hassle.com

That’s a deliberately provocative title – I knew that this would be a room full of intellectuals and I’m going to bring things back down to earth. I’m known for setting up hassle.com, and I think it’s fitting that I am following Eddie, talking about the basics and the importance of getting the basics right. So many companies say they are running a data driven business, and they are not… Few are actually doing this.

I started my professional life at Accenture. I met my co-founder there. About 7 years into our friendship she emailed me and said “I’ve got it. I need a piano teacher, I’ve been Googling for four hours, we need a place to find music teachers”. And I said “that’s a rubbish idea”. And then I needed a wisteria trimmed… And we decided we wanted to build a marketplace for local services… We had a whole idea, a powerpoint deck, and thought that great, we’ll get a team in India or Singapore to build it… Sounded great, but nothing happened.

And then Jules quit her well paid job and she said “it’s ok, I’ve bought a book!” – and it was a Ruby on Rails book… She started coding… And she built a thing. And that led to us going through a Springboard process… We had some data but I was trying to pull in money. We were attracting some customers, but not a lot of service providers… We were driven by intuition or single conversations… So one day I said that I’m quitting and going back to the day job… And I was frustrated… And a colleague said “maybe we should look at what the data says?”… And so they looked. And they found that 1 in 4 people coming to the website wants a cleaner. And we were like “holy shit!”. Because we didn’t have any cleaners. So we threw away what we had, we set up a three page site. We went all in so you could put a postcode in, find a cleaner, and book them. We got 27 bookings, then double that… And we raised some funding – £250k just when we desperately needed it. We found cleaners, we scaled up, we got much bigger investment. And we scaled up to 100 people.

Then we really turned into a data driven business: build what people want, try it, check the data, iterate. Our VC at Accel pushed us to use mobile… We weren’t convinced. We checked the data and found that actually people booked cleaners from their desks at lunchtime. At our pinnacle we moved 10k cleaners around London. We had to look at liquidity, and we needed cleaners to have an average of 30 hours of work per week… too few hours and cleaners weren’t happy, too many and jobs weren’t taken up. So at 31 hours we’d start recruiting.
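The liquidity rule described above – target around 30 hours of work per cleaner per week, start recruiting at 31 – is simple enough to sketch directly. This is a hypothetical helper, not Hassle’s actual code:

```python
# Minimal sketch of the liquidity rule: target ~30 hours per cleaner per
# week, and trigger recruitment once the average reaches 31.
TARGET_HOURS = 30
RECRUIT_AT = 31

def should_recruit(weekly_hours):
    """Recruit more cleaners when existing ones average 31+ hours/week."""
    avg = sum(weekly_hours) / len(weekly_hours)
    return avg >= RECRUIT_AT

decision = should_recruit([33, 29, 32])  # average ~31.3 hours -> recruit
```

The point of the anecdote is that even this one-line average, checked regularly, was enough to balance cleaner happiness against unfilled jobs.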

From there we looked at expansion and what kind of characteristics were needed. We needed cities like a donut – clients in the middle, cleaners on the outside. We grew but then we got some unwanted attention and chose to sell. For £32 million. And the company that bought us had 80 engineers… And they migrated 16 countries onto our platform, which had been built by 8 engineers.

So, we sold our business…. And I thought I’m not going to do that again…

And then I wanted a new kitchen… So I had an architect in… spent £@500… 45 days later I got plans… and 75 days later I had an illustration of how it would look so I could make a decision. And so I started Resi, the first online architect. And it took me just 4 months to be convinced that this could be a business. We set up a page of what we thought we might do. I spent £10 per day on Facebook A/B testing ads. And we’ve had a huge amount of business… We wanted to find the sweet spot for architects and how long the work would take. Again we needed to know how much time was needed for each customer – 3 hours is our sweet spot. Our business is now turning over £1 million a year after one year. And only one person works with data – he also does marketing. He looked at our customers, when they converted, and how our activities overlaid. After 10 days we weren’t following up, and adding an intervention (email/text etc.) tripled our conversions.
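The follow-up analysis described here really is spreadsheet-simple. A hypothetical sketch, with invented numbers, of comparing conversion rates for leads that did and didn’t get a follow-up within 10 days:

```python
# Back-of-the-envelope sketch (invented data): the kind of spreadsheet-level
# analysis described above, comparing conversion with and without follow-up.
leads = [
    # (followed_up_within_10_days, converted)
    (True, True), (True, True), (True, False),
    (False, False), (False, True), (False, False), (False, False),
]

def conversion_rate(rows):
    return sum(1 for _, converted in rows if converted) / len(rows)

with_followup = conversion_rate([r for r in leads if r[0]])
without_followup = conversion_rate([r for r in leads if not r[0]])
```

A gap like this between the two rates is exactly the signal that justified adding the email/text intervention.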

We’ve also been able to look at hotspots across the UK, and we can target our marketing in those areas, and also understand that word of mouth… We can take advantage of that.

I’m a total data convert. I still don’t like spreadsheets. Data informs our decisions – not quite every decision as instinct matters too. But every piece of data analysis we did was doable in a spreadsheet by someone in high school… It doesn’t take machine learning, or AI, or big data. Even simple analysis can create tremendous results.

Q&A

Q1) What next?

A1) I always said I didn’t want to dine out on one story… like Hassle. But I don’t know the end for Resi yet… Invite me back in a few years!

Q1) The learning for a few hours of work was huge.

A1) Our entire business was based on a single piece of analysis – what were our customers looking for led to £32m.

The AI Race: Who’s Going To Win? – Vicky Brock (VB – chairing), CEO, Get Market Fit; Alex Depledge (AD), Founder Resi.co.uk, Former CEO Hassle.com; Joel KO (JK), Founding CEO, Marvelstone Ventures; Chris Neumann (CN), Early Stage Investor

CN: I’m a recovering entrepreneur. As an investor I’ve had a global purview on what’s going on in the AI race. And I think it’s interesting that we see countries and areas which haven’t always been at the cutting edge of technology, really finding the opportunities here. Including Edinburgh.

JK: We are funders based in Singapore and investing in FinTech. The AI technology has been arising… I’m hoping to invest in AI start ups and incubators.

AD: You already know who I am. In my brief hiatus between companies I was an entrepreneur in residence at Index Ventures, and I saw about 300 companies come in saying they were doing AI or machine learning, so I have some knowledge here. But also, knowing a leading professor in data ethics, I don’t care who wins, but I care that Pandora isn’t let out of her box until governments have a handle on this, because the risks are great.

VB: I’m a serial entrepreneur around data. And machine learning or AI can kind of be the magic words for getting investment. There is obvious hype here… Is it a disruptor?

CN: I’ve seen a lot of companies – like Alex – say they use ML or AI… In some ways its the natural progression from being data driven. I do think there will be an incredible impact on society over the next 10 years from AI. But I don’t think it will be the robots and tech from science fiction, it will probably be in more everyday ways.

VB: Is AI the key word to get funding…

JK: I see many AI start ups… But often actually it’s a FinTech start up… But they present themselves that way as funders like to hear that… There is so much data… And AI does now spread into our daily lives… Entrepreneurs see AI as a way to sell themselves to investors.

VB: At one stage it was “big data” then “AI” but you’ve had some little data… What did you see when you were entrepreneur in residence?

AD: No disrespect to investors, but they focus on financials and data, whereas I’d often be asking about what was happening under the bonnet… So if they said they were using machine learning, ask about that, ask about data sets, ask where the data is coming from… Often they do interesting data work but it’s a good algorithm or calculation… It’s not ML or AI. And that’s ok – that’s something I wanted to bring out in my presentation.

VB: What’s looking exciting now?

CN: We see really interesting organisations starting to do fascinating work with AI and ML. I focus on business to business work, but that often looks less exciting to others. So I am excited about an investment I’ve made in a company using BlockChain to prove GDPR compliance. I spoke with a cool company here using wearables and AI for preventing heart attacks, which is really amazing.

JK: I have been here almost a week, met start ups, and they were really really practical. They have the sense to make a revenue stream from the technology. And these very new start ups have been very interesting to me personally.

VB: You’ve started your next company, did you cross lots of ideas off first…

AD: Jules and I had a list of things we wouldn’t do… Chris talked about B2B… We talked about not doing large scale or consumer ideas. We whittled our list of 35 ideas down to 4 each and they were all B2B… But they bored us. We liked solving problems we’ve experienced. My third business I hope will be B2B as getting to £10m is a bit more straightforward than in B2C.

VB: AI requires particular skillsets… How should we be thinking about our skillsets and our talents.

CN: Eddie talked earlier about needing to know what the point is. It can be easy to get lost in the data, to geek out… and lose that focus. So Alex just asking that question, finding out who gives a damn, that’s really important. You have to do something worthwhile for somebody – there’s no point doing it otherwise.

JK: With AI… In ten years… Won’t be coding. AI can code itself. So my solution is that you should let your kids play outside. In Asia lots of parents send kids to coding schools… They won’t need to be engineers… Parents’ response to the trend is too early and not thought through…

AD: I totally agree. Free play and imagination and problem solving are crucial. There aren’t enough women in STEM. But you can over focus on STEM. It’s data and digital literacy from any angle – it could be UX, marketing, product management, or coding… In London we have this idea that everyone should be coding, but actually digital literacy is the gap we need to close. And actually that comes down to basic literacy and numeracy. It’s back to basics for me.

VB: I’d like to make a shout out for arts and social sciences graduates. We learn to ask good questions…

AD: Looking at recent work on where innovation comes from, it comes from the intersectionality of disciplines. That’s when super exciting stuff happens…

Q&A

Q1) Mainly for Alex… I’m machine learning daft… And I love statistics. And I know the value of small scale statistics. And the value of machine learning and large scale data – not so much AI. How do you convey that to business people?

AD) We don’t have a stand out success in the UK. But with big corporates I tell them to start small.. Giving engineers space to play, to see what is interesting… That can yield some really interesting results. You can’t really show people stuff, you need to just try things.

VB) Are you trying to motivate people to use data in your company?

JK) Yes, with investors you see patterns… I tell kids to start start ups as early as possible… So they can fail earlier… Because failures then lead to successful businesses next time.

CN) A lot of folk won’t be aware that for many organisations there is a revenue stream around innovation… It’s a really difficult thing to try to bring in innovative practices into big organisations, or collaborate with them, without squishing that. There are VCs and multinationals who will charge you a lot of money to behave like a start up… But you can just start small and do it!

The Revolutionary World Of Data Science – Passing On That Tacit Knowledge! – Shakeel Khan, Data Science Capability Building Manager, HM Revenue & Customs

I’ve been quite fortunate in my role in that I’ve spend quite a lot of time working with both developed and developing economies around data science. There is huge enthusiasm across the world from governments. But there is also a huge fear factor around rogue players, and concerns about the singularity – machines exceeding humans’ capabilities. But there are genuine opportunities there.

I’ve been doing work in Pakistan, for DFID, where they have a huge problem with Dengue Fever. They have tracked the spread with mobile phone data, enabling them to contain it at source. That is saving lives. That’s a tremendous outcome. Closer to home, John Bell at Oxford University has described AI as the saviour of our health services, as AI can enable us to run our services more effectively and more economically.

In my day job at HMRC, you can’t underestimate what the work that we do enables in terms of investment in the country and its services.

I want to talk about AI at three stages: Identify; Adopt; Innovate.

In terms of data science and what is being done around the world… The United Arab Emirates have set up their Ministry of AI and a 2031 Artificial Intelligence Strategy. We have the Alan Turing Institute looking at specific problems but across many areas – some really interesting work there. In Edinburgh we have the amazing Data Lab, and the research that they are doing, for instance with cancer, and we have the University of Edinburgh Bayes Centre. Lots going on in the developed world. But what about the developing world? I’ve just come back from Rwanda, which has a new Data Revolution Policy. I watched a TED talk a few weeks back that emphasised that what is needed in sub-Saharan Africa is not help, but the tools and means to do things themselves.

Rwanda is a hugely progressive country. They have more women in parliament (62.8%) than any country in the world. Their GDP is $8.3bn. They have a Data Revolution Policy. They are at the start of their journey. But they are trying to bring tacit knowledge in, to leapfrog development… Recognising the benefit of that tacit knowledge and of those face to face engagements.

For my role I am split about 50/50 between international development and work for HMRC. So I’ll say a bit more about the journey for developed economies…

Defining Data Science can be quite abstract. You have to make a benefits case, to support the vision, to share a framework and some idea of timeline, with quick wins, to build teams, to build networks. Having a framework allows organisations to build capabilities in a manageable way…

A new Data Science Centre going up in Kigali, Rwanda, will house 200 data scientists – that’s a huge commitment.

The data science strategic framework is about data; people skills; and cultural understanding and acceptance – with senior buy-in crucial for that. The identify stage is also about data ethics and skills development – we have been developing frameworks for years that we can now share. For Rwanda we think we can reduce the time to develop data capabilities from maybe 5 years to perhaps 3. Similarly in Pakistan.

When you move to the adopt phase… you really need to see migration across sectors. I started my career in finance. When I came to HMRC I did a review of machine learning and how it was being used, how that machine learning was generating benefit. We managed to bring in £29bn that would otherwise have been lost, partly through machine learning. One machine learning model can, effectively, bring in tens or hundreds of millions of pounds, so they have to be well calibrated and tested. So I developed the HMRC Predictive Analytics Handbook (from June 2014), which we’ve shared across HMRC but also with DWP and colleagues across government.

In terms of Innovate, it is about understanding the field and latest developments. However HMRC are risk averse, so we want to see where innovation has worked elsewhere. I did some work with Prof David Hand at Imperial College London about 20 years ago, so I got back in touch, and we developed a programme of data science learning. Not Imperial providing training – it was a partnership between HMRC and Imperial. We looked closely at the curriculum, demonstrated value added, and looked at how we could innovate what we do.

University of Edinburgh Informatics is a really interesting one. I read a document a few years ago by the late Prof. Jon Oberlander about the way that the academic, public and private sectors working together could really benefit the Scottish economy. Two years of work led to a programme in natural language processing that was the result of close collaboration with HMRC. Jon Oberlander was hugely influential, and passionate about conversational technology and the scourge of isolation. He was able to ask lots of questions about AI, and when it will be truly conversational. I hope to continue that work with Bayes, but I also wanted to say thank you to Jon for that.

AI is increasingly touching our lives. Wherever we are in the world, sharing our tacit knowledge will be incredibly important.

Q&A

Q1) Rwanda has clearly made a deep impression. What were the most surprising things?

A1) People have stereotypes about sub-Saharan Africa that just aren’t true. For instance, when you get off the plane you cannot take plastic bags in – they are an incredibly environmental country. I saw no litter anywhere in the country. The people of Rwanda are truly committed to improving the lives of people.

Q2) Do you use the same machine learning methods for low income and high income tax payers/avoiders?

A2) There are some basic machine learning methods that are consistent, but we are also looking at more novel models like boosted trees.

Q3) I worked in Malawi and absolutely back up your comment about the importance of visiting. You talked about knowledge from yourself to Rwanda, how was the knowledge exchange the other way?

A3) Great question. It wasn’t all learning from developed to developing. We learnt a great deal from our trip. That includes cultural aspects. In terms of the foundations of data science, we in the UK have used machine learning in financial services and retail for 30–40 years; that isn’t really achievable in these countries at the moment, and there the learning does go from developed to developing.

Closing comments – Maggie Philbin

I’ve been reflecting on the (less serious) ways data might influence my life. My son in law is in a band (White Lies) and that has given me such an insight into how the music industry use data – the gender and age of people who access your music, whether they will go to gigs etc. And in fact I was very briefly in a band myself during my Swap Shop days… We made a mock up Top of the Pops… Kids started writing in… And then BBC records decided to put it out… We had long negotiations about contracts… But I was sure no-one would buy it… It reached number 15… So we went from parodying Top of the Pops to being on Top of the Pops. And thank you to Scotland – we made number 9 here! But I hadn’t negotiated hard – we just got 0.5%. And if we’d had that data understanding that White Lies have, who knows where we would have been.

So, day one has been great. Thank you to The Data Lab, and to all the sponsors. And now we adjourn for drinks.

May 08 2015
 
Image of surgical student activity data presented by Paula Smith at the Learning Analytics Event

Today I am at the UK Learning Analytics Network organised by the University of Edinburgh in Association with Jisc. Read more about this on the Jisc Analytics blog. Update: you can also now read a lovely concise summary of the day by Niall Sclater, over on the Jisc Analytics blog.

As this is a live blog there may be spelling errors, typos etc. so corrections, comments, additions, etc. are welcome. 

Introduction – Paul Bailey

I’m Paul Bailey, Jisc lead on the Learning Analytics programme at the moment. I just want to say a little bit about the network. We have various bits of project activities, and the network was set up as a means for us to share and disseminate the work we have been doing, but also so that you can network and share your experience working in Learning Analytics.

Housekeeping – Wilma Alexander, University of Edinburgh & Niall Sclater, Jisc

Wilma: I am from the University of Edinburgh and I must say I am delighted to see so many people who have traveled to be here today! And I think for today we shouldn’t mention the election!

Niall: I’m afraid I will mention the election… I’ve heard that Nicola Sturgeon and Alex Salmond have demanded that Tunnock’s Teacakes and Caramel Wafers must be served at Westminster! [this gets a big laugh as we’ve all been enjoying caramel wafers with our coffee this morning!]

I’ll just quickly go through the programme for the day here. We have some really interesting speakers today, and we will also be announcing the suppliers in our learning analytics procurement process later on this afternoon. But we kick off first with Dragan.

Doing learning analytics in higher education: Critical issues for adoption and implementation – Professor Dragan Gašević, Chair in Learning Analytics and Informatics, University of Edinburgh

I wanted to start with a brief introduction on why we use learning analytics. The use of learning analytics has become something of a necessity because of the growing needs of education – the growth in the number of students and the diversity of students, with MOOCs being a big part of that realisation that many people want to learn who do not fit our standard idea of what a student is. The other aspect of MOOCs is their scale: as we grow the number of students it becomes difficult to track progress, and the feedback loops between students and instructors are lost or weakened.

In learning analytics we depend on two types of major information systems… Universities have had student information systems for a long time (originally paper, computerised 50-60 years ago), but they also use learning environments – the majority of universities have some online coverage of this kind for 80-90% of their programmes. But we also don’t want to exclude other platforms, including communications and social media tools. And no matter what we do with these technologies we leave a digital trace, and that is not a reversible process at this point.

So, we have all this data but what is the point of learning analytics? It is about using machine learning, computer science, etc. approaches in order to inform education. We defined learning analytics as being “measurement, collection, analysis, and reporting” of education but actually that “how” matters less than “why”. It should be about understanding and optimising learning and the environments in which learning occurs. And it is important not to forget that learning analytics are there to understand learning and are about understanding what learning is about.

Some case studies include Course Signals at Purdue. They use Blackboard for their learning management system. They wanted to predict which students would successfully complete courses, and to identify those at risk. They wanted to segment their students into high risk, at risk, or not at risk at all. Having done that, they used a traffic light system to reflect it, and that traffic light system was shown both to staff and students. When they trialed it (Arnold and Pistilli 2012) with a cohort of students, they saw greater retention and success. But if we look back at how I framed this, we need to think about this in terms of whether this changes teaching…
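The traffic-light idea can be sketched very simply. Purdue’s actual model and cut-offs aren’t given in the talk, so the thresholds below are invented for illustration:

```python
# Sketch of a Course Signals-style traffic light: map a model's predicted
# risk of non-completion (0-1) onto red/amber/green bands. Thresholds invented.
def risk_band(score):
    """Map a 0-1 predicted risk score to a traffic light."""
    if score >= 0.7:
        return "red"    # high risk: intervene
    if score >= 0.4:
        return "amber"  # at risk: monitor
    return "green"      # not at risk

bands = [risk_band(s) for s in (0.85, 0.5, 0.1)]
```

The interesting design choice in the case study is less the banding itself than showing the bands to students as well as staff.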

So, also at Purdue, they undertook a project analysing the email content of instructors to students. They found that rather than giving more detailed formative feedback, instructors just increased the summative feedback. So this really indicates that learning analytics has to feed into changes in teaching practices in our institutions, and we need our learning analytics to provide signalling and guidance that enables teaching staff to improve their practice, and give more constructive feedback. (see Tanes, Arnold, King and Remnet 2011)

University of Michigan looked at “gateway courses” as a way to understand performance in science courses (see Wright, McKay, Hershock, Miller and Tritz 2014). They defined a measure for their courses, which was “better than expected”. There were two inputs for this: previous GPA, and goals set by students for the current course. They then used predictive models for how students could be successful, and ways to help students perform better than expected. They have also been using technology designed for behavioural change, which they put to use here… Based on that work they generated personalised messages to every student, based on the rationale for that student, and also providing predicted performance for particular students. For instance, an example here showed that a student could perform well beyond their own goals, which might have been influenced by the science course not being their major. The motivator for students here was productive feedback… They interviewed successful students from previous years, used that to identify behaviours etc. that led to success, and presented that as feedback from peers (rather than instructors). And I think this is a great way to show how we can move away from very quantitative measures, towards qualitative feedback.

So, to what extent are institutions adopting these approaches? Well, there are very few institutions with institution-wide examples of adoption. For instance, University of Michigan only used this approach on first year science courses. They are quite a distributed university – like Edinburgh – which may be part of the reason. Purdue also only used this on some courses.

Siemens, Dawson and Lynch (2014) surveyed the use of learning analytics in the HE sector, asking about the level and type of adoption, ranking these from “Awareness” to “Experimentation” to “Organisation/Students/Faculty”, “Organisational Transformation” and “Sector Transformation”. Siemens et al found that the majority of HE is at the Awareness and Experimentation phases. Similarly Goldstein and Katz (2005) found 70% of institutions at phase 1; it is higher now, but bear in mind that 70% doesn’t mean the others are further along that process. There is still much to do.

So, what is necessary to move forward? What are the next steps? What do we need to embrace in this process? Well, let’s talk a bit about direction… The metaphors from business analytics can be useful; we can borrow lessons from that process. McKinsey offered a really interesting business model of: Data – Model – Transform (see Barton and Court 2012). That can be a really informative process for us in higher education.

Starting with Data – traditionally when we choose to measure something in HE we refer to surveys, particularly student satisfaction surveys. But these do not have a huge response rate in all countries. More importantly, surveys are not especially accurate. We also have progress statistics – our learning systems hold these as data, but are they useful? We can also derive social networks from these systems, from interactions and from course registration systems – and knowing who students hang out with can predict how they perform. So we can get this data, but then how do we process and understand it? I know some institutions find a lack of IT support can be a significant barrier to the use of learning analytics.

Moving onto Model… Everyone talks about predictive modelling, but the question has to be about the value of a predictive model. Often organisations just see this as an outsourced thing – relying on an outside organisation and its data model to provide solutions, without the context of understanding what the questions are. And the questions are critical.

And this is, again, where we can find ourselves forgetting that learning analytics is about learning. So there are two things we have to know about, and think about, to ensure we understand what analytics mean:

(1) Instructional conditions – different courses in the same school, or even in the same programme, will have different instructional conditions – different approaches, different technologies, different structures. We did some research on a university through their Moodle presence and we found some data that was common to 20-25% of courses, but we also identified some data you could capture that was totally useless (e.g. time online). And we found some approaches that explained 80% of variance, for example extensive use of Turnitin – not just for plagiarism but also by students for gathering additional feedback. One of our courses defied all trends… they had a Moodle presence, but when we followed up on this we found that most of their work was actually in social media, so data from Moodle was quite misleading and certainly a partial picture. (see Gasevic, Dawson, Rogers, Gasevic, 2015)

(2) Learner agency – this changes all of the time. We undertook work on the agency of learners, based on log data from a particular course. We explored 6 clusters using cluster matching algorithms… We found that there was a big myth that more time on task would lead to better performance… One of our clusters spent a great deal of time online, another was well below average. When we compared clusters we found the top students were the group spending the least time online, while the cluster spending the most time online performed averagely. This shows that this is a complex question. Learning styles aren’t the issue; learning profiles are what matters here. In this course, one profile works well; in another, a different profile might work much better. (see Kovanovic, Gasevic, Jok… 201?)
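
As a toy illustration of that point (synthetic data and a stdlib-only sketch, not the study's actual cluster-matching method): grouping students by time online and then comparing mean grades per group can show the low-time group outperforming.

```python
# Cluster students on minutes online with a tiny two-cluster 1-D k-means,
# then compare mean grades per cluster. Data is invented to mirror the
# finding above: the least-time-online cluster performs best.

def kmeans_1d(values, iters=20):
    """Tiny two-cluster 1-D k-means; returns a cluster index per value."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        for c in (0, 1):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Synthetic (minutes_online, grade) pairs.
students = [(60, 85), (70, 88), (65, 90), (400, 70), (420, 68), (390, 72)]
labels = kmeans_1d([m for m, _ in students])
for c in (0, 1):
    grades = [g for (m, g), l in zip(students, labels) if l == c]
    # cluster 0 (least time online) has the higher mean grade
    print(c, round(sum(grades) / len(grades), 1))
```

The point is not the algorithm but the interpretation: time-on-task alone is a poor proxy, and profiles have to be read in the context of the course.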

And a conclusion for this section is that our analytics and analysis cannot be generalised.

Moving finally to Transform, we need to ensure participatory design of analytics tools – we have to engage our staff and students early in the process; we won’t get institutional transformation by relying on the needs of statisticians. Indeed visualisations can be harmful (Corrin and de Barba 2014). The University of Melbourne looked at the use of dashboards and similar systems and reported that students who were high achieving, with high GPAs and high aspirations, actually under-performed when they saw that they were doing better than average, or better than their goals. And for those doing less well we can just reinforce issues in their self-efficacy. So these tools can be harmful if not designed in a critical way.

So, what are the realities of adoption? Where are the challenges? In Australia I am part of a study commissioned by the Australian Government in South Australia, engaging with the entire Australian tertiary sector. We interviewed every VC and manager responsible for learning analytics. Most are in phase 1 or 2… Their major goal was to enable personalised learning – the late phases… They seemed to think that they would magically move from experimentation to personalised learning; they don’t seem to understand the process to get there…

We also saw some software-driven approaches: institutions buy an analytics programme and perceive the job as done.

We also see a study showing that there is a lack of a data-informed decision making culture, and/or data not being suitable for informing those types of decisions. (Macfadyen and Dawson 2012).

We also have an issue that researchers are not focused on scalability… There is lots of experimentation, but… I may design beautiful scaffolding based on learning analytics, but I have to think about how that can be scaled up to people who may not be the original instructors, for instance.

The main thing I want to share here is that we must embrace the complexity of educational systems. Learning analytics can be very valuable for understanding learning but they are not a silver bullet. For institutional or sectoral transformation we need to embrace that complexity.

We have suggested the idea of the Rapid Outcome Mapping Approach (ROMA) (Macfadyen, Dawson, Pardo, Gasevic 2014), in which once we have understood the objectives of learning analytics, we also have to understand the political landscape in which they sit, and the financial contexts of our organisations. We have to identify stakeholders, and to identify the desired behaviour changes we want from those stakeholders. We also have to develop an engagement strategy – we cannot require a single approach; a solution has to provide incentives for why someone should/should not adopt learning analytics. We have to analyse our internal capacity to effect change – especially in the context of analytics tools and taking any value from them. And finally we have to evaluate and monitor change. This is about capacity development, across multiple teams.

We need to learn from successful examples – and we have some to draw upon. The Open University adopted their organisational strategy, and were inspired by the ROMA approach (see Tynan and Buckingham Shum 2013). They developed the model of adoption that is right for them – other institutions will want to develop their own, aligned to their institutional needs. We also need cross-institutional experience sharing and collaboration (e.g. SOLAR, the Society for Learning Analytics Research). This meeting today is part of that. And whilst there may be some competition between institutions, this process of sharing is extremely valuable. There are various projects here, some open source, to enable different types of solution, and sharing of experience.

Finally we need to talk about ethical and privacy considerations. There is a tension here… Some institutions hold data, and think students need to be aware of the data held… But what if students do not benefit from seeing that data? How do we prepare students to engage with that data, to understand it? The Open University is at the leading edge here and has a clear policy on ethical use of student data. Jisc also has a code of practice for learning analytics, which I welcome and think will be very useful for institutions looking to adopt learning analytics.

I also think we need to develop an analytics culture. I like to use the analogy of, say, Moneyball, where analytics made a big difference… but analytics can be misleading. Predictive models have their flaws, their false positives etc. A contrasting example would be Trouble with the Curve – where analytics mask underlying knowledge of an issue. We should never reject our tacit knowledge as we look at adopting learning analytics.

Q&A

Q – Niall) I was struck by your comments about asking the questions… But doesn’t that jar with the idea that you want to look at the data and explore questions out of that data?

A – Dragan) A great question… As a computer scientist I would love to just explore the data, but I hang out with too many educational researchers… You can start from data and make sense of that. It is valid. However, whenever you have certain results you have to ask certain questions – does this make sense in the context of what is taking place, does this make sense within the context of our institutional needs, and does this make sense in the context of the instructional approach? That questioning is essential no matter what the approach.

Q – Wilma) How do you accommodate the different teaching styles and varying ways that courses are delivered?

A – Dragan) The most important part here is the development of capabilities – at all levels and in all roles, including students. So in this Australian study we identified trends, found these clusters… But some of these courses are quite traditional and linear, others are more ambitious… They have a brilliant multi-faceted approach. Learning analytics would augment this… But when we aggregate this information… when you have more ambitious goals, there is more to do. Time is required to adopt learning analytics with sophistication. But we also need to develop tools suited to the needs and tasks of stakeholders… so stakeholders are capable of working with them… but also not too complex to use. There aren’t that many data scientists, so perhaps we shouldn’t use visualisations at all, maybe just prompts triggered by the data… And we also want to see more qualitative insights into our students… their discussion… when they are taking notes… That then really gives an insight… Social interactions are so beneficial and important to student learning.

Q – Wilbert) You mentioned that work in Australia about Turnitin… What was the set up there that led to that… Or was it just the plagiarism prediction use?

A – Dragan) It turned out to be the feedback received through Turnitin, not the plagiarism side. Primarily it was on the learner side, not so much the instructors. There is an ethical dilemma if you do expose that to instructors… if students are using the system to get feedback… Those were year one students, and many were international students from Asia and China, where cultural expectations around reproducing knowledge are different… So that is also important.

Q) Talking about the Purdue email study, and staff giving formative feedback to students at risk – how did that work?

A) They did analysis of these messages and their content, and found staff mainly giving motivational messages. I think that was mainly because the traffic light system indicated the at-risk status but not why that was the case… you need that information too…

Q) Was interested in rhetoric of personalised learning by Vice Chancellors, but most institutions being at stage 1 or 2… What are the institutional blockers? How can they be removed?

A) I wish I had an answer there! But senior leaders are sometimes forced to make decisions based on financial needs, not just because they are driven by data or unaware of data. Many Australian institutions are small organisations with limited funding… and the continued existence of the institution is part of what they have to face, quite aside from adoption of learning analytics. But also University of Melbourne is a complex institution – a leading researcher is there, but they cannot roll out the same solution across very different schools and courses…

Niall: And with that we shall have to end the Q&A and hand over to Sheila, who will talk about some of those blockers…

Learning Analytics: Implementation Issues – Sheila MacNeill, Glasgow Caledonian University

I was based at CETIS and involved in learning analytics for a lot of that time… But for the last year and a half I have been based at Glasgow Caledonian University. And today I am going to talk about my experience of moving from that overview position to being in an institution and actually trying to do it… I’m looking for a bit of sympathy and support, but hoping also to contextualise some of what Dragan talked about.

Glasgow Caledonian University has about 17,000 students, mostly campus based although we are looking at online learning. We are also committed to blended learning. We provide central support for the university, working with learning technologies across the institution. So I will share my journey… joys and frustrations!

One of the first things I wanted to do was to get my head around what kind of systems we had around the University… We had a VLE (Blackboard) but I wanted to know what else people were using… This proved very difficult. I spoke to our IS department but finding the right people was challenging – a practical issue to work around. So I decided to look a bit more broadly with a mapping of what we do… looking across our whole technology position. I identified the areas and what fitted into them:

  • (e) Assessment and feedback – Turnitin – we see a lot of interest in rubrics and in marking and feedback processes that seem to be having a big impact on student success (and actually, the more you use it, the less plagiarism detection is its main usefulness), Grade Centre, wikis/blogs/journals, peer/self assessment, (e)feedback.
  • (e) Portfolios – wikis/blogs/journals, video/audio – doing trials with nursing students of a mobile app in this space.
  • Collaboration – discussion boards, online chat, video conferencing etc.
  • Content – lectures, PDFs, etc….

I’ve been quite interested in Mark Stubbs’s idea of a “core VLE”. Our main systems group around SRS (student records system – newly renamed from its former name, ISIS), GCU Learn, the Library, and 3rd party services. When I did hear from our IS team I found such a huge range of tools that our institution has been using – it seems like every tool under the sun has been used at some point.

In terms of data… we can get data from our VLE, from Turnitin, from wikis etc. But it needs a lot of cleaning up. We started looking at our data, trying it on November data from 2012 and 2013 (November seemed like a typical month). And we found some data we would expect – changes/increases of use over time. But we don’t have data at a module level, or programme level, etc. It is hard to view in detail or aggregate up (yet). We haven’t got data from all of our systems yet. I would say we are still at the “housekeeping” stage… We are just seeing what we have, finding a baseline… There is an awful lot of housekeeping that needs to be done, a lot of people to talk to…

But as I was beginning this process I realised we had quite a number of business analysts at GCU who were happy to talk, and we have been drawing out data. We can make dashboards easily, but USEFUL dashboards are proving more tricky! We have meanwhile been talking to Blackboard about their data analytics platform. It is interesting thinking about that… given where we are with learning analytics, and finding a baseline, we are looking at investing some money to see what data we can get from Blackboard that might enable us to start asking some questions. There are some things I’d like to see, for example from combining on-campus library card data with VLE data. And also thinking about engagement and what that means… Frustratingly for me, it is quite hard to get data out of Blackboard… I’m keen that the next licence we sign actually has a clause about the data we want, in the format we want, when we want it… No idea if that will happen, but I’d like to see it.
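
The kind of join described here might look like this in miniature – a sketch with invented student IDs and counts; real exports would need all the cleaning described above:

```python
# Combine two hypothetical per-student counts: VLE logins and library
# card swipes. A student missing from one source gets a zero, which is
# itself informative (e.g. active on campus but invisible in the VLE).
vle_logins = {"s1": 42, "s2": 7, "s3": 0}
library_visits = {"s1": 10, "s3": 25}

combined = {
    sid: {"vle": vle_logins.get(sid, 0), "library": library_visits.get(sid, 0)}
    for sid in sorted(set(vle_logins) | set(library_visits))
}
print(combined["s3"])  # → {'vle': 0, 'library': 25}
```

The hard part in practice is not the join but getting the exports in a usable, consistent format in the first place – exactly the licensing point made above.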

Mark Stubbs (MMU) has this idea of a tube map of learning… It made me think of the Glasgow Subway map – going in circles a bit, not all joining up. We really aren’t quite there yet; we are having conversations about what we could, and what we should, do. In terms of senior management interest in learning analytics… there is interest. And when we sent out the data we had looked at, we did get some interesting responses. Our data showed a huge increase in mobile use – we didn’t need a bring your own device policy, students were already doing it! We just need everything mobile ready. Our senior staff are focused on NSS and student survey data; that’s a major focus. I would like to take that forward to understand what is happening, in a more structured way…

And I want to finish by talking about some of the issues that I have encountered. I came in fairly naively to this process. I have learned that…

Leadership and understanding are crucial – we have a new IS director, which should make a great difference. You need both carrots and sticks, and that takes a real drive from the top to make things actually start.

Data is obviously important. Our own colleagues have issues accessing data from across the institution. People don’t want to share, or they don’t know if they are allowed to share. There is a cultural thing that needs investigating – and that relates back to leadership. There are also challenges that are easy to fix, such as server space. But the bigger issues of access/sharing/ownership all really matter.

Practice can be a challenge. Sharing of experience, engagement with staff, and having enough people who understand the systems are all important for enabling learning analytics here. The culture of talking together more, having better relationships within an institution, matters.

Specialist staff time matters – as Dragan highlighted in his talk. This work has to be prioritised – a project focusing on learning analytics would give the remit for that, give us a clear picture, and establish what needs to be done: to not just buy in technology but truly assess needs before adopting it. There is potential, but learning analytics has to be a priority if it is to be adopted properly.

Institutional amnesia – people can forget what they have done, why, and why they did not do it before… More basic housekeeping again, really. Understanding, and having tangible evidence of, what has been done and why is also important more broadly when looking at how we use technologies in our institutions.

Niall: Thanks for such an honest appraisal of a real experience there. We need that in this community, not just explaining the benefits of learning analytics. The Open University may be ahead now, but it also faced some of those challenges initially for instance. Now, over to Wilma.

Student data and Analytics work at the University of Edinburgh – Wilma Alexander, University of Edinburgh

Some really interesting talks already today – I’ll whiz through some sections as I don’t need to retread that ground. I am based in Information Services. We are a very, very large, very old university, and a very general one. We have a four year degree. All of that background makes it hard to generalise about what we do with student data.

So, the drivers for the project I will focus on came out of the understanding we already have about the scale and diversity of this institution. Indeed we are increasingly encouraging students to make imaginative crossovers between schools and programmes, which adds to this. Another part of the background is that we have been seriously working in online education: in addition to a ground-breaking digital education masters delivered online, we also have a number of online masters. And further background here is that we have a long-term set of processes that encourage students to contribute to the discussions within the university, as owners and shapers of their own learning.

So, we have an interest in learning analytics, and in understanding what students are doing online. We got all excited by the data, and probably made the primary error of thinking about how we could visualise that data in pretty pictures… but we calmed down quite quickly. As we turned this into a proper project we framed it much more in the context of empowering students around their activities, using data we already have about our students. We have two centrally supported VLEs at Edinburgh (and others!): Blackboard Learn, our main and largest system, which virtually all on-campus programmes use in some way; and Moodle, which we took the opportunity to try out for online distance programmes, largely created as online distance masters programmes. So, already there is a big difference in how these tools are used in the university, never mind how they are adopted.

There’s a video which shows this idea of building an airplane whilst in the air… this project’s first phase, in 2014, has felt a bit like that at times! We wanted to see what might be possible, but we also started by thinking about what might be displayed to students. Both Learn and Moodle give you some data about what students do in your courses… but that is for staff, not visible to students. When we came to looking at the existing options… none of what Learn offers quite did what we wanted, as none of the reports were easily made student facing (currently Learn does BIRT reports, course reports, stats columns in Grade Center etc.). We also looked at Moodle and there was more there – it is open source and developed by the community, so we looked at available options there…

We were also aware that things were taking place elsewhere in Edinburgh. Our role is support, not research, but we were aware that colleagues were undertaking research. So, for instance, my colleague Paula Smith was using a tool to return data as visualisations to students.

What we did as a starting point was to go out and collect user stories. We were asking both staff and students, in terms of information available in the VLE(s), what sort of things would be of interest. We framed this as a student, as a member of staff, as a tutor… as “As a… I want to… So that I can…”. We had 92 stories from 18 staff and 32 students. What was interesting here was that much of what was wanted was already available: for much of the data staff wanted, they really just had to be shown and supported to find what was already available to them. Some of the stuff that came in was “not in scope” – not within the very tight boundaries we had set for the project. But there were a number of things of interest – requests for information – that we passed on to appropriate colleagues; one area for this was reading lists, and we have a tool that helps with that, so we passed that request on to library colleagues.

We also pooled some staff concerns… and this illustrates what both Dragan and Sheila have said about the need to improve the literacy of staff and students using this kind of information, and the need to contextualise it. E.g.: “As a teacher/personal tutor I want to have measures of activity of the students so that I can be alerted to who is ‘falling by the wayside’” – there is a huge gap between activity data and that sort of indicator.

Student concerns were very thoughtful. They wanted to understand how they compare, to track progress, they also wanted information on timetables of submissions, assignment criteria/weighting etc. We were very impressed by the responses we had and these are proving valuable beyond the scope of this project…

So, we explored possibilities, and then moved on to see what we could build. And this is where the difference between Learn and Moodle really kicked in. We initially thought we could just install some of the Moodle plugins, and allow programmes to activate them if they wanted to… But that fell at the first hurdle, as we couldn’t find enough staff willing to be that experimental with a busy online MSc programme. The only team up for some of that experimentation were the MSc in Digital Education team, where it was done as part of a teaching module in some strands of the masters. This was small scale, hand-cranked from some of these tools. One of the issues with pretty much all of these tools is that they are staff facing and therefore not anonymous. So we had to do that hand-cranking to make the data anonymous.

We had lots of anecdotal and qualitative information through focus groups and this module, but we hope to pin a bit more down on that. Moodle is of interest because, for online distance students, there is some evidence that communication, discussion etc. activity is a reasonable proxy for performance, as they have to start with the VLE.

Learn is a pretty different beast, as it is on campus – blended learning may not have permeated as strongly there. So, for Learn we have this little element that produces a click map of sorts (engagements, discussion, etc.)… For courses that only use the VLE for lecture notes, that may not be useful at all, but for others it should give some idea of what is taking place. We also looked at providing guidebook data – mapping use of different weeks’ sections/resources/quizzes to performance.

We punted those ideas out. The activity information didn’t excite folk as much (32% thought it was useful). The grade information was deemed much more useful (97% thought it was useful)… But do we want our students hooked on that sort of data? Could it have negative effects, as Dragan talked about? And how useful is that overview?

When it came to changes in learning behaviour we had some really interesting and thoughtful responses. Of the three types of information (discussion boards, grades, activity) it was certainly clear that grades were where the student interest lay.

We have been looking at what courses use in terms of tools… taking a very broad-brush view of 2013/14 courses we can see what they use and turn on. For social/peer network tools – where we think there really is huge value – the percentage of courses actively using them on campus is way below the percentage using the VLE for the other functions of Content+Submission/Assessment and Discussion Boards.

So context really is all – reflecting Dragan again here. It has to work for individuals at a course level. We have been mapping our territory here – the university as a whole is hugely engaged in online and digital education in general, and very committed to this area, but there is work to do to join it all up. When we did information gathering we found people coming out of the woodwork to show their interest. The steering group for this project has a representative from our student systems team, and we are talking about where student data lives, privacy and data protection, ethics, and of course also technical issues quite apart from all that… So we also have the Records Management people involved. And because Jisc has these initiatives, and there is an EU initiative, we are engaging closely with the ethical guidance being produced by both.

So, we have taken a slight veer away from doing something for everyone in the VLEs in the next year. The tool will be available to all, but what we hope to do is work very closely with a small number of courses, course organisers, and students, to really unpick at a course level how the data in the VLE gets built into the rest of the course activity. So that goes back to the idea of having different models, and applying the right model for that course, and for those students. It has been a journey, and it will continue…

Using learning analytics to identify ‘at-risk’ students within eight weeks of starting university: problems and opportunities – Avril Dewar, University of Edinburgh

This work I will be presenting has been undertaken with my colleagues at the Centre for Medical Education, as well as colleagues in the School of Veterinary Medicine and also Maths.

There is good evidence that performance in first year maps quite closely to performance in a programme as a whole. So, with that in mind, we wanted to develop an early warning system to identify student difficulties and disengagement before they reach assessment. Generally the model we developed worked well: about 80% of at-risk students were identified. And there were large differences between the most and least at-risk students – between the lowest and the highest risk scores – which suggests this was a useful measure.

The measures we used included:

  • Engagement with routine tasks
  • Completion of formative assessment – including voluntary formative assessment
  • Tutorial attendance (and punctuality where available) – but this proved least useful.
  • Attendance at voluntary events/activities
  • Virtual Learning Environment (VLE) exports (some)
    • Time until first contact proved to be the most useful of these
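
A cumulative score over measures like these might be sketched as follows – the weights and threshold are invented for illustration, not the values from this study:

```python
# Illustrative weighted risk score over per-student measures.
# Weight names, values and the threshold are hypothetical.

WEIGHTS = {
    "missed_routine_tasks": 2.0,
    "skipped_formative_assessments": 1.5,
    "tutorial_absences": 0.5,           # least useful measure in the study above
    "days_to_first_vle_contact": 1.0,   # most useful of the VLE exports
}

def risk_score(student):
    """Sum weighted counts; measures absent from the record count as zero."""
    return sum(WEIGHTS[k] * student.get(k, 0) for k in WEIGHTS)

def at_risk(student, threshold=10.0):
    return risk_score(student) >= threshold
```

A cumulative model like this is also why the team was wary of telling students exactly which measures flagged them: optimising one input can move a student below the threshold without changing the underlying disengagement.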

We found that the measures sometimes failed because the data exports were not always useful or appropriate (e.g. VLE tables of 5,000 columns). Patterns of usage were hard to investigate, as raw data on, e.g., time of day of accesses was not properly usable, though we think it would be useful. Similarly there is no way to know if long usage means a student has logged in, then Googled or left their machine, then returned – or whether it indicates genuine engagement.
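
One generic way round the “logged in then wandered off” problem (a common heuristic, not something the speakers prescribed) is to sessionise clicks with an idle timeout, so long gaps are not counted as engagement:

```python
# Estimate active minutes from raw click timestamps (seconds): sum the
# gaps between consecutive clicks, ignoring any gap longer than the idle
# timeout. The 30-minute cutoff is an assumed, conventional heuristic.

def active_minutes(timestamps, idle_timeout=30 * 60):
    """Return estimated minutes of activity from a list of click times."""
    ts = sorted(timestamps)
    total = 0
    for a, b in zip(ts, ts[1:]):
        gap = b - a
        if gap <= idle_timeout:
            total += gap
    return total / 60

# Clicks at 0s, 10min, 20min, then again at 2h: the 100-minute gap is
# discarded, leaving 20 active minutes rather than two hours "online".
print(active_minutes([0, 600, 1200, 7200]))  # → 20.0
```

This still cannot distinguish reading from an open-but-ignored tab, which is exactly the limitation described above.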

To make learning analytics useful we think we need the measures, and the data supporting them, to be:

  • simple and comprehensible;
  • accessible, and comparable to data from other systems (e.g. we could have used library data alongside our VLE data);
  • easy to scale – e.g. common characteristics between schools;
  • not replicating existing measures;
  • able to discriminate between students – some of the most useful things, like time to first contact, do this;
  • centrally stored.

We also found there were things that we could access but didn’t use, some for ethical and some for practical reasons. IP addresses for location were an ethical issue for us; discussion boards similarly gave us concern – we didn’t want students to be put off participating in discussions. Likewise time taken to answer individual questions. Theoretical issues that could be raised include: evidence that a student has been searching essay-buying websites; a student absent from class claiming to be ill while their IP address shows another location; etc.

There were also some concerns about the teacher-student relationship. Knowing too much can create a tension in the student-teacher relationship. And the data one could gather about a student could become a very detailed tracking and monitoring system… for that reason we always aim to be conservative, rather than exhaustive in our data acquisition.

We have developed training materials and we are making these open source so that we can partner with other schools, internationally. Whilst each school will have its own systems and data, we are keen to share practice and approaches. Please do get in touch if you would like access to these, or would like to work with us.

Q&A

Q – Paula) Do you think there is a risk of institutions sleepwalking into student dissatisfaction? We are taking a staged approach… but I see less effort going into intervention, into the staff side of what could be done… I take it that email was automated… Scalability is good for that, but I am concerned students won’t respond to it as it isn’t really personalised at all. And how were students in your project notified, Avril?

A – Avril) We did introduce peer-led workshops… We are not sure if that worked yet – still waiting for results. We emailed to ask our students if they wanted to be part of this and if they wanted to be notified of a problem. Later-year students were less concerned and saw the value; first year students were very concerned, so we phrased our email very carefully. When a student was at risk, emails were sent individually by their personal tutors. We were a bit wary of telling students what had flagged them up – it was a cumulative model… we were concerned that they might then engage just with those things and then not be picked up by the model.

Niall: Thank you for that fascinating talk. Have you written it up anywhere yet?

Avril: Soon!

Niall: And now to Wilbert…

The feedback hub; where qualitative learning support meets learning analytics – Wilbert Kraan, Cetis

Before I start: I heard earlier about some students gaming some of the simpler dashboards, so I was really interested in that.

So, I will be short and snappy here. The Feedback Hub work has just started… this is musings and questions at this stage. This work is part of the larger Jisc Electronic Management of Assessment (EMA) project, and we are looking at how we might present feedback and learning analytics side by side.

The EMA project is a partnership between Jisc, UCISA and HeLF. It builds on earlier Jisc Assessment and Feedback work, and it is a co-design project that identifies priorities and solution areas… and we are now working on solutions. So one part of this is about EMA requirements and workflows, particularly the integration of data (something Sheila touched upon). There is also work taking place on an EMA toolkit that people can pick up and look at. And then there is the Feedback Hub, which I’m working on.

So, there is a whole assessment and feedback lifecycle (borrowed, with their permission, from a model developed by Manchester Metropolitan). This goes from Specifying to Setting, Supporting, Submitting, Marking and production of feedback, Recording of grades etc… and those latter stages are where the Feedback Hub sits.

So, what is a feedback hub really? It is a system that provides a degree-programme or life-wide view of assignments and feedback. The idea is that it moves beyond the current module you are doing, to look across modules and across years. There will be feedback that is common across areas, giving a holistic view of what has already been done. So this is a new kind of thing… When I looked at the nearest existing tools I found VLE features – a database view of all assignments for a particular student, for learner and tutor to see: a simple clickable list that is easy to do and does help. Another type is a tutoring or assignment management system – capturing timetables of assignments, tutorials etc. These are from the tutor’s perspective; some show feedback as well. And then we have assignment services – including Turnitin – about plagiarism, but also the management of the logistics of assignments, feedback etc.

So, using those kinds of tools you can see feedback as just another thing that goes into the learning records store pot. But feedback can be quite messy – in-line feedback can be hard to disentangle from the document itself, and teachers approach feedback differently… though pedagogically the qualitative formative feedback that appears in these messy ways can be hugely valuable. These online assessment management tools can also be helpful for mapping and developing learning outcomes and rubrics – connecting those to the assignment can yield some really interesting data. There is also the potential for Computer Aided Assessment feedback – sophisticated automated data on tests and assignments, which works well in some subjects. And possibly the most interesting learning analytics data is on engagement with feedback. A concern from academic staff is that you can give rich feedback, but if the students don’t use it, how useful is it really? So capturing that could be useful…

So, having identified those sources, how do we present such a holistic view? One tool presents this as an activity stream – like Twitter and Facebook – with feedback part of a chronological list of assignments… We know that could help. There is also an expanding learning outcomes rubric – click it to see the feedback connected to it; would that be helpful? We could also do text extraction, something like Wordle, but would that help? Another thing we might see is clickable grades – to understand what a grade means… And finally, should we combine the feedback hub with analytics data visualisations?
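One of the presentation options above – a chronological activity stream of assignments and feedback across modules – could be sketched roughly like this (the field names and sample entries are invented for illustration, not from any Jisc system):

```python
# Merge feedback and submission events from different modules into one
# newest-first "activity stream", as a feedback hub might display them.
from datetime import date

items = [
    {"when": date(2015, 3, 2), "module": "HIST101", "kind": "feedback",
     "text": "Strong argument; cite primary sources."},
    {"when": date(2015, 1, 20), "module": "HIST101", "kind": "submission",
     "text": "Essay 1 submitted."},
    {"when": date(2015, 4, 11), "module": "PHIL200", "kind": "feedback",
     "text": "Define terms before using them."},
]

# Sort descending by date so the most recent item appears first.
stream = sorted(items, key=lambda i: i["when"], reverse=True)
```

The point of the holistic view is precisely this merge step: items from different modules and years end up interleaved in one list rather than siloed per course.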

Both learning analytics and feedback track learning progress over time, and try to predict the future. Feedback related data can be a useful learning analytics data source.

Q&A

Q – Me) Adoption and issues of different courses doing different things? Student expectations and added feedback?

A) This is an emerging area… IET in London/University of London have been trialing this stuff… they have opened that box… Academic practice can make people very cautious…

Comment) Might also address the perennial student request for better quality feedback… and might address a deficit in student satisfaction.

A) Having a coordinated approach to feedback… From a pedagogical point of view that would help. But another issue there is that of formative feedback, people use these tools in formative ways as well. There are points of feedback before a submission that could be very valuable, but the workload is quite spectacular as well. So balancing that could be quite an interesting thing.

Jisc work on Analytics – update on progress to date – Paul Bailey, Jisc, and Niall Sclater

Paul: We are going to give you a bit of an update on where we are on the Learning Analytics project, and then after that we’ll have some short talks and then will break out into smaller groups to digest what we’ve talked about today.

The priorities we have for this project are: (1) basic learning analytics solution, an interventions tool and a student tool; (2) code of practice for learning analytics; and (3) learning analytics support and network.

We are a two-year project, with the clock ticking from May 2015. We have started by identifying suppliers to initiate contracts and develop products; then institutions will be invited to participate in the discovery stage or pilots (June–Sept 2015). In Year 1 (Sept 2015–2016) we will run that discovery stage (10-20 institutions) and pilots (10+ institutions), with institutions moving from discovery to pilot. Year 2 will be about learning from and embedding that work. And for those of you who have worked with us in the past, the model is a bit different: rather than funding you and then learning from that, we will be providing you with support and some consultancy, and learning from this as you go (rather than funding).

Michael Webb: So… we have a diagram of the process here… We have procured a learning records warehouse (the preferred supplier there is HT2). The idea is that VLEs, Student Information Systems and Library Systems feed into that. There was talk today of Blackboard being hard to get data out of; we do have Blackboard on board.

Diagram of the Jisc Basic Learning Analytics Solution presented by Paul Bailey and Michael Webb


Paul: Tribal are one of the solutions – pretty much off-the-shelf stuff, with various components – and we hope to roll it out to about 15 institutions in the first year. The second option will be the open solution, which is partly developed but needs further work. So the option will be to engage with either one of those solutions, or perhaps with both.

The learning analytics processors will feed the staff dashboards, into a student consent service, and both of those will connect to the alert and intervention system. And there will be a Student App as well.

Michael: The idea is that all of the components are independent so you can buy one, or all of them, or the relevant parts of the service for you.

Paul: The student consent service is something we will develop in order to provide some sort of service to allow students to say what kinds of information can or cannot be shared (of available data from those systems that hold data on them). The alert and intervention system is an area that should grow quite a bit…

So, the main components are the learning records warehouse, the learning analytics processor – for procurement purposes the staff dashboard is part of that, and the student app. And once you have that learning records warehouse is there, you could build onto that, use your own system, use Tableau, etc.

Just to talk about the Discovery Phase, we hope to start that quite soon. The invitation will come out through the Jisc Analytics email list – so if you want to be involved, join that list. We are also setting up a questionnaire to collect readiness information and for institutions to express interest. Then in the discovery process (June/July onward) a preferred approach for the discovery phase will be selected. This will be open to around 20 institutions. We have three organisations involved here: Blackboard; a company called DTP Solution Path (as used by Nottingham Trent); and UniCom. For the pilot (September(ish) onward) institutions will select a solution preference (Year 1: 15 proprietary (Tribal) and 15 open).

Niall: the code of practice is now a document of just over two pages around complex legal and ethical issues. These can be blockers, so this is an attempt to have an overview document to help institutions overcome those issues. We have a number of institutions who will be trialling this. It is at draft stage right now, with an advisory group suggesting revisions, and is likely to be launched by Jisc in June. Any additional issues are being reflected in a related set of online guidance documents.

The Effective Learning Analytics project can be found at: http://www.jisc.ac.uk/rd/projects/

There is another network event on 24th June at Nottingham Trent University. At that meeting we are looking to fund some small research-type projects – there is an IdeaScale page for that, with about five ideas in the mix at the moment. Do add ideas (between now and Christmas) and do vote on those. There will be pitches there for the ones to take forward. And if you want funding to go to you as a sole trader rather than to a large institution, that can also happen.

Q&A

Q) Will the open solution be shared on something like GitHub so that people can join in?

A) Yes.

Comment – Michael: Earlier today people talked about data that is already available; that’s covered in the discovery phase, when people will be on site for a day or up to a week in some cases. Also earlier on there was talk about data tracking, IP addresses etc., and the student consent system we have included is to get student buy-in for that process, so that you are legally covered for what you do as well. And there is a lot of focus on flagging issues, and intervention – the intervention tool is a really important part of this process, as you’ll have seen from our diagram.

For more information on the project see: http://analytics.jiscinvolve.org/wp/

Open Forum – input from participants, 15 min lightning talks.

Assessment and Learning Analytics – Prof Blazenka Divjak, University of Zagreb (currently visiting University of Edinburgh)

I have a background in work with a student body of 80,000 students, and use of learning analytics. And the main challenge I have found has been the management and cleansing of data. If you want to make decisions, learning analytics are not always suitable/in an appropriate state for this sort of use.

But I want to talk today about assessment. What underpins effective teaching? It relates to the subject, the teaching methods, the way in which students develop and learn (Calderhead, 1996), and awareness of the relationship between teaching and learning. Assessment is part of understanding that.

So I will talk to two case studies across courses using the same blended approach with open source tools (Moodle and connected tools).

One of these examples is Discrete Math with Graph Theory, a course on the Master of Informatics programme with around 120 students and 3 teachers. This uses authentic problem posing and problem solving. We have assessment criteria and weighted rubrics (the AHP method). Here learning analytics are used for identification of performance based on criteria. We also look at differences between groups (gender, previous study, etc.), and at the correlation of authentic problem solving with other elements of assessment – hugely important for future professional careers but not always what happens.

The other programme, Project Management for the Master of Entrepreneurship programme, has 60 students and 4 teachers. In this case project teams work on authentic tasks. Assessment criteria + weighted rubrics – integrated feedback. The course uses self-assessment, peer-assessment, and teacher assessment. Here the learning analytics are being used to assess consistency, validity, reliability of peer-assessment. Metrics here can include the geometry of learning analytics perhaps.

Looking at a graphic analysis of one of these courses shows how students are performing against criteria – for instance they are better at solving problems than posing problems. Students can also benchmark themselves against the group, and compare how they are doing.
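The weighted-rubric scoring underlying both courses might look something like this in miniature (the criteria, weights and marks below are invented; in the Zagreb courses the weights are derived by the AHP pairwise-comparison method rather than chosen by hand):

```python
# Combine per-criterion marks into one overall mark using criterion
# weights, as a weighted rubric does.
def weighted_mark(marks, weights):
    """Weighted average of per-criterion marks; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(marks[c] * w for c, w in weights.items())

# Hypothetical weights and one student's per-criterion marks.
weights = {"problem_posing": 0.4, "problem_solving": 0.6}
student = {"problem_posing": 55, "problem_solving": 75}

mark = weighted_mark(student, weights)  # 0.4*55 + 0.6*75 = 67.0
```

Keeping the per-criterion marks around (rather than only the final mark) is what makes the criterion-level analytics possible, such as the posing-vs-solving comparison shown in the graphic analysis.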

The impact of student dashboards – Paula Smith, UoE

I’m going to talk to you about an online surgery course – the theory not the practical side of surgery I might add. The MSc in Surgical Sciences has been running since 2007 and is the largest of the medical distance learning programmes.

The concept of learning analytics may be relatively new but we have been interested in student engagement and participation, and how that can be tracked and acknowledged for a long time as it is part of what motivates students to engage. So I am going to be talking about how we use learning analytics to make an intervention but also to talk about action analytics – to make changes as well as interventions.

The process before the project I will talk about had students being tracked via an MCQ system – students would see a progress bar but staff could see more details. At the end of every year we would gather that data, and present a comparative picture so that students could see how they were performing compared to peers.

Our programmes all use bespoke platforms, and that meant we could work with the developers to design measures of student engagement – for example number of posts, a crude way to motivate students. That team also created activity patterns so we could understand the busier times – and it is a 24/7 programme; all of our students work full time in surgical teams, so this course is an add-on to that. We never felt a need to make this view available to students… it is a measure of activity, but how does that relate to learning? We need more tangible metrics.

So, in March last year I started a one-day-a-week secondment for a while with Wilma Alexander and Mark Wetton at IS. That secondment had the objectives of creating a student “dashboard” which would allow students to monitor their progress in relation to peers; using the dashboard to identify at-risk students for early interventions; and then evaluating what (if any) impact that intervention had.

So, we did see a correlation between in-course assessment and examination marks. The exam is 75-80% (was 80, now 75) of the first year – a heavily weighted component. You can do well in the exam, and get a distinction, with no in-course work during the year; the in-course work is not compulsory, but we want students to see the advantage of in-course assessments. For the predictive modelling, regression analysis revealed that only two components had any bearing on end-of-year marks: discussion board ratings and exam performance (year 1), or exam performance alone (year 2). So, with that in mind, we moved away from predictive models and decided to build a dashboard for students, presenting a snapshot of their progress against others’. And we wanted this to be simple to understand…
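The kind of check described here – regressing end-of-year marks on each in-course component to see which ones actually predict performance – can be sketched with a simple least-squares fit (the data below are made up for illustration, not the MSc’s actual marks):

```python
# Ordinary least squares for one predictor, plus the correlation
# coefficient, implemented from the standard closed-form formulas.
def linregress(xs, ys):
    """Return (slope, intercept, r) for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx ** 0.5 * syy ** 0.5)
    return slope, intercept, r

# Hypothetical data: discussion-board ratings vs final exam marks.
boards = [40, 55, 60, 70, 80]
exams = [50, 58, 62, 71, 79]
slope, intercept, r = linregress(boards, exams)
```

A component with an r near zero would be dropped from the model; in the project described, only discussion board ratings and prior exam performance survived this kind of screening.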

So, here it is… we are using Tableau to generate this. Here the individual student can see their own performance in yellow/orange and compare to the wider group (blue). The average is used to give a marker… If the average is good (in this example an essay has an average mark of 65%) that’s fine, if the average is poor (discussion board which are low weighted has an average of under 40, which is a fail at MSc level) that may be more problematic. So that data is provided with caveats.

Paula Smith shows visualisations created using Tableau


This interface has been released – although my intervention is just an email which points to the dashboard and comments on performance. We have started evaluating it: the majority think it is helpful (either somewhat or a lot), but worryingly a few have commented “no, unhelpful”, and we don’t know the reasons for that. We have had positive comments on the whole, and we asked about extra material for one part of the course. We also asked students how the data makes them feel… although the majority answered ‘interested’, ‘encouraged’ and ‘motivated’, one commented that they were apathetic about it – and actually we only had a 15% response rate for this survey, which suggests that apathy is widely felt.

Most students felt the dashboard provided feedback, which was useful. And the majority of students felt they would use the dashboard – mainly monthly or thereabouts.

I will be looking further at the data on student achievement and evaluating it over this summer, and should be written up at the end of the year. But I wanted to close with a quote from Li Yuan, at CETIS: “data, by itself, does not mean anything and it depends on human interpretation and intervention“.

Learning Analytics – Daley Davis, Altis Consulting (London) 

We are a consulting company, well established in Australia, so I thought it would be relevant to talk about what we do there on learning analytics. Australia is ahead on learning analytics, and that may well be because they brought in changes to funding and fees in 2006, so they view students differently; they are particularly focused on retention. I will talk about work we did with UNE (University of New England), a university with mainly online students and around 20,000 students in total. They wanted to reduce student attrition, so we worked with them to set up a student early alert system for identifying students at risk of disengaging. It used triggers of student interaction as predictors. This work cut attrition from 18% to 12%, saving time and money for the organisation.

The way this worked was that students had an automated “wellness” engine, with data aggregated at school and head of school levels. And what happened was that staff were ringing students every day – finding out about problems with internet connections, issues at home etc. Some of these easily fixed or understood.

The system picked up data from their student record system, their student portal, and they also have a system called “e-motion” which asks students to indicate how they are feeling every day – four ratings and also a free text box (that we also mined).

Data was mined with weightings: a student who had previously failed a course, or a student who was very unhappy, were both weighted much more heavily. So were students not engaging for 40 days or more (versus shorter periods of inactivity, which were weighted more lightly).

Daley Davis shows the weightings used in a Student Early Alert System at UNE

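A toy version of the weighted trigger scoring described above might look like this (the trigger names, weights and threshold are invented for illustration, not UNE’s actual model):

```python
# Hypothetical trigger weights: heavier weights for a previously failed
# course, a "very unhappy" mood rating, or 40+ days of inactivity.
WEIGHTS = {
    "failed_course_before": 5.0,
    "mood_very_unhappy": 4.0,
    "inactive_40_days": 4.0,
    "inactive_14_days": 1.5,
    "missed_submission": 2.0,
}

def risk_score(triggers):
    """Sum the weights of the triggers observed for one student."""
    return sum(WEIGHTS[t] for t in triggers)

def at_risk(triggers, threshold=5.0):
    """Flag the student for follow-up when the score crosses a threshold."""
    return risk_score(triggers) >= threshold

# e.g. a student who failed before and has gone quiet for two weeks:
score = risk_score(["failed_course_before", "inactive_14_days"])
```

In the UNE system the flagged students then fed a human workflow – staff ringing students daily – rather than any automated action, which is a large part of why it worked.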

Universities are looking at what they already have and coming up with a technical roadmap. But they need to start with the questions they want to answer… What do your students want? What are your KPIs? And how can you measure those KPIs? So, if you are embarking on this process I would start with a plan for 3 years, towards your perfect situation, so you can then make your 1-year or shorter-term plans in the direction of making that happen…

Niall: What I want you to do just now is to discuss the burning issues… and come up with a top three…

And [after coffee and KitKats] we are back to share our burning issues from all groups…

Group 1:

  • Making sure we start with questions first – don’t start with framework
  • Data protection and when you should seek consent
  • When to intervene – triage

Group 2:

  • How to decide which questions to focus on – and what questions and data are important anyway?
  • Implementing analytics – institutional versus course level analytics? Both have strengths, both have risks/issues
  • And what metrics do you use, what are reliable…

Group 3:

  • Institutional readiness for making use of data
  • Staff readiness for making use of data
  • Making meaning from analytics… and how do we support and improve learning without always working on the basis of a deficit model.

Group 4:

  • Different issues for different cohorts – humanities versus medics in terms of aspirations and what they consider appropriate, e.g. for peer reviews. And undergrads/younger students versus say online distance postgrads in their careers already
  • Social media – ethics of using Facebook etc. in learning analytics, and issue of other discussions beyond institution
  • Can’t not interpret data just because there’s an issue you don’t want to deal with.

Group 5:

  • Using learning analytics at either end of the lifecycle
  • Ethics is a big problem – analytics might be used to recruit likely-successful people, or to stream students/incentivise them into certain courses (both already happening in the US)
  • Lack of sponsorship from senior management
  • Essex found through its last three student surveys that students do want analytics.

That issue of recruitment is a real ethical issue. This is something that arises at the Open University: they have an open access policy, so to deny entrance because of likely drop-out or likely performance would be an issue there… How did you resolve that?

Kevin, OU) We haven’t exactly cracked it. We are mainly using learning analytics to channel students into the right path for them – which may be about helping select the first courses to take, or whether to start with one of our open courses on Future Learn, etc.

Niall: Most universities already have entrance qualifications… A-Level or Higher or whatever… ethically, how does that work?

Kevin, OU) I understand that a lot of learning analytics is being applied in UCAS processes… they can assess the markers of success etc..

Comment, Wilma) I think the thing about learning analytics is that predictive models can’t ethically be applied to an individual…

Comment, Avril) But then there is also quite a lot of evidence that entry grades don’t necessarily predict performance.

Conclusions from the day and timetable for future events – Niall Sclater

Our next meeting will be in June in Nottingham and I hope we’ll see you then. We’ll have a speaker, via Skype, who works on learning analytics for Blackboard.

And with that, we are done with a really interesting day.

Jun 19 2013
 

Today EDINA is hosting a talk by Martin Hawksey on data visualisation. He has posted a whole blog post on this, which includes his slides, so I won’t be blogging verbatim but hoping to catch key aspects of his talk.

Martin will be talking about achievable and effective ways to visualise data. He’s starting with John Snow’s 1850s map of cholera deaths, identifying the epicentre of the outbreak through mapping deaths. And on an information literacy note, you do need to know how to find the story in the graphics. Visualisation takes data, takes stories, and turns them into something of a narrative, explaining and enabling others to explore that data.

Robin Wilton georeferenced that original Snow data, then Simon Rogers (formerly of the Guardian, latterly of Twitter) put the data into CartoDB. This reinterpretation of the data really makes the infected pump jump out at you; the different ways of visualising that data make the story even clearer.

Not all visualisations work; you may need narration. Graphics may not be meaningful to all people in the same way – e.g. the location of the pumps on these two maps. So this is where we get into theory. Jacques Bertin, a French cartographer, came up with his own system of points, lines, symbols etc. – not based on research, his own cheat system. If you look at Gestalt psychology you get more research-based visualisations – laws of similarity, proximity, continuity. There is something natural about where the eye is drawn, but there is theory behind that too.

John Snow’s map was about explaining and investigating the data. His map was an explanatory visualisation, and we have that same idea in Simon Rogers’ map, but it is also an exploratory visualisation: the reader/viewer can interact with and interrogate it. But there are limitations to both approaches. Both maps are essentially heat maps – more of something (in this case deaths) – and in visualisation you often get heat maps that actually map population rather than trends. Tony Hirst says “all charts are lies”. They are always an interpretation of the data from the creator’s point of view…

So going back to Simon Rogers’ map, we see that the radius of the dots is based on the number of deaths. Note from the crowd: “how to lie with statistics”. Yes, a real issue is that a lot of the work to get to that map is hidden – lots of room for error and confusion.

So, having flagged up some examples and pitfalls, I want to move on to the process of making data visualisations. Tools include Excel, CartoDB, Gephi, IBM Many Eyes, etc., but in addition to those tools and services you can also draw. Even now many visualisations are made via drawing, if only for final tweaking, and sometimes a sketch of a visualisation is the way to prototype ideas too. There are also code options: D3.js, Sigma.js, R with ggplot, etc.

Some issues around data: data access can be a problem – data can be hard to find, source data hard to identify, etc. Tony Hirst really recommends digging around for feeds, for RSS – find the stuff that feeds and powers pages. There are tools for reshaping feeds and data, places like Yahoo Pipes, which lets you do drag-and-drop programming with input data. And I’ve started touching upon data shapes: data may be provided in certain ways or shapes, but those may not suit your use. So a core skill is the transformation of data to reshape it, with tools like Yahoo Pipes or Open Refine – which also lets you clean up data. I’ve tried Open Refine with public Jiscmail lists, to normalise for those with multiple user names.
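The user-name normalisation mentioned here is essentially what Open Refine calls key-collision clustering. A minimal sketch of the idea (not Open Refine’s actual implementation, and the names are invented examples): lowercase each name, strip punctuation, sort the tokens, and group names that produce the same key.

```python
import re

def fingerprint(name):
    """A crude key-collision fingerprint: lowercase, strip punctuation,
    then sort the remaining tokens so word order doesn't matter."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(sorted(tokens))

# Three variant spellings of one (hypothetical) list member:
names = ["Martin Hawksey", "hawksey, martin", "MARTIN  HAWKSEY"]

groups = {}
for n in names:
    groups.setdefault(fingerprint(n), []).append(n)
# All three variants collapse under the same key, "hawksey martin".
```

Grouping by fingerprint like this is what lets you spot that several mailing-list identities are probably the same person before counting their posts.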

So now the fun stuff…

For the Cultural Olympiad last year in Scotland we had #citizenrelay, tracking the progress of the Olympic torch – so lots of data to play with. The first example is a Twitter (Topsy) media timeline, using TimelineJS (by Vérité) plus Topsy data. This was really easy to do. For data access I used Topsy, which pulls in data from Twitter to make its own archive and has an API to get the data out; it is easy to query for media against a hashtag. It can return data in XML, but I grabbed it as JSON, and then the output was created with TimelineJS. You can also use the Google Spreadsheet template from TimelineJS (filled in manually or automatically). I used a spreadsheet here, with Yahoo Pipes to manipulate the data. You can pull data in with Google Spreadsheets, and once you’ve created the formula it will constantly refresh and update – so it self-updates when published.

Originally Topsy allowed data access without an API key, but now they require it. Google Apps Script is JavaScript-based – see the big Stack Overflow community – and has a similar curl-like function for fetching URLs and dumping results back into a spreadsheet. I have also done this with Yahoo Pipes (use the Rate module for the API key aspect).

Next, as the relay went around the country, they used AudioBoo. When you upload, AudioBoo geolocates your Boos. AudioBoo has an API (without a key) and you can filter for a tag. You can get the data out in XML, JSON and CSV options, but they also produce KML. If you can access a public KML file, paste its URL into the Google Maps search box and it just gives you the map, which you can then embed or share a link to. So: a super easy visualisation. Disappointingly it didn’t embed audio in the map pins, but that’s a Google Maps limitation – Google Earth does let you do that…
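The KML trick works because KML is a very simple XML format – a list of named placemarks with coordinates. A minimal sketch of producing one from geotagged clips (the field names here are invented for illustration, not AudioBoo’s actual API fields):

```python
# Build a minimal KML document from a list of geotagged items.
# Note KML coordinates are longitude,latitude (in that order).
def to_kml(placemarks):
    pm = "\n".join(
        "  <Placemark><name>{title}</name>"
        "<Point><coordinates>{lon},{lat}</coordinates></Point>"
        "</Placemark>".format(**p)
        for p in placemarks
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
            + pm + "\n</Document></kml>")

clips = [{"title": "Torch in Leith", "lat": 55.976, "lon": -3.171}]
kml = to_kml(clips)
```

Anything that can emit a file like this – a script, a spreadsheet export – gets a free map view from tools that accept KML.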

So using Google Earth we only have a bit of work to do: we need to work out the embed code. Google now provides templates that let you bring in placemark data (placemark templates). You can easily make changes here, and choose how to format variables. You can fill it in manually, but it can also be automated, so I use Google Apps Script here: it goes to the AudioBoo API, grabs the data as JSON, parses it, and then for each item pushes a row to the spreadsheet. So for partial geodata these Google templates are really useful. Something else to mention: Google Spreadsheets are great, they sit in the cloud. But recently I was using Kasabi and it went down… and everything relying on it went down too. Sometimes it is useful to take a flat capture as a spreadsheet for backup.

So the next visualisation used NodeXL (for social network analysis). This is an open source plug-in for Excel. It has a number of data importers (including for Twitter, Facebook, MediaWiki, etc., straight from the menu), lots of room for reformatting, and a grid view of the data.

And this is where we start chaining tools together. I had Twitter data, and I had NodeXL to identify the community (who follows whom, who is friends with whom), so I used Gephi, which lets you work with network graphs – a great way to see how nodes relate to each other. It is often used for social network analysis, but people have also used it for cocktail recipes (there’s an academic paper on it), and there is a recipe site that lets you reform recipes using the same approach. Gephi is one of those tools where you spend an hour playing… and then wonder how to convey the result to others, often ending up with a flat graphic. So I created something called TAGSExplorer to let anyone interact – and there are others who have done similar.

Another example here: a network of those using the #ukoer hashtag, looking for the bridges in the community – the key people. This is an early visualisation I created. It was generated from Twitter connections and tag use with Gephi, but then combined and finished in a drawing package.
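A crude stand-in for this “bridges in the community” analysis is to rank accounts by how many distinct people they connect to, straight from an edge list (toy data below, not the real #ukoer graph; Gephi’s betweenness-centrality measures are far more sophisticated):

```python
from collections import defaultdict

# Toy follower/friend edges between five hypothetical accounts.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("e", "a")]

# Build an undirected adjacency map: who is connected to whom.
neighbours = defaultdict(set)
for u, v in edges:
    neighbours[u].add(v)
    neighbours[v].add(u)

# Rank accounts by degree (number of distinct connections), highest first.
ranked = sorted(neighbours, key=lambda n: len(neighbours[n]), reverse=True)
# "a" touches b, c, d and e, so it tops the ranking.
```

High-degree nodes are only a first approximation to bridges; a true bridge is someone who sits *between* otherwise-disconnected clusters, which is what betweenness centrality captures.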

This is another example looking at different sources: a bubble chart for click-throughs of tweets. You can get a degree of that info from bit.ly, but if you use another service it’s hard to get click-throughs. However, you can see referrals in Google Analytics – each shared URL is unique to the person who tweets it, so you can see the click-through rate for an individual tweet. This is created in a Google Spreadsheet you can explore interactively and reshape for your own exploration. The spreadsheet uses the Google Analytics API and the Twitter API, then combines the results with some reshaping. One thing to be aware of is that spreadsheets have a duality of values and formulae, so when you call on APIs it can get confusing; it is sometimes good to use two sheets, the second for manipulation. There’s a great blog post on this duality – “spreadsheet addiction”. If you are at IWMW next week I’m doing a whole session on Google Analytics data and reshaping.
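The join described here – matching analytics referral counts back to individual tweets via each tweeter’s unique URL – is just a lookup keyed on URL. A sketch with entirely hypothetical data and URLs:

```python
# Tweets keyed by the unique URL each person shared.
tweets = {
    "http://exmpl.io/abc": {"user": "@ada", "text": "New post!"},
    "http://exmpl.io/xyz": {"user": "@brian", "text": "Worth a read"},
}

# Click counts per referring URL, as an analytics referral report
# might give them.
referrals = {"http://exmpl.io/abc": 42, "http://exmpl.io/xyz": 7}

# Join the two on the URL to get per-person click-throughs.
clickthroughs = [
    {"user": t["user"], "clicks": referrals.get(url, 0)}
    for url, t in tweets.items()
]
top = max(clickthroughs, key=lambda r: r["clicks"])
```

In the spreadsheet version, this join is exactly the “reshaping” step done on the second sheet after the two APIs have been queried.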

Q&A

Comment: there is a study/working group on social network analysis; some of these techniques could be built onto our community of expertise here.

Comment: would have to slow way down for me but hopefully we can devise materials and workshops to make these step by step.

Martin: But there are some really easy wins, like that Google Maps one. And there is a good community of support around these tools – with R, for instance, if I ask on Stack Overflow I will get an answer back.

Q) Is there a risk that if you start trying to visualise data you might miss out on proper statistical processes and rigour?

Martin: yes, that is a risk. People tend to be specialists in one area rather than all of them. Manchester Metropolitan use R as part of their analysis of student surveys, recruitment etc. This came from an idea of Mark Stubbs, head of eLearning, raised by speaking to a specialist in the field. R is widely used in the sciences and increasingly in big data analysis, so there it started with an expert who did know what he was doing.

Q: Have you done much with data mining or analysis, like Google Ngram?

Martin: not really. Done some work on sentiment analysis and social network data though.

May 10 2011
 

This weekend my colleague Gavin and I decided it would be useful (and fun!) to head along to Culture Hack Scotland, a 24-hour hack day organised by the Edinburgh Festivals Innovation Lab and themed around both the festivals and the wider Scottish cultural scene.

The Edinburgh Festivals Innovation Lab is a new(ish) initiative which has emerged from Edinburgh Festivals, the organisation jointly funded by all 12 of the official Edinburgh Festivals to enable them to work together throughout the year, promote initiatives, festival content, etc. The idea for the Innovation Lab apparently emerged out of discussions with all of the festivals about their use of, or interest in, digital technology: there were lots of ideas and potential projects, but they didn’t necessarily have the time or skills to take these forward. Last year the Lab hired their inaugural geek-in-residence, Ben Werdmuller (he of Elgg fame), and the Culture Hack Day was a significant outcome of the work he has been doing over the last few months.