Mar 232018

Today I am back at the Data Fest Data Summit 2018, for the second day. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too! 

I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly. 

Intro to Data Summit Day 2 – Maggie Philbin

We’ve just opened with a video on Ecometrica and their Data Lab supported work on calculating water footprints. 

I’d like to start by thanking our sponsors, who make this possible. And also I wanted to ask you about your highlights from yesterday. These include Eddie Copeland from Nesta’s talk, discussion of small data, etc. 

Data Science for Societal Good — Who? What? Why? How? –  Kirk Borne, Principal Data Scientist and Executive Advisor, Booz Allen Hamilton

Data science has a huge impact for the business world, but also for societal good. I wanted to talk about the 5 i’s of data science for social good:

  1. Interest
  2. Insight
  3. Inspiration
  4. Innovation
  5. Ignition

So, the number one, is the Interest. The data can attrat people to engage with a problem. Everything we do is digital now. And all this information is useful for something. No matter what your passion, you can follow this as a data scientist. I wanted to give an example here… My background is astrophysics and I love teaching people about the world, but my day job has always been other things. About 20 years ago I was working in data science at NASA and we saw an astronomical – and I mean it, we were NASA – growth in data. And we weren’t sure what to do with it, and a colleague told me about data mining. It seemed interesting but I just wasn’t getting what the deal was. We had a lunch talk from a professor at Stanford, and she came in and filled the board with equations… She was talking about the work they were doing at IBM in New York. And then she said “and now I’m going to tell you about our summer school” – where they take kids from inner city kids who aren’t interested in school, and teach them data science. Deafening silence from the audience… And she said “yes, we teach the staff data mining in the context of what means most for these students, what matters most. And she explained: street basketball. So IBM was working on a software called IBM Advanced Calc specifically predicting basketball strategy. And the kids loved basketball enough that they really wanted to work in math and science… And I loved that, but what she said next changed my life.

My PhD research was on colliding galaxy. It was so exciting… I loved teaching and I was so impressed with what she had done. These kids she was working with had peer pressure not to be academic, not to study. This school had a graduation rate of less than 50%. Their mark of success for their students was their graduation rate – of 98%. I was moved by that. I felt that if this data science has this much power to change lives, that’s what I want to do for the rest of my lives. So my life, and those of my peers, has been driven by passion. My career has been as much about promoting data literacy as anything else.

So, secondly, we have insight. Traditionally we collect some data points but we don’t share this data, we are not combining the signals… Insight comes from integrating all the different signals in the system. That’s another reason for applying data to societal good, to gain understanding. For example, at NASA, we looked at what could be combined to understand environmental science, and all the many applications, services and knowledge that could be delivered and drive insight from the data.

Number three on this list is Inspiration. Inspiration, passion, purpose, curiousity, these motivate people. Hackathons, when they are good, are all about that. When I was teaching the group projects where the team was all the same, did the worst and least interestingly. When the team is diverse in the widest sense – people who know nothing about Python, R, etc. can bring real insights. So, for example my company run the “Data Science Bowl” and we tackle topics like Ocean Health, Heart Health, Lung Cancer, drug discovery. There are prizes for the top ten teams, this year there is a huge computing prize as well as a cash prize. The winners of our Heart Health challenge were two Wall Street Quants – they knew math! Get involved!

Next, innovation. Discovering new solutions and new questions. Generating new questions is hugely exciting. Think about the art of the possible. The XYZ of Data Science Innovation is about precision data, precision for personalised medicine, etc.

And fifth, ignition. Be the spark. My career came out of looking through a telescope back when I lived in Yorkshire as a kid. My career has changed, but I’ve always been a scientist. That spark can create change, can change the world. And big data, IoT and data scientists are partners in sustainability. How can we use these approaches to address the 17 Sustainability Development Goals. And there are 229 Key Performers Indicators to measure performance – get involved. We can do this!

So, those are the five i’s. And I’d like to encapsulate this with the words of a poet…. Data scientists – and that’s you even if you don’t think you are one yet. You come out of the womb asking questions of the world. Humans do this, we are curious creatures… That’s why we have that data in the first place! We naturally do this!

“If you want to build a ship, don’t drum up people to gather wood adn don’t assign them tasks and work, but rather teach them to yearn for the vast and endless sea”

– Antoine de Saint-Exupery.

This is what happened with those kids. Teach people to yearn for the vast and endless sea, then you’ll get the work done. Then we’ll do the hard work

Slides are available here:


Comment, Maggie Philbin) I run an organisations, Teen Tech, and that point that you are making of start where the passion actually is, is so important.

KB) People ask me about starting in data science, and I tell them that you need to think about your life, what you are passionate about and what will fuel and drive you for the rest of your life. And that is the most important thing.

Q1) You touched on a number of projects, which is most exciting?

A1) That’s really hard, but I think the Data Bowl is the most exciting thing. A few years back we had a challenge looking at how fast you can measure “heart ejection fraction – how fast the heart pumps blood out” but the way that is done, by specialists, could take weeks. Now that analysis is built into the MRI process and you can instantly re-scan if needed. Now I’m an astronomer but I get invited to weird places… And I was speaking to a conference of cardiac specialists. A few weeks before my doctor diagnosed me with a heart issue…. And that it would take a month to know for sure. I only got a text giving me the all clear just before I was about to give that talk. I just leapt onto that stage to give that presentation.

The Art Of The Practical: Making AI Real – Iain Brown, Lead Data Scientist, SAS

I want to talk about AI and how it can actually be useful – because it’s not the answer to everything. I work at SAS, and I’m also a lecturer at Southampton University, and in both roles look at how we can use machine learning, deep learning, AI in practical useful ways.

We have the potential for using AI tools for good, to improve our lives – many of us will have an Alexa for instance – but we have to feel comfortable sharing our data. We have smart machines. We have AI revolutionising how we interact with society. We have a new landscape which isn’t about one new system, but a whole network of systems to solve problems. Data is a selleble asset – there is a massive competitive advantage in storing data about customers. But especially with GDPR, how is our data going to be shared with organisations, and others. That matters for individuals, but also for organisations. As data scientists there is the “can” – how can the data be used; and the “should” – how should the data be used. We need to understand the reasons and value of using data, and how we might do that.

I’m going to talk about some exampes here, but I wanted to give an overview too. We’ve had neural networks for some time – AI isn’t new but dates back to the 1950s. .Machine learning came in in the 1980s, deep learning in the 2010s, and cognitive computing now. We’ve also had Moore’s Law changing what is theoretically possible but also what is practically feasible over that time. And that brings us to a definition “Artificial Intelligence is the science of training systems to emulate human tasks through learning and automation”. That’s my definition, you may have your own. But it’s about generating understanding from data, that’s how AI makes a difference. And they have to help the decision making process. That has to be something we can utilise.

Automation of process through AI is about listening and sensing, about understanding – that can be machine generated but it will have human involvement – and that leads to an action being made. For instance we are all familiar with taking a picture, and that can be looked at and understood. For instance with a bank you might take an image of paperwork and passports… Some large banks check validity of clients with a big book of pictures of blacklisted people… Wouldn’t it be better to use systems to achieve that. Or it could be a loan application or contract – they use application scorecards. The issue here is interpretability – if we make decisions we need to know why and the process has to be transparent so the client understands why they might have been rejected. You also see this in retail… Everything is about the segment of one. We all want to be treated as individuals… How does that work when you are one of millions of individuals. What is the next thing you want? What is the next thing you want to click on? Shop Directory, for instance, have huge ranges of products on their website. They have probably 500 pairs of jeans… Wouldn’t it be better to apply their knowledge of me to filter and tailor what I see? Another example is the customer complaint on webchat. You want to understand what has gone wrong. And you want to intervene – you may even want to do that before they complain at all. And then you can offer an apology.

There are lots of applications for AI across the board. So we are supporting our customers on the factors that will make them successful in AI, data, compute, skillset. And we embed AI in our own solutions, making them more effective and enhancing user experience. Doing that allows you to begin to predict what else might be looked at, based on what you are already seeing. We also provide our customers with extensible capabilities to help them meet their own AI goals. You’ll be aware of Alpha Go, it only works for one game, and that’s a key thing… AI has to be tailored to specific problems and questions.

For instance we are working on a system looking at optimising the experience of watching sports, eliminating the manual process of tagging in a game. This isn’t just in sport, we are also working in medicine and in lung cancer, applying AI in similar 3D imaging ways. When these images can be shared across organisations, you can start to drive insights and anomalies. It’s about collaborating, bringing data from different areas, places where an issue may exist. And that has social benefit of all of us. Another fun example – with something like wargaming you can understand the gamer, the improvements in gameplay, ways to improve the mechanics of how game play actually works. It has to be an intrinsic and extrinsic agreement to use that data to make that improvement.

If you look at a car insurer and the process and stream of that, that’s typically through a call centre. But what if you take a picture of the car as a way to quickly assess whether that claim will be worth making, and how best to handle that claim.

I value the application, the ways to bring AI into real life. How we make our experiences better. It’s been attributed to Voltaire, and also to Spiderman, that “with great power comes great responsibility”. I’d say “with great data power comes great responsibility” and that we should focus on the “should” not the “could”.


Comment) A correction on Alpha Go: Alpha Zero plays Chess etc. It’s without any further human interaction or change.

Q1) There is this massive opportunity for collaboration in Scotland. What would SAS like to see happen, and how would you like to see people working together?

A1) I think collaboration through industry, alongside academia. Kirk made some great points about not focusing on the same perspectives but on the real needs and interest. Work can be siloed but we do need to collaborate. Hack events are great for that, and that’s where the true innovation can come from.

Q2) What about this conference in 5 years time?

A2) That’s a huge question. All sorts of things may happen, but that’s the excitement of data science.

Socially Minded Data Science And The Importance Of Public Benefits – Mhairi Aitken, Research Fellow, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh

I have been working in data science and public engagement around data and data science for about eight years and things have changed enormously in that time. People used to think about data as something very far from their everyday lives. But things have really changed, and people are aware and interested in data in their lives. And now when I hold public events around data, people are keen to come and they mention data before I do. They think about the data on their phones, the data they share, supermarket loyalty cards. These may sound trivial but I think they are really important. In my work I see how these changes are making real differences, and differences in expectations of data use – that it should be used ethically and appropriately but also that it will be used.

Public engagement with data and data science has always been important but it’s now much easier to do. And there is much more interest from funders for public engagement. That is partly reflecting the press coverage and public response to previous data projects, particularly NHS data work with the private sector. Public engagement helps address concerns and avoid negative coverage, and to understand their preferences. But we can be even more positive with our public engagement, using it to properly understand how people feel about their data and how it is used.

In 2016 myself and colleagues undertook a systematic review of public responses to sharing and linking of health data for research purposes (Aitken, M et al 2016 in BMC medical ethics, 17 (1)). That work found that people need to understand how data will be used, they particularly need to understand that there will be public benefit from their data. In addition to safeguards, secure handling, and a sense of control, they still have to be confident that their data will be used for public benefits. They are even supportive if the benefit is clear but those other factors are faulty. Trust is core to this. It is fundamental to think about how we earn public trust, and what trust in data science means.

Public trust is easy to define. But what about “public benefit”. Often when people call about data and benefits from data. People will talk about things like Tesco Clubcard when they think of benefit from data – there is a direct tangible benefit there in the form of vouchers. But what is the public benefit in a broader and less direct sense. When we ask about public benefit in the data science community we often talk about economic benefits to society through creating new data-driven innovation. But that’s not what the public think about. For the public it can be things like improvements to public services. In data-intensive health research there is an expectation of data learning to new cures or treatments. Or that there might be feedback to individuals about their own conditions or lifestyles. But there may be undefined or unpredictable potential benefits to the public – it’s important not to define the benefits too narrowly, but still to recognise that there will be some.

But who is the “public” that should benefit from data science? Is that everyone? Is it local? National? Global? It may be as many as possible but what is possible and practical? Everyone whose data is used? That may not be possible. Perhaps vulnerable or disadvantaged groups? Is it a small benefit for many, or a large benefit for a small group.  Those who may benefit most? Those who may benefit the least? The answers will be different for different data science projects. That will vary for different members of the public. But if we only have these conversations within the data science community we’ll only see certain answers, we won’t hear from groups without a voice. We need to engage the public more with our data science projects.

So, closing throughts… We need to maintain a social license for data science practices and that means continual reflection on the conditions for public support. Trust is fundamental – we don’t need to make the public trust us, we have to actually be trustworthy and that means listening, understanding and responding to concerns, and being trustworthy in our use of data. Key to this is finding public benefits of data science projects. In particular we need to think about who benefits from data science and how benefits can be maximised across society. Data scientists are good at answering questions of what can be done but we need to be focusing on what should be done and what is beneficial to do.


Q1) How does private industry make sure we don’t leave people behind?

A1) BE really proactive about engaging people, rather than waiting for an issue to occur. Finding ways to get people interested. Making it clear what the benefits are to peoples lives There can be cautiousness about opening up debate being a way to open up risk. But actually we have to have those conversations and open up the debate, and learn form that.

Q2) How do we put in enough safeguards that people understand what they consent to, without giving them too much information or scaring them off with 70 checkboxes.

A2) It is a really interesting question of consent. Public engagement can help us understand that, and guide us around how people want to consent, and what they want to know. We are trying to answer questions where we don’t always have the answers – we have to understand what people need by asking them and engaging them.

Q3) Many in the data community are keen to crack on but feel inhibited. How do we take the work you are doing and move sooner rather than later.

A3) It is about how we design data science projects. You do need to take the time first to engage with the public. It’s very practical and valuable to do at the beginning, rather than waiting until we are further down the line…

Q3) I would agree with that… We need to do that sooner rather than later rather than being delayed deciding what to do.

Q4) You talked about concerns and preferences – what are key concerns?

A4) Things you would expect on confidentiality, privacy, how they are informed. But also what is the outcome of the project – is it beneficial or could they be discriminatory, or have a negative impact on society? It comes back to causing public benefits – they want to see outcomes and impact of a piece of work.


Automated Machine learning Using H2O’s Driverless AI – Marios Michailidis, Research Data Scientist,

I wanted to start with some of my own background. And I wanted to talk a bit about Kaggle. It is the world’s biggest preictive modelling competition platform with more than a million members. Companies host data challenges and competitors from across the world compete to solve them for prizes. Prizes can be monetary, or participation in conferences, or you might be hired by companies. And it’s a bit like Tennis – you gain points and go up in the ranking. And I was able to be ranked #1 out of a half million members t here.

So, a typical problem is image classification. Can I tell a cat from a dog from an image. That’s very doable, you can get over 95% accuracy and you can do that with deep learning and neural net. And you differentiate and classify features to enable that decision. Similarly a typical problem may be classifying different bird song from a sound recording – also very solvable. You also see a lot of text classification problems… And you can identify texts from a particular writers by their style and vocabulary (e.g. Voltaire vs Moliere). And you see sentiment analysis problems – particularly for marketing or social media use.

To win these competitions you need to understand the problem, and the metric you are being tested on. For instance there was an insurance problem where most customers were renewing, so there was more value in splitting the problem into two – one for renewals, and then a model for others. You have to have a solid testing procedure – really strong validation environment that reflects what you are being tested on. So if you are being tested on predictions for 3 months in the future, you need to test with past data, or test that the prediction is working to have the confidence that what you do will be appropriately generalisable.

You need to handle the data well. Your preprocessing, your feature engineering, which will let you get the most out of your modelling. You also need to know the problem-specific elements and algorithms. You need to know what works well. But you can look back for information to inform that. You of course need access to the right tools – the updated and latest software for best accuracy. You have to think about the hours you put in and how you optimize them. When I was #1 I was working 60 hours on top of my day job!

Collaborate – data science is a team sport! It’s not just about splitting the work across specialisms, it’s about uncovering new insights by sharing different approaches. You gain experience over time, and that lets you focus your efforts on where you can focus your effort for the best gain. And then use ensembling – combine the methods optimally for the best performance. And you can automate that…

And that brings us to H2O’s diverless AI which automates AI. It’s an AI that creates AI. It is built by a group of leading machine learning engineers, academics, data scientists, and kaggle Grandmasters. It handles data cleaning and feature engineering. It uses cutting edge machine learning algorithms. And it optimises and combines them. And this is all through a hypothesis testing driven approach. And that is so important as if I try a new feature or a new algorithm, I need to test it… And you can exhaustively find the best transformations and algorithms for your data. This allows solving of many machine learning tasks, and it is all in parallel to make it very fast.

So, how does it work? Well you have some input data and you have a target variable. You set an objective or success metric. And then you need some allocated computing power (CPU or GPU). Then you press a button and H2O driverless AI will explore the data, it will try things out, it will provide some predictions and model interpretability. You get a lot of insight including most predictive insights. And the other thing is that you can do feature engineering, you can extract this pipeline, these feature transformations, then use with your own modelling.

Now, I have a minute long demo here…. where you upload data, and various features and algorithms are being tried, and you can see the most important features… Then you can export the scoring pipeline etc.

This work has been awarded Technology of the Year by InfoWorld, it has been featured in the Gartner report.

You can find out more on our website: and there is lots of transparency about how this work, how the model performs etc. You can download a free trial for 3 weeks.


Q1) Do you provide information on the machine learning models as well?

A1) Once we finish with the score, we build the second model which is simple to predict that score. The focus on that is to explain why we have shown this score. And you can see why you have this score with this model… That second interpretability model is slightly less automated. But I encourage others to look online for similar – this is one surrogate model.

Q2) Can I reproduce the results from H2O?

A2) Yes. You can download the scoring practice, it will generate the code and environment to replicate this, see all the models, the data generated, and you can run that script locally yourself – it’s mainly Python.

Q3) That’s stuff is insane – probably very dangerous in the hands of someone just learning about machine learning! I’d be tempted to throw data in… What’s the feedback that helps you learn?

A3) There is a lot of feedback and also a lot of warning – so if test data doesn’t look enough like training data for instance. But the software itself is not educational on it’s own – you’d need to see webinars, look at online materials but then you should be in a good position to learn what it is doing and how.

Q4) You talked about feature selection and feature engineering. How robust is that?

A4) It is all based on hypothesis testing. But you can’t test everything without huge compute power. But we have a genetic algorithm to generate combinations of features, tests them, and then tries something else if that isn’t working.

Q5) Can you output as a model as eg a deserialised JSON object? Or use as an API?

A5) We have various outputs but not JSON. Best to look on the website as we have various ways to do these things.


Innovation Showcase

This next session showcases innovation in startups. 

Matt Jewell, R&D Engineer, Amiqus

I’m an R&D Engineer at Amiqus, and also a PhD student in Law at Edinburgh University. Firstly I want to talk about Amiqus, and our mission is to make civil justice accessible to the world. And we are engaged in GDPR as a data controller, but also as a trust and identity provider – where GDPR is an opportunity for us. We created amiqusID to enable people to more easily interact with the law – with data from companies house, driving licenses, etc.

As a PhD student in law there is some overlap in my job and my PhD research, and I was asked about in data ethics. So I wanted to note GDOR Article 22 (3) which states that

“the data controller shall implement suitable measures to safeguard the data subject’s rights and frredoms and legitimate interests, at least the right to obtain human intervention on he part of the controller, to express his or her point of view and to the contest the decision.”

And that’s across the board. GDPR recommits us to privacy, but also embeds privacy as a public good. And we have to think about what that means in our own best practices, because our own practices will shape what happens – especially as GDPR is still quite uncertain, still untested in law.

Carlos Labra, CEO & Co-Founder, Particle Analytics

I come from a mechanical engineering background, so this work is about simulation. And specifically we look at fluids simulation in aircraft. Actually particle simulation is the next step in industry, and that’s because it has been incredibly difficult to do this simulation with computers. We can do basic computer models for large scale materials but not appropriate for particles. So in Particle Analytics we are trying to address this challenge.

So, a single simulation for a silo, and my model for a silo, has to calculate the interactions between every single particle (in the order of millions), in very small time intervals. That takes huge computing power. So for instance one of our clients, Astec, works on asphalt dryer/mixer technology and we are using particle analytics to enable them to establish and achieve new energy-based KPIs (Key Performance Indicators) that could make enormous savings per machine per year, purely by optimising to different analytics.

So we look at spatial/temporal filters, multiscale analysis, and reduce data size/noise. The Data operators generate new insights and KPIs. So the cost of simulation is going down, and the insights are increased.

Steven Revill, CEO & Co-Founder, Urbantide

I’m here to talk to you about our platform USmart which is making smart data. How do we do this? Well, when we started a few years ago we recognised that our businesses, organisations, and places, would be helped by artificial intelligence based on data. That requires increased collaboration around data and increasing reuse of data. Too often data is in silos, and we need to break it out and share it. But we also need to be looking at real time data from IoT devices.

So, our solution is USmart. It collects data from any source in real time, and we create value with automatic data pipelines with analytics, visualisation and AI ready. And that enables collaboration – either with partners in a closed way, or as open data.

So, I want to talk about some case studies. Firstly Smartline, which is taking housing data to identify people at risk of, or in, fuel poverty. We have 80m data points so far, and we expect to reach up to 700m+ soon. This data set is open and when it goes live we think it will be the biggest open data set in the UK.

Cycling Scotland is showing the true state of cycling, helping them to make their case for funding and gain insght.

And we are working with North Lanarkshire Council on business rates, which could lead to saving of £18k per annum, but can also identify incorrect rates of £!00k+ value.

If you want to find out more do come and talk to me, take a look at USmart, and join the USmart community.

Martina Pugliese, Data Science Lead, Mallzee

I am data science lead for Mallzee – proudly established and run from Edinburgh. Mallzee is an app for clothes, allowing you to like or dislike a product. We show you 150+ brands. We’ve had 1.4m downloads, 500m ratings on products, 3m products rated. The app allows you to explore products, but it also acts as a data collection method for us and for our B2B offering to retailers. So we allow you to product test, very swiftly, your products before they hit the market.

Why do this? Well there are challenges that are two sides of the same coin: Overstock where you have to discount and waste money; and Understock where you have too little of the best stock and that means you don’t have tine to make the best return on your products.

As well as gathering data, we also monitor the market for trends in pricing, discounting, something new happening… So for instance only 50.8% of new products last quarter were sold at full price. We work to help design, buying and merchandising teams improve this rate by 6-10% through customer feedback.

So, data is our backbone. For the consumer we enable discovery, we personalise the tool to you – it should save you time and money. At the same time the data also enables performance prediction. We have granular user segmentation. And it goes back to you – the best products go on the market. And long term that should have a positive environmental impact in reducing waste.

Maggie Philbin: Thank you. I’m going to ask you to feedback on each others ideas and work.

Carlos: I’m new to the data science world, so for me I need to learn more – and these presentations are so useful for that.

Martina: This is really useful for me, and great to see that lots of different things going on.

Matt: My work focuses on smart cities, so naturally interested in Steven’s presentation. Less keen on problematising the city.

Steven: Really interesting to discuss things backstage, but also exciting to hear Martina talking about how central data is for your business right now.

Maggie: And that is part of the wonderful things about being at Data Fest, that opportunity to learn from and hear from each other, to network and share.

We are back from lunch with a video on work in the Highlands and Islands using ambient technologies to predict likelihood of falls etc. 

Transforming Sectors With Data-Enabled Innovation – Orsola De Marco, Head of Startups, Open Data Institute

I’m going to talk about transforming sectors with data. The ODI, founded by Tim Berners-Lee and Nigel Shadbolt, focuses on data and what data enables.We think about data as infrastructure. If you think of data as roads you see that the number of roads do not matter as much as how they are connected… In the context of data we need data that can be combined, that is structured for connection and combination. And we look at data through open data and open innovation. What the ODI’s work has in common is that open innovation is at the core. This is not just about innovating, but also about making your organisation more porous, bringing in the outside. And I love the phrase “if you are the smartest person in the room, then you are in the wrong room”, because so often innovation comes from collaboration and from the outside.

Open innovation has huge potential value. McKinsey in 2013 predicted $3-5 trillian impact of open data; Lateral Economics (2014) puts that at more like $20 tn.

When we talk about open innovation and collaboration, we can talk about the corporate-startup marriage. We used to see linear solution having good returns, but that is no longer the case. Problems are now much more complex, and startups are great at innovation, at thinking laterally, at finding new approaches. But corporates have scale, they have reach, and they have knowledge of their industries and markets. If you bring these two together, it’s clear you can bring a good opportunity to live.

As example I wanted to share here is Transport for London who wanted to release open data to enable startups and SMEs to use it. CityMapper is one of the best known of these tools built on the data. Last year, after several years of open data, they commissioned a Deloitte report (2017) that this release had generated huge savings for TfL.

Another example is Arup. Historically their innovation had been taking place in house. They embraced a more open approach, and worked with two of our start ups Macedon C and Smart Sensors. Macedon C helped Arup explore airport data so that Arup didn’t need to do that processing. Smart Sensors installed 200 IoT sensors, sharing approaches to those sensors, what it means to implement IoT in buildings, how they could use this technology. And they rolled them out to some of their services.

Those are some examples. We’ve worked with 120 startups across the world. And they have generated over £37.2M in sales and investment. These are real businesses bringing real value – not just a guy in a shed. The major challenge is on the supply side of the data. A lot of companies are reluctant to share, mentioning three blockers: (1) it feels very risky to open data up – that issue feels highly relevant this week; (2) its expensive to do especially if you don’t know the value coming back; (3) perceived lack of data literacy and skills. Those are all important… But if you lead and innovate, you get to set the tone for innovation in your sector.

The idea of disruption is raised a lot, but it is real. But to actually disrupt you do really need a culture of open innovation is essential to lead. It needs to be brought in at senior level and brought into the sector.

Data infrastructure can transform sectors. And joining forces between data suppliers and users are important there. For instance we are working on a project called Open Active, with Sport England. A lack of information on what was going on in different areas was an issue for people getting active. We were involved at the outset and could see that data was the blocker here… If you tried to aggregate information it was impossible. So, in the first year of the programme we brought providers into the room, agreed an open standard, and that enabled aggregation of data. We are now in the second phase and, now that the data is consistent and available, we are bringing start ups in to engage and do things with that data. And those start ups aren’t all in sports, some are in healthcare sector – using sports data to augment information shared by medics. And from leisure companies helping individuals to find things to do with their spare time.

Another example is the Open Banking sector. Over 60% of UK banking customers haven’t changed their bank account in 5 years. And many of those haven’t changed them in 20 years. So this initiative enables customers to grant secure access to your banking details for e.g. mortgage lenders, or to enable marketplaces to offer energy switching companies. Our experience in this programme was to facilitate these banks, and took that experience of data portability… And now we are working with Mexico on a FinTech law that requires all banks to have an open API.

In order to innovate in sectors it’s important to widen access to data. This doesn’t mean not taking data privacy seriously, or losing competitive advantage.

And I wanted to highlight a very local programme. Last year we began a project in the peer to peer accommodation market. The Scottish expert advisory panel noted that whilst a lot of data is generated, no real work is looking at the impact of the sharing economy in accommodation. That understanding will enable policy decisions tied to real concerns. We will be making recommendations on this very soon. If you are interested, do get in touch and be part of this.


Q1) You talked a lot about the value of data. How do you measure that economic value like that?

A1) We base value on sales and investment generated, and/or time or money saves in processes. It’s not an exact science but it looks for changes to the status quo.

Q2) What is the most important and valuable thing from your experience here?

A2) I think I’ll approach that answer in two ways. We do innovate work with data but we often facilitate conversations between data provider and start ups. For making data available we remove those blockers; for start ups it’s helping that facilitate those conversations, it’s helping them grow and develop and tailoring that support.

Q3) What next?

A3) Our model is a sector transformation model. We talk to a sector about sharing and opening up, and then we have start ups in an accelerator so that data will find a use. That’s a huge difference from just publishing the data and wondering what will happen to it.

Designing Things with Spending Power – Chris Speed, Chair of Design Informatics, University of Edinburgh

I have a fantastic team of designers and developers, and brilliant students who ask questions, including what things will be like in Tomorrow’s World!  We look at all kinds of factors here around data. So I want to credit that team.

Many of you in the room will be aware that data is about value constellations, rather than value chains. These are complex markets, many players – which may be humans but also which may be bots. That changes our capacity to construct value, since we have agents that construct value. And so I will talk about four objects to look at the disruption that can be made, and what that might mean, especially as they gain agency, to gain power. One of the things we thought was, what happens when we give things spending power.

See diagram from Rand organisation comparing centralised with decentralised and distributed – we see this model again and again… But things drift back occasionally (there’s only one internet banking platform now, right?). I’m going to show this 2014 bitcoin blockchain transaction video – they move too fast to screengrab these days! So… what happens when we have distributed machines with spending power? And when transactions go down to absolutely tiny transactions and amount of money.

So, we run BlockExchange workshops, with lego, to work on the idea of blockchain, what it means to be a distributed transaction system.

Next we have the fun stuff… What happens when we have things like Ethereum… And smart contracts. What could you do with digital wallets. If the UN gives someone a digital password, do they need sovereignty. So, we undertake bodily experiments with this stuff. We ran a physical experiment – body storming – with bitcoin wallets and smart contracts… A bit like Pokemon Go but with cash – if you hit a hotspot the smart contract assigns you money, Or when you enter a sink, you lose bitcoin. So, here is video of our GeoCoin app and also an experiment running in Tel Aviv.

These three banking volunteers design to design a new type of cinema experience… They enter the cinema by watching two trailers that are pickupable in the street… Another colleague decides not to do this… They gain credit by tweeting about trailers… bodystorming allows new ideas to be developed (confusingly, there is no cinema… This is, er, a cinema of the mind – right Chris?). 

Next we have a machine with a bitcoin wallet. Programmable money allows us to give machines buying power… Blockchain changes the history to things, adding value to value… So, we set up a coffee machine Bitbarista, with an interface that asks the coffee drinker to make decisions about what kind of coffee they want, what values matter… Mediating the space between values and value.

We have hairdryers – these are new and have just gone to the Policy Unit this week. We have Gigbliss Plus hairdryer… That allows you to buy and trade energy and to dry your hair when energy is cheaper… What happens when you do involve the public in balancing energu. And we have another hairdryer… That asks whether you want unethical energy now, or whether you want to wait for an ethical source – the hairdryer switches on accordingly. And then we have Gigbliss Auto, which has no buttons. You don’t have control, only the bitcoin wallet has decision powers… You don’t know when it comes on… But it will. But it changes control. Of those three hairdryers, which are we happy to move to… Where do we feel happy here.

And then we have KASH cups, with chips in them. You can only but coffee when you put two cups down. So you get credit, through the cups digital wallet, to encourage network and development. You don’t have to get copy – you can build up credit. We had free coffee in the other room… But we had a very fancy barista for the KASH cups, and people queued for this for 20 minutes – coffee with social value.

Questions for us… We give machines agency, and credit… What does that mean for value and how we balance value.

Maggie: It’s at this point I wish Tomorrow’s World still existed!


Q1) where is this fascinating work taking you?

A1) I think this week has been so disruptive in terms of data and technologies disruption of social, civic, political values. I think understanding that we can’t balance value, or fair trade, etc. on our own is helpful and I’m really excited by what bots can offer here…

Q2) I was fascinated by the hairdryers… I’ve been in the National Grid’s secret control room and seeing that, that thing of Eastenders finishes and we make a cup of tea means bringing a whole power station on board… But waiting 10 minutes might avoid that need. It’s not trivial it’s huge.

A2) Yes, and I think understanding how that waiting, or understanding consequences of actions would have a real impact. The British public are pretty conscious and ethical I think, when they have that understanding…

Q3) Have you thought about avoiding queues with blockchain?

A3) We don’t want to just play incentives to get people out of queues. People are there for different reasons, different values, some people enjoy the sociability of a queue… Any chance to open it up, smash it up, and offer the opportunity to co-construct is great. But we need to do that with people not just algorithms.

Maggie: At this point I should be introducing Cathy O’Neil, but she has been snowed in by 15 inches of snow on the East Coast of the US. So, she will come over at a later date and you’ll all be invited. So, in place of that we have a panel on the elephant in the room, the Facebook and Cambridge Analytica scandal, with a panel on data and ethics.

Panel session: The Elephant in the Room: What Next? – Jonathan Forbes (JF), CTO, Merkle Aquila (chair); Brian Hills (BH), Head of Data, The Data Lab; Mark Logan (ML), Former COO Skyscanner, Investor and Advisor to startups and scale ups; Mhairi Aitken (MA), Research Fellow, University of Edinburgh. 

JF: So, thinking of that elephant in the room.. That election issue… That data use. I want to know what Facebook could have done better?

ML: It has taken them a long time to respond, which seems strange… But I see it as a positive really. They see this as a much bigger issue rather than the transactional elements here. In that room you look at risk and you look at outrage. I think Facebook were trying to figure out why outrage was so high, I think that’s what has surprised them. I think they took time to think about what was happening to them. I don’t think it’s just about electing a game show host to president… The outrage is different. Cambridge Analytica is a bad actor, not just on data but on their advocacy for other problematic tactics. Facebook shouldn’t be bundled into that. I think aspects here is that you have a monopoly. Facebook is an advertising company – they need to generate data and pass it onto app developers. Those two things don’t totally aligned. And I think the outrage is about trust and expectation of users.

JF: You are closest to the public in your research. The share price is dropping significantly right now… How, based on past experience, do you see this playing out.

MS: I’m used to talking to people about public sector use of data. Often people talk about Facebook data and make two points: firstly that they contribute their own data and control  that and know how it’s used; but they also have very high expectations of use for public sector organisations and don’t have that for private sector organisations – they think someone will generate ads and profit but when used in politics that’s very different, and that changes expectations.

JF: I enjoyed your comment about the social license… and I think this may be a sign that the license is being withdrawn. The GDPR legislation certainly changes some things there. I was interested to see Tim Berners Lee’s response, taking Mark Zuckerberg’s perspective… I was wondering, Brian, about the commercial pressures and the public pressures here. Are they balancing that well?

BH: No. When we look back I think this will be a pivotal moment. I kind of feel like GDPR piece is like being in a medieval torture chamber… We have a countdown but the public don’t know much about it. With Facebook it’s like we have a firework in the sky and people are asking what on earth is going on… And we have an opportunity to have a discussion about the use of data. As we leave today we have a challenge around communicate our work with data, what are our responsibilities here. The big data thing, many business cases seem like we’ve failed – we’ve focused on the technology and only that. And I feel we now have an opportunity and a window here.

JF: I’d like to take the temperature of the room… How many of you had Facebook on their phone, and don’t this week? None.

ML: I think that’s the point. The idea of not doing to others data what you wouldn’t want done to your own… But the reality is that legislation is playing catch up to practice. Commercially it’s hard to do the right thing. I think Mark Zuckerberg has reasonably good intentions here… But we have this monopoly… The parallel here is banking. And monopoly legislation hasn’t kept pace with the monopolies we have. I think it would be great if you could export your data, friends data, etc. to another platform. But we can’t.

Comment: I think you asked the wrong question… Who here doesn’t Facebook on their phone at all. Actually quite a lot. I think actually we have that sense that power corrupts and absolute power corrupts absolutely. And I don’t feel I’m missing out, I’m sure others feel that too. And I’m unsurprised about Facebook, I could see where it was going.

JF: OK, so moving towards what we can do, should we have a code of conduct, a hypocratic oath to data, a “do no harm”.

BH: I don’t see ethics featuring in data models. I think we have to build that in. Cathy O’Neil talks about Weapons of Math Destruction… We have to educate our data science students how to use these tools ethically, to think about who they will work with. Cathy was a Quant and didn’t like that so she walked away. We have to educate our students about the choices they make. We talk about optimisation, optimisation of marketing. In optimising STEM stuff… And we are missing stuff… I think we need to move towards STEAM, where A is for Arts. We have to be inclusive for arts and humanities to work with these teams, to think about skills and diversity of skills.

JF: Particularly thinking about healthcare

MA: There is increasing drive to public engagement, to public response. That has to be much more at the heart of training for data scientists and how it relates to the society we want to create. There can be a sense of slowing momentum, but it’s fundamental to getting things right, and shaping directions of where we are going…

JF: Mark, you mentioned trust, and your organisation has been very focused on trust.

ML: These multifacet networks are built on trust. For Skyscanner trust was so much more important than favouring particular clients. I think Facebook’s error has been to not be more transparent in what they do. We have had comments about machine learning as hype, but actually machine learning is about machines learning to do something without humans. We are moving to a place where decisions will be made by machines. We have to govern that, and to police machines with other machines. And we have to have algorithms to ensure that machine learning is appropriate and ethical.

JF: I agree. It was interesting to me that Weapons of Math Destruction is the top seller in algorithms and programme – a machine generated category – but that is reassuring that those working in this space are reading about this. By show of hands how many here working in data science are thinking about ethics. Some are. But unclear who isn’t working with data, or who isn’t working ethical. So, to finish I want your one takeaway for this week.

BH: I think it’s up to us to decide how to do things differently, and to make the change here. If we are true data warriors driving societal benefit then we have to make that change ourselves.

ML: We do plenty to mess up the planet. I think machine learning can help us sort out the problems we’ve created for ourselves.

MA: I think its been a wonderful event, particularly the variety and creativity being shared. And I’m really pleased to open up these conversations and look at these issues.

JF: I’m optimistic too. But don’t underestimate the ability of a small group of committed people to change the world. So, Data Warriors, all of you… You know what to do!

Maggie: Thank you all for your conversation, your enthusiasm. One message I really want to give you is that when you look at the use of data, the capacity to do good… The vast majority of young people are oblivious. They could miss out on an amazing career. But as the world changes, they could miss out on a decent career without these skills. Don’t underestimate your ability as one person with knowledge of that area to make a difference, to influence and to inspire. A few years back, in Greenock, we ran an event with Teen Tech and the support of local tech companies made all the difference… One team went to the finals in London, won and went to Silicon Valley… And that had enormous impact on that school and community, and now all S2 students do that programme, local companies come in for a Dragon’s Den type set up. Any moment that you can inspire and support those kids will make all the difference in those lives, and can make all the difference, especially if family, parents, community don’t know about data and tech.

Closing Comments – Gillian Docherty, CEO, The Data Lab

Firstly thank you to Maggie for being an amazing host!

I have a few thank yous to make. It has been an outstanding week. Thank you all for participating in this event. This has been just one event of fifty. We’ve had another 3000 data warriors, on top of you 450 data warriors for Data Summit. Thank you to our amazing speakers, and exhibitors. The buzz has been going throughout the event. Thank you to our sponsors, and to Scottish Government and Scottish Enterprise. Thank you to our amazing volunteers, to Grayling who has been working with the press. To our venue, events team and caterers. Our designer from two fifths design. And the team at FutureX who helped us organise Data Talent and Data Summit – absolutely outstanding job! Well done!

And two final thank yous. Firstly the amazing Data Lab team. We have thousands of new people being trained, huge numbers of projects. I also want to specifically mention Craig Skelton who coordinated our Fringe events; Cecilia who runs our marketing team; and Fraser and John who were behind this week!

My final thank you is to all of you, including the teams across Scotland participating. It is a fantastic time to be working in Scotland! Now take that enthusiasm home with you!

 March 23, 2018  Posted by at 10:48 am Events Attended, LiveBlogs Tagged with: , , ,  No Responses »
Mar 222018

Today I am at the Data Fest Data Summit 2018, two days of data presentations, showcases, and exhibitors. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too! 

I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly. 

Intro to the Data Lab – Gilian Doherty, The Data Lab CEO

Welcome to Data Summit 2018. It’s great to be back, last year we had 25 people with 2000 people, but this year we’ve had 50 events and hope to reach over 3500 people. We’ve had kids downloading data from the space station, we’ve had events on smart meters, on city data… Our theme this year is “Data Warrior” – a data warrior is someone with a passion and a drive to make value from data. You are data warriors. And you’ll see some of our data warriors on screen here and across the venue.

Our whole event is made possible by our sponsors, by Scottish Enterprise and Scottish Government. So, let’s get on with it!

Our host for the next two days is the wonderful and amazing Maggie Philbin, who you may remember from Tomorrow’s World but she’s also had an amazing career in media, but she is also chair of UK Digital Skills and CEO of Teen Tech, which encourages young people to engage with technology.

Intro to the Data Summit – Maggie Philbin

Maggie is starting by talking to people in the audience to find out who they are and what they are here for… 

It will be a fantastic event. We have some very diverse speakers who will be talking about the impact of data on society. We have built in lots of opportunities for questions – so don’t hesitate! For any more information do look at the app or use the hashtag #datafest18 or #datasummit18.

I am delighted to introduce our speaker who is back by popular demand. She is going to talk about her new BBC Four series Contagion, which starts tonight.

The Pandemic – Hannah Fry

Last year I talked about data for social good. This year I’m going to talk about a project we’ve been doing to look at pandemics and how disease spreads. When we first started to think about this, we wanted to see how much pandemic disease is in people’s minds. And it turns out… Not many.

Hannah’s talk was redacted from this post yesterday but, as Contagion! has now been broadcast, here we go: 

Influenza killed 100 million people in the 20th Century. The Spanish Flu killed more people in one year than both World Wars. Which seems surprising but that may be partly because Pandemic Flu is very different from Seasonal Flu. Pandemic Flu is where a strain of flu jumps from animals to humans and spreads so fast that we can’t vaccinate fast enough. For that reason Pandemic Flu is the top of the UK Government’s Risk Register.

So, what we decided to do was essentially a TV stunt with a real purpose. We built a simple smart phone app. The App captures where people are, and how many people they are with. That allows us to see how disease might spread. Firstly to do that for TV of course, but secondly this is proper citizen science for real research. So, I spent a year calling in lots of favours, getting on all sorts of media, asking people to download an app.

But we also needed a patient zero, and we also needed a ground zero. We picked Haselmere in Surrey, which is a sort of Goldilocks town, just big enough, well connected.. A beautiful English town… Just the type you’d like to destroy with an imaginary virus. And I was patient zero… So I went there, went to the gym, went to the shops, went to the pub,,, But unknown to me I also walked past others with the app… So when I stood need to one of these , it was for enough time to infect that person… And so now there were two people and then many more… A pharmacist got infected early on and continued infecting out…

These patterns are based on our best mathematical models for infection… And you can quickly see pockets of infection developing and growing. Spreading quickly to a whole town. But those dots on a map are all real people…

Looking at some real infection sites…. So, in Petersfield there is a school were a few kids from Haselmere attend, commuting by train. Three kids running our app… By day three, two were infected, one wasn’t. They went to the break room, and outside, and the third person got infected… And then infected their family…

I wanted to also talk about a person from Haselmere who work in London on Day Two. Two people from the town don’t know each other, but they took the train home, and the one infected the other…

Now, this is just the Haselmere experiment, but we did a nationwide experiment…

We persuaded 30,000 people to download the app and take part… Again, it starts with me walking around Haselmere. By a month in, London is swamped. Two months in it sweeps Scotland. By three months it’s in North Ireland. Really by then only the North of Scotland was safe! What is startling isn’t the speed of the spread, but also how many people get infected… This is the most accurate model we have to date. The most accurate estimate for a Spanish Flu type virus, is a staggering 43,343,849. A conservative fatality rate of 2% would be 886,877 deaths. But that’s worst case scenario… That’s no interventions… Which is why this data and this model are so important as they allow you to understand and trial intervention. Generally most people infect the same small number of people, but some super spreaders have a much bigger impact. If you target super spreaders with early vaccination – just vaccinating a targeted 10% – makes a huge difference. It really slows the spread, giving yourself a fighting chance to overcoming infection.

We know these pandemics can and will happen, but it’s about what you plan for and how you intervene. The only way to answer those big questions and to know how to intervene, is to understand that data, to understand that spread. So we are anonymising this data set and releasing it to the academic community – as a new gold standard for understanding infection. Data really does save lives.


Q1) So, Shetland is safe…. Unless the infection started there.

A1) When we spoke to one person about what they’d do in a pandemic, they said they’d get in a car with their kids and just

Q2) I’m from the NHS and there has been a lot of work of super spreaders, closing schools… Has there been work on the most efficient, mathematically effective patterns to minimise infection.

A2) Schools are an interesting one… Closing schools sounds like it makes everything simple. Sometimes shutting schools means kids share in an unpredictable manner as they will go places too. And then you reopen schools and reinfect potentially… And that’s without the economic impact. These are all questions we are thinking about.

Q3) That’s awesome and scary. What about people developing immunity.

A3) Our model is no immunity, and no-one recovers. But you can build that data in later, adding rish assumptions. And some of the team working on this are looking at infection transmitted through the air – some viruses can stick around a few hours.

Q4) I remember the SARS book. I’m very paranoid… Brought suits, gloves, bleach… In New Zealand you need a two week supply of stuff in your house… If we did that, how would that make a difference.

A4) Yes… So for instance the government always pushes messages about hand washing whenever flu is taking place. It doesn’t feel that that would make a big difference… But at a population level it really does…

Q5) My question is whether you will make the data available for other people – for epidemiology but also for transport, for infrastructure.

A5) Yes, absolutely. We wanted to make this as scientifically rigorous as possible. The BBC gives us the scale to get this work done. But we are now in the process of cleaning the data to share it. Julia Gog at Cambridge is the lead here so look out for this.

Q6) What about data privacy here?

A6) At a national level the data is accurate to 1 km squared, with one pin every 24 hours. Part of the work to clean the data is checking if it can be reverse engineered to make sure that privacy is assured. For Haselmere there is more detail… We are looking at skewing location, at just sharing distance apart rather than location, and at whether there is any way you can reverse engineer the dataset if you’ve seen the TV programme, so we are being really careful here.

Business Transformation: using the analytics value chain – Warwick Beresford-Jones, Merkle Aquila

I’ll be talking about the value chain. This is:

Data > Insight > Action > Value (and repeat)

Those two first aspects are “generation” and the latter two are “deployment”. We are good at the first two, but not so much the action and value aspects. So we take a different approach, thinking right to left, which allows faster changes. Businesses don’t always start with an end in mind, but we do have accessible data, transformatic insights, organisational action, and integrated technology. In many businesses much of the spend is on technology, rather than the stage where change takes place, where value is generated for the business. So that a business understands why they are investing and what the purpose of this.

I want to talk more about that but first I want to talk about the NBA and the three point line, and how moving that changed the game by changing basket attempts…And that was a tactical decision of whether to score more points, or concede fewer points, enabling teams to find the benefit in taking the long shot. Cricket and Football similar use the value chain to drive benefit, but the maths work differently in terms of interpreting that data into actions and tactics.

Moving back to business… That right to left idea is about thinking about the value you want to derive, the action required to do that, and the insights required to inform those actions, then the data that enables that insight to be generated.

Sony looked at data and customer satisfaction and wanted to reduce their range down from 15 to 4 handsets. But the data showed the importance of camera technology – and many of you will now have Sony technology in the cameras in your phones, and they have built huge value for their business in that rationlisation.

BA wanted to improve check in experiences. They found business customers were frustrated at the wait, but also families didn’t feel well catered for. And they decided to trial a family check in at Heathrow – that made families happier, it streamlined business customers’ experience, and staff feedback has also been really positive. So a great example of using data to make change.

So, what questions you should be asking?

  • What are the big things that can change our business and drive value?
  • Can data analytics help?
  • How easy will it be to implement the findings?
  • How quickly can we do?

Q1) In light of the scandal with Facebook and Cambridge Analytica, do you think that will impact people sharing their data, how their data can be used?

A1) I knew that was coming! It’s really difficult… And everyone is also looking at the impact of GDPR right now. With Facebook and LinkedIn there is an exchange there in terms of people and their data and the service. If you didn’t have that you’d get generic broadcast advertising… So it depends if people would rather see targeted and relevant advertising. But then with some of what Facebook and Cambridge Analytica is not so good…

Q2) How important is it for the analysts in an organisation to be able to explain analytics to a wider audience?

A2) Communication is critical, and I’d say equally important as the technical work.

Q3) What are the classic things people think they can do with data for their business, but actually is really hard and unrealistic?

A3) A few years ago I was meeting with a company, and they gave an example of when Manchester United had a bad run, and Paddy Power had put up a statue of Alex Ferguson with a “do not break glass sign” and they asked how you can have that game changing moment. And that is really hard to do.

Q4) You started your business at your kitchen table… And now you have 120 people working for you. How do you do that growth?

A4) It’s not as hard as you think, but you have to find the right blend of raw talent with experience – lots of tricky learning.

Project Showcase

How will you make a difference? I’m going to talk about how I’ve made major change for one of Scotland’s biggest organisation. I was working for Aggreko, the leader of mobile modular power and temperature solutions. They provide power for the Olympics, the World Cup, the Superbowl… A huge range of events across the world.
We are now watching a short video on how Aggreko supplies large scale mobile power (30 MW set up in 17 days) to cover local demand in Macha Pichu when a hydroelectric plant has to be shutdown for maintenance. 
In the dark old days Aggreko was a reactive organisation. A customer would ring with an issue, then Aggreko would send an engineer out. And then they moved to monitoring the mobile power kit to help monitor equipment across the world on a 24/7 basis. My team build the software to undertake that monitoring, to respond to every alert, alarm, any issue customers might face. And in fact in many cases to fix an issue before a customer ever became aware of it. And that meant far greater reliability and efficiency. And doing that we wondered how we might be able to predict issues, to predict how eqyuipment might fail. We didn’t know how to do that and we weren’t afraid to ask…
So we went to the Data Lab, took my idea to their board, and they funded a year long pilot to work with University of Strathclyde and Microsoft, as well as needing to build a team of engineers, technicians, specialists to be part of the team to take this far. This was a group of massively smart group, but also some big egos… A lot of what I had to do was to ensure there was good collaboration across those teams. The collaboration is really what made this project a real success. We created an advanced analytics team which allowed us to put models into use, some of which could predict an issue 2 weeks ahead of any issue, and being able to manage those issues for our customers.
The guys at Data Lab helped me to make a difference, they were brilliant and all that help is available to you too. So what are you waiting for?  
There are various ways to resolve this, but they are not easy. There is work for the 1% of large companies, but that leaves SME out. And 50k SMEs go out of business every year in the UK. So, what is the solution? Well, let me tell you about Previse and what we do. We think we have a unique solution. David Brown, one of our co-founders, had experience in the sector, and he didn’t want to accept the status quo. Accounting the oldest processes and data that a company is, but no-one is using that in this sort of way. So what do we do?
Previse finds data, engages with data, pulls in other data… And looks at what can work. We can look at all data on every invoice from every supplier. We then determine a score, and a threshold…. So that when invoices come in they can be prioritised and mostly approved and paid immediately. The process is the same for the buyer but it makes a huge difference for the supplier. Placing an invoice through Previse you can send and have approved invoices very swiftly, and without chasing and additional work. That is a huge difference in cost and time. The large corporates we’ve been talking with – including 70% of large FTSE companiess – are really enthusiastic and want us to help them.
And our experience in Scotland has been incredible. The Data Lab helped us throughout, finding the right universities to work with. We work with Heriot Watt (Mike Chantler) and with MBN to find the right resources, and Scottish Enterprise have helped us make Scotland our hub for data science and software engineers. We’ve employed 5 people in the last 6 months, and we’ll double that by the end of the year. We can generate growth, but it’s also about making real change with data.
If SMEs are paid on time, that allows them to thrive and grow. It’s a huge problem and we think it can be resolved.
Our platform consists of four modules: sustainability; mapping; reporting and advanced. But I’ll talk about our mapping module and some projects we’ve worked on:
  • Mapping the water footprint of your crops – a project with the University of Edinburgh, funded by Data Lab. This brings together a wide range of crop data layers. We have an overlay based on water for crop growing, and overlays of gray water, or the erosion potential – for instance there is high erosion potential on the west coast of Scotland, mmostly low erosion in the east of Scotland.
  • Forests 2020 is a Mexican application supported by the UK Space Agency, and we work with University of Edinburgh, University of Leicester, and Carbomap. Here we can see deforestation patterns, and particular crop areas.
  • Innovate UK: farm data, which is a collaboration with Rothamsted Research, Environment Systems, and Innovate UK – this is at an early stage looking at crop rotation data for UK and export markets. And you can also see the soil you are growing on, what can be planted, what sort of fertilisers to use.
  • Sustainability risk – supports  understanding of risks such as water depletion, and the various factors impacting and shifting that.
  • We also have tools for government to know how to plan what type and locations they should be building power plants in.

So, in conclusion, layering data allows us to gain new insights and understanding.

After a good lunch and networking session we are now back in the main hall, starting with a video on the use of data in Heineken production process. And an introduction to Stefaan Verhulst, a former Glasgow graduate now based in New York.

Data Driven Public Innovation In Partnership With The Private Sector: The Emerging Practice Of Data Collaboratives – Stefaan Verhulst, Co-founder and Chief Research and Development Officer, The Gov Lab

I’m delighted to be back in Scotland for this event looking at how data can be help society, and how society can be. That is also the focus of The Gov Lab in New York. And we also look at how we can unleash data for good.

An example  want to give you is the earthquake in Nepal a few years ago. It was a terrible event but it was also inspiring too, because Ncell, a cell phone operator, and Flowminder (based in Sweden and the UK) worked together to map the flow of people to intervene, to save lives. It is a great example of using data in the public good. And it’s an example of the growth of available data, including web crawling/scraping/search analysis; social media; retail data etc. all collected by the private sector. But we also have new data science to address this data, to gain meaning from this data. And often that expertise to extract meaning is sitting in the private sector.

So, the real question is how we extract value and engage with the private sector around data they collect. That’s a whole different ballgame from open government data. It’s not just about data sharing, but about new kinds of public-private sharing around data for the public good. So we have set up new programmes of Data Collaboratives. So we set up the Data Collaboratives Explorer allows you to explore those collaborations taking place – there are over 100 in there already. From that collaborative work we have gained some insights that I will share today.

So, firstly, data collaboratives are important across the policy lifecycle:

  • That starts with situation analysis. Corporations in the US have worked together in the US to understand the scale of the opioid epidemic, for instance.
  • Our second value proposition is about knowledge creation. For instance, post hurricane season how does the mosquito population change and how does that change mosquito born diseases.
  • Our third value proposition is prediction, fr instance projects to predict suicide risk from search results – a project in Canada and also in India.
  • And then we have evaluation and impact assessment. An example here is Vision Zero Labs looking at traffic safety and experiments in spatial composition to influence and reduce risk of accidents.

In those collaboratives we see different models in use. These include: data pooling – enabling sharing and analysis across the collaboration; prizes and challenges – opening some data as a source of generating new insights through innovative ideas and projects that benefit both public and private sector, e.g. BBVA’s Innova challenge; research partnerships – with collaboration across private sector and public or academic sector – such as work on fake news on Twitter; intelligence products – JP Morgan Chase has an institute to extract insights from their own data and actually that can be hugely detailed and valuable; API – for instance Zillow allows you to access real time mortgage and housing market data; trusted intermediary – for instance Dalberg who acts between telecommunications companies and others.

So, there are many ways to set up a data collaborative. But why would the private sector want to do this? Well, they may be motivated by reciprocity – sharing data may lead to access to specialist expertise; research and insights; revenue; regulatory compliance; reputation and retainment of talent – often corporations need to retain talent through solving harder or more interesting problems; responsibility.

But there are challenges too. For instance the taxi and limousine agency in New York regulates all taxi operations, including Uber. In their wisdom they shared the data… But that exposed some celebrity locations (and less salubrious locations). The harm here wasn’t huge but that data in a different cultural contexts could present a much higher risk. So, some of the concerns around sharing data include:

  • privacy and security
  • generalisability and data quality (e.g. not everyone has a cell phone)
  • competitive concerns
  • cultural challenges – there is something of a culture of hoarding data within organisations.

So, to move towards data responsibility we really need risk and value assessment that recognises data as a process, and part of a wider value chain. We need fair information practices and processes – our principles are about 30 years out of date and we urgently need new principles and processes. GDPR helps, but not all the challenges we may have. We need new methods and approaches. And that means having a decision tree across the data cycle.

There are risks in sharing data, but there are also risks in not sharing the data. If we had not have used the NCell data in Nepal, we would have had more deaths. So we have to respond not just to risks, but also to opportunity cost of not sharing data. What is your responsibility as a corporation?

I’ve given lots of examples here… But how do we make data driven public innovation systemic? We need data stewards in organisations so there is someone who can sign off on data collaboratives, we need that profession in place in organisations to enable work with the public sector. We need methods – like the Unicef collaboratory around childhood obesity, that’s a new methology. We also need new evidence, of how data can be used and what impact it will have. And finally we need a movement – this all won’t happen without a movement to establish data collaboratives, and I’m delighted to be here today as part of this movement, and ultimately use data to improve peoples lives.


Q1) In light of Cambridge Analytica and Trump, aren’t we misusing data?

A1) I think use is part of that value chain and we have to have a debate about what kind of use we are comfortable with, and which we are not. And that case also raises questions about freedom of expression, and a need to regulate against deceptive behaviours.

Q1) Several years ago hashtags brought down governments in the Middle East, and now we have governments in those countries controlling the public through hashtags. It’s scary.

A1) I’ve been working in privacy for many years, and I really encourage a comparison of risks and value. And to do a cost-benefit analysis. We need to rebalance that.

Gillian is introducing our special guest… Minister Derek MacKay

Message from the Scottish Government – Derek Mackey, MSP, Cabinet Secretary for Finance & Constitution, the Scottish Government

I’m not sure that I’ve thought of myself as a data warrior before, but I did teach the Social Security Minister how to use Instagram the other week! I say that partly as I have an appeal and a plea for you… The First Minister has a huge set of followers on Twitter, but I’m stuck just below 18k… Maybe you are the audience to take me over that line!

There’s a lot I want to cover in terms of the excitement of this event. We have a strong reputation and record in Scotland. With responsibility for the budget and internationalisation, this is really exciting. I’m particularly enthused by the international representation including Brazil, Singapore, USA, and Ireland too. This event allows us to put the spotlight on data science in Scotland. It is a natural place for people to come and do business. And this is a great event with business leaders here, with experience to share with others.

Our government, Scottish Enterprise and Data Lab are working together to build innovation and business in Scotland. We are fortunate in Scotland to have world class data resources. Scotland has Universities, 5 of which are in the top 100, and we have 70% of reseach rated as excellent in the last REF. We can feel this group. Data Driven Innovation has the potential to deliver £20bn value to Scotland in the next five years. This buzz can be harnessed to make Scotland the Data Capital in Europe. I paricularly support the growth in FinTech. Many people describe themselves as disruptors – that would have once been seen as a negative but is now a real positive, about opening new opportunities. And data helps us deliver our work, one example of which is the Cancer Challenge which is helping us understand how best to use our resources for the best outcomes.

The Scottish Government Innovation Action Plan seeks to build a sustainable economy, with skills crucial to that, including funding for business growth, innovation, etc. We’ve also launched the Scottish Digital Academy and the Data Science Accellerator to look at how things are changing, to innovate working methods – such as CivTech’s innovative models. We are really serious about business growth, the economy and skills. We have invested in innovation, education and internationalisation. We are the strongest part of the UK outside London and the SouthEast.

So, the Scottish Government supports your enthusiasm for data, for what can be done with data. High tech, low carbon is the future we see that, and we want to be country welcome in Europe and the rest of the world – we don’t support the UK government’s view on Europe.

I commend your work and hope that you have a fruitful and enjoyable time here. And we hope the collaboration of our agencies helps to bear fruit now and in the future.

Improving Transparency In The Extractives Industry Using Data Science – Erin Akred, Lead Data Scientist, DataKind

I am a data scientist from DataKind where we harness data for the improvement of humanity. We exist to use data to see the kind of world we want to see. The challenge we face is that many not for profits, charities, government agencies etc. do not have the resources to do the types of datascience that the private sector (e.g. Netflix) can. So we link pro bono data scientists with organisations with a social mission.

Last year we did a project looking at automating detecting mines from earth observation imagery. We are used to using this data for other purposes, but this is a challenging problem. I will talk more about this but I wanted to talk more about DataKind.

Our founder, Jake, was working at the New York Times on data science, and saw people volunteering and attending hack events at the weekend, giving back on their talents… So he thought perhaps I could partner with a mission driven organisation, could I organise a similar event and make this happen… He started DataKind and we’ve been developing what we can offer these mission-driven organisations who also want to benefit from Data Science. So we now pair data scientists with mission driven projects. We have over 18k community members worldwide, 6 chapters in 5 countries (US, Bangalore, Singapore, Dublin, London, San Francisco, Washington DC), we have chapter applicants in 40+ global cities; 228 events worldwide; and we’ve worked on over 250 projects generating about $20m value generated in volunteer effort.

On example project has been with the Omidyar Network to look at data science solutions that might enable social actors to operate more effectively and efficiently in their efforts to combat corruption in the extractives industry. Now we don’t start with the data that is out there. Our funders really want impact, and we think of that as impact per dollar. So, anyway, the context of this work was illegal mining which can cause conflict in Eastern Demographic Republic of Congo, it includes poor environmental outcomes, and social challenges. As data scientists we partner with other organisations to ensure we know how to get value out of data insights.

To understand illegal mining we have to know where it is taking place. So we did work on machine learning from images. We worked with Global Forest Watch and IPIS.

Now, not all of our projects are successful… Usually projects fails because of issues in:

  • Problem statement – a well thought through problem statement is really important.
  • Datasets
  • Data Scientists
  • Funding
  • Subject Matter Expertise
  • Social Actors

Now, I spoke to someone last night who has run lots of Kaggle projects – crowdfunded data science challenges. Now in those projects you have data, data scientists but you don’t have subject matter experts – and that’s crucisl knowledge and skills to have on board. For instance when looking at malaria, there was a presumption that mosquito nets would be helpful, but the way they work looks like a shrine, like death… And they don’t want to sleep in them. So they used them as fiishing nets.

When we work with an organisation we do want a data set, but we also want an organisation open to seeing what the data reveals, not trying to push a particular agenda. And we have subject matter experts that add crucial context and understanding of the data, of any risks or concerns with the data as well.

We start with, e.g.:

We want to create image classification models

Using publicly available earth satellite imagery

So that those owrking in the transparancy sector can be made aware of irregular mining activity

So that they can improve environmental and conflict issues due to mining. 

Some of the data we use is open – and a lot of data I’ve work with is open – but also closed data, data generated by mission-driven organisational apps, etc.

And the data scientists on these projects are at the top of their game, who these organisations could not afford to work with or recruit earlier.

So, for this project we used a random forest analyser on the data, to find mine locations. We had had generated training data for this project which determined that we can pick out where illegal mining work has occured with good accuracy.

To find out more and get involved – and I’d encourage you to do that – go to:


Q1) Where do you see DataKind going?

A1) We do a lot with not a lot of money. I had assumed that DataKind was 100 people when I joined, it was less than 10. I would love to see this model replicated in other countries. And conferences… Bringing volunteer data scientists together with providers enables us to increase the opportunity for these things to happen. Bringing these people together, those conferences are rich experiences that amplify the impact of what we are doing.

Q2) For the mining project you can access the data online. The US Federal Government is hosting the data, and we used Google Earth engine in this work.

From Analytics To AI: Where Next For Government Use Of Data? – Eddie Copeland, Director of Government Innovation, Nesta

I’ve been talking to anyone who will listen over the last 5 years about the benefits of public sector data. We have been huge proponents of using open data, but often data has been released in a vague hope that someone else might do something with it. And we have the smart cities agenda, generating even more data that often we have no idea how to use. But there is a missing link there… The idea that public organisations should be the main consumer of their own data, for improving their own practice.

Now you’ll have read all those articles asking if data is the new “oil”, the new “fuel”, the new “soil”! I don’t much care about the analogy but the key thing is that data is valuable. Data enables the public sector to work better, it enables many of the tried and tested ways of working better. Doing more and better with less. But that’s hard to do. For a public sector organisation with lots of amazing data on opportunities and challenges in my area, but not the next door area, how can I understand that bigger picture. We can target resources to the most vulnerable areas, but we need data to tell us where those are. Without visibility across different organisations/parts of the public sector (e.g. in family and child services), how can that data be used to understand appropriate support and intervention?

Why do we focus on data issues? Well, there is a technology challenge as so many public sector organisations have different IT services. And you have outrageous private sector organisations who charge the public sector to access their own data – they should be named and shamed. Even when you get the data out the format can be inconsistent, it’s hard to use. Then there is what we can do with the data – we often urge on the side of caution, not what is useful. Historically the main data person in public sector organisations was the “data protection officer” – the clue is in the title!  It takes an organisational leap to collaborate on issues where that makes sense.

I used to work for a think tank and I got bored of that, I really wanted to be part of a “do tank”, to actually put things into action. And I found this great organisation called Nesta and we have set up the London Office of Data Analytics:

  • an impactful problem – it takes time, backing, support you have to have a problem that matters
  • a clearly defined intervention – what would you do differently if you had all the information you could want about the problem you want to solve (data science is not the innovation)
  • what is the information asset you would need to undertake that intervention?
  • what intervention do you need to undertake to solve that issue?

So when we looked at London the issue that seemed to fit these criteria was unlicensed Houses of Multiple Occupancy, and how we might predict that. We asked housing officers how they identified these properties, we looked at what was already known, we looked at available information around those indicators. And then developing machine learning to predict those unlicensed HMOs – we are now on the third version of that.

We have also worked on a North East Data Pilot to join up data across the region to better understand alcohol harms. But we didn’t know what intervention might be used, which has made this harder to generate value from.

And we are now working on the Essex Centre for Data Analytics, looking at the issue of modern slavery.

Having now worked through many of these examples, we’ve found that data is the gateway drug to better collaboration between organisations. Just getting all the different players in the room, talking about the same problem in the same way, is hugely valuable. And we see collaborations being set up across the place.

So, things we have learned:

  1. Public sector leaders need to create the space and culture for data to make a difference – there is no excuse for not analysing the data, and you’ll have staff who know that data and just need the excuse to focus and work on this.
  2. Local authorities need to be able to link their own data – place based and person based data.
  3. We need consistent legal advice across the public sector. Right now lots of organisations are all separately getting advice on GDPR when they face common issues…

So, what’s next? Nesta is an innovation organisation. There is excitement about technologies of all types. For this audience AI probably is overhyped but nonetheless that has big potential, particularly algorithmic decision making out in the field. Policy makers talk about evidence based decision making, but AI can enable us to take that out into the field. Of course algorithms could do great things, but we also have examples that are bad… Companies hiring based on credit records is not ok. Public sector bodies not understanding algorithmic bias is not ok. For my own part I published 10 principles for a code of conduct for public sector organisations to use data centres – I’d love your feedback at

It is not OK to use AI to informa a decision if the person using it could not reasonable understand its basic objectives, function and limitations. We would face a total collapse of trust that could set us back a decade. And we’ve seen over the last week what that could mean.


Q1) Aren’t the problems you are talking about are surely people problems?

A1) Public organisations are being asked to do more with less, and that makes it difficult for that time to be carved out to focus on these challenges, that’s part of why you need buy in and commitment at senior level. There is a real challenge here about finding the right people… The front line workers have so much knowledge but you have organisations who

Q2) Your comment that you have to understand the AI, GDPR require a right to explanation to use of data and that’s very hard to do unless automated.

A2) Yes, that’s a really untested part of GDPR. If local authorities buy in data they have to understand where that data is from, what data is being used and what that means. In the HMO example local front line staff can look at those flags from the prediction and add their own knowledge of the context of, for instance, a local landlord’s prior record. But that understanding of how to use and action that data is key.

Data Driven Business. It’s Not That Hard.- Alex Depledge, Founder,, Former CEO

That’s a deliberately provocative title – I knew that this would be a room full of intellectuals and I’m going to bring back down to earth. I’m known for setting up, and I think it’s fitting that I am following Eddie talking about the basics and the importance of getting the basics right. So many companies that say they are running a data driven business, and they are not… Few are actually doing this.

I started my professional life at Accenture. I met my co-founder there. About 7 years into our friendship she emailed me and said “I’ve got it. I need a piano teacher, I’ve been Googling for four hours, we need a place to find music teachers”. And I said “that’s a rubbish idea”. And then I needed a wysteria trimmed… And we decided we wanted to build a marketplace for local services… We had a whole idea, a powerpoint deck, and thought that great, we’ll get a team in India or Singapore to build it… Sounded great, but nothing happened.

And then Jules quit her well paid job and she said “it’s ok, I’ve brought a book!” – and it was a Ruby on Rails book… She started coding… And she built a thing. And that led to us going through a Springboard process… We had some data but I was trying to pull in money. We were attracting some customers, but not a lot of service providers… We were driven by intuition or single conversations… So one day I said that I’m quitting and going back to the day job… And I was frustrated… And a collague said “maybe we should look at what the data says?”… And so they looked. And they found that 1 in 4 people coming to the website wants a cleaner. And we were like “holy shit!”. Because we didn’t have any cleaners. So we threw away what we had, we set up a three page site. We went all in so you could put a postcode in, find a cleaner, and book them. We got 27 bookings, then double that… And we raised some funding – £250k just when we desperately needed it. We found cleaners, we scaled up, we got much bigger investment. And we scaled up to 100 people.

Then we really turned into a data driven business, building what people want, try it, check the data, iterate. Our VC at Axel pushed us to use mobile… We weren’t convinced. We checked the data that actually people booked cleaners from their desk at lunchtime. At our pinnacle we moved 10k cleaners around London at one point. We had to look at liquidity and we needed cleaners to have an average of 30 hours of work per week… too few and cleaners weren’t happy, too high and jobs weren’t taken up. So at 31 hours we’d start recruiting.

From there we looked at expansion and what kind of characteristics were needed. We needed cities like a donut – clients in the middle, cleaners at the outside. We grew but then we got some unwanted attention and chose to sell. For £32 million. And the company that brought us had 80 engineers.. And they migrated 16 countries onto our platform which had been built by 8 engineers.

So, we sold our business…. And I thought I’m not going to do that again…

And then I wanted a new kitchen… So I had an architect in… spent £@500… 45 days later I got plans… and 75 days later I had an illustration of how it would look so I could make a decision. And so I started Resi, the first online architect. And it took me just 4 months to be convinced that this could be a business. We set up a page of what we thought we might do. I spent £10 per day on Facebook A/B testing ads. And we’ve had a huge amount of business…. We wanted to find the sweet spot for achitects and how long the work would take. Again we needed to know how much time was needed for each customer. So 3 hours is our sweet spot. Our business is now turning over £1 million a year after one year. And only one person works with data, he also does marketing. He looked at our customers and when they convert and how our activities overlaid. After 10 days we weren’t following up, and adding some intervention (email/text etc.) tripled our conversions.

We’ve also been able to look at hotspots across the UK, and we can target our marketing in those areas, and also understand that word of mouth… We can take advantage of that.

I’m a total data convert. I still don’t like spreadsheets. Data informs our decisions – not quite every decision as instinct matters too. But every piece of data analysis we did was doable in a spreadsheet by someone in high school… It doesn’t take machine learning, or AI, or big data. Even simple analysis can create tremendous results.


Q1) What next?

A1) I always said I didn’t want to dine out on one story… Like Hassle. But I don’t know the end for Resi yet… Invite me back in a few years!n

Q1) The learning for a few hours of work was huge.

A1) Our entire business was based on a single piece of analysis – what were our customers looking for led to £32m.

The AI Race: Who’s Going To Win? – Vicky Brock (VB – chairing), CEO, Get Market Fit; Alex Depledge (AD), Founder, Former CEO; Joel KO (JK), Founding CEO, Marvelstone Ventures; Chris Neumann (CN), Early Stage Investor

CN: I’m a recovering entrepreneur. As an investor I’ve had a global purview on what’s going on in the AI race. And I think it’s interesting that we see countries and areas which haven’t always been at the cutting edge of technology, really finding the opportunities here. Including Edinburgh.

JK: We are funders based in Singapore and investing in FinTech. The AI technology has been arising… I’m hoping to invest in AI start ups and incubators.

AD: You already know who I am. In my brief hiatus between companies I was an entrepreneur in residence in Index Ventures, and I saw about 300 companies come in saying they were doing AI or Machine Learning so I have some knowledge here. But also knowing a leading professor in data ethics I don’t care who wins, but I care that Pandora isn’t let out of her box until governments have a handle on this because the risks are great.

VB: I’m a serial entrepreneur around data. And machine learning or AI can kind of be the magic words for getting investment. There is obvious hype here… Is it a disruptor?

CN: I’ve seen a lot of companies – like Alex – say they use ML or AI… In some ways its the natural progression from being data driven. I do think there will be an incredible impact on society over the next 10 years from AI. But I don’t think it will be the robots and tech from science fiction, it will probably be in more everyday ways.

VB: Is AI the key word to get funding…

JK: I see many AI start ups… But often actually it’s a FinTech start up… But they present themselves that way as funders like to hear that… There is so much data… And AI does now spread into data lives… Entrepreneurs see AI as a way to sell themselves to investors.

VB: At one stage it was “big data” then “AI” but you’ve had some little data… What did you see when you were entrepreneur in residence?

AD: No disrespect to investors but they focus on financials and data, but actually I’d often be asking about what was happening under the bonnet… So if they were were using machine learning, ask about that, ask about data sets, ask where it’s coming from… But often they do interesting data work but it’s a good algorithm or calculation… It’s not ML or AI. And that’s ok – that’s something I wanted to bring out in my presentation.

VB: What’s looking exciting now?

CN: We see really interesting organisations starting to do fascinating work with AI and ML. I focus on business to business work, but that often looks less exciting to others. So I am excited about an investment I’ve made in a company using BlockChain to prove GDPR compliance. I spoke with a cool company here using wearables and AI for preventing heart attacks, which is really amazing.

JK: I have been here almost a week, met start ups, and they were really really practical. They have the sense to make a revenue stream from the technology. And these very new start ups have been very interesting to me personally.

VB: You’ve started your next company, did you cross lots of ideas off first…

AD: Jules and I had a list of things we wouldn’t do… Chris talked about B2B… We talked about not doing large scale or consumer ideas. We whittled our list of 35 ideas down to 4 each and they were all B2B… But they bored us. We liked solving problems we’ve experienced. My third business I hope will be B2B as getting to £10m is a bit more straightforward than in B2C.

VB: AI requires particular skillsets… How should we be thinking about our skillsets and our talents.

CN: Eddie talked earlier about needing to know what the point in. It can be easy to get lost in the data, to geek out… And lose that focus. So Alex just asking that question, finding out who gives a damn, that’s really important. You have to do something worthwhile to somebody, there’s no point doing it .

JK: With AI… In ten years… Won’t be coding. AI can code itself. So my solution is that you should let your kids play outside. In Asia lots of parents send kids to coding schools… They won’t need to be engineers… Parents’ response to the trend is too early and not thought through…

AD: I totally agree. Free play and imagination and problem solving is crucial. There aren’t enough women in STEM. But you can over focus on STEM. It’s data and digital literacy from any angle, it could be UX, marketing, product management, or coding… In London we hav ethis idea that everyone should be coding, but actually digital literacy is the skills we need to close. And actually that comes down to basic literacy and numeracy. It’s back to basics to me.

VB: I’d like to make a shout out for arts and social sciences graduates. We learn to ask good questions…

AD: Looking at recent work on where innovation comes from, it comes from the intersectionality of disciplines. That’s when super exciting stuff happens…


Q1) Mainly for Alex… I’m machine learning daft… And I love statistics. And I know the value of small scale statistics. And the value of machine learning and large scale data – not so much AI. How do you convey that to business people?

AD) We don’t have a stand out success in the UK. But with big corporates I tell them to start small.. Giving engineers space to play, to see what is interesting… That can yield some really interesting results. You can’t really show people stuff, you need to just try things.

VB) Are you trying to motivate people to use data in your company?

JK) Yes, with investors you see patterns… I tell kids to start start ups as early as possible… So they can fail earlier… Because failures then lead to successful businesses next time.

CN) A lot of folk won’t be aware that for many organisations there is a revenue stream around innovation… It’s a really difficult thing to try to bring in innovative practices into big organisations, or collaborate with them, without squishing that. There are VCs and multinationals who will charge you a lot of money to behave like a start up… But you can just start small and do it!

The Revolutionary World Of Data Science – Passing On That Tacit Knowledge! – Shakeel Khan, Data Science Capability Building Manager, HM Revenue & Customs

I’ve been quite fortunate in my role in that I’ve spend quite a lot of time working with both developed and developing economies around data science. There is huge enthusiasm across the world from governments. But there is also a huge fear factor around rogue players, and concerns about the singularity – machines exceeding humans’ capabilities. But there are genuine opportunities there.

I’ve been doing work in Pakistan, for DFID, where they have a huge problem with Dengy Fever. They have tracked the spread with mobile phone data, enabling them to contain it at source. That is saving lives. That’s a tremendous outcome. Closer to home, John Bell at Cambridge University has described AI as the saviour of our health services, as AI can enable us to run our services more effectively and more economically.

In my day job at HMRC, you can’t underestimate what the work that we do enables in terms of investment in the country and its services.

I want to talk about AI at three stages: Identify; Adopt; Innovate.

In terms of data science and what is being done around the world… The United Arab Emirates have set up their Ministry of AI and a 2031 Articificial Intelligebce Strategy. We have the Alan Turing Institute looking at specific problems but across many areas, some really interesting work there. In Edinburgh we have the amazing Data Lab, and the research that they are doing for instance with cancer, and we have the University of Edinburgh Bayes Centre. Lots going on in the developed world. But what about the developing world? I’ve just come back from Rwanda, who had a new Data Revolution Policy. I watched a TED talk a few weeks back that emphasised that what is not needed in sub0-saharan Africa is help, what they need is the tools and means to do things themself.

Rwanda is a hugely progressive country. They have more women in parliament (62.8%) than any country in the world. Their GDP is $8.3bn. They have a Data Revolution Policy. They are at the start of their journey. But they are trying to bring tacit knowledge in, to leapfrog development… Recognising the benefit of that tacit knowledge and of those face to face engagements.

For my role I am split about 50/50 between international development and work for HMRC. So I’ll say a bit more about the journey for developed economies…

Defining Data Science can be quite abstract. You have to make a benefits case, to support the vision, to share a framework and some idea of timeline, with quick wins, to build teams, to build networks. Having a framework allows organisations to build capabilities in a manageable way…

A new Data Science Centre going up in Kigali, Rwanda, will house 200 data scientsists – thats a huge commitment.

The data science strategic framework is about data; people skills; cultural understanding and acceptance – with senior buy in crucial for that… And identifying is also about data ethics, skills development – we have been developing frameworks for years that we can now share. For Rwanda we think we can reduce the time to develop data capabilities from maybe 5 years to perhaps 3. Similarly in Pakistan.

When you move to the adopt phase… You really need to see migrationa cross sectors. I started my career in finance. When I came to HMRC I did a review of machine learning and how that was being used, how that machine learning was generating benefit. We managed to bring in £29 bn that would otherwise be lost, partly through machine learning. One machine learning model can, effetively, bring in tens or hundreds of millions of pounds so they have to be well calibrated and tested. So, I developed the HMRC Predictive Analytics Handbook (from June 2014), which we’ve shared across HMRC but also DWP, across collaeagues in government.

In terms of Innovate, it is about understanding the field and latest developments. However HMRC are risk averse, so we want to see where innovation has worked elsewhere. So I did some work with Prof David Hand at Imperial College London about 20 years ago, and I got back in touch, and we developed a programme of data science learning. Not about Imperial providing training, it was a partnership between HMRC and Imperial. We looked closely at the curriculum and demonstrate value added, and look at how we could innovate what we do.

University of Edinburgh Informatics is a really interesting one. I read a document a few years ago by the late Prof. Jon Oberlander about the way that the academic and public and private sectors working together could really benefit the Scottish economy. Two years of work led to a programme in natural language processing that was the result of close collaboration in HMRC. Jon Oberlander was hugely influential, and passionate about conversational technology and the scourge of isolation. And was able to ask lots of questions about AI, and when that will be truly conversational. I hope to continue that work with Bayes, but also wanted to say thank you to Jon for that.

AI is increasingly touching our lives. Wherever we are in the world, sharing our tacit knowledge will be incredibly important.


Q1) Rwanda has clearly made a deep impression. What were the most suprising things?

A1) People have stereotypes about sub saharan Africa that just aren’t true. For instance when you get off the plane you cannot take plastic bags in – they are an incredibly environmental country. I saw no litter anyway in the country. The people of Rwanda are truly committed to improving the lives of people.

Q2) Do you use the same machine learning methods for low income and high income tax payers/avoiders?

A2) There are some basic machine learning methods that are consistent, but we are also looking at more novel models like boosted trees.

Q3) I worked in Malawi and absolutely back up your comment about the importance of visiting. You talked about knowledge from yourself to Rwanda, how was the knowledge exchange the other way?

A3) Great question. It wasn’t learning all from developed to developing. We learnt a great deal from our trip. That includes cultural aspects. I terms of the foundations of data science, we in the UK have used machine learning in financial services and retail for 30 – 40 years, that isn’t really achievable in these countries at the moment and there it is learning going from developed to developing.

Closing comments – Maggie Philbin

I’ve been reflecting on the (less serious) ways data might influence my life. My son in law is in a band (White Lies) and that has given me such an insight into how the music industry use data – the gender and age of people who access your music, whether they will go to gigs etc. And in fact I was very briefly in a band myself during my Swap Shop days… We made a mock up Top of the Pops… Kids started writing in… And then BBC records decided to put it out… We had long negotiations about contracts… But I was sure no-one would buy it… It reached number 15… So we went from parodying Top of the Pops to being on Top of the Pops. And thank you to Scotland – we made number 9 here! But I hadn’t negotiated hard – we just got 0.5%. And if we’d had that data understanding that White Lies have, who knows where we would have been.

So, day one has been great. Thank you to The Data Lab, and to all the sponsors. And now we adjourn for drinks.

 March 22, 2018  Posted by at 10:53 am Events Attended, LiveBlogs Tagged with: , ,  No Responses »