Today I am back at the Data Fest Data Summit 2018, for the second day. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too!
I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly.
Intro to Data Summit Day 2 – Maggie Philbin
We’ve just opened with a video on Ecometrica and their Data Lab supported work on calculating water footprints.
I’d like to start by thanking our sponsors, who make this possible. And also I wanted to ask you about your highlights from yesterday. These include Eddie Copeland from Nesta’s talk, discussion of small data, etc.
Data Science for Societal Good — Who? What? Why? How? – Kirk Borne, Principal Data Scientist and Executive Advisor, Booz Allen Hamilton
Data science has a huge impact for the business world, but also for societal good. I wanted to talk about the 5 i’s of data science for social good:
So, number one is Interest. The data can attract people to engage with a problem. Everything we do is digital now. And all this information is useful for something. No matter what your passion, you can follow this as a data scientist. I wanted to give an example here… My background is astrophysics and I love teaching people about the world, but my day job has always been other things. About 20 years ago I was working in data science at NASA and we saw an astronomical – and I mean it, we were NASA – growth in data. And we weren’t sure what to do with it, and a colleague told me about data mining. It seemed interesting but I just wasn’t getting what the deal was. We had a lunch talk from a professor at Stanford, and she came in and filled the board with equations… She was talking about the work they were doing at IBM in New York. And then she said “and now I’m going to tell you about our summer school” – where they take inner city kids who aren’t interested in school, and teach them data science. Deafening silence from the audience… And she said “yes, we teach the staff data mining in the context of what means most for these students, what matters most.” And she explained: street basketball. So IBM was working on a software called IBM Advanced Calc specifically predicting basketball strategy. And the kids loved basketball enough that they really wanted to work in math and science… And I loved that, but what she said next changed my life.
My PhD research was on colliding galaxies. It was so exciting… I loved teaching and I was so impressed with what she had done. These kids she was working with had peer pressure not to be academic, not to study. This school had a graduation rate of less than 50%. Their mark of success for their students was their graduation rate – of 98%. I was moved by that. I felt that if this data science has this much power to change lives, that’s what I want to do for the rest of my life. So my life, and those of my peers, has been driven by passion. My career has been as much about promoting data literacy as anything else.
So, secondly, we have insight. Traditionally we collect some data points but we don’t share this data, we are not combining the signals… Insight comes from integrating all the different signals in the system. That’s another reason for applying data to societal good, to gain understanding. For example, at NASA, we looked at what could be combined to understand environmental science, and all the many applications, services and knowledge that could be delivered and drive insight from the data.
Number three on this list is Inspiration. Inspiration, passion, purpose, curiosity, these motivate people. Hackathons, when they are good, are all about that. When I was teaching, the group projects where the team members were all the same did the worst and were the least interesting. When the team is diverse in the widest sense – people who know nothing about Python, R, etc. can bring real insights. So, for example my company run the “Data Science Bowl” and we tackle topics like Ocean Health, Heart Health, Lung Cancer, drug discovery. There are prizes for the top ten teams, this year there is a huge computing prize as well as a cash prize. The winners of our Heart Health challenge were two Wall Street Quants – they knew math! Get involved!
Next, innovation. Discovering new solutions and new questions. Generating new questions is hugely exciting. Think about the art of the possible. The XYZ of Data Science Innovation is about precision data, precision for personalised medicine, etc.
And fifth, ignition. Be the spark. My career came out of looking through a telescope back when I lived in Yorkshire as a kid. My career has changed, but I’ve always been a scientist. That spark can create change, can change the world. And big data, IoT and data scientists are partners in sustainability. How can we use these approaches to address the 17 Sustainable Development Goals? And there are 229 Key Performance Indicators to measure performance – get involved. We can do this!
So, those are the five i’s. And I’d like to encapsulate this with the words of a poet…. Data scientists – and that’s you even if you don’t think you are one yet. You come out of the womb asking questions of the world. Humans do this, we are curious creatures… That’s why we have that data in the first place! We naturally do this!
“If you want to build a ship, don’t drum up people to gather wood and don’t assign them tasks and work, but rather teach them to yearn for the vast and endless sea”
– Antoine de Saint-Exupery.
This is what happened with those kids. Teach people to yearn for the vast and endless sea, then you’ll get the work done. Then we’ll do the hard work.
Slides are available here: http://www.kirkborne.net/DataFest2018/
Comment, Maggie Philbin) I run an organisation, TeenTech, and that point that you are making, of starting where the passion actually is, is so important.
KB) People ask me about starting in data science, and I tell them that you need to think about your life, what you are passionate about and what will fuel and drive you for the rest of your life. And that is the most important thing.
Q1) You touched on a number of projects, which is most exciting?
A1) That’s really hard, but I think the Data Bowl is the most exciting thing. A few years back we had a challenge looking at how fast you can measure “heart ejection fraction – how fast the heart pumps blood out” but the way that is done, by specialists, could take weeks. Now that analysis is built into the MRI process and you can instantly re-scan if needed. Now I’m an astronomer but I get invited to weird places… And I was speaking to a conference of cardiac specialists. A few weeks before my doctor diagnosed me with a heart issue…. And that it would take a month to know for sure. I only got a text giving me the all clear just before I was about to give that talk. I just leapt onto that stage to give that presentation.
The Art Of The Practical: Making AI Real – Iain Brown, Lead Data Scientist, SAS
I want to talk about AI and how it can actually be useful – because it’s not the answer to everything. I work at SAS, and I’m also a lecturer at Southampton University, and in both roles look at how we can use machine learning, deep learning, AI in practical useful ways.
We have the potential for using AI tools for good, to improve our lives – many of us will have an Alexa for instance – but we have to feel comfortable sharing our data. We have smart machines. We have AI revolutionising how we interact with society. We have a new landscape which isn’t about one new system, but a whole network of systems to solve problems. Data is a sellable asset – there is a massive competitive advantage in storing data about customers. But especially with GDPR, how is our data going to be shared with organisations, and others? That matters for individuals, but also for organisations. As data scientists there is the “can” – how can the data be used; and the “should” – how should the data be used. We need to understand the reasons and value of using data, and how we might do that.
I’m going to talk about some examples here, but I wanted to give an overview too. We’ve had neural networks for some time – AI isn’t new but dates back to the 1950s. Machine learning came in in the 1980s, deep learning in the 2010s, and cognitive computing now. We’ve also had Moore’s Law changing what is theoretically possible but also what is practically feasible over that time. And that brings us to a definition “Artificial Intelligence is the science of training systems to emulate human tasks through learning and automation”. That’s my definition, you may have your own. But it’s about generating understanding from data, that’s how AI makes a difference. And these systems have to help the decision making process. That has to be something we can utilise.
Automation of process through AI is about listening and sensing, about understanding – that can be machine generated but it will have human involvement – and that leads to an action being made. For instance we are all familiar with taking a picture, and that can be looked at and understood. For instance with a bank you might take an image of paperwork and passports… Some large banks check validity of clients with a big book of pictures of blacklisted people… Wouldn’t it be better to use systems to achieve that. Or it could be a loan application or contract – they use application scorecards. The issue here is interpretability – if we make decisions we need to know why and the process has to be transparent so the client understands why they might have been rejected. You also see this in retail… Everything is about the segment of one. We all want to be treated as individuals… How does that work when you are one of millions of individuals. What is the next thing you want? What is the next thing you want to click on? Shop Directory, for instance, have huge ranges of products on their website. They have probably 500 pairs of jeans… Wouldn’t it be better to apply their knowledge of me to filter and tailor what I see? Another example is the customer complaint on webchat. You want to understand what has gone wrong. And you want to intervene – you may even want to do that before they complain at all. And then you can offer an apology.
There are lots of applications for AI across the board. So we are supporting our customers on the factors that will make them successful in AI, data, compute, skillset. And we embed AI in our own solutions, making them more effective and enhancing user experience. Doing that allows you to begin to predict what else might be looked at, based on what you are already seeing. We also provide our customers with extensible capabilities to help them meet their own AI goals. You’ll be aware of Alpha Go, it only works for one game, and that’s a key thing… AI has to be tailored to specific problems and questions.
For instance we are working on a system looking at optimising the experience of watching sports, eliminating the manual process of tagging in a game. This isn’t just in sport, we are also working in medicine and in lung cancer, applying AI in similar 3D imaging ways. When these images can be shared across organisations, you can start to drive insights and spot anomalies. It’s about collaborating, bringing data from different areas, places where an issue may exist. And that has social benefit for all of us. Another fun example – with something like wargaming you can understand the gamer, the improvements in gameplay, ways to improve the mechanics of how game play actually works. There has to be an intrinsic and extrinsic agreement to use that data to make that improvement.
If you look at a car insurer and the process and stream of that, that’s typically through a call centre. But what if you take a picture of the car as a way to quickly assess whether that claim will be worth making, and how best to handle that claim.
I value the application, the ways to bring AI into real life. How we make our experiences better. It’s been attributed to Voltaire, and also to Spiderman, that “with great power comes great responsibility”. I’d say “with great data power comes great responsibility” and that we should focus on the “should” not the “could”.
Comment) A correction on Alpha Go: Alpha Zero plays Chess etc. It’s without any further human interaction or change.
Q1) There is this massive opportunity for collaboration in Scotland. What would SAS like to see happen, and how would you like to see people working together?
A1) I think collaboration through industry, alongside academia. Kirk made some great points about not focusing on the same perspectives but on the real needs and interest. Work can be siloed but we do need to collaborate. Hack events are great for that, and that’s where the true innovation can come from.
Q2) What about this conference in 5 years time?
A2) That’s a huge question. All sorts of things may happen, but that’s the excitement of data science.
Socially Minded Data Science And The Importance Of Public Benefits – Mhairi Aitken, Research Fellow, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh
I have been working in data science and public engagement around data and data science for about eight years and things have changed enormously in that time. People used to think about data as something very far from their everyday lives. But things have really changed, and people are aware and interested in data in their lives. And now when I hold public events around data, people are keen to come and they mention data before I do. They think about the data on their phones, the data they share, supermarket loyalty cards. These may sound trivial but I think they are really important. In my work I see how these changes are making real differences, and differences in expectations of data use – that it should be used ethically and appropriately but also that it will be used.
Public engagement with data and data science has always been important but it’s now much easier to do. And there is much more interest from funders for public engagement. That is partly reflecting the press coverage and public response to previous data projects, particularly NHS data work with the private sector. Public engagement helps address concerns and avoid negative coverage, and to understand their preferences. But we can be even more positive with our public engagement, using it to properly understand how people feel about their data and how it is used.
In 2016 myself and colleagues undertook a systematic review of public responses to sharing and linking of health data for research purposes (Aitken, M et al 2016 in BMC medical ethics, 17 (1)). That work found that people need to understand how data will be used, they particularly need to understand that there will be public benefit from their data. In addition to safeguards, secure handling, and a sense of control, they still have to be confident that their data will be used for public benefits. They are even supportive if the benefit is clear but those other factors are faulty. Trust is core to this. It is fundamental to think about how we earn public trust, and what trust in data science means.
Public trust is easy to define. But what about “public benefit”? Often when people talk about data and the benefits from data, they will talk about things like Tesco Clubcard – there is a direct tangible benefit there in the form of vouchers. But what is the public benefit in a broader and less direct sense? When we ask about public benefit in the data science community we often talk about economic benefits to society through creating new data-driven innovation. But that’s not what the public think about. For the public it can be things like improvements to public services. In data-intensive health research there is an expectation of data leading to new cures or treatments. Or that there might be feedback to individuals about their own conditions or lifestyles. But there may be undefined or unpredictable potential benefits to the public – it’s important not to define the benefits too narrowly, but still to recognise that there will be some.
But who is the “public” that should benefit from data science? Is that everyone? Is it local? National? Global? It may be as many as possible but what is possible and practical? Everyone whose data is used? That may not be possible. Perhaps vulnerable or disadvantaged groups? Is it a small benefit for many, or a large benefit for a small group. Those who may benefit most? Those who may benefit the least? The answers will be different for different data science projects. That will vary for different members of the public. But if we only have these conversations within the data science community we’ll only see certain answers, we won’t hear from groups without a voice. We need to engage the public more with our data science projects.
So, closing thoughts… We need to maintain a social license for data science practices and that means continual reflection on the conditions for public support. Trust is fundamental – we don’t need to make the public trust us, we have to actually be trustworthy and that means listening, understanding and responding to concerns, and being trustworthy in our use of data. Key to this is finding public benefits of data science projects. In particular we need to think about who benefits from data science and how benefits can be maximised across society. Data scientists are good at answering questions of what can be done but we need to be focusing on what should be done and what is beneficial to do.
Q1) How does private industry make sure we don’t leave people behind?
A1) Be really proactive about engaging people, rather than waiting for an issue to occur. Finding ways to get people interested. Making it clear what the benefits are to people’s lives. There can be cautiousness about opening up debate being a way to open up risk. But actually we have to have those conversations and open up the debate, and learn from that.
Q2) How do we put in enough safeguards that people understand what they consent to, without giving them too much information or scaring them off with 70 checkboxes.
A2) It is a really interesting question of consent. Public engagement can help us understand that, and guide us around how people want to consent, and what they want to know. We are trying to answer questions where we don’t always have the answers – we have to understand what people need by asking them and engaging them.
Q3) Many in the data community are keen to crack on but feel inhibited. How do we take the work you are doing and move sooner rather than later.
A3) It is about how we design data science projects. You do need to take the time first to engage with the public. It’s very practical and valuable to do at the beginning, rather than waiting until we are further down the line…
Q3) I would agree with that… We need to do that sooner rather than later rather than being delayed deciding what to do.
Q4) You talked about concerns and preferences – what are key concerns?
A4) Things you would expect on confidentiality, privacy, how they are informed. But also what is the outcome of the project – is it beneficial or could they be discriminatory, or have a negative impact on society? It comes back to causing public benefits – they want to see outcomes and impact of a piece of work.
Automated Machine Learning Using H2O’s Driverless AI – Marios Michailidis, Research Data Scientist, H2O.ai
I wanted to start with some of my own background. And I wanted to talk a bit about Kaggle. It is the world’s biggest predictive modelling competition platform with more than a million members. Companies host data challenges and competitors from across the world compete to solve them for prizes. Prizes can be monetary, or participation in conferences, or you might be hired by companies. And it’s a bit like Tennis – you gain points and go up in the ranking. And I was able to be ranked #1 out of a half million members there.
So, a typical problem is image classification. Can I tell a cat from a dog from an image? That’s very doable, you can get over 95% accuracy and you can do that with deep learning and neural nets. And you differentiate and classify features to enable that decision. Similarly a typical problem may be classifying different bird song from a sound recording – also very solvable. You also see a lot of text classification problems… And you can identify texts from a particular writer by their style and vocabulary (e.g. Voltaire vs Moliere). And you see sentiment analysis problems – particularly for marketing or social media use.
To win these competitions you need to understand the problem, and the metric you are being tested on. For instance there was an insurance problem where most customers were renewing, so there was more value in splitting the problem into two – one for renewals, and then a model for others. You have to have a solid testing procedure – really strong validation environment that reflects what you are being tested on. So if you are being tested on predictions for 3 months in the future, you need to test with past data, or test that the prediction is working to have the confidence that what you do will be appropriately generalisable.
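To make the point about validation concrete, here is a minimal sketch of a time-based split, assuming each record carries a date and that you are being tested on a 3 month horizon. The function name `time_based_split`, the `cutoff` date and the weekly toy data are all hypothetical, for illustration only, and not something shown in the talk.

```python
from datetime import date, timedelta


def time_based_split(rows, cutoff, horizon_days=90):
    """Train on everything before `cutoff`; validate on the next `horizon_days`.

    This mirrors being tested on future predictions: the validation rows are
    always later in time than every training row.
    """
    end = cutoff + timedelta(days=horizon_days)
    train = [r for r in rows if r["date"] < cutoff]
    valid = [r for r in rows if cutoff <= r["date"] < end]
    return train, valid


# Hypothetical data: one observation per week through 2017.
rows = [{"date": date(2017, 1, 1) + timedelta(weeks=i), "value": i} for i in range(52)]

train, valid = time_based_split(rows, cutoff=date(2017, 10, 1))
print(len(train), "training rows,", len(valid), "validation rows")
```

A random shuffle would let the model peek at the future, which tends to make validation scores look better than the real test.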
You need to handle the data well. Your preprocessing, your feature engineering, which will let you get the most out of your modelling. You also need to know the problem-specific elements and algorithms. You need to know what works well. But you can look back for information to inform that. You of course need access to the right tools – the updated and latest software for best accuracy. You have to think about the hours you put in and how you optimize them. When I was #1 I was working 60 hours on top of my day job!
Collaborate – data science is a team sport! It’s not just about splitting the work across specialisms, it’s about uncovering new insights by sharing different approaches. You gain experience over time, and that lets you focus your efforts where they will bring the best gain. And then use ensembling – combine the methods optimally for the best performance. And you can automate that…
And that brings us to H2O’s Driverless AI which automates AI. It’s an AI that creates AI. It is built by a group of leading machine learning engineers, academics, data scientists, and Kaggle Grandmasters. It handles data cleaning and feature engineering. It uses cutting edge machine learning algorithms. And it optimises and combines them. And this is all through a hypothesis testing driven approach. And that is so important as if I try a new feature or a new algorithm, I need to test it… And you can exhaustively find the best transformations and algorithms for your data. This allows solving of many machine learning tasks, and it is all in parallel to make it very fast.
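As a rough illustration of ensembling, the sketch below blends two sets of predictions with a weighted average and picks the weight by a small grid search on validation data. This is one simple form of ensembling, not necessarily the approach used in the competitions described, and the toy numbers are made up.

```python
def blend(predictions, weights):
    """Weighted average of several models' predictions, row by row."""
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


def mse(pred, actual):
    """Mean squared error between predictions and true values."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def best_blend_weight(pred_a, pred_b, actual, steps=10):
    """Grid-search w in [0, 1] so that w*A + (1-w)*B has the lowest MSE."""
    candidates = [k / steps for k in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: mse(blend([pred_a, pred_b], [w, 1 - w]), actual),
    )


# Made-up validation data: model A over-predicts, model B under-predicts.
actual = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 3.5, 4.5]
model_b = [0.5, 1.5, 2.5, 3.5]

w = best_blend_weight(model_a, model_b, actual)
blended = blend([model_a, model_b], [w, 1 - w])
print("weight for A:", w, "blended MSE:", mse(blended, actual))
```

The blend helps here because the two models make errors in opposite directions, which is the usual argument for combining diverse approaches.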
So, how does it work? Well you have some input data and you have a target variable. You set an objective or success metric. And then you need some allocated computing power (CPU or GPU). Then you press a button and H2O driverless AI will explore the data, it will try things out, it will provide some predictions and model interpretability. You get a lot of insight including most predictive insights. And the other thing is that you can do feature engineering, you can extract this pipeline, these feature transformations, then use with your own modelling.
Now, I have a minute long demo here…. where you upload data, and various features and algorithms are being tried, and you can see the most important features… Then you can export the scoring pipeline etc.
This work has been awarded Technology of the Year by InfoWorld, it has been featured in the Gartner report.
You can find out more on our website: https://www.h2o.ai/driverless-ai/ and there is lots of transparency about how this works, how the model performs etc. You can download a free trial for 3 weeks.
Q1) Do you provide information on the machine learning models as well?
A1) Once we finish with the score, we build the second model which is simple to predict that score. The focus on that is to explain why we have shown this score. And you can see why you have this score with this model… That second interpretability model is slightly less automated. But I encourage others to look online for similar – this is one surrogate model.
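The surrogate idea in that answer can be sketched in a few lines: score the data with a complicated model, then fit a second, simple model to those scores and read off its coefficients. Everything here is an assumption for illustration: `black_box_score` is a made-up stand-in for a complex model, and the plain least-squares fit is just one possible choice of simple model, not H2O's implementation.

```python
def black_box_score(x1, x2):
    """Hypothetical stand-in for a complex model we want to explain."""
    return 3.0 * x1 - 2.0 * x2 + 0.1 * x1 * x2


def fit_linear_surrogate(rows, scores):
    """Ordinary least squares for score ~ b0 + b1*x1 + b2*x2 (two features)."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(scores) / n
    # Centred sums of squares and cross-products.
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (y - my) for r, y in zip(rows, scores))
    s2y = sum((r[1] - m2) * (y - my) for r, y in zip(rows, scores))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Score a small grid of inputs with the black box, then explain the scores.
rows = [(x1, x2) for x1 in range(5) for x2 in range(5)]
scores = [black_box_score(x1, x2) for x1, x2 in rows]
b0, b1, b2 = fit_linear_surrogate(rows, scores)
print("surrogate: score ~ %.2f + %.2f*x1 + %.2f*x2" % (b0, b1, b2))
```

The signs and sizes of the surrogate's coefficients give a readable summary of what pushes the score up or down, at the cost of ignoring detail such as the interaction term.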
Q2) Can I reproduce the results from H2O?
A2) Yes. You can download the scoring pipeline, it will generate the code and environment to replicate this, see all the models, the data generated, and you can run that script locally yourself – it’s mainly Python.
Q3) That stuff is insane – probably very dangerous in the hands of someone just learning about machine learning! I’d be tempted to throw data in… What’s the feedback that helps you learn?
A3) There is a lot of feedback and also a lot of warning – so if test data doesn’t look enough like training data for instance. But the software itself is not educational on its own – you’d need to see webinars, look at online materials but then you should be in a good position to learn what it is doing and how.
Q4) You talked about feature selection and feature engineering. How robust is that?
A4) It is all based on hypothesis testing. But you can’t test everything without huge compute power. So we have a genetic algorithm that generates combinations of features, tests them, and then tries something else if that isn’t working.
Q5) Can you output a model as, e.g., a serialised JSON object? Or use it as an API?
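For readers unfamiliar with the idea, here is a toy genetic search over feature subsets: keep the best-scoring combinations, recombine and mutate them, and repeat. It is a generic sketch and not H2O's algorithm; the feature names and the `toy_score` function are invented, and in practice the score would come from a validated model.

```python
import random


def evolve_feature_sets(features, score, generations=20, pop_size=12, seed=0):
    """Search for a high-scoring, non-empty subset of `features`."""
    rng = random.Random(seed)

    def random_subset():
        chosen = frozenset(f for f in features if rng.random() < 0.5)
        return chosen or frozenset([rng.choice(features)])

    def crossover(a, b):
        # For each feature, inherit its presence from one parent at random.
        child = frozenset(
            f for f in features if f in (a if rng.random() < 0.5 else b)
        )
        return child or a

    def mutate(subset):
        # Flip one feature in or out; never return an empty subset.
        flipped = subset ^ {rng.choice(features)}
        return flipped or subset

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]  # keep the best half
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors)))
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=score)


# Invented example: two useful features, three noise features.
FEATURES = ["age", "income", "noise_1", "noise_2", "noise_3"]


def toy_score(subset):
    """Reward useful features, charge a small cost per feature used."""
    useful = len(subset & {"age", "income"})
    return useful - 0.1 * len(subset)


best = evolve_feature_sets(FEATURES, toy_score)
print(sorted(best), toy_score(best))
```

Because the best half always survives, the top score never gets worse from one generation to the next, though there is no guarantee the search finds the overall best subset.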
A5) We have various outputs but not JSON. Best to look on the website as we have various ways to do these things.
This next session showcases innovation in startups.
Matt Jewell, R&D Engineer, Amiqus
I’m an R&D Engineer at Amiqus, and also a PhD student in Law at Edinburgh University. Firstly I want to talk about Amiqus, and our mission is to make civil justice accessible to the world. And we are engaged in GDPR as a data controller, but also as a trust and identity provider – where GDPR is an opportunity for us. We created amiqusID to enable people to more easily interact with the law – with data from Companies House, driving licenses, etc.
As a PhD student in law there is some overlap in my job and my PhD research, and I was asked about data ethics. So I wanted to note GDPR Article 22 (3) which states that
“the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.”
And that’s across the board. GDPR recommits us to privacy, but also embeds privacy as a public good. And we have to think about what that means in our own best practices, because our own practices will shape what happens – especially as GDPR is still quite uncertain, still untested in law.
Carlos Labra, CEO & Co-Founder, Particle Analytics
I come from a mechanical engineering background, so this work is about simulation. And specifically we look at fluids simulation in aircraft. Actually particle simulation is the next step in industry, and that’s because it has been incredibly difficult to do this simulation with computers. We can do basic computer models for large scale materials but not appropriate for particles. So in Particle Analytics we are trying to address this challenge.
So, a single simulation for a silo, and my model for a silo, has to calculate the interactions between every single particle (in the order of millions), in very small time intervals. That takes huge computing power. So for instance one of our clients, Astec, works on asphalt dryer/mixer technology and we are using particle analytics to enable them to establish and achieve new energy-based KPIs (Key Performance Indicators) that could make enormous savings per machine per year, purely by optimising to different analytics.
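To give a feel for why this is so expensive, here is a deliberately naive sketch that checks every pair of particles for contact in 2D. It is an illustration only, not Particle Analytics' method: the function name and the sample positions are invented, and production simulation codes generally use neighbour-search structures to avoid testing every pair.

```python
from itertools import combinations
from math import hypot


def find_contacts(positions, radius):
    """Check every pair of equal-sized particles for overlap.

    Returns the touching index pairs and the number of pairs examined,
    which grows as n*(n-1)/2 with the number of particles n.
    """
    contacts = []
    checked = 0
    for (i, (xi, yi)), (j, (xj, yj)) in combinations(enumerate(positions), 2):
        checked += 1
        if hypot(xi - xj, yi - yj) < 2 * radius:
            contacts.append((i, j))
    return contacts, checked


# Invented example: four particles of radius 0.5.
positions = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.0, 0.8)]
contacts, checked = find_contacts(positions, radius=0.5)
print(contacts, "from", checked, "pair checks")

# Pair count for a million particles, for a single time step.
n = 1_000_000
pairs_per_step = n * (n - 1) // 2
print(pairs_per_step, "pairs per step with the naive approach")
```

Multiply that pair count by the number of very small time steps in a run and the need for both heavy compute and data reduction becomes clear.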
So we look at spatial/temporal filters, multiscale analysis, and reduce data size/noise. The Data operators generate new insights and KPIs. So the cost of simulation is going down, and the insights are increased.
Steven Revill, CEO & Co-Founder, Urbantide
I’m here to talk to you about our platform USmart which is making smart data. How do we do this? Well, when we started a few years ago we recognised that our businesses, organisations, and places, would be helped by artificial intelligence based on data. That requires increased collaboration around data and increasing reuse of data. Too often data is in silos, and we need to break it out and share it. But we also need to be looking at real time data from IoT devices.
So, our solution is USmart. It collects data from any source in real time, and we create value with automatic data pipelines with analytics, visualisation and AI ready. And that enables collaboration – either with partners in a closed way, or as open data.
So, I want to talk about some case studies. Firstly Smartline, which is taking housing data to identify people at risk of, or in, fuel poverty. We have 80m data points so far, and we expect to reach up to 700m+ soon. This data set is open and when it goes live we think it will be the biggest open data set in the UK.
Cycling Scotland is showing the true state of cycling, helping them to make their case for funding and gain insight.
And we are working with North Lanarkshire Council on business rates, which could lead to saving of £18k per annum, but can also identify incorrect rates of £100k+ value.
If you want to find out more do come and talk to me, take a look at USmart, and join the USmart community.
Martina Pugliese, Data Science Lead, Mallzee
I am data science lead for Mallzee – proudly established and run from Edinburgh. Mallzee is an app for clothes, allowing you to like or dislike a product. We show you 150+ brands. We’ve had 1.4m downloads, 500m ratings on products, 3m products rated. The app allows you to explore products, but it also acts as a data collection method for us and for our B2B offering to retailers. So we allow you to product test, very swiftly, your products before they hit the market.
Why do this? Well there are challenges that are two sides of the same coin: Overstock where you have to discount and waste money; and Understock where you have too little of the best stock and that means you don’t have time to make the best return on your products.
As well as gathering data, we also monitor the market for trends in pricing, discounting, something new happening… So for instance only 50.8% of new products last quarter were sold at full price. We work to help design, buying and merchandising teams improve this rate by 6-10% through customer feedback.
So, data is our backbone. For the consumer we enable discovery, we personalise the tool to you – it should save you time and money. At the same time the data also enables performance prediction. We have granular user segmentation. And it goes back to you – the best products go on the market. And long term that should have a positive environmental impact in reducing waste.
Maggie Philbin: Thank you. I’m going to ask you to feedback on each other’s ideas and work.
Carlos: I’m new to the data science world, so for me I need to learn more – and these presentations are so useful for that.
Martina: This is really useful for me, and great to see lots of different things going on.
Matt: My work focuses on smart cities, so naturally interested in Steven’s presentation. Less keen on problematising the city.
Steven: Really interesting to discuss things backstage, but also exciting to hear Martina talking about how central data is for your business right now.
Maggie: And that is one of the wonderful things about being at Data Fest, that opportunity to learn from and hear from each other, to network and share.
We are back from lunch with a video on work in the Highlands and Islands using ambient technologies to predict likelihood of falls etc.
Transforming Sectors With Data-Enabled Innovation – Orsola De Marco, Head of Startups, Open Data Institute
I’m going to talk about transforming sectors with data. The ODI, founded by Tim Berners-Lee and Nigel Shadbolt, focuses on data and what data enables. We think about data as infrastructure. If you think of data as roads you see that the number of roads does not matter as much as how they are connected… In the context of data we need data that can be combined, that is structured for connection and combination. And we look at data through open data and open innovation. What the ODI’s work has in common is that open innovation is at the core. This is not just about innovating, but also about making your organisation more porous, bringing in the outside. And I love the phrase “if you are the smartest person in the room, then you are in the wrong room”, because so often innovation comes from collaboration and from the outside.
Open innovation has huge potential value. McKinsey in 2013 predicted $3-5 trillion impact of open data; Lateral Economics (2014) puts that at more like $20 tn.
When we talk about open innovation and collaboration, we can talk about the corporate-startup marriage. We used to see linear solutions having good returns, but that is no longer the case. Problems are now much more complex, and startups are great at innovation, at thinking laterally, at finding new approaches. But corporates have scale, they have reach, and they have knowledge of their industries and markets. If you bring these two together, it’s clear you can bring a good opportunity to life.
An example I wanted to share here is Transport for London, who wanted to release open data to enable startups and SMEs to use it. CityMapper is one of the best known of the tools built on the data. Last year, after several years of open data, they commissioned a Deloitte report (2017) which found that this release had generated huge savings for TfL.
Another example is Arup. Historically their innovation had been taking place in house. They embraced a more open approach, and worked with two of our start ups Macedon C and Smart Sensors. Macedon C helped Arup explore airport data so that Arup didn’t need to do that processing. Smart Sensors installed 200 IoT sensors, sharing approaches to those sensors, what it means to implement IoT in buildings, how they could use this technology. And they rolled them out to some of their services.
Those are some examples. We’ve worked with 120 startups across the world. And they have generated over £37.2M in sales and investment. These are real businesses bringing real value – not just a guy in a shed. The major challenge is on the supply side of the data. A lot of companies are reluctant to share, mentioning three blockers: (1) it feels very risky to open data up – that issue feels highly relevant this week; (2) it’s expensive to do, especially if you don’t know the value coming back; (3) perceived lack of data literacy and skills. Those are all important… But if you lead and innovate, you get to set the tone for innovation in your sector.
The idea of disruption is raised a lot, but it is real. But to actually disrupt, and to lead, a culture of open innovation is essential. It needs to be brought in at senior level and brought into the sector.
Data infrastructure can transform sectors. And joining forces between data suppliers and users is important there. For instance we are working on a project called Open Active, with Sport England. A lack of information on what was going on in different areas was an issue for people getting active. We were involved at the outset and could see that data was the blocker here… If you tried to aggregate information it was impossible. So, in the first year of the programme we brought providers into the room, agreed an open standard, and that enabled aggregation of data. We are now in the second phase and, now that the data is consistent and available, we are bringing start ups in to engage and do things with that data. And those start ups aren’t all in sports, some are in the healthcare sector – using sports data to augment information shared by medics. And some are leisure companies helping individuals to find things to do with their spare time.
Another example is the Open Banking sector. Over 60% of UK banking customers haven’t changed their bank account in 5 years. And many of those haven’t changed them in 20 years. So this initiative enables customers to grant secure access to their banking details for e.g. mortgage lenders, or to enable marketplaces to offer energy switching companies. Our role in this programme was to facilitate these banks, and we took away that experience of data portability… And now we are working with Mexico on a FinTech law that requires all banks to have an open API.
In order to innovate in sectors it’s important to widen access to data. This doesn’t mean not taking data privacy seriously, or losing competitive advantage.
And I wanted to highlight a very local programme. Last year we began a project in the peer to peer accommodation market. The Scottish expert advisory panel noted that whilst a lot of data is generated, no real work is looking at the impact of the sharing economy in accommodation. That understanding will enable policy decisions tied to real concerns. We will be making recommendations on this very soon. If you are interested, do get in touch and be part of this.
Q1) You talked a lot about the value of data. How do you measure economic value like that?
A1) We base value on sales and investment generated, and/or time or money saved in processes. It’s not an exact science but it looks for changes to the status quo.
Q2) What is the most important and valuable thing from your experience here?
A2) I think I’ll approach that answer in two ways. We do innovative work with data but we often facilitate conversations between data providers and start ups. For those making data available we remove those blockers; for start ups it’s helping facilitate those conversations, it’s helping them grow and develop and tailoring that support.
Q3) What next?
A3) Our model is a sector transformation model. We talk to a sector about sharing and opening up, and then we have start ups in an accelerator so that data will find a use. That’s a huge difference from just publishing the data and wondering what will happen to it.
Designing Things with Spending Power – Chris Speed, Chair of Design Informatics, University of Edinburgh
I have a fantastic team of designers and developers, and brilliant students who ask questions, including what things will be like in Tomorrow’s World! We look at all kinds of factors here around data. So I want to credit that team.
Many of you in the room will be aware that data is about value constellations, rather than value chains. These are complex markets, many players – which may be humans but also which may be bots. That changes our capacity to construct value, since we have agents that construct value. And so I will talk about four objects to look at the disruption that can be made, and what that might mean, especially as they gain agency, to gain power. One of the things we thought was, what happens when we give things spending power.
See diagram from the RAND Corporation comparing centralised with decentralised and distributed – we see this model again and again… But things drift back occasionally (there’s only one internet banking platform now, right?). I’m going to show this 2014 bitcoin blockchain transaction video – they move too fast to screengrab these days! So… what happens when we have distributed machines with spending power? And when transactions go down to absolutely tiny transactions and amounts of money?
So, we run BlockExchange workshops, with Lego, to work on the idea of blockchain, what it means to be a distributed transaction system.
Next we have the fun stuff… What happens when we have things like Ethereum… And smart contracts? What could you do with digital wallets? If the UN gives someone a digital password, do they need sovereignty? So, we undertake bodily experiments with this stuff. We ran a physical experiment – body storming – with bitcoin wallets and smart contracts… A bit like Pokemon Go but with cash – if you hit a hotspot the smart contract assigns you money, or when you enter a sink, you lose bitcoin. So, here is video of our GeoCoin app and also an experiment running in Tel Aviv.
These three banking volunteers decide to design a new type of cinema experience… They enter the cinema by watching two trailers that are pickupable in the street… Another colleague decides not to do this… They gain credit by tweeting about trailers… Bodystorming allows new ideas to be developed (confusingly, there is no cinema… This is, er, a cinema of the mind – right Chris?).
Next we have a machine with a bitcoin wallet. Programmable money allows us to give machines buying power… Blockchain changes the history to things, adding value to value… So, we set up a coffee machine Bitbarista, with an interface that asks the coffee drinker to make decisions about what kind of coffee they want, what values matter… Mediating the space between values and value.
We have hairdryers – these are new and have just gone to the Policy Unit this week. We have Gigbliss Plus hairdryer… That allows you to buy and trade energy and to dry your hair when energy is cheaper… What happens when you do involve the public in balancing energy? And we have another hairdryer… That asks whether you want unethical energy now, or whether you want to wait for an ethical source – the hairdryer switches on accordingly. And then we have Gigbliss Auto, which has no buttons. You don’t have control, only the bitcoin wallet has decision powers… You don’t know when it comes on… But it will. But it changes control. Of those three hairdryers, which are we happy to move to… Where do we feel happy here?
And then we have KASH cups, with chips in them. You can only buy coffee when you put two cups down. So you get credit, through the cup’s digital wallet, to encourage networking and development. You don’t have to get coffee – you can build up credit. We had free coffee in the other room… But we had a very fancy barista for the KASH cups, and people queued for this for 20 minutes – coffee with social value.
Questions for us… We give machines agency, and credit… What does that mean for value and how we balance value?
Maggie: It’s at this point I wish Tomorrow’s World still existed!
Q1) Where is this fascinating work taking you?
A1) I think this week has been so disruptive in terms of data and technology’s disruption of social, civic, political values. I think understanding that we can’t balance value, or fair trade, etc. on our own is helpful and I’m really excited by what bots can offer here…
Q2) I was fascinated by the hairdryers… I’ve been in the National Grid’s secret control room and seeing that, that thing of Eastenders finishes and we make a cup of tea means bringing a whole power station on board… But waiting 10 minutes might avoid that need. It’s not trivial it’s huge.
A2) Yes, and I think understanding how that waiting, or understanding consequences of actions would have a real impact. The British public are pretty conscious and ethical I think, when they have that understanding…
Q3) Have you thought about avoiding queues with blockchain?
A3) We don’t want to just play incentives to get people out of queues. People are there for different reasons, different values, some people enjoy the sociability of a queue… Any chance to open it up, smash it up, and offer the opportunity to co-construct is great. But we need to do that with people not just algorithms.
Maggie: At this point I should be introducing Cathy O’Neil, but she has been snowed in by 15 inches of snow on the East Coast of the US. So, she will come over at a later date and you’ll all be invited. So, in place of that we have a panel on the elephant in the room, the Facebook and Cambridge Analytica scandal, with a panel on data and ethics.
Panel session: The Elephant in the Room: What Next? – Jonathan Forbes (JF), CTO, Merkle Aquila (chair); Brian Hills (BH), Head of Data, The Data Lab; Mark Logan (ML), Former COO Skyscanner, Investor and Advisor to startups and scale ups; Mhairi Aitken (MA), Research Fellow, University of Edinburgh.
JF: So, thinking of that elephant in the room… That election issue… That data use. I want to know what Facebook could have done better?
ML: It has taken them a long time to respond, which seems strange… But I see it as a positive really. They see this as a much bigger issue rather than the transactional elements here. In that room you look at risk and you look at outrage. I think Facebook were trying to figure out why outrage was so high, I think that’s what has surprised them. I think they took time to think about what was happening to them. I don’t think it’s just about electing a game show host to president… The outrage is different. Cambridge Analytica is a bad actor, not just on data but on their advocacy for other problematic tactics. Facebook shouldn’t be bundled into that. I think one aspect here is that you have a monopoly. Facebook is an advertising company – they need to generate data and pass it onto app developers. Those two things don’t totally align. And I think the outrage is about trust and expectation of users.
JF: You are closest to the public in your research. The share price is dropping significantly right now… How, based on past experience, do you see this playing out?
MA: I’m used to talking to people about public sector use of data. Often people talk about Facebook data and make two points: firstly that they contribute their own data and control that and know how it’s used; but they also have very high expectations of use for public sector organisations and don’t have that for private sector organisations – they think someone will generate ads and profit but when used in politics that’s very different, and that changes expectations.
JF: I enjoyed your comment about the social license… and I think this may be a sign that the license is being withdrawn. The GDPR legislation certainly changes some things there. I was interested to see Tim Berners-Lee’s response, taking Mark Zuckerberg’s perspective… I was wondering, Brian, about the commercial pressures and the public pressures here. Are they balancing that well?
BH: No. When we look back I think this will be a pivotal moment. I kind of feel like the GDPR piece is like being in a medieval torture chamber… We have a countdown but the public don’t know much about it. With Facebook it’s like we have a firework in the sky and people are asking what on earth is going on… And we have an opportunity to have a discussion about the use of data. As we leave today we have a challenge around communicating our work with data, what are our responsibilities here. The big data thing, many business cases seem like we’ve failed – we’ve focused on the technology and only that. And I feel we now have an opportunity and a window here.
JF: I’d like to take the temperature of the room… How many of you had Facebook on their phone, and don’t this week? None.
ML: I think that’s the point. The idea of not doing to others’ data what you wouldn’t want done to your own… But the reality is that legislation is playing catch up to practice. Commercially it’s hard to do the right thing. I think Mark Zuckerberg has reasonably good intentions here… But we have this monopoly… The parallel here is banking. And monopoly legislation hasn’t kept pace with the monopolies we have. I think it would be great if you could export your data, friends’ data, etc. to another platform. But we can’t.
Comment: I think you asked the wrong question… Who here doesn’t have Facebook on their phone at all? Actually quite a lot. I think actually we have that sense that power corrupts and absolute power corrupts absolutely. And I don’t feel I’m missing out, I’m sure others feel that too. And I’m unsurprised about Facebook, I could see where it was going.
JF: OK, so moving towards what we can do, should we have a code of conduct, a Hippocratic oath for data, a “do no harm”?
BH: I don’t see ethics featuring in data models. I think we have to build that in. Cathy O’Neil talks about Weapons of Math Destruction… We have to educate our data science students how to use these tools ethically, to think about who they will work with. Cathy was a Quant and didn’t like that so she walked away. We have to educate our students about the choices they make. We talk about optimisation, optimisation of marketing. In optimising STEM stuff… And we are missing stuff… I think we need to move towards STEAM, where A is for Arts. We have to be inclusive for arts and humanities to work with these teams, to think about skills and diversity of skills.
JF: Particularly thinking about healthcare
MA: There is increasing drive to public engagement, to public response. That has to be much more at the heart of training for data scientists and how it relates to the society we want to create. There can be a sense of slowing momentum, but it’s fundamental to getting things right, and shaping directions of where we are going…
JF: Mark, you mentioned trust, and your organisation has been very focused on trust.
ML: These multifaceted networks are built on trust. For Skyscanner trust was so much more important than favouring particular clients. I think Facebook’s error has been to not be more transparent in what they do. We have had comments about machine learning as hype, but actually machine learning is about machines learning to do something without humans. We are moving to a place where decisions will be made by machines. We have to govern that, and to police machines with other machines. And we have to have algorithms to ensure that machine learning is appropriate and ethical.
JF: I agree. It was interesting to me that Weapons of Math Destruction is the top seller in algorithms and programme – a machine generated category – but it is reassuring that those working in this space are reading about this. By show of hands how many here working in data science are thinking about ethics? Some are. But it’s unclear who isn’t working with data, or who isn’t working ethically. So, to finish I want your one takeaway for this week.
BH: I think it’s up to us to decide how to do things differently, and to make the change here. If we are true data warriors driving societal benefit then we have to make that change ourselves.
ML: We do plenty to mess up the planet. I think machine learning can help us sort out the problems we’ve created for ourselves.
MA: I think it’s been a wonderful event, particularly the variety and creativity being shared. And I’m really pleased to open up these conversations and look at these issues.
JF: I’m optimistic too. But don’t underestimate the ability of a small group of committed people to change the world. So, Data Warriors, all of you… You know what to do!
Maggie: Thank you all for your conversation, your enthusiasm. One message I really want to give you is that when you look at the use of data, the capacity to do good… The vast majority of young people are oblivious. They could miss out on an amazing career. But as the world changes, they could miss out on a decent career without these skills. Don’t underestimate your ability as one person with knowledge of that area to make a difference, to influence and to inspire. A few years back, in Greenock, we ran an event with Teen Tech and the support of local tech companies made all the difference… One team went to the finals in London, won and went to Silicon Valley… And that had enormous impact on that school and community, and now all S2 students do that programme, local companies come in for a Dragon’s Den type set up. Any moment that you can inspire and support those kids will make all the difference in those lives, and can make all the difference, especially if family, parents, community don’t know about data and tech.
Closing Comments – Gillian Docherty, CEO, The Data Lab
Firstly thank you to Maggie for being an amazing host!
I have a few thank yous to make. It has been an outstanding week. Thank you all for participating in this event. This has been just one event of fifty. We’ve had another 3000 data warriors, on top of you 450 data warriors for Data Summit. Thank you to our amazing speakers, and exhibitors. The buzz has been going throughout the event. Thank you to our sponsors, and to Scottish Government and Scottish Enterprise. Thank you to our amazing volunteers, to Grayling who has been working with the press. To our venue, events team and caterers. Our designer from two fifths design. And the team at FutureX who helped us organise Data Talent and Data Summit – absolutely outstanding job! Well done!
And two final thank yous. Firstly the amazing Data Lab team. We have thousands of new people being trained, huge numbers of projects. I also want to specifically mention Craig Skelton who coordinated our Fringe events; Cecilia who runs our marketing team; and Fraser and John who were behind this week!
My final thank you is to all of you, including the teams across Scotland participating. It is a fantastic time to be working in Scotland! Now take that enthusiasm home with you!