Oct 07 2016
 

PS-15: Divides (Chair: Christoph Lutz)

The Empowered Refugee: The Smartphone as a Tool of Resistance on the Journey to Europe – Katja Kaufmann

For those of you from other continents: we had a great number of refugees coming to Europe last year, from Turkey, Syria, etc., travelling towards Germany and Sweden – and Vienna, where I am from, was also a hub. Some of these refugees had smartphones, and the (right-wing) press covered this, criticising the group's ownership of devices, but it was not clear how many had smartphones or how they were being used – and that is what I wanted to look at.

So we undertook interviews with refugees to see if and how they used them. We were researching empowerment by mobile phones, following Svensson and Wamala Larsson (2015) on the role of the mobile phone in transforming the capabilities of users, and with reference to N. Kabeer (1999), A. Sen (1999), etc. on the meanings of empowerment in these contexts. Smith, Spence and Rashid (2011) describe mobiles and their networks altering users' capability sets, and phones increasing access to flows of information (Castells 2012).

So, I wanted to identify how smartphones were empowering refugees through: gaining an advantage in knowledge from the experiences of other refugees; sensory information; cross-checking information; and capabilities to oppose the actions of others.

In terms of an advantage in knowledge, refugees described gaining knowledge from previous refugees through reports, routes, maps, administrative processes, warnings, etc. This was through social networks, and Facebook groups in particular. So, a male refugee (aged 22) described learning which people smugglers could be trusted and which could not. And another (same age) felt that smartphones were essential to being able to get to Europe – because you can find information, plan, check, etc.

So, there was retrospective knowledge here, but also engagement with others during their own refugee experience, and with those ahead of them on the journey. This was mainly via WhatsApp. A male refugee (aged 24) described being in Macedonia and speaking to refugees in Serbia to find out the situation there. This was particularly important last year when approaches were changing and border access changed on an hour-by-hour basis.

In terms of applying sensory abilities, this was particularly manifested in identifying their own GPS position – whilst crossing the Aegean or woods, finding the road with their GPS, or identifying routes and maps. They also used GPS to find other refugees – friends, family members… Using location-based services was also very important as they could share data elsewhere – sending their GPS location to family members in Sweden, for instance.

In terms of cross-checking information and actions, refugees were able to track routes whilst in the hands of smugglers. A male Syrian refugee (aged 30) checked information every day whilst with people smugglers, to make sure they were being taken in the right direction – he wanted to head west. But it wasn't just routes; it was also weather conditions and rumours – for example cross-checking the weather before entering a boat. A female Syrian refugee downloaded an app to check conditions, ensure her smuggler was honest, and make her trip safer.

In terms of opposing the actions of others, this was about being capable of resisting orders of authorities, potential acts of (police) violence, risks, fraud attempts, etc. There was also disobedience by knowledge – the Greek government gave orders about the borders, but smartphones allowed annotated map sharing which let those orders be disobeyed. And there was access to timely information – exchange rates, for example – one refugee described negotiating the price of changing money down by Google searching for the rate. Opposition was also about having a means to apply pressure – threatening with, or publishing, photos. A male refugee (aged 25) described holding up phones to threaten to document police violence, and that was impactful. Some refugees also took pictures of people smugglers as a form of personal protection and information exchange, with publication of the images held as a threat in case of mistreatment.

So, in summary the smartphones

Q&A

Q1) Did you have any examples of privacy concerns in your interviews, or was this a concern for later perhaps?

A1) Some mentioned this; some felt some apps and spaces are more scrutinised than others. There was concern that others may have been identified through Facebook – a feeling rather than proof. One said that she does not send her parents any pictures in case she was mistaken by the Syrian government for a fighter. But mostly privacy wasn't an immediate concern; access to information was – and that was very successful.

Q2) I saw two women in the data here, were there gender differences?

A2) We tried to get more women but there were difficulties there. On the journey they were using smartphones in similar ways – but I did talk to them and they described differences in use before their journey and talked about picture taking and sharing, the hijab effect, etc.

Social media, participation, peer pressure, and the European refugee crisis: a force awakens? – Nils Gustafsson, Lund University, Sweden

My paper is about receiving/host nations. Sweden took in 160,000 refugees during the crisis in 2015. I wanted to look at this as it was a strange time to live through. A lot of people started coming in late summer and early autumn… Numbers were rising. At first the response was quite enthusiastic and welcoming in host populations in Germany, Austria and Sweden. But as it became more difficult to cope with larger groups of people, things changed and there was organising to address the challenge.

And the organisation will remind you of Alexander (??) on the “logic of collective action” – where groups organise around shared ideas that can be joined, almost a brand, e.g. “refugees welcome”. And there were strange collaborations between government, NGOs, and then these ad hoc networks. But there was also a boom and bust aspect here… In Sweden there were statements about opening hearts, of not shutting borders… But people kept coming through autumn and winter… By December, Denmark, Sweden, etc. did a 180-degree turn, closing borders. There were border controls between Denmark and Sweden for the first time in 60 years. And that shift had popular support. And I was intrigued by this. This work is all part of a longer three-year project on young people in Sweden and their political engagement – how they choose to engage, how they respond to each other. We draw on Bennett & Segerberg (2013), social participation, social psychology, and the notion of “latent participation” – where people are waiting to engage and so just need asking in order to mobilise.

So, this is work in progress and I don’t know where it will go… But I’ll share what I have so far. And I tried to focus on recruitment – I am interested in when young people are recruited into action by their peers. I am interested in peer pressure here – friends encouraging behaviours, particularly important given that we develop values as young people that have lasting impacts. But also information sharing through young people’s networks…

So, as part of the larger project, we have a survey, and we added some specific questions about the refugee crisis to it. We asked, “you remember the refugee crisis, did you discuss it with your friends?” – 93.5% had, which was not surprising as it is a major issue. When we asked if they had discussed it on social media it was around 33.3% – much lower, perhaps due to the controversy of the subject matter; this number was also similar in the 16-25 year old age group.

We also asked whether they did “work” around the refugee crisis – volunteering or work for NGOs and traditional organisations. Around 13.8% had. We also asked about work with non-traditional organisations and 26% said that they had (in the 16-25 age group it was 29.6%), which seems high – but we have nothing to compare this to.

Colleagues and I looked at open Facebook refugee groups in Sweden, which I scraped (n=67) and coded as either groups set up by NGOs, churches, mosques and other traditional organisations, or as networks… Looking across the autumn and winter of 2015, posts to the traditional groups looked consistent over time, but there was a major spike in posts to the networks around the crisis.

We have also been conducting interviews in Malmö, with 16-19 and 19-25 year olds. They commented on media coverage, and the degree to which the media influences them, even with social media. Many commented on volunteering at the central station, receiving refugees. Some felt it was inspiring to share stories, but others talked about their peers doing it because of peer pressure, and commented critically about “bragging” in Facebook posts. Then as the mood changed, the young people talked about going to the central station being less inviting, about fewer Facebook posts… about feeling that “maybe it’s ok then”. One of our participants was from a refugee background and…

Q&A

Q1) I think you should focus on where interest drops off – there is a real lack of research there. But on the discussion question, I wasn’t surprised that only 30% discussed the crisis there really.

A1) I wasn’t too surprised either here as people tend to be happier to let others engage in the discussion, and to stand back from posting on social media themselves on these sorts of issues.

Q2) I am from Finland, and we also helped in the crisis, but I am intrigued at the degree of public turnaround as it hasn’t shifted like that in Finland.

A2) Yeah, I don’t know… The middleground changed. Maybe something Swedish about it… But also perhaps to do with the numbers…

Q2) I wonder… There was already a strong anti-immigrant movement from 2008, I wonder if it didn’t shift in the same way.

A2) Yes, I think that probably is fair, but I think how the Finnish media treated the crisis would also have played a role here too.

An interrupted history of digital divides – Bianca Christin Reisdorf, Whisnu Triwibowo, Michael Nelson, William Dutton, Michigan State University, United States of America

I am going to switch gears a bit with some more theoretical work. We have been researching internet use and how it changes over time – from a period where there was very little knowledge or use of the internet to the present day. I’ll give some background, then talk about survey data – though that is an issue in itself… I’ll be talking about quantitative survey data as it’s hard to find systematic collections of qualitative research instruments that I could use in my work.

So we have been asking about internet use for over 20 years… And right now I have data from Michigan, the UK, and the US… I have also just received further data from South Africa (this week!).

When we think about digital inequality, the idea of the digital divide emerged in the late 1990s – there was government interest, data collection, academic work. This was largely about the haves vs. have-nots; on vs. off. We saw a move to digital inequalities (Hargittai) in the early 2000s… Then it went quiet, aside from work by Neil Selwyn in the UK, and by Helsper and Livingstone… But the discussion has moved on to skills…

Policy-wise we have also seen a shift… Lots of policies around the digital divide up to around 2002, then a real pause as there was an assumption that the problems would be solved. Then, in the US at least, Obama refocused on the divide from 2009.

So, I have been looking at questionnaires from the Michigan State of the State Survey (1997-2016); questionnaires from the Digital Future Survey in the US (2000, 2002, 2003, 2014); questionnaires from the Oxford Internet Surveys in the UK (2003, 2005, 2007, 2009, 2013); the Hungarian World Internet Project (2009); and the South African World Internet Project (2012).

Across these data sets we have looked at the questionnaires and the frequency of particular questions on use, lack of use, etc. When internet penetration was lower there was a lot of explanation in the questions, but we have shifted away from that, assuming that people understand what the internet is… and we’ve never returned to it. We’ve shifted to questions about devices, but don’t ask much beyond that. We used to ask about the number of hours online… but that increasingly made less sense – the answer is essentially “all day” – so we do that less, shifting instead to how frequently people go online.

Now the State of the State Survey in Michigan is different from the other data here – all the others are World Internet Project surveys but SOSS is not looking at the same areas, as they are not necessarily internet researchers. In Hungary (2009 data) similar patterns of question use emerged, but with a particular focus on mobile use. The South African questionnaire was very different – they ask how many people in the household are using the internet; we ask about the individual but not others in the house, or others coming to the house. South Africa had around 40% internet penetration (at least in 2012, when we have data), so that is a very different context. There they ask about lack of access and use, and the reasons for that. We ask about use/non-use rather than reasons.

So there is a gap in the literature, and a need for both quantitative and qualitative methods here. We also need to consider other factors, particularly technology itself being a moving target – in South Africa they ask about internet use and also Facebook, as people don’t always identify Facebook as internet use. Indeed, so many devices are connected now – maybe we need…

Q&A

Q1) I have a question about the questionnaires – do any ask about costs? I was in Peru and there was a lack of connections, but phones often offer free WhatsApp and free Pokémon Go.

A1) Only the South African one asks that… It’s a great question though…

Q2) You can get Pew questionnaires and also Ofcom questionnaires from their websites. And you can contact the World Internet Project directly… And there is an issue with people not knowing if they are on the internet or not – increasingly you ask a battery of questions and then filter on that – e.g. if you use email you get counted as an internet user.

A2) I have done that… Trying to locate those questionnaires isn’t always proving that straightforward.

Q3) In terms of instruments – maybe there is a need to develop more nuanced questionnaires there.

A3) Yes.

Levelling the socio-economic playing field with the Internet? A case study in how (not) to help disadvantaged young people thrive online – Huw Crighton Davies, Rebecca Eynon, Sarah Wilkin, Oxford Internet Institute, United Kingdom

This is about a scheme called the “Home Access Scheme” and I’m going to talk about why we could not make it work. The origin here was a city council initiative – they came to us. DCLG (2016) data showed 20-30% of the population were below the poverty line, and we knew around 7-8% locally had no internet access (known through survey responses). The players here were researchers, local government, schools, and an (unnamed) ISP.

The aim of the scheme was to raise attainment in GCSEs, to build confidence, and to improve employability skills. The schools had a responsibility to identify students in need, to procure laptops, memory sticks and software, and to provide regular, structured in-school pastoral skills and opportunities – not just in computing class. The ISP was to provide set-up help, technical support, and free internet connections for 2 years.

This scheme has been running for two years, so where are we? Well, we’ve had successes: preventing arguments and conflict; helping with schoolwork and job hunting; saving money; and improving access to essential services – partly because cost-cutting by local authorities has moved transactions online, like bidding for council housing, repeat prescriptions, etc. There was also some intergenerational bonding as families shared interests. Families commented on the success and opportunities.

We did 25 interviews, 84 one-to-one sessions in schools, 3 group workshops, 17 ethnographic visits, plus many more informal meet-ups. So we have lots of data about these families, their context, their lives. But…

Only three families had consistent internet access throughout. Only 8 families are still in the programme. It fell apart… Why?

Some schools were so nervous about use that they filtered and locked down their laptops. One school used the scheme money to buy teacher laptops, and gave students old laptops instead. Technical support was a low priority. Lead teachers left, delegated, or didn’t answer emails. There was very narrow use of digital technology, no in-house skills training, very little cross-curriculum integration, and a lack of ICT classes after year 11. And no matter how often we asked, we got no data from the schools.

The ISP didn’t set up connections, didn’t support the families, and didn’t do what they had agreed to. They tried to bill families, and one was threatened with debt collectors!

So, how did this happen? Well maybe these are neoliberalist currents? I use that term cautiously but… We can offer an emergent definition of neoliberalism from this experience.

There is a neoliberalist disfigurement of schools: teachers under intense pressure to meet auditable targets; the scheme’s students subject to a range of targets used to problematise a school’s performance – exclusions, attendance, C grades; the scheme shuffled down the priorities; ICT not deemed academic enough under the Govian school changes; and learning stripped back to a narrow range of subjects and focused towards these targets.

There were effects of neoliberalism on the city council: targets and a “more for less” culture; the scheme disincentivised; erosion of the authority of democratic institutions – schools beyond authority control, and high turnover of staff.

There were neoliberalist practices at the ISP: commodifying philanthropy; an inability to treat families as anything other than customers. And there were dysfunctional mini-markets: they subcontracted delivery and set-up; they subcontracted support; and they charged for support and charged for internet even when they couldn’t help…

Q&A

Q1) Is the problem here not digital divides but divides more broadly… Any attempt to overcome class separation and marketisation is working against the attempts to fix this issue here.

A1) We have a paper coming and yes, there were big issues here for policy and a need to be holistic… We found parents unable to attend parents’ evening due to shift work, and nothing in the school processes to accommodate this. And the measure of poverty for children is “free school meals”, but many do not want to apply as it is stigmatising, and many don’t qualify even on very low incomes… That leads to children and parents being labelled disengaged or problematic.

Q2) Isn’t the whole basis of this work neoliberal though?

A2) I agree. We didn’t set the terms of this work…

Panel Q&A

Q1/comment) RSE and access

A1 – Huw) Other companies the same

Q2) Did the refugees in your work Katja have access to Sim cards and internet?

A2 – Katja) It was a challenge. Most downloaded maps and resources… And actually they preferred Apple to Android as the GPS is more accurate without an internet connection – that makes a big difference in the Aegean Sea, for instance. So refugees shared SIM cards, and used power banks for energy.

Q3) I had a sort of reflection on Nils’ paper and where to take this next… It occurs to me that you have quite a few different arguments… You have this survey data, the interviews, and then a different sort of participation from the Facebook groups… I have students in Berlin looking at the boom and bust – and I wondered whether that Facebook group work might be worth connecting up to that type of work – it seems quite separate from the youth participation section.

A3 – Nils) I wasn’t planning on talking about that, but yes.

Comment) I think there is a really interesting aspect of these campaigns and how they become part of social media and the everyday life online… The way they are becoming engaged… And the latent participation there…

Q3) I can totally see that, though challenging to cover in one article.

Q4) I think it might be interesting to talk to the people who created the surveys to understand motivations…

A4) Absolutely, that is one of the reasons I am so keen to hear about other surveys.

Q5) You said you were struggling to find qualitative data?

A5 – Katja) You can usually download quantitative instruments, but that is harder for qualitative instruments including questions and interview guides…

XP-02: Carnival of Privacy and Security Delights – Jason Edward Archer, Nathanael Edward Bassett, Peter Snyder, University of Illinois at Chicago, United States of America

Note: I’m not quite sure how to write up this session… So these are some notes from the more presentation parts of the session and I’ll add further thoughts and notes later… 

Nathanael: We have prepared three interventions for you today and this is going to be a kind of gallery exploring space. And we are experimenting with wearables…

Fitbits on a Hamster Wheel and Other Oddities, oh my!

Nathanael: I have been wearing a Fitbit this week… but these aren’t new ideas… People used to have beads for counting, and there are self-training books for wrestling published in the 16th century. Pedometers were conceived of in Leonardo da Vinci’s drawings… These devices are old, and tie into ideas of posture, and mastering control of our physical selves… And we see the pedometer being connected with regimes of fitness – like the Manpo-Meter (“10,000 steps meter”, 1965). This narrative takes us to the 1970s running boom and the idea of recreational discipline. And now the world of smart devices… Wearables are taking us to biometric analysis as a mental model (Neff – preprint).

So, these are ways to track, but what happens with insurance companies, with those monitoring you? At Oral Roberts University students have to track their fitness as part of their role as students. What does that mean? I encourage you all to check out “Unfit Bits” – interventions to undermine tracking. Or we could, rather than going to the gym with a Fitbit, give it to Terry Crews – he’s going anyway! – and he could earn money… Are fitness slaves in our future?

So, use my FitBit – it’s on my account

And so, that’s the first part of our session…

?: Now, you might like to hear about the challenges of running this session… We had to think about how to make things uncomfortable… But then how do you get people to take part? We considered a man-in-the-middle site that was ethically far too problematic! And no one was comfortable participating in that way… which certainly raises the privacy and security issue… But as we talk of data as a proxy for us… As internet researchers a lot of us are more aware of privacy and security issues than the general population, particularly around metadata. But this would have been one day… I was curious whether people might have faked their data for that one-day capture…

Nathanael: And the other issue is why we are so much more comfortable sharing information with Fitbit, and other sharing platforms – faceless entities – versus people you meet at a conference… And we didn’t think about a gender aspect here… We are three white guys and we are less sensitive to that data being publicised rather than kept private. Men talk about how much they can bench press… but personal metadata can make you feel under scrutiny.

Me: I wouldn’t want to share my data and personal data collection tools…

Borrowing laptop vs borrowing phone…

?: In the US there have been a few cases where Fitbits have been submitted as evidence in court… But that data is easier to fake… In one case a woman claimed to have been raped, and they used her Fitbit data to suggest that…

Nathanael: You talked about not being comfortable handing someone your phone… It is really this black box… Is it a wearable? It has all that stuff, but you wear it on your body…

??: On cellphones there is FOMO – Fear Of Missing Out… What you might miss…

Me: Device as security

Comment: Ableism embedded in devices… I am a cancer survivor and I first used step counts as part of a research project on chemotherapy and activity… When I see a low step day on my phone now… I can feel this stress of those triggers on someone going through that stress…

Nathanael: Fitbits vibrate when you have/have not done a certain number of steps… trying to put you in an ideological state apparatus…

Jh: That nudge… That can be good for the able-bodied… But if you can’t move, that is a very different experience… How does that add to their stress load?

Interperspectival Goggles

Again looking at the condition of virtuality – Hayles 2006(?)

Vision is constructed… Thinking of higher resolution… From small phone to big phone… Lower resolution to higher resolution TV… We have spectacles, quizzing glasses and monocles… And there is the strange idea of training ourselves to see better (William Horatio Bates, 1920s)… and of emotional state interfering with how you do something… Then we have optometry and x-rays as a concept of seeing what could not be seen before… And you have special goggles and helmets… like the idea of the Image Accumulator in Videodrome (1983), or the Memory recorder and playback device in Brainstorm (1983). We see embodied work stations – the Da Vinci Surgery Robot (2000) – divorcing what is seen from what is in front of the operator…

There are also playful ideas: binocular football; the Decelerator Helmet; Meta-perceptional Helmet (Cleary and Donnelly 2014); and most recently Google Glass – what is there and also extra layers… Finally we have Oculus Rift and VR devices – seeing something else entirely… We can divorce what we see from what we are perceiving… We want to swap people’s vision…

1. Raise awareness about the complexity of electronic privacy and security issues.

2. Identify potential gaps in the research agenda through playful interventions, subversions, and moments of the absurd.

3. Be weird, have fun!

Mathias

“Cell phones are tracking devices that make phone calls” (Appelbaum, 2012)

I am interested in IMSI catchers, which masquerade as wireless base stations, prompting phones to communicate with them. They are used by police, law enforcement, etc. They can be small and handheld, or they can be drone-mounted. And they can track people, including people in crowds – if you know someone is there you can scan for them specifically. So, these tools are simple, disruptive and problematic, especially in activism contexts.

But these tools are also capable of capturing transmitted content, and all the data in your phone. These devices are problematic and have raised all sorts of issues about their use – who uses them and how. I’d like to think of this a different way… Is there a right to protest? And to protest anonymously? We do have anti-masking laws in some places – that suggests no right to anonymous protest. But that’s still a different privacy right – covering my face is different from participating at all…

Protests are generally about a minority persuading a majority about some sort of change. There is no legal right to protest anonymously, but there are lots of protected anonymous spaces. In the 19th century there was a big debate on whether or not the voting ballot should be anonymous – democracy is really the C19th killer app. There is a lovely quote about “The Australian system” by Bernheim (1889) and the introduction of anonymous voting. It wasn’t brought in to preserve privacy. At the time politicians bought votes – buying a keg of beer or whatever – and anonymity was there to stop that, not to preserve individual privacy. But Jill Lepore (2008) writes about how our forebears considered casting a “secret ballot” to be “cowardly, underhanded and despicable”.

So, back to these devices… There can be an idea that “if you have nothing to fear, you have nothing to hide”, but many of us understand that it is not true. And this type of device silences uncomfortable discourse.

Mathias Klang, University of Massachusetts Boston

Q1) How do you think that these devices fit into the move to allow law enforcement to block/“switch off” the cameras on protestors’/individuals’ phones?

A1) Well, people can resist these surveillance efforts, and you will see subversive moves. People can cover cameras, conceal devices, etc. But with these devices it may be that the phone becomes unusable, requiring protestors to disable phones or leave them at home… And phones are really popular and well used for coordinating protests.

Bryce Newell, Tilburg Institute for Law, Technology, and Society

I have been working on research in Washington State, working with law enforcement on license plate recognition systems and public disclosure law, and looking at what you can tell from the data. So, here is a map of license plate data from Seattle, showing vehicle activity. In Minneapolis, similar data being released led to mapping of the governor’s registered vehicles…

The second area is about law enforcement and body cameras. Several years ago peaceful protestors at UC Davis were pepper-sprayed. Even in the cropped version of that image you can see a vast number of phones out, recording the event. And indeed there are a range of police surveillance apps that allow you to capture police encounters without that being visible on the phone, including: ACLU Police Tape; Stop and Frisk Watch; OpenWatch; CopRecorder2. Some of these apps upload the recording to the cloud right away to ensure capture. And there have certainly been a number of incidents, from Rodney King to Oscar Grant (BART), Eric Garner, Ian Tomlinson, Michael Brown. Of these only the Michael Brown case featured law enforcement with bodycams. There has been a huge call for more cameras on law enforcement… During a training meeting some officers told me “Where’s the direct-to-YouTube button?” and “If citizens can do it, why can’t we also benefit from the ability to record in public places?”. There is a real awareness of control and of citizen videos. I also heard a lot about there being “a witch hunt about to begin…”.

So, I’m in the middle of focused coding on police attitudes to body cameras. Police are concerned that citizen video is edited, out of context, distorting. And they are concerned that it doesn’t show the wider context – when recording starts, perspective, the wider scene, the fact that provocation usually occurs before filming. But there are also issues of control, immediate physical interaction, framing, disclosure, and visibility – around their own safety, and around how visible they are on the web. They don’t know why something is being recorded, or where it will go…

There have been a number of regulatory responses to this challenge: (1) restrict collection – not many of these, usually budgetary and rarely about privacy; (2) restrict access – going back to the Minneapolis case, within two weeks of the map of the governor’s vehicles being published in the paper there was an exemption to public disclosure law, which is now permanent for this sort of data. In the North Carolina protests recently the call was “release the tapes” – and they released only some – then the cry was “release all the tapes”… But on 1st October the law changed to again restrict access to this type of data.

But different states provide different access. Some provide access. In Oakland, California, data was released on how many license plates had been scanned. In Seattle, because the data covers many scans of one licence plate over 90 days and is quite specific, you can almost figure out the householder. But granularity varies.

Now, we do see body camera footage of sobriety tests, foot chases, and a half-hour-long interview with a prostitute that discloses a lot of data. Washington shares a lot of video to YouTube. We see police in Rotterdam, in the Netherlands, doing this too.

But one patrol officer told me that he would never give his information to an officer with a camera. Another noted that police choose when to start recording, with little guidance on when and how to do this.

And we see a “collateral visibility” issue for police around these technologies.

Q&A

Q1) Is there any process where police have to disclose that they are filming with a body cam?

A1) Interesting question… Initially they didn’t know. We used to have a two-party consent process – as for tapings – to ensure consent/implied consent. But the State Attorney General described this as outside of that privacy regulation, saying that a conversation with a police officer is a public conversation. But police are starting to have policies that officers should disclose that they have cameras – partly as they hope it may reduce violence towards police.

Data Privacy in commercial users of municipal location data – Meg Young, University of Washington

My work looks at how companies use Seattle’s location data. I wanted to look at how data privacy is enacted by Seattle municipal government. I am drawing on the work of Annemarie Mol and John Law (2004), ethnographers working on health, which focuses on lived experience. My data is ethnographic, as well as focus groups and interviews with municipal government and local civic technology communities. I really wanted to present the role of commercial actors in data privacy in city government.

We know that cities collect location data to provide services, and share it with third parties to do so. In Washington we have a state freedom of information (FOI) law, which states “The people of this state do not yield their sovereignty to the government…”, making data requestable.

In Seattle the traffic data is collected by a company called Acyclica. The city is growing and the infrastructure is struggling, so they are gathering data to deal with this, to shape traffic signals. This is a large-scale, longitudinal data collection process. Acyclica do this with wi-fi sensors that sniff MAC addresses, with the location traces sent to Acyclica (the MACs are salted). The data is aggregated and sent to the city – the city doesn’t see the detailed, creepy tracking, but the company does. And this is where the FOI law comes in. The raw data sits on the company side. If the raw data were a public record, it would be requestable. The company becomes a shield for collecting sensitive data – it proprietises it.
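Note: to make the “salted” point concrete, here is a minimal sketch of what salting MAC addresses before transmission could look like – purely illustrative, as Acyclica’s actual implementation is not public and the salt handling below is an assumption. If the salt is fixed rather than rotated, the same device still maps to the same pseudonym, which is exactly why the detailed traces remain sensitive on the company side.

```python
import hashlib
import secrets

# Assumed rotating salt: if it changes per time window, the same MAC maps to
# different pseudonyms across windows and long-term linking gets harder.
# Whether and how the real vendor rotates its salt is not public information.
salt = secrets.token_bytes(16)

def pseudonymise_mac(mac: str, salt: bytes) -> str:
    """Return a salted hash of a MAC address instead of the raw address."""
    normalised = mac.lower().replace(":", "").replace("-", "")
    return hashlib.sha256(salt + normalised.encode()).hexdigest()

print(pseudonymise_mac("AA:BB:CC:DD:EE:11", salt))
```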

So you can collect data and have service needs met, but without it becoming public to you and me. But analysing the contract, the terms do not preclude the resale of data – though a Seattle Dept. of Transport (DOT) worker noted that right now people trust companies more than government. Now, I did ask about this data collection – not approved elsewhere – and was told that having wifi settings on in public makes you open to data collection, as you are in public space.

My next example is the data from parking meters/pay stations. This shows only the start and end times – no credit card numbers, etc. The DOT is happy to make this available via public records requests. But you can track each individual, and they are using this data to model parking needs.

The third example is the Open Data Portal for Seattle. They pay Socrata to host that public-facing data portal. Socrata also sell access to cleaned, aggregated data to companies through a separate API called the Open Data Network. The Seattle Open Data Manager didn’t see this situation as different from any other reseller. But there is little thought about third-party data users – they rarely come up in conversations – who may combine this data with other data sets for analysis.

So, in summary, municipal government data is no less by and for commercial actors than it is for the public. Proprietary protections around data are a strategy for protecting sensitive data. Government transfers data to third parties…

Q&A

Q1) Seattle has a wifi for all programme

A1) Promisingly this data isn’t being held side by side… But the routers that we connect to collect so much data… seeing an Oracle database of the websites folks…

Q2) What are your policy recommendations based on your work?

A2) We would recommend licensing data with some restrictions on use, so that if the data is used inappropriately their use could be cut off…

Q2) So activists could be blocked by that recommendation?

A2) That is a tension… Activists are keen for no licensing here for that reason… It is challenging, particularly when data brokers can do problematic profiling…

Q2) But that restricts activists from questioning the state as well.

Response – Sandra Braman

I think that these presentations highlight many of the issues that raise questions about values we hold as key as humans. And I want to start from an aggressive position, thinking about how and why you might effectively be an activist in this sort of environment. And I want to say that any concerns about algorithmically driven processes should be evaluated in the same way as we would evaluate social processes. So, for instance, we need to think about how the press and media interrogate data and politicians.

? “Decoding the social” (coming soon) looks at social data and the analysis of social data in the context of big data. She argues that social life is too big and complex to be reduced to predictable data. Everything that people who use big data “do” to understand patterns, activists can do too. We can be just as sophisticated as corporations.

The two things I am thinking about are how to mask the local, and how to use the local… When I talk of masking the local I look back to work I did several years ago on local broadcasting. There is a mammoth literature on TV as locale, and on how production is separate and misrepresenting, and on the assumptions versus the actual information provided versus actual decision making. My perception of social activism is that there is some brilliant activity taking place – brilliance at moments, often specific apps. And I think that if you look at the essays Julian Assange wrote before he founded WikiLeaks, particularly on weak links and how those work… he uses sophisticated social theory in a political manner.

But anonymity is practically impossible… What can we learn from local broadcast? You can use phones in organised ways – there was training in using phone cameras ahead of the Battle of Seattle, for instance. You can fight with indistinguishable actions – everyone doing the same things. Encryption is cat and mouse… Often we have activists presenting themselves as mice, although we did see an app discussed at the plenary that alerts you to protest and risk. And I have written before on tactical memory.

In terms of using the local… If you know you will be sensed all the time, there are things you can do as an activist to use that. It is useful to think about how we can conceive of ourselves, as activists, as part of the network. And I was inspired by US libel law – if a journalist has transmission/recording devices but is a neutral observer, they are not “repeating” the libel and can share that footage. That goes back to 1970s law, but it can be useful to us.

We are at risk of being censored, but that means you have choices about what to share, being deliberate in giving signals. We have witnessing, which can be taken as a serious commitment. That can happen with people with phones; you can train witnessing. There are many moments where leakage can be an opportunity – maybe not at the volume or with the content of Snowden, but we can do that. There are also ways to learn and to shape learning. But we can also be routers, and be critically engaged in that – what we share, the acceptable error rate. National security agencies are concerned about where in the stream they should target misinformation – activists can adopt that thinking too. The server functions – see my strategic memory piece. We certainly have community-based wifi and MESH networks, and that is useful politically and socially. We have responsibilities to build the public that is appropriate, and the networking infrastructure that enables those freedoms. We can use more computational power to resolve issues. Information can be an enabler as well as influencing your own activism. Thank you to Anne and her group in Amsterdam for triggering thinking here – but with big data we should be engaging critically. If you can’t make decisions in some way, there’s no point to doing it.

I think there needs to be more robustness in managing and working with data. If you go far, then you need a very high level of methodological trust. Information has to stand up in court, to respect activist contributions to data. Use as your standard what would be acceptable in court. And in a Panspectrum (not Panopticon) environment, when data is collected all the time, you absolutely have to ask the right questions.

Panel Q&A

Q1) I was really interested in that idea of witnessing as being part of being a modern digital citizen… Is there more you can say on protections, or on that?

A1 – Sandra) We’ve seen all protections for whistleblowing in government disappear under Bush (II)… We still have protections for private-sector whistleblowers. But there would be an interesting research project in there…

Q2) I wondered about that idea of cat and mouse use of technology… Isn’t that potentially making access a matter of securitisation…?

A2) I don’t think that “securitisation” makes you a military force… One thing I forgot to say was about network relations… If a system is interacting with another system – the principle of requisite variety – they have to be as complex as the system you are dealing with. You have to be at least as sophisticated as the other guy…

Q3) For Bryce and Meg, there are so many tensions over when data should be public and when it should be private… And police desires to show the good things they do. Also Meg, this idea of privatising data to ensure the privacy of data – it’s problematic for us to collect data, but now a third party can do it.

A3 – Bryce) One thing I didn’t explain well enough is that video online comes from police and from activists – it depends on the video. Some videos are accessed via public records requests and published to YouTube channels – in fact in Washington you can make requests for free and you can do it anonymously. The police department also posts video publicly. When they did a pilot in 2014 they held a hackathon to consider how to deal with redaction issues… detect faces, blur them, etc. And there is proactive posting of – only some – video. There is a narrative of sharing everything, but that isn’t the case. The rhetoric has been about being open, about privacy rights, and about the new police chief. A lot of it was administrative cost concerns… In the hackathon they asked whether posting video in a blurred form would do away with blanket requests and focus the requests that came in. At that time they dealt with blanket requests for all email: they were receiving so many, and under state law they had to give up all the data, for free. But state law varies; in Charlotte they gave up less data. In some states there is a different approach, with press conferences and narratives around the footage as they release parts of videos…

A3 – Meg) The city has worked on how to release data… They have a privacy screening process. They try to provide data in a way that is embedded. They still have a hard core central value that any public record is requestable. Collection limitation is an important and essential part of what cities should be doing… In a way, private companies collecting data results in large data sets that will end up insecure. Going back to what Bryce was saying, the bodycam initiative was really controversial… There was so much footage and it was unclear what should be public and when… And the faultlines have been pretty deep. We have the Coalition for Open Government advocating for full access, and the ACLU worried that these become surveillance cameras… This was really contentious… They passed a version of a compromise, but the bottom line is that the PRA (Public Records Act) is still a core value for the state.

A3 – Bryce) Much of the ACLU, nationally certainly, was supportive of bodycams, but individuals and local ACLU chapters change and vary… They were very pro, then backing off, then there was local variance… It’s a very different picture, hence that variance.

Q4) For Mathias, you talked about anti-masking laws. Are there cases where people have been brought in for jamming signals under those laws?

A4 – Mathias) Right now the American cases are looking for keywords – manufacturers of devices, the ways data is discussed. I haven’t seen cases like that, but perhaps it is too new… I am a Swedish lawyer, and that jamming would be illegal at a protest…

A4 – Sandra) Would that be under anti-masking or under jamming law?

A4 – Mathias) It would be under hacking laws…

Q4) If you counter with information… But not if you switch the phone off…

A4 – Mathias) That’s still allowed right now.

Q5) Do you do any work comparing US and UK body cameras?

A5 – Bryce) I don’t, but I have come across the Rotterdam footage. One of my colleagues has looked at this… The impetus for adoption in the Netherlands has been different: in the US it is transparency, in the Netherlands the narrative was protection of public servants. A number of co-authors have just published on the use of cameras and how they may increase assaults on officers… We are seeing some counter-intuitive results… But the why question is interesting.

Comment) Is there any aspect of cameras being used in higher risk areas that makes that more likely perhaps?

A5 – Sandra) It’s the YouTube on-air question – everyone imagines themselves on air.

Q6) Two speakers quoted individuals accused of serious sexual assault… And I was wondering how we account for the fact that activists are not homogenous here… Particularly when tech activists are often white males, they can be problematic…

A6) Techies don’t tend to be the most politically correct people – to generalise a great deal…

A6 – Sandra) I think they are separate issues, if I didn’t engage with people whose behaviour is problematic it would be hard to do any job at all. Those things have to be fought, but as a woman you should also challenge and call those white male activists on their actions.

Q7 – me) I was wondering about the retention of data. In Europe there is a lot of use of CCTV and the model there is to record everything and retain any incident. In the US, CCTV is not in widespread use I think, and the bodycam model is to record incidents in progress only… So I was wondering about that choice in practice, and about the retention of those videos and the data after capture.

A7 – Bryce) The ACLU has looked at retention of data. It is a state-based issue. In Washington there are mandatory minimum periods… They are interesting as, due to findings on conduct, some departments are under requirements to keep everything for as long as possible so auditors from the DOJ can access and audit it. In Bellingham and Spokane, officers can flag items, and supervisors can too… And that is what dictates the retention schedule. There are issues there of course. The default when I was there was 2 years. If footage is publicly available and hits YouTube then it will be far longer-lasting, and can pop up again… There is perpetual memory there… so the actual retention schedule won’t matter.

A7 – Sandra) A small follow-up – you may have answered this with that metadata point… Do they treat bodycam data like other types of police data, or is it a separate class of data?

A7 – Bryce) Generally it is being thought of as data collection… And there is no difference in terms of public disclosure, but they are really worried about public access. And about how they share footage with prosecutors… They could share on DVD… They wanted to use the share function of the software… but they didn’t want emails with that link to be publicly disclosable… So it is being thought about as like email.

Q8 – Sandra) On behalf of colleagues working on visual evidence in court.

Comment – Michael) There is work on video and how it can be perceived as “truth” without awareness of the potential for manipulation.

A8 – Bryce) One of the interesting things in Bellingham was the release of that video I showed of a suspect running away… The footage followed a police pick-up for suspected drug dealing, and showed the evasion of arrest and the whole encounter… And in that case, whether or not he was guilty of the drug charge, that video told a story of the encounter. In preparing for the court case the police shared the video with his defence team, and almost immediately they entered a guilty plea in response… I think we will see more of that kind of invisible use of footage that never goes to court.

And with that this session ends… 

PA-31: Caught in a feedback loop? Algorithmic personalization and digital traces (Chair: Katrin Weller)

Wiebke Loosen1, Marco T. Bastos2, Cornelius Puschmann3, Uwe Hasebrink1, Sascha Hölig1, Lisa Merten1, Jan-Hinrik Schmidt1, Katharina E. Kinder-Kurlanda4, Katrin Weller4

1Hans Bredow Institute for Media Research; 2University of California, Davis; 3Alexander von Humboldt Institute for Internet and Society; 4GESIS Leibniz Institute for the Social Sciences

?? – Marco T. Bastos, University of California, Davis and Cornelius Puschmann, Alexander von Humboldt Institute for Internet and Society

Marco: This is a long-running project that Cornelius and I have been working on. At the time we started, in 2012, it wasn’t clear what impact social media might have on the filtering of news, but they are now huge mediators of news and news content in Western countries.

Since then there has been some challenge and conflict between journalists, news editors and audiences, and that raises the issue of how to monitor and understand it through digital trace data. We want to think about which topics are emphasized by news editors, and which are most shared on social media, etc.

So we will talk about taking two weeks of content from the NYT and The Guardian across a range of social media sites – that’s work I’ve been doing. And Cornelius has tracked 1.5/4 years’ worth of content from four German newspapers (Süddeutsche Zeitung, Die Zeit, FAZ, Die Welt).

With the Guardian we accessed data from the API, which tells you which articles were published in print and which were not – that is baseline data for the emphasis editors place on different types of content.

So, I’ll talk about my data from the NY Times and the Guardian, from 2013, though we now have 2014 and 2015 data too. This data from two weeks covers 16k+ articles. The Guardian runs around 800 articles per day, the NYT around 1,000. And we could track the items on Twitter, Facebook, Google+, Delicious, Pinterest and StumbleUpon. We do that by grabbing the unique identifier for the news article, then using the social media endpoints of the platforms to find sharing counts. But we had a challenge with Twitter – in 2014 they killed the endpoint we and others had been using to track the sharing of URLs. The other sites are active, but relatively irrelevant in the sharing of news items! And there are considerable differences across the ecosystems; some of these social networks are not immediately identifiable as social networks – will Delicious or Pinterest impact popularity?
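Note: the general pattern described here – look up each article URL against a platform’s share-count endpoint – might look something like the sketch below. The endpoint is a placeholder, not a real API, since each platform exposed its own counting service and several (like Twitter’s) have since been withdrawn.

```python
import requests

# Placeholder endpoint: each platform exposed its own share-count API in
# 2013/14, and several have since been withdrawn; this URL is hypothetical.
SHARE_COUNT_ENDPOINT = "https://example.com/share-count"

def share_count(article_url: str) -> int:
    """Ask a platform's count endpoint how often an article URL was shared."""
    resp = requests.get(SHARE_COUNT_ENDPOINT, params={"url": article_url}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("count", 0)

# usage: share_count("https://www.theguardian.com/world/...")
```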

This data allows us to contrast the differences in topics identified by news editors and social media users.

So, looking at the NYT there is a lot of world news, local news, opinion. Looking at the range of articles, Twitter maps relatively well (with higher sharing of national news, opinion and technology news), but Facebook is really different – there is huge sharing of opinion, as people share what aligns with their interests, etc. We see outliers in every section – some articles skew the data here.

If we look at everything that appeared in print, we can look at a horrible diagram that shows all shares… When you look here you see how big Pinterest is, but in fashion and lifestyle areas. The sharing there doesn’t really reflect the ratio of articles published though. Google+ sees sharing in science and technology for the Guardian, and in environment, jobs, local news, opinion and technology for the NYT.

Interestingly, news and sports are real staples of newspapers but barely feature here. Economics is even worse. Now, the articles are English-language but they are available globally… But what about differences in Germany? Over to Cornelius…

Cornelius: So Marco’s work is ahead of mine – he’s already published some of this work. But I have been applying his approach to German newspapers. I’ve been looking at usage metrics, at the relationship between audiences and publishers, and at how that relationship changes over time.

So, I’ve looked at Facebook engagement with articles in four German newspapers. I have compared comments, likes and shares and how contributions vary… Opinion is important for newspapers but not necessarily where the action is. And it isn’t that people simply engage less with some sections – in economics they like and comment, but they don’t share. So it is interesting to think about the social perception of shareability.
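Note: the desk-by-desk comparison described here is essentially a group-by over engagement counts; a rough sketch with made-up numbers and column names, not the project’s actual code:

```python
import pandas as pd

# Hypothetical article-level Facebook engagement counts
articles = pd.DataFrame({
    "section":  ["opinion", "economics", "sports", "opinion"],
    "comments": [120, 80, 15, 200],
    "likes":    [300, 150, 40, 500],
    "shares":   [250, 20, 10, 400],
})

# Mean engagement per news desk, plus a shares-per-like ratio that surfaces
# sections where people like and comment but rarely share.
per_section = articles.groupby("section")[["comments", "likes", "shares"]].mean()
per_section["shares_per_like"] = per_section["shares"] / per_section["likes"]
print(per_section)
```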

So, a graph of Die Zeit here shows articles published and articles shared on Facebook… You see a real change in 2014 to greater numbers (in both). I have also looked at the type of articles and at print vs. web versions.

So, some observations: niche social networks (e.g. Pinterest) are more relevant to news sharing than expected. Reliance on Facebook at Die Zeit grew suddenly in 2014. Social norms of liking, sharing and discussing differ significantly across news desks. Some sections (e.g. sports) see a mismatch between importance and use versus liking and sharing.

In the future we want to look at temporal shifts in social media feedback and newspaper coverage. Monitoring…

Q&A

Q1) Have you accounted for the possibility of bots sharing content?

A1 – Marco) No, we haven’t. We are looking across the board, but we cannot account for that with the data we have.

Q2) How did you define or find out that an article was shared, from the URLs?

A2) Tricky… We wrote a script for parsing shortened URLs to check that.

A2 – Cornelius) Read Marco’s excellent documentation.
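Note: parsing shortened URLs is typically a matter of following HTTP redirects; a minimal sketch of how such a script might work – not the authors’ actual code:

```python
import requests

def resolve_short_url(short_url: str) -> str:
    """Follow redirects to find the final article URL behind a shortener."""
    # HEAD is cheaper; some shorteners only redirect properly on GET.
    resp = requests.head(short_url, allow_redirects=True, timeout=10)
    if resp.status_code >= 400:
        resp = requests.get(short_url, allow_redirects=True, timeout=10)
    return resp.url

# usage: resolve_short_url("https://bit.ly/<short-code>") returns the expanded URL
```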

Q3) What do you make of how readers are engaging, what they like more, what they share more… and what influences that?

A3 – Cornelius) I think it is hard to judge. There are some indications, and have some idea of some functions that are marketed by the platforms being used in different ways… But wouldn’t want to speculate.

Twitter Friend Repertoires: Inferring sources of information management from digital traces – Jan-Hinrik Schmidt, Lisa Merten, Wiebke Loosen, Uwe Hasebrink, Katrin Weller(?)

Our starting point was to think about shifting the focus of Twitter research. Many studies treat Twitter – explicitly or implicitly – as a broadcast paradigm, but we want to conceive of it as an information tool, using the concept of “Twitter friend repertoires” – with “friend” in the Twitter sense of someone I follow. We are looking for patterns in the composition of friend sets.

So we take a user, take their friends list, and compare it to a list of accounts identified previously. Our index has 7,528 Twitter accounts: media outlets (20.8%); organisations such as political parties, companies and civil society organisations (53.4%); and individuals such as politicians, celebrities and journalists (25.8%) – all in Germany. We take our sample, compare it with a relational table, and then with our master index. If an account isn’t found in the master index, we can’t say anything about it yet.
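Note: the matching step, as I understand it, is essentially a dictionary lookup of friend IDs against the master index; a minimal sketch with made-up account IDs and only a few categories (the real index has many more):

```python
from collections import Counter

# Hypothetical master index: Twitter account ID -> category label
MASTER_INDEX = {
    "111": "media_outlet",
    "222": "political_party",
    "333": "journalist",
}

def friend_repertoire(friend_ids):
    """Share of a user's friends falling into each indexed category."""
    counts = Counter(MASTER_INDEX.get(fid, "not_identified") for fid in friend_ids)
    total = len(friend_ids)
    return {category: n / total for category, n in counts.items()}

print(friend_repertoire(["111", "333", "888", "999"]))
# {'media_outlet': 0.25, 'journalist': 0.25, 'not_identified': 0.5}
```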

To demonstrate the answers we can find with this approach…. We have looked at five different samples:

  • Audience_TS – sample following PSB TV News
  • Audience_SZ – sample following quality daily newspapers
  • MdB – members of federal parliament
  • BPK – political journalists registered for the Bundespressekonferenz
  • Random – random sample of German Twitter users (via Axel Bruns)

We can look at the friends here, and categorise the accounts. In our random sample 77.8% of friends are not identifiable and 22.2% are in our index (around 13% are individual accounts). That is lower than the percentage of friends in our index for all the other samples – for MdB and BPK a high percentage of their friends are in our index. Across the groups there is less following of organisational accounts (in our index) – with the exception of MdB and political parties. If we look at the media accounts, the two audience samples follow media accounts more than the others, including MdB and BPK… When it comes to individual public figures in our index, celebrities are prominent for the audience samples, much less so for MdB and BPK; MdB follow other politicians, and journalists tend to follow other journalists. And journalists do follow politicians, and politicians – to a lesser extent – follow journalists.

In terms of patterns of preference we can suggest a model of a fictional user to understand preference between our three categories (organisational accounts, media accounts, individual accounts). And we can use that example profile and compare it with our own data, to see how others’ behaviours fit that typology. So, in our random sample over 30% (37.9%) didn’t follow any organisational accounts. Amongst MdB and BPK there is a real preference for individual accounts.

So, this is what we are measuring right now… I am still not quite happy yet. It is complex to explain, but hard to also show the detail behind that… We have 20 categories in our master index but only three are shown here… Some frequently asked questions that I will ask and answer based on previous talks…

  1. Around 40% identified accounts is not very much, is it?
    Yes and no! We have increased this over time. But initially we did not include international accounts; if we did that we’d increase the share, especially with celebrities, and also international media outlets. However, there is always a trade-off, and there will always be a long tail… And we are interested in specific categorisations and in public speakers as sources on Twitter.
  2. What does friending mean on Twitter anyway?
    Good question! More qualitative research is needed to understand that – but there is some work on journalists (only). Maybe people friend people for information management reasons, reciprocity norms, public signal of connection, etc. And also how important are algorithmic recommendations in building your set of friends?

Q&A

Q1 – me) I’m glad you raised the issue of recommendation algorithms – the celebrity issue you identified is something Twitter really pushes as a platform now. I was wondering though if you have been looking at how long the people you are looking at have been on Twitter – as behavioural norms…

A1) It would be possible to collect it, but we don’t at the moment. For journalists and politicians we do gather a list of friends each month to get a longitudinal idea of changes. Over a year, there haven’t been many changes yet…

Q2) Really interesting talk, could you go further with the repertoire? Could there be a discrepancy between the repertoire and its use in terms of retweeting, replying etc.?

A2) We haven’t so far… We could look at which types of tweets accounts are favouriting or retweeting – but we are not there yet.

Q3) A problem here…

A3) I am not completely happy to establish preference based on indexes… But not sure how else to do this, so maybe you can help me with it. 

Analysing digital traces: The epistemological dimension of algorithms and (big) internet data – Katharina Kinder-Kurlanda and Katrin Weller

Katharina: We are interested in the epistemological aspects of algorithms, so how we research these. So, our research subjects are researchers themselves.

So we are seeing real focus on algorithms in Internet Research, and we need to understand the (hidden) influence of algorithms on all kinds of research, including researchers themselves. So we have researchers interested in algorithms… And in platforms, users and data… But all of these aspects are totally intertwined.

So let’s take a Twitter profile… A user of Twitter gets recommendations of who to follow at a given moment in time, and they see newsfeeds at a given moment in time. That user has a context which, as a researcher, I cannot see – nor can I interpret the impact of that context on the user’s choice of e.g. who they then follow.

So, algorithms observe, count, sort and rank information on the basis of a variety of different data sources – they are highly heterogeneous and transient. Online data can be user-generated content or activity, traces or location data from various internet platforms. That promises new possibilities, but also raises significant challenges, including because of its heterogeneity.

Social media data has uncertain origins – about users and their motivations – and often uncertain provenance. The “users that we see are not users” but highly structured profiles and the result of careful image-management. And we see renewed discussion of methods and epistemology, particularly within the social sciences; suggestions include “messiness” (Knupf 2014), and ? (Kitchen 2012).

So, what does this mean for algorithms? Algorithms operate on an uncertain basis and present real challenges for internet research. So I’m going to now talk about work that Katrin and I did in a qualitative study of social media researchers (Kinder-Kurlanda and Weller 2014). We conducted interviews at conferences – highly varied – speaking to those working with data obtained from social media. There were 40 interviews in total and we focused on research data management.

We found that researchers found very individual ways to address epistemological challenges in order to realise the potential of this data for research. And there were three real concerns here: accessibility, methodology, research ethics.

  1. Data access and quality of research

Here there were challenges of data access, restrictions on privacy of social media data, and technical skills; adjusting research questions due to data availability; and the struggle for data access, which often consumes much effort. Researchers talked about difficulty in finding publication outlets, recognition and jobs in the disciplinary “mainstream” – it is getting better but it is a big issue. There was also comment on this being a computer science dominated field – which has highly formalised review processes and few high ranking conferences, and this enforces highly strategic planning of resources and research topics. So researchers’ attempts to achieve validity and good research quality are constrained. So, this is really challenging for researchers.

2. New Methodologies for “big data”

Methodologies in this research often defy traditional ways of achieving research validity – through ensuring reproducibility, or sharing of data sets (ethically not possible). There is a need to find patterns in large data sets by analysis of keywords, or automated analysis. It is hard for others to understand the process and validate it. Data sets cannot be shared…

3. Research ethics

There is a lack of users’ informed consent to studies based on online data (Hutton and Henderson 2015). There are ethical complexities. Data cannot really be anonymised…

So, how do algorithms influence our research data and what does this mean for researchers who want to learn something about the users? Algorithms influence what content users interact with, for example: How to study user networks without knowing the algorithms behind follower/friend suggestions? How to study populations?

To get back to the question of observing algorithms? Well, the problem is that various actors in the most diverse situations react out of different interests to the results of algorithmic calculations, and may even try to influence algorithms. You see that with tactics around trending hashtags as part of protest, for instance. The results of algorithmic analyses are presented to internet users with information on how algorithms take part.

In terms of next steps, researchers need to be aware that online environments are influenced by algorithms, and so are the users and the data they leave behind. It may mean capturing the “look and feel” of the platform as part of research.

Q&A

Q1) One thing I wasn’t sure about… Is your sense when you were interviewing researchers that they were unaware of algorithmic shaping… Or was it about not being sure how to capture that?

A1) “Algorithms” wasn’t the terminology when we started our work… They talked about big data… The framing and terminology is shifting… So we are adding the algorithms now… But we did find varying levels of understanding of platform function – some were very aware of platform dynamics, but some felt that if they have a Twitter dataset, that’s a representation of the real world.

Q1) I would think that if we think about recognising how algorithms and platform function come in as an object… Presumably some working on interfaces were aware, but others looking at, e.g. friendship groups, took data and weren’t thinking about platform function – but that is something they should be thinking about…

A1) Yes.

Q2) What do you mean by the term “algorithm” now, and how that term is different from previously…

A2) I’m sure there is a messiness to this term. I do believe that looking at programmes wouldn’t solve that problem. You have the algorithm in itself, gaining attention… from researchers and industry… So you have programmers tweaking algorithms here… as part of different structures and pressures and contexts… But algorithms are part of a lot of people’s everyday practice… It makes sense to focus on those.

Q3) You started at the beginning with an illustration of the researcher in the middle, then moved onto the agency of the user… And the changes to the analytical capacities of working with this type of data… But how much awareness is there amongst researchers of how the data and the tools they work with are inscribed into the research…

A3) Thank you for making that distinction here. The problem in a way is that we saw what we might expect – highly varied awareness… This was determined by disciplinary background – whether STS researchers in sociology, or whether a computer scientist, say. We didn’t find too many disciplinary trends, but we looked across many disciplines…. But there were huge ranges of approach and attitude here – our data was too broad.

Panel Q&A

Q1 – Cornelius) I think that we should say that if you are wondering about “feedback” here, it’s about thinking about metrics and how they then feedback into practice, if there is a feedback loop… From very different perspectives… I would like to return to that – maybe next year when research has progressed. More qualitative understanding is needed. But a challenge is that stakeholder groups vary greatly… What if one finding doesn’t hold for other groups…

Q2) I am from the Wikimedia Foundation… I’m someone who does data analysis a lot. I am curious whether, in looking at these problems, you have looked at recommender systems research, which has been researching this space for 10 years – work on messy data and cleaning messy data… There are so many tiny differences that can really make a difference. I work on predictive algorithms, but that’s a new bit of turbulence in a turbulent sea… How much of this do you want to bring into this space?

A2 – Katrin) These communities have not come together yet. I know people who work in socio-technical studies who do study interface changes… There is another community that is aware that this exists… but is not aware of it so closely… and sees it as tiny bits of the same puzzle… And it can be harder to understand for historical data, and to get an idea of what factors influence your data set. In our data sets we have interviewees more like you, and some like the people at sessions like this… There is some connection, but not all of those areas coming together…

A2 – Cornelius) I think that there is a clash between computational social science data work, and this stuff here… That predictable aspect screws with big claims about society… Maybe an awareness but not a keenness. In terms of older computer science research that we are not engaging in, but should be… But often there is a conflict of interests sometimes… I saw a presentation that showed changes to the interface, changing behaviour… But companies don’t want to disclose that manipulation…

Comment) We’ve gone through a period – and I’m disheartened to see it is still there – where researchers are so excited to trace human activities that they treat hashtags as the political debate… This community helpfully problematises or contextualises this… But I think that these papers are raising the question of people orientating practices towards the platform, towards machine learning… I find it hard to talk about that… And how behaviour feeds into machine learning… Our system tips to behaviour, and technology shifts and reacts to that, which is hard.

Q3) I wanted to agree with that idea of the need to document. But I want to push at your implicit position that this is messy and difficult and hard to measure… I think that applies to *any* methods… Standards of data removal arise elsewhere, messiness occurs elsewhere… Some of those issues apply across all kinds of research…

A3 – Cornelius) Christian would have had an example on his algorithm audit work that might have been helpful there.

Comment) I wanted to comment on social media research versus traditional social science research… We don’t have much power over our data set – that’s quite different in comparison with those running surveys or undertaking interviews, where I have control of the tool… And I think that argument isn’t just about survey analysis, but other qualitative analysis… Your research design can fit your purposes…

 

Twitter recommendation algorithms, celebrities and noise. Time on Twitter. Overall follower/following counts? Does friend suggestion influence this?

Advertisers? And their role in shaping content in news.

Time: Friday, 07/Oct/2016, 4:00pm – 5:30pm

Location: HU 1.205, Humboldt University of Berlin, Dorotheenstr. 24, Building 1, second floor (80 seats)

Presentations

Wiebke Loosen1, Marco T Bastos2, Cornelius Puschmann3, Uwe Hasebrink1, Sascha Hölig1, Lisa Merten1, Jan-Hinrik Schmidt1, Katharina E Kinder-Kurlanda4, Katrin Weller4

1Hans Bredow Institute for Media Research; 2University of California, Davis; 3Alexander von Humboldt Institute for Internet and Society; 4GESIS Leibniz Institute for the Social Sciences

Aug 202015
 

Today I am back for another talk which forms part of the IFIP Summer School on Privacy and Identity Management hosted in Informatics at the University of Edinburgh.

Today’s talk is from Angela Sasse, Professor of Human Centred Technology at University College London, and she also oversees their Computer Security group (her presentation will include work of Anthony Morton). She is also head of the first research group in the UK researching the science of Cyber Security. Apparently she also authored a seminal paper in the ’90s entitled “Users are not the enemy” which addressed mismatches of perceptions and behaviours. That motif, that users are not the enemy, is still something which has not quite yet been learned by those designing and implementing systems even now.

I think my title gives you a good idea of what I will be talking about: I will be starting with talking about how people reason about privacy. That is something which is often not accounted for properly, but is important in understanding behaviours. Then I will be talking about why current technologies do not meet their preferences. Then I will look to the future – both some dystopian and utopian scenarios there.

So, how do people reason about privacy? Some work with Adams (2001) looked at this and we used the crucial subtitle “protecting users not just data”. There we pointed out that there is a real difference between how the law treats this, and how people understand privacy. Individuals are pragmatic in their choices; they are thinking about the risks and the benefits – they trade those off. Some of this stuff came out of early internet networking, video calls, etc. but it has stood the test of time as these things have become commonplace.

There has been a raft of research over the last 15 years, not just by computer scientists but also social scientists, ethicists and economists. And we have come to a place where we understand that people do trade risks for benefits, but that is not always efficient in an economic sense, and it is not always logical… And there are a number of reasons for this: they may not be aware of all risks and consequences – around secondary level information, and around secondary and tertiary usage, aggregation with other data sources; their perception may be skewed by hyperbolic discounting – entirely dismissing things with low risk; and there is a paradox here, as people do believe in privacy and security but their actions are not always reflective of this.

So, why don’t people act in line with their own preferences? Well there is “Confusology” (Odlyzko) which I’ll come back to. Hyperbolic discounting is about risk that is in the future and potential, versus rewards that are immediate and tangible (sometimes). Sometimes users say “they know this anyway” – there is no point obfuscating information as “they” know this stuff already – they are just testing honesty or willingness. When you have done a lot of work on financial disclosure this argument comes up a lot there. It also comes in with ISPs and perceptions of surveillance. Sometimes this reaction is plausible and logical, but sometimes it is much more of a cognitive dissonance defence, something of an excuse to minimise workload. That is also why we really do need to work on the public discourse, because the more false information is in the public discourse, the more this encourages individuals to make choices in that way. The more we allow that kind of nonsense to be out there, the more it undermines important discussions of privacy. The final reason is that technology does offer the protection people want – but they still want the benefits.

Back to Confusology (Odlyzko 2014) – I really recommend Odlyzko’s work here. He talks about several factors: inadvertent disclosure – complex tools make consequences of actions hard to predict; there is too much work – rules and legal jargon make privacy too much work, and people are loath to expend effort on tasks they see as secondary to their goal (legal jargon is practically an orchestrated campaign – “I agree with the terms and conditions…” is the biggest lie on the internet!); lack of choice (so consent is not meaningful) – I challenge you to find a provider who offers genuinely meaningful terms of consent; the hidden persuaders – temptation, nudging, exploiting cognitive biases… encouraging users to think that sharing more is the preferred option. I have seen Google encouraging researchers in privacy to work on “opinionated design” because they have tried everything to get people to click through in the right way – they make warnings different every time, hide other options etc. I think this is a slippery slope. In the privacy area we see this choice as pretty fake, particularly if you hide and obscure other options.

The inadvertent disclosure issue is still happening. Many users do not understand how technology works and that can catch users out – a key example is peer-to-peer file sharing, but we also see this with apps and the requests they make of your device (use of contacts, data, etc.), and there will be lots more inadvertent disclosures associated with that coming out.

Too much work leads to over-disclosure. Once you are in the habit of doing something, you don’t have to think about it too much. It is less work to fill in a form disclosing information you have given before, than to stop and think about what the implications of sharing that data actually are.

We also see successfully adopted technologies that fail on privacy. The Platform for Privacy Preferences (P3P) was far too much work to be useful to many people. It was only IE that implemented it, and they did so in a way that websites could systematically escape cookie blocking. It was too complex and too ambiguous for browser vendors. And there is absolutely no means to verify that websites do what they say – 5% of TRUSTe “verified” websites had implementation errors in 2010. This is a place where cognitive dissonance kicks in again – people fixate on something that they see as helping with one form of security and don’t necessarily look at other risks. Meanwhile users of Do Not Track are identified more quickly than those who don’t use it, through web fingerprinting. Advertisers circumvent it with supercookies.

So, it really isn’t clear what you need to do to ensure that the privacy people want is enabled in websites and tools.

To change tack slightly, it is worth reflecting on the fact that privacy preferences vary. It can be useful to frame this in a Technology Adoption Framework – TAM offers a useful framework, but privacy needs do vary across cultures, and between people. You need to speak to different people in different ways to get the message across. Westin offers a three-point scale around privacy that you could use, but that is too coarse-grained since it basically only differentiates between hardcore privacy fundamentalists, pragmatists, and those unconcerned.

However there have been various studies with the Westin Scale (see Berkeley Survey 2009; Harris Poll 2003; Harris Poll 1999) and most users fall into the Privacy Pragmatists category. But behaviours, when studied, consistently DO NOT match their preferences! So we need something better.

There have been attempts to improve the Westin scale, but there has been limited scope in other alternative measures of privacy concern, e.g. IUIPC (Malhotra et al 2005) and CFIP (Smith et al 1996). And people engage in information seeking behaviours (Beldad et al 2011), since people seek trust signals (trust symbols and trust symptoms) (Riegelsberger et al 2005). Asking people about the provider of a service, and their trust in that provider, is important in terms of understanding their behaviour and their preferences.

So my PhD student (Morton) looked to develop the Westin scale to better align preferences and behaviours, using a mixture of qualitative and quantitative methods, investigating subjective viewpoints. He has been interviewing people, analysing their statements, and ordering those statements with research participants, asking them how well those statements reflected their views. The number of participants (31 offline, 27 online) is relatively small, but the number of statements generated by them was in the thousands – so this is a really complex picture. So, participants ranked statements as important or unimportant with a Q-sort process (a version of a card sorting task).

Morton has found that people sort into five categories:

  • Information Controllers – those really aware of the data, looking at the data and what it says about them. These are skeptical people and do not have a high trust in the cloud and want control over the collection, use and dissemination of personal information. For them things that are not important include: organisational assurances; others’ use of the technology service.
  • Security Concerned – their principal focus is on security of the technology platform, the providing organisation’s security processes, and the potential impact on personal security and finances. They are trading off the benefits and risks here. They are less interested in the technology in the abstract.
  • Benefit Seekers – those happy to trade off the risks for the benefits
  • Crowd Followers – trust in others’ use to make decisions about privacy and security
  • Organisational Assurance Seekers – they look for the organisation to say the right things, disclaimers etc. They expect bad things to happen, and want assurance against that.

Now I think that work is quite interesting. And we are now undertaking a large scale study with 1000 participants in the UK and US with all participants sorted into one of these categories, and several scenarios to assess. The first 300 participants’ contributions already suggest that this is a better model for connecting preference with behaviour.

I did want to talk about why we need to make privacy more salient. Ultimately privacy is about relationships. People manage relationships with other people through selective disclosure of information – that is a fundamental part of how we engage, how we present different personas. As more information is disclosed, the more that is undermined. And that is most obviously taking place in University admissions or potential employer searches for individuals. The inability to make selective disclosures can undermine relationships.

For example, a chocolate biscuit purchase: someone buys the main shop on card, then buys the chocolate biscuits in cash. It turns out this person’s partner is a health food nut and manages the finances tightly. So that person and their child agree to the healthy food rules at home, but then have access to chocolate biscuits elsewhere. This is how people manage relationships. That sort of lack of disclosure means you do not need to revisit the same argument time and again; it helps illustrate why privacy is so fundamental to the fabric of society.

We do have ways of making the cost of privacy more salient. There is this trade-off around privacy – we are often told these things are “for your own good”. And without a significant push for evidence that is hard to counter. We don’t force accountability of promised/stated benefits. CCTV in the UK is a great example. It took almost two decades for any investigation into that investment, and when there was research it was all pretty damning (Gill and Spriggs 2005; Metropolitan Police Review 2008 – CCTV only contributes to prevention or resolution in 3% of crime, it is costly, and there is only 1 crime per 100 cameras). And we have had misuse of CCTV also coming through the courts. Investigations into inappropriate behaviour by the London Met Police over a year show inappropriate disclosure – as in the CCTV case – making up a huge percentage of that issue.

We have the extension of the state into something like military surveillance. We see the rise of drones, robots and autonomous vehicles. There is an increasing number of networks and devices – and we see mission creep in this “deeply technophilic” industry. We also see machine learning and big data being advertised as the solve-all solution here… But as Stephen Graham notes, “emerging security policies are founded on… profiling” of individuals – a Minority Report state. David Murakami Wood from the Surveillance Studies Network talks about automatic classification and risk-based profiling as adding up to “social sorting”, and we see this with tools like Experian MOSAIC and ACLU Pizza. We must not let this happen without debate, push back, and a proper understanding of the implications.

Odlyzko raised the issue of who controls the information – it is often big global mega corps. The decline of privacy actually undermines the fundamentals of capitalism and the dynamic nature of the market system – a truly dystopian solution.

So, do people really not care? Post Snowden it can seem that way but there are signs to the contrary: the UK Investigatory Powers Tribunal ruled GCHQ surveillance to be illegal; major tech companies are distancing themselves from government, putting up legal resistance; and deploying better security (encryption) and we see talk of a Digital Charter from Tim Berners Lee, progressing this debate. Privacy protection behaviours are not always obvious though.

We also see the idea that “Digital Natives Don’t Care” – now that is not true, they just care about different things, they engage in “social steganography” hiding in plain sight (boyd 2014).

So, in conclusion: technology has a profound impact on privacy, in many ways that people don’t understand – at least not immediately; people often eagerly assume and overestimate benefits, and underestimate and discount risks; we need to counter this with better communication about risks and benefits; and communication needs to relate to what matters to people with different preferences.

Q&A

Q1) It seems to me that some of the classical social science sources about relationships – what information to ignore and which to note… It seems those sources can be updated and adapted to the modern world, and that you can analogise up to a point.

A1) Yes, you look at this area and there are really three people I always go back to from the 1960s: Goffman, Luhmann and Giddens.

Q1) And more recently Henry Jenkins too.

Q2) From your presentation many people make poor decisions around privacy, but those are pragmatic choices. But I really do think we don’t see people understanding the impact of surveillance – there is a lack of understanding that not only might they look for terrorists, but of the other implications of machine learning, of other uses of data, and that that is a level of data use that is not proportionate to the problem.

A2) That is the debate we need to see in the public discourse so urgently. There is a pushing out of tools without any consideration of those implications. Using the language of cost and waste around data can be useful here, but some want a story of the negative consequences in order to make sense of this – for instance someone being denied a job because of errors or disclosure.

Q3) Do you think that education institutions in the United Kingdom have any role in setting an example, for themselves or others, by practicing what academics would advise?

A3) Online privacy protection is part of the national curriculum now. If I was running a school I wouldn’t want to turn it into a prison – metal detectors etc. But there is also the tracking of learning behaviours and activities, data mining to identify individual learning paths – risks there are also something to think about. It is often the most mundane and banal stories that often hit home: what if someone is worried to search for treatment for a disease, lest their own status be disclosed by that? Being tracked changes behaviour.

Q4) The detection rate of terrorism is so low that it is not just a waste of money, it is also an ineffective method.

A4) But then it is more convenient to sit behind a computer than to actually be out on the street facing direct human interaction and risk, that may also be part of it.

Q5) Going back to the topic of education, there are quite a lot of primary schools in the UK where they are using apps, ebooks etc. Is there…

A5) There are three technologists who did a fantastic study. They found it makes kids more obedient, and they start to behave like people in prison which is damaging to individuals as well as to society. This will foster rather than discourage criminal activity.

Comment) Emmeline Taylor, in Australia, has done a book on how kids respond to technology in schools.

And with that we close a really interesting talk with clear relevance for some of the findings and recommendations coming out of our Managing Your Digital Footprint research work.

Aug 182015
 

All of this week, whilst I am mainly working on Managing Your Digital Footprint research work, there is a summer school taking place at the University of Edinburgh School of Informatics on Security and Privacy with several talks on social media. This afternoon I’ll be blogging one of these: “Policing and Social Media Surveillance : Should We Have any Privacy in Public?” from the wonderful Professor Lilian Edwards from University of Strathclyde and Deputy Director, CREATe.

I come to you as a lawyer. I often say what I do is translate law to geek, and vice versa. How many here would identify themselves as from a legal discipline (about 10 are), I know most of you are from a computer science or HCI area. What I will talk about is an overlap between law and computer science.

So, a nice way to start is probably David Cameron saying: “In extremis, it has been possible to read someone’s letter, to listen to someone’s call, to listen in on mobile communications,” he said. “The question remains: are we going to allow a means of communications where it simply is not possible to do that? My answer to that question is: no, we must not.”

I’m going to argue that encryption, privacy, etc. is a good thing and that there should be some aspect of privacy around all of those social media posts we make etc. Now, what if you didn’t have to listen to secret conversations? Well right now the security services kind of don’t… they can use Tumblr, Facebook, Twitter etc..

So, a quick note on the structure of this talk. I will set some context on open source intelligence (OSINT), and Social Media Intelligence (SOCMINT). Then I will talk about legal issues and societal implications.

So, SOCMINT and OSINT. In the last 5-7 years we’ve seen the rise of something called “intelligence led” policing – some talk about this as the Minority Report world, trying to detect crime before it takes place. We have general risk aversion, predictive profiles, and we see big data. And we see “assemblages” of data via private intermediaries. So we see not only the use of policing and intelligence data, but also the wide range of publicly available data.

There has been growth in open source intelligence, the kind of stuff that is easy to get for free, including SOCMINT – the stuff people share on social media. You can often learn a great deal from friends graphs, people’s social graph – even with good privacy settings that can be exposed (it used to always be open) and that is used in friend-of-friends analysis etc. The appeal of this is obvious – there is a lot of it and it is very cheap to get hold of (RUSI and Anderson Report 2015): 95% of intelligence gathered is from this sort of “open source” origin, the stuff that is out there (ISC 2015). There have been a number of reports in the last year with incredibly interesting information included. Another report stated that 90% of what you need to know is from this sort of open source, and it’s great because it is cheap.

In terms of uses (Bartlett and Miller 2013) these are various, but worth noting things like sentiment analysis – e.g. to predict a riot, apparently very useful. Acquiring information from the public – have you seen this robber, etc. – is very useful. Horizon scanning is about predicting disturbance, riots etc. We are also seeing predictive analytics (e.g. IBM Memphis P.D.; PredPol in Kent) and that is very popular in the US, increasingly in the UK too – towards that Minority Report. Now in all of these reports there is talk of prediction and monitoring, but little mention of monitoring individuals – though clearly that is one of the things this data usage enables.

These practices raise policy challenges (Omand 2012) of public trust, legitimacy and necessity, and transparency. And there is the issue of the European Convention on Human Rights: article 8 gives us the right to a private life, which this sort of practice may breach. Under that article you can only invade privacy for legitimate reasons, only when necessary, and the level of invasion of privacy can only be proportionate to the need in society.

So, looking at what else is taking place here in contemporary practice: we had the Summer Riots in 2011 where the security services used #tweets, BB texts etc., and post-riot reports really capture some of the practice and issues there; a Flickr stream of suspect photos led to 770 arrests and 167 charges; and there is the Facewatch mobile app. During the 2012 Olympics the police wanted to use social media data, but basically did not know how. So issues here include police managerial capacity; there is sampling bias (see “Reading the Riots”) as Twitter is a very partial view of what is occurring; and there is human error – e.g. in crowdsourced attempts to identify and locate the Boston bombing suspects.

So I want to talk about the possibility of using public social media posts and question whether they have any protection as private material.

An individual tweets something and says she didn’t intend for it to be seen by the police; commentators online say “What planet is this individual on? Her tweets are public domain” and that is the attitude one tends to see, including in the law courts – e.g. “a person who walks down the street will inevitably be visible” (PG v UK 2008 ECtHR). In the UK that seems to be the standard perspective: that there is no reasonable expectation of privacy when expressing yourself in public.

In the US there is even less privacy of social media posts – e.g. see Bartow (2011), who says “Facebook is a giant surveillance tool, no warrant required, which the government can use… with almost no practical constraints from existing laws”. There is effectively no idea of privacy in the US constitution.

You’d think that the EU would be better, but where do our traditional concepts of a “reasonable expectation of privacy” arise? Is it in our body, our home (Rynes ECJ 2013), our car? What about data “relating to you” vs the “public sphere” (cf. Koops)?

So, what are the legal controls? Well, Data Protection law seems obvious, but there are strong UK exemptions around detection and prevention of crime – so there is no need for consent.

How about the European Convention on Human Rights article 8, the right to a “private life”? So, the start of my argument is Von Hannover ECtHR (2004), about intrusion by the press rather than police – Princess Caroline of Monaco was being followed by the press in all of her activities. The Court said, seminally, that this was absolutely an invasion of her private life – even though she is a public figure in a public sphere. So we have a concept of privacy extending beyond the bounds of your home, of being able to have a right to privacy when out in public.

Now, that was an important case… but it hasn’t had that much impact. So you have cases where the police take photos of people (Wood v Metropolitan Police 2008) or CCTV (re application by JR38 for Judicial Review, 2015). In the case of Wood, a serial activist was going to a corporate AGM, expected to cause trouble, so police followed him and photographed him. The judge said that he was an activist and well known, and could expect to be followed. The argument was that the image was a one-off thing – not part of an ongoing profile.

The most recent case, which was in Northern Ireland, involved someone caught on CCTV during the NI equivalent of the London Riots. The person in question was 14 years old and the images were circulated widely, possibly including to the Derry Journal. Again he loses, but in an interesting way. There are at least three judgements.

Lord Kerr says “The fact that the activity… is suspected to be criminal… will not alone be sufficient to remove it from… application of article 8”. That’s a big deal – suspicion of criminal activity isn’t enough for your rights to be exempt. However in this case the second test, whether the intrusion is justified, was found to be met. And they took very little time to decide it was a justified act. Weighing the proportionality of the rights of the individual against the rights of the community to protect itself, they felt this intrusion was justified. They say that he’d benefit too – saying that the 14 year old might be diverted from a life of crime. They lay it on a bit, but they are under pressure to justify why they have not stigmatised this youth through sharing his image. So, an interesting case.

So, there is some expectation of privacy in public, but even so interference can be justified. Interference must be justified as necessary, proportionate and according to law. But security usually seems to win in the UK (Wood, JR38). Even if there is no reasonable expectation of privacy, it may still be part of “private life”. But all of this assumes that you know you are being surveilled, that you know your information is being accessed. You may not know if your data is being used to build up profiles, to build up an airport stop list, etc.

Now, in response to Snowden, we have something called RIPA – an envisioned “digital” scheme to cover surveillance of personal data. This scheme covers real-time interceptions of emails, with a warrant from the Secretary of State needed. But social media isn’t part of this. They just seem to be making up how they manage that data.

Now I want to argue that the use of SOCMINT shouldn’t have any special exemption…

Demos in 2013 asserted that “open” SOCMINT collection (and processing) needs no authorisation of any kind. Why? They argued that there is no expectation of privacy so long as the user knew from the T&Cs that public data might be collected, especially via API. I think that is just egregiously stupid… Even if you believed that, it would apply to the platform – not to the police, the rest of the world, etc.

The other argument is the detailed profile argument. That is, even if we admit that this material is “public”, there is still a strand of ECHR jurisprudence which says that detailed profiles of this sort need to be treated with respect – that comes from the practices of the Stasi and concerns around the possibility of a secret police state; the case law (Rotaru v Romania) covers this.

So, my perspective is that there is a real difference between structured and unstructured data… Even if in public, is SOCMINT an automatic dossier? With Google, most of the internet is a structured dossier. With that in mind, ECtHR case law has seen structured dossiers maintained over time as a key threat – Rotaru v Romania’s dictum: “public information can fall within the scope of private life where it is systematically collected and stored in files held by authorities”. So does the Rotaru distinction between structured data in files held by police, and unstructured data, hold up in the age of Google and data mining (e.g. Google Spain (ECJ 2014), the UK RIPA case (2015))?

As we move to the internet as the main site for key publishing of data, and as the internet of things and smart cities come online…

Q&A

Q1) Should we be able to do data mining on large sets of social data?

A1) Big data, data mining and the internet of things can be seen as the three horsemen of the apocalypse in a way. And that’s the other talk I could have given. The police, using this sort of data are using data in a different context, and that isn’t ok under ECHR art 8.

Q2) I remember a paper from about a year ago about the distinction between what an individual can do in terms of asking about others etc. They have more rights than the police in some contexts.

A2) There is this weird thing where if you are not looking at specific people, you aren’t as restrained. That’s because it used to be the case that you could find out very little without investigating an individual. That has changed considerably but the law hasn’t been updated to reflect that.

Q3) A lot about us is public, so don’t we just have to deal with this? I see the concerns of a police state, but I don’t understand where you are drawing the line on legal controls on policing. If they can only do the same as a member of the public then there shouldn’t be an issue there…

A3) You’ve given that answer yourself – the power dynamic is asymmetrical. They have capacity to join data up to their own databases – which may include your being a witness or victim of crime, not always suspect or perpetrator. There is a lot of black boxing of data here…

Q3) What controls are you proposing?

A3) Honestly, I don’t know the quick answer. But if we look at the requirements: those for intercepting letters, email and telephone are strict; searching homes, pretending to be a friend etc. are less strict… But that scooping up of mass data is something different in terms of implications and we need some form of safeguarding around that, even if less strict than some other approaches/interceptions.

There is overwhelming evidence that young people don’t realise the potential implications of their sharing of data, and see these spaces as a private space away from other areas of their life in which they find themselves surveilled. So there is a reasonable presumption of privacy there.

Q3) I think there is a need for appropriate controls on police activities, I agree with that. If I share things only with friends on Facebook and police look at that, that is an investigation. But if I tweet something it is public…

A3) This is the classic liberal argument I don’t agree with. Tweeting is a bit different. Facebook is the new mall, the new social space; people use openness to serve them socially, believing it will only be read by peers. So they have a reasonable expectation of privacy. Part of Bartlett and Miller’s work is about the use of the word “rape” – in gaming culture it is being used for taking a game. Imagine that being crunched. That’s the sort of issue that can arise in big data. I’m not saying police need a warrant for all Twitter data capture; I’m saying we need to think about what is appropriate.

Q4) There is a perspective that taking the UK out of the EU Human Rights Act is a red herring to distract from other legislation.

A4) Even if we left the EU Human Rights Act, the UK Government would find many of its protections are embedded in other part of EU law, so it would still require appropriate respect of individual rights to privacy. But that’s a political conversation really.

Q5) So, in terms of the issues you have raised, how do we understand what is private and what is public data?

A5) I think essentially that we need to safeguard certain points in what has become a continuum in privacy around human rights, something that will set some barriers about the types of interventions that can occur, and what kind of oversight they require.

And with that Lilian’s excellent and information-packed talk is done. Really interesting and there were clearly plenty more questions arising. Particularly interesting for me thinking about the Digital Footprints work, and the legislative context for the research we have been undertaking on student expectations, experiences, practices. 

Apr 272015
 

This afternoon I am attending a talk on the Privacy of Online Social Networks which has been arranged by the Social Network Analysis in Scotland Group (SNAS) and is taking place at the University of Edinburgh. The speakers are Jordi Herrera-Joancomarti, Cristina Perez-Sola, and Jordi Casas-Roma, all from Universitat Autonoma de Barcelona (UAB). I’ll be taking notes throughout, although I think that the talk is also being recorded so may be available later. As ever this is a liveblog so corrections, comments, etc. welcome. (I will also be adding some images from the event later today as some of the processes discussed were quite complex and require illustration!)

We are opening with an introduction to the SNAS group, which meets at the University of Edinburgh on the last Tuesday of every month… They have a mailing list and I’ll add a link here later. Dr Jordi Herrera-Joancomarti is leading the talk, and is an expert on privacy and security.

Dr Jordi H-J: This is collaborative work with my colleagues Cristina and Jordi. My background is not social sciences but mathematics, so it is a good challenge for me to speak to a non technical audience here… Hopefully there are no scary mathematical equations here! I’ll open with an introduction, talk about Online Social Networks and graph theory, talk about the data you can mine, and then I will talk about Online Social Network data anonymisation – how you can release data from networks without compromising privacy – before coming to my conclusions.

So, to start with, the definition of Online Social Network I am using is an “online service, platform or site that allows the creation of a user profile which can be connected with other user profiles of the network…” – a very computer science definition.

So this can be about specialisms like Flickr, LastFM, WikiLoc…; specialised format (e.g. Twitter); scope limited (e.g. LinkedIn); general purpose (e.g. Facebook, Google+) etc. The denomination of connectivity is network dependent (e.g. Facebook: friends; Twitter: followers). And interactions between user profiles are also network dependent (e.g. Facebook: “like” action, post a message; Twitter: tweet, retweet etc.).

So, why are OSNs interesting or important? Well they have become an important part of people’s everyday communications, with huge volumes of users. But there is also the book Big Data (Viktor Mayer-Schonberger and Kenneth Cukier) which includes chapter 5, “Datafication”, talking about the quantification of the world over time from different aspects. So: when words became data (Google Books in 2004); when location became data (GPS); and when relationships became data (OSN). For instance Facebook datafied relationships, most notably with the introduction of the “social graph”.

To graph theory then. A graph is a mathematical tool used to represent objects (nodes) that can be connected by links (edges). OSNs can be modelled using graphs and analysed with graph theory. So you can represent connections between individuals etc.

There are different OSN properties that determine the type of the corresponding social graph:

– Undirected graphs are those in which there is no meaning attached to the direction of an edge. The Facebook social graph is an undirected graph: no arrows between individuals, and no value attached to the edge.

– Directed graphs (digraphs) are those in which the edges have a direction associated with them. The Twitter social graph is a directed graph: you can follow someone and they don’t have to follow you back, so we have arrows here to indicate connection and direction.

– Weighted graphs assign a weight to every edge in a graph.

So, once you represent an OSN as a graph you can borrow many analysis tools from graph theory. Take the degree of a node in an undirected graph: the degree of a node is the number of edges incident to that node, denoted deg(vi).

In a directed graph the same concept applies but it is a bit more complex. The in-degree of a node is the number of head endpoints adjacent to that node, denoted deg-(vi). Similarly the out-degree is the number of tail endpoints, denoted deg+(vi).

So, in the Facebook social graph the degree of a node is the number of friends of that user. In the Twitter social graph, the in-degree can be seen as the number of followers of that user – a high in-degree may indicate a popular user. And the out-degree can be seen as the number of users that person follows.

We can also talk about the clustering coefficient. The local clustering coefficient of a node is the proportion of edges between the nodes within its neighbourhood divided by the number of edges that could possibly exist between them… So it measures how close the neighbourhood of a node is to being a clique – that is, how well the friends of a node are connected to each other. These kinds of technical techniques can be used to understand user connections and relationships.
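[My own aside, not from the talk: both measures are straightforward to compute, e.g. with the networkx library in Python – a small sketch on a toy graph:]

    import networkx as nx

    # A small undirected toy graph standing in for a social graph
    G = nx.Graph()
    G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])

    print(G.degree("c"))          # degree: number of edges incident to node c -> 3
    print(nx.clustering(G, "c"))  # local clustering coefficient of c -> 1/3

    # For a directed (Twitter-style) graph, in-degree and out-degree differ
    D = nx.DiGraph([("a", "b"), ("c", "b"), ("b", "a")])
    print(D.in_degree("b"), D.out_degree("b"))  # 2 followers, 1 followed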

We study OSN privacy from an information-fundamental point of view, analysing OSN privacy from a graph mining perspective. We do not study specific OSN services, configurations or vulnerabilities. In some cases we do make some assumptions about the type of OSN: open vs closed profiles. For instance Facebook is more difficult to extract data from than Twitter, an open social network.

So there are two kinds of user information that can be extracted:

1) Node information – data about a specific user, details contained in the user’s profile on a specific OSN.

2) Edge information – data about the relationship between members of the network – and that is what we are most interested in.

Edge information can, however, directly disclose node attributes – e.g. an edge representing a sentimental relationship between two individuals of the same sex would be revealing about their sexual orientation. It is more difficult to protect edge information than node information – it depends on the behaviour of the connected people, whereas node information is controlled by just one user. Relations between users can also reveal communities, and further node attributes.

So, I wanted to explain about data retrieval. How do you obtain social network information? Well, you can ask OSN providers – but many are not that cooperative, or put a great deal of restrictions/agreements around doing that, and they provide local and/or anonymised data. Or you can take the data from the OSN yourself – that is not always possible and depends on how open the OSN service is. And it is very important to take care over the mechanism used to obtain information, as that may determine the bias of the data you collect.

You can gather data in several ways. You can use a web crawler to gather data from an open OSN (like Twitter). Web crawlers are computer programs that retrieve web pages starting from a single (or multiple) page and exploring all its linked pages, and also the pages linked to those ones, and so on. Since most OSNs interact through the web, you can use web crawlers for OSN data retrieval… The process is iterative…

A downloader is the interface between the OSN and the crawler – it downloads the user profiles and passes them to the parser, which then parses that data. You draw out the friends of that user and add them to the queue, which contains all the users found when crawling that are waiting to be explored. And the scheduler selects which user, from the ones in the queue, will be explored next and sends the decision to the downloader. The scheduler impacts on both performance and data quality.

If you are exploring the whole network then it is not so important to consider the crawler details… if I am crawling every member I will find all of the connections at the end… the order you gather data in doesn’t matter in that case. BUT you cannot crawl the whole of the network available now… So you will have to, at some point, decide to take a partial view of the network. So to do that we have to think about notation and assumptions…

Users can be crawled (all their profile information and all friends are known to the crawler, v ∈ Vcrawl), discovered (connected to a crawled user), or explored (discovered by relationship to a discovered user?).

So… for instance a Breadth-First Search (BFS) algorithm would start with one user (h)… you find they have two friends (d and j)… You crawl j and then discover they connect to users l, k and g (d and h are already known)… Then you crawl user d, finding connections to f, e, b, c… others are already found… Then you crawl l, finding connections, etc…

So, that is your schedule – the order you crawl in. And the idea is that you can end up with all the elements of the network… This is quite a linear process. So, this is one approach, and this BFS algorithm produces graphs quite dissimilar to those of other algorithms you could use.

An alternative approach is Depth-First Search (DFS), which works as a traditional stack: the first nodes to be crawled are the last ones that have been discovered (LIFO management). So, in this approach… if you start with user h… you discover j and d… but the next node you explore is d… then you find connections to f, g, e, b, c… and you next explore node c. At the end you will end up with all the nodes as well… but in a different order than before… So, again, if you do this with a group of users (the example here being 162 Flickr nodes) it looks quite different…
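[My own sketch of the idea, not the speakers’ code: the only real difference between a BFS and a DFS crawler is whether the scheduler treats the queue as FIFO or LIFO. Here get_friends is a hypothetical stand-in for the downloader/parser step:]

    from collections import deque

    def crawl(seed, get_friends, strategy="bfs", limit=None):
        """Crawl an OSN graph from a seed user; get_friends(u) returns u's friend list."""
        queue = deque([seed])
        crawled, edges = set(), []
        while queue and (limit is None or len(crawled) < limit):
            # BFS: take the oldest discovered user (FIFO); DFS: take the newest (LIFO)
            user = queue.popleft() if strategy == "bfs" else queue.pop()
            if user in crawled:
                continue
            crawled.add(user)
            for friend in get_friends(user):
                edges.append((user, friend))
                if friend not in crawled:
                    queue.append(friend)
        return crawled, edges

    # Hypothetical toy network in the style of the talk's example
    toy = {"h": ["d", "j"], "j": ["l", "k", "g"], "d": ["f", "e", "b", "c"],
           "l": [], "k": [], "g": [], "f": [], "e": [], "b": [], "c": []}
    print(crawl("h", toy.get, strategy="bfs", limit=4)[0])
    print(crawl("h", toy.get, strategy="dfs", limit=4)[0])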

Then you can do more intelligent things… You can use “greedy” algorithms:

– Real-degree greedy (hypothetical greedy, or highest-degree crawler) takes its decisions based on the real degree of the nodes in the OSN (which may be unknown to the crawler before the node is crawled). So a user has degree 5, degree 7, etc. based on the edges between different nodes… You can gather the whole network, or you may have restrictions and only capture part of the network…

– Explored-degree greedy (greedy) uses the currently known degree of the nodes in the OSN… So if you graph that you see many, many connections – you look more consciously at the most connected nodes.

You can also choose to introduce more variance into the crawl, to randomise your sample to an extent. This can be done with a lottery algorithm…
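[Again my own illustration: the scheduler step in the crawl sketch above can simply be swapped for a greedy or lottery selection over the queue. known_degree is a hypothetical record of the degree observed so far for each queued user; the talk did not spell out the lottery mechanics, so a plain uniform draw is shown:]

    import random

    def pick_greedy(queue, known_degree):
        """Explored-degree greedy: crawl the queued user with the highest known degree."""
        return max(queue, key=lambda u: known_degree.get(u, 0))

    def pick_lottery(queue):
        """Lottery: pick a queued user at random to add variance to the sample."""
        return random.choice(list(queue))

    queue = ["a", "b", "c"]
    known_degree = {"a": 2, "b": 5, "c": 1}
    print(pick_greedy(queue, known_degree))  # 'b'
    print(pick_lottery(queue))               # a random queued user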

So, if you take information from a social network or a social network graph you have to be really aware of what you are getting. When you do your sampling from different profiles etc., make sure you understand what your sample is of. As far as you can see, you can just adjust the scheduler to get what you want… you can do that to focus on particular users, or types of users.

Schedulers have implications for privacy… the scheduler you select has different implications… So your scheduler can have different objectives for the crawler – taking the privacy attacker’s point of view. You can then understand which scheduler algorithm fits those objectives most appropriately…

You can also do more tricky things… For instance the classification of users from a graph point of view. So, I want to classify users, identifying the set of categories a new observation belongs to. The decision is made on the basis of a training set of data containing observations whose category membership is already known. When you try to classify users within the network, you can see link information which may help you to classify a user – connections to a community, for instance.

The idea is that you can see classification as a privacy attack – user classification allows an attacker to infer private attributes of the user. Attributes may be sensitive by themselves, and attribute disclosure may have undesirable consequences for the user. So we look at the design of a user (node) classifier that uses the graph structure alone (no semantic information needed)… So, for instance… we may classify the user, with a neighbourhood analysis to better classify the user… The classifier analyses the graph structure and maps each node to a 2-dimensional sample using degree and clustering coefficient. The output is an initial assignation of nodes to categories…

And you can use that neighbourhood information to classify the node… You can also have a relational classifier, which maps users to n-dimensional samples, using degree and clustering coefficient together with the neighbourhood information to classify users…
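A minimal sketch of the structural features involved, assuming networkx and using one of its built-in example graphs as a stand-in for a crawled OSN; the classifier the speaker described would then be trained on nodes whose category is already known:

```python
import networkx as nx

# Stand-in for a crawled OSN graph (Zachary's karate club, built into networkx)
G = nx.karate_club_graph()

# Each node becomes a 2-dimensional sample: (degree, clustering coefficient)
features = {n: (G.degree(n), nx.clustering(G, n)) for n in G.nodes()}

# A relational classifier would extend each sample with neighbourhood
# information, e.g. the mean degree of a node's neighbours:
neighbour_mean_degree = {
    n: sum(G.degree(m) for m in G.neighbors(n)) / max(G.degree(n), 1)
    for n in G.nodes()
}
```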

So coming to the issue of data and data release… When you obtain a collection of data… you may have a more anonymised data view… You may see connections etc. but without user names, for instance. The intention is to preserve the privacy of users. But is this enough? Well, no… this naive anonymisation potentially reveals huge amounts about the user… if you know other data (other than names), you may be able to deduce who is in the network; you might find one user in the network and thus expose others. Removing the identifiers is not enough… So, you have to do something more elaborate…

One approach is to modify the edges – adding or deleting edges to hinder re-identification… But the problem is that you have two opposite objectives: on the one hand you want to maximise the data utility and minimise the noise in that data, but on the other you want to preserve users’ privacy…

So, there are different ways to quantify the objectives… There are generic information loss measures (GIL) – measures like average distance, diameter, harmonic mean of shortest distance, etc… You want to preserve those in your data. So… you take the original network and compute one metric… you end up with a different, anonymised network, and you can apply the same metric afterwards to compare… In statistical databases you can preserve the mean of all the records that sold boots (say)… If you know the questions to ask of that data, you know the process to keep that anonymised data close to the original data set…
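As an illustration of a generic information loss measure, the sketch below (my own, assuming networkx) compares the average shortest-path length of an original graph with that of a degree-preserving perturbed version:

```python
import networkx as nx

# Stand-in "original" network (connected by construction)
G = nx.barabasi_albert_graph(200, 3, seed=1)

# Degree-preserving random perturbation as the "anonymised" version
G_anon = G.copy()
nx.connected_double_edge_swap(G_anon, nswap=100)

# Generic information loss: change in average shortest-path length
loss = abs(nx.average_shortest_path_length(G) -
           nx.average_shortest_path_length(G_anon))
print("information loss (avg. shortest path):", loss)
```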

You can also use specific information loss measures (for a clustering process)… Similar problem here… You have the original clusters, and you use a clustering method to get to an anonymised (perturbed) version.

So, some measures behave in a similar way independently of the data on which they are computed.

And then you have the idea of k-anonymity. A model that indicates that an attacker cannot distinguish between k different records even if they manage to find a group of quasi-identifiers. Therefore the attacker cannot re-identify an individual. So, node degree can be the quasi-identifier… We can presume the attacker may know some of the nodes in the network… We can preserve the degree sequence, and the ordered degree sequence. And you can measure the k-degree anonymity by understanding how many nodes have the same degree. So if only two nodes in the network share degree 4, and that is the rarest degree, then the k-degree anonymity is 2. You can then make use of this to preserve the graph…
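A small sketch of how the k-degree anonymity of a graph could be computed, assuming networkx: the value is the smallest number of nodes sharing any single degree, so a degree that is unique in the graph gives k = 1:

```python
from collections import Counter
import networkx as nx

def k_degree_anonymity(G):
    """Smallest number of nodes sharing any single degree value."""
    degree_counts = Counter(d for _, d in G.degree())
    return min(degree_counts.values())

G = nx.barabasi_albert_graph(50, 2, seed=7)
print(k_degree_anonymity(G))   # a result of 1 means some node's degree is unique
```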

To modify the graph you can use edge modification (adding and/or deleting) or node modification (adding and/or deleting). You can also use uncertain graphs – adding or removing edges “partially” by assigning a probability to each edge: the set of all possible edges is considered and a probability is assigned to each one.

Edge modification can include edge rotation, random perturbation, relevant edge identification, and k-anonymity-oriented anonymisation. These can allow you to keep the data you want to keep, whilst preserving user privacy.
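As a sketch of the simplest of these, random perturbation, the function below (illustrative only, assuming networkx) deletes each edge with probability p and adds the same number of random edges back:

```python
import random
import networkx as nx

def random_perturbation(G, p=0.1, seed=None):
    """Delete each edge with probability p, then add back the same number of
    random non-existing edges, keeping the edge count roughly stable."""
    rng = random.Random(seed)
    H = G.copy()
    to_remove = [e for e in list(H.edges()) if rng.random() < p]
    H.remove_edges_from(to_remove)
    nodes = list(H.nodes())
    added, tries = 0, 0
    # add random edges until the removed ones are replaced (with a retry cap)
    while added < len(to_remove) and tries < 10 * len(to_remove) + 100:
        u, v = rng.sample(nodes, 2)
        tries += 1
        if not H.has_edge(u, v):
            H.add_edge(u, v)
            added += 1
    return H
```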

So, in conclusion, OSNs can be modelled with social graphs and analysed using graph mining techniques. Web crawlers may retrieve sensitive information from OSNs, but the quality of the collected information will depend on the specificities of the scheduler algorithm. Relational classifiers may provide relevant user information by just analysing the graph structure information… Data anonymisation is needed for releasing OSN data without compromising users’ privacy. This is a research field that is quite new and quite difficult… unlike statistical databases, where you can change one user without impacting on others, any change here does affect the network. And anonymisation algorithms need a trade-off between information loss and user anonymity loss.

Q&A

Q1) You talked about how much stuff is being datafied… Soon with smart watches we’ll have health data available. Because crawlers take some time… things could change whilst you are crawling.

A1) One of the problems in social networks and graph theory is that algorithms for this sort of data are complex and time consuming… And that is a problem… especially at scale. And sometimes you have the information, you make a lot of computation, but the information is not static… so there is a lot of work not only on algorithms but also on understanding the dynamics and changes in the network – what happens when a node is removed, for instance. There are people working on algorithms for dynamic data… But there is much more to do there…

Q2) What kind of research questions have you been using this with?

A2) There are two different issues for me in terms of the social sciences… We don’t start with research questions… we start with a problem and try to solve it… So when AOL released data about lots of searches… you could identify individuals from the data… but you shouldn’t be able to… That happens because they don’t understand or care about anonymising data. So we are trying to provide tools to enable that anonymisation. We also have ideas about the crawling approach… So as a social network provider you might want to avoid this type of crawler… you might use this approach to trap or mislead the crawler… so the crawler ends up in a dead end… and cannot crawl the network.

Q3) Some of the techniques you showed there were about anonymisation… do you use removal of nodes for that purpose?

A3) There are several approaches for adding or removing nodes… Sometimes those approaches collapse nodes together… so you anonymise all the nodes too… But the techniques that are most used are those that perturb and move the nodes.

Q4) One of the last things you said was about that trade off of utility of analysis and user privacy. My question is who makes that decision about the trade off? Would the people being studied agree with those decisions for instance, in the real world?

A4) The real world is much more complex, of course. The problem is deciding the level of usefulness of the data… At the present time these methods are not used as much as they could be… For statistical data this is often fixed by government… for instance in Census data you can see the method by which data has been anonymised. But for OSNs there is nothing of that type, and nobody is telling… and basically no-one is releasing data… Data is money… So if we can provide good algorithms to enable that, then maybe the OSN companies can release some of this kind of data. But at this moment, nobody is putting that idea of privacy there… Generally the privacy level tends to be low, and the information level high…

Q5) I didn’t totally understand how you set the boundaries of the network… Is it the crawling process?

A5) The idea is that there are no boundaries… The crawler goes… Maybe it completes within 1000 nodes, or 3 hours… or similar. You won’t crawl everything, but you want some data. So 10 million users might be the boundary, for instance… Then you have data to look at… So I have 10 million users out of a pool of 500 million… But which ones do I have? How representative are they? That needs consideration…

Q6) The crawler gathers a model of relationships and behaviours, and I’m sure that marketers are very interested. Is there potential to predict connections, behaviours, intentions, etc.?

A6) Yes, there are lots of techniques of graph theory that allow that sort of interpretation and prediction. OSN use these sorts of approaches for recommendations and so on…

Q6) How reliable is that data?

A6) Understanding similarities there can help make it more reliable… similarity rather than distance between nodes can be helpful for understanding behaviour… But I will say that they are quite accurate… And the more information they gather, the more accurate they are…

Q7) I was wondering, when you were talking about measuring the effectiveness of different anonymisation methods… is there a way to take account of additional data that could affect anonymisation?

A7) In computer security in general, when you model someone you have to define the adversary model… What the adversary is able to do… So, what is the attacker able to have… The available information… So the more information is available, the harder it is to protect the individual. It is a complex scenario.

Q8) Is there a user friendly web crawler that can be used by non technicians…

A8) No, sorry about that… There are some frameworks, but you don’t have one solution to fit all… and the frameworks that exist are more suited to computer science people… Tomorrow in the workshop we will explain extracting information from Twitter… And those techniques will let us explore how we could develop a crawler on Twitter… exploring connections and followers, etc.

Q9) What are the ethics of web crawling in social sciences? And what are the positions of the OSN on that?

A9) You can crawl an OSN because the information is public. So you can crawl Twitter, as information is public. If you want to crawl Facebook, you have to be authorised by the user to look at the profile… And you need to develop an algorithm to run as an app in Facebook… and authorise that… But that doesn’t mean the user understands that… For instance, in the last US election, the Obama campaign did an application on Facebook that did exactly that… graphing their supporters and friends… and used that in the campaign…

Q9) I was wondering about the crawling of discussion forums… where you cannot get authorisation. But you also mentioned that providers not keen… is it legitimate to do that…

A9) I think that it is… if you are crawling public information… There is the separate matter of the OSN not liking it – then they can make some restrictions. If I do things that avoid OSN restrictions, that is fine… You can do that.

Q10) I wanted to follow up on that… There are legal and ethical issues associated with crawling websites. You have to consider it extremely carefully. If I use a website that says it does not allow crawlers, I don’t expect it to be crawled, and crawling it would not be legal under data protection law. And there was a research project about 10 years ago which found that bloggers, although posting in public, didn’t expect to be analysed and interpreted… And you do have to think about the ethics here… And you need to think about the user’s expectations when they put the data up.

A – Christina) Everyone uses Google; when you put something on the internet you have to expect it to be crawled.

A – Jordi) From my perspective, as a researcher doing cryptography, what you say is quite strange… My work is about protecting information… It assumes people will be trustworthy with your information…

Q10) No, I’m saying an ethical researcher should not be breaking the law.

Comment) There can be an expectation of privacy in a “public” space…

Comment – from me) I would recommend the Association of Internet Researchers Ethics Guide for more on how you can mediate expectations of users in your research. For your cryptography work that may not be as relevant, but for others in this audience that guide is very helpful for understanding ethical research processes, and for thinking about appropriate research methods and approaches for ethical approval.

And with a gracious close from Jordi, we are done! There is a workshop running tomorrow on this type of analysis – I won’t be there but others may be tweeting or blogging from it.