A Month In Data, Part II

In my new monthly blog post series — “A Month In Data” — I have curated another set of interesting articles, links and resources that I have come across this month relating to data, algorithms and policy: from data science, AI and machine learning, through to ethics, society and governance. As before, alongside the main list — which is presented in no specific order or precedence — I also offer a set of short links to posts, academic papers and other relevant resources.

Part II: September 2017

In this second set of posts we again have a broad spread of topics reflecting the breadth of the series — from machine learning, neural nets and algorithmic transparency, through to Brexit botnets, mapping the creative economy and data journalism:

  • London Underground Wifi Tracking: Here’s Everything We Learned From TfL’s Official Report
    This builds upon some of the findings from last year’s wifi tracking trial, in which Transport for London analysed wifi data picked up from phones as people travelled on the London Underground. In the month in which the trial took place last year, it logged more than 500 million (anonymised) wifi connection requests from around 5.6 million devices, augmenting existing Oyster data to provide more insight into movements across the tube network.

  • Don’t use a neural network to name your next pub
    As part of a project on the cultural peculiarities of pubs called “The Last Hour!”, a neural network was used to generate potential names of pubs from a training list of 1,053 pubs from north-east England. As the neural network progresses in its training, the proportion of terrible — and rude — pub names only increases, from the Bollock Hotel and Mingside Arms, to the Doss of Wulling of Stank.
  • Are Engineers Responsible for the Consequences of Their Algorithms?
    Algorithms already control a stunning amount of our lives — the information we see, the jobs we get, even how much defendants should pay for bail. That’s unlikely to change as technology is increasingly integrated into the systems around us. And though they are supposed to help us make decisions without our fallible human subjectivity, algorithms often end up perpetuating our pre-existing biases. Therefore: should engineers be responsible for the consequences of their algorithms? (yes)
  • Row over AI that ‘identifies gay faces’
    A widely reported (e.g. here, here and here) facial recognition experiment at Stanford that claims to be able to distinguish between gay and heterosexual people (with 91% accuracy) has sparked a row between its creators and two leading LGBT rights groups. While details of the peer-reviewed project are due to be published in the Journal of Personality and Social Psychology, this looks like hugely irresponsible research from a highly biased training dataset. If you are a researcher and you are scraping profiles from the internet with the intent of using that data to build intrusive tech…stop.
  • An anatomy of murder: How The Economist does data journalism
    Is there a special process involved in reporting a data story? Does a data journalist go out into the field like the paper’s other correspondents? Using a recent story on America’s rising murder rate as a case study, this article lists all of the trials and triumphs of data reporting.
  • A Field Guide to Fake News
    A project of the Public Data Lab with support from First Draft, A Field Guide to Fake News explores the use of digital methods to trace the production, circulation and reception of fake news online. An open access sample of the first three chapters of the guide is available online, as well as a recent talk on “Fake News in Digital Culture” at the 2017 Institute for Policy Research Symposium.
  • Three data-related blog posts on the creative economy/industry from Nesta, the UK’s innovation foundation:

    • The clubbing map: What has happened to London nightlife?
      London club closures are causing widespread concern, both for the decline of London culture that they imply and for their economic impact on the wider night-time economy and creative sectors. To inform debate in this area, Nesta has published an interactive map of London nightlife and its closures, with an accompanying analysis.

    • Decoding the value of music in cities
      A new method to evaluate quality of life in cities is emerging: using music as an indicator. By focusing on the prevalence and success of a city’s local music sector, a number of people have surmised that investigating the importance of music to a city yields lessons that can be applied to other urban indicators.
    • Women in film: what does the data say?
      Nesta has undertaken a gender-focused analysis of UK film casts and film crews using a new Filmography from the British Film Institute, which contains records on over 10,000 UK films, stretching back to 1911, and over 250,000 unique cast and crew members.
  • I asked Tinder for my data. It sent me 800 pages of my deepest, darkest secrets
    In March, journalist Judith Duportail asked Tinder to grant her access to her personal data, as is her right under EU data protection law. Some 800 pages came back containing information such as her Facebook “likes”, links to where her Instagram photos would have been had she not previously deleted the associated account, her education, the age range of men she was interested in, how many Facebook friends she had, when and where every online conversation with every single one of her matches happened…the list goes on, raising interesting questions about the data privacy of the other parties involved.
  • Paper: The Brexit Botnet and User-Generated Hyperpartisan News
    This paper in Social Science Computer Review by Marco Bastos and Dan Mercea analyses a network of Twitterbots comprising 13,493 accounts that tweeted during the UK’s EU referendum, only to disappear from Twitter shortly after the ballot. The results advance the analysis of political bots by showing that Twitterbots can be effective at rapidly generating cascades and that the retweeted content comprises user-generated, short shelf-life, hyperpartisan news (albeit not strictly fake news). On a related theme, check out this paper on how social media, news and political information were used during the US Election, especially in the swing states.

  • First Evidence That Night Owls Have Bigger Social Networks than Early Risers
    Human activity follows an approximately 24-hour day-night cycle, but there is significant individual variation in wake and sleep times. Individuals with circadian rhythms at the extremes can be categorised into two chronotypes: “larks”, those who wake up and go to sleep early, and “owls”, those who stay up and wake up late. To examine how chronotypes relate to social behaviour, the study (read the full paper on arXiv) used data collected via a smartphone app from a population of more than seven hundred volunteer students to simultaneously determine their chronotypes and social network structure. If you stay up late, your social network is likely to be bigger than that of a morning person: the authors find that owls maintain larger personal networks, albeit with less time spent per contact.

You might also like…

(also check out last month’s inaugural post in this series)
