A Month In Data, Part V

For the “A Month In Data” blog post series, I curate a set of interesting articles, links and resources that I have come across this month relating to data, algorithms and policy: from data science, AI and machine learning, through to ethics, society and governance. As before, alongside the main list — which is presented in no specific order or precedence — I also offer a set of short links to posts, academic papers and other relevant resources.

Part V: December 2017

In this fifth set of posts, we have everything from AI ethics, algorithms and the law, and emerging UK technology strategy, through to developing predictive capabilities, cyberwar, and generating Christmas carols with neural nets:

  • A Round Up of Robotics and AI ethics: part 1 Principles
    This excellent summary post by Alan Winfield is a round up of the various sets of ethical principles of robotics and AI that have been proposed to date, ordered by date of first publication. The principles are presented here (in full or abridged) with notes and references, but without detailed commentary.
  • Should we be afraid of AI?
    Machines seem to be getting smarter and smarter and much better at human jobs, yet true AI is utterly implausible. Why? After so much talking about the risks of ultraintelligent machines, it is time to turn on the light, stop worrying about sci-fi scenarios, and start focusing on AI’s actual challenges, in order to avoid making painful and costly mistakes in the design and use of our smart technologies. (you might also enjoy this other post by Luciano Floridi: Why Information Matters)
  • 2017 Was The Year We Fell Out of Love with Algorithms
    Algorithms that amplify fear and help foreign powers put a finger on the scale of democracy? These things sound dangerous! This is a shift from just a few years ago, when “algorithm” primarily signified modernity and intelligence, thanks to the roaring success of tech companies such as Google, an enterprise founded upon an algorithm for ranking web pages. This year, there has been growing concern about the power of technology companies, increasingly regarded as our “algorithmic overlords”. Also see: In 2017, society started taking AI bias seriously.
  • Chasing trains: The UK talks a good AI game but is it losing pace?
    In the scramble to promote sectors of British excellence before Brexit, government and industry have galvanised around artificial intelligence. But sowing a flower bed for AI in the UK is more than a matter of selling startups to internet giants, and there is a concern that the UK is resting on a few lush laurels; the country needs to cultivate its soft power in AI — leading by example in how it interrogates the ethical dilemmas posed by new technology.
  • A Reality Check: Algorithms in the Courtroom
    Can imperfect algorithms help address systemic inequalities in the criminal justice system, and offer a way to combat the capricious and biased nature of human decisions? This post addresses three critical questions: how well does pre-trial risk assessment work in practice, what do the tools actually measure, and how are the tools related to the life-shaping decisions reformers care most about? (in a related theme, check out how Lawyer-Bots Are Shaking Up Jobs)
  • Four posts on further imaginative uses for neural nets:

  • Three posts on a theme of developing (and understanding) predictive capabilities:

    • Wisdom of the Crowd Accurately Predicts Supreme Court Decisions
      Crowds can sometimes be wiser than the smartest individuals they contain; now researchers at the Chicago-Kent College of Law in Illinois have carried out the largest study of crowdsourcing (using data from FantasySCOTUS) in predicting SCOTUS decisions (also see the arXiv paper).
    • Predicting Stock Performance with Natural Language Deep Learning
      Microsoft and a financial services partner have developed a model (using convolutional neural networks running on the Azure Machine Learning Workbench) to predict the future stock market performance of public companies in categories where they invest. The goal was to use select text narrative sections from publicly available earnings release documents to predict and alert their analysts to investment opportunities and risks.
    • YouTube Views Predictor
      A comprehensive guide to getting more views on YouTube, backed by machine learning. The authors' goal was to create a model that can help influencers predict the number of views for their next video; due to the sheer scale of the problem, the scope was narrowed to fitness-related videos, creating a predictor that could be useful for moderately sized YouTube channels.
  • How to break a CAPTCHA system in 15 minutes with Machine Learning
    Everyone hates CAPTCHAs – those annoying images that contain text you have to type in before you can access a website; CAPTCHAs were designed to prevent computers from automatically filling out forms by verifying that you are a real person. But with the rise of deep learning and computer vision, they can now often be defeated easily.

  • Three posts with a cyber security and national security theme:

    • How An Entire Nation Became Russia’s Test Lab for Cyberwar
      A hacker army has systematically undermined practically every sector of Ukraine: media, finance, transportation, military, politics, energy. Wave after wave of intrusions have deleted data, destroyed computers, and in some cases paralysed organisations’ most basic functions. They have been part of a digital blitzkrieg that has pummelled Ukraine for the past three years, a sustained cyberassault unlike any the world has ever seen.
    • Machine Learning for Cybercriminals
      In the past year, there has been ample information on the use of machine learning in both defence and attacks (especially defence); the objective of this article is to systematise information on possible or real-life ways machine learning can be deployed for malicious purposes in cyberspace. It is intended to help members of information security teams to prepare for imminent threats.
    • Project Maven brings AI to the fight against ISIS
      For years, the US Defense Department’s most senior leadership has lamented the fact that US military and spy agencies, where AI technology is concerned, lag far behind state-of-the-art commercial technology. Project Maven is a crash program that was designed to deliver AI technologies — specifically, technologies that involve deep learning neural networks — to an active combat theatre within six months from when the project received funding.

  • And finally, four posts with a technical theme:

    • A Zero-Math Introduction to Markov Chain Monte Carlo Methods
      For many, Bayesian statistics is voodoo magic at best, or completely subjective nonsense at worst. Among the trademarks of the Bayesian approach, Markov chain Monte Carlo methods are especially mysterious. They’re math-heavy and computationally expensive procedures for sure, but the basic reasoning behind them, like so much else in data science, can be made intuitive.
    • Difference Between Classification and Regression in Machine Learning
      Fundamentally, classification is about predicting a label and regression is about predicting a quantity; in this tutorial, you will discover the differences between classification and regression.
    • Understanding Dimension Reduction with Principal Component Analysis (PCA)
      The “curse of dimensionality” refers to an exponential increase in the size of data caused by a large number of dimensions. As the number of dimensions of a dataset increases, it becomes more and more difficult to process. Dimension reduction is a solution to the curse of dimensionality, and Principal Component Analysis (PCA) is one of the most popular linear dimension reduction methods (this tutorial is from a seven-part series on dimension reduction).
    • Creating isochrone catchments from a (distance) matrix
      This article presents a simple JavaScript algorithm for the creation of drivetime catchments from a distance matrix API. A catchment is defined by an origin (given as a latitude and longitude coordinate) and a travel time in seconds: it covers the area that can be reached from the origin within that time.
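
For readers who want a feel for the catchment idea in that last post, here is a minimal sketch in Python (not the article's JavaScript, and not its actual algorithm). It assumes you already have one row of a travel-time matrix, i.e. the drive time in seconds from a single origin to a set of candidate points. The function names, the sample coordinates and the travel times are all made up for illustration, and the convex hull is a simplifying assumption: a real catchment boundary is usually not convex.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def reachable_points(points: List[Point],
                     travel_seconds: List[float],
                     limit_seconds: float) -> List[Point]:
    """Keep the candidate points whose travel time from the origin is within the limit."""
    return [p for p, t in zip(points, travel_seconds) if t <= limit_seconds]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: return the hull vertices in order around the boundary."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Sign of the turn o -> a -> b (zero means the three points are collinear).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other, so drop it.
    return lower[:-1] + upper[:-1]


# Hypothetical candidate points around an origin at (51.50, -0.12),
# with invented drive times in seconds from that origin.
candidates = [(51.50, -0.10), (51.52, -0.12), (51.50, -0.14),
              (51.48, -0.12), (51.50, -0.12), (51.60, -0.12)]
drive_times = [300, 420, 360, 540, 0, 1500]

# Ten-minute (600 second) catchment outline.
catchment = convex_hull(reachable_points(candidates, drive_times, 600))
```

In practice the drive times would come from whichever distance matrix API you use, and the candidate points would typically be a grid or ring of locations sampled around the origin.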

You might also like…

(check out all of the previous posts in the A Month In Data series)
