In defence of algorithms
The A-Level results fiasco has been all across the news this week, and rightly so. There have been many examples of students getting completely unrealistic grades, and students from poorer backgrounds seem to have been worst affected. A lot of online chatter seems to have focused on blaming The Algorithm, along with people getting lots of plaudits for making the completely new and razor-sharp insight that algorithms are still designed by humans and thus susceptible to bias. Then, I saw this.
I’ll give my thoughts on the grading algorithm later on, but this tweet made my This Person Doesn’t Know What They’re Talking About siren wail, and I wanted to outline a little bit about why, using an area that Susskind references, and in which I have a decent amount of experience: insurance.
The criticism/warning here is that insurance is being priced using data-analytical methods that “care little for you as an individual person”. Before we start, a couple of quick points: (i) an algorithm is really just a set of instructions, and good luck pricing insurance without a robust procedure for doing so in place, and (ii) however biased you might imagine an algorithm to be, a pricing algorithm that is regularly reviewed, tested and refined by a company (and likely its regulatory body) is going to be less biased than doing away with the algorithm entirely and letting individuals set insurance costs free from any procedure and at their own arbitrary whims.
But anyway. Let’s suppose you work for a car insurer, and you’ve been asked to price 10,000 fully comprehensive car insurance contracts for potential customers. How are you going to calculate those prices, whilst keeping to Susskind’s ideal of caring for customers as individual people? Well, I suppose you’d start by looking at each individual’s driving history. Have some of them crashed recently? Or have points on their licence? This might allow you to establish relative risk at least.
However. There will be plenty of applicants in your 10,000 who have never crashed, and this is where things get tricky. Car insurance just doesn’t work like online shopping. If you order home delivery from Tesco’s once a week, and buy a tin of beans in every order for a year, Tesco’s has fifty-two data points, and its algorithm is going to be pretty confident in recommending you a tin of beans when you log in next time.
Car crashes, by contrast, are mercifully rare. In your 10,000 applicants, you will likely have many people who have been driving for ten years without incident. So, treating these people “as an individual”, would we just say we’ll charge them nothing for insurance next year, as their driving history is spotless? Almost certainly not, because there is still a chance that year eleven doesn’t work out as well for them. So the “individual” approach begins to get stuck.
To make the problem more stark, suppose your 10,000 applicants are all 18 year olds, who have all just passed their test. They have no driving history. How are you going to price a car insurance contract for them, whilst caring “for them as an individual”? I suppose we could spend millions on paying underwriters to go and interview each of the applicants’ friends and family to get an idea of whether they’d be safe drivers, but they could have a rose-tinted view of their newly-qualified mate’s driving abilities, or they could simply lie to us. We could go around and see how many Fast & Furious DVDs they have. But really, the answer to “how can we avoid algorithms and price to the individual?” here is: you can’t, because you have no information.
Enter the algorithm! Because one thing we absolutely do have in abundance is data on car crashes that have occurred in the past. We may not be able to find out much about the risk of our 18 year old applicants this year, but we do know a lot about how much it has cost to insure 18 year olds in previous years.
Insurers will go further than just looking at 18 year olds, too. They gather lots of data to inform this pricing process, split up by lots of different risk factors. So we might have a good idea about how much, on average, an 18 year old in a certain postcode with a certain vehicle type and a certain intended annual mileage might cost to insure in any given year (age, postcode, vehicle type and annual mileage being the risk factors here), because we can look at how much that same cohort cost in previous years.
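To make the cohort idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the postcode area, the mileage band and the function name `average_cost_by_cohort` are all my own assumptions, and a real insurer's model would be far more sophisticated than a simple average.

```python
from collections import defaultdict

# Hypothetical past records, one per policy-year:
# (age, postcode_area, vehicle_type, mileage_band, claims_cost_in_pounds)
history = [
    (18, "M1", "hatchback", "under_5k", 0.0),
    (18, "M1", "hatchback", "under_5k", 4200.0),
    (18, "M1", "hatchback", "under_5k", 0.0),
    (18, "M1", "hatchback", "under_5k", 1800.0),
    (45, "M1", "hatchback", "under_5k", 0.0),
    (45, "M1", "hatchback", "under_5k", 600.0),
]


def average_cost_by_cohort(records):
    """Return the mean yearly claims cost for each combination of risk factors."""
    totals = defaultdict(lambda: [0.0, 0])  # cohort -> [total cost, policy-years]
    for age, postcode, vehicle, mileage, cost in records:
        cohort = (age, postcode, vehicle, mileage)
        totals[cohort][0] += cost
        totals[cohort][1] += 1
    return {cohort: total / count for cohort, (total, count) in totals.items()}


cohort_costs = average_cost_by_cohort(history)
for cohort, cost in cohort_costs.items():
    print(cohort, cost)
```

Note that most of the individual records show no claim at all, yet the cohort average is well above zero. That is the information you simply cannot get from a single clean driving record.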
(Note that sex was previously used as a risk factor in car insurance until the EU ruled that there could be no gender discrimination in pricing contracts; the upshot is women now pay a lot more for their cover and men pay less).
So insurers will in practice use a combination of individual factors (driving history) and cohort factors (age, location) in order to price a contract to what they expect to have to pay out. As always, there is a balance to be struck; more data will mean more accurate pricing, but simpler and shorter application forms are more customer friendly.
From a business point of view, it’s absolutely critical that they get all of this right. Price too low, and they end up losing money. Price too high, and customers go elsewhere. If there are any biases in the algorithm, insurers will lose out, big time.
Of course, it is worth noting that some customers will lose out here. A very conscientious 18 year old who will drive very safely is going to get a similar premium to an 18 year old boy racer; in actual fact, the conscientious driver ends up subsidising the boy racer. We can say that’s unfair, but the insurer can’t do any better, because they don’t know the applicants anywhere near well enough to be able to make this distinction. Even if they tried, they wouldn’t eliminate bias this way, because somewhere along the line a human judgment would still have to be made as to how much more the boy racer should be charged.
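The subsidy is easy to see with some toy arithmetic. The figures below are invented purely for illustration, and the sketch assumes the insurer simply charges everyone in the cohort the average expected cost, ignoring expenses and profit.

```python
# Hypothetical "true" expected yearly claims cost per driver, in pounds.
# The insurer cannot observe these; it only sees two 18 year olds.
true_expected_cost = {
    "conscientious_driver": 500.0,
    "boy_racer": 2500.0,
}

# With nothing to tell the two apart, each is charged the cohort average.
pooled_premium = sum(true_expected_cost.values()) / len(true_expected_cost)

# Positive means the driver pays more than their own expected cost.
overpayment = {
    driver: pooled_premium - cost for driver, cost in true_expected_cost.items()
}

print(pooled_premium)
print(overpayment)
```

The conscientious driver overpays by exactly the amount the boy racer underpays, which is the cross-subsidy described above.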
Insurers aren’t using algorithms because they’re evil, but because years of data gathering and decades of actuarial expertise have proved them to be the most effective and fair approach.
Which brings us back to the A-Levels. Unlike insurance, the modelling for calculating the grades was brand new, and the algorithms had not been rigorously tested and refined over many years. Unlike insurance, the data to do the calculations was clearly inadequate, scanty and unreliable. Reportedly, in “backtesting” the algorithm (applying the algorithm to last year’s data and comparing against the actual results), 40% of the grades given were inaccurate, a level of inaccuracy which would see an insurer out of business very quickly indeed.
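For anyone unfamiliar with backtesting, the idea can be sketched in a few lines of Python. The grades below are made up and are not Ofqual data, the function name `backtest_accuracy` is my own, and I am assuming "accurate" means the predicted grade exactly matches the actual one.

```python
def backtest_accuracy(predicted, actual):
    """Return the share of predicted grades that exactly match the actual grades."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not predicted:
        raise ValueError("need at least one grade to backtest")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Hypothetical example: what the model would have awarded last year
# versus what the same students actually achieved.
predicted = ["A", "B", "B", "C", "A"]
actual = ["A", "B", "C", "C", "B"]

accuracy = backtest_accuracy(predicted, actual)
print(f"accurate: {accuracy:.0%}, inaccurate: {1 - accuracy:.0%}")
```

The point of running this before going live is that you find out how wrong the model is on a year where you already know the answers.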
From what I know of the grading process, I think there may have been the following issues:
- it appears that large swings in grades compared to expectations were not properly validated, reviewed and adjusted as necessary. There have been examples on social media of students getting wildly unrealistic grades, and a good algorithm should flag these up.
- it is not clear whether analysis of outputs was broken down into segments such as private vs state; the assumptions driving the algorithm appear to result in more favourable outcomes for private school students, and again this should have been flagged.
- the inaccuracy highlighted by the backtesting.
- the data available was insufficient for the task at hand.
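The first two checks in the list above are not hard to build. Here is a rough Python sketch under my own assumptions: the student records are invented, the grade-to-points mapping and the two-grade threshold are arbitrary choices, and I have no knowledge of what checks Ofqual actually ran.

```python
# Hypothetical mapping of grades to points so that swings can be measured.
GRADE_POINTS = {"A*": 6, "A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "U": 0}

# Made-up students: the grade that was expected versus the grade awarded.
students = [
    {"id": "s1", "school_type": "state", "expected": "A", "awarded": "C"},
    {"id": "s2", "school_type": "state", "expected": "B", "awarded": "B"},
    {"id": "s3", "school_type": "private", "expected": "B", "awarded": "A"},
    {"id": "s4", "school_type": "private", "expected": "A", "awarded": "A"},
    {"id": "s5", "school_type": "state", "expected": "B", "awarded": "U"},
]


def grade_change(student):
    """Awarded minus expected, in grade points (negative means downgraded)."""
    return GRADE_POINTS[student["awarded"]] - GRADE_POINTS[student["expected"]]


def flag_large_swings(students, threshold=2):
    """Return ids of students whose grade moved by at least `threshold` grades."""
    return [s["id"] for s in students if abs(grade_change(s)) >= threshold]


def mean_change_by_segment(students):
    """Return the average grade change for each school type."""
    changes = {}
    for s in students:
        changes.setdefault(s["school_type"], []).append(grade_change(s))
    return {segment: sum(vals) / len(vals) for segment, vals in changes.items()}


flagged = flag_large_swings(students)
segment_changes = mean_change_by_segment(students)
print(flagged)
print(segment_changes)
```

In this toy data, two state school students are flagged for manual review, and the segment breakdown shows state students being marked down on average while private students are marked up. That is exactly the kind of pattern that should prompt someone to stop and ask questions before results go out.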
A key question for me is whether these modelling limitations were not properly communicated to government, or whether the government was aware of the level of inaccuracy and chose to proceed anyway.
Models serve a purpose but ultimately decisions rest with individuals; if a car insurance pricing algorithm is failing, directors of the firm who signed off on its use will carry the can, and the same is true here of Ofqual and the government.
But a new and commercially untested algorithm, built upon poor quality data, that has demonstrably failed to deliver accurate A-Level results does not imply that algorithms currently being put to use for commercial purposes such as insurance are therefore also unreliable and biased. Although I appreciate people have books to plug, I find it quite a lazy, uninformed and risible claim to make. In retort, I merely say: insurers have huge amounts of data, experience and expertise on their side, but if you’ve got a more efficient and humane way for the industry to price its products, we’d love to hear from you.
Oh, and if you’ve got this far, one final point: change your car insurer regularly for better prices.