Prisoners have the right to apply for parole. Every so often, in every prison, a commission composed of experts (psychologists and sociologists) convenes. Its main task is to decide who can be released and who cannot. Ideally, the commission would like to release the prisoners who will not commit a crime in the future and keep those who will. Unfortunately, the experts do not know what each prisoner will do once at large. Two researchers (Wormith and Goldstone [1]) decided to take a closer look at this subject.
They collected data on 222 prisoners held in state prisons. They checked which of them had been released on parole and which had not, and how they behaved afterwards (whether they had further problems with the law).
It turned out that 40% of prisoners commit further crimes within the first 21 months after leaving prison. In this sample, the prison commissions granted parole to 44% of the prisoners, and the decision was right in 58% of cases.
The whole study took place in 1984. Wormith and Goldstone decided to check whether the results could be improved with a simple algorithm. The idea was to look at what the prisoners in each group had in common. For example, it turned out that a prisoner who has a job to return to is more likely to obey the law than one who does not. Twenty-one pieces of information describing each prisoner were collected, among them the kind of crime committed, susceptibility to alcohol abuse, and the level of aggression in the district where the prisoner lives. These are standard items kept in the prisoners’ files, to which the commissions have access when deciding whether to grant parole.
The algorithm was simple: it added up the factors favouring good behaviour and subtracted the factors that raise the likelihood of reoffending. The result was a single number which, after a suitable conversion, assigned the prisoner to one of five groups (boding very well, boding well, boding average, boding poorly, boding very poorly). A rough sketch of how such a score might work is shown below.
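To make the idea concrete, here is a minimal sketch of such a score in Python. The factor names, weights, and band thresholds are invented for illustration only; they are not Nuffield’s actual scale.

```python
# Minimal sketch of an RPS-style score.
# The factors, weights, and thresholds are illustrative, not Nuffield's real scale.

def rps_band(prisoner: dict) -> str:
    score = 0

    # Add points for factors associated with staying out of trouble.
    if prisoner.get("has_job_to_return_to"):
        score += 1
    if prisoner.get("stable_family_support"):
        score += 1

    # Subtract points for factors that raise the risk of reoffending.
    if prisoner.get("alcohol_abuse"):
        score -= 1
    if prisoner.get("violent_offence"):
        score -= 1
    if prisoner.get("high_crime_district"):
        score -= 1

    # Map the single number onto one of the five bands.
    if score >= 2:
        return "boding very well"
    if score == 1:
        return "boding well"
    if score == 0:
        return "boding average"
    if score == -1:
        return "boding poorly"
    return "boding very poorly"


print(rps_band({"has_job_to_return_to": True, "alcohol_abuse": True}))  # -> "boding average"
```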
Importantly, the algorithm had been created a few years earlier by another researcher (Nuffield [2]) while testing a different sample of prisoners, so the results discussed here were not available at that time. The output of this simple mathematical formula is called the Recidivism Prediction Score (RPS for short).
Let’s check whether the RPS was better at selecting the right prisoners than the group of experts. The same question was asked in our poll (the sample was 200 people), so first take a look at your opinion on the matter:
It turns out that 73% of people think that a human (an expert or a group of experts) will get better results. Only 27% think that the soulless program will do better.
In reality, the RPS granted parole to 49% of the prisoners (those graded from “boding very well” down to and including “boding average”). The decision was correct in 65% of cases.
It seems the mathematical formula achieved a better result than the group of experts. Moreover, it did so while releasing more people, so its job was slightly harder (if the commission had released the same number of prisoners as the algorithm, its score would have been worse, because it would have had to reach for more questionable candidates).
Now let’s jump to a slightly different discipline.
For many years psychologists have had a problem telling whether a patient suffers from psychosis or neurosis. The difference between the two is that a patient with psychosis acts strangely (e.g. he suspects that the walls are bugged and tears them down to check) and is not aware of his illness. A patient with neurosis, on the other hand, is aware that his thoughts are strange and is afraid of falling into mental illness (e.g. he knows the walls are not bugged, but fears that one day he will start tearing them down). Neurosis is not an early stage of psychosis, and patients who have it never actually start destroying the walls, but they live in constant fear that one day it may happen. It might seem that these conditions are easy to distinguish, but in practice it is often very hard to say which one a patient has. The overt symptoms are similar, yet the underlying causes and the treatment are radically different.
Before diagnosis, every patient is usually given the MMPI. It is one of the most elaborate psychological tests and helps build a profile of the examined person. It contains around 500 questions, which the person being examined answers YES or NO (e.g. “Do you have a good appetite?”, “Do you work under a lot of pressure?”).
The problem with the MMPI is that the result is a set of several dozen numbers, and interpreting them correctly requires a great deal of psychological experience. It can happen that two psychologists reach different conclusions after examining the same patient’s results.
An American researcher (Goldberg [3]) decided to check how good psychologists are at distinguishing psychosis from neurosis, and whether more experience in the field translates into better results.
Goldberg gathered three groups of “researchers”:
- experts – this group consisted of 3 clinical psychologists with years of experience in analyzing patients’ MMPI profiles,
- average – this group consisted of 10 graduate students of clinical psychology. They had basic knowledge of the MMPI and a general idea of the differences between psychosis and neurosis,
- naive – this group consisted of 10 people with no psychological background who had never heard of the MMPI; their task was simply to assign each patient one of two letters: “N” or “P”.
To begin with, each person received several dozen MMPI profiles with the diagnosis written on the back (N or P). These were profiles of real patients whose diagnoses were known and which the doctors were confident about. After this training, the “researchers” started judging further MMPI profiles.
Meanwhile, a simple algorithm was used to grade the same profiles. Calling it an algorithm is actually saying too much: it merely summed up five specific numbers from the test. Which numbers should be summed, and why, had been worked out by another researcher (Meehl 1959 [4]) nine years earlier. As in the parole experiment, the data of the patients examined here was not known when the algorithm was created.
The formula had been derived in the same way as in the previous experiment: it was checked what the MMPI profiles with confirmed psychosis (and neurosis) had in common, and on that basis the MMPI indicators that distinguish one condition from the other most effectively were chosen. A sketch of such a rule is shown below.
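For illustration, a Goldberg-style rule fits in a few lines of Python. The index is usually quoted as the sum of the L, Pa and Sc scale scores minus the Hy and Pt scale scores, with a cutoff around 45; treat the exact scales and cutoff below as assumptions for the sketch, not a verified reproduction of Meehl’s or Goldberg’s formula.

```python
# Sketch of a Goldberg-style rule: combine five MMPI scale scores into one
# number and apply a cutoff. The scales (L, Pa, Sc, Hy, Pt) and the cutoff of 45
# are assumptions for illustration, not a verified reproduction of the original.

def diagnose(profile: dict) -> str:
    index = (profile["L"] + profile["Pa"] + profile["Sc"]
             - profile["Hy"] - profile["Pt"])
    return "P" if index >= 45 else "N"  # P = psychosis, N = neurosis


print(diagnose({"L": 50, "Pa": 60, "Sc": 65, "Hy": 55, "Pt": 70}))  # -> "P"
```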
How effective was the algorithm compared to the people? For starters, let’s look again at how this question was answered in the survey:
Here the gap is smaller than in the previous question, but again more respondents bet on the people (47%), while 44% bet on the algorithm. I wonder about the discrepancy between the results of these two questions. Perhaps the nature of the problem is slightly different (this question was phrased somewhat more mathematically than the parole one, which could suggest the superiority of a computer at solving it), or perhaps social responsibility is the issue (a bad diagnosis here has negative consequences only for a single patient, whereas a bad decision about early release may change the lives of innocent people, so we are afraid to entrust it to a machine). The exact causes of this divergence are left for further discussion.
Here is the accuracy achieved by each group of researchers:
If the profiles had been assigned randomly, the result would be very close to 50%. The “naive” group achieved 58%, and the “average” group did better, reaching 65%. Interestingly, the expert group was virtually indistinguishable from the “average” one. It seems that 65% is the upper limit of accuracy that psychologists are able to achieve on this task. The mathematical formula reached an accuracy of 70% and thus turned out to be better than any of the groups of people.
You could say that these were only two studies, and the outcomes might depend on the specific area or be the result of chance (just as a lottery winner could claim the win was down to skill rather than luck). Fortunately, the subject of algorithms versus people is popular enough among researchers that it has already been tested repeatedly in various fields. In 2000, William Grove and a group of scientists took the trouble to gather all the scientific work on this subject. The results are described in “Clinical Versus Mechanical Prediction: A Meta-Analysis” (Grove [5]). A total of 136 independent studies from various disciplines were collected (including the two described above). They covered forecasting such things as: success at work, suicide, the effectiveness of teaching students, lie detection, advertising sales, the success of start-ups, diagnosis of various diseases, and many others.
Here are the results:
It turns out that as many as 94% of the studies show that an algorithm or mathematical formula is as good as or better than a human at making decisions.
At this point, one might wonder whether, as a decision-making process, selecting the best listed companies is not exactly the same as picking out:
- the most promising prisoners for parole,
- neurotic patients from patients with other disorders (psychosis),
- the students who in five years will achieve the best results in science,
- potential future suicides from among many others,
- the companies that will cope best in the future,
- etc.
In all these areas a mathematical formula produces better predictions than people who are experts with years of experience. The sheer number of domains that have been tested suggests that the superiority of algorithms over human judgment does not depend on the domain.
If so, then there is no reason why any market expert (guru), or even a group of them, should achieve better investment results than a mathematical formula. I was curious what your view on this matter is, so I added an appropriate question to the questionnaire. Here are the outcomes:
Here the answers split almost evenly between all the options. In total, 70% of respondents say that better investment decisions are made by people and only 30% that they are made by an algorithm. Interestingly, in the earlier questions the answer that a single expert would perform best was chosen very rarely (4% and 9% of responses). When it comes to the stock market, as many as 27% of you bet on the individual.
What do you think of these results? I’m curious about your perspective on this research.
Bibliography:
[1] Wormith, Goldstone 1984 "The clinical and statistical prediction of recidivism"
[2] Nuffield 1982 "Parole Decision-Making in Canada"
[3] Goldberg 1968 “Simple models or simple processes?”
[4] Meehl 1959 "A comparison of clinicians with five statistical methods of identifying psychotic MMPI profiles"
[5] Grove et al. 2000 "Clinical Versus Mechanical Prediction: A Meta-Analysis"