Predicting the Unpredictable – The Truth About Injury Prediction


Stephen Smith | CEO @ Kitman Labs 

Injury prediction feels like such a dirty word these days. It has become something of a stigma in this industry because many people have been beating a drum about how they are doing it or are going to do it. We, like many people within the industry, do not believe that there is a solution today that predicts injury, nor do we believe that one will be available in the near future. I am not saying we cannot make a huge impact in reducing the prevalence and impact of injuries, but as far as prediction goes, it is more complicated than a younger version of me once hoped.

When I first founded this company and was in the early days of my research on this topic, I believed that we would find all of the answers and solve a huge problem in sport. My thoughts on the latter part of that statement have not changed, and I have seen the evidence that we are solving a huge problem day to day with the teams that we are privileged to work with. However, the realization that predicting injuries is not as straightforward as I initially hoped came after much work and data collection in this space. Today the focus must be on risk management and risk assessment so that we can pave the way to robust models that may lead to prediction in the future.

One of the biggest issues we currently have within this industry is that we are trying to provide a simple solution to a complex problem. Injuries are rarely caused by one specific risk factor or problem. We must not overlook the fact that humans are complex, and that human response and human failure are just as complex and involve the degradation of multiple factors. As much as we may want a simple and straightforward answer, any solution needs a level of complexity that accounts for the uniqueness of each human being.

I was recently at a conference where I heard a practitioner present and discuss the fact that he had predicted 99 injuries the previous year with a team, saving them many millions of dollars. Ten years ago I would have thought, “Wow, that is amazing, I need to be able to leverage this and see how they are doing it.” Today, I ask a very different question, based on the learning and education I have been provided by the other fantastic sport and data science professionals I work with every day. The first thing that popped into my mind that day was: if you predicted them, how do you know? If they didn’t happen, how do you know they would have? It seems simple, but honestly, how do we know prediction works? It relies on an event actually happening and on us being able to show that we knew about it beforehand.

Secondly, if you did indeed predict these events, how many times did you predict events that did not occur? Even the most robust prediction models have error rates, so it is always important to know how many false positives are actually produced. For example, if 1000 injuries were forecast but only 99 of them actually occurred, then the accuracy of those forecasts (strictly speaking, their precision) is just below 10%. This would probably be less effective than guesswork. Unfortunately the number of false positives was not reported, but it would be very helpful in understanding the value proposition provided.
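The arithmetic behind that example can be written out as a minimal sketch. The 99 and 1000 figures are the hypothetical numbers from the example above, not reported results, and the function name `forecast_precision` is purely illustrative.

```python
def forecast_precision(true_positives: int, total_forecasts: int) -> float:
    """Share of forecast injuries that actually occurred."""
    if total_forecasts <= 0:
        raise ValueError("total_forecasts must be positive")
    return true_positives / total_forecasts


# Hypothetical figures taken from the example in the text.
true_positives = 99      # forecast injuries that did occur
total_forecasts = 1000   # every injury that was forecast

# Forecasts that never turned into an injury.
false_positives = total_forecasts - true_positives
precision = forecast_precision(true_positives, total_forecasts)

print(f"False positives: {false_positives}")  # False positives: 901
print(f"Precision: {precision:.1%}")          # Precision: 9.9%
```

Under these assumed numbers, roughly nine out of every ten warnings would have been false alarms, which is why the false positive count matters as much as the headline figure.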

Thirdly, injuries were reported globally as being predicted, but as we discussed in my previous article, “Sport Science – Time to Deviate from the standard”, injuries are not all the same. What causes a hamstring injury is very different from what causes a torn rotator cuff. Knowing which types of injuries are being predicted, and what information makes that process effective, is crucial to understanding the efficacy of any system or solution.

Fourthly, I continuously wonder how so many people can suggest that they can reduce or predict injuries when they don’t even store injury data. If we are trying to predict something, surely we need to be able to warehouse that information and store it as a signal in our analytics process? Given the number of people speaking about being able to predict, I am shocked to see that the majority of them do not even store injury data.

Lastly, if prediction were possible in this space, the datasets and sample sizes needed would be pretty large. Think about the numerical aspect of this and the statistical significance needed to show that it is possible. We would most likely need hundreds if not thousands of rotator cuff strains in a database if we simply wanted to get close to building prediction models for that injury alone. For any injury that we truly want to have a chance of predicting, we would need this many events to begin to show a level of significance that would be considered predictive. And not only would we need the injury data, we would also need the longitudinal exposure data and response information to show which key markers actually give some insight into what we can use to help reduce this injury and alleviate the problem.

The purpose of this article is not to look down on others who are making strong claims in this space but to highlight that when we think about prediction, as I did many years ago, we need to take a detailed and mature approach to how we solve this problem. This does not mean that we cannot assess risk in human beings day to day; in fact, that is exactly what we need to do. The data that we have today is still valuable and can be used to evaluate patterns and trends, and collating large databases across multiple teams yields a dataset that is far more powerful for deriving risk and helping practitioners to reduce it. We should be collecting data to understand the importance of information regardless of statistical significance, and using it to aid our decisions as we build our datasets and solve this problem. The problem, as most people know, is that this can’t be solved alone. As a team working in the elite sport space with 40-50 athletes, you may see 25 injuries a year, maybe fewer if you’re lucky. Of these 25 injuries, maybe 5 will be of the same type. Building a dataset large enough for predictive models would require hundreds of years’ worth of data from this one team. Forgive me for being a skeptic, but teams need results and action today. That’s where the power of a large dataset comes into play.
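To make the "hundreds of years" point concrete, here is a rough back-of-the-envelope sketch. It assumes a target of 1000 recorded injuries of one type (an illustrative figure in the "hundreds if not thousands" range mentioned earlier), the 5 same-type injuries per team per year from the paragraph above, and a hypothetical pool of 100 teams that each contribute at a similar rate. None of these are measured values.

```python
import math


def years_to_collect(target_events: int, events_per_team_year: float, teams: int = 1) -> int:
    """Whole years needed to record target_events, assuming a constant rate per team."""
    if target_events <= 0 or events_per_team_year <= 0 or teams <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(target_events / (events_per_team_year * teams))


TARGET_EVENTS = 1000       # assumed dataset size for one injury type
SAME_TYPE_PER_YEAR = 5     # same-type injuries per team per year, from the text

single_team_years = years_to_collect(TARGET_EVENTS, SAME_TYPE_PER_YEAR)
pooled_years = years_to_collect(TARGET_EVENTS, SAME_TYPE_PER_YEAR, teams=100)

print(f"One team alone: {single_team_years} years")   # One team alone: 200 years
print(f"100 teams pooled: {pooled_years} years")      # 100 teams pooled: 2 years
```

The sketch ignores differences between teams, sports, and recording practices, so it only illustrates the scale of the gap between a single team and a pooled dataset.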

We are focused on building the world’s largest repository of not only injury records but also of exposure data to help teams build towards a better future and one where maybe the word “prediction” will not raise as many eyebrows as it does today.

