Making sense of Big Data

“There are those who look at things the way they are and ask why… I dream of things that never were, and ask why not?” – Robert Kennedy

Data seems to be taking over the world. Every day there is a new product or service offering us more and more data feeds and data points in areas that we never thought possible. Big data has emerged as one of the largest new sectors in modern business. This is a sector that is getting down and dirty with numbers from all different types of businesses, from finance, marketing and social media analytics all the way through to sports. The variety of the data we are seeing and the scope of its application and use are seemingly limitless. With that said, much of what you see in the media about big data seems very sexy and very distant from what most of us as practitioners know to be a daunting and intimidating business. Thousands of data points every day in spreadsheets are frightening to say the least. ‘How do we know what to do with them?’ and ‘What does any of it really mean?’ are the questions we ask ourselves.

In sports over the last 8-10 years we have seen an explosion of available data that is supposed to lead us to the promised land of winning every game, carving our opponents apart at the click of a button and removing injuries from the game at the wave of a wand. Of course, when you are working at the coal face of professional sport you often treat these claims as ‘too good to be true’, and you would be right in doing so. These systems cannot make changes in sport; they can only showcase information. It is up to us, as practitioners, to use this information to reduce injuries and improve performance.

Sensors and wearables have brought to the fore in sports an abundance of information and data points that leave us feeling a sense of ‘paralysis by analysis’. They have provided us with HUGE data, not big data. With 300+ data points for every athlete from every session, adding up to far more than 50,000 data points for a single team over a week, the enormity of the situation we face every day in sport becomes clear. Couple that with the fact that as practitioners in sport we have minimal minutes with the athletes every day, and the unignorable fact that we are not data scientists or mathematicians, and you start to understand that this is a bigger problem than most people in the outside world could ever comprehend.

With the amount of data we have and the minimal window of time to sift through it, how can we even find enough time to look at one day’s worth of training and really understand it, never mind see what it means in relation to that day, the day before, the week before, the entire season or even the week ahead? The dataset is overwhelming as it stands, but the fact that the window to use the data is even smaller makes the problem far more difficult. What do we want this data for, if not to change our decision-making processes and help keep our athletes healthy, in peak physical condition and competing at the highest possible level? If we can’t harness the data quickly enough then we can’t achieve any of these goals, and the dataset just becomes an archive for looking back on what could have been. I believe that to really harness big data in sports and make a difference, we need to be able to use the data in real time to change and adapt what we do every day. There are a number of things I think we need to do to make this happen.

We first need to know what we are looking for. What is the question we are asking?

It’s very easy to be sold on what salesmen and companies tell us we should be looking for and what we can see in the data, but is that really what we want, and does it really do what it says? These are hugely important questions for me. If we don’t know what we want, we will never find the right answer. If we haven’t asked ‘the what’, then it’s extremely difficult to know if we are even tracking the right data. If we are trying to analyse injury risk in baseball pitchers and all we collect is pitching and game data, without understanding how pitching actually affects range of motion, strength or mechanics, we will never really understand the risks, because we don’t understand the consequences. Asking exactly what we are looking for will help us to look in the right places and ensure we have all the relevant data at hand.

We also need to know, what does it mean? How does it help? What can I use it for?

Sensors have really informed us about simple things like the distances that athletes run over the course of a week or year, but what does that really tell us? What does it mean? If athlete x runs 8km and athlete y runs 12km, what does that really mean? Should we be worried about athlete x because he has done less work, or worried about athlete y because he has done more? It’s never as simple as this, obviously! We need to take into account the normative distances for each athlete, the composition of speed zones within that distance, and the positional and match-play requirements to make an informed decision. But still, how do we decide what’s appropriate?
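One way to read a single distance figure is against the athlete’s own history rather than against a teammate. The sketch below is only an illustration of that idea: the function name `distance_z_score` and every number in it are invented for the example, and a real assessment would also need the speed-zone and positional context mentioned above.

```python
from statistics import mean, stdev

def distance_z_score(history_km, today_km):
    """Return how many standard deviations today's distance sits
    from the athlete's own average session distance."""
    if len(history_km) < 2:
        raise ValueError("need at least two past sessions to estimate a baseline")
    baseline = mean(history_km)
    spread = stdev(history_km)
    if spread == 0:
        raise ValueError("past sessions are identical, so spread cannot be estimated")
    return (today_km - baseline) / spread

# Hypothetical histories, made up purely for illustration.
athlete_x_history = [7.5, 8.0, 8.5, 8.0, 8.0]
athlete_y_history = [9.0, 9.5, 10.0, 9.5, 10.0]

# 8km is exactly normal for athlete x; 12km is far above normal for athlete y.
print(distance_z_score(athlete_x_history, 8.0))
print(distance_z_score(athlete_y_history, 12.0))
```

Read this way, the question shifts from ‘who ran more?’ to ‘who is furthest from their own norm?’, which is closer to what we actually want to know.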

I believe it is also our responsibility to question everything.

If data is affecting our decision-making process then surely it is our duty to ensure that we understand why we are doing something, and not just rely on predefined notions, general beliefs or consensus. Let the dataset change your mindset; don’t allow your mindset to dictate the story that the data tells you. Take, for example, accelerations and decelerations as they are measured by wearable sensors. It is generally understood that accelerating and decelerating is more demanding and taxing on a human than simply running fast. These rapid changes in speed are believed to cause micro-trauma to muscle fibres. I have heard of some coaches trying to control the number of accels/decels an athlete performs in a training session because they were afraid of the risk of injury from micro-trauma. I found this hard to swallow, not because I disagree with the concept but because I’m not sure that, if we looked at the dataset, it would show that athletes with higher numbers of accels/decels also suffered more injuries as a result. If we are going to make decisions like this, I believe we need to validate these thoughts as facts first. In some cases I would even ask the opposite question. Do athletes with higher numbers of accels/decels actually suffer fewer injuries due to the training stimulus elicited from training at higher intensities? Maybe we would find that we actually need athletes to perform more high-speed movements to remain injury free, and not the opposite. I hope this simple example shows us why it’s important to question everything.
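As a rough illustration of what ‘validating the thought first’ could look like, the sketch below splits a squad at the median accel/decel count and compares the injury rate in each half. Everything here is an assumption made for the example: the field names `accel_decel` and `injured`, the function name, and the records themselves, which are invented and say nothing about what real data would show. A comparison this simple is a first look, not proof of cause and effect.

```python
from statistics import median

def injury_rate_by_load(athletes):
    """Split athletes at the median accel/decel count and
    return the proportion injured in the lower and higher halves."""
    cutoff = median(a["accel_decel"] for a in athletes)
    low = [a for a in athletes if a["accel_decel"] <= cutoff]
    high = [a for a in athletes if a["accel_decel"] > cutoff]

    def rate(group):
        # None signals an empty group rather than a 0% injury rate.
        return sum(a["injured"] for a in group) / len(group) if group else None

    return {"low": rate(low), "high": rate(high)}

# Invented records, only to show the shape of the input.
squad = [
    {"accel_decel": 40, "injured": True},
    {"accel_decel": 55, "injured": False},
    {"accel_decel": 60, "injured": True},
    {"accel_decel": 80, "injured": False},
    {"accel_decel": 90, "injured": False},
    {"accel_decel": 110, "injured": True},
]

print(injury_rate_by_load(squad))
```

Whichever way the numbers fall in a real dataset, the point is that the belief gets checked against the record before it changes a training session.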

Know what to leave out. Find out what’s reliable and what’s just noise.

In enormous datasets there is bound to be a huge amount of noise. Start by understanding what all of the data points really mean (not what we are told they mean) and how reliable and reproducible they are, and then eliminate the noise. A lot of published research has shown that certain variables tracked by sensors and wearables have more variability than others, while some show more sensitivity than others. It’s incredibly important to know which pieces of data are actually accurate and reliable, especially given that we are going to use this information for decision making. If we know a piece of data has 30% variability then we must take that into account when analysing and interpreting the data, otherwise any decisions we make could actually be counterproductive.
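To make the 30% example concrete, here is a minimal sketch that only treats a change between two readings as meaningful when it is larger than the metric’s known variability. Reading ‘30% variability’ as a simple relative tolerance band is an assumption made for this example, as is the function name; a proper analysis might use a measure such as the coefficient of variation or typical error instead.

```python
def exceeds_variability(previous, current, variability=0.30):
    """Return True if the relative change between two readings is
    larger than the metric's known variability (0.30 means 30%)."""
    if previous == 0:
        raise ValueError("previous reading must be non-zero to compute a relative change")
    relative_change = abs(current - previous) / abs(previous)
    return relative_change > variability

# A move from 100 to 120 is a 20% change, inside the 30% band: likely noise.
print(exceeds_variability(100, 120))  # False
# A move from 100 to 150 is a 50% change, outside the band: worth a closer look.
print(exceeds_variability(100, 150))  # True
```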

We need to look at the bigger picture.

For us to understand whether any of the data points we collect from athletes play a part in injury occurrence, we need to bring them together and analyse them alongside other markers. For example, if we are collecting markers on performance, well-being, recovery, physical activity and movement demands, and we are using these data points to alter training programs with a view to reducing our injury profile, then surely we need to analyse these data points in line with injury occurrence. We need to capture not just the stress we place on these athletes but also the stress response that is elicited in each individual, and what it means. It’s very easy for us to jump to our own conclusions about what this data really means, but to be accurate we need to allow the data to tell the story. We will only truly understand this when we bring all of the data points together. Trying to look at one dataset in isolation and then hypothesise about its impact is dangerous, and I believe it will not bring us closer to solving the problems we face in sport.
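In practical terms, ‘bringing the data points together’ means ending up with one record per athlete that holds every marker side by side. The sketch below shows one simple way to do that, assuming each source is a mapping from an athlete identifier to that source’s markers; the function name, identifiers and marker names are all hypothetical.

```python
def combine_markers(*sources):
    """Merge several {athlete_id: {marker: value}} mappings
    into a single record per athlete."""
    combined = {}
    for source in sources:
        for athlete_id, markers in source.items():
            # Create the athlete's record on first sight, then add this source's markers.
            combined.setdefault(athlete_id, {}).update(markers)
    return combined

# Hypothetical sources, invented for illustration.
movement = {"athlete_x": {"distance_km": 8}, "athlete_y": {"distance_km": 12}}
wellbeing = {"athlete_x": {"sleep_hours": 7}}
injuries = {"athlete_y": {"injured": True}}

print(combine_markers(movement, wellbeing, injuries))
```

Once the markers sit in one place, the stress we apply and the response it produces can be examined against injury occurrence together rather than in isolation.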

Using the 5 simple tips above will help us start to unearth the answers to the problems and issues we all face every day. At Kitman Labs we are basing our approach to reducing injuries and improving performance on these principles. Our flagship product, Profiler, has been designed with all of these rules in mind. We are conscious of the fact that, as practitioners, we don’t want to sit behind our desks staring at numbers on a computer screen. We know that every second in sport is precious. That is why our system is designed to alert staff in real time, with only the pertinent information we need to impact decisions, to help keep our squads healthy and ready to perform at their peak.

Sports science has emerged as a new market in sport due to the explosion of data, research and technology in this area. Fitness and medical staff now have a voice in sport, and we want to help make that voice louder, clearer and more informed. We want to empower smart coaches and help keep them away from their computers, so they can spend more time with their athletes.

It is my firm belief that data is the new soil in sport. But just like in farming, we need to irrigate it, weed it, grow it and prune it before we can watch it bloom. The concept of a data janitor is becoming increasingly popular as people begin to understand that there is an enormous workload involved in utilising and really making sense of big data. At Kitman Labs, we are dedicated to doing just that. We are working hard on the data collected in sports to make it the most valuable tool a coach can use to make crucial decisions and help their team win.
