Unpacking the data deluge: 5 reasons why big data projects fail
- 21 July, 2020
The exponential growth of digital products and services means more data is being gathered than ever before: customers, usage, patterns, and problems. With this growth, we have needed to store and manage it all, but more importantly, to figure out how to learn and benefit from it. This has driven a growing desire to incorporate machine learning (ML), artificial intelligence (AI) and data science into most new-generation systems and products. These big data projects are often ambitious, and they often fail.
Industry is scrambling to capitalise on this data, causing a surge in demand for skilled data scientists, engineers and developers who can curate, store, process and analyse data. However, according to a Gartner report, at least 60% of big data projects fail to move past even the preliminary stages, and it’s predicted that the real figure could be closer to 85%. The overwhelming hype has led to unreasonable expectations from the business, and change is outpacing the supply of competent professionals.
Only a fraction of the data science and analytics models that are developed actually make it into production, yet technology is not the only point of failure. In fact, technology is a minor cause of failure relative to the real culprits. Below, we unpack five key reasons why big data projects fail.
1. Data science weekend warriors
Due to the growing hype around data, there has been an influx of interested people who are learning and practising data science without experience. They can churn out something that looks great but isn’t necessarily accurate, and they struggle to productionalise it. They lack the theoretical background to understand which algorithms to use, and why certain algorithms work or don’t work in a given context. Their exposure is generally to a narrow portion of ML, limited to pet projects or the data they happen to have worked with.
2. Productionalisation problems
Some data science is only meant for hypothesis testing and doesn’t need to be taken into production, but many data science projects do need to be productionalised. These projects fail when they can’t be productionalised due to data pipeline issues. Most data scientists want to deal with the ‘fun and attractive’ model building, without taking time to understand the end-to-end process. This leads to hacked-together ‘production’ solutions that often fail and contribute to the large pool of unsuccessful projects in industry.
3. Lack of data engineer skillsets
Everyone wants to be a data scientist, but very few want to be data engineers. Typically, the industry needs twice as many data engineers as data scientists, since around 60% of the work is prepping and pipelining data, and only 30% is covered by areas like machine learning. The skillset for data engineers is also quite different from that of data scientists, and much closer to that of a traditional software engineer, combined with DevOps and an understanding of ML algorithms.
4. Lack of customer buy-in
Due to the above issues, customers have been burnt by unsuccessful data science projects and are wary of offering their support. Badly productionalised solutions end up costing businesses more money than anticipated, weekend warriors who lack proper skill and experience take longer to reach a conclusion with their projects, and the lack of data engineering skills makes deriving value from these projects more difficult.
5. Expecting ML to be a silver bullet
There is a common misconception that machine learning can do anything with any data. This lack of understanding of the meaningful use cases for ML can cause friction and unmet expectations. ML has its place, but if you aren’t sure where you’re going, any road will get you there. The proliferation of data makes this even more difficult, because people think data implies ML, without considering the quality of the data or the problem they’re trying to address.
The more organisations turn to AI, ML and data science to drive decision making, introduce new products and maximise resources, the more important it becomes to ensure the success of big data projects. To avoid adding to the growing number of failed initiatives, businesses need to have a realistic data strategy in place to prepare for and avoid these pitfalls.
Recruit or partner with experienced data professionals who have a proven track record of successful data projects. Do enough preparation and research before rushing models into production, and promote cross-skilling within your organisation to help fight the data skills gap. If you’re still having trouble gaining customer buy-in, move more slowly and stay involved to ensure you remain on the right path. Finally, recognise that ML (or any data approach, for that matter) is not a silver bullet: invest the time to understand the strengths and weaknesses of the various approaches, rather than starting data projects without clear direction.