What NOT to do in Big Data Management [SAS FORUM]

The evangelization phase of Big Data is coming to an end. Few business and IT leaders still fail to see its massive potential for customer, market and competitor intelligence. But that does not mean that we’re “there” yet. Many companies are only just getting started with Big Data, so a lot of fallacies are still doing the rounds. For those who are considering a first Big Data project, I wanted to list the 5 most common misconceptions and give tips on how to overcome them.

By Anthony Severeyns, Pre-Sales Consultant at SAS Belux

This blog is part of a tailor-made content series centred around the SAS Forum Belux 2015. It is linked with the event track called ‘Data Management’. Click here to join the event and learn more about the other 3 tracks (Internet of Things, Digital Society and Data Science).

1 - Do NOT think it is just about technology

One of the most frequent mistakes companies make is treating a Big Data project as a classic IT trajectory. They install Hadoop, gather as much data as they can, and then they basically wait until they stumble upon a use-case that will transform the future of their company. “If you build it, they will come.” Right? Absolutely not. Not in a Big Data environment, which is much too complex for this kind of injudicious behaviour.

Instead, carefully develop a use-case first and establish a roadmap. What does your company need? How can Big Data help you achieve it? Where do you want to end up eventually? You need to answer these and other ‘Big’ questions first. Try to think beyond thick & smart reports, though. When you get started with Big Data, start small and experiment. See what works and eliminate what doesn’t, fast. But, whatever you do, always think ‘business first’.

2 - Do NOT focus on disconnected efforts

Big Data is an ecosystem. If you do not respect this premise, your projects will fail to deliver. Basically, you need to connect every link in the Big Data chain: your internal databases – from all the relevant departments, not just the most ‘popular’ ones like marketing – with external data sources, and structured with unstructured information. But you ought to connect all your smart applications too.

True, if you analyse terabytes of data, “do” social media listening and offer e-shop recommendations, you are already further along than most. But until you connect all your data and insights, you are not getting everything out of Big Data that you could. It comes down to this: if you still think and act in silos, you don’t have Big Data, you just have “a lot of data in many different formats”. So, make sure you think in integrated platforms.

3 - Do NOT hire an army of data scientists (for the wrong reasons)

Data scientists are a breed apart: among many other things, they are familiar with data governance, programming, statistics, machine learning, calculus and data visualization. On top of that, they combine a strong understanding of the business with excellent communication skills. And, yes, that is why they are very hard to come by, and downright expensive.

Don’t get me wrong: you will need 1 or more (external) data scientists if you’re serious about Big Data analytics. How many obviously depends on the number of projects and the size of your company. But do not make the mistake of leaving everything that has to do with Big Data in the hands of these highly intelligent ‘unicorns’. The more you centralize everything around them, the more of a bottleneck they risk becoming and the more frustrated your business users will be.

The solution is to pick your analytics wisely: instead of going for the complex ones that can only be ‘fed’ and understood by an army of data scientists, opt for tools that offer a far-reaching self-service and automation as well as a clear and understandable visualisation of outcomes. Choose a solution that your business users can work with. That way, you do not need a ‘choir’ of data scientists to make your data ‘sing’, you’ll just need a data scientist ‘conductor’.

4 - Do NOT neglect your data

Everyone who has ever worked with data knows the maxim “Garbage in, garbage out”. Big Data insights are only as valuable as the data they were extracted from. You’ll need information management processes and data governance to ensure your raw material is clean. Be proactive about it upfront too, so quality issues will not arise later.

Yet, at the same time, analysing Big Data is very different from mining and interpreting regular data. It is about uncovering trends and patterns, which can still emerge even if not all of the data is 100% clean. Just realize that, with the massive amounts of structured and unstructured Big Data, it is utopian to expect to clean everything in time. But you still need to think first about what is useful and what is not, and remove the outliers that would distort the results. Always get your data foundation right!
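To make the point about outliers a bit more concrete, here is a minimal Python sketch of one common way to drop extreme values before looking for trends. It uses the interquartile-range (IQR) rule, which is my choice for illustration and not something prescribed above. The function name `remove_outliers`, the factor `k` and the sample readings are all hypothetical.

```python
from statistics import quantiles


def remove_outliers(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles.

    Assumption: the IQR rule is just one possible definition of an
    outlier; what really counts as one depends on your use-case.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical sensor readings with one obviously broken measurement.
readings = [10, 12, 11, 13, 12, 11, 250, 12, 10, 13]
cleaned = remove_outliers(readings)
print(cleaned)
```

The trend in the remaining values is untouched, while the single reading that would have skewed any average is gone. Whether a simple rule like this is good enough is, again, a business question first.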

5 - Do NOT store everything

Big Data has a short-lived value. Most of it is extremely fertile at the time it is created, and its worth decreases fast as time ticks away. That is obviously why (near) real-time analytics are the pinnacle of Big Data: they turn data into insights while it is still at its most valuable. This basically means that a lot of Big Data is not worth keeping and storing.

The answer for a lot of Big Data efforts – not just real time – is to stream data, rather than store it. The data gets mined carefully as it flows past, and only the useful parts are actually stored. If you are not streaming your Big Data, you might very well be throwing away budget on useless storage. So, uncover your use-case (and we’re back full circle, to point 1), see what data you need and use streaming for those parts that are only valuable in the moment.
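The idea of streaming instead of storing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the events are plain dictionaries, the rule for what is “useful” (`is_high_value`, with an arbitrary threshold of 100) is hypothetical, and a real project would read from a streaming platform, not from an in-memory list.

```python
from typing import Callable, Iterable, Iterator

Event = dict  # assumption: each event is a simple dictionary


def stream_and_keep(
    events: Iterable[Event], is_useful: Callable[[Event], bool]
) -> Iterator[Event]:
    """Inspect events one by one and pass on only those worth storing."""
    for event in events:
        if is_useful(event):
            yield event  # everything else is dropped after inspection


def is_high_value(event: Event) -> bool:
    # Hypothetical business rule: only larger transactions are kept.
    return event.get("amount", 0) >= 100


# Stand-in for a live feed; the values are invented for the example.
incoming = [
    {"id": 1, "amount": 20},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 5},
    {"id": 4, "amount": 100},
]

stored = list(stream_and_keep(incoming, is_high_value))
print([event["id"] for event in stored])
```

Because the function is a generator, events are looked at as they arrive and never held in bulk. Only what passes the business rule ends up in storage, which is exactly where the use-case from point 1 comes back in.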

Join SAS Forum Belux on October 15 in Antwerp to gather other valuable insights about the digital society, the Internet of Things, data science and data management: www.sasforum.be