Hadoop helps us analyze extraordinary amounts of data in a cost-effective way. Think of fraud detection by analyzing millions of transaction details. Such detailed data are usually not stored in a data warehouse because that would cost far too much. But if you want to gain insight into the typical behavior of fraudsters, you need that level of detail. That is where Hadoop comes in. It can store and analyze huge amounts of data without you having to spend equally huge amounts of cash.
Hadoop has two compelling advantages over traditional analytics tools. One: it is much cheaper, mainly because it is open-source. Two: it scales almost without limit by adding more machines. On top of that, it can process unstructured data as well as structured data.
Hadoop and SAS, a compelling combination
So you can imagine why the Hadoop platform is so popular with big data analysts and data scientists. It is also popular with commercial software vendors such as SAS. We occasionally integrate Hadoop software into our solutions. And we provide the required interfaces to Hadoop for a fast and smooth exchange of data between the two environments.
In both cases the ultimate goal is the same: to combine the advanced analytics power of SAS with the cheap and endlessly expandable data crunching power of Hadoop. Users can thus exploit each platform to its full advantage. Hadoop can be used for the raw processing, filtering and modelling of data, which are then sent to the SAS engine for the advanced analytics. The perfect partners in crime.
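To make that division of labour a little more concrete, here is a minimal sketch in plain Python of the kind of filter-and-aggregate step that would run on the Hadoop side. It is only an illustration: it does not use any Hadoop or SAS API, and the transaction records, field names and function names are all hypothetical. The idea is that millions of raw rows are reduced to a compact summary per account, and only that summary is handed over for the advanced analytics.

```python
from collections import defaultdict

# Hypothetical raw transaction records; the field names are assumptions
# made for this illustration, not a real schema.
TRANSACTIONS = [
    {"account": "A1", "amount": 25.0, "country": "BE"},
    {"account": "A1", "amount": 9800.0, "country": "RU"},
    {"account": "B2", "amount": 12.5, "country": "BE"},
    {"account": "B2", "amount": -3.0, "country": "BE"},  # invalid row
    {"account": "A1", "amount": 40.0, "country": "BE"},
]


def map_phase(records):
    """Filter out invalid rows and emit (account, amount) pairs."""
    for rec in records:
        if rec["amount"] > 0:
            yield rec["account"], rec["amount"]


def reduce_phase(pairs):
    """Group the pairs by account and reduce each group to a small summary."""
    grouped = defaultdict(list)
    for account, amount in pairs:
        grouped[account].append(amount)
    return {
        account: {
            "count": len(amounts),
            "total": sum(amounts),
            "largest": max(amounts),
        }
        for account, amounts in grouped.items()
    }


def summarize(records):
    """Run the filter step and the aggregation step back to back."""
    return reduce_phase(map_phase(records))


if __name__ == "__main__":
    for account, stats in sorted(summarize(TRANSACTIONS).items()):
        print(account, stats)
```

On a real cluster the map and reduce steps would run in parallel across many machines, and the resulting per-account table, rather than the raw transactions, is what the analytics engine would receive.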
Is Hadoop really flawless? I wouldn’t go that far. As we mentioned before, Hadoop is an open-source platform. This means that everybody can contribute, which in turn means that a version can feel outdated within a few months. It also means that there are many different flavours of Hadoop, a bit like the many flavours of Android: they share the same name but are nevertheless far from compatible.
Less related to open-source, but still: Hadoop is not exactly the easiest software to use. The barrier to entry is still rather high, so it is mostly used by tech geeks. And this is exactly what we want to avoid for analytics: that it is used solely by technical profiles. In our opinion, the user base should consist mostly of business-related profiles.
Nevertheless, Hadoop will continue to evolve and play an increasingly important role in the world of analytics. That’s why you’ll see me enjoying these two days of the Hadoop Summit, filled with amazing user stories, fascinating new possibilities and - occasionally - exciting technical details about future releases. I also look forward to meeting fellow analytics lovers and exchanging experiences. Maybe I’ll see you there as well?