Is Europe (and Belgium) in the initial phase of Hadoop adoption?
I have witnessed Hadoop adoption moving at multiple speeds: a first wave came some 4-5 years ago, but it didn’t really break through on an enterprise scale. Those early adopters implemented Hadoop for very specific use cases, for example high-volume web traffic monitoring, collecting sensor data in real time, or storing and analyzing highly unstructured data. Today we see much more interest in integrating Hadoop at an enterprise level, bringing together data of very different natures and combining it into a single view.
What are the big advantages of this technology adoption?
There are multiple advantages.
- First, there is an opportunity to avoid costs, as Hadoop has proven to be much cheaper than traditional storage and distributed data computing platforms.
- The second advantage is its scalability: all predictions clearly indicate that the explosion of data volume and the speed at which the data needs to be processed are only the beginning. Hadoop offers a highly scalable platform that can evolve further with future needs for more processing capabilities.
- Last but not least, Hadoop is not just a storage platform but also a processing platform, capable of processing data in a distributed manner and minimizing data movement between nodes.
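The processing model behind that last point is MapReduce: each node runs the map step on its own local chunk of data, and only the grouped intermediate results move across the network. A minimal single-machine analogue in plain Python (illustrative only, not Hadoop’s actual API) looks like this:

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Each "node" processes only its local chunk, emitting (key, value)
    # pairs -- this locality is what keeps data movement low.
    for line in chunk:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(mapped_pairs):
    # Group values by key: the only step that moves data between nodes.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values; reducers also run in parallel.
    return {key: sum(values) for key, values in groups.items()}

# Two hypothetical "nodes", each holding a local chunk of the data set.
chunks = [
    ["big data on hadoop", "hadoop stores data"],
    ["hadoop processes data in place"],
]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts["hadoop"])  # 3
print(counts["data"])    # 3
```

On a real cluster the map tasks are scheduled on the machines that already hold the data blocks, so the heavy input never leaves its node; only the much smaller shuffled pairs do.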
What are the advantages for the early adopters? What can we learn from them?
We have seen early adopters of different kinds. On one side there have been the purely digital companies, such as Yahoo, Google, and Facebook. On the other side there have been early adopters in more traditional businesses. For the digital companies, Hadoop has been an enabler that helped them cope with the phenomenal growth they have seen. And that does not only apply to the bigger, well-known names such as Yahoo and Facebook; it is true as well for many small digital startups that have embraced the benefits of Hadoop from the first day of their existence. For example: Photobox, Spotify, and some of the online gaming platforms all run their platform on Hadoop.
The more traditional businesses started with Hadoop in a more cautious manner, sometimes in a pure IT testing environment, to experiment with the technology and see where it could bring value. I also see very successful projects in traditional businesses that set up Hadoop as a central component of an innovation data lab, kept separate from their production environments. They chose this approach because they wanted results quickly, without the hurdles of testing Hadoop against all their IT standardization demands. They bring together people from the business, IT, and data science who collaborate on data experiments in a “fail-fast” approach, running multiple innovation experiments on both internal and external data in very short iterative cycles.
Very few organizations, however, use Hadoop as a pure RDBMS (relational database management system) replacement, or to replace an entire data warehouse; there is very little value in that. Instead, a ‘fit for purpose’ approach works best: maintain the RDBMS/EDW for specific standard query workloads, but with fewer data attributes than previously envisioned. The newer data formats and additional attributes can be handled more cost-effectively in Hadoop.
From your experience, is there any industry that is taking the lead?
The same industries that pioneered data analytics are the ones that looked into Hadoop first. We find the most mature projects in the Telco and Financial Services industries, as they digitized most of their operations long ago. But other industries are catching up quickly: sensor applications, the Internet of Things, and Life Sciences are now gaining maturity and often integrate Hadoop as a core component of their architecture.
How do you see Hadoop in the future and how can it be integrated with today’s technologies?
Initial projects have mostly been on the data storage and ETL offloading side. There is growing demand to perform real-time analytics on this data and to integrate it, again in real time. This brings new challenges: in a real-time application the data never stops. As soon as it is generated it starts streaming, and it has to be processed, transformed, and analyzed while it streams. Only at the end is it stored, for archiving or ad hoc reporting and analysis; the first value is generated by analyzing it in stream.
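That stream-first pattern can be sketched in a few lines of plain Python. This is an illustrative analogue of what stream-processing frameworks do, not any framework’s actual API: each event is analyzed the moment it arrives, and storage comes last.

```python
def sensor_stream():
    # Stand-in for an unbounded event source, e.g. sensor readings.
    # In a real system this would never terminate.
    for reading in [21.0, 21.5, 35.2, 21.3, 21.1]:
        yield reading

def analyze_in_stream(events, threshold=30.0):
    # Value is extracted while the data is in flight: each event is
    # inspected as it arrives, before anything is written to storage.
    alerts = []
    archive = []
    for value in events:
        if value > threshold:     # real-time analysis step
            alerts.append(value)
        archive.append(value)     # archiving happens last
    return alerts, archive

alerts, archive = analyze_in_stream(sensor_stream())
print(alerts)        # [35.2]
print(len(archive))  # 5
```

The point of the design is the ordering: the alert on the outlier reading is raised in stream, while the full history is only kept afterwards for ad hoc reporting and analysis.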
Hadoop is a fast-moving ecosystem, with new possibilities popping up continuously and issues being addressed. As Hadoop gains more traction within an organization and gets broader attention, adoption by less specialized users brings its own challenges. An initiative such as the Open Data Platform (ODP – http://opendataplatform.org/) illustrates the increased maturity of a market that demands more standardization in the core Hadoop components, for better manageability and interoperability between a Hadoop implementation and the existing infrastructure.
Tools are coming to the market to make it easier to get results with Hadoop without too much specialized knowledge, as the goal is to get business answers out of your data, not to acquire Hadoop skills!