Future of Big Data in 2018!

Apr 3, 2021 | Blog

About Yogesh Kulkarni

Co-Founder and Chief Technology Officer at Ellicium

The year 2017 was an interesting one in the Big Data world. Though the adoption of Hadoop as the Big Data platform moved only marginally beyond the 50% mark as per the Gartner survey report, the focus of adoption has gradually shifted from being IT-driven to business-driven. This means that organizations see value not only in investing in Big Data but also in moving ahead and deriving business value from it.

I have been fortunate to be a part of the Big Data world for the last several years and to observe its journey closely. It’s fantastic to see how a small toy elephant has become one of the most sought-after technologies and taken several forms. Here is the direction in which I see the Big Data and Hadoop world moving in 2018:

A) Think Big Data? Think Cloud!

Companies that haven’t yet joined the Hadoop bandwagon (slightly over 50% in 2017, as per the Gartner survey) will be seeking Hadoop deployments. However, in all likelihood, these will not be on-premise but on their cloud vendor of choice. Typically, on-premise deployments are preferred when the cluster size is huge.

Why do I say so?

“Lowering costs and coping with complexity will be the primary motivating factors for cloud-based Hadoop deployments in 2018”.

i) Costs have come down – We migrated an RDBMS-based IoT application to the cloud way back in 2014, and going on the cloud was a pretty expensive proposition then. When we compare costs for the same cluster configuration now, they come to around 30% of what they were! Fascinating, isn’t it? I see the same trend continuing in 2018.

ii) Getting started with Big Data is quick – Cloud deployments offer flexibility in gearing up to Hadoop without the time and overheads related to procuring, provisioning and setting up the infrastructure.

We spoke with several customers in 2017, and a common theme is that they don’t want to wait weeks or months for their infrastructure team to provide the hardware and software. This is especially true of SMEs. For one of our manufacturing clients, we proposed migrating their ERP data to the Google BigQuery platform, and they readily accepted the proposal thanks to the platform’s enormous flexibility and low cost.

iii) Familiarity with the cloud is increasing – Due to various other reasons, infrastructure teams of organizations are more cloud-aware and also understand that Hadoop can be set up on the cloud. This makes a cloud setup particularly attractive to them.

iv) Flexibility – without the undue pressure of correctly sizing their first Hadoop cluster, organizations can make a beginning with Hadoop, try various use cases and gradually get to know Hadoop better. Instances can be shut down during non-working hours making it cost effective.

We are working with a client offering Insurance Solutions and the first step they wanted was a cloud based setup for their in-house team to try stuff and get used to the ecosystem. Very convenient indeed!
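The saving from shutting instances down outside working hours is easy to estimate. Here is a minimal sketch, assuming a hypothetical 5-node cluster at a made-up rate of $0.50 per node-hour and 50 working hours a week. It ignores storage and any other charges that continue while instances are stopped, so treat the numbers as illustrative only.

```python
# Hypothetical figures for illustration; real cloud pricing varies by
# vendor, region and instance type.
HOURS_PER_WEEK = 24 * 7


def weekly_cluster_cost(nodes, hourly_rate_per_node, hours_running):
    """Compute cost of running `nodes` instances for `hours_running` hours."""
    return nodes * hourly_rate_per_node * hours_running


def savings_from_shutdown(nodes, hourly_rate_per_node, working_hours_per_week):
    """Return (always_on_cost, scheduled_cost, fraction_saved) for one week."""
    always_on = weekly_cluster_cost(nodes, hourly_rate_per_node, HOURS_PER_WEEK)
    scheduled = weekly_cluster_cost(
        nodes, hourly_rate_per_node, working_hours_per_week
    )
    fraction_saved = 1 - scheduled / always_on
    return always_on, scheduled, fraction_saved


if __name__ == "__main__":
    always_on, scheduled, saved = savings_from_shutdown(
        nodes=5, hourly_rate_per_node=0.50, working_hours_per_week=50
    )
    print(f"Always on:          ${always_on:.2f} per week")
    print(f"Working hours only: ${scheduled:.2f} per week")
    print(f"Fraction saved:     {saved:.0%}")
```

Plugging in your own node count, rate and working hours gives a quick first estimate before you look at a vendor’s pricing calculator.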

What does this mean?

a) Along with leading cloud platform vendors like AWS, Azure, and Google, other cloud-based vendors that offer cloud deployments of various Hadoop distros like Cloudera, Hortonworks and MapR will also see a lot of traction and demand in 2018. Organizations have their own cloud vendor of choice and will prefer to embark upon the Hadoop journey with them.

b) We are in touch with several organizations that are yet to begin their Hadoop journey.

“Convenience of setting up Hadoop on the cloud will encourage SMEs to start with their Hadoop journey in 2018”.

B) Taking Hadoop to Production? Plan carefully

Why do I say so?

i) Gartner’s research shows that while investment in big data continues, the move to production has remained flat. Gartner estimates that “roughly 14% of Hadoop deployments are in production”.

ii) The focus on security and governance is high before anything is taken to production. There seem to be several gaps in the security and governance offerings of various Hadoop vendors. As a result, custom third-party solutions are needed.

What does this mean?

a) Security solutions such as Sentry, Ranger and Knox, from Hadoop distros like Cloudera and Hortonworks, will see increased interest and adoption. They will continue to mature and offer the security features that a production-ready application needs.

“2018 is when organizations will get the increased confidence to move their Hadoop systems to production”.

b) Organisations will start insisting on Security in Hadoop right from the early stages.

For one of our Insurance clients, we are implementing security right from the POC stage. Clients who have already taken Hadoop to production have also started asking us for a roadmap for security to be demonstrated, tested and implemented.

C) “Hadoop will be forced to become magnanimous and accommodate supporting frameworks in 2018”

Why do I say so?

i) In some Hadoop use cases that require concurrent users to run heavy interactive queries, performance does not meet expectations, which limits Hadoop’s suitability for decision support.

ii) Spark supports stream analytics (Spark Streaming) and interactive querying. Additionally, its support for SQL (Spark SQL) and machine learning (MLlib) has made it even more popular.

iii) For most of the current Hadoop implementations which we are doing for our clients, we are using Spark as the standard processing engine.

What does this mean?

a) Though not an official part of the Hadoop ecosystem, Spark will become a de facto standard component of a Hadoop cluster.

“In 2018, Spark will continue to be the processing engine of choice for Hadoop systems replacing even some of the MR jobs in production”.

D) “In 2018, the focus on the third ‘V’ (Variety) of Big Data will get a boost”

There will be an increasing demand for not only implementing Data Lakes using Hadoop but also deriving useful business insights. With its inherent support for storage and processing of all forms of data, Hadoop will remain the favorite for Data Lakes.

Why do I say so?

i) All these years, analytics was mainly restricted to the structured side of data. Data warehouses and BI work well on structured data but are simply unable to handle unstructured data. Due to this, a lot of advanced business insights are simply not possible.

ii) Focus on the third “V” (Variety) of Big Data is still missing to a great extent. While companies are successfully processing structured data on the Hadoop platform, the same cannot be said about unstructured data. As a result, integrating unstructured data will be one of the biggest reasons for Big Data adoption.

We are seeing some very interesting Use Cases –

a) We are currently implementing a Hadoop Data Lake for the IT arm of a Financial Services organization. This involves establishing a direct link between the transaction-level data (e.g. customer loan entries) and the supporting documents (e.g. loan agreements in PDF format). They are seeing a lot of business value in such analytics.

b) Another client of ours is planning to derive insights from images generated from body scanners. Apart from the volume of data, the complexity posed by the unstructured data is huge.
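To give a feel for the first use case above, here is a minimal sketch of linking transaction-level entries to their supporting documents. Everything in it is an assumption made for illustration: the sample records, the file paths, and the idea that a loan ID such as `LN-1001` is embedded in each file name. It is not the client’s actual design; a real Data Lake would hold this mapping in a metadata store or catalog.

```python
import re
from collections import defaultdict

# Hypothetical structured data: loan entries as they might come from
# a transactional system.
loan_entries = [
    {"loan_id": "LN-1001", "customer_id": "C-01", "amount": 250000},
    {"loan_id": "LN-1002", "customer_id": "C-02", "amount": 90000},
    {"loan_id": "LN-1003", "customer_id": "C-03", "amount": 410000},
]

# Hypothetical unstructured side: document paths in the lake. We assume
# the loan ID appears in each file name.
document_paths = [
    "/lake/docs/LN-1001_loan_agreement.pdf",
    "/lake/docs/LN-1001_kyc.pdf",
    "/lake/docs/LN-1003_loan_agreement.pdf",
]

LOAN_ID_PATTERN = re.compile(r"LN-\d+")


def index_documents(paths):
    """Group document paths by the loan ID found in each path."""
    index = defaultdict(list)
    for path in paths:
        match = LOAN_ID_PATTERN.search(path)
        if match:
            index[match.group()].append(path)
    return index


def link_entries_to_documents(entries, paths):
    """Attach the list of matching documents to every loan entry."""
    index = index_documents(paths)
    return [
        {**entry, "documents": index.get(entry["loan_id"], [])}
        for entry in entries
    ]


linked = link_entries_to_documents(loan_entries, document_paths)

# Loans with no supporting document are often the first useful insight.
missing = [row["loan_id"] for row in linked if not row["documents"]]
print("Loans without documents:", missing)
```

Once each entry carries its documents, questions such as “which loans have no signed agreement on file?” become a simple filter, which is the kind of insight that is hard to get when documents sit outside the analytics platform.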

What does this mean?

a) Organisations will increasingly ask for unstructured (documents, images) data processing. Also, interesting use cases which were impossible earlier will start becoming feasible. As a result,

“Technologies and Platforms with demonstrated capability in the area of unstructured data processing and analysis will be in great demand in 2018”.

It will be interesting to see how 2018 turns out for all of us. I plan to revisit the above towards the end of 2018 and hopefully, I’ll be right!

I look forward to your comments and views. Please comment on the above or share your views with me at “yogesh.kulkarni@ellicium.com”
