10 factors to consider when selecting Visualization tool for Hadoop data

Posted by Kuldeep Deshpande on January 23, 2017 in Blog, Data Visualisation

So you have just started on your first project on Hadoop. Having heard about big data and Hadoop for months, you figured out your first use case that can be implemented using Hadoop. Data has been loaded into a small cluster and you are now ready to ‘leverage data’. This is when you will face the question of selecting your Big Data visualization tool.

In the past you might have faced the same question when you started your BI initiative. But the dynamics of selecting a data visualization tool for Hadoop are quite different. Yes, learnings from selecting a BI tool can be reused when selecting a visualization tool for Hadoop, but there is much more to consider.

Having helped customers implement visualization tools for various types of structured, unstructured, and streaming data on Hadoop, here are the criteria I would summarize:

1. Budget available – Hadoop data visualization tools come in four categories. First, there are enterprise BI tools like SAS, Cognos, Microstrategy, and QlikView that have good Hadoop compatibility. Second, there are Hadoop-specific visualization tools like HUNK, Datameer, and Platfora that are meant specifically for visualizing Hadoop data. Third, there are open source tools like Pentaho, BIRT, and Jaspersoft that were early adopters of Hadoop and have probably invested more in Hadoop compatibility than some of the biggies. Finally, there are charting libraries like RShiny, D3.js, and Highcharts that are mostly open source or low cost and offer good visualizations, but require scripting and coding (see the sketch below). As you move from the first category to the fourth, software license costs keep reducing, but so do ease of development and self-service capabilities. There are some exceptions to this general trend, though.
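
To make the trade-off concrete, here is a minimal sketch of what "requires scripting and coding" means in the fourth category, using Python's matplotlib as a stand-in for a charting library (the data and labels are illustrative):

```python
# A minimal sketch of the fourth category: even a basic bar chart
# is code, not drag-and-drop. Data and labels are illustrative.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120, 135, 128, 150]  # e.g. USD thousands, illustrative

fig, ax = plt.subplots()
ax.bar(months, revenue, color="steelblue")
ax.set_ylabel("Revenue (USD thousands)")
ax.set_title("Monthly revenue")
fig.savefig("revenue.png")
```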

2. Your existing BI tool – Most probably your company is already using some BI tool or other. You may have SAS, Microstrategy, IBM Cognos, or OBIEE in house. Most of these tools have made tremendous investments in compatibility with the Hadoop ecosystem: they have connectors for Hadoop and NoSQL databases, and graphical development tools are available. It may be easiest for end users to work with something they already know, so think of using your existing BI tool for Hadoop data visualization unless there are obvious drawbacks to it.
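
In practice, a "Hadoop connector" typically means ODBC/JDBC or Thrift access to HiveServer2 or Impala, so the tool can run plain SQL against the cluster. A minimal sketch of that interaction, assuming the open source PyHive library (host, port, user, and table names are illustrative):

```python
# Sketch: querying Hadoop data through HiveServer2, the interface most
# BI-tool connectors (ODBC/JDBC) talk to under the hood. Host, port,
# user, and table names are illustrative assumptions.
from pyhive import hive

conn = hive.Connection(host="hadoop-edge-node", port=10000, username="analyst")
cur = conn.cursor()

# An ordinary SQL query; Hive compiles it into jobs on the cluster.
cur.execute(
    "SELECT region, SUM(sales) AS total_sales FROM sales_fact GROUP BY region"
)
for region, total_sales in cur.fetchall():
    print(region, total_sales)
```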

3. Hadoop distribution used – If you are using a Hadoop distribution from, say, Cloudera or Hortonworks, you can safely select tools that are certified by these distributors. For example, Tableau, Microstrategy, Pentaho, and QlikView are all certified by Cloudera and have proven connectors to the Cloudera distribution of Hadoop. Similarly, most of these tools are Hortonworks partners as well. If your Big Data platform is IBM BigInsights, then going for Cognos makes sense: both being IBM products, compatibility will not be an issue. It is always advisable to check whether the tool you are selecting is certified for the Hadoop distribution being used.

4. Nature of data – If the data you want to analyze is tabular, columnar data, then most tools are capable of visualizing it. However, if the data is, say, log data, special-purpose charting libraries like ‘timeplot’ may be a good option (a sketch of why follows below). Similarly, for social media data, tools like Zoomdata provide better visualization capabilities.
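
For instance, log data usually needs timestamp parsing and time-bucketing before any chart makes sense; here is a small sketch of that preprocessing step, assuming log lines that start with an ISO timestamp (the file name and format are illustrative):

```python
# Sketch: bucket raw log lines into per-minute event counts, the kind
# of shaping a time-series chart of log data requires. The log format
# assumed here (ISO timestamp prefix) is an illustrative assumption.
from collections import Counter
from datetime import datetime

counts = Counter()
with open("app.log") as f:  # illustrative file name
    for line in f:
        # e.g. "2017-01-23T10:15:42 INFO ..." -> parse the first 19 chars
        ts = datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")
        counts[ts.replace(second=0)] += 1  # bucket by minute

for minute in sorted(counts):
    print(minute.isoformat(), counts[minute])
```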

5. End user profile – Who are your end users? Are they data scientists? Then a visualization tool with very high-end visualization patterns will be required. If operational business users (such as sales managers or finance managers) are the end users, then speed of delivery and cost of the tool (since the number of users may be very high) matter more than advanced visualization.

6. Programming skills available – If you have good Java and JavaScript skills in house, going for scripting-based tools makes sense. Likewise, if you are an R shop with good R programming capabilities, RShiny can be a good alternative. Standard BI tools such as Microstrategy and Pentaho, on the other hand, allow writing SQL on top of Hadoop data, while tools like Datameer are schema-free, drag-and-drop tools. In short, each tool comes with its own set of programming skill requirements, and you need to make sure these are compatible with the skills available in house.

7. Operating system – This is a basic checkbox when selecting visualization tools, but an easy one to miss. We come across customers who use Linux platforms only, and using Windows-based tools like QlikView, Tableau, or Microsoft BI is not possible in that case. Also, if you are planning an implementation in the cloud, make sure your cloud provider can supply the OS required by the visualization tool.

8. Visualization features required – Traditional BI tools that have added Hadoop capabilities are more mature than the new entrants in providing the visualization patterns commonly required. For example, multiple Y axes, support for HTML5 and animation, and user-friendly drill-down are features that are very mature in traditional BI tools but still evolving in new entrants, open source BI tools, and some charting libraries. It is advisable to compare your visualization needs against the capabilities each tool offers.
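
As an illustration of the gap: a feature like multiple Y axes is a checkbox in mature BI tools, while in a charting library you assemble it by hand. A sketch in matplotlib (the data is illustrative):

```python
# Sketch: a dual-Y-axis chart built manually, the kind of "visualization
# pattern" that is a one-click option in mature BI tools.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120, 135, 128, 150]     # illustrative
margin = [0.21, 0.23, 0.19, 0.25]  # illustrative

fig, ax1 = plt.subplots()
ax1.plot(months, revenue, color="steelblue")
ax1.set_ylabel("Revenue (USD thousands)")

ax2 = ax1.twinx()  # second Y axis sharing the same X axis
ax2.plot(months, margin, color="darkorange")
ax2.set_ylabel("Margin (fraction)")

fig.savefig("dual_axis.png")
```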

9. Data Volume – Data volume and the streaming nature of the data are important considerations, especially if you are considering a visualization tool with an in-memory architecture. If your Hadoop data store holds terabytes of data, data is being added in real time, and you plan to use an in-memory visualization tool, then you need a mechanism to reduce the volume and feed data continuously from Hadoop to the tool. This is possible, but not very simple (one common approach is sketched below). Be aware of the impact of real-time, high-volume data on an in-memory architecture.
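
One common mechanism for reducing the volume is to pre-aggregate inside Hadoop and hand only the summary to the in-memory tool. A sketch of the idea, again assuming a HiveServer2 connection via PyHive (all connection details, tables, and columns are illustrative):

```python
# Sketch: aggregate raw events inside Hadoop so that only a compact
# summary table is loaded into the in-memory visualization tool.
# Connection details, tables, and columns are illustrative assumptions.
from pyhive import hive

conn = hive.Connection(host="hadoop-edge-node", port=10000, username="analyst")
cur = conn.cursor()

# Billions of raw rows reduce to one row per product per hour.
cur.execute("""
    INSERT OVERWRITE TABLE sales_hourly_summary
    SELECT product_id,
           date_format(event_time, 'yyyy-MM-dd HH:00') AS event_hour,
           COUNT(*)    AS events,
           SUM(amount) AS total_amount
    FROM raw_sales_events
    GROUP BY product_id, date_format(event_time, 'yyyy-MM-dd HH:00')
""")
# The visualization tool then refreshes from sales_hourly_summary,
# not from the raw event table.
```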

10. Industry experience – It is always advisable to lean on the dominant players in your industry vertical. SAS, for example, has been used by banks for analyzing big data for customer intelligence and risk management. In such cases, the availability of the underlying algorithms and visualization patterns makes big data project implementation much easier.

All of these factors need to be thought through carefully. Some of them, like the operating system, seem like no-brainers, but I have seen companies make an oversight and select a visualization tool that later needed to be changed. After shortlisting visualization alternatives based on these factors, you are ready for the next step in the journey of building a visualization platform for big data: initiating a Proof of Concept. More about learnings from data visualization Proofs of Concept in the next blog.

About Me – I am the Founder and CEO of Ellicium Solutions Inc., a company focused on providing innovative solutions in Big Data and Analytics. Apart from developing business solutions using Big Data, I enjoy sharing insights with fellow professionals.

For further details about our offerings, please visit our website: www.ellicium.com

Follow our company LinkedIn page at: http://bit.ly/2d4YYTw