Big Data duplication - don't do it

How to avoid the cost overheads of data duplication, unnecessary transformation and data lake proliferation.

Data is streaming into organisations at unparalleled speed. Streaming data (such as clickstream, machine logs, sensor data and device data) can only help the business make better real-time decisions when it is analysed in context with historic information. Fast, complete analysis of real-time and historic data together makes the organisation more competitive and dynamic in the market.

The trouble is that most organisations still believe that data must be copied from source (or operational) systems into a central analytical store before it can be analysed. The common belief is that a physical data lake or warehouse, with data duplicated from other systems, is what’s needed for big data, deep analytics and machine learning applications.

The truth is that this is a technology capability limitation, not a best practice. Massive costs are incurred in moving data, integrating it into a common view and format, and then analysing it. The approach is “old-school” even when Hadoop or Spark is the core technology framework, and it is why organisations take months or years to deliver analytical projects while data integration costs spiral out of control.

Many organisations, limited by the wrong technologies and tied to this old "centralised platform" thinking, are forced to ignore their true big data opportunity: in-context, real-time analytics and interaction. Traditional data integration methods and physical data lakes slow the extraction of value from analytics and constrain the ability to create truly inspiring digital experiences for customers.

When you consider the update cycles, structure and location of data across organisations and the Internet, traditional data lake and data warehousing technologies are out of sync with new streaming, real-time data sources. New digital models that require real-time customer interaction at scale demand that historic data be combined with data in the stream, and a centralised physical data platform makes this difficult.
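To make the stream-plus-history idea concrete, here is a minimal sketch (not Zetaris code) using open-source PySpark Structured Streaming: live clickstream events from a Kafka topic are joined against a static table of historic customer data. The topic name, broker address, schema and file path are illustrative assumptions.

```python
# Minimal sketch: enrich a live clickstream with historic data.
# Assumes Spark with the Kafka connector; the topic, broker, schema and
# the Parquet path are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("stream-with-history").getOrCreate()

# Historic customer data, loaded once as a static DataFrame.
history = spark.read.parquet("/data/customer_history")

# Live clickstream events arriving on a Kafka topic.
event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("page", StringType()),
    StructField("event_time", TimestampType()),
])
clicks = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clickstream")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Stream-static join: every live event is analysed in the context of history.
enriched = clicks.join(history, on="customer_id", how="left")

(enriched.writeStream
    .format("console")
    .outputMode("append")
    .start()
    .awaitTermination())
```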

Zetaris is changing the data landscape in organisations by challenging the myth that data from across the organisation, or the network, needs to be centralised before it can be analysed and acted upon. Zetaris' Networked Data Platform eliminates this costly duplication of data, human effort and processing.

Zetaris moves the query to the data, wherever it lives. We give data scientists the ability to query data across the organisation, or the network, without needing it all to be physically in one place. Zetaris' technology enables in-context, real-time analytics for digital innovation. Through sophisticated analytical query optimisers, Zetaris ensures efficient processing across the network and handles the core challenges of distributing analytical workloads across your data centre.
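The general pattern of querying data in place can be sketched with open-source tools. The following is a minimal PySpark example, not the Zetaris product API: two remote operational databases are registered as views and joined in a single SQL query, with no intermediate copy into a central lake. Hostnames, table names and credentials are illustrative placeholders.

```python
# Minimal sketch of federated querying: two remote systems are queried
# and joined in place, with no intermediate copy into a central lake.
# URLs, tables, users and passwords are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("federated-query-sketch").getOrCreate()

def jdbc_view(url, table, view_name):
    """Register a remote JDBC table as a local SQL view."""
    (spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", "analyst")
        .option("password", "***")
        .load()
        .createOrReplaceTempView(view_name))

# An operational sales database and a separate CRM system.
jdbc_view("jdbc:postgresql://ops-db:5432/sales", "orders", "orders")
jdbc_view("jdbc:postgresql://crm-db:5432/crm", "customers", "customers")

# One query spanning both systems. Filters and column pruning are pushed
# down to each source where possible, so far less data crosses the network
# than a wholesale copy would.
result = spark.sql("""
    SELECT c.segment, COUNT(*) AS orders_last_30d
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.order_date >= date_sub(current_date(), 30)
    GROUP BY c.segment
""")
result.show()
```

A dedicated federated query optimiser goes further, planning joins and aggregations across sources, but the principle is the same: move the query, not the data.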

With Zetaris, organisations don’t have to duplicate or move data into yet another data store for analytics, reducing project timelines by a factor of five and dramatically cutting total project cost.

Vinay Samuel

Founder & CEO