Demystifying Big Data in Retail
The volume, variety, and velocity of data produced in all areas of the retail industry are growing exponentially, creating both challenges and opportunities for those who analyze this data to gain a competitive advantage. Although retailers have used data analytics to generate business intelligence for years, the scale and diversity of today’s data necessitate new approaches and tools. The retail industry has entered the big data era and now has access to more information that can be used to create amazing shopping experiences and forge tighter connections between customers, brands, and retailers.
A trail of data follows products as they are manufactured, shipped, stocked, advertised, purchased, consumed, and talked about by consumers – all of which can help forward-thinking retailers increase sales and operations performance. This requires an end-to-end retail analytics solution capable of analyzing large datasets populated by retail systems and sensors, enterprise resource planning (ERP), inventory control, social media, and other sources.
How does one start a big data project? In an attempt to demystify retail data analytics, this paper chronicles a real-world implementation that is producing tangible benefits, such as allowing retailers to:
- Increase sales per visit with a deeper understanding of customers’ purchase patterns.
- Learn about new sales opportunities by identifying unexpected trends from social media.
- Improve inventory management with greater visibility into the product pipeline.
A set of simple analytics experiments was performed to create capabilities and a framework for conducting large-scale, distributed data analytics. These experiments facilitated an understanding of the edge-to-cloud business analytics value proposition and, at the same time, provided insight into the technical architecture and integration needed for implementation.
Big Data Basics
Data has been getting bigger for a while now. The volume of data generated or processed in 2014 alone is expected to exceed six zettabytes; that is 1,200 times more than all the data ever generated prior to 2003.
One of the reasons data is getting bigger is that it is continuously being generated from more sources and more devices. Making matters more difficult, much of the data is unstructured, coming from videos, photos, comments on social media forums, reviews on web sites, and so on. As a result, this data is often made up of volumes of text, dates, numbers, and facts that are typically free form by nature and cannot be stored in structured, predefined tables.
Data from certain sources arrives so fast that there may not be enough time to store it before applying analytics to it. For these reasons, conventional data analytics and tools alone do not enable retail IT to store, manage, process, and analyze all the data they may need to utilize.
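The idea of analyzing data as it arrives, rather than storing it first, can be illustrated with a minimal Python sketch. It keeps a running count and average of sale amounts and discards each value after use. The `sale_amounts` feed and its values are hypothetical stand-ins for a live data source, not data from the implementation described in this paper.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Running count and mean, updated one event at a time."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental mean: earlier values do not need to be kept.
        self.mean += (value - self.mean) / self.count


def sale_amounts() -> Iterator[float]:
    """Hypothetical stand-in for a live feed of sale amounts."""
    yield from [20.0, 35.0, 5.0, 40.0]


def summarize(stream: Iterable[float]) -> RunningStats:
    """Consume the stream once, holding only the running summary."""
    stats = RunningStats()
    for amount in stream:
        stats.update(amount)
    return stats


stats = summarize(sale_amounts())
print(stats.count, stats.mean)
```

Memory use stays constant no matter how many events pass through, which is the property that matters when data arrives faster than it can be stored.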
Can retailers simply ignore big data? After all, is it worth all the effort? It turns out that it is. According to McKinsey Global Institute, big data has the potential to increase net retailer margins by 60 percent. Likewise, companies in the top third of their industry in the use of data-driven decision making were, on average, five percent more productive and six percent more profitable than their competitors, wrote Andrew McAfee and Erik Brynjolfsson in a Harvard Business Review article.
To generate the insights needed to reap substantial business benefits, innovative approaches and technologies are required. Big data in retail is like a mountain, and retailers must uncover the tiny but game-changing golden nuggets of insight and knowledge that can be used to create a competitive advantage.
Big Data Technologies
For retailers to realize the full potential of big data, they must find a new approach for handling large amounts of data. Traditional tools and infrastructure are struggling to keep up with larger and more varied data sets coming in at high velocity. New technologies, such as a distributed grid of computing resources, are emerging to make big data analytics scalable and cost-effective. In a distributed grid, processing is pushed out to the nodes where the data resides, in contrast to long-established approaches that retrieve data from the nodes and process it at a central point.
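The contrast between moving data to the computation and moving computation to the data can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not a distributed implementation: each list in the hypothetical `node_partitions` stands in for the records held on one node, and the function names are illustrative only.

```python
from collections import Counter
from typing import List, Tuple

Sale = Tuple[str, int]  # (product, units sold)

# Hypothetical data: each inner list represents the records on one node.
node_partitions: List[List[Sale]] = [
    [("shirt", 2), ("jeans", 1)],
    [("shirt", 1), ("socks", 4)],
    [("jeans", 3), ("socks", 1)],
]


def summarize_locally(partition: List[Sale]) -> Counter:
    """Work that would run on the node holding the data."""
    totals: Counter = Counter()
    for product, units in partition:
        totals[product] += units
    return totals


def combine(partials: List[Counter]) -> Counter:
    """Merge the small per-node summaries at a central point."""
    combined: Counter = Counter()
    for partial in partials:
        combined.update(partial)  # adds counts product by product
    return combined


# Only the compact summaries travel, never the raw records.
partials = [summarize_locally(p) for p in node_partitions]
totals = combine(partials)
print(dict(totals))
```

In a real cluster, each call to `summarize_locally` would execute on a different machine, and only the per-node totals would cross the network.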
Hadoop* is a popular, open-source software framework that enables distributed processing of large data sets on clusters of computers. The framework easily scales on servers based on Intel® Xeon® processors, as demonstrated by Cloudera, a provider of Apache* Hadoop-based software, support, services, and training. Although Hadoop has captured a lot of attention, other tools and technologies are also available for working on different types of big data analytics problems.