Business in the Era of Big Data
Deeper levels of understanding and targeting customers: A large US retailer has been able to accurately predict when one of its customers is expecting a baby. Customer churn has become much easier for telecom companies to predict, and car insurance companies are able to understand how well their customers are driving.
Optimizing Business Processes: Big Data not only gives a view of the external audience, but is also a valuable tool for introspection into business processes. Stock optimization in retail, driven by predictive analysis of social media, web trends and weather forecasts, is leading to substantial cost savings. Supply chain management is benefitting particularly from data analytics. Geographic positioning and radio frequency identification (RFID) sensors can now track goods or delivery vehicles, and routes can be optimized by integrating live traffic data.
Driving smarter machines and devices: Google’s recently launched and widely discussed self-driving car relies heavily on Big Data tools. The energy sector is also taking advantage by optimizing energy grids using data from smart meters. Big Data tools are also being used to improve the performance of computers and data warehouses.
Smarter Financial Trading: High-Frequency Trading (HFT) is a major application area for Big Data today. A majority of equity trading algorithms now take into account data feeds from social media networks and news websites to make trading decisions in split seconds.
These are some of the existing illustrations of Big Data at work in the business sector. There are several other avenues, with newer ones opening by the day, where Big Data can help organizations become smarter, more secure and better connected.
Data Acquisition & Data Warehousing: Data always has a source; it does not come out of nowhere. The sources are as vast and varied as the data itself, and together they can produce up to 1 million terabytes of raw data every day. Data this enormous and dispersed is of little use unless it is filtered and compressed on the basis of several criteria. The foremost challenge is to define these filter criteria so that no valuable information is lost. For instance, customer preference data can be sourced from the information customers share on key social media channels. But how does one reach the customers who do not use social media, who may also be an important segment? What are the data sources for them? Data reduction is a science that needs substantial research to establish an intelligent process that brings raw data down to a manageable size without missing the small pieces of relevant information. And this is required in real time, as it would be expensive and arduous to store the data first and reduce it later. An important part of building a robust Data Warehousing platform is consolidating data across various sources to create a sound repository of master data, which helps provide consistent information across the organization.
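The idea of reducing data in real time, rather than storing it first and reducing it later, can be sketched in a few lines of Python. This is only an illustrative sketch: the field names, the two filter criteria and the sample records (including the "survey" channel standing in for a non-social-media source) are assumptions made for the example, not part of any particular platform.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical filter criteria; the field names are illustrative only.
REQUIRED_FIELDS = ("customer_id", "channel", "preference")
KEPT_FIELDS = ("customer_id", "channel", "preference")


def reduce_stream(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Filter and trim records one at a time, without storing the raw stream."""
    seen = set()
    for record in records:
        # Criterion 1: drop records with a missing or empty required field.
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            continue
        # Criterion 2: drop repeats of a preference already seen for a customer.
        key = (record["customer_id"], record["preference"])
        if key in seen:
            continue
        seen.add(key)
        # Keep only the fields needed downstream; bulky raw text is discarded.
        yield {field: record[field] for field in KEPT_FIELDS}


# Hypothetical raw feed mixing social media posts and survey responses.
raw = [
    {"customer_id": "c1", "channel": "social", "preference": "organic", "raw_text": "long post"},
    {"customer_id": "c1", "channel": "social", "preference": "organic", "raw_text": "repost"},
    {"customer_id": "c2", "channel": "survey", "preference": "", "raw_text": "blank answer"},
    {"customer_id": "c3", "channel": "survey", "preference": "budget", "raw_text": "form entry"},
]
reduced = list(reduce_stream(raw))
```

Because the function is a generator, each record is examined and either dropped or trimmed as it arrives. Only the small set of already-seen keys is held in memory, and even that would need a bound in a long-running system.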
Data Extraction & Structuring: Data that has been collected, even after filtering, is not in a format ready for analysis. It has multiple modes of content, such as text, pictures and videos, and comes from multiple sources with different file formats. This calls for a Data Extraction Strategy that integrates data from diverse enterprise information repositories and transforms it into a consumable format. Data falls into two basic categories: structured and unstructured. Structured data is available in a pre-set format, such as row- and column-based databases. It is easy to enter, store and analyze. This type of data is mostly factual and transactional. Unstructured data, on the other hand, is free-form, attitudinal and behavioral. It does not come in traditional formats. It is heterogeneous and variable, and comes in multiple formats, such as text, documents, images, video and so on. Unstructured data is growing very rapidly. A 2011 IDC study stated that 90 percent of all data in the next decade will be unstructured. From a business benefit perspective, true value and insights reside in this massive volume of unstructured data, yet it is difficult to tame and channel.
Extract-Transform-Load (ETL) is the process that covers the entire stage of getting data from the source into the target data warehouse in a proper, cleaned format. There are several ETL tools available; the principles for making the right selection are the same as those for deciding the right course of big data implementation, as explained later in the paper.
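As a rough illustration of the three ETL stages, the following sketch uses only the Python standard library. The CSV content, the column names and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the target data warehouse; a production ETL tool would add scheduling, logging and error reporting.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or a feed.
SOURCE_CSV = """customer_id,name,amount
1, Alice ,10.50
2,Bob,not_a_number
3,carol,7
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert types, skip rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would log or quarantine this row
        name = row["name"].strip().title()
        clean.append((int(row["customer_id"]), name, amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the target warehouse
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer_id, name, amount FROM sales ORDER BY customer_id"
).fetchall()
```

Keeping the three stages as separate functions means the cleaning rules can change without touching the code that reads the source or writes to the target.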
Data Modeling & Data Analysis: Once a proper mechanism for creating a data repository is established, the rather complex procedure of Data Analysis sets in. Big Data Analytics is one of the most crucial aspects of the data industry, and one with the most room for development. Data analysis is not only about locating, identifying, understanding, and presenting data. Industries demand large-scale analysis that is entirely automated, which requires processing different data structures and semantics in a format that is both understandable and computer-intelligible. Technological advancements in this direction are making this kind of analytics of unstructured data possible and cost-effective. A distributed grid of computing resources, utilizing an easily scalable architecture, a processing framework, and non-relational and parallel-relational databases, is redefining data management and governance. Many databases today have shifted to non-relational models to meet the complexity of unstructured data. NoSQL database solutions can work without fixed table schemas, avoid join operations, and scale horizontally.
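The three NoSQL properties mentioned above can be illustrated with a toy in-memory store. This is a minimal sketch and not the design of any specific NoSQL product: the class name `ShardedDocumentStore`, the keys and the sample documents are all hypothetical. Documents carry no fixed schema, related data is embedded inside a document so that no join is needed, and each key is hashed to one of several shards, which stands in for spreading data across machines.

```python
import hashlib
from typing import Any, Dict, List, Optional


class ShardedDocumentStore:
    """Toy in-memory store: schema-less documents spread over several shards."""

    def __init__(self, shard_count: int) -> None:
        # Each shard is a plain dictionary; in a real system it would be a node.
        self.shards: List[Dict[str, Dict[str, Any]]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> int:
        # A stable hash of the key decides which shard owns the document.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key: str, document: Dict[str, Any]) -> None:
        self.shards[self._shard_for(key)][key] = document

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self.shards[self._shard_for(key)].get(key)


store = ShardedDocumentStore(shard_count=3)
# Two documents with entirely different fields: no table schema is enforced.
# The orders are embedded in the user document, so reading them needs no join.
store.put("user:1", {"name": "Alice", "orders": [{"sku": "A1", "qty": 2}]})
store.put("video:9", {"title": "Demo", "duration_s": 42, "tags": ["intro"]})
```

Simple modulo hashing is used here for brevity. It forces most keys to move when the number of shards changes, which is why distributed stores often rely on schemes such as consistent hashing instead.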
Data Interpretation: The most important factor for success in Big Data Analytics is the presentation of analyzed data in a user-friendly, re-usable and intelligible format. The complexity of the data adds to the complexity of its presentation as well. Sometimes simple tabular representations are not sufficient, and the data requires further explanation, historical incidents and so on. Sometimes the analytics tool is also expected to provide predictive or statistical analysis to support decision-making. In other words, the final phase, or culmination, of the entire Big Data exercise is Data Interpretation or Data Visualization. Visualization of data is a key component of Business Intelligence. Here’s a snapshot of the Visualization framework that assists in Business Intelligence.