
Tuesday, January 30, 2018

Basics of Data Analytics

 Data vs Information

Data usually refers to raw, unprocessed facts. It is the most basic form: values that have not been analysed or processed in any manner.
For example:        Jeetendra,Jaiswal,36,Gamdevi,Road,Bhan,dup,West,78,Mum
Information is data that has been interpreted so that it has meaning for the user. Once the data is analysed, it is considered information.
For example:        Jeetendra Jaiswal
365, Gamdevi Road
Bhandup (West),
Mumbai-400078
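
The step from data to information can be sketched in a few lines of Python. This is only an illustration: the field names and their order are assumptions (the post does not define a schema), and the raw record below is assumed to be cleaner than the scrambled one shown above.

```python
import csv
import io

# Assumed field order; hypothetical names chosen for this illustration.
FIELDS = ["first_name", "last_name", "house_no", "street",
          "locality", "city", "pin_code"]

# Raw data: a single comma-separated line with no labels or layout.
raw_record = "Jeetendra,Jaiswal,365,Gamdevi Road,Bhandup (West),Mumbai,400078"


def to_information(raw_line):
    """Interpret one raw line as a postal address a reader can use."""
    values = next(csv.reader(io.StringIO(raw_line)))
    person = dict(zip(FIELDS, values))
    return (
        f"{person['first_name']} {person['last_name']}\n"
        f"{person['house_no']}, {person['street']}\n"
        f"{person['locality']},\n"
        f"{person['city']}-{person['pin_code']}"
    )


print(to_information(raw_record))
```

The values are the same before and after; what changes is that each one now has a known meaning and place.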

What is Data Analytics?
Data analytics is the science of examining raw data with various techniques and processes in order to extract information from it. Data is extracted and processed according to organizational requirements, with the purpose of finding patterns and drawing conclusions from it.
Organizations collect and analyse both real-time and historical data related to their customers.
Using data analytics, organizations can optimize their operations, identify the needs of their customers and provide them with the best possible service, and proactively identify future trends and respond to them, gaining an edge over competitors.
All of this can help organizations optimize costs and increase revenue.

Types of data analytics applications
Data analytics can be broadly classified into two categories:
1)       Exploratory Data Analysis (EDA)
An approach to analysing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modelling or hypothesis testing task.
2)       Confirmatory Data Analysis (CDA)
An approach where you evaluate your evidence using traditional statistical tools such as significance, inference, and confidence.
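
To make the two categories concrete, here is a minimal sketch using only Python's standard library. The numbers are made up for illustration, and Welch's t statistic is just one possible confirmatory tool, chosen here as an assumption rather than something the definitions above prescribe.

```python
import statistics

# Hypothetical daily order counts under two store layouts (made-up numbers).
layout_a = [12, 15, 14, 10, 13, 16, 11]
layout_b = [18, 17, 20, 16, 19, 21, 15]


def summarize(values):
    """EDA step: describe a sample with a few summary numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def welch_t(sample_1, sample_2):
    """CDA step: Welch's t statistic for the difference of two sample means."""
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    standard_error = (var_1 / len(sample_1) + var_2 / len(sample_2)) ** 0.5
    return (mean_1 - mean_2) / standard_error


print(summarize(layout_a))
print(summarize(layout_b))
print(welch_t(layout_a, layout_b))
```

The summaries are the exploratory part: they suggest that layout B sells more. The t statistic is the confirmatory part: it measures how large that difference is relative to the noise in the samples, and would normally be compared against a t distribution to obtain a p-value.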
Data analytics can also be classified as:
         I.            Quantitative Data Analysis (QtDA)
An approach where numerical data is analysed with quantifiable variables that can be compared or measured statistically.
       II.            Qualitative Data Analysis (QlDA)
An approach where we understand the content of non-numerical data like text, images, audio and video, including common phrases, themes and points of view.
More advanced types of data analytics
A.      Data mining
Data mining is the analysis of large quantities of data to extract previously unknown, interesting patterns, unusual records and dependencies.
B.       Business Intelligence
Business Intelligence techniques and tools are used to acquire and transform large amounts of raw business data in order to help identify, develop and create new strategic business opportunities.
C.       Statistical Analysis
Statistics is the study of collection, analysis, interpretation, presentation, and organization of data. The 2 main types of Statistical Analysis are:
1)       Descriptive statistics - data from the entire population or a sample is summarized with numerical descriptors such as mean, frequency, percentage, etc.
2)       Inferential statistics - patterns in the sample data are used to draw inferences about the population the sample represents, while accounting for randomness. These inferences can be:
·         answering yes/no questions about the data (hypothesis testing)
·         estimating numerical characteristics of the data (estimation)
·         describing associations within the data (correlation)
·         modelling relationships within the data (e.g. regression analysis)
D.      Predictive Analytics
Predictive analytics uses statistical models to analyse current and historical data in order to make predictions about future or otherwise unknown events.
E.       Text Analytics
Text analytics, also referred to as text mining, provides a means of analysing documents, emails and other text-based content.
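
As a small illustration of correlation, regression and prediction from the list above, the sketch below computes a Pearson correlation and fits a least-squares line by hand. The data (weekly ad spend against units sold) is invented for the example, and a straight-line model is an assumption made to keep the code short.

```python
# Hypothetical weekly figures (made-up numbers).
ad_spend = [1, 2, 3, 4, 5]
units_sold = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Correlation: strength of the linear association between xs and ys."""
    mean_x, mean_y = mean(xs), mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = mean(xs), mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(ad_spend, units_sold)
# Predictive step: use the fitted line for a spend level not seen before.
predicted_units = slope * 6 + intercept

print(pearson_r(ad_spend, units_sold), slope, intercept, predicted_units)
```

The same fitted line serves both purposes: describing the relationship in historical data and forecasting a value that has not been observed yet.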

Use of Data Analytics
Data analytics initiatives support a wide variety of business uses. Below are a few examples:
1)       Banks and credit card companies analyse withdrawal and spending patterns to prevent fraud and identity theft.
2)       E-commerce companies and marketing services providers do clickstream analysis to identify website visitors who are more likely to buy a product or service based on navigation and page-viewing patterns.
3)       Mobile network operators examine customer data to forecast churn so they can take steps to prevent defections to business rivals; to boost customer relationship management efforts, they and other companies also engage in CRM analytics to segment customers for marketing campaigns and equip call centre workers with up-to-date information about callers.
4)       Healthcare organizations mine patient data to evaluate the effectiveness of treatments for cancer and other diseases.

Fundamentals to Data Analytics
1)       Identifying the Data
It is important for a data analytics project to identify where the valuable information resides and map it based on the 4V framework, defined by Volume, Variety, Velocity and Veracity.
2)       Data Quality
Data quality is the core aspect of data analytics that determines whether the data can be used as intended in business operations and decision making. The correctness and consistency of the data demonstrate its quality and fitness for use.
3)       Business Objectives
Clarity in defining your goals and objectives is essential for achieving success through analytics. Analysts need to keep the big picture in mind while building the conceptual framework and processes used in data analytics.
4)       Data Availability & Access
Data availability and access are fundamental requirements for data analytics. Authorized personnel should be able to access internal organizational data, and information from outside the organization must be collected from reliable sources.
5)       Insight
The success of a data analytics project depends on the quantifiable insight it generates. The resulting system should be able to provide timely and accurate answers to business questions. Such answers are valuable, actionable insights that point to a way forward and add to the value chain.
6)       Data Visualization
For an insight to be meaningful, the information must be presented to the intended audience in an appealing and insightful manner. The business story and the user story should be represented with advanced visualization techniques for better clarity, with scope for interactive exploration.
7)       Data Practices
The right data analysis framework, a standard architecture for data interoperability, and strict compliance with data security and privacy norms create a trusted environment for data analysis. These enablers help organizations undertake projects that assure high data security, which supports a project’s success and stakeholders’ buy-in.
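
The data quality point above can be checked mechanically before any analysis begins. The following is a minimal sketch, assuming records arrive as Python dictionaries; the field names and the two checks (missing values and exact duplicates) are illustrative choices, not a complete definition of data quality.

```python
def quality_report(rows, required_fields):
    """Count rows with missing required values and rows that repeat an earlier row."""
    rows_with_missing = sum(
        1 for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    )

    seen = set()
    duplicate_rows = 0
    for row in rows:
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)

    return {
        "rows": len(rows),
        "rows_with_missing_values": rows_with_missing,
        "duplicate_rows": duplicate_rows,
    }


# Hypothetical customer records (made-up values).
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},               # missing email
    {"customer_id": 1, "email": "a@example.com"},  # duplicate of the first row
    {"customer_id": 3, "email": "c@example.com"},
]

report = quality_report(customers, ["customer_id", "email"])
print(report)
```

A report like this gives a quick, quantifiable view of correctness and consistency that can be tracked from one data load to the next.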

Challenges in Data Analytics
1)       Handling Enormous Data in Less Time
Handling the data of any business or industry is itself a significant challenge, and the task gets much more difficult when the volume is enormous. Critical business decisions need to be taken effectively, which requires a strong IT infrastructure that can read the data quickly and deliver real-time insights. To overcome this challenge, you can use Apache Hadoop’s MapReduce, which splits the application’s data into small fragments that can be processed in parallel. This makes large volumes of data manageable.
2)        Visual Representation of Data
Another important task is the visual representation of data. You need to present the data in a simple format that is readable and understandable to the audience. Handling unstructured data and then representing it in a visually attractive manner can be a difficult task. To address this issue, the data analyst can use different types of graphs or tables to represent the data.
3)        Application Should Be Scalable
A major factor to consider is the scalability of the applications. Many organizations face the same issue: the volume of data increases with each passing day. Because of the multiple layers between the database and the front end, data traversal takes time. To overcome this issue, organizations should pay attention to the application’s architecture and technology in order to reduce performance issues and enhance scalability.
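
The MapReduce idea mentioned in the first challenge can be illustrated in plain Python. This is a conceptual sketch only: it does not use the Hadoop API, it runs on one machine, and the event data and function names are invented for the example.

```python
from collections import Counter
from functools import reduce

# Hypothetical (customer, event) records (made-up values).
events = [
    ("alice", "login"), ("bob", "purchase"),
    ("alice", "purchase"), ("carol", "login"),
    ("bob", "login"), ("alice", "logout"),
]


def split_into_fragments(records, fragment_size):
    """Split the input into small fragments that could be processed independently."""
    return [records[i:i + fragment_size]
            for i in range(0, len(records), fragment_size)]


def map_fragment(fragment):
    """Map step: count events per customer within a single fragment."""
    return Counter(customer for customer, _ in fragment)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


fragments = split_into_fragments(events, 2)
partial_counts = [map_fragment(fragment) for fragment in fragments]
totals = reduce(reduce_counts, partial_counts, Counter())

print(dict(totals))
```

In a real cluster each fragment would be handled by a different worker, and the framework would take care of distributing the data and collecting the partial results. The structure of the computation stays the same.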

Guidelines for effective use of Data Analytics
1)       Define the Questions
Your questions will define your work process, so ask clear and measurable questions. Define your problem clearly and design each question in such a way that it either qualifies or disqualifies potential solutions.
2)       Set Appropriate Measurement Priorities
This point covers two decisions: what to measure and how to measure it. Deciding how to measure the data is important before the data collection phase, because it raises its own set of questions.
3)        Collect Data
After defining the questions and setting the measurement priorities, you need to collect the data. Try to keep the collected data organized.
4)        Analyse and Make Data Useful
Now is the time to analyse the data. You can manipulate the data in multiple ways, such as plotting it, searching for correlations, or building a pivot table. A pivot table helps in sorting and filtering data and in calculating the maximum, minimum, mean and standard deviation of your data.
5)       Interpret Results
Data analytics is incomplete without compelling visualization. This is the time to interpret your data. A sound interpretation should answer the questions you defined in the first step.
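
The pivot-table summary described in step 4 can be sketched with the standard library alone. The sales figures and the grouping by region are assumptions made for the example; a spreadsheet or a data analysis library would give the same numbers.

```python
import statistics
from collections import defaultdict

# Hypothetical (region, sale amount) rows (made-up values).
sales = [
    ("North", 120.0), ("South", 80.0),
    ("North", 100.0), ("South", 90.0),
    ("North", 140.0), ("South", 100.0),
]


def pivot_summary(rows):
    """Group amounts by region and compute max, min, mean and standard deviation."""
    groups = defaultdict(list)
    for region, amount in rows:
        groups[region].append(amount)
    return {
        region: {
            "max": max(amounts),
            "min": min(amounts),
            "mean": statistics.mean(amounts),
            "stdev": statistics.stdev(amounts),  # sample standard deviation
        }
        for region, amounts in groups.items()
    }


summary = pivot_summary(sales)
for region, stats in sorted(summary.items()):
    print(region, stats)
```

Each row of the result corresponds to one row of a pivot table, which makes it easy to compare groups before moving on to interpretation.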

Conclusion
Data analytics is helping businesses consider hundreds of parameters to predict, with reasonable accuracy, what will happen. Both structured and unstructured data are growing exponentially. With proper use of technology, this huge volume of data can be used to make more accurate decisions, which can help organizations improve their operations, reduce costs, increase sales, provide better service to customers and improve their efficiency.