DATA EXPLORATION AND DATA PREPARATION PHASE

Dr. Raja Sarath Kumar Boddu,

B.E(Civil), M.Tech(CST), Ph.D.(AU), PGCDA-(IIM-V)

Professor, Department of CSE,

Principal, Lenora College of Engineering,

Rampachodavaram,

Andhra Pradesh, India.

Description

Big Data Analytics largely entails gathering data from a variety of sources, managing it so that analysts can use it, and delivering data products that benefit the organization’s business. Depending on the needs, the deployment phase may be as straightforward as generating a report, or as involved as implementing a repeatable data scoring procedure.

Private firms and research organizations collect terabytes of data about their customers’ interactions, commerce, and social media activity, as well as sensor data from devices such as mobile phones and vehicles. This data is used for a variety of purposes, including marketing and advertising. Making sense of all this data is one of the most difficult challenges of our day, and this is where the analysis of large amounts of data comes into play. At this stage, the focus shifts to the people involved and their capacity to implement various architectural designs.

Traditional data warehouses, along with updated versions of them, are still in use in large-scale applications. For instance, Teradata and IBM both provide SQL databases that can manage terabytes of data, and open-source options such as PostgreSQL and MySQL are also still used for large-scale applications. On the client side, most solutions offer a SQL application programming interface (API), even though the underlying storage systems differ in how they operate in the background. As a result, a solid knowledge of SQL is still considered a necessary skill for big data analytics.
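As a small illustration of why SQL remains useful for exploring data, the sketch below uses Python's built-in sqlite3 module to summarize a handful of customer interactions. The table name `interactions`, its columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from the book, and a production warehouse would differ in scale and setup.

```python
import sqlite3


def summarize_interactions(rows):
    """Load (customer, channel, amount) rows into an in-memory SQLite
    table and return (channel, count, total) per channel using SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, assumed for this example only.
        conn.execute(
            "CREATE TABLE interactions (customer TEXT, channel TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO interactions VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT channel, COUNT(*), SUM(amount) "
            "FROM interactions GROUP BY channel ORDER BY channel"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data, small enough to check by hand.
sample_rows = [
    ("alice", "web", 20.0),
    ("bob", "mobile", 5.5),
    ("alice", "mobile", 4.5),
    ("carol", "web", 10.0),
]

if __name__ == "__main__":
    for channel, count, total in summarize_interactions(sample_rows):
        print(channel, count, total)
```

The same GROUP BY query would run largely unchanged against the other SQL databases mentioned above, which is the practical reason SQL skills carry over between storage systems.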

Computers have become an integral part of our daily lives in recent times. They have enormously impacted our personal, professional, and social lives. In response to the increasing demand for computers in society, schools, colleges, and universities have included computer education in their curricula to help students become skilled in programming and in developing applications that can be used to solve various business, scientific, and social problems.

Likewise, over the past decades, the field of Big Data Analytics has become one of the pillars of information technology. Big Data Analytics is an integral part of many commercial applications and research projects today, in areas ranging from medical diagnosis and treatment to finding your friends on social networks.

There have been important advances in the theory and algorithms that form the foundations of the field of Big Data Analytics. The goal of this textbook is to present the basic concepts of the theory and a wide range of techniques that can be applied to a variety of problems. Many Big Data Analytics algorithms that can be quite effective in specific situations are not included in this book; readers can acquire that knowledge through self-study.

There are many machine learning websites that give information on available Big Data Analytics software. Popular tools include R, SAS, Python, Weka, MATLAB, Excel, and Tableau. This book does not promote any specific software. It includes a large number of examples, but they use illustrative datasets that are small enough to allow the reader to follow what is going on without the help of software; real datasets are far too large for this. The datasets in the book are chosen not to illustrate actual large-scale practical problems, but to help the reader understand what the different techniques do, how they work, and what their range of application is. This is why a heavy focus on project work is a necessity: each project must handle a large-scale practical problem. Using domain knowledge to formulate the problem in a machine learning setting and interpreting the results given by machine learning algorithms are important ingredients of training students, in addition to training on Big Data Analytics software.
