categories: Technology, Science & Productivity
By the end of this level, you will be able to:
- Define the different types of data attributes and data sets, and distinguish between data quality problems such as noise, missing values, and duplicates.
- Describe data preparation techniques, including sampling, feature selection, estimation, and transformation of variables.
- Explain the evolution of big data and the structure of the Hadoop ecosystem, and implement Python projects that cover the different aspects of the level.
1. Data Acquisition & Modeling
Data Types and Quality Issues
Data 1
Data 2
Attributes
Attribute Types (P.1)
Attribute Types (P.2)
Attribute Types (P.3)
Types of Datasets
Graph-Based Data
Data Quality
Missing Values
Duplicate Data
Summary
Data Preparation
Sampling
Feature Selection
Feature Selection Methods
Discretization 1
Discretization 2
Variable Transformation
Example of Variable Transformation
The Required Project
Big Data Deep Dive
Distributed Systems
Big Data Evolution (P.1)
Big Data Evolution (P.2)
Big Data Challenges
Hadoop Ecosystem (P.1)
Hadoop Ecosystem (P.2)
Data Acquisition and Pipelining
Questions
This level describes the data attributes and types of data sets that a data scientist typically encounters, the common data quality issues and how to deal with them, and the main data preparation techniques. It closes with an introduction to big data and the Hadoop ecosystem.
- A degree from any university (an engineering degree is not required)
- Previous programming experience in any language is a big plus
- Knowledge of linear algebra is a big plus
Engineer and Senior Member of IEEE
3,135 Learners
5 Courses