Data preparation and exploration

Real-world data is often noisy and unreliable, and may have missing values.
True
False
Data Cleaning and Processing involves:
Tasks such as integrating multiple datasets, handling missing data, handling inconsistent data, and converting data types.
Selecting the key subset of original data features in an attempt to reduce the dimensionality of the training problem.
Creating additional relevant features from the existing raw features in the data that increase the predictive power of the resulting model.
Most datasets in practice were created for the purposes of building predictive models and do not have missing values.
True
False
What are some options for handling missing values in a dataset?
Replace missing values with a placeholder value that you specify
Replace missing values with a calculated value, such as a mean, median, mode, or imputed value.
Remove rows or columns containing missing values.
Randomly insert values for each missing value.
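As an illustration of the options listed above, here is a minimal sketch in plain Python of three common strategies: filling with a placeholder, filling with a calculated value (here the mean), and removing the missing entries. The `ages` column and the helper names are hypothetical, chosen only for this example; `None` stands in for a missing value.

```python
from statistics import mean

# Hypothetical toy column; None marks a missing value.
ages = [25, None, 31, None, 40]

def fill_with_placeholder(values, placeholder):
    """Replace each missing entry with a caller-chosen placeholder."""
    return [placeholder if v is None else v for v in values]

def fill_with_mean(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def drop_missing(values):
    """Remove entries that are missing."""
    return [v for v in values if v is not None]

placeholder_filled = fill_with_placeholder(ages, -1)
mean_filled = fill_with_mean(ages)
dropped = drop_missing(ages)
```

Which strategy is appropriate depends on how much data is missing and why; dropping is simplest but discards observations, while filling keeps the row count at the cost of introducing estimated values.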
Duplicate data is not a problem because repetition within a dataset does not give those specific observations more influence on the result.
True
False
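Where exact duplicate records are unwanted, one simple way to remove them is sketched below in plain Python. The `rows` data and the `drop_duplicates` helper are hypothetical, and the sketch assumes each record is a hashable tuple; it keeps the first occurrence of each record and preserves the original order.

```python
# Hypothetical toy records as (name, age) tuples; one row is repeated.
rows = [("alice", 30), ("bob", 25), ("alice", 30), ("carol", 41)]

def drop_duplicates(records):
    """Return records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique

unique_rows = drop_duplicates(rows)
```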
What is true of outliers?
They are values outside of the normal range for an attribute.
Outliers can be corrected using multivariate imputation by chained equations (MICE).
Measurement or typographical errors can create outliers.
The Azure Machine Learning Clean Missing Values module is the tool of choice for handling outliers in your dataset.
The Azure Machine Learning Clip Values module can handle outliers by clipping data point values that exceed a specified threshold.
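The general idea of clipping values at a threshold can be sketched in a few lines of plain Python. This is only an illustration of the technique, not the Azure Machine Learning module's interface; the `clip_values` helper and the chosen bounds are assumptions made for the example.

```python
def clip_values(values, lower, upper):
    """Limit each value to the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical readings with two extreme values.
readings = [1, 50, 1000, -20]
clipped = clip_values(readings, 0, 100)
```

Values inside the range are left unchanged, so clipping only affects the extremes.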
Feature normalization transforms dataset values into a common scale, while preserving the general distribution.
True
False
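One common normalization technique is min-max scaling, which maps values onto the range 0 to 1 while keeping their relative spacing. The sketch below is a minimal plain-Python version; the `min_max_scale` helper is hypothetical, and returning zeros for a constant column is an assumption made here to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no spread, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30])
```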
Adding irrelevant or distracting attributes to a dataset does NOT confuse machine learning systems. In the new age of artificial intelligence, machine learning systems no longer have to incorporate feature selection in the modeling process.
True
False
Feature Engineering involves:
Transforming raw data into features that better represent the underlying problem to the ML algorithm.
Implementing a filtering method to eliminate features with little correlation to class labels.
Adding features that provide additional information not clearly captured or easily apparent in the original or existing feature set.
Extracting a subset of original features in the dataset without changing them.
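A filter method of the kind mentioned above can be sketched by scoring each feature against the class labels and keeping only those above a cutoff. The example below uses the Pearson correlation coefficient as the score; the feature names, data, threshold, and helper functions are all hypothetical, and other scoring functions could be used instead.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # Assumption: treat a constant column as uncorrelated.
        return 0.0
    return cov / (spread_x * spread_y)

def select_features(features, labels, threshold):
    """Keep the names of features whose absolute correlation with the labels meets the threshold."""
    return [
        name
        for name, column in features.items()
        if abs(pearson(column, labels)) >= threshold
    ]

# Hypothetical toy dataset: one informative feature, one weakly related feature.
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [9, 7, 9, 7, 9],
}
labels = [0, 0, 1, 1, 1]

selected = select_features(features, labels, threshold=0.5)
```

A filter like this leaves the retained features unchanged; it only decides which columns are passed on to the model.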
The "Curse of Dimensionality" implies that as the number of data dimensions increases, it becomes more difficult to create a model that generalizes well to data it has not seen.
True
False