Data Mining Quiz Night 1

Test your knowledge in the exciting field of data mining! This quiz covers various essential concepts including instance-based learning, DBSCAN, association rules, and more.

Challenge yourself with multiple-choice questions that will help you gauge your understanding:

Instance-Based Learning
Clustering Techniques
Association Rules
N-grams and Probability

13 Questions | 3 Minutes | Created by MiningMaster321

Which of the following is FALSE regarding instance-based learning?

Hypothesis complexity can grow with the data

Classification costs are low

Constructs hypotheses directly from the training instances

RBF networks are an example of instance-based learning

Time complexity of this algorithm depends upon the size of training data

What is the shape of the isodensity contours for the following covariance matrix:

Parallel to the variable axes and elongated along the x_3 axis

Elongated along vectors defined as linear combinations of x_1, x_2

Elongated along vectors defined as linear combinations of x_2, x_3

Elongated along vectors defined as linear combinations of x_1, x_2, x_3

Which of the following statements about lift is FALSE?

First sort the instances in descending order of probabilities

The smaller the lift factor, the better

The lift factor is the increase in positive instances in a sample vs. the overall positive rate in the population

In general, to plot the lift curve, we use the sample size as the x-axis and the number of positives as the y-axis

It allows evaluating a classifier by considering subsets of the instances
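
The lift-curve procedure mentioned in these options can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names `lift_curve` and `lift_factor` and the toy probabilities and labels are assumptions made up for the example.

```python
def lift_curve(probabilities, labels):
    """Return (sample_size, cumulative_positives) points for a lift curve."""
    # Rank instances by predicted probability, highest first.
    ranked = sorted(zip(probabilities, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    positives = 0
    for size, (_, label) in enumerate(ranked, start=1):
        positives += label  # labels are 1 (positive) or 0 (negative)
        points.append((size, positives))
    return points


def lift_factor(probabilities, labels, sample_size):
    """Positive rate in the top `sample_size` instances divided by the overall positive rate."""
    points = lift_curve(probabilities, labels)
    sample_rate = points[sample_size - 1][1] / sample_size
    overall_rate = sum(labels) / len(labels)
    return sample_rate / overall_rate


# Hypothetical classifier outputs and true labels (illustrative only).
probs = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

curve = lift_curve(probs, labels)      # x = sample size, y = number of positives
top3 = lift_factor(probs, labels, 3)   # lift factor for the top 3 ranked instances
```

Plotting the first element of each point on the x-axis and the second on the y-axis gives the curve; the diagonal from the origin to the last point corresponds to random selection.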

The probability of a given N-gram within a sequence of words is computed using the:

Which of the following statements is TRUE?

Clustering is a supervised learning task

We aim to maximize the within cluster distance metric

Fuzzy clustering is interesting when we want to classify instances belonging to at most 1 cluster

In DBSCAN clustering, the discarded points are called noise points

The preferred distance metric for nominal attribute values is Manhattan distance

For DBSCAN parameter selection, what is the value of Eps given MinPts=4, and what would be more likely to happen if this value increases?

10; the number of clusters would decrease

10; the number of clusters would increase

30; the number of clusters would decrease

30; the number of clusters would increase
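
To make the roles of Eps and MinPts concrete, here is a small pure-Python DBSCAN sketch. It is an illustration under assumptions: the function name `dbscan`, the toy points, and the two Eps values are invented for this example and are unrelated to the values in the question.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)
    cluster = 0

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(j_nbrs)
        cluster += 1
    return labels


# Two tight groups and one isolated point (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

labels_narrow = dbscan(points, eps=1.5, min_pts=4)
labels_wide = dbscan(points, eps=15, min_pts=4)
```

Comparing the number of distinct non-noise labels in `labels_narrow` and `labels_wide` shows how the clustering of this toy data changes as Eps grows while MinPts stays fixed.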

Which of the following statements about instance-based learning is FALSE?

It is time-efficient in making predictions

It is easy to add new instances to the "model"

It does not make assumptions about the data

It can be memory intensive
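
A k-nearest-neighbors classifier is a common example of instance-based learning, and a short sketch helps when weighing these options. The function name `knn_predict` and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training instances."""
    # Every stored instance is compared with the query at prediction time.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# The "model" is simply the list of stored (features, label) instances.
training = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

prediction = knn_predict(training, (1.1, 1.0))

# Adding a new instance requires no retraining: it is appended to the store.
training.append(((9.0, 9.0), "c"))
new_prediction = knn_predict(training, (9.0, 9.0), k=1)
```

Note where the work happens in this sketch: nothing is computed when instances are stored, and each call to `knn_predict` scans the whole training list.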

Regarding association rules, which of the following statements is FALSE?

The confidence of a rule is the number of instances satisfying the right-hand side of the rule, as a percentage of all the instances

Coverage is the number of instances with all the items of the rule

If an item has insufficient coverage, the Apriori algorithm won't compute k-item sets containing it

Support is the proportion of instances containing all items of the rule

Association rules are similar to classification rules but they aren't intended to be used together as a whole

Which of the following is a non-adaptive transformation for time series:

Piecewise Linear Approximation

Discrete Fourier Transformation

Singular Value Decomposition

Principal Component Analysis

Support calculates:

Calculate the confidence of all possible rules given the frequent itemsets

The percentage of transactions that contain all of the items in an item set

The probability that a transaction that contains the items on the left hand side of the rule also contains the item on the right hand side

The probability of all of the items in a rule occurring together divided by the product of the probabilities of the items on the left and right hand side occurring as if there was no association between them
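
The measures described in these options are often written as small functions over a list of transactions. The sketch below uses the usual textbook definitions; the function names and the toy market-basket data are assumptions for illustration.

```python
# Hypothetical transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Among transactions containing `lhs`, the fraction that also contain `rhs`."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Joint support divided by the product of the individual supports."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
l = lift({"bread"}, {"milk"}, transactions)
```

For the rule bread -> milk on this toy data, `s`, `c`, and `l` give the three measures side by side, which makes it easier to see how they differ.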

What algorithm does Shazam use to identify songs?

Looks at the anchor peak pairs of the song.

Looks at the nearest neighbors using the frequency of the song with songs in the database

Matches anchor points between the song and songs in the database

Matches the constellation plot of the new song with the songs in the database.

Which among these is FALSE about histograms?

They reveal data quality problems

They can handle data in multiple dimensions

They can provide valuable information such as outliers.

They are a non parametric model

Which of the following is FALSE regarding the convolution operation in image processing? For each pixel (x, y):

Multiply the corresponding mask and pixel values

Average the products to perform max pooling

Sum these products to compute the new pixel values
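
The per-pixel multiply-and-sum described in these options can be sketched in plain Python. Two assumptions are worth flagging: the function name `apply_mask` is invented for this example, and the mask is applied without flipping (the correlation form common in image-processing code), which gives the same result as true convolution for symmetric masks such as the one used here. Only positions where the mask fits entirely inside the image are computed.

```python
def apply_mask(image, mask):
    """Slide `mask` over `image`; each output pixel is the sum of mask * pixel products."""
    mask_h, mask_w = len(mask), len(mask[0])
    out_h = len(image) - mask_h + 1
    out_w = len(image[0]) - mask_w + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = 0
            for j in range(mask_h):
                for i in range(mask_w):
                    # Multiply corresponding mask and pixel values, then accumulate.
                    total += mask[j][i] * image[y + j][x + i]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 image and a 3x3 all-ones mask (illustrative values).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

result = apply_mask(image, box)
```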
