Take the Categorical Data Analysis Quiz Now!

Think you can ace this categorical data analysis quiz? Start now!

Difficulty: Moderate
2-5 mins

This quiz helps you practice working with categorical variables, read tables and charts, and pick the right method for counts and proportions. Use it to spot gaps before a test and build speed; for extra practice, check out more math MCQs or take a quick statistics quiz.

What type of variable is the "color" of a car?
Interval scale variable
Categorical nominal variable
Numerical continuous variable
Ratio scale variable
The color of a car is qualitative: its values are labels, not quantities. It is a nominal categorical variable because its categories have no intrinsic order. Nominal variables classify observations into distinct groups without ranking them.
The pain severity scale with categories "mild", "moderate", and "severe" is an example of which level of measurement?
Interval
Nominal
Ordinal
Ratio
The categories "mild", "moderate", and "severe" have a clear order, but the distances between them are not known to be equal. This makes it an ordinal level of measurement: ordinal scales rank categories without assuming equal intervals.
Which of the following is an example of nominal categorical data?
Annual income bracket
Eye color
Temperature in Celsius
Customer satisfaction rating (1 - 5)
Eye color categories (such as blue, green, and brown) have no inherent order, which makes them nominal. Customer satisfaction ratings and income brackets are ordered, so they are ordinal. Temperature in Celsius is measured on an interval scale.
Data with only two possible categories is referred to as what?
Nominal
Binary
Interval
Ordinal
When a variable has exactly two possible categories, it is called a binary variable. Binary data are the simplest form of categorical data. They are often coded as 0/1, for example to record yes/no or true/false answers.
Which encoding technique converts each category into a separate binary indicator variable?
Principal component analysis
One-hot encoding
Label encoding
Standardization
One-hot encoding transforms each categorical level into its own binary column, marking presence with 1 and absence with 0. This prevents the algorithm from assuming any ordinal relationship among the levels. It is widely used in machine learning preprocessing.
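As an illustration, here is a minimal pure-Python sketch of one-hot encoding. It assumes the categories arrive as a plain list of strings. The function name `one_hot` and the sample colors are hypothetical. In practice, pandas (`get_dummies`) or scikit-learn (`OneHotEncoder`) would normally do this.

```python
def one_hot(values):
    """Return (levels, rows): one 0/1 indicator column per distinct category."""
    levels = sorted(set(values))  # fix a stable column order for the categories
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


colors = ["red", "blue", "green", "blue"]  # hypothetical data
levels, rows = one_hot(colors)
print(levels)  # ['blue', 'green', 'red']
for color, row in zip(colors, rows):
    print(color, row)  # exactly one 1 per row
```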
Which type of plot is most appropriate for displaying the distribution of a categorical variable?
Bar chart
Line graph
Histogram
Scatter plot
Bar charts display category labels along one axis and their frequencies or proportions along the other, which makes them ideal for categorical data. Histograms are for binned continuous data. Scatter plots and line graphs show relationships between numeric variables or trends over time.
Which summary statistic is most commonly used to describe a single categorical variable?
Correlation coefficient
Mean
Standard deviation
Frequency table
Frequency tables tally the number of observations in each category (often alongside proportions), giving a clear summary of categorical data. Means and standard deviations apply to numeric variables. Correlation coefficients measure the relationship between two variables rather than summarizing a single one.
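A frequency table can be sketched with the standard-library `Counter`. It reports the same counts and proportions that R's `table()` or pandas' `value_counts()` would. The helper name `frequency_table` and the eye-color data are made up for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, proportion) tuples, most common category first."""
    counts = Counter(values)
    n = len(values)
    return [(cat, k, k / n) for cat, k in counts.most_common()]


eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]  # hypothetical data
for cat, count, prop in frequency_table(eye_colors):
    print(f"{cat:<6} {count:>3} {prop:6.1%}")
```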
Which of the following is actually a categorical variable, despite being numeric in appearance?
Height in cm
Weight in kg
Age
ZIP code
ZIP codes are identifiers without arithmetic meaning and should be treated as nominal categories, not numbers. Age, height, and weight are quantitative measures. Treating identifiers as numbers produces meaningless results, such as an "average ZIP code".
What does one-hot encoding achieve in data preprocessing?
Normalizes numeric values to a common scale
Maps ordinal relationships to numbers
Creates dummy variables for each category
Reduces dimensionality of data
One-hot encoding generates a separate binary (dummy) variable for each category level, so algorithms do not assume any ordinal relationship. Many machine learning models accept only numeric inputs, so this step is often required for categorical features. It increases dimensionality, adding one column per level, but preserves the distinctions between categories.
What issue arises if you include all dummy variables along with the intercept in a regression model?
Variance inflation
Overfitting
Heteroscedasticity
Dummy variable trap
Including all dummy variables plus an intercept causes perfect multicollinearity, known as the dummy variable trap. The dummy columns sum to the intercept column, so the design matrix is not full rank and the coefficients cannot be uniquely estimated. Omitting one reference category avoids this issue.
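The sketch below shows why the trap occurs. It uses hypothetical region labels and a made-up helper, `dummy_code`. When every level gets a dummy, the dummies in each row add up to the intercept column's value of 1. Dropping one reference level removes that exact linear dependency.

```python
def dummy_code(values, reference):
    """Dummy (treatment) coding: one 0/1 column per non-reference level."""
    levels = [lvl for lvl in sorted(set(values)) if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows


regions = ["north", "south", "east", "north", "east"]  # hypothetical data

# Trap: an intercept column of 1s plus a dummy for every level.
all_levels = sorted(set(regions))
full = [[1] + [1 if r == lvl else 0 for lvl in all_levels] for r in regions]
# In every row the dummies add up to the intercept: an exact linear dependency.
print(all(row[0] == sum(row[1:]) for row in full))  # True

# Fix: drop one reference level ("east"); its rows become all zeros.
levels, rows = dummy_code(regions, reference="east")
print(levels)  # ['north', 'south']
print(rows)    # [[1, 0], [0, 1], [0, 0], [1, 0], [0, 0]]
```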
The chi-square test of independence is used to assess what?
The relationship between two categorical variables
Equality of variances across groups
Linear correlation between numeric variables
Differences in means of two groups
The chi-square test of independence evaluates whether two categorical variables are associated in the population. It compares the observed frequencies with the frequencies expected if there were no association. The test only checks whether an association exists. It does not measure the association's strength or direction.
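Here is a minimal pure-Python sketch of the Pearson chi-square statistic for a table of counts. Each expected count is computed as row total × column total / n. The function name and example tables are illustrative assumptions. The sketch returns only the statistic and its degrees of freedom. A p-value would come from the chi-square distribution, for example SciPy's `scipy.stats.chi2.sf(chi2, df)`.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


print(chi_square_independence([[20, 30], [30, 20]]))  # (4.0, 1)
# A 3 x 4 table has (3 - 1) * (4 - 1) = 6 degrees of freedom.
print(chi_square_independence([[5, 7, 9, 4], [6, 8, 5, 7], [9, 4, 6, 8]])[1])  # 6
```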
How many degrees of freedom does a 3×4 contingency table have in a chi-square test of independence?
7
12
6
5
Degrees of freedom for a contingency table are calculated as (rows − 1) × (columns − 1). For a 3×4 table, this is (3 − 1) × (4 − 1) = 2 × 3 = 6. This value selects the chi-square reference distribution used to compute the p-value.
When should you use a chi-square goodness-of-fit test instead of a test of independence?
To compare means of two independent samples
To assess the relationship between two categorical variables
To evaluate variance homogeneity across groups
To compare observed frequencies with an expected theoretical distribution
The goodness-of-fit test checks whether the observed category counts match a specified theoretical distribution. The test of independence, in contrast, analyzes the association between two variables. The goodness-of-fit version uses a single categorical variable and expected proportions specified in advance.
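For comparison, here is a goodness-of-fit sketch in plain Python. It takes one categorical variable and compares each observed count with n × p, where p is that category's specified proportion. The die-roll counts and the helper name `chi_square_gof` are invented for illustration. The degrees of freedom are k − 1 here because no parameters are estimated from the data.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic and df for one categorical variable."""
    if abs(sum(expected_props) - 1) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    chi2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))
    df = len(observed) - 1  # assumes no parameters were estimated from the data
    return chi2, df


# Hypothetical: a die rolled 60 times, tested against a fair-die model (1/6 per face).
rolls = [8, 9, 12, 11, 6, 14]
chi2, df = chi_square_gof(rolls, [1 / 6] * 6)
print(chi2, df)  # chi2 ≈ 4.2 with 5 degrees of freedom
```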
Which statistic measures the strength of association between two nominal variables?
Cramér's V
Pearson's r
Kendall's tau
Spearman's rho
Cramér's V is a normalized measure of association for nominal variables, ranging from 0 (no association) to 1 (perfect association). It is derived from the chi-square statistic and adjusts for sample size and table dimensions. By contrast, Pearson's r applies to continuous variables, while Spearman's rho and Kendall's tau apply to ranked or ordinal data.
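Cramér's V can be computed directly from the chi-square statistic as √(χ²/(n(k − 1))), where k is the smaller of the number of rows and columns. The sketch below is a self-contained Python illustration with hypothetical tables. For a 2×2 table, the result equals the absolute value of phi.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c table of counts: sqrt(chi2 / (n * (k - 1))), k = min(r, c)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v([[20, 30], [30, 20]]))  # ≈ 0.2 (chi2 = 4, n = 100, k = 2)
print(cramers_v([[10, 0], [0, 10]]))    # 1.0: perfect association
```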
Which encoding technique preserves the natural order of ordinal categories?
Label encoding
Binary encoding
Effect coding
One-hot encoding
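Label encoding preserves the ordinal relationship when the integer codes follow the category order the analyst specifies, an approach often called ordinal encoding. Note that some implementations, such as scikit-learn's LabelEncoder, assign codes in sorted (alphabetical) order, so the intended order must be supplied explicitly. One-hot and binary encoding split categories into separate columns and lose the ordinal information. Effect coding defines contrasts against the overall mean and does not encode order.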
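Below is a minimal sketch of order-preserving encoding with an explicit, analyst-supplied order. The helper `ordinal_encode` and the ratings are hypothetical. The last line shows how alphabetical sorting, which scikit-learn's `LabelEncoder` uses, would scramble "low", "medium", "high".

```python
def ordinal_encode(values, order):
    """Map each category to its position in an analyst-specified order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]  # raises KeyError for any level missing from `order`


ratings = ["low", "high", "medium", "low"]  # hypothetical data
print(ordinal_encode(ratings, order=["low", "medium", "high"]))  # [0, 2, 1, 0]

# Alphabetical order puts "high" first, which breaks the natural ranking:
print(sorted(set(ratings)))  # ['high', 'low', 'medium']
```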
What is Multiple Correspondence Analysis (MCA) primarily used for?
Calculating correlation coefficients
Testing independence in contingency tables
Generating dummy variables automatically
Dimensionality reduction for categorical data
Multiple Correspondence Analysis extends correspondence analysis to several categorical variables. It plays a role for categorical data similar to the one principal component analysis plays for numeric data. It reduces dimensionality, uncovers relationships between categories, and displays them in a low-dimensional space. MCA is useful for exploring complex categorical datasets.
A Cramér's V value of 0.2 typically indicates which level of association?
Weak association
Strong association
No association
Moderate association
By common rules of thumb, Cramér's V values around 0.1 indicate a small (weak) association, around 0.3 a moderate one, and 0.5 or above a strong one. A value of 0.2 therefore falls in the weak range. These benchmarks apply most directly to 2×2 tables. Interpretation also depends on the table's dimensions and the research context.
What distinguishes effect coding from dummy coding in regression?
Both methods only use 0 and 1 values
Effect coding normalizes numeric variables
Dummy coding is only for ordinal data
Dummy coding uses a reference mean, effect coding uses -1 and sum-to-zero constraints
Effect coding assigns values of -1, 0, and 1, with the reference category coded -1 in every column. As a result, the level effects sum to zero and are centered on the grand mean (the unweighted mean of the group means). Dummy coding uses only 0 and 1 and compares each category with a reference group. Effect coding therefore lets each coefficient be read as that level's deviation from the overall mean.
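The following sketch builds effect-coded columns and checks the interpretation described above without fitting a regression. It uses hypothetical group means for groups A, B, and C, with C as the reference. An intercept equal to the mean of the group means, plus each group's deviation from it, reproduces every group mean, including the reference. The helper name `effect_code` and all numbers are illustrative assumptions.

```python
def effect_code(values, reference):
    """Effect (sum-to-zero) coding: the reference level is coded -1 in every column."""
    levels = [lvl for lvl in sorted(set(values)) if lvl != reference]
    rows = []
    for v in values:
        if v == reference:
            rows.append([-1] * len(levels))
        else:
            rows.append([1 if v == lvl else 0 for lvl in levels])
    return levels, rows


group_means = {"A": 10.0, "B": 14.0, "C": 18.0}  # hypothetical one-way design
grand_mean = sum(group_means.values()) / len(group_means)  # unweighted mean of group means

levels, rows = effect_code(list(group_means), reference="C")
intercept = grand_mean                                   # 14.0
coefs = [group_means[lvl] - grand_mean for lvl in levels]  # deviations for A and B

# intercept + row . coefs recovers each group mean; C's implied effect is -(sum of coefs).
for group, row in zip(group_means, rows):
    fitted = intercept + sum(c * x for c, x in zip(coefs, row))
    print(group, row, fitted)
# Under dummy coding with the same reference, the intercept would be group_means["C"] = 18.0.
```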
Which encoding method is most suitable for high-cardinality categorical variables to avoid dimensionality explosion?
Target encoding
One-hot encoding
Ordinal encoding
Effect coding
Target encoding replaces each category with a summary statistic of the target, such as the target's mean within that category. It adds no new column per level and so avoids high dimensionality, whereas one-hot encoding creates one new feature per category. Because target encoding uses the target itself, it needs careful cross-validation (for example, out-of-fold estimates) to avoid leakage.
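One common way to limit leakage is out-of-fold encoding, sketched below in plain Python. Each row is encoded only from rows in other folds, with the category mean shrunk toward the overall mean. Several details are assumptions for illustration, not a specific library's method: the function name, the round-robin fold assignment (i % n_folds instead of a random shuffle, for simplicity), and the `smoothing` parameter.

```python
from collections import defaultdict


def target_encode_oof(categories, targets, n_folds=3, smoothing=1.0):
    """Out-of-fold mean-target encoding, smoothed toward the training folds' mean.

    Row i belongs to fold i % n_folds (a simplifying assumption; real pipelines
    usually shuffle). A row's own target never enters its encoding.
    Requires smoothing > 0 so unseen categories fall back to the prior.
    """
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        prior = sum(targets[i] for i in train_idx) / len(train_idx)
        sums, counts = defaultdict(float), defaultdict(int)
        for i in train_idx:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
        for i in range(fold, n, n_folds):  # rows held out in this fold
            c = categories[i]
            # Blend the category mean with the prior; unseen categories get the prior.
            encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded


cities = ["a", "a", "b", "b", "a", "b", "c"]  # hypothetical data
converted = [1, 0, 1, 1, 1, 0, 1]
print(target_encode_oof(cities, converted))
```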
Before applying logistic regression, how should you handle categorical predictor variables?
Leave them as raw text
Convert them to dummy variables
Bin them into equal-width intervals
Normalize them to a 0 - 1 range
Logistic regression requires numeric inputs, so categorical predictors must be converted to dummy (indicator) variables. This avoids imposing a numeric ordering or continuity on the categories. With proper encoding, the model learns a separate coefficient for each non-reference category.
Which assumption regarding expected frequencies must be met for the chi-square test of independence?
All expected cell counts should be at least 5
The sum of cell counts equals 1
A linear relationship between variables
Homoscedasticity across groups
For the chi-square test of independence, each expected frequency should typically be at least 5 so that the chi-square approximation is valid. A common relaxed version, Cochran's rule, allows up to 20% of cells below 5 as long as none falls below 1. If many cells have low expected counts, the test may be unreliable, and an exact test such as Fisher's or merging sparse categories may be required.
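Checking this rule of thumb is straightforward. The hypothetical sketch below computes the expected counts under independence and lists the cells that fall below a threshold of 5.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def low_expected_cells(table, threshold=5):
    """Return the expected counts that fall below the threshold."""
    return [e for row in expected_counts(table) for e in row if e < threshold]


small = [[3, 7], [6, 2]]  # hypothetical 2x2 table, n = 18
print(expected_counts(small))     # [[5.0, 5.0], [4.0, 4.0]]
print(low_expected_cells(small))  # [4.0, 4.0]: the rule of thumb is not met
```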
When is Fisher's exact test preferred over the chi-square test for categorical data?
When expected frequencies are low (e.g., <5)
For time-to-event (survival) data
For continuous variables
With very large sample sizes
Fisher's exact test is preferred when sample sizes are small or expected cell counts fall below 5, because the chi-square approximation becomes inaccurate there. Conditioning on the table's row and column totals, it calculates the exact probability of the observed table and of all tables at least as extreme. It becomes computationally intensive for large tables.
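For a 2×2 table, the exact test can be sketched with `math.comb` (Python 3.8+). With all margins fixed, the top-left count follows a hypergeometric distribution. The two-sided p-value below sums the probabilities of all tables no more likely than the observed one, which is one common two-sided convention. The function name is hypothetical, and SciPy's `scipy.stats.fisher_exact` is the usual library choice.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c  # fixed row totals and first column total
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, given the margins.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible values of the top-left cell
    # Sum over all tables no more probable than the observed one;
    # the small relative tolerance treats floating-point near-ties as ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


print(fisher_exact_2x2(3, 1, 1, 3))  # 34/70 ≈ 0.486
print(fisher_exact_2x2(4, 0, 0, 4))  # 2/70 ≈ 0.029
```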
Compared to dummy coding, which statement about effect coding is correct?
The intercept represents the overall mean of the outcome
It is only suitable for ordinal variables
It uses only 0 and 1 values for categories
The intercept equals the reference group mean
In effect coding, the reference category is coded -1 rather than 0, so the regression intercept represents the grand mean across categories. In a model with only this predictor, dummy (0/1) coding makes the intercept equal the mean of the omitted reference category. Effect coding lets each coefficient reflect a category's deviation from the overall mean.
What is a recommended approach for multiple imputation of missing categorical data?
Predictive mean matching
Deleting missing observations listwise
Using logistic regression or classification models
Mean substitution
Multiple imputation for categorical variables often uses logistic regression (binary or multinomial) to predict missing category labels. This approach preserves the categorical nature of the data and its relationships with other variables. Mean substitution is inappropriate for categories, and listwise deletion can bias results.
Which clustering algorithm is designed to handle both categorical and numerical variables simultaneously?
DBSCAN
Hierarchical clustering (Ward's method)
K-means
K-prototypes
The k-prototypes algorithm extends k-means by combining squared Euclidean distance on numeric attributes with a simple matching dissimilarity on categorical attributes. The matching part counts mismatches and is weighted by a parameter. This lets the algorithm cluster mixed data types directly. Standard k-means handles only numeric data because it relies on means and Euclidean distance.
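The mixed dissimilarity at the heart of k-prototypes can be sketched as below. This shows only the distance, not the full assign-and-update loop. The records, attribute indices, and `gamma` weight are hypothetical. In practice, the numeric attributes are usually standardized first so that neither part dominates.

```python
def mixed_dissimilarity(x, y, numeric_idx, categorical_idx, gamma=1.0):
    """k-prototypes-style dissimilarity between two records.

    Squared Euclidean distance over the numeric attributes plus gamma times
    the number of categorical attributes whose values do not match.
    """
    numeric = sum((x[i] - y[i]) ** 2 for i in numeric_idx)
    mismatches = sum(1 for i in categorical_idx if x[i] != y[i])
    return numeric + gamma * mismatches


customer = (34, 52.0, "urban", "email")   # hypothetical record: age, spend, area, channel
prototype = (30, 50.0, "urban", "phone")  # hypothetical cluster prototype
d = mixed_dissimilarity(customer, prototype, numeric_idx=[0, 1], categorical_idx=[2, 3], gamma=0.5)
print(d)  # 16 + 4 + 0.5 * 1 = 20.5
```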

Study Outcomes

  1. Identify Categorical Variables -

    Differentiate between nominal and ordinal data by recognizing the key characteristics of categorical variables.

  2. Interpret Distribution Patterns -

    Read and summarize frequency tables and bar charts to interpret the distribution of categorical data in quiz questions and applied scenarios.

  3. Analyze Variable Relationships -

    Examine associations between variables using contingency tables and chi-square concepts to answer challenging data analysis questions.

  4. Apply Analysis Techniques -

    Select and apply appropriate methods for categorical data analysis, including cross-tabulations and proportion tests.

  5. Evaluate Quiz Insights -

    Review your answers to understand strengths and areas for improvement, reinforcing mastery of categorical questions and enhancing future analyses.

Cheat Sheet

  1. Nominal vs Ordinal Variables -

    Distinguishing nominal (unordered categories like blood type) from ordinal data (ranked scales like survey Likert items) is essential when you face categorical questions. A simple mnemonic, "NO" (Nominal = Orderless, Ordinal = Ordered), helps you remember the difference for your categorical variables quiz. For depth, see UCLA's Statistical Consulting Group tutorials.

  2. Frequency Tables and Bar Charts -

    Frequency tables and bar charts are your go-to tools for summarizing a categorical variable. Calculating counts and proportions (e.g., table() in R or value_counts() in pandas) reveals the distribution at a glance. For more tips, consult the Data Visualization section at DataCamp or the University of Minnesota's Statistical Methods guide.

  3. Chi-Square Tests of Independence -

    Chi-square (χ²) tests assess whether two categorical variables are related by comparing observed (O) and expected (E) counts via χ² = Σ((O−E)²/E), where each expected count is (row total × column total)/n. Remember the CARE mnemonic (Compare Actual vs Real Expected) to recall the formula during a categorical data analysis quiz. The American Statistical Association provides clear guidelines on assumptions and interpretation.

  4. Measures of Association (Phi & Cramér's V) -

    After finding significance, quantify the association using Phi for 2×2 tables (φ = √(χ²/n)) or Cramér's V for larger tables: V = √(χ²/(n(k−1))), where n is the total count and k is the smaller of the number of rows and columns. This step answers "how strong" in categorical questions and data analysis questions. Check IBM SPSS or the Statistical Analysis Handbook for worked examples.

  5. Logistic Regression Basics -

    Use logistic regression to model binary outcomes (e.g., pass/fail) by linking the probability p of the outcome to the predictors via the logit: log(p/(1−p)) = β₀ + β₁X. Interpreting exponentiated coefficients (e^β) as odds ratios is key on a categorical data analysis quiz. UCLA's IDRE resource offers step-by-step examples to build confidence.
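Here is a small numeric check of the odds-ratio interpretation. It uses hypothetical coefficients b0 = -3.0 and b1 = 0.8 for a made-up predictor, hours studied. Raising the predictor by one unit multiplies the odds p/(1 − p) by e^b1, whatever the starting value.

```python
import math


def inverse_logit(eta):
    """Map the linear predictor b0 + b1 * x to a probability p."""
    return 1 / (1 + math.exp(-eta))


def odds(p):
    return p / (1 - p)


b0, b1 = -3.0, 0.8  # hypothetical fitted coefficients

p4 = inverse_logit(b0 + b1 * 4)  # predicted pass probability after 4 hours
p5 = inverse_logit(b0 + b1 * 5)  # ... and after 5 hours

# The ratio of the odds equals exp(b1); both print ≈ 2.2255.
print(odds(p5) / odds(p4))
print(math.exp(b1))
```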

Powered by: Quiz Maker