Exam II

All of the following are true about Robust OLS except?
Can be applied to a broad range of regression models, including both linear (OLS) and non-linear (e.g., logistic) regression models
Can be used to find undetected specification errors in the tests of the OLS assumptions
Therefore, can be used even if a violation of OLS assumption is not detected, in order to enhance the statistical rigor of the analysis
One of the statistical analysis techniques to perform a linear regression
True or False: Generalized Least Squares (GLS) is one of the statistical analysis techniques to perform a linear regression. It is used when significant violations of the OLS assumptions are found (actually, OLS is a special case of GLS that does not have the problems)
True
False
True or False: Strengths and advantages of OLS with robust standard errors: it can be used when OLS assumptions are violated, but it cannot be used when there are no signs of violation
True
False
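As a rough sketch of what "robust standard errors" change, the snippet below fits a simple one-predictor regression by hand and compares the conventional standard error of the slope with a heteroskedasticity-robust (HC0, White-style) one. The data and the function name `slope_with_standard_errors` are made up for illustration and are not part of the quiz.

```python
import math

def slope_with_standard_errors(x, y):
    """Fit y = a + b*x by OLS; return (b, classic_se, robust_se)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Conventional SE: assumes constant error variance (homoskedasticity)
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)
    classic_se = math.sqrt(sigma2 / sxx)
    # HC0 robust SE: weights each squared residual by its own leverage term
    robust_var = sum(((xi - x_bar) ** 2) * (e ** 2)
                     for xi, e in zip(x, residuals)) / sxx ** 2
    robust_se = math.sqrt(robust_var)
    return b, classic_se, robust_se

# Hypothetical data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.7, 5.9, 5.2]
slope, classic_se, robust_se = slope_with_standard_errors(x, y)
print(f"slope={slope:.4f} classic_se={classic_se:.4f} robust_se={robust_se:.4f}")
```

The slope estimate is the same under both approaches; only the standard error, and therefore the t-value and p-value, changes.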
Logistic regression has a dependent variable that is:
Categorical
Continuous
Linear regression has a dependent variable that is:
Categorical
Continuous
True or False: Logistic regression is a type of linear regression analysis technique that has a dichotomous variable (e.g., 1 or 0) as a dependent variable
True
False
True or False: Logistic regression is also subject to violations of the same assumptions that apply to OLS (cf. if one of the violations is detected, logistic regression with robust standard errors should be used)
True
False
Which of the following are examples of business questions for logistic regression?
What factors are associated with the offer acceptance of credit card customers?
What factors are associated with daily downloads?
What factors are related with the employees who moved to another company in a year?
What factors are associated with the companies that have experienced a hacking incident?
All of the following are conditions for ANOVA except?
A statistical technique used to compare the means of two or more groups of observations to evaluate the impact of a treatment
Requiring a continuous dependent variable and a discrete independent variable in the analysis
Can have more than one independent variable
Generally used in science experiments since other conditions should be same, except one condition, called treatment
Should be carefully used in the context of business since it is almost impossible to have such a controlled environment
Which of the following could be questions to analyze with ANOVA?
Do accountants, on average, earn more than teachers?
Do the owners of credit card A spend more than those of credit card B?
Do online game users who experienced account theft spend less than those who did not?
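To make the idea of comparing group means concrete, here is a minimal sketch of the one-way ANOVA F statistic computed by hand. The spending figures and the function name `one_way_f` are hypothetical and serve only as an illustration.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (treatment levels)
    n = len(all_values)    # total number of observations
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical monthly spending for owners of two credit cards
card_a = [120.0, 135.0, 150.0, 110.0]
card_b = [90.0, 100.0, 95.0, 105.0]
f_stat = one_way_f([card_a, card_b])
print(f"F = {f_stat:.2f}")
```

A large F value suggests the group means differ by more than the within-group noise would explain; judging significance still requires comparing it against the F distribution with (k - 1, n - k) degrees of freedom.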
Which of the following is a goodness-of-fit measure of the model?
Pvalue
Rsquared
Coefficient
Tvalue
Which of the following determines the statistical significance of each independent variable?
Pvalue
Rsquared
Coefficient
Tvalue
Which of the following determines the strength of relationship with the dependent variable?
Pvalue
Rsquared
Coefficient
Tvalue
Which of the following determines the relative strength of relationship with the dependent variable?
Pvalue
Rsquared
Coefficient
Tvalue
True or False: If a Rsquared value is 0.36 this means that 36% of the variance of the dependent variable is explained by the independent variables in the model.
True
False
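As a small illustration of what "variance explained" means, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The numbers below are made up, and `r_squared` is a hypothetical helper name.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` accounted for by `predicted`."""
    mean_y = sum(actual) / len(actual)
    # Unexplained variation: squared gaps between actual and predicted
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Total variation: squared gaps between actual and its mean
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Hypothetical observed values and model predictions
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]
r2 = r_squared(actual, predicted)
print(f"R-squared = {r2:.2f}")
```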
True or False: 0.1 means that the independent variable is statistically significantly related to the dependent variable
True
False
If you had a coefficient of 0.1 this means that when the independent variable increases by one unit, the dependent variable differs by 0.1, assuming all other independent variables remain constant.
True
False
When looking at a linear regression model, the only time that the intercept is meaningful is when all the variables in the model can have a value of zero.
True
False
Which of the following is the correct interpretation: daily_download = 22.949181 - 0.031739*age - 0.008061*rank
When age increases by one unit, daily downloads differs by -0.031739 assuming all other variables remain constant
When age increases by one unit, daily downloads differs by 0.031739 assuming all other independent variables remain constant
When age increases by one unit, daily downloads differs by -0.031739.
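One way to check an interpretation of this equation is to evaluate it twice, changing only age by one unit while holding rank fixed. The age and rank values below are arbitrary, and the function name is a hypothetical helper.

```python
def predicted_daily_download(age, rank):
    """Evaluate the fitted equation quoted in the question."""
    return 22.949181 - 0.031739 * age - 0.008061 * rank

# Arbitrary illustrative inputs; only age changes between the two calls
base = predicted_daily_download(age=10, rank=50)
older = predicted_daily_download(age=11, rank=50)
difference = older - base
print(f"change in predicted daily downloads: {difference:.6f}")
```

The difference equals the age coefficient, sign included, and it holds only because rank was kept constant.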
Which of the following is the correct interpretation: average_income = 31.45 + 0.031739*age + 156.789214*gender
When gender is 1 (1=female) average income differs by 156.789214 when all other independent variables remain constant
When gender increases by one unit, average income increases by 156.789214 assuming all independent variables remain constant
When gender is 1 average income varies by 156.789214.
Which of the following is the correct interpretation: everyone_dummy = -6.8572 + 0.0977*price + 1.4940*rating + 0.00216*rank
When rank differs by one unit, the log odds of “everyone_dummy” differs by 0.00216
When rank differs by one unit, everyone dummy differs by 0.00216 assuming all other independent variables remain constant
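Assuming the equation above comes from a logistic regression, so that its right-hand side is the log odds of everyone_dummy being 1, the sketch below shows how a one-unit change in rank moves the log odds, the odds, and the probability. The input values and the helper names `log_odds` and `probability` are hypothetical.

```python
import math

def log_odds(price, rating, rank):
    """Right-hand side of the fitted equation, read as log odds."""
    return -6.8572 + 0.0977 * price + 1.4940 * rating + 0.00216 * rank

def probability(z):
    """Convert log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary illustrative inputs; only rank changes between the two calls
z0 = log_odds(price=0.99, rating=4.0, rank=100)
z1 = log_odds(price=0.99, rating=4.0, rank=101)
log_odds_change = z1 - z0               # equals the rank coefficient
odds_ratio = math.exp(log_odds_change)  # multiplicative change in the odds
print(f"log-odds change: {log_odds_change:.5f}")
print(f"odds ratio: {odds_ratio:.5f}")
print(f"probability moves from {probability(z0):.4f} to {probability(z1):.4f}")
```

The coefficient is constant on the log-odds scale, but the resulting change in probability depends on the starting values of the other variables.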
If A happened as a result of B: without B, A would not have happened; if B happened, A inevitably happened
Causality
Correlation
If A happened when B happened
Causality
Correlation
True or False: One of the factors that should be considered in the interpretation is endogeneity. In particular, endogeneity should be checked when translating association to causality
True
False
True or False: business data analysts need to avoid using such interpretations as “A impacts on B”, “A causes B”, and “A induces B” that imply a causality between variables
True
False
True or False: Instead, it would be better to use “A is associated with B”, “A is related to B”, and “A is one of the relevant factors to B”
True
False
True or False: When there are clear, multiple pieces of prior evidence that support such a causality, analysts may carefully argue it (e.g., an event study concerning the impact of hacking incidents on stock price)
True
False
What are the three basic assumptions in applying statistical results to business?
Association can be interpreted as causality
Association cannot be interpreted as causality
The variables in the analysis are all the important factors that explain the case
The analysis results can accurately predict the future in a similar circumstance
Which of the following are basic rules for making an effective table chart?
Follow (strictly) a standard table format of your organization
Use a variety of font styles
Align texts in rows and columns in a consistent way
Work directly with the documentation software that will be used for reporting
Alignment doesn't really matter
Highlight important figures that should be emphasized
Locate the tables properly within a page
Which of the following is not true concerning using visual elements?
Using images such as icons and pictures to emphasize or highlight the information in charts
Attract readers’ attention
Avoiding the use of too many complicated visual elements that distract readers (cf. it may not be a good idea to use visual elements for conservative readers)
All of the above are true
Value and usages of diagrams in analytics reports: they are used for visualizing ideas, conclusions, or summaries derived from analysis results; they should be simple, straightforward, and insightful; MS PowerPoint offers basic forms of diagrams
True
False
Which of the following are guidelines for making an effective diagram?
Use different sizes and colors of shapes to present the relative magnitude of numeric features
Presenting a story from multiple analysis results in a diagram
Avoid using online copyright-free pictures
Make diagrams readable
Use free PowerPoint Diagrams
Drawing your own diagrams despite the time it takes
What type of learning can be defined as: a predefined target variable (i.e., dependent variable); used for prediction and classification; e.g., OLS, GLS, logistic regression, ANOVA, decision tree, etc.?
Supervised
Unsupervised
What type of learning can be defined as: no predefined target variable to predict or classify; e.g., cluster segmentation, market basket analysis, principal component analysis (PCA), etc.?
Supervised
Unsupervised
What type of dataset do you use to build the initial model?
Test data
Train data
Validation data
What type of dataset is used to assess how well the model predicts?
Test data
Train data
Validation data
What type of dataset is used to reevaluate the effectiveness of the model when applied to unused data?
Test data
Train data
Validation data
Which data type is optional?
Test data
Train data
Validation data
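For readers who want to see the three datasets side by side, here is a minimal sketch of splitting one dataset into train, validation, and test portions. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration, not something the quiz prescribes.

```python
import random

def split_dataset(rows, train_frac=0.6, validation_frac=0.2, seed=42):
    """Shuffle rows and cut them into train, validation, and test lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]   # whatever remains
    return train, validation, test

# Hypothetical dataset of 100 row identifiers
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Each row ends up in exactly one portion, so no observation used to build the model is reused to evaluate it.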
What type of tree is used when the dependent variable is categorical, similar to logistic regression?
Classification tree
Regression tree
What type of tree is used when the dependent variable is continuous, similar to linear regression?
Classification tree
Regression tree
Three rules for deciding the optimal size of CART are accuracy, stability, and simplicity
True
False
Classification tree: misclassification rate; regression tree: root mean square error (RMSE)
Accuracy
Stability
Simplicity
Model behaves similarly when applied to different datasets (e.g., difference between the results from training & validation datasets)
Accuracy
Stability
Simplicity
Number of leaf nodes and rules (the fewer, the better)
Accuracy
Stability
Simplicity
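The two error measures named for classification and regression trees can be sketched in a few lines. The labels, values, and helper names below are hypothetical and only illustrate the arithmetic.

```python
import math

def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class is wrong."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Hypothetical classification tree output: two of four labels are wrong
error_rate = misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0])
# Hypothetical regression tree output
tree_rmse = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(f"misclassification rate = {error_rate:.2f}, RMSE = {tree_rmse:.3f}")
```

Computing the same measure on both the training and validation datasets and comparing the two values is one simple way to look at how similarly the model behaves across datasets.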
 