Understand the problems associated with inferential modelling of high-dimensional, wide data, including inappropriate covariate selection, coefficient inflation and overfitting.
Content / structure
Outline why conventional methods of model selection (e.g. a univariable filter followed by stepwise selection) perform poorly with high-dimensional data.
Describe the principles and issues of overfitting, including when and why it is likely to occur and how to detect it.
Describe the impact of overfitting on inference.
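A minimal simulated illustration of the overfitting problem described above (an illustrative sketch, not course material): when many candidate covariates are screened and none is truly associated with the outcome, ordinary least squares will still flag some covariates as "significant" and report an inflated in-sample fit.

```r
## Illustrative sketch: spurious "significance" with many noise covariates.
set.seed(1)
n <- 50; p <- 40
x <- matrix(rnorm(n * p), n, p)   # pure noise covariates
y <- rnorm(n)                     # outcome unrelated to every covariate

fit <- lm(y ~ x)
pvals <- summary(fit)$coefficients[-1, 4]   # drop the intercept row
sum(pvals < 0.05)                 # count of spuriously "significant" covariates

## In-sample R-squared is also badly inflated above its true value of 0
summary(fit)$r.squared
```

Repeating this with a univariable filter followed by stepwise selection produces the same pathology: the selected model looks convincing in-sample while containing no genuine signal.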
Understand different approaches to overcoming problems when modelling high-dimensional, wide data, including regularisation and exploiting the bias-variance trade-off. Understand the strengths and weaknesses of different approaches.
Content / structure
Outline the principle of the bias-variance trade-off and how it can help to overcome overfitting when the number of covariates (p) is greater than the number of observations (n)
Describe the principles of regularisation and discuss details of methods including the lasso, elastic net, MCP and the modified Bayesian information criterion
Describe advantages and disadvantages of different approaches
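The bias-variance point above can be seen in a short sketch (simulated data; the glmnet package is assumed here, as it is the standard R implementation of these penalties, though the course's own code may differ): when p > n, OLS is not even identifiable, but a penalised fit accepts a little bias in exchange for a large reduction in variance and remains estimable.

```r
## Sketch (assumes the glmnet package is installed): ridge regression with p > n.
library(glmnet)
set.seed(1)
n <- 40; p <- 100                        # more covariates than observations
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))      # only 5 true signals
y <- x %*% beta + rnorm(n)

cv <- cv.glmnet(x, y, alpha = 0)         # alpha = 0 gives the ridge (L2) penalty
coef(cv, s = "lambda.min")[2:6]          # true signals: shrunk towards zero, but estimable
```

Setting alpha = 1 instead gives the lasso (L1) penalty, which shrinks some coefficients exactly to zero and so performs covariate selection as well as shrinkage.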
Learn to implement basic regularisation techniques on high-dimensional data, including the elastic net, minimax concave penalty (MCP) and modified BIC, using R.
Content / structure
Methods of regularised regression will be presented as examples, and delegates will be able to fit models on their own laptops using the data and code provided.
Model interpretation and assessment of fit will be discussed.
Comparison of results between regularised methods and conventional AIC-based model selection will be discussed.
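A sketch of the kinds of fits this module covers (simulated data; the glmnet and ncvreg packages are assumed, as they are the usual R implementations of the elastic net and MCP — the course's own datasets and code will differ):

```r
## Sketch (assumes glmnet and ncvreg are installed): elastic net and MCP fits.
library(glmnet)    # elastic net
library(ncvreg)    # minimax concave penalty (MCP)
set.seed(1)
n <- 100; p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1:4] %*% rep(1.5, 4) + rnorm(n)   # 4 true signals among 200 covariates

enet <- cv.glmnet(x, y, alpha = 0.5)        # alpha in (0,1) mixes L1 and L2 penalties
mcp  <- cv.ncvreg(x, y, penalty = "MCP")    # tuning by cross-validation

## Covariates retained by each method at the cross-validated penalty
sel_enet <- which(coef(enet, s = "lambda.min")[-1] != 0)
sel_mcp  <- which(coef(mcp)[-1] != 0)
sel_enet
sel_mcp
```

Conventional AIC-based selection (e.g. stepwise search) can then be run on the same data for comparison; with p of this order it typically retains many more noise covariates than the penalised fits.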
Understand the concept of covariate selection stability and learn how to implement it using regularised models in R.
Content / structure
Define covariate selection stability in regression models
Outline the theoretical principles of selection stability and describe its usefulness for robust inference.
Describe methods to illustrate selection stability graphically.
Practical implementation: delegates will have the opportunity to evaluate selection stability using simulated datasets with known underlying causal variables (including random-effects models) using code provided in R.
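One common way to assess selection stability, sketched here with the lasso on simulated fixed-effects data (an illustrative assumption — the course's own code, including the random-effects examples, may take a different approach): refit the model on bootstrap resamples and record how often each covariate is selected. Stable, genuinely causal covariates should be selected in most resamples, and the selection frequencies can be displayed graphically.

```r
## Sketch (assumes glmnet is installed): bootstrap selection frequencies.
library(glmnet)
set.seed(1)
n <- 100; p <- 50; B <- 50               # B bootstrap resamples
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1:3] %*% rep(2, 3) + rnorm(n)   # 3 true signals

sel <- matrix(0, B, p)
for (b in 1:B) {
  i <- sample(n, replace = TRUE)                         # bootstrap resample
  fit <- cv.glmnet(x[i, ], y[i])                         # lasso, CV-tuned
  sel[b, ] <- as.numeric(coef(fit, s = "lambda.min")[-1] != 0)
}

freq <- colMeans(sel)                    # per-covariate selection frequency
head(sort(freq, decreasing = TRUE))      # stable signals sit near 1

## A simple graphical display of selection stability
barplot(freq, xlab = "Covariate", ylab = "Selection frequency")
```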
Understand the concept and importance of multiple-method triangulation and learn how to implement it using R.
Content / structure
Outline the principles of multiple-method triangulation and its relevance to reproducible science.
Demonstrate the concept by triangulating results from a variety of regularised regression methods.
Practical implementation: delegates will have the opportunity to conduct multiple-method triangulation using code provided in R.
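The triangulation idea can be sketched on simulated data (glmnet and ncvreg assumed, as in the earlier modules; the course's own implementation may differ): fit several regularised methods to the same data, compare the covariates each selects, and place most confidence in those on which the methods agree.

```r
## Sketch (assumes glmnet and ncvreg are installed): multiple-method triangulation.
library(glmnet)
library(ncvreg)
set.seed(1)
n <- 150; p <- 60
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1:3] %*% rep(2, 3) + rnorm(n)   # 3 true signals

## Helper: indices of covariates with non-zero coefficients (intercept dropped)
pick <- function(coefs) which(coefs[-1] != 0)

lasso <- pick(coef(cv.glmnet(x, y, alpha = 1),   s = "lambda.min"))
enet  <- pick(coef(cv.glmnet(x, y, alpha = 0.5), s = "lambda.min"))
mcp   <- pick(coef(cv.ncvreg(x, y, penalty = "MCP")))

## Covariates selected by all three methods
agreed <- Reduce(intersect, list(lasso, enet, mcp))
agreed
```

Because the lasso, elastic net and MCP penalise coefficients in different ways, agreement across all three is stronger evidence for a covariate than selection by any single method, which is the sense in which triangulation supports reproducible inference.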