Nima Hejazi & Jeremy Coyle -- Machine Learning Pipelines for R with sl3
About Nima and Jeremy
Nima is a PhD student in the Group in Biostatistics, where he is jointly supervised by Mark van der Laan and Alan Hubbard. Nima is also affiliated with the UC Berkeley NIH Biomedical Big Data training program and the Center for Computational Biology. His research centers on nonparametric statistical and causal inference, machine learning, and statistical computing. He focuses on developing robust techniques for inference and estimation across an eclectic collection of problem settings, with applications often arising in precision medicine, vaccine efficacy trials, computational biology, and public policy.
Jeremy is a recent PhD graduate in Biostatistics who continues to work with the department to translate statistical theory into software. During his PhD studies, Jeremy worked with Alan Hubbard and Mark van der Laan on a series of projects broadly related to computational statistics, including more efficient cross-validation routines for ensemble machine learning and a software framework for cross-validation (origami). His current research interests include causal inference, model selection, resampling techniques, statistical software development, and statistical methods for assessing time series data from sensor systems.
Machine Learning Pipelines for R with sl3
We present sl3, a recently developed software package for the R language and environment for statistical computing, designed to provide utilities for a host of common machine learning tasks. Topics to be addressed include efficient data organization and access, the construction of pipelines for data munging and analysis (based on the idea popularized by Python's scikit-learn), and methods for ensemble machine learning (e.g., optimal stacked regressions). sl3 is a core part of the tlverse, a new ecosystem of software packages currently being developed by a team in the Group in Biostatistics here at Berkeley.
Selected materials for this presentation are available on GitHub.
Software Setup
R and RStudio Installation
Jupyter R Kernel Installation
sl3 Installation
library("devtools")
# install the development branch of sl3 from GitHub
devtools::install_github("tlverse/sl3@devel")
devtools installation (if needed)
install.packages("devtools")