I am a data scientist interested in machine learning and deep learning.
This page is a portfolio of code, data analysis, and other output.
neighbr - Classification, regression, and clustering with the k-nearest neighbors algorithm. Implements several distance and similarity measures, covering continuous and logical features. An introduction can be found here.
amelie - Anomaly detection as binary classification for cross-sectional data. Uses maximum likelihood estimates and normal probability density functions to classify observations. An introduction can be found here.
I also maintain pmml and pmmlTransformations.
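The k-nearest neighbors idea behind neighbr can be illustrated with a minimal Python sketch. It is not the package's R interface: the function names `knn_classify`, `euclidean_distance`, and `jaccard_distance` are hypothetical, and it shows only majority-vote classification with one distance for continuous features and one for logical (binary) features.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary (0/1 or True/False) vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def knn_classify(train_x, train_y, query, k=3, distance=euclidean_distance):
    """Predict the label of `query` by majority vote among its k nearest training rows."""
    # Distance from the query to every training row, paired with that row's label.
    scored = [(distance(x, query), label) for x, label in zip(train_x, train_y)]
    # Keep the k closest rows; sorting only on distance avoids comparing labels.
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    # most_common breaks count ties by first occurrence, i.e. the label seen closest first.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train_x = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.5, 4.5)]
    train_y = ["a", "a", "b", "b"]
    print(knn_classify(train_x, train_y, (1.1, 0.9), k=3))
```

Passing `distance=jaccard_distance` switches the same voting logic to binary features; regression would average the neighbors' target values instead of voting.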
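The technique amelie's description names, fitting normal densities by maximum likelihood and flagging low-density observations, can be sketched in Python as follows. This is a simplified illustration, not amelie's R API. It assumes independent features with nonzero variance, and the names `fit_normal_params`, `density`, `is_anomaly`, and `choose_epsilon` are hypothetical. Treating detection as binary classification means the threshold `epsilon` can be picked on labeled validation data. Maximizing F1, as done here, is one common way to do that.

```python
import math


def fit_normal_params(rows):
    """Maximum likelihood mean and variance for each feature (column)."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # The MLE of the variance divides by n, not n - 1.
    variances = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]
    return means, variances


def density(x, means, variances):
    """Product of univariate normal densities, assuming independent features."""
    p = 1.0
    for v, m, s2 in zip(x, means, variances):
        p *= math.exp(-((v - m) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p


def is_anomaly(x, means, variances, epsilon):
    """Classify x as anomalous (True) when its density is at or below epsilon."""
    return density(x, means, variances) <= epsilon


def choose_epsilon(densities, labels):
    """Return the threshold with the best F1 on labeled validation data (label 1 = anomaly)."""
    best_eps, best_f1 = None, -1.0
    for eps in sorted(set(densities)):
        preds = [d <= eps for d in densities]
        tp = sum(1 for p, y in zip(preds, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if not p and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) without any true positives
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps
```

In this sketch, parameters are fitted on mostly normal training rows, densities are computed for a labeled validation set, and `choose_epsilon` picks the cutoff that `is_anomaly` then applies to new observations.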
Motion Recognition - Classifying human motion from accelerometer and gyroscope data with a random forest model in R.
Severe Weather - Using R and the NOAA database to find which weather events across the U.S. have the greatest economic and health impact.
Dog breed classifier - Building convolutional neural network image classifiers in Keras, including with transfer learning. The final model is applied to pictures of dogs as well as humans. Project for the Udacity AI Nanodegree.
Time series prediction and text generation with RNN - Using RNNs to predict stock prices from a time series and to generate text one character at a time. Project for the Udacity AI Nanodegree.
Sparse autoencoder - Extracting image features with a basic sparse autoencoder in Python.
Conference Party - An application for organizing conferences, written in Java on Google App Engine. The backend was built as the course project for Developing Scalable Apps.
LetterFreq! - Shiny app that counts and plots character frequencies of input text. More information is in this short presentation.
Sorting benchmarks - Timing a few sorting algorithms in a Jupyter notebook.
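The kind of timing the sorting benchmarks involve can be sketched in plain Python. This is an illustration, not a reproduction of the notebook: the choice of insertion sort and merge sort, the input size, and the repeat count are assumptions. Each function is timed on the same random list with `timeit.repeat`, and the best run is kept to reduce noise.

```python
import random
import timeit


def insertion_sort(values):
    """Return a new sorted list using insertion sort (O(n^2) comparisons in the worst case)."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot right until key's position is found.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def merge_sort(values):
    """Return a new sorted list using top-down merge sort (O(n log n))."""
    values = list(values)
    if len(values) <= 1:
        return values
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    # Repeatedly take the smaller head element; <= keeps the sort stable.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def benchmark(n=2000, repeats=5, seed=0):
    """Best-of-`repeats` wall time in seconds for each sort on the same random input."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    sorts = {"insertion_sort": insertion_sort, "merge_sort": merge_sort, "sorted": sorted}
    # number=1 times one full sort per repeat; the default argument binds each function.
    return {
        name: min(timeit.repeat(lambda f=fn: f(data), number=1, repeat=repeats))
        for name, fn in sorts.items()
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>15}: {seconds * 1000:.2f} ms")
```

Exact timings depend on the machine, but the built-in `sorted` (Timsort, implemented in C) would be expected to beat both pure-Python implementations by a wide margin.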
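LetterFreq itself is a Shiny app written in R. The counting step it describes can be sketched in Python with `collections.Counter`. The description does not say whether the app ignores case or non-letter characters, so the `letters_only` option is an assumption, and the text bar chart stands in for the app's plot. The function names are hypothetical.

```python
from collections import Counter


def char_frequencies(text, letters_only=True):
    """Count characters in text; optionally keep only alphabetic characters, lowercased."""
    if letters_only:
        chars = (c.lower() for c in text if c.isalpha())
    else:
        chars = iter(text)
    return Counter(chars)


def text_bar_chart(counts, width=40):
    """Render counts as a simple text bar chart, most frequent character first."""
    if not counts:
        return ""
    top = max(counts.values())
    lines = []
    for ch, n in counts.most_common():
        # Scale each bar relative to the most frequent character, at least one mark wide.
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{ch!r} {bar} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_bar_chart(char_frequencies("Hello, World!")))
```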