About

Who Am I?

Hi, I'm Xavier Genelin. I'm a data scientist with 5+ years of experience working with data, 3 of them in data science and machine learning. I have an educational background in mathematics, economics, and statistics. I received my bachelor's degree in mathematics from Xavier University in 2018 and my master's degree in statistics from North Carolina State University in 2022.

I have experience working with Python, R, and SQL to solve problems and support data-driven decision making. I love the variety of problems I can solve and the insights I can uncover in different datasets. My employment experience has led me to work with data in supply chain, marketing, manufacturing, HR, and other areas of business. I am open to remote Data Scientist and Machine Learning Engineer positions in a variety of industries.

Outside of work, I am usually busy with sports. I coach high school football in Arcadia, Wisconsin, and AAU basketball in the same area. This past year, I helped coach six different age groups of both boys and girls. My side-project interests include sports data (mostly football and basketball), music, and video games.

Experience

Work Experience

Data Scientist Freelance Research — July 2022-June 2024

  • Consulted with a medical team to examine rare skin cancer data from the SEER registry
  • Collaborated using GitHub for version control and code sharing throughout the project
  • Assisted the team with data collection and with applying appropriate statistical and machine learning techniques
  • Built a linear regression model in R to identify variables significantly associated with skin cancer incidence, and wrote up the results for the final publication
  • Work published in the Journal of the American Academy of Dermatology in June 2024

Data Scientist Mattress Firm — November 2021-July 2022

  • Built customer segments from demographic data in Python, using scikit-learn unsupervised machine learning models to analyze customer habits and identify marketing opportunities for new products
  • Utilized SQL for data extraction, manipulation, and analysis to support data-driven decision making and machine learning models
  • Supported multiple stakeholders across private brand teams in developing data-driven marketing strategies powered by customer analytics and statistical models
  • Analyzed customer survey data with NLP in Python (NLTK) to compare feedback across customer groups and identify opportunities to improve product delivery and product satisfaction
  • Created an XGBoost model that classified customers by previous transaction habits with 82% accuracy to aid customer analysis
  • Analyzed the impact of economic stimulus packages on sales and determined that sales increased
  • Used GitHub for version control and collaboration on data science projects, and implemented CI/CD pipelines with automated testing to ensure code quality and streamline deployment
  • Conducted A/B testing of different media mix models for the new product and private brand teams
  • Communicated results to stakeholders, from managers to the C-suite, for both technical and non-technical audiences

Quantitative Analyst NC State Baseball — March 2021-June 2022

  • Built a report in R and Tableau analyzing NC State pitchers to help optimize their performance
  • Advised the coaching staff based on findings from the analysis and assisted with game strategy
  • Team advanced to the 2021 College World Series semifinals and posted a .632 win percentage in 2022

Data Analyst/BI Analyst November 2019-November 2021

  • Automated manual processes by writing SQL queries for data extraction, saving 45 hours per week
  • Designed an app in Python to optimize the process of diverting shipping containers, saving 8 hours per week
  • Conducted a statistical analysis in R of new product sales and advertising spend using a machine learning model; determined that ad spending had no impact on sales, saving $300,000
  • Developed and maintained 17 Power BI dashboards for Supply Chain, Manufacturing, and HR to support business decisions

Education

Education

Master of Science, Statistics

North Carolina State University — May 2022
  • Clubs: AI, Sports Analytics
  • Relevant Courses: Deep Learning with Neural Networks, Statistical Learning, Fundamentals of Regression, Analysis of Big Data, Data Science for Statistics, Fundamentals of Statistical Inference (I and II), Applied Statistical Methods (I and II)

Bachelor of Science, Mathematics

Xavier University — May 2018
  • Concentration in Economics
  • Clubs: Math Club, Economics Association, Habitat for Humanity, Relay for Life
  • Relevant courses: Econometrics, Probability Theory, Public Economics, Discrete Mathematics, Differential Equations

What I Do

Some areas of expertise

Machine Learning

Classification, Regression, Clustering, Deep Learning, NLP, Recommender Systems

Programming

Python, R, PyTorch, pandas, NLTK, Tidyverse, NumPy, scikit-learn, PySpark

Tools

Power BI, Tableau, R Shiny, GitHub, Jupyter Notebook, RStudio, Google Colab

Data Engineering

SQL, T-SQL, MySQL, BigQuery

Cloud Technologies

GCP, Docker

Other

Research, Presentation, Communication, Critical Thinking, Problem Solving

My Projects

Recent Work

Terrain Identification

Use deep learning techniques to identify terrain types for a prosthetic limb based on accelerometer and gyroscope readings.
  • PyTorch
  • Deep Learning
  • CNN

Emotion Detection

Use deep learning techniques to classify the emotion behind a conversation.
  • PyTorch
  • Deep Learning
  • CNN
  • NLP

NFL Win Prediction

An R Shiny app that lets users explore NFL game data and build models with selected parameters to predict winners for selected weeks.
  • R
  • R Shiny
  • Classification

Mobile Game User Retention A/B Testing

This project runs an A/B test on user retention in the mobile game Cookie Cats using three methods: bootstrapping, a chi-squared test, and a Bayesian test.
  • A/B Testing
  • Python

Music Recommendation System with ALS

This project creates a music recommendation system using Alternating Least Squares that suggests music artists to a user based on their listening history through collaborative filtering.
  • Recommendation System
  • PySpark
  • Python

Alzheimer's Risk Factors

Explore MRI scan data of demented and non-demented individuals to identify risk factors associated with the onset of Alzheimer's dementia.
  • PySpark
  • Python
  • Classification

Motorcycle Sales

Explore motorcycle sales data and implement both grid search and gradient descent to predict the selling price with the lowest variance.
  • Python
  • pandas
  • matplotlib

Data Channel Prediction

Analyze an online news popularity dataset to produce a report for each of 6 data channels and predict the number of shares.
  • R
  • Regression

Covid API

A vignette that shows how to retrieve data from the Covid API for exploratory analysis.
  • R
  • Tidyverse
  • API

Red Wine Quality

Analyze the quality of red wine using various regression and classification models.
  • R
  • Classification
  • Regression

Big Data Project

This project provides a synopsis of Netflix's big data pipeline, investigates data in a sample database with SQL, and summarizes NFL data using the pandas API on Spark.
  • PySpark
  • SQL
  • pandas

Get in Touch

Contact

Interested in working together or have a role that I may be a fit for? Feel free to reach out!

xgenelin@gmail.com

(507) 459-0673