NBA Data Analysis
Applied data analysis and visualization techniques to display NBA player statistics in a web application, using Heroku and Python libraries
Ph.D. from NC State University
Automated a data warehouse to store cannabis product data and visualize product analyses using MySQL, Mode Analytics, Tableau, and Python libraries
Implemented machine learning algorithms (Logistic Regression, Decision Tree, Neural Networks, and Gradient Boosting) to detect credit card fraud using Python libraries (numpy, pandas, and scikit-learn)
Forecasted monthly sales with a Long Short-Term Memory (LSTM) network using Python libraries (keras and scikit-learn)
Built a chatbot that uses deep learning and natural language processing to interact with customers through a graphical chat interface, using Python libraries (keras, numpy, nltk, and tkinter)
Created a robust database system using SQL with a command-line user interface for information storage and retrieval
Implemented a decision tree algorithm in R, using the Gini index and information gain as split criteria, to predict outcomes
Applied predictive modeling (decision tree) in R to improve the odds of making a profit on small bet lines
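The two split criteria named in the decision tree projects above, the Gini index and information gain, can be illustrated with a short sketch. This is a minimal Python version written for illustration only; the original work was done in R, and the function names and toy labels here are hypothetical.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical toy labels: a split that separates the two classes perfectly.
parent = ["win", "win", "loss", "loss"]
left, right = ["win", "win"], ["loss", "loss"]

parent_gini = gini(parent)
gain = information_gain(parent, left, right)
```

A decision tree would evaluate candidate splits this way and keep the one with the lowest weighted impurity (or the highest gain). The sketch assumes every list of labels is non-empty.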
Recent Research: Applying Machine Learning Methods for Insight into Textile Recycling Behavior
Abstract:
The purpose of this study was to compare the performance of supervised machine learning models in determining the critical factors for textile recycling behavior (recycle textiles or do not recycle textiles). Secondary data from a survey given to 1,054 participants were analyzed. Six parameters were varied: feature scaling, cross-validation techniques, sampling techniques, number of folds, hyperparameters, and feature importance. Five algorithms were compared: decision tree, linear support vector classifier (linear SVC), K-nearest neighbor (KNN), gradient boosting decision trees (GBDT), and random forest. The hyperparameters tuned were the impurity measure for decision tree and random forest, the number of nearest neighbors for KNN, and the learning rate for GBDT. The best-performing model based on the F1 score was random forest on oversampled data. Feature importance ranked zip code, gender, and ethnicity as the top three features. Zip code may rank highly because of its high cardinality. Under permutation feature importance, the top three features were type of dwelling, gender, and ethnicity. Implications for textile and apparel survey researchers are given.
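Two ideas from the abstract, oversampling the minority class and scoring with F1, can be sketched in plain Python. The abstract does not say which oversampling technique was used, so the random duplication below is an assumption, and the function names and toy data are hypothetical rather than taken from the study.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the majority-class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one "recycles" respondent (1) against four who do not (0).
rows = [[0], [1], [2], [3], [4]]
labels = [1, 0, 0, 0, 0]
balanced_rows, balanced_labels = random_oversample(rows, labels)

score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In practice, oversampling is usually applied only to the training folds so that duplicated rows do not leak into the data used for scoring.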
Things I love to do in my spare time!
Database and Data Visualization for Cannabis Products
Simple web application that performs web scraping and data visualization of NBA player statistics
Sudoku game solver built on the backtracking algorithm with the Python library pygame
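For readers unfamiliar with backtracking, the core of a Sudoku solver can be sketched in a few lines. This is a generic illustration and not the project's code: it leaves out the pygame interface entirely, and the function names and the sample puzzle are assumptions made for the example.

```python
def is_valid(board, row, col, digit):
    """Check that digit does not already appear in the row, column, or 3x3 box."""
    if digit in board[row]:
        return False
    if any(board[r][col] == digit for r in range(9)):
        return False
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    for r in range(box_row, box_row + 3):
        for c in range(box_col, box_col + 3):
            if board[r][c] == digit:
                return False
    return True


def solve(board):
    """Fill empty cells (0) in place; return True if a full solution is found."""
    for row in range(9):
        for col in range(9):
            if board[row][col] == 0:
                for digit in range(1, 10):
                    if is_valid(board, row, col, digit):
                        board[row][col] = digit
                        if solve(board):
                            return True
                        board[row][col] = 0  # undo the guess and try the next digit
                return False  # no digit fits here, so backtrack
    return True  # no empty cells remain


# Sample puzzle for illustration; 0 marks an empty cell.
puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
solved = solve(puzzle)
```

The solver tries digits in order for the first empty cell, recurses, and resets the cell whenever a later cell has no valid digit. That reset step is what makes it backtracking rather than a greedy fill.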