
Hi! I am Rohan Gonjari.

A Data Scientist, Data Analyst & Machine Learning Engineer.

With a Master's in Data Science & a proven track record of 4 years in data analytics roles, I have worked as a Data Scientist on optimizing pricing models and customer retention in the finance and insurance sector. As an ML Researcher, I specialized in imbalanced classification of multimodal data & hyperparameter tuning of deep neural networks. As a Data Analyst, I leveraged regression models & Tableau dashboards to identify targets for increasing sales. I have executed projects spanning end-to-end ML pipelines, database schema design, statistical testing & interactive visualizations. During my research, I contributed to publications on multimodal classification using GNNs.

Experience

Data Scientist

Legal & General America

Contract | Remote
Oct 2023 - Present

• Conducted statistical A/B testing (t-tests, ANOVA) to evaluate the effectiveness of dynamic pricing strategies.
• Applied Bayesian statistics, causal inference & XGBoost models to dynamic pricing, generating $5 million in additional ARR.
• Helped improve ad-hoc SQL queries & reporting to support pricing adjustments & strategic decisions.
• Optimized ETL pipelines with GCP tools (Cloud Storage, Dataflow) & Apache Spark for scalable healthcare & financial data management.
• Leveraged Snowflake & MS-SQL for data querying & management, improving data quality & analysis speed.
• Developed interactive Power BI dashboards & automated data processes to streamline report generation.

ML Researcher

University of Massachusetts Dartmouth - MIND Lab

Full-time | Dartmouth, MA
Aug 2022 - Sep 2023

• Applied GNNs, neural networks, K-means clustering, support vector machines (SVMs) & decision trees to machine learning on graph data.
• Performed dimensionality reduction (PCA, t-SNE) to help visualize graph nodes and edges using Seaborn.
• Designed ML architectures that efficiently fuse multimodal data (EEG, fNIRS) to improve brain-computer interface (BCI) systems.
• The proposed model (Master's thesis) improved classification performance by 16.25% & 21.65% in two distinct studies, indicating potential impact on patient care.

Data Analyst

Destek Infosolutions

Full-time | India
Aug 2020 - Jul 2022

• Collaborated with 120+ clients to implement Google Analytics 4 (GA4) via Google Tag Manager (GTM), meeting project requirements with a 95% success rate.
• Implemented A/B testing to ensure the accuracy & reliability of data collected in GA4 when updating event triggers.
• Led a data sourcing project to establish data pipelines & a data warehouse using GCP services & SQLite.
• Applied regression models for targeted customer segmentation, resulting in an 18% increase in sales.
• Developed Tableau dashboards to improve visibility into companies' sales portfolios & other KPIs.

Projects

Sentiment Analysis of 2022 FIFA World Cup

Extracted real-time FIFA World Cup tweets from the Twitter API, classified them by sentiment using VADER, and deployed a scalable data pipeline with Apache Airflow on EC2 for processing, storing the results in S3.

  • Python
  • Airflow
  • EC2
  • S3

Hospital Management System

Established the MySQL data architecture for a Hospital Management System (HMS), extracted NHS survey data with Selenium as part of the ETL process, and transformed prescription data with NumPy and Pandas for loading into the HMS database.

  • Python
  • Selenium
  • MySQL

Evaluating Medical Condition

Assessed patient health from predicted health scores using EDA and modeling. Predicted the scores with a regression model, using cross-validation & recursive feature elimination over significant & engineered features.

  • R
  • XGBoost

Visualizing Olympics Performance

Leveraged D3.js, HTML, and CSS to create a visualization featuring interactive geospatial and scatter plots of Olympics athlete data, facilitating insights into medal-winning factors and country-level correlations.

  • D3.js
  • HTML
  • CSS
  • JavaScript

Parallelizing Conway's Game of Life

Applied high-performance scientific computing techniques to scale a cellular-automaton grid simulation across multiple cores, achieving a 5.5x speedup on 8 cores compared to a sequential run.

  • MPI for Python

King County Housing Price Prediction

Analyzed housing properties for a real estate agency, engineered features & identified significant factors. Used regression with cross-validation & recursive feature elimination to predict housing prices.

  • Jupyter
  • scikit-learn

Skills

Technologies

Libraries

Cloud

Expertise

Education

Master of Science

Data Science - 2023

Coursework: High-Performance Parallel Computing, Advanced Data Mining, Deep Learning, Data Visualization, Data Architecture & Design, Business Analytics, Graph Neural Networks

Bachelor of Technology

Electronics & Communication Engineering - 2016

Coursework: Numerical Analysis, Discrete Mathematics, Data Structures & Algorithms, Statistical Analysis

Publications

Multimodality-enhanced graph generation and multimodality-driven graph convolutional networks.

Context-aware Multimodal Auditory BCI Classification through Graph Neural Networks.

Adversary on Multimodal BCI-based Classification.

Contact