About Me

Driven by curiosity, shaped by data.

I'm Samhita Sarikonda, a Data Analytics graduate from George Mason University who thrives on building intelligent systems that solve real-world problems. I've worked across diverse domains, from secure AI agents and LLM pipelines to cloud data engineering and NLP research.

My portfolio reflects both the science and the storytelling of data. Whether it's cleaning a messy dataset, deploying cloud pipelines, or testing models under noisy inputs, I bring structure, curiosity, and purpose to each challenge.

If you're here, welcome. Scroll down and explore the work; each project tells a story of where data meets clarity.

Coursework

Data Science and Engineering Core

  • Database Management Systems
  • Data Programming
  • Statistics for Data Science
  • Intro to Analytics & Modeling
  • Predictive Analytics
  • Business Intelligence

Machine Learning and AI

  • Interpretable Machine Learning
  • Introduction to NLP
  • NLP with Deep Learning

Big Data and Cloud Systems

  • Scalable Data Analytics and Big Data
  • Data Warehousing
  • Data Mining
  • Cloud Computing (AWS/GCP)

Special Topics

  • Prompt Engineering & LLMs
  • Data Visualization (Tableau/Power BI)
  • Multimodal Learning

Projects

NetGuard: AI Pipeline for Threat Detection

Built a modular AWS Lambda pipeline that uses SecureGPT and S3 to classify network logs. Integrated Reflexion and chain-of-thought (CoT) prompting to improve threat detection, and automated prioritization workflows.

Scalable Fraud Detection - Geospatial Data

Built a fraud detection pipeline by integrating transactional data with geospatial mapping. Used clustering and anomaly detection on a scalable Spark pipeline to identify fraud hotspots across U.S. regions.

VectorSpeak: Word2Vec Skip-Gram Embedding Model

Implemented a Word2Vec model from scratch using PyTorch, training a skip-gram network with negative sampling to generate and visualize semantic word embeddings from raw text data.
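
As a rough illustration of the data preparation behind skip-gram training with negative sampling, the sketch below generates (center, context) pairs and draws negative words in proportion to word count raised to the 0.75 power, the exponent used in the original Word2Vec paper. It is plain Python rather than the project's PyTorch code. The function names, window size, and toy sentence are assumptions made for this example.

```python
import random
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def make_negative_sampler(tokens, power=0.75, seed=0):
    """Build a sampler that draws words in proportion to count ** power."""
    counts = Counter(tokens)
    vocab = sorted(counts)
    weights = [counts[word] ** power for word in vocab]
    rng = random.Random(seed)

    def sample(k, exclude):
        # Reject the true center/context words so negatives are real negatives.
        if all(word in exclude for word in vocab):
            raise ValueError("no candidate words left to sample")
        negatives = []
        while len(negatives) < k:
            word = rng.choices(vocab, weights=weights, k=1)[0]
            if word not in exclude:
                negatives.append(word)
        return negatives

    return sample


# Toy sentence for illustration only (hypothetical data).
tokens = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(tokens, window=1)
sample = make_negative_sampler(tokens)
center, context = pairs[0]
print(center, context, sample(3, exclude={center, context}))
```

In a full model, each positive pair and its sampled negatives would feed a loss that pulls the center and context embeddings together and pushes the negatives apart.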

Airline Reviews Dashboard

Designed an interactive dashboard in Tableau to analyze airline customer reviews. Visualized sentiment trends, review categories, and rating breakdowns to uncover service improvement opportunities.

Chest X-Ray Disease Detection

Built a deep learning model using DenseNet-121 to detect 14 chest diseases from X-rays. Applied Grad-CAM to visualize the image regions influencing predictions, and handled class imbalance using a weighted loss.
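
The description does not say how the loss weights were chosen. One common option for multi-label problems, sketched below in plain Python, is to weight each class's positive term by the ratio of negative to positive samples. This is similar in spirit to the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. The function names and the tiny label matrix are assumptions for illustration, not the project's actual code or data.

```python
import math


def positive_class_weights(labels):
    """Per-class weight = negatives / positives, from multi-hot label rows."""
    n_samples = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        positives = sum(row[c] for row in labels)
        negatives = n_samples - positives
        # Fall back to 1.0 when a class has no positive examples.
        weights.append(negatives / positives if positives else 1.0)
    return weights


def weighted_bce(probs, targets, pos_weights):
    """Mean binary cross-entropy with the positive term scaled per class."""
    eps = 1e-7
    total = 0.0
    for p, y, w in zip(probs, targets, pos_weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical labels: 4 images, 2 disease classes, each class positive once.
labels = [[1, 0], [0, 0], [0, 0], [0, 1]]
weights = positive_class_weights(labels)
print(weights)
print(weighted_bce([0.5, 0.5], [1, 0], weights))
```

With these weights, a missed positive for a rare class costs more than a missed positive for a common one, which counteracts the tendency to predict "no finding" everywhere.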

CAGRviz: Visualizing Nifty 50 Growth

Analyzed the Nifty 50's 10-year performance using compound annual growth rate (CAGR) to uncover long-term growth trends. Built interactive charts to highlight consistent patterns for investors and analysts.
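
For readers unfamiliar with the metric, CAGR is the constant yearly growth rate that would take a starting value to an ending value over a given number of years: (end / start) ** (1 / years) - 1. A minimal sketch follows. The function name and the index levels are illustrative assumptions, not actual Nifty 50 data.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical example: a value that doubles over 10 years.
growth = cagr(start_value=100.0, end_value=200.0, years=10)
print(f"{growth:.2%}")  # prints 7.18%
```

Because CAGR smooths over year-to-year swings, it is useful for comparing long-run growth but says nothing about volatility along the way.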

Calgary Crime Data Analysis

Analyzed Calgary crime data (2018-2024) and built an LSTM model to predict future crime trends. Conducted EDA to uncover high-crime areas, seasonal trends, and category patterns to support data-driven policing and safety planning.

Real-Time Crypto Data Pipeline with Mage

Built a real-time data pipeline using Mage to ingest and process cryptocurrency prices from the Polygon API. Designed modular ETL flows for data extraction, transformation, and warehouse loading to support streaming analytics.

Multimodal Bridge Defect Detection

Co-authored an IEEE-published framework integrating non-destructive evaluation (NDE) sensor data, specifically impact echo (IE) and ultrasonic surface waves (USW), with image processing to detect bridge defects. Used Alpha Shape Analysis and visual cross-verification to enhance defect localization and minimize false positives.

AutoGen Research Paper Summarization

Built an AI tool using AutoGen and Streamlit to fetch and summarize research papers. Integrated arXiv and Google Scholar for automated insights and topic recommendations.

Experience

🎓

Graduate Teaching Assistant

George Mason University

Jan 2025 - May 2025

Supported instruction and grading for NLP with Deep Learning (AIT 726), focusing on large language models (LLMs), PyTorch-based model implementation, and reproducible ML workflows. Mentored students in lab sessions and strengthened technical communication through feedback and walkthroughs.

NLP, LLMs, PyTorch, Word2Vec, RNN, LSTM, Named Entity Recognition, Sentiment Analysis, Model Evaluation

🔬

Data Science Research Assistant

George Mason University

Jun 2024 - May 2025

  • Built and validated modular data pipelines to extract and preprocess multimodal logs and structured/unstructured datasets; implemented SQL-based checks for consistency across time partitions.
  • Contributed to LLM-based experimentation for event extraction and anomaly detection; co-authored peer-reviewed papers (IEEE, NLDB) involving benchmarking and multimodal fusion research.

LLMs, Multimodal Fusion, Data Pipelines, PyTorch, HuggingFace, NLP, Evaluation, Python, SQL, Research

💼

Associate Data Engineer

Twilio Inc.

Jan 2022 - Aug 2023

Developed and maintained scalable SQL-based ETL workflows for billing pipelines using internal data tools. Built Looker dashboards to monitor KPIs across departments and collaborated with engineering teams to automate exception handling, logging, and data validation using AWS services.

SQL, ETL, Looker, AWS, Data Pipelines, Automation, Exception Handling

🛠

Data Operations Analyst

Twilio Inc.

Jan 2021 - Dec 2022

Automated reconciliation workflows for usage and billing systems using Python and SQL. Integrated Stripe APIs and used Datadog and Splunk dashboards to monitor anomalies and streamline financial reporting for business analysts and finance teams.

Python, SQL, Stripe API, Dashboards, Anomaly Detection, Finance Analytics, Automation

📊

Data Analyst Intern

BSNL

Jun 2020 - Aug 2020

Analyzed telecom subscriber trends and usage patterns from regional datasets using SQL and Excel. Created time-series dashboards and performance summaries for internal strategy reviews and reporting.

SQL, Excel, Time-Series, Dashboarding (Tableau), Telecom Data Analysis

Skills

  • Statistical Methods: Regression, Correlation, A/B Testing (Hypothesis Testing), Clustering, Time-Series Analysis
  • Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning, Large Language Models (BERT, LLaVA, LLaMA, GPT-4o, FLAN-T5)
  • Programming Languages and Frameworks: Python, SQL, R, PyTorch, Keras, TensorFlow
  • Cloud Platforms: AWS (EC2, S3, Lambda), Google Cloud Platform
  • Databases: MySQL, SQLite, Google BigQuery, MongoDB
  • Data Visualization Tools: Tableau, Power BI, Looker Studio
  • Big Data & ETL Tools: HDFS, PySpark, Kafka
  • Data Mining: Data Cleaning, Data Wrangling, Data Exploration, Data Visualization, Data Analysis, Data Modeling, Data Interpretation, Data Presentation
