- Architected an ML‑powered legal and business document automation platform that ingests multi‑domain, multilingual contracts (PDF/DOCX), reads and understands them end‑to‑end, and delivers real‑time clause extraction, risk analysis, and review summaries for law firms.
- Built a robust ingestion pipeline with OCR, semantic chunking, and structural parsing to normalize long, complex documents, enabling accurate downstream clause detection and metadata extraction across jurisdictions and languages.
- Fine‑tuned transformer‑based LLMs and domain‑specific legal models for clause classification, obligation extraction, and anomaly detection, iterating on data curation and evaluation to reach ~99% accuracy under strict law‑firm quality requirements.
- Implemented a hybrid risk‑scoring engine combining embedding‑based similarity search, rule‑based checks, and RAG over approved templates and historical documents, surfacing high‑risk clauses and deviations automatically.
- Designed scalable retrieval and reasoning pipelines using chunk‑and‑aggregate strategies, vector databases, and context‑aware retrieval to support very long, multilingual documents without losing global context.
- Productionized the platform using FastAPI, Docker, and Kubernetes on GCP (Vertex AI), with CI/CD pipelines, monitoring, and logging, exposing APIs that integrate seamlessly into existing legal and business workflows and significantly reducing manual contract review time.
- Initiated development of a secure, firm‑specific legal copilot chatbot that unifies these services behind a conversational interface (LLM + RAG), enabling lawyers and business users to query contracts, templates, and precedents in natural language.
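The retrieval step behind the clause-extraction and copilot work above can be sketched with embedding similarity search. This is a minimal illustration with toy 3‑d vectors standing in for real embedding-model outputs; the function and chunk names are illustrative, not the production API.

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunk_texts, k=2):
    """Return the k chunks whose embeddings are most similar to the query.

    Cosine similarity over pre-computed embeddings; in the real pipeline
    the vectors would come from an embedding model and live in a vector DB.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q                           # cosine similarity per chunk
    order = np.argsort(sims)[::-1][:k]     # best-first indices
    return [(chunk_texts[i], float(sims[i])) for i in order]

# Toy 3-d "embeddings" standing in for real model outputs.
chunks = ["indemnification clause", "termination clause", "governing law"]
vecs = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]])
query = np.array([0.85, 0.2, 0.05])        # query closest to chunk 0

print(top_k_chunks(query, vecs, chunks, k=1)[0][0])  # indemnification clause
```

The retrieved chunks would then be passed to the LLM as context (the "RAG" part), alongside rule-based checks for known risk patterns.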
AI/ML Engineer • GenAI/MLOps Engineer • Data Scientist • Prompt Engineer
AI is not just a tool for automation;
it's an enabler for augmentation.
❝ I have always been convinced that the only way to get artificial intelligence to work is to do the computation in a way similar to the human brain. ❞ -- Geoffrey Hinton
Machine Learning Engineer with 3+ years of hands-on experience designing, building, and deploying production-grade ML, Deep Learning, and Generative AI systems for enterprise financial and healthcare clients, including TD Bank, RBC, and Quest Diagnostics. I specialize in LLM engineering, RAG pipelines, and agentic AI workflows, alongside traditional ML (Computer Vision, NLP, Recommendation Systems), and have shipped models serving 1,000+ concurrent users with sub-100ms latency and 99.2% uptime in Kubernetes-based environments.
My core stack includes Python, PyTorch, TensorFlow, FastAPI, and cloud platforms like AWS and GCP (Vertex AI, SageMaker), with strong MLOps foundations in CI/CD for ML, MLflow, model monitoring, drift detection, and automated retraining. I enjoy translating complex ML concepts into clear business value, collaborating with cross-functional teams, and owning the end-to-end lifecycle from data and modeling through deployment, observability, and continuous improvement.
Quick Snapshot
- Role: AI/ML Engineer • MLOps Engineer • Prompt Engineer • Data Scientist • GenAI Engineer
- Experience: 3+ years building production ML & GenAI systems
- Strengths: LLMs & RAG, MLOps on GCP, scalable APIs, monitoring & reliability
- Domains: US/Canada healthcare, FinTech/InsurTech, B2B analytics, RegTech, EdTech, e-commerce
Professional Experience
Owning the full lifecycle of ML and GenAI systems: data, modeling, MLOps, and production reliability.
- Architected and delivered a multilingual AI frontdesk platform that automates inbound/outbound calls and chat for booking, cancelling, rescheduling, and updating appointments across enterprise healthcare and financial clients.
- Designed a modular LLM/RAG pipeline combining fine-tuned LLMs, domain-specific knowledge bases, and intent/entity models to handle FAQs, complex inquiries, complaint logging, and admin workflows with high accuracy.
- Built scalable voice and chat APIs (FastAPI + Docker + GKE) with real-time speech recognition and text generation, enabling the system to handle 1,000+ concurrent sessions with sub-100ms latency and 99.2% uptime.
- Implemented MLOps on GCP and Vertex AI with GitHub Actions CI/CD, automating training, deployment, and rollback for multiple conversation, routing, and ranking models across environments.
- Deployed and monitored services with centralized logging, metrics, and alerting (Prometheus/Grafana + cloud logging), reducing incident detection and resolution times for production issues.
- Standardized experiment tracking and model governance using MLflow, versioned datasets, and registries, enabling reproducible experiments and faster iteration on new model variants.
- Collaborated with product, operations, and client stakeholders to define success metrics (containment rate, call deflection, booking conversion), and continuously tuned flows to improve automation coverage and user experience.
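One of the success metrics above, containment rate, can be computed from session logs. A simplified sketch (the field names and session shape are illustrative, not the production schema):

```python
def containment_rate(sessions):
    """Fraction of sessions fully handled by the bot (no human handoff).

    Each session is a dict; 'escalated' marks a transfer to a human agent.
    This is a simplified metric sketch, not the production definition.
    """
    if not sessions:
        return 0.0
    contained = sum(1 for s in sessions if not s["escalated"])
    return contained / len(sessions)

sessions = [
    {"id": 1, "escalated": False},   # bot booked the appointment itself
    {"id": 2, "escalated": True},    # transferred to the front desk
    {"id": 3, "escalated": False},
    {"id": 4, "escalated": False},
]
print(containment_rate(sessions))    # 0.75
```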
- Designed a multi‑stage cancer detection pipeline using CNN‑based models where early stages perform binary screening (sigmoid) and later stages perform multi‑class grading (softmax) to classify cancer type, severity, and progression over time.
- Built an end‑to‑end system that ingests histopathology and medical imaging data, applies OCR and semantic/region‑based chunking, and feeds structured tensors into specialized models for presence detection, staging, and damage estimation.
- Achieved 97.7% ROC‑AUC in production by combining rigorous data preprocessing, augmentation, cross‑validation, calibration, and continuous monitoring for distribution shift and performance drift.
- Linked the models into a sequential, LLM‑assisted decision framework that can answer clinician queries and explain predictions by referencing both image‑based features and historical patient/context data.
- Implemented a hybrid risk‑scoring engine that aggregates outputs from multiple models into interpretable risk categories and recommended next actions (follow‑up imaging, biopsy, treatment escalation), aligned with clinical guidelines.
- Productionized the pipeline using Python, PyTorch/TensorFlow, FastAPI, and containerized services on AWS/GCP with Kubernetes, enabling low‑latency inference and reliable integration into existing clinical systems.
- Exposed results through dashboards and APIs that surface predictions, confidence intervals, trend plots, and quality metrics, giving clinicians an end‑to‑end view of patient status and model performance in real time.
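The staged sigmoid/softmax design described above can be sketched as a small decision function: a binary screen gates whether the multi-class grader runs at all. The logits, threshold, and grade names here are illustrative placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))    # subtract max for numerical stability
    return e / e.sum()

def staged_decision(screen_logit, grade_logits, threshold=0.5):
    """Stage 1: binary screening (sigmoid); stage 2: multi-class grading
    (softmax), run only if the screen is positive. Returns (label, conf)."""
    p_present = sigmoid(screen_logit)
    if p_present < threshold:
        return "negative", float(1.0 - p_present)
    probs = softmax(grade_logits)
    grades = ["grade_1", "grade_2", "grade_3"]   # placeholder class names
    i = int(np.argmax(probs))
    return grades[i], float(probs[i])

label, conf = staged_decision(screen_logit=2.0,
                              grade_logits=np.array([0.2, 1.5, 0.1]))
print(label)  # grade_2
```

Gating the grader on the screening result keeps the cheap binary model on the hot path and reserves the heavier multi-class models for positive cases.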
- Designed and developed an end‑to‑end FinTech AI platform that ingests invoices, financial documents, PDFs, and high‑quality images, transforming them into structured, analytics‑ready data using LLMs, NLP, and computer vision, cutting manual data-entry effort by ~70%.
- Built a robust ingestion pipeline with OCR, layout‑aware parsing, and transformer‑based language models to detect and normalize key entities (amounts, dates, counterparties, line items, tax details, payment terms), improving extraction accuracy from ~85% to 97%+ on held‑out test sets.
- Implemented multi‑stage NLP components (entity recognition, semantic parsing, rule‑based validation) that reduced downstream correction/reconciliation time by 60% and decreased finance ops escalations for data issues by ~40%.
- Leveraged embeddings, semantic similarity, and RAG‑style retrieval over historical documents and reference tables to auto‑resolve ambiguous fields, shrinking average invoice processing time from days to minutes and accelerating monthly closing cycles.
- Orchestrated automated workflows that clean, validate, and load extracted data into warehouses, enabling near real‑time spend analytics and contributing to an estimated annual savings of $150K+ in manual processing and outsourcing costs.
- Exposed processed data through interactive dashboards and APIs, giving finance and operations teams live visibility into cash flow and vendor risk, which supported better negotiations and helped uncover ~3–5% cost inefficiencies across certain spend categories.
- Productionized the solution using Python, PyTorch/Transformers, FastAPI, Docker, and cloud services with monitoring and logging, ensuring stable SLAs (p95 latency under 500 ms) even as document volume grew 3x.
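The entity-normalization step in the invoice pipeline above can be illustrated with plain regex and date parsing; the real system layers transformer-based extraction on top, and these patterns and formats are simplified assumptions.

```python
import re
from datetime import datetime

AMOUNT_RE = re.compile(r"[$€£]?\s*([\d,]+\.\d{2})")
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]

def normalize_amount(text):
    """Pull the first currency amount out of free text and return a float."""
    m = AMOUNT_RE.search(text)
    return float(m.group(1).replace(",", "")) if m else None

def normalize_date(text):
    """Try a few common date layouts and return ISO format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

print(normalize_amount("Total due: $12,345.67"))   # 12345.67
print(normalize_date("Mar 5, 2024"))               # 2024-03-05
```

Returning `None` for unparseable fields (rather than guessing) is what lets the downstream validation and RAG-based resolution steps decide how to handle ambiguity.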
Internship Experience
- Learned how to work with real-world healthcare data and production systems by supporting end-to-end analytics and reporting for a US-based healthcare organization.
- Applied data analysis, EDA, and feature engineering skills in Python and Excel to understand doctor availability patterns and operational constraints across 11 hospital websites.
- Designed and deployed an automated data pipeline that scraped real-time doctor availability using Python-based web scraping, then cleaned, normalized, and structured the data for downstream use.
- Integrated the pipeline with Excel, a centralized database, and a live production website, improving data timeliness and reducing manual update effort by more than 70%.
- Earned formal recognition (CEO award and return offer) for improving data accuracy, automation, and operational efficiency through this AI-driven data pipeline.
- Learned core data science fundamentals, including exploratory data analysis (EDA), data cleaning, and feature understanding using Python, Pandas, and NumPy.
- Applied EDA techniques (summary statistics, correlations, outlier detection, missing‑value analysis) on real-world datasets to understand data quality and model readiness.
- Built a small end-to-end EDA pipeline that automated data loading, preprocessing, and visualization, generating reusable insights for downstream ML experiments.
- Practiced creating basic ML‑ready datasets by engineering simple features, encoding categorical variables, and splitting data into train/validation sets.
- Documented findings and insights in notebooks, improving my ability to communicate data stories and prepare datasets for future AI/ML modeling work.
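The EDA techniques above (missing-value analysis, IQR-based outlier detection) can be packaged into a small reusable summary, in the spirit of the pipeline described; the column names here are a toy example.

```python
import numpy as np
import pandas as pd

def eda_summary(df):
    """Per-column missing counts plus IQR-based outlier counts for numerics."""
    summary = pd.DataFrame({"missing": df.isna().sum()})
    outliers = {}
    for col in df.select_dtypes(include=np.number).columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        outliers[col] = int(mask.sum())
    summary["outliers"] = pd.Series(outliers)
    return summary

df = pd.DataFrame({
    "age": [25, 27, 26, 24, 120],        # 120 is an obvious outlier
    "city": ["NY", "LA", None, "SF", "NY"],
})
print(eda_summary(df))
```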
Selected Projects
Projects that showcase how I design, build, and ship production-grade ML, GenAI, and MLOps systems.
VoiceGPT – Interactive Voice & Text AI Assistant
Interactive voice-to-text and AI response system built in Google Colab that lets users talk to an LLM using their microphone or type prompts directly. It combines browser speech recognition with a GPT‑2 text generation pipeline for dynamic, conversational responses.
- Integrated browser-based speech recognition via JavaScript to capture and transcribe spoken input into text in real time.
- Used the `transformers` library to load and run GPT‑2, generating coherent responses conditioned on either transcribed speech or manually entered text.
- Designed a Colab-backed web UI with buttons for recording, text input, and response generation, providing clear visual feedback and a smooth, interactive user experience.
Banana Plant Disease Detection
End-to-end image classification system that detects and classifies major banana leaf diseases (Cordana, Sigatoka, Pestalotiopsis, and healthy) using deep learning and transfer learning on ResNet101, supporting early intervention and yield optimisation for farmers.
- Curated and organised a multi-class image dataset of banana leaves into train/test splits, covering healthy and diseased classes for robust generalisation to unseen images.
- Implemented a preprocessing pipeline with normalization, label encoding, tensor conversion, and extensive data augmentation to improve robustness and reduce overfitting.
- Fine-tuned a pre-trained ResNet101 backbone with custom dense layers (GAP → 64 → 128 → 512 → 1024 → 4-class softmax) using Adam and sparse categorical cross-entropy.
- Achieved ~92% test accuracy with clear training curves (accuracy and loss), demonstrating strong performance on real-world banana disease images.
- Saved the trained model and evaluation workflow for reuse, enabling practical deployment in agricultural decision-support tools.
Healthcare AI – Brain Tumor Detection from MRI
Deep learning system that reads brain MRI scans and predicts whether a tumor is present, using ResNet50-based transfer learning on TCIA/TCGA lower-grade glioma data with expert segmentation masks as ground truth.
- Worked with 3,900+ MRI scans and corresponding FLAIR segmentation masks from TCIA/TCGA, linking imaging with tumor genomics and patient outcome data for radiogenomics analysis.
- Built a preprocessing pipeline using `ImageDataGenerator` to normalize pixel values, resize images to 256×256, create train/validation/test splits, and stream batches efficiently for GPU training.
- Used ResNet50 (ImageNet weights) as a frozen feature extractor and added a custom classification head with global average pooling, dense layers with ReLU + dropout, and a 2‑class softmax layer for tumor vs. no‑tumor classification.
- Applied a two‑phase training strategy: first training only the head, then unfreezing the last ResNet50 blocks with a lower learning rate for fine‑tuning, monitored by early stopping and model checkpoints.
- Achieved strong performance with training accuracy above 85% and steadily improving validation accuracy and decreasing validation loss, indicating good generalisation on unseen MRI scans.
US Airline Sentiment Analysis
End-to-end sentiment analysis system on the “Twitter US Airlines Sentiment” dataset (~14k tweets), classifying customer feedback toward US airlines as positive, neutral, or negative to gauge service quality and pain points.
- Cleaned raw tweets by removing URLs, mentions, hashtags, special characters, and extra whitespace, then normalised text (lowercasing) and applied SpaCy-based tokenisation, lemmatisation, and stopword removal.
- Converted processed tweets into Bag‑of‑Words (BoW) features and trained a baseline Logistic Regression classifier, reaching ~76% validation accuracy and ~74% test accuracy.
- Experimented with autoencoder architectures on BoW features to detect and remove anomalous / noisy samples via reconstruction error, then retrained Logistic Regression on the cleaned data to improve robustness.
- Explored Random Forest classifiers as an alternative model, benchmarking validation and test accuracy and comparing performance and generalisation against the regularised Logistic Regression baseline.
- Tuned L2 regularisation strengths, hidden layer sizes, activations, and dropout within the autoencoder‑plus‑LogReg pipeline to maximise validation accuracy and improve generalisation on unseen tweets.
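The BoW + Logistic Regression baseline described above can be sketched end to end with scikit-learn. The toy tweets below are illustrative, not the Kaggle data, so the assertions are deliberately loose.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = [
    "great flight and friendly crew", "love the quick boarding",
    "terrible delay and lost luggage", "awful service rude staff",
]
labels = ["positive", "positive", "negative", "negative"]

vectorizer = CountVectorizer()                  # Bag-of-Words features
X = vectorizer.fit_transform(tweets)
clf = LogisticRegression(C=1.0, max_iter=1000)  # L2-regularised baseline
clf.fit(X, labels)

test = vectorizer.transform(["terrible rude crew"])
print(clf.predict(test)[0])
```

In the real pipeline this baseline sat behind the SpaCy preprocessing, and `C` was one of the regularisation strengths swept during tuning.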
CNN vs ANN – CIFAR‑10 Image Classification
Comparative study of Convolutional Neural Networks (CNNs) and standard Artificial Neural Networks (ANNs) on the CIFAR‑10 image dataset, demonstrating why CNNs are better suited for image classification by learning hierarchical spatial features.
- Implemented a CNN with stacked Conv2D–BatchNorm–MaxPool blocks (32→64→128 filters), dropout, and a 512‑unit dense layer, ending in a 10‑class softmax classifier.
- Built a baseline ANN with only fully connected layers (Flatten → 32 → 64 → 128 → 512 → 10) on raw pixel inputs to highlight the limitations of non‑convolutional architectures for images.
- Trained both models on CIFAR‑10 and compared training/validation accuracy and loss curves across epochs, showing CNNs achieve higher validation accuracy and lower loss with better generalisation.
- Used batch normalisation and dropout in the CNN to stabilise training and reduce overfitting, improving robustness versus the simpler ANN baseline.
- Visualised results and sample predictions (e.g., cat vs dog) to make the performance gap between CNN and ANN interpretable for non‑experts.
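A quick way to see why the dense-only ANN struggles on images is to compare parameter counts for the first layer on a 32×32×3 CIFAR‑10 input, a small arithmetic sketch of the architectures above:

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights + biases for a Conv2D layer: (k*k*in + 1) * out."""
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    """Weights + biases for a fully connected layer."""
    return (in_units + 1) * out_units

# First layer on a 32x32x3 CIFAR-10 image:
conv = conv2d_params(3, 3, 32)           # 3x3 kernels, 3 -> 32 channels
dense = dense_params(32 * 32 * 3, 512)   # Flatten -> 512 units
print(conv, dense)  # 896 1573376
```

The convolution's weight sharing gives it roughly 1,700× fewer parameters in this comparison, while also preserving the spatial structure the dense layer throws away.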
Amazon iPhone Price & Review Scraper
Python web scraping pipeline that extracts iPhone product names, prices, review counts, and product links from Amazon search results using requests, BeautifulSoup, and pandas, then exports the dataset to Excel and visualizes price–review trade-offs for iPhone 15 models.
- Inspected Amazon’s HTML structure (developer tools, `data-asin` blocks, title and price spans) to reliably locate product containers and the nested elements containing names, prices, reviews, and links.
- Implemented a `scrape_amazon_page(url)` function that sends an HTTP GET request with browser-like headers, parses the HTML via BeautifulSoup, and extracts structured lists of names, prices, reviews, and relative links.
- Hardened parsing logic with null‑safe checks and sensible fallbacks (e.g., `"Not Found"`) when expected elements are missing or the layout slightly changes, improving robustness to minor DOM shifts.
- Built a pagination loop that iterates over result pages, calls `scrape_amazon_page` for each URL variant, appends results to global lists, stops when no more products are found, and sleeps between requests to be polite to the server.
- Loaded all scraped data into a pandas `DataFrame` and saved it as `iphones.xlsx` for analysis and sharing, then used Plotly to create an interactive scatter plot of Price vs Reviews with product-name coloring and link hover‑tooltips.
Eggplant Leaf Disease Recognition (Farmer‑Assist AI)
Deep learning system that classifies eggplant leaf images into seven disease classes (including healthy) using a custom CNN on the “Eggplant Disease Recognition” dataset, to help farmers choose the right treatment early.
- Used the Kaggle Eggplant Disease Recognition dataset (392 images, 7 classes: Healthy Leaf, Insect Pest, Leaf Spot, Mosaic Virus, Small Leaf, White Mold, Wilt) and loaded it via `image_dataset_from_directory` at 512×512 resolution.
- Built a TensorFlow `tf.data` pipeline with shuffling, train/val/test split, caching, and prefetching for efficient GPU training on Colab (T4), plus normalization via a `Rescaling(1./255)` layer and on‑the‑fly augmentation (random flips and rotations).
- Designed a custom CNN: stacked Conv2D–MaxPool blocks (32→64→128→128→128→64 filters), followed by Flatten, a 512‑unit dense layer with ReLU and 0.5 dropout, and a final softmax layer over 7 disease classes.
- Trained for 50 epochs with Adam and sparse categorical cross‑entropy, reaching ~81–82% test accuracy (loss ≈ 0.39) on held‑out data, with steadily improving train/validation curves after epoch ~20.
- Saved the best-performing model versions to disk (`models/<version>.h5`) for deployment in farmer‑facing tools (e.g., mobile or web apps) that can detect disease from a single leaf photo.
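The train/val/test split in that pipeline boils down to deterministic size arithmetic; a tiny sketch (the 80/10/10 ratios are an assumption, and the remainder goes to the test set so the sizes always sum to the dataset size):

```python
def split_sizes(n, train=0.8, val=0.1):
    """Deterministic train/val/test sizes that always sum to n."""
    n_train = int(n * train)
    n_val = int(n * val)
    return n_train, n_val, n - n_train - n_val

print(split_sizes(392))  # (313, 39, 40) for the 392-image dataset
```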
Python Visualization – Matplotlib vs Seaborn vs Plotly
Comparative visualization project using the Iris dataset to showcase how Matplotlib, Seaborn, and Plotly differ in API, defaults, and interactivity when building pair plots, scatter plots, and histograms.
- Used the Iris dataset and created the same core visuals (pair plot, sepal-length vs sepal-width scatter, petal-length histogram by species) across all three libraries for an apples‑to‑apples comparison.
- With Matplotlib, built plots from low‑level primitives to highlight fine‑grained control over axes, legends, colors, and layout at the cost of more verbose code.
- Leveraged Seaborn’s higher‑level APIs (e.g., pairplot, scatterplot, histplot) and built‑in themes to produce cleaner statistical graphics with minimal code, tightly integrated with pandas DataFrames.
- Re‑implemented the same visuals in Plotly to produce fully interactive charts (hover tooltips, zoom, pan) suitable for web dashboards and embedded analytics.
- Summarized trade‑offs: Matplotlib for maximum customization, Seaborn for fast & beautiful statistical plots, and Plotly for rich interactive visualizations in modern data apps.
Emotion AI – Facial Keypoints & Expression Recognition
Two‑stage Emotion AI system that first predicts dense facial keypoints from grayscale face images and then classifies facial expressions (angry, disgust, sad, happy, surprise) using CNNs with residual blocks on Kaggle datasets.
- Part 1: Built a deep CNN with custom ResBlocks for facial keypoint detection, including image preprocessing (resize, normalization, noise reduction), augmentation, and training on a Kaggle facial keypoints dataset.
- Implemented ResBlock architecture with convolution blocks and identity shortcuts (Conv2D → BatchNorm → ReLU, 3×3 kernels, max pooling, average pooling, and dense layers with dropout) to stabilize gradients and learn robust facial landmarks.
- Visualized random faces with 30 keypoints overlaid using matplotlib to qualitatively validate landmark predictions before feeding them into downstream emotion models.
- Part 2: Trained a CNN‑ResBlock classifier on an emotion dataset (5 classes: 0=Angry, 1=Disgust, 2=Sad, 3=Happy, 4=Surprise), including targeted augmentation to handle class imbalance for the low‑frequency “Disgust” class.
- Combined both models into an end‑to‑end Emotion AI pipeline where the keypoint detector supports more accurate and stable facial expression classification, illustrating real‑world applications in healthcare, retail, education, in‑car monitoring, and AR.
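The identity-shortcut idea behind the ResBlocks above can be sketched in NumPy; the dense maps here stand in for the Conv2D–BatchNorm layers of the real blocks, and the shapes are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def res_block(x, w1, w2):
    """Identity-shortcut residual block: relu(x + W2 @ relu(W1 @ x)).

    The shortcut lets gradients bypass the transform, which is the
    stabilising trick the keypoint and emotion CNNs rely on.
    """
    out = relu(w1 @ x)       # first transform (conv stand-in)
    out = w2 @ out           # second transform, no activation yet
    return relu(x + out)     # add the skip connection, then activate

rng = np.random.default_rng(0)
x = rng.normal(size=4)
w1 = rng.normal(size=(4, 4)) * 0.1
w2 = rng.normal(size=(4, 4)) * 0.1
y = res_block(x, w1, w2)
print(y.shape)  # (4,)
```

Because the block's output has the same shape as its input, blocks can be stacked deeply without the vanishing-gradient problems of a plain CNN of the same depth.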
Skills & Stack
Technologies I use to design, build, and operate ML, GenAI, and data products at production scale.
Machine Learning & DL
- Supervised & unsupervised learning, model evaluation, feature engineering on structured and unstructured data.
- Deep learning with PyTorch / TensorFlow / Keras (CNNs, RNNs, transformers, sequence models).
- Computer Vision: medical imaging, plant disease, emotion & facial keypoints, CIFAR‑10, transfer learning (ResNet50/101).
- NLP: sentiment analysis, text cleaning, tokenization, lemmatization, SpaCy, BoW, Logistic Regression, Random Forest, autoencoders.
- Model optimization & performance tuning (batching, latency reduction, quantization where applicable).
GenAI & LLMs
- Prompt engineering, RAG, agentic workflows, evaluation of LLM‑powered systems.
- LLM fine‑tuning / adaptation (LoRA / QLoRA) and task‑specific alignment.
- Vector DBs & semantic search (Pinecone, Weaviate, FAISS), embeddings, retrieval pipelines.
- LLM integration in production apps (multi‑agent orchestration, tools, context management).
MLOps & Backend
- Python (advanced), FastAPI / Flask, REST APIs, microservices architecture.
- MLflow, DVC, CI/CD for ML, model monitoring, drift detection, automated retraining.
- Orchestration & scheduling with Apache Airflow and production pipelines for ETL + ML.
- Docker, Kubernetes, cloud‑native deployment on GCP / Vertex AI, AWS (SageMaker, EC2, Lambda) & Azure ML.
Data & Platforms
- SQL, BigQuery, Redshift, data pipelines and ETL/ELT for analytics and ML workloads.
- Pandas, NumPy, Spark / PySpark, Databricks for large‑scale data wrangling and feature engineering.
- Experiment tracking, A/B testing, metrics dashboards (Tableau, Power BI, custom reports).
- Git, GitHub, Linux, Bash scripting and automation in Agile/Scrum environments.
Web & Frontend
- HTML5, CSS3, responsive layout, component‑based portfolio sections and project cards.
- JavaScript / TypeScript for interactivity, dynamic content, and API integration.
- Embedding interactive plots and dashboards (Plotly, custom visual components).
Data Viz & Analytics
- Matplotlib, Seaborn, Plotly for statistical and interactive visualizations (pair plots, histograms, dashboards).
- Domain analytics for healthcare (clinical dashboards, cancer & risk models) and finance (sentiment & forecasting).
- Storytelling with data and communicating model impact to business & clinical stakeholders.
Domain & Applied AI
- Healthcare AI: cancer detection, brain tumor MRI, clinical decision support, risk prediction.
- Agriculture AI: banana and eggplant disease detection for farmer decision‑support.
- Finance & customer analytics: airline sentiment, banking/fintech models, enterprise GenAI solutions.
Web Scraping & Automation
- Web scraping with Python, requests, BeautifulSoup (e‑commerce product data, pagination, error handling).
- Data export and reporting (pandas DataFrames, Excel, analytics‑ready datasets).
- Automation scripts for data ingestion, cleaning, and ML pipeline triggers.
Education
Formal education and structured learning that underpin my work in AI, ML, and cloud.
Artificial Intelligence & Machine Learning – Postgraduate
CGPA 4.01 / 4.20 · Dean’s Honour Roll & President’s Honour Roll.
Postgraduate specialization in Artificial Intelligence & Machine Learning, emphasizing advanced deep learning, generative AI, and production-scale ML system design and deployment.
Bachelor of Engineering – Computer Engineering
CGPA 8.66 / 10 · Dean’s Honour Roll & First place in Data Science Hackathon.
Bachelor’s in Computer Engineering (AI-focused) with a strong grounding in computer systems, algorithms, data structures, and applied machine learning for real-world applications.
Certifications & Courses
Industry-recognized certifications and intensive courses that reinforce my skills in AI, ML, data, and security.
Artificial Intelligence – Google
Foundational program in Artificial Intelligence covering core AI concepts, use cases, and responsible AI principles.
Machine Learning – Google
Hands-on coursework in supervised/unsupervised learning, model evaluation, and ML best practices using Python.
SQL for Data Analysis
Focused on writing analytical SQL, joins, window functions, and performance-aware queries for BI and ML workflows.
Data Analysis with Python
Practical training in NumPy, Pandas, and data cleaning, including EDA and feature preparation for ML models.
Data Visualization Techniques
Covers Matplotlib, Seaborn, Plotly, and dashboard design principles for communicating insights effectively.
Tableau for Data Visualization
Interactive dashboards, calculated fields, and visual storytelling for business and stakeholder reporting.
Statistical Analysis with R
Core statistical methods, hypothesis testing, and regression analysis to support data-driven ML decisions.
Introduction to Data Science
End-to-end overview of the data science lifecycle: problem framing, data prep, modeling, and evaluation.
Excel for Data Analysis
Data cleaning, pivot tables, and advanced formulas for quick exploratory analysis and reporting.
Ethical Hacking & Cyber Security
Covers core security principles, common vulnerabilities, and secure practices relevant to ML and web systems.
Bug Bounty Hunting & Facebook Security Quiz
Hands-on exposure to web security testing, vulnerability discovery, and platform security best practices.
Techie Guru · PLC Workshop
Practical introduction to PLCs and industrial automation, complementing systems and controls knowledge.
Start-Ups and Entrepreneurship
Covers startup fundamentals, product thinking, and go-to-market considerations for tech and AI products.
Let’s build something impactful.
Open to AI/ML Engineer, MLOps Engineer, Prompt Engineer, Data Scientist and GenAI Engineer roles in Toronto and remote.
I’m excited about roles where I can own the full lifecycle: understanding the business problem, designing ML/GenAI solutions, and shipping reliable systems that move real metrics in production.