Sourabha KK - Machine Learning Engineer

About Me

I am a Machine Learning Engineer and Data Scientist focused on building end-to-end ML systems that deliver real business impact. My expertise spans data analysis, feature engineering, and deploying reliable production models.

I specialise in transforming complex data into actionable insights, bridging experimentation with production-ready systems, and building ethical, trustworthy AI solutions.

I am currently seeking ML and data science opportunities in the UK where I can contribute to impactful ML projects and collaborate with talented teams solving real-world challenges.

Technical Skills

🧠 Machine Learning & Data Science

Supervised Learning (Classification, Regression); Feature Engineering & Selection; Model Evaluation (Accuracy, Precision, Recall, F1, ROC-AUC); Train/Test Splits & Cross-Validation; Bias–Variance Trade-offs; Model Explainability (Feature Importance, SHAP); Data Leakage Prevention; Statistical Analysis

🛠️ ML Engineering & Pipelines

End-to-End ML Pipelines, Leakage-Safe Preprocessing, Test-Driven Development (TDD) for ML, Modular ML System Design, Deterministic & Reproducible Training, Model Versioning & Release Practices, Error Handling & Input Validation

🧪 Testing & Code Quality

Pytest (Unit & Behavioral Tests), RED → GREEN → REFACTOR Workflow, Test Isolation with Synthetic Data, Regression Prevention via Test Coverage, Type Hinting & Static Analysis, Clean Code & Refactoring

📊 Data Processing & Feature Engineering

Data Validation & Schema Enforcement; Numerical Feature Scaling; Categorical Encoding (One-Hot Encoding); Derived Feature Creation (Ratios, Bucketing); Handling Missing & Invalid Data; Pandas & NumPy

🤖 Models & Algorithms

Random Forests, Logistic Regression, Tree-Based Models, Ensemble Methods, Baseline Model Design, Hyperparameter Tuning

⚙️ Tools & Libraries

Python, Scikit-learn, Pandas, NumPy, Pytest, Git & GitHub, Virtual Environments (venv)

📦 Software Engineering Practices

Version Control (Git), Semantic Commits, Project Structuring, Documentation (README-driven development), Reproducible Environments, Debugging & Profiling

📈 Evaluation & Interpretability

Classification Reports, Confusion Matrices, ROC Curves & AUC, Feature Importance Analysis, Business-Oriented Model Interpretation

🧠 Problem-Solving & Methodology

Translating Business Problems to ML Tasks, Assumption Identification & Validation, Iterative Development, Technical Trade-off Analysis, Production-Oriented Thinking

Featured Projects

Customer Churn Prediction ML Pipeline

Problem: High customer attrition impacting revenue retention

Built an end-to-end ML pipeline with test-driven development (79 tests) to predict customer churn. Implemented leakage-safe preprocessing (fit on train, transform on test), deterministic training with fixed random seeds, and a modular, production-oriented architecture. Used a random forest classifier with feature importance analysis for interpretability.

Python, Scikit-learn, RandomForest, Pytest, TDD
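The "fit on train, transform on test" rule and the fixed-seed split described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than the project's scikit-learn code, and the `StandardScaler1D` class, the `split_train_test` helper, and the `monthly_charges` values are hypothetical stand-ins, not taken from the repository.

```python
import random
import statistics


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle with a fixed seed so the split is identical on every run."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class StandardScaler1D:
    """Toy scaler for a single numeric column (illustrative only)."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Fall back to 1.0 for a constant column so transform never divides by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Hypothetical monthly charges for ten customers.
monthly_charges = [20.0, 35.5, 50.0, 64.5, 70.0, 82.0, 95.5, 99.0, 105.0, 120.0]

train, test = split_train_test(monthly_charges)
scaler = StandardScaler1D().fit(train)    # statistics are learned from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)      # test rows reuse the train statistics
```

Because the scaler never sees the test rows during `fit`, the test set behaves like unseen production data, and the seeded shuffle makes the split repeatable between runs.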

Key Outcome

Identified high-risk customers 3 months in advance, enabling targeted retention campaigns with potential savings of 250K annually.

NLP Complaint Classification with BERT

Problem: Manual review and prioritisation of customer complaints at scale

Developed a transformer-based NLP system using BERT to automatically classify and prioritise customer complaints. Fine-tuned pre-trained BERT models on domain-specific complaint data to achieve high accuracy in multi-class classification, enabling automated routing and priority assignment.

Python, PyTorch, BERT, Transformers, NLP

Key Outcome

Automated complaint classification and prioritisation, reducing manual review time and enabling faster response to high-priority customer issues. Improved customer satisfaction through intelligent routing to appropriate departments.

ML Model Deployment with FastAPI

Problem: Deploying ML models for real-time inference in production environments

Implemented production-style deployment of machine learning models as a REST API using FastAPI. Created scalable, high-performance endpoints for real-time inference with proper error handling, input validation, and API documentation. Demonstrates MLOps best practices for model serving.

Python, FastAPI, Docker, REST API, MLOps

Key Outcome

Built production-ready API infrastructure for ML model serving with automatic documentation, request validation, and containerisation. Enabled seamless integration of ML models into production applications with low latency and high reliability.

Time Series Demand Forecasting

Problem: Inaccurate demand predictions leading to inventory inefficiencies

Developed a time-series forecasting system to predict demand patterns using both statistical and deep learning approaches. Implemented multiple forecasting models including ARIMA, Prophet, and LSTM networks to capture seasonal trends, cyclical patterns, and external factors affecting demand.

Python, TensorFlow, LSTM, ARIMA, Prophet

Key Outcome

Improved forecast accuracy through ensemble methods combining statistical and deep learning models. Enabled better inventory planning and reduced stockouts by providing reliable demand predictions across multiple time horizons.
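One common way to combine forecasts from several models is to weight each model by the inverse of its validation error. The sketch below shows that idea only; the project description does not say how its ensemble was built, so the weighting scheme, the model names `statistical` and `neural`, and all numbers are illustrative assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def inverse_error_weights(validation_errors):
    """Give lower-error models a larger share; weights sum to 1."""
    # Clamp tiny errors so a perfect validation score cannot divide by zero.
    inverse = {name: 1.0 / max(err, 1e-9) for name, err in validation_errors.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


def combine_forecasts(forecasts, weights):
    """Weighted average of the per-model forecasts at each horizon step."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts)
        for t in range(horizon)
    ]


# Illustrative validation window: actual demand and each model's predictions.
validation_actual = [100.0, 120.0, 110.0]
validation_predictions = {
    "statistical": [90.0, 125.0, 115.0],
    "neural": [104.0, 118.0, 108.0],
}
errors = {
    name: mean_absolute_error(validation_actual, preds)
    for name, preds in validation_predictions.items()
}
weights = inverse_error_weights(errors)

# Illustrative forecasts for the next two periods.
future_forecasts = {"statistical": [130.0, 140.0], "neural": [126.0, 150.0]}
ensemble_forecast = combine_forecasts(future_forecasts, weights)
```

The combined forecast always lies between the individual forecasts at each step, so a single poorly performing model cannot dominate the result.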

ML & Data Engineering Approach

Feature Engineering

I believe great models start with great features. I invest time in understanding domain context, creating meaningful transformations, and validating feature importance. Every feature must earn its place through rigorous evaluation.

Model Evaluation

Accuracy alone is never enough. I select metrics aligned with business objectives, implement proper cross-validation strategies, and test models on realistic scenarios. Understanding when a model fails is as important as knowing when it succeeds.
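A small worked example shows why accuracy can mislead on imbalanced data such as churn. The counts below are invented for illustration (100 customers, 10 of whom churn), and the `classification_metrics` helper is a hypothetical standard-library sketch rather than code from any of the projects.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Define each ratio as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented data: 10 churners followed by 90 non-churners.
y_true = [1] * 10 + [0] * 90

# Baseline that predicts "no churn" for everyone.
baseline_pred = [0] * 100
# Model that finds 7 of the 10 churners and raises 5 false alarms.
model_pred = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 85

baseline = classification_metrics(y_true, baseline_pred)
model = classification_metrics(y_true, model_pred)
```

Here the baseline reaches 90% accuracy while identifying no churners at all (recall of 0), whereas the model's accuracy is only slightly higher at 92% but its recall is 70%. Which of these matters depends on the business cost of a missed churner versus a false alarm.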

Data Leakage Prevention

I'm vigilant about temporal integrity and information leakage. Features are engineered with production constraints in mind, ensuring train-test splits respect time boundaries and that no future information contaminates predictions.
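Respecting time boundaries usually means splitting on a cutoff date instead of shuffling rows. The following is a minimal sketch of that idea; the `time_based_split` helper, the record fields, and the dates are hypothetical examples, not data from a real project.

```python
from datetime import date


def time_based_split(records, cutoff):
    """Train on events strictly before the cutoff, test on events on or after it."""
    train = [r for r in records if r["event_date"] < cutoff]
    test = [r for r in records if r["event_date"] >= cutoff]
    return train, test


# Hypothetical customer records with the date each label was observed.
records = [
    {"customer_id": 1, "event_date": date(2023, 1, 15), "churned": 0},
    {"customer_id": 2, "event_date": date(2023, 3, 2), "churned": 1},
    {"customer_id": 3, "event_date": date(2023, 6, 20), "churned": 0},
    {"customer_id": 4, "event_date": date(2023, 8, 5), "churned": 1},
    {"customer_id": 5, "event_date": date(2023, 11, 30), "churned": 0},
]

train_rows, test_rows = time_based_split(records, cutoff=date(2023, 7, 1))

# Sanity check: every training event precedes every test event.
latest_train = max(r["event_date"] for r in train_rows)
earliest_test = min(r["event_date"] for r in test_rows)
assert latest_train < earliest_test
```

A random shuffle on the same records could place later events in the training set and earlier ones in the test set, which lets the model learn from information it would not have at prediction time.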

Production Readiness

Models must perform reliably in production. I design for scalability, implement comprehensive error handling, monitor performance drift, and ensure models can be retrained and redeployed with minimal friction.

Ethical AI & Privacy

I'm committed to responsible AI development. This means respecting data privacy, identifying and mitigating bias, ensuring model transparency where required, and considering the broader impact of automated decisions on people.

Continuous Learning

The ML landscape evolves rapidly. I stay current with research, experiment with new techniques, and maintain a pragmatic approach, adopting innovations when they solve real problems, not just for novelty.

Professional Experience

Web Development Intern

ShashWatt Energy, Hubli, India (February 2024 - July 2024)

Redesigned and optimised the company's Wix-based website, improving mobile responsiveness and page load speed by 35%
Collaborated with the design and marketing teams to enhance content layout and visual appeal, resulting in a 25% increase in user engagement and lead queries

Education

MSc in Applied Artificial Intelligence

University Name, 2025 - 2026

Pursuing advanced studies in applied AI, focusing on cutting-edge machine learning techniques, deep learning architectures, and real-world AI system deployment. Specialising in production-ready AI solutions with emphasis on scalability, ethics, and business impact.

BTech in Computer Science and Information Technology

University Name, 2020 - 2024, CGPA: 8.40/10.0

Strong foundation in computer science fundamentals, data structures, algorithms, and software engineering. Developed expertise in machine learning, data analytics, and modern development practices. Completed projects in predictive modelling, NLP, and data-driven applications.

Get In Touch

sourabha.kallapurk@gmail.com

github.com/SourabhaKK

linkedin.com/in/sourabhakk

United Kingdom

Turning Data Into Decisions
