Inference Engineer based in New York.

I'm Prem — an inference engineer focused on low-latency LLM serving, multi-model routing, and cost-efficient production AI infrastructure. I've deployed scalable AI systems on GCP and AWS with a hard bias toward latency, reliability, and the cost-latency-accuracy tradeoff — not whichever framework is trending this quarter.

Right now I'm finishing my M.S. in Artificial Intelligence at the Rochester Institute of Technology and shipping Emotion Engine, a research project showing that emotion-like internal states (fear, grief, suspicion) emerge from a 72-feature predictive LSTM with no hardcoded rules (p = 3.3e-113 across 205K agent-step records). Before RIT, I built a production LLM routing layer at Concentrix + Webhelp: routing across 3+ foundation models at 1.5 s end-to-end latency, with an 18% cost reduction and 42% fewer incidents.

The throughline of everything on this site: Simple systems scale better than clever ones.

Education

  • M.S. in Artificial Intelligence, Rochester Institute of Technology, Aug 2024 – May 2026
  • B.Tech in Computer Science and Engineering, National Institute of Technology, Silchar, Aug 2020 – May 2024

Experience

  • Generative AI Engineer, Concentrix + Webhelp, Feb 2024 – Jul 2024
  • Data Science Intern, AlphaBits Technologies, Aug 2023 – Jan 2024
  • ML Engineer Intern, iNeuron.ai, Jun 2023 – Aug 2023

Backend & Programming
Python · SQL · FastAPI · REST APIs · PostgreSQL · MongoDB · PySpark
Distributed Systems & Data
Kafka · Streaming pipelines · Feature stores · Data pipelines · Vector DBs · System design
Cloud & Infrastructure
GCP · AWS (EC2, S3, Bedrock, SageMaker) · Docker · Kubernetes · Terraform · CI/CD · MLflow
Machine Learning
PyTorch · TensorFlow · Scikit-learn · LLMs · RAG · NLP · Computer Vision · MCP