Saish's Page

👨‍💻 About Me

AI Software Engineer holding a Master’s in Information Management (Data Science & Analytics) from the University of Illinois at Urbana-Champaign. Experienced in building RAG pipelines, vector indexes, and LLM-powered applications on GCP and AWS. Key skills include NLP, Prompt Engineering, Multimodal Data Ingestion, Machine Learning, Deep Learning, and Production-grade Data Pipelines, built with Python, PyTorch, TensorFlow, SQL, and Cloud-native tools. Passionate about designing and productionizing AI systems that enable semantic search, retrieval, and agentic applications.

🚀 LLM Journey

I have gained hands-on experience with RAG (Retrieval-Augmented Generation), Prompt Engineering, and vector-based retrieval systems, building 50+ vector indexes and 20+ ETL pipelines using Vertex AI and Elasticsearch to power enterprise-scale conversational applications and agentic AI solutions. Through my academic work, I developed expertise in language modeling, neural machine translation, text summarization, and dialogue systems, implementing projects with PyTorch, TensorFlow, Hugging Face, NLTK, spaCy, and LangChain, including a medical chatbot for dental queries. I have also conducted RAG application testing, developed multimodal data ingestion frameworks, and co-authored research on dialogue system architectures, continuously enhancing pipelines with state-of-the-art techniques to improve retrieval performance and response quality.

🤖 Generative AI Experience

Multimodal Ingestion Framework: Designed a self-serve, real-time framework using Agile methodologies to ingest multimodal data; implemented Gemini-based extraction for single-pass parsing of text, tables, and images.
RAG & Vector Search: Deployed 13M-vector Elasticsearch databases and built Vertex AI indexes powering 50+ conversational apps; engineered JWT-secured APIs for hybrid search testing.
Pipeline Engineering: Led a team deploying 20+ ETL pipelines with automated CI/CD (Jenkins, Docker, Cloud Composer); reduced CSV processing time by 80% via custom microservices.
Cloud-Native Architecture: Built autoscaling Cloud Run services and Vertex AI embedding modules with rate limiting and parallel processing for scalable workflows.
Observability & Evaluation: Implemented BigQuery document-level tracking for pipeline observability; optimized response quality using Galileo Evaluate and advanced chunking strategies.

🛠 Technical Skills

Domain	Skills
Programming	Python (PyTorch, TensorFlow, pandas, scikit-learn, NLTK, spaCy, NumPy, Matplotlib, pymoo), R, SQL, PySpark
Tools & Platforms	Jupyter Notebook, RStudio, SQLite, PostgreSQL, Git, Tableau, HuggingFace, Airflow, Doccano, Hadoop, Spark, Weights & Biases, LangChain, OpenAI, Ollama
AWS Cloud	S3, Glue, Lambda, Athena, EC2, SageMaker, Bedrock
GCP Cloud	Cloud Storage, BigQuery, Cloud Run, Vertex AI, Cloud Composer

🔬 Research

Assessment of NER Tools for detecting Funding Organizations (Information Quality Lab, UIUC)

Named Entity Recognition (NER) is a key element of the information extraction pipeline in Natural Language Processing (NLP). NER helps discover valuable insights from textual documents by detecting entities mentioned in unstructured text and classifying them into predefined categories such as person, organization, location, and date. In recent years, the NLP community has developed several NER tools that detect entities such as organization names. Research funders play an important role in science, and analyzing research funding acknowledgement statements is needed to better understand how accurately NER tools identify sponsors in this domain. My research explores how well existing NER tools recognize funding organizations. Specifically, the most common existing NER tools were evaluated to identify scenarios that need improvement, which will enable new research on Named Entity Recognition in the research funding field.
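As a rough illustration of how such an evaluation can be scored, the sketch below compares the organization names a tool extracts from one acknowledgement statement against hand-labeled funder names. The exact-match precision/recall/F1 metric, the `normalize` and `score_funders` helpers, and the sample names are assumptions for illustration only, not the scoring procedure used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivial variants still match."""
    return " ".join(name.casefold().split())


def score_funders(predicted: list[str], gold: list[str]) -> Scores:
    """Exact-match scoring of predicted vs. gold funder names (hypothetical helper)."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    # F1 is defined as 0 when nothing was matched, avoiding division by zero.
    f1 = 2 * precision * recall / (precision + recall) if true_positives else 0.0
    return Scores(precision, recall, f1)


# Made-up example: one funder found (with spacing/case noise), one missed,
# and one non-funder organization wrongly reported.
gold = ["National Science Foundation", "Alfred P. Sloan Foundation"]
predicted = ["national science  foundation", "University of Illinois"]
example = score_funders(predicted, gold)
print(example)
```

Exact matching after light normalization is the strictest reasonable choice; partial-overlap or alias-aware matching would credit near misses such as abbreviations, at the cost of a less clear-cut metric.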

🔗 View Repository

🏗 Projects

🧠 NLP and LLM Projects

Dental AI Assistant: A medical chatbot built using a Retrieval QA chain and prompt tuning (NLP)
Privacy Policy Document Summarization
Neural Machine Translation of Sentences from English to French
Disaster Tweet Classifier
Character Analysis using Sentiment Score