I'm Raj Shaikh
Data Scientist | Mathematics & Statistics Enthusiast | Open-Source Believer
Hello! I'm a Data Scientist with a solid foundation in Mathematics, Statistics, and Software Engineering. I love diving into data, uncovering insights, and sharing solutions with the open-source community. You'll often find me exploring new frameworks, building end-to-end data pipelines, or experimenting with the latest AI/ML techniques.
- Daily AI/ML Blogs: I share my thoughts and experiments on AI/ML topics at learn.mathnai.com
- Strong believer in community: I think we can achieve a lot more together by sharing knowledge.
Technologies: OpenAI API, PostgreSQL, DuckDuckGo API, LlamaIndex
- Built an intelligent, multi-agent LLM chatbot to automate travel planning, including personalized itineraries, real-time booking, and weather updates.
- Achieved 90% accuracy in travel-related intent recognition and integrated third-party APIs for flight and hotel bookings.
- Links: GitHub | Article
Technologies: TensorFlow, scikit-learn, LIME, SHAP
- Developed a machine learning model using a Multilayer Perceptron (MLP) to predict credit card approvals with 80% accuracy.
- Implemented LIME and SHAP to explain model predictions, enhancing transparency and trust.
- Links: GitHub | Article
Technologies: OpenAI Gym, Pandas
- Designed a dynamic treatment allocation model to optimize cancer treatment outcomes in clinical trials.
- Achieved a 10% increase in success rates using an epsilon-greedy multi-armed bandit algorithm while reducing trial costs.
- Links: GitHub | Article
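For readers unfamiliar with the approach, here is a minimal sketch of epsilon-greedy allocation in plain Python. It is an illustration only, not the project's code (which used OpenAI Gym and Pandas): the function name `epsilon_greedy`, the parameters, and the success probabilities are all hypothetical and not taken from any trial data.

```python
import random


def epsilon_greedy(success_probs, n_patients=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy allocation across treatment arms.

    success_probs: hypothetical true success rate per arm (hidden from the agent).
    Returns (pulls per arm, estimated success rate per arm, total successes).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms          # how many patients each arm received
    estimates = [0.0] * n_arms    # running mean of observed outcomes per arm
    total_successes = 0

    for _ in range(n_patients):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated binary outcome (1 = success, 0 = failure).
        reward = 1 if rng.random() < success_probs[arm] else 0

        # Incremental update of the running mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_successes += reward

    return pulls, estimates, total_successes


if __name__ == "__main__":
    # Made-up success rates for three illustrative arms.
    pulls, estimates, successes = epsilon_greedy([0.30, 0.45, 0.60])
    print("patients per arm:", pulls)
    print("estimated success rates:", [round(e, 3) for e in estimates])
    print("total successes:", successes)
```

A small `epsilon` keeps most patients on the arm that currently looks best while still reserving a fraction of allocations for exploring the alternatives.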
MLOps CI/CD Pipeline Setup
Technologies: GitHub Actions, Docker, MLflow, DVC
- Implemented a CI/CD pipeline to automate linting, testing, and deployment of a machine learning model to AWS EC2.
- Tracked experiments with MLflow and versioned datasets with DVC for seamless collaboration and reproducibility.
- Links: GitHub | Article
- Data Analytics: Advanced visualization and storytelling through data.
- Machine Learning: Predictive models and optimization techniques.
- Deep Learning: Neural networks and generative models.
- Natural Language Processing: Language models, NER, and sentiment analysis.
- Statistics: Statistical methods and hypothesis testing.
Languages & Tools
- Python, Java, SQL
- Git, Docker, Jenkins, dbt
- Snowflake, Databricks, AWS, Azure
Machine Learning & Deep Learning
- Supervised & Unsupervised Learning
- Ensemble Methods (Random Forest, XGBoost)
- Neural Networks (CNNs, RNNs), Transfer Learning
- Autoencoders, Graph Neural Networks
- Model Optimization & Generative Models
NLP & Large Language Models (LLMs)
- Named Entity Recognition (NER), Sentiment Analysis
- Language Modeling, BERT, GPT
- Parameter-Efficient Fine-Tuning (PEFT), LoRA, RAG
- Feature Extraction, Topic Modeling
Data Engineering
- PySpark, Azure Databricks
- Azure Data Factory, Power BI
- Snowflake (Data Warehousing)
Frameworks & Libraries
- scikit-learn, numpy, pandas
- TensorFlow, Keras, PyTorch
- gensim, NLTK, spaCy
- Flask, Django
- SHAP, LIME
Statistics & Optimization
- Regression Models, Hypothesis Testing
- Dimensionality Reduction (PCA, t-SNE)
- Time Series Analysis, Feature Engineering
Tools & Platforms
- AWS, Azure, Heroku
- Hugo (Static Site Generator)
- Jira, Confluence
"Sharing knowledge and insights is what drives progress in the AI community."