This project involves developing a comprehensive data-driven system using MongoDB to evaluate, analyze, and rank recipes based on complexity and ingredient substitutions.
This project uses publicly available datasets that do not contain any personally identifiable information (PII). All user reviews and recipe data used are anonymized. Any resemblance to real persons, living or dead, or actual recipes is purely coincidental.
Any external resources or references used have been properly cited. Unauthorized copying or distribution of this project, in whole or in part, is strictly prohibited and would be penalized in brutal ways.
- Project Overview
- Data Source
- Queries
  - Query 1: Ingredient Substitution Identification
  - Query 2: Recipe Complexity Calculation
  - Query 3: Utensil Extraction and Wash Time Calculation
- Results
- Usage
- References
- Recipes Dataset: Contains 522,517 recipes with details such as cooking times, servings, ingredients, nutrition, and instructions.
- Reviews Dataset: Contains 1,401,982 reviews from 271,907 users, including author information, ratings, and review text.

The dataset is sourced from Kaggle. Link:
- Objective: Identify and list ingredient substitutions from reviews.
- Method:
  - Use $lookup to join reviews with recipes on RecipeId.
  - Extract the words before and after keywords such as "substitute," "replaced," and "instead" in the review text.
  - Check whether these words are ingredients and filter out non-ingredient words.
  - Return unique pairs of before and after ingredients.
- Result: Unique pairs of substituted ingredients.
- Columns Used: RecipeId, Review, RecipeIngredientParts
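
The substitution method above can be sketched in plain Python to show the matching logic outside the database. This is only an illustration, not the project's actual pipeline: the `recipes` collection name in `LOOKUP_STAGE`, the helper `substitution_pairs`, and the choice to take the nearest known ingredient on each side of a keyword are all assumptions, and multi-word ingredients are not handled.

```python
import re

# Keywords named in the method description.
KEYWORDS = ("substitute", "replaced", "instead")

# Hypothetical $lookup stage; the collection name "recipes" and the
# output field "recipe" are assumptions, not taken from the project.
LOOKUP_STAGE = {
    "$lookup": {
        "from": "recipes",
        "localField": "RecipeId",
        "foreignField": "RecipeId",
        "as": "recipe",
    }
}


def substitution_pairs(review, ingredients):
    """Return unique (before, after) ingredient pairs around each keyword.

    Assumption: "before" is the nearest known ingredient preceding the
    keyword and "after" is the nearest one following it.
    """
    known = {name.lower() for name in ingredients}
    words = re.findall(r"[a-z]+", review.lower())
    pairs = set()
    for idx, word in enumerate(words):
        if not word.startswith(KEYWORDS):
            continue
        # Keep only words that are known ingredients on each side.
        before = [w for w in words[:idx] if w in known]
        after = [w for w in words[idx + 1:] if w in known]
        if before and after and before[-1] != after[0]:
            pairs.add((before[-1], after[0]))
    return pairs


if __name__ == "__main__":
    review = "I used honey instead of sugar and it was great"
    print(substitution_pairs(review, ["sugar", "honey", "flour"]))
```

A review such as "I replaced butter with margarine" yields no pair under this sketch, because no ingredient appears before the keyword. A real implementation would need a rule for that phrasing.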
- Objective: Compute a complexity score for recipes.
- Method:
  - Split RecipeInstructions into steps and count unique action words.
  - Calculate a complexity score based on the number of steps and unique actions.
  - Group by RecipeCategory and calculate the average complexity score and rating.
- Result: Average complexity scores and ratings for each recipe category.
- Columns Used: RecipeId, RecipeCategory, RecipeInstructions, AggregatedRating
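
As a rough illustration of the complexity calculation, the sketch below scores a recipe and averages by category in plain Python. The exact formula is not stated here, so the sum of step count and unique action count is an assumption, as are the `ACTION_WORDS` list, the helper names, and the premise that instructions are stored as a single string with steps separated by periods or newlines.

```python
import re
from collections import defaultdict

# Hypothetical action-word list; the project's real list is not given.
ACTION_WORDS = {"mix", "stir", "bake", "chop", "boil", "fry", "whisk", "pour"}


def complexity_score(instructions):
    """Assumed formula: number of steps + number of unique action words."""
    steps = [s.strip() for s in re.split(r"[.\n]+", instructions) if s.strip()]
    words = set(re.findall(r"[a-z]+", instructions.lower()))
    return len(steps) + len(words & ACTION_WORDS)


def average_by_category(recipes):
    """Average complexity score and rating per RecipeCategory."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # score sum, rating sum, count
    for recipe in recipes:
        entry = totals[recipe["RecipeCategory"]]
        entry[0] += complexity_score(recipe["RecipeInstructions"])
        entry[1] += recipe["AggregatedRating"]
        entry[2] += 1
    return {
        category: {"avg_complexity": s / n, "avg_rating": r / n}
        for category, (s, r, n) in totals.items()
    }


if __name__ == "__main__":
    sample = [
        {"RecipeCategory": "Dessert",
         "RecipeInstructions": "Mix. Mix again.",
         "AggregatedRating": 4.0},
        {"RecipeCategory": "Dessert",
         "RecipeInstructions": "Chop the onions. Fry the onions. Stir well.",
         "AggregatedRating": 5.0},
    ]
    print(average_by_category(sample))
```

Repeated actions count once, so "Mix. Mix again." scores two steps plus one unique action.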
- Objective: Determine the complexity of washing utensils used in recipes.
- Method:
  - Use regex to find utensils in RecipeInstructions.
  - Map each utensil to a specific wash time.
  - Group recipes by RecipeCategory and AuthorName.
  - Calculate total and maximum wash time for each recipe.
- Result: Total wash times and complexity levels for each recipe.
- Columns Used: RecipeId, Name, RecipeCategory, AuthorName, RecipeInstructions
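
The utensil extraction and wash-time steps can be approximated with a small Python sketch. The utensil list and the per-utensil minutes in `WASH_TIME_MIN` are invented for illustration, and the helper names are hypothetical. The thresholds behind the complexity levels are not described, so the sketch stops at total and maximum wash time.

```python
import re
from collections import defaultdict

# Hypothetical utensil -> wash time (minutes) map; values are illustrative only.
WASH_TIME_MIN = {"bowl": 1, "pan": 3, "pot": 4, "whisk": 2, "skillet": 3}

# Match whole utensil words, allowing a plural "s" (e.g. "bowls").
UTENSIL_RE = re.compile(
    r"\b(" + "|".join(WASH_TIME_MIN) + r")s?\b", re.IGNORECASE
)


def wash_times(instructions):
    """Find distinct utensils and compute total and maximum wash time."""
    utensils = {m.group(1).lower() for m in UTENSIL_RE.finditer(instructions)}
    times = [WASH_TIME_MIN[u] for u in utensils]
    return {
        "utensils": sorted(utensils),
        "total": sum(times),
        "max": max(times, default=0),
    }


def group_wash_times(recipes):
    """Group per-recipe wash times by (RecipeCategory, AuthorName)."""
    groups = defaultdict(list)
    for recipe in recipes:
        key = (recipe["RecipeCategory"], recipe["AuthorName"])
        groups[key].append(
            {"Name": recipe["Name"], **wash_times(recipe["RecipeInstructions"])}
        )
    return dict(groups)


if __name__ == "__main__":
    text = "Whisk eggs in a bowl, then heat the pan. Rinse the bowls."
    print(wash_times(text))
```

Each utensil is counted once per recipe, and a word like "whisk" is matched even when used as a verb, which is a known weakness of a plain keyword regex.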
- Substitution Analysis: Provides insights into common ingredient substitutions used by cooks.
- Complexity vs. Rating: Visualizes the relationship between recipe complexity and user ratings.
- Wash Time Complexity: Breaks down the cleaning effort required for different recipes, categorized by author and type.