By Daniel Emery
This is a Python tool built with Streamlit that automates redirect mapping during site migrations by matching URLs on the old site to URLs on the new site based on content similarity. Through the web interface, users interactively choose which columns of their CSV data to use for URL matching.
- `faiss-cpu`: A library for efficient similarity search and clustering of dense vectors.
- `sentence-transformers`: A Python framework for state-of-the-art sentence, text, and image embeddings.
- `pandas`: An open-source data manipulation and analysis library.
- `streamlit`: An open-source app framework for Machine Learning and Data Science projects.
- Prepare `origin.csv` and `destination.csv` with the page URL in the first column, followed by titles, meta descriptions, and headings. Remove unwanted URLs and duplicates.
- Run the Streamlit app using the Redirect Matchmaker Script.
- Upload the origin.csv and destination.csv files through the Streamlit interface.
- Select columns for matching using the interactive Streamlit widgets.
- Click "Run" to initiate the matching process.
- The app will process the data and create `output.csv` with matched URLs and a similarity score.
- Review and manually correct any inaccuracies in `output.csv` directly in the app.
- Download `output.csv` directly from the Streamlit app interface.
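
To make the matching step more concrete, here is a minimal sketch of pairing each origin URL with its most similar destination URL. It is not the tool's actual code: to stay dependency-free it swaps in word-count vectors and a brute-force cosine comparison where the real app presumably relies on `sentence-transformers` embeddings and a `faiss-cpu` index. The function names, the `url` key, and the output field names are assumptions made for illustration.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a sentence-embedding model: plain word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_urls(origin_rows, destination_rows, columns):
    """Pair every origin row with the most similar destination row.

    Rows are dicts holding a "url" key plus the text columns selected
    for matching (for example title or meta description).
    """
    if not destination_rows:
        raise ValueError("destination_rows must not be empty")

    def text_of(row):
        # Concatenate the selected columns into one string per page.
        return " ".join(str(row.get(col, "")) for col in columns)

    dest_vecs = [(row["url"], toy_embed(text_of(row))) for row in destination_rows]
    matches = []
    for row in origin_rows:
        vec = toy_embed(text_of(row))
        # Brute-force search: score every destination and keep the best.
        best_url, best_score = max(
            ((url, cosine(vec, dvec)) for url, dvec in dest_vecs),
            key=lambda pair: pair[1],
        )
        matches.append(
            {
                "origin_url": row["url"],
                "matched_url": best_url,
                "similarity_score": round(best_score, 4),
            }
        )
    return matches


origin = [
    {"url": "/old/red-shoes", "title": "Red running shoes"},
    {"url": "/old/contact", "title": "Contact our team"},
]
destination = [
    {"url": "/new/contact-us", "title": "Contact the team"},
    {"url": "/new/shoes/red", "title": "Red shoes for running"},
]
for match in match_urls(origin, destination, columns=["title"]):
    print(match)
```

A low similarity score in a sketch like this flags a pair worth checking by hand, which is the role the manual review step plays in the app.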
Note: Use only URLs that return a 200 status code and carry no tracking parameters such as UTM tags.
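
The cleanup described in the note can be partly scripted before the CSV files are uploaded. The sketch below strips query parameters that start with `utm_` and drops duplicate URLs using only the standard library. The helper names and the `utm_` prefix list are assumptions for illustration, and checking for a 200 status code is left out because it requires live HTTP requests.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking_params(url, prefixes=("utm_",)):
    """Return the URL without query parameters whose names start with a given prefix."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(prefixes)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_url_list(urls):
    """Strip tracking parameters, then drop blanks and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for url in urls:
        stripped = strip_tracking_params(url.strip())
        if stripped and stripped not in seen:
            seen.add(stripped)
            cleaned.append(stripped)
    return cleaned


raw_urls = [
    "https://example.com/a?utm_medium=email",
    "https://example.com/a",
    " https://example.com/b ",
]
print(clean_url_list(raw_urls))
```

Because the remaining parameters are re-encoded with `urlencode`, unusual characters in a query string may come back in a slightly different but equivalent encoding.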