SPRINT is an open-source GitHub application that acts as an issue management assistant for developers, project managers, computer science students, and educators. SPRINT has 3 features:
- Identifying similar issues
- Predicting issue severity
- Locating potential buggy code files
SPRINT can be installed as a GitHub app on any GitHub repository. Use the installation link given below to install SPRINT on one or more repositories.
This repository hosts the code, resources, and supporting materials for the SPRINT Tool. It is organized into the following folders:
This folder contains all materials necessary to replicate the experiments, evaluations, and studies conducted for SPRINT. It is further divided into the following subfolders:
- Evaluation: Includes model fine-tuning scripts, evaluation guidelines, and results for the three main features of SPRINT.
- SPRINT Test Cases: Provides sample test cases to test the three features of SPRINT.
- User Study: Contains the user study questionnaire and survey results related to SPRINT.
This folder contains the core codebase for the SPRINT tool. The accompanying README.md file provides detailed instructions on how to run and customize SPRINT to suit your requirements.
This folder holds images and other utility files used throughout the repository, including visuals for documentation purposes.
Feel free to explore these folders for a comprehensive understanding of SPRINT and its functionality.
When a new issue is reported, SPRINT fetches that issue and analyzes it. After analysis, SPRINT generates comments and labels for its three features:
- Similar Issue Detection: SPRINT generates a comment listing the ID, title, and URL of each potentially similar issue (if any). Users can click a URL to inspect that issue further. If one or more similar issues exist, SPRINT labels the reported issue as "Duplicate".
- Severity Prediction: SPRINT classifies the reported issue into one of five severity levels:
  - Blocker: Issue stops all operations; requires immediate resolution.
  - Critical: Issue causes major failure; disrupts core functionality.
  - Major: Issue affects primary features but has workarounds.
  - Minor: Issue impacts secondary features; low operational impact.
  - Trivial: Issue has minimal or cosmetic effects only.

  After classifying the severity level, SPRINT creates a label for that severity and attaches it to the reported issue.
- Bug Localization: SPRINT generates a comment containing a list of code files, along with their URLs, that likely require modification to resolve the issue. The URLs take users to the respective code files for further inspection.
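As an illustration of the similar-issue output described above, here is a minimal sketch of how such a comment and label could be assembled. The function name `format_similar_issues_comment` and the `id`/`title`/`url` dictionary keys are assumptions made for this example; they are not taken from SPRINT's code.

```python
from typing import Dict, List, Tuple


def format_similar_issues_comment(
    similar: List[Dict[str, object]],
) -> Tuple[str, List[str]]:
    """Build a Markdown comment body and the labels to attach.

    Each entry in `similar` is assumed to carry "id", "title", and "url" keys.
    """
    if not similar:
        # No similar issues: nothing to list and no label to attach.
        return "No similar issues were found.", []

    lines = ["Potential similar issues:"]
    for issue in similar:
        # One bullet per issue: ID, title, and a clickable URL.
        lines.append(f"- #{issue['id']}: [{issue['title']}]({issue['url']})")
    return "\n".join(lines), ["Duplicate"]


body, labels = format_similar_issues_comment(
    [
        {
            "id": 12,
            "title": "Crash on startup",
            "url": "https://github.com/org/repo/issues/12",
        }
    ]
)
print(body)
print(labels)
```

The "Duplicate" label is returned only when at least one similar issue exists, which mirrors the labeling rule stated above.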
SPRINT is implemented using Python's Flask framework under the following architecture:
- Issue Indexer:
  - Fetches and stores existing issues in a local database for efficient access and analysis.
  - Applies page-based indexing to partition issues for efficient fetching.
- GitHub Event Listener:
  - Monitors new issues using GitHub Webhooks and fetches them for processing.
  - Sends the reported issues and code files to other components for further analysis.
  - Formats the output of the feature components and sends it back to GitHub.
- Issue Management Component:
  - Identifies duplicate issues by analyzing textual similarity.
  - Classifies reported issues into five severity levels: blocker, critical, major, minor, or trivial.
  - Predicts potential buggy code files that might require modification to solve the issue.
- Other Utilities:
  - Process Pool Executor: Enables multiprocessing to analyze issues concurrently for faster processing.
  - Data Storage: SPRINT uses a local relational database to store and index issues for efficient fetching and synchronization with GitHub.
  - Model Library: Leverages fine-tuned machine learning models for text analysis and predictions.
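To make the idea of page-based indexing more concrete, the following is a rough sketch using Python's built-in `sqlite3` module. The table layout, the function names, and the page size of 100 are assumptions for illustration only; SPRINT's actual schema and indexing logic may differ.

```python
import sqlite3
from typing import Dict, List

PER_PAGE = 100  # assumed page size; GitHub's REST API allows up to 100 items per page


def index_issues(
    conn: sqlite3.Connection, repo: str, issues: List[Dict[str, object]]
) -> None:
    """Store issues and record which page each one belongs to."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issues ("
        "repo TEXT, number INTEGER, title TEXT, page INTEGER, "
        "PRIMARY KEY (repo, number))"
    )
    rows = [
        # The position in the listing determines the 1-based page number.
        (repo, issue["number"], issue["title"], position // PER_PAGE + 1)
        for position, issue in enumerate(issues)
    ]
    conn.executemany("INSERT OR REPLACE INTO issues VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def issues_on_page(conn: sqlite3.Connection, repo: str, page: int) -> List[int]:
    """Return the issue numbers stored for one page of a repository."""
    cursor = conn.execute(
        "SELECT number FROM issues WHERE repo = ? AND page = ? ORDER BY number",
        (repo, page),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
sample = [{"number": n, "title": f"Issue {n}"} for n in range(1, 251)]
index_issues(conn, "org/repo", sample)
print(len(issues_on_page(conn, "org/repo", 3)))  # 50 issues land on the third page
```

Keeping the page number next to each issue lets a component load one partition at a time instead of scanning every stored issue.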
We welcome contributions of any kind. If you have an idea for a feature or enhancement, or if you find a bug, please open an issue (or a pull request). If you have questions, feel free to contact us: Ahmed Adnan (bsse1131@iit.du.ac.bd), Antu Saha (asaha02@wm.edu), and Oscar Chaparro (oscarch@wm.edu).
SPRINT is a tool for bug report duplicate detection, severity prediction and bug localization. A user can run SPRINT and customize it by following the instructions given below. We have also made our .env file public so that users can get an idea of which variable names to use and which values are required in those variables.
Step 1:
Clone the repository
Step 2:
Download the Models
You can download our fine-tuned models for the 3 features from here: models.
After downloading, put them in your preferred location and add the path of each downloaded folder (the folders are named after the features, e.g. 'modelDupBr', 'modelPrioritySeverity') to the `.env` file. Add the model paths for the 3 features in the variables 'DUPLICATE_BR_MODEL_PATH', 'SEVERITY_PREDICTION_MODEL_PATH', and 'BUGLOCALIZATION_MODEL_PATH'.
You can also use your own fine-tuned models. You just need to add your model path to the `.env` file.
[N.B. The bug localization model (Llama-7b-chat-finetune) requires a GPU of the Ampere family to load its shards and run. The entire project and the models require about 20 GB of space.]
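As a quick sanity check for this step, the sketch below reads the three model path variables and fails early if one is missing. It assumes the values from `.env` have already been loaded into the process environment; the helper name `read_model_paths` and the third folder name in the example are hypothetical.

```python
import os
from typing import Dict, Mapping

MODEL_PATH_VARS = (
    "DUPLICATE_BR_MODEL_PATH",
    "SEVERITY_PREDICTION_MODEL_PATH",
    "BUGLOCALIZATION_MODEL_PATH",
)


def read_model_paths(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the three model paths, raising if any variable is unset or empty."""
    missing = [name for name in MODEL_PATH_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing model path variables: " + ", ".join(missing))
    return {name: env[name] for name in MODEL_PATH_VARS}


# Example with an explicit mapping instead of the real environment.
# 'models/modelBugLocalization' is a made-up placeholder folder name.
paths = read_model_paths(
    {
        "DUPLICATE_BR_MODEL_PATH": "models/modelDupBr",
        "SEVERITY_PREDICTION_MODEL_PATH": "models/modelPrioritySeverity",
        "BUGLOCALIZATION_MODEL_PATH": "models/modelBugLocalization",
    }
)
print(paths["DUPLICATE_BR_MODEL_PATH"])
```

Calling `read_model_paths()` with no argument checks the real environment, which can help catch a misnamed variable before any model is loaded.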
Step 3:
Install ngrok from (https://ngrok.com/download). [This creates a secure tunnel from a public endpoint (the GitHub repository) to a locally running network service (our project running on localhost).]
Step 4:
Create a new GitHub application. You need to go to the following path:
Settings -> Developer's Settings -> New GitHub App
Make sure that, in the ‘Repository Permissions’ section of the GitHub application, Read and Write access is granted for ‘Actions’, ‘Webhooks’ and ‘Issues’. After saving the GitHub application, there will be an option to generate a private access token (this token gives SPRINT permission to fetch and post data to a user’s GitHub repositories). Generate this token, then copy and paste the app ID, client ID, and GitHub private access token/private key into the `.env` file of the cloned code.
Step 5:
Open the cloned project in an IDE and install the required dependencies. You can use our requirements.txt file for this. Then, run the following 2 commands in 2 different terminals:
ngrok http 5000
python main.py
# or
python -m main
Step 6:
Go to the repository where you want to run the tool, then go to:
Settings -> Webhooks -> Add Webhook
Then copy the forwarding address shown after running the command `ngrok http 5000` or `./ngrok http 5000` (if ngrok.exe is in your SPRINT Tool folder) into the Payload URL section of Add Webhook.
Make sure the ‘Which events would you like to trigger this webhook?’ section has the ‘Issues’, ‘Issue Comments’ and ‘Labels’ checkboxes checked.
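Once the webhook is configured, GitHub sends a JSON payload to the Payload URL for each selected event and names the event in the `X-GitHub-Event` header (for example `issues`). The sketch below shows one way to pick out newly opened issues from those deliveries. The function name `extract_new_issue` and the shape of the returned dictionary are assumptions for this example, not SPRINT's actual handler.

```python
from typing import Any, Dict, Optional


def extract_new_issue(
    event_name: str, payload: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return the fields of a newly opened issue, or None for any other delivery."""
    # Only 'issues' deliveries with action 'opened' describe a new issue.
    if event_name != "issues" or payload.get("action") != "opened":
        return None
    issue = payload["issue"]
    return {
        "repo_full_name": payload["repository"]["full_name"],
        "issue_number": issue["number"],
        "title": issue["title"],
        # The body can be null when the reporter leaves the description empty.
        "body": issue.get("body") or "",
    }


sample_payload = {
    "action": "opened",
    "issue": {"number": 7, "title": "Login fails", "body": None},
    "repository": {"full_name": "org/repo"},
}
print(extract_new_issue("issues", sample_payload))
print(extract_new_issue("label", {"action": "created"}))
```

Deliveries for other events, such as label changes, simply return `None` here, so the caller can ignore them.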
Step 7:
Create issues in that repository and see SPRINT work
SPRINT provides three features: Duplicate Issue Detection, Severity Prediction, and Bug Localization. Each feature is implemented as a Python function-based API and can be used within your project. Below is a guide on how to interact with these APIs, the expected inputs, outputs, and how to modify or customize their behavior.
DuplicateDetection(sent1, sent2, issue_id)
Compares a new issue with an existing one to detect duplicates based on textual similarity.
- `sent1`: String. The title or description of the new issue.
- `sent2`: String. The title or description of the existing issue to compare against.
- `issue_id`: Integer. The ID of the issue being compared.
- Returns: Integer
  - `1`: Duplicate.
  - `0`: Not a duplicate.
- Model Path: Update the `DUPLICATE_BR_MODEL_PATH` environment variable in `.env` to change the pre-trained model.
- Model Hyperparameters: Modify the tokenizer settings (`max_length`, `padding`) or replace the model architecture if needed.
- Parallel Processing: The APIs support multiprocessing for faster execution using a multiprocessing pool. Customize the `chunkify` logic or the number of processes (`processes=4`) to suit your system’s capabilities.
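The chunk-based comparison mentioned above can be sketched as follows. This `chunkify` is a guess at what such a helper might look like, and `fake_detection` is a stand-in with the same signature as `DuplicateDetection` so the example runs without a model; neither is copied from SPRINT.

```python
from typing import Callable, List, Sequence, Tuple

Issue = Tuple[int, str]  # (issue_id, text)


def chunkify(items: Sequence[Issue], n_chunks: int) -> List[List[Issue]]:
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(items), n_chunks)
    chunks: List[List[Issue]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(list(items[start:end]))
        start = end
    return chunks


def find_duplicates(
    new_text: str, chunk: List[Issue], detect: Callable[[str, str, int], int]
) -> List[int]:
    """Return the IDs in one chunk that the detector marks as duplicates (1)."""
    return [
        issue_id for issue_id, text in chunk if detect(new_text, text, issue_id) == 1
    ]


def fake_detection(sent1: str, sent2: str, issue_id: int) -> int:
    """Stand-in for DuplicateDetection: exact match after lowercasing."""
    return int(sent1.strip().lower() == sent2.strip().lower())


existing = [
    (1, "App crashes on login"),
    (2, "Typo in README"),
    (3, "app crashes on login"),
    (4, "Slow search"),
    (5, "Dark mode request"),
]
chunks = chunkify(existing, 4)
duplicates = [
    issue_id
    for chunk in chunks
    for issue_id in find_duplicates("App crashes on login", chunk, fake_detection)
]
print([len(chunk) for chunk in chunks])
print(duplicates)
```

Here the chunks are processed one after another for simplicity; with a multiprocessing pool, each chunk would instead be handed to a separate worker process.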
SeverityPrediction(input_text)
Predicts the severity level of a reported issue based on its textual content.
- `input_text`: String. The combined title and description of the issue.
- Returns: String. One of the following severity levels: `Blocker`, `Critical`, `Major`, `Minor`, `Trivial`.
- Model Path: Update the `SEVERITY_PREDICTION_MODEL_PATH` in `.env`.
- Severity Classes: Adjust the severity classification mapping in `GetSeverityPriorityClass` if custom labels are needed: `severity_classes = {0: "Blocker", 1: "Major", 2: "Minor", 3: "Trivial", 4: "Critical"}`
BugLocalization(issue_data, repo_full_name, code_files_list)
Predicts the most likely buggy code files that might require modification to fix the issue.
- `issue_data`: String. The combined title and description of the issue.
- `repo_full_name`: String. The repository’s full name (e.g., `org/repo`).
- `code_files_list`: List of Strings. Paths to all code files in the repository.
- Returns: List of Strings. File paths for the top 5–6 predicted buggy files.
- Model Path: Update the `BUGLOCALIZATION_MODEL_PATH` in `.env`.
- Prompt: Modify the `prompt` string in the function to adjust the question or context provided to the model.
- Quantization Settings: Fine-tune the `BitsAndBytesConfig` if you need to optimize model performance for specific hardware.
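The sketch below illustrates the general pattern of prompting a model with the issue text and the file list, then keeping only the returned paths that actually exist in the repository. The prompt wording, the function names `build_prompt` and `select_known_files`, and the sample model output are all invented for this example; the real prompt lives in SPRINT's function and will differ.

```python
from typing import List


def build_prompt(issue_data: str, code_files_list: List[str]) -> str:
    """Assemble a plain-text prompt from the issue and the candidate files."""
    files = "\n".join(code_files_list)
    return (
        "Given the issue below, list the files most likely to need changes.\n\n"
        f"Issue:\n{issue_data}\n\nFiles:\n{files}\n"
    )


def select_known_files(
    model_output: str, code_files_list: List[str], top_k: int = 5
) -> List[str]:
    """Keep paths from the model output that exist in the repository, in order."""
    known = set(code_files_list)
    selected: List[str] = []
    for line in model_output.splitlines():
        for token in line.split():
            # Drop quoting and punctuation that models often wrap around paths.
            candidate = token.strip("`'\",;:()")
            if candidate in known and candidate not in selected:
                selected.append(candidate)
    return selected[:top_k]


files = ["src/auth/login.py", "src/db.py", "README.md"]
prompt = build_prompt("Login fails with a database timeout", files)
fake_output = "1. src/auth/login.py\n2. src/cache.py\n3. `src/db.py`"
print(select_known_files(fake_output, files))
```

Filtering against `code_files_list` guards against the model naming files that do not exist, such as `src/cache.py` in the sample output.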
All three features rely on pre-trained models, and their paths are defined in `.env`. SPRINT's three features can support many transformer-based models and LLMs. Update the following environment variables to add your customized model paths:
DUPLICATE_BR_MODEL_PATH
SEVERITY_PREDICTION_MODEL_PATH
BUGLOCALIZATION_MODEL_PATH
To use custom models:
- Fine-tune your models for tasks like classification or text similarity.
- Save the models to a local directory.
- Update the corresponding model paths in the `.env` file.
SPRINT is designed to be modular and extensible, allowing developers to easily add new features. This guide provides a brief overview of how to create a new feature as a functional API and integrate it into SPRINT.
Identify the new functionality you want to add. Clearly define:
- Purpose: What problem does the feature solve?
- Inputs: What data does it require?
- Outputs: What will the feature return or produce?
- Set Up the Model/Logic
  - If the feature requires a machine learning model, train or fine-tune a model specific to the task.
  - Save the model and its tokenizer in a local directory.
  - Define the model's path in the `.env` file for easy configuration.
- Implement the API: Write a Python function that encapsulates the feature's logic. Use SPRINT's existing APIs as templates. Ensure:
  - The function accepts clear input parameters.
  - The function processes the inputs and produces outputs efficiently.
  - Proper error handling is included.
- Integrate the New Feature into SPRINT
  - Update the Process Logic: Modify the `processIssueEvents.py` file to include calls to the new feature API. The fetched GitHub issues are available in this file and can be used as your feature requires. Example call of the new feature: `new_feature_result = NewFeature(input_issue_data)` followed by `create_comment(repo_full_name, issue_number, new_feature_result)`.
- Add Configuration: Add environment variables for the new feature in the `.env` file (e.g., model paths, hyperparameters).
- Update Outputs: Decide how the results from the new feature will be presented. For example:
  - Add comments to GitHub issues.
  - Attach labels based on the feature's output.
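Putting the steps above together, a new feature might look roughly like the sketch below. The body of `NewFeature` (a check for reproduction steps) is a hypothetical example, `create_comment` is a stand-in that records comments instead of calling GitHub, and `run_new_feature` is an invented wrapper showing where error handling could go.

```python
from typing import List, Tuple

# Stand-in storage so the example runs without touching GitHub.
posted_comments: List[Tuple[str, int, str]] = []


def create_comment(repo_full_name: str, issue_number: int, body: str) -> None:
    """Stand-in for SPRINT's comment helper: records the comment locally."""
    posted_comments.append((repo_full_name, issue_number, body))


def NewFeature(input_issue_data: str) -> str:
    """Hypothetical feature: flag issues that lack reproduction steps."""
    if not isinstance(input_issue_data, str) or not input_issue_data.strip():
        raise ValueError("input_issue_data must be a non-empty string")
    has_steps = "steps to reproduce" in input_issue_data.lower()
    return "Reproduction steps found." if has_steps else "Please add steps to reproduce."


def run_new_feature(
    repo_full_name: str, issue_number: int, input_issue_data: str
) -> bool:
    """Call the feature and post its result; return False if the input is invalid."""
    try:
        new_feature_result = NewFeature(input_issue_data)
    except ValueError:
        # Skip commenting rather than failing the whole issue pipeline.
        return False
    create_comment(repo_full_name, issue_number, new_feature_result)
    return True


print(run_new_feature("org/repo", 7, "The app crashes when I log in."))
print(run_new_feature("org/repo", 8, ""))
print(posted_comments)
```

Catching the error inside the wrapper means a failure in one feature does not prevent the other features from commenting on the same issue.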