Supervisors: Tuomas Virtanen, Toni Heittola
Author: Bilal Lamsili
WebSED is a web-based application that implements Sound Event Detection (SED) using Pretrained Audio Neural Networks (PANNs). It performs real-time audio tagging and event detection directly in the browser, leveraging the ONNX model format for fast inference. The application ensures user privacy by minimizing backend dependencies.
Note: This project builds on the work of Qiuqiang Kong's Pretrained Audio Neural Networks (PANNs). For more details on the PANNs architecture, refer to the official repository.
You can try the WebSED application by visiting the following link:
WebSED Live Application
- Real-time sound event detection in the browser.
- Powered by PANNs for high-accuracy audio tagging.
- Utilizes the ONNX Runtime for fast and efficient model inference.
- Minimal backend reliance, ensuring user privacy.
Before converting the model, make sure you have the following dependencies installed:
- `torch`
- `onnx`
- Pretrained model weights (`Cnn14_DecisionLevelMax_mAP=0.385.pth`)
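The Python packages can be installed with pip; the pretrained checkpoint itself can be downloaded via the PANNs repository mentioned in the note above:

```bash
pip install torch onnx
```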
To convert the model to the ONNX format, follow these steps:
- Create a Python script `convert_to_onnx.py` with the following content:

```python
import torch
import warnings

from pytorch.models import Cnn14_DecisionLevelMax  # Adjust this path if needed

# Override torch.log10 globally to ensure ONNX compatibility
if hasattr(torch, 'log10'):
    torch.log10 = lambda x: torch.log(x) / torch.log(torch.tensor(10.0))

# Suppress ONNX export warnings related to unsupported operators
warnings.filterwarnings("ignore", category=UserWarning, module="torch.onnx")


def export_model_to_onnx():
    # Initialize the model with required parameters
    model = Cnn14_DecisionLevelMax(
        sample_rate=32000,
        window_size=1024,
        hop_size=320,
        mel_bins=64,
        fmin=50,
        fmax=14000,
        classes_num=527
    )

    # Load the checkpoint file (update the file path as necessary)
    checkpoint = torch.load('Cnn14_DecisionLevelMax_mAP=0.385.pth', map_location='cpu')
    model.load_state_dict(checkpoint['model'])
    model.eval()

    # Define a dummy input for the model (adjust size if needed)
    dummy_input = torch.randn(1, 320000)  # Example input size for 10 s of 32 kHz audio

    # Export the model to ONNX format with opset version 12
    torch.onnx.export(
        model,
        dummy_input,
        "Cnn14_DecisionLevelMax.onnx",
        export_params=True,
        opset_version=12,
        input_names=['input'],
        output_names=['clipwise_output', 'framewise_output'],
        dynamic_axes={'input': {1: 'audio_length'}}  # Allows variable input lengths
    )
    print("Model successfully converted to ONNX format.")


if __name__ == "__main__":
    export_model_to_onnx()
```
- Run the script to generate the ONNX model:

```bash
python convert_to_onnx.py
```
This will create the ONNX model file `Cnn14_DecisionLevelMax.onnx`, which you can use in the web application.
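As an optional sanity check, you can validate the exported graph and confirm the input/output contract the web application relies on. The following is a minimal sketch; it assumes `onnxruntime` is also installed (`pip install onnxruntime`):

```python
import numpy as np
import onnx
import onnxruntime

# Validate the structure of the exported graph
onnx.checker.check_model(onnx.load("Cnn14_DecisionLevelMax.onnx"))

# Run a quick inference on 10 seconds of silence (32 kHz) to confirm
# the input and output names chosen during export
session = onnxruntime.InferenceSession("Cnn14_DecisionLevelMax.onnx")
clipwise, framewise = session.run(
    ["clipwise_output", "framewise_output"],
    {"input": np.zeros((1, 320000), dtype=np.float32)},
)
print(clipwise.shape, framewise.shape)  # e.g. (1, 527) and (1, frames, 527)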
To set up the project locally, ensure you have the following installed:
- Node.js
- Vue.js
- ONNX Runtime Web for running the model inference in the browser.
- Clone the repository:

```bash
git clone https://github.com/TUT-ARG/WebSED
cd WebSED
```
- Install the project dependencies:

```bash
npm install
```
- Start the development server:

```bash
npm run dev
```
- Open your browser and navigate to `http://localhost:5173` to interact with the web application.
Once the development server is up and running, you can interact with the WebSED interface to perform real-time sound event detection. The web app takes in audio input, processes it through the ONNX model in the browser, and displays the detected sound events.
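Conceptually, the in-browser pipeline is equivalent to the following offline Python sketch (the audio file name is hypothetical, and `librosa` and `onnxruntime` are assumed to be installed): the audio is resampled to the model's 32 kHz mono format, passed through the ONNX model, and the per-frame class probabilities are read out.

```python
import numpy as np
import librosa        # Assumed extra: pip install librosa
import onnxruntime    # Assumed extra: pip install onnxruntime

# Load and resample a clip to 32 kHz mono, the input format the
# exported model expects
waveform, _ = librosa.load("example.wav", sr=32000, mono=True)

session = onnxruntime.InferenceSession("Cnn14_DecisionLevelMax.onnx")
(framewise,) = session.run(
    ["framewise_output"],
    {"input": waveform[np.newaxis, :].astype(np.float32)},
)

# framewise has shape (1, frames, 527): one probability per AudioSet
# class per frame; report the most likely class in each frame
print(framewise[0].argmax(axis=1))
```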
- Start the server using the command `npm run dev`.
- Access the application via your browser at `http://localhost:5173`.
- Provide an audio input through your microphone or an audio file.
- The detected sound events will be displayed in real time.
This project is licensed under the MIT License. See the LICENSE file for more details.