abhinavnandwani/arm-llama2-asic

This repository implements a scaled-down LLaMA 2-like model on an ARM Cortex-M3 soft core, with a custom systolic-array RTL module that accelerates INT8 matrix multiplication for high-throughput inference.

This is a work in progress; I'm an overworked college student :((

All synthesizable code is written in SystemVerilog.

Architecture

Vivado Schematic

Repository Structure

`/hardware/`

Contains HDL (SystemVerilog) files that define the systolic array and its supporting modules:

pe.sv: Defines a single processing element (PE) within the systolic array, which performs an INT8 multiply-accumulate (MAC) operation.
systolic_mult.sv: Defines the full systolic array, a 32x32 grid of PEs that together perform INT8 matrix multiplication.
fifo_inject.sv: Implements FIFO buffers that inject data into the systolic array in a staggered, pipelined fashion, so that matching operands reach each PE on the same cycle and the array stays busy.
systolic_control.sv: Manages the control logic for matrix data flow into the systolic array, coordinating data injection and handling completion flags.
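
To illustrate how the PE, the staggered FIFO injection, and the array fit together, here is a cycle-level behavioral model in C. It is a sketch under stated assumptions, not a translation of the repository's RTL: it assumes an output-stationary dataflow (each PE keeps its own INT32 accumulator while A operands move right and B operands move down), INT8 inputs, and a skew of i cycles for row i of A and j cycles for column j of B, which is the job fifo_inject.sv is described as doing. N is shrunk to 4 to keep the example small, and `systolic_matmul` and `reference_matmul` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 4 /* array dimension; the repository's array is 32x32, 4 keeps the demo small */
#define K 4 /* shared (inner) dimension of the product */

/* One PE (assumed output-stationary): latched operands plus a local INT32 accumulator. */
typedef struct {
    int8_t a;    /* A operand, forwarded to the PE on the right next cycle */
    int8_t b;    /* B operand, forwarded to the PE below next cycle */
    int32_t acc; /* partial sum for C[i][j] */
} pe_t;

/* Cycle-by-cycle simulation of C = A * B on an N x N array. */
static void systolic_matmul(int8_t A[N][K], int8_t B[K][N], int32_t C[N][N]) {
    pe_t pe[N][N];
    memset(pe, 0, sizeof pe);
    /* The last pair, A[N-1][K-1] and B[K-1][N-1], reaches PE[N-1][N-1] on cycle (K-1) + 2(N-1). */
    const int cycles = K + 2 * (N - 1);
    for (int t = 0; t < cycles; t++) {
        pe_t next[N][N];
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                int8_t a_in, b_in;
                if (j == 0) { /* left edge: row i of A is injected i cycles late */
                    int k = t - i;
                    a_in = (k >= 0 && k < K) ? A[i][k] : 0;
                } else {
                    a_in = pe[i][j - 1].a;
                }
                if (i == 0) { /* top edge: column j of B is injected j cycles late */
                    int k = t - j;
                    b_in = (k >= 0 && k < K) ? B[k][j] : 0;
                } else {
                    b_in = pe[i - 1][j].b;
                }
                next[i][j].a = a_in;
                next[i][j].b = b_in;
                next[i][j].acc = pe[i][j].acc + (int32_t)a_in * (int32_t)b_in; /* the MAC */
            }
        }
        memcpy(pe, next, sizeof pe); /* every PE updates on the same clock edge */
    }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = pe[i][j].acc;
}

/* Plain triple loop used to check the simulation. */
static void reference_matmul(int8_t A[N][K], int8_t B[K][N], int32_t C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int32_t acc = 0;
            for (int k = 0; k < K; k++)
                acc += (int32_t)A[i][k] * (int32_t)B[k][j];
            C[i][j] = acc;
        }
}
```

Because row i and column j are each delayed by their own index, A[i][k] and B[k][j] meet at PE[i][j] on cycle i + j + k, so every product lands in the right accumulator. Without the skew, PE[i][j] would see A[i][t - j] alongside B[t - i][j], whose indices disagree whenever i ≠ j.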

`/software/`

Contains driver and inference code for the ARM Cortex-M3 to control the systolic array and perform LLaMA 2 inference:

driver.c: Driver code to interface with the systolic array. It manages matrix loading, triggering computations, and retrieving results.
run.c: High-level inference code inspired by karpathy/llama2.c. This file includes a scaled-down LLaMA 2 model and offloads matrix multiplication tasks to the systolic array driver.
utils.c: Utility functions for quantization, dequantization, and data preparation, ensuring model weights and input data are correctly formatted for INT8 operations.
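
Because the array produces a 32x32 block of outputs at a time, the driver has to split larger products into tiles before offloading them. The C sketch below shows one way to do that. `accel_tile_matmul` is a hypothetical stand-in for the driver.c sequence of loading operands, starting the array, and reading back results; it is computed in software here so the example runs on a host. The sketch assumes the array streams the whole shared dimension k (output-stationary), zero-pads edge tiles, and takes caller-provided buffers instead of allocating, which suits bare-metal firmware. None of these names come from the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE 32 /* output tile size, matching the 32x32 array */

/* Hypothetical stand-in for one accelerator call. In firmware this would load a TILE x k
   block of A and a k x TILE block of B, start the array, and read back TILE x TILE INT32
   results; here it is a plain loop so the sketch runs on a host. */
static void accel_tile_matmul(const int8_t *a_tile, const int8_t *b_tile, int k, int32_t *c_tile) {
    for (int i = 0; i < TILE; i++)
        for (int j = 0; j < TILE; j++) {
            int32_t acc = 0;
            for (int p = 0; p < k; p++)
                acc += (int32_t)a_tile[(size_t)i * k + p] * (int32_t)b_tile[(size_t)p * TILE + j];
            c_tile[(size_t)i * TILE + j] = acc;
        }
}

/* C (m x n, INT32) = A (m x k, INT8) * B (k x n, INT8), all row-major.
   a_buf must hold TILE*k bytes, b_buf k*TILE bytes, and c_buf TILE*TILE words. */
static void tiled_matmul(const int8_t *A, const int8_t *B, int32_t *C, int m, int k, int n,
                         int8_t *a_buf, int8_t *b_buf, int32_t *c_buf) {
    assert(m > 0 && k > 0 && n > 0);
    for (int r0 = 0; r0 < m; r0 += TILE) {
        for (int c0 = 0; c0 < n; c0 += TILE) {
            /* Zero-pad so edge tiles still present full TILE-wide operands. */
            memset(a_buf, 0, (size_t)TILE * (size_t)k);
            memset(b_buf, 0, (size_t)k * TILE);
            for (int i = 0; i < TILE && r0 + i < m; i++)
                memcpy(&a_buf[(size_t)i * k], &A[(size_t)(r0 + i) * k], (size_t)k);
            for (int p = 0; p < k; p++)
                for (int j = 0; j < TILE && c0 + j < n; j++)
                    b_buf[(size_t)p * TILE + j] = B[(size_t)p * n + c0 + j];
            accel_tile_matmul(a_buf, b_buf, k, c_buf);
            /* Copy back only the valid part of the tile. */
            for (int i = 0; i < TILE && r0 + i < m; i++)
                for (int j = 0; j < TILE && c0 + j < n; j++)
                    C[(size_t)(r0 + i) * n + c0 + j] = c_buf[(size_t)i * TILE + j];
        }
    }
}
```

Zero-padding keeps the hardware interface fixed at 32x32: the padded rows and columns only contribute zeros and are never copied back into C.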

`/scripts/`

Python scripts for pre-processing model weights and converting them to a quantized format suitable for loading onto the FPGA.
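
The scripts themselves are Python, but the arithmetic they and utils.c must agree on is short enough to show in C. The sketch below assumes symmetric per-row INT8 quantization with one float scale per output row; the repository may instead use per-tensor or group-wise scales (karpathy/llama2.c's quantized runq.c, for example, uses group-wise scales). `quantize_rows` and `dequantize` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric per-row INT8 quantization (an assumed scheme):
   scale_r = max_c |w[r][c]| / 127 and q = round(w / scale_r), clamped to [-127, 127],
   so 0.0 maps exactly to 0. One float scale is written per row. */
static void quantize_rows(const float *w, int rows, int cols, int8_t *q, float *scales) {
    assert(rows > 0 && cols > 0);
    for (int r = 0; r < rows; r++) {
        const float *row = &w[(size_t)r * cols];
        float max_abs = 0.0f;
        for (int c = 0; c < cols; c++) {
            float a = row[c] < 0.0f ? -row[c] : row[c];
            if (a > max_abs) max_abs = a;
        }
        /* An all-zero row gets scale 1 so the division below is well defined. */
        float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
        scales[r] = scale;
        for (int c = 0; c < cols; c++) {
            float x = row[c] / scale;
            int v = (int)(x >= 0.0f ? x + 0.5f : x - 0.5f); /* round half away from zero */
            if (v > 127) v = 127;
            if (v < -127) v = -127;
            q[(size_t)r * cols + c] = (int8_t)v;
        }
    }
}

/* Approximate reconstruction: w ≈ q * scale, off by at most about scale / 2. */
static float dequantize(int8_t q, float scale) {
    return (float)q * scale;
}
```

Clamping to [-127, 127] rather than [-128, 127] keeps the range symmetric, so negating a quantized value never overflows. After an INT8 matmul, an INT32 accumulator for row r is mapped back to float by multiplying it by the weight scale of row r and the activation scale.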

Key Features

Custom Systolic Array Accelerator: A 32x32 systolic array (1,024 PEs) for INT8 matrix multiplication, the operation that dominates LLaMA 2 inference compute, enabling high-throughput, low-power inference.
ARM Cortex-M3 Integration: Driver code for the ARM Cortex-M3 soft core to manage the systolic array, handle data flow, trigger computations, and retrieve results, enabling efficient control and minimal CPU overhead.
Quantized LLaMA 2 Model: A scaled-down, 50M-parameter LLaMA 2 model quantized to INT8 precision, which lets it run on low-power FPGA-based systems.
Vivado Deployment: All code is compatible with Vivado for easy FPGA synthesis and deployment.
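
The throughput and memory figures above can be put in rough numbers. The C helpers below compute peak MACs per cycle, per-tile latency, and raw weight storage. They assume an output-stationary array with skewed injection and ignore the Cortex-M3's time spent loading operands and reading back results, so they are upper bounds, not measurements; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Peak INT8 MACs per clock for an n x n array: every PE does one MAC per cycle. */
static uint32_t peak_macs_per_cycle(uint32_t n) {
    assert(n > 0);
    return n * n;
}

/* Cycles to finish one n x n output tile with shared dimension k, assuming an
   output-stationary array with skewed injection: k cycles of streaming plus 2(n - 1)
   cycles for the skew to fill and drain. Host-side loading and read-back are excluded. */
static uint32_t tile_cycles(uint32_t n, uint32_t k) {
    assert(n > 0 && k > 0);
    return k + 2u * (n - 1u);
}

/* Fraction of peak reached inside one tile: (n*n*k MACs) / (tile_cycles * n*n PE-cycles). */
static double tile_utilization(uint32_t n, uint32_t k) {
    return (double)k / (double)tile_cycles(n, k);
}

/* Raw weight storage in bytes; per-row or per-group scales are not included. */
static uint64_t weight_bytes(uint64_t params, uint32_t bytes_per_param) {
    return params * bytes_per_param;
}
```

For the 32x32 array, peak_macs_per_cycle(32) is 1,024. With illustrative shared dimensions, tile_cycles(32, 32) is 94 (about 34% of peak), while tile_cycles(32, 256) is 318 (about 80%), so longer inner dimensions amortize the fill and drain better. For the 50M-parameter model, weight_bytes(50000000, 1) gives 50,000,000 bytes at INT8, a quarter of the 200,000,000 bytes FP32 would need.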

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Inspired by karpathy/llama2.c.

Folders and files

hardware/systolic_array_rtl
scripts
software/arm_cortex_m3_driver
vivado_systolic
.gitignore
LICENSE
README.md
architecture.png
block_diagram.png
systolic_hier.v
