GitHub - chochain/eJsv32: Forth in SystemVerilog with Java opcodes VM

eJ32 - a Forth CPU on FPGA that runs Java opcodes

A reincarnation of eP32, a 32-bit CPU by Dr. Ting. However, deviating from the long linage of eForth, it uses Java Bytecode as the internal instruction set and hence the name J. After developing CPUs for decades, Dr. Ting, in a write up for eJsv32 manual, he concluded the following

Which instruction set will be the best and to survive to the next century? Looking around, I can see only one universal computer instruction set, and it is now gradually prevailing. It is Java.

Responding to the invitation from Don Golding of CORE-I FPGA project AI & Robotics Group, Dr. Ting dedicated the last few months of his life on developing eJ32. Based on his VHDL eP32, the transcoded SystemVerilog set was completed but never has the chance been fully verified or validated in time before his passing.

I appreciate that Dr. Ting took me in his last projects and considered me one of his student. Though a trained software engineer, who have never worked on any FPGA before, I felt overwhelmingly obligated to at least carry his last work to a point that future developers can benefit from the gems of his life's effort.

My goal is to make the learning journey of building eJ32 as an example of designing and implementing an FPGA CPU regardless whether Java will be the prevailing ISA or not.

Status

Currently, though eJ32 has been successfully simulated with Dr. Ting's test cases but yet synthesized on the targeted ICE40. It will take sometime to realize for lack of hardware design knowledge on my part. If interested in a fully functional Forth CPU, Dr. Ting's origial eP16 or Bowman's J1a are both great to start. Anyway, for a kick, here're what I've done for eJ32 so far.

Adaptations of eJsv32k

keep Dr. Ting's original code in ~/orig/eJsv32k
keep Dr. Ting's documentation in ~/docs
create ~/source/eJ32.sv as the main core
update mult/divide/shifter/ushifter modules using simple *, /, <<
externalize ram_memory.v module, use spram.sv and eJ32_if.sv for Lattice iCE40UP FPGA
create a dictionary ROM from eJsv32.hex, 8K bytes, sourcing from original ej32i.mif (see source/README for details)
add ~/test/dict_setup.sv, use $fload to install memory map (i.e. eJsv32.hex)
add top module ~/test/outer_tb.sv to drive memory block, dict_setup, and inner interpreter eJ32
add eJ32.vh, use enum for opcodes replacing list of parameters
refactor eJ32.sv
- use common tasks and macros to reduce verbosity
- removed phaseload, aselload which are always 1'b1
- add many $display for tracing (and my own understanding)
fix divider, add one extra cycle for TOS update before next instruction
modulize into a 2-bus hierachical design
use iCE40 EBR (embedded block memory) for 64-deep data and return stacks (was 32-deep)
use EBR as ROM which is populated from hex image file (contains 3.4K eForth + 1K test cases)
add JTAG, HSOSC, RGB in top module for Map, P&R,...

Modulization, flat->hierarchical (v2)

module	desc	components	LUTs/freq area	LUTs/freq timing	LUTs (47op)	note	err
ROM	eForth image (3.4K bytes)	8K bytes onboard ROM	49 166.5	17 272.9	49	8-bit 16 EBR blocks
RAM	memory	128K bytes onboard RAM	48 2392.3	49 2392.3	48	8-bit pseudo-dual port
AU	arithmetic unit	ALU and data stack	928 31.3	939 31.3	1755	2 EBRs
BR	branching unit	program counter and return stack	425 26.8	435 31.0	333	2 EBRs
DC	decoder unit	state machines	194 34.7	193 39.8	211		divider patch
DP	data processor unit	shr/shl/mul/div	731 17.9	621 21.3	439	3 DSPs
LS	load/store unit	memory and buffer IO	522 54.0	530 47.4	201	54.0
CTL	control bus	TOS, code, phase	NA	NA	NA	interface not synthsized

EJ32	top module	JTAG,HSOSC,RGB	3905 11.4	3721 11.4	NA	JTAG=778	slow...

Bus Design

To refactor:

make all outputs registered (sync sub-blocks)
compare to eP16 design
tune DP for 24MHz (i.g. set_multicycle_path on divider, immd register)
combine IU (instruction unit, in eJ32.sv) and BR
BR add R (top of return stack) register to help EBR slow path
AU add S (NOS) register to help EBR slow path
break IF (instruction fetch) off LS
break RR (t Register Read), WB (t, s Write Back) off AU
study pipelining hazards
- structure - RR-WB, BR-IU
- data - p_inc, divz, s
- control - p (and exception)

Installation

Install Lattice Radiant 3.0+ (with Free license from Lattice, comes with ModelSim 32-bit)
clone this repository to your local drive
Open eJsv32.rdf project from within Radiant
Compile, Synthesis if you really want to, and simulate (with ModelSim)

Memory Map (128K bytes)

section	starting address	note
eForth image	0x0000	loaded from ROM
Input buffer	0x1000	no RX unit yet, loaded from ROM
Output buffer	0x1400	no TX unit yet

Limitations

targeting only Lattice iCE40UP FPGA for now
No serial interface (i.e. UART, SPI, ..)
- fixed validation cases hardcoded in TIB (at 'h1000)
- output writes into output buffer byte-by-byte (starting at 'h1400)
33-cycle soft divider (iCE40 has no hardware divider)
No Map or Route provided
Data and return stacks
- 64-deep
- use iCE40 EBR, embedded block memory, pseudo dual-port, Lattice generated netlist, with negative edged clock
eForth image (3.4K)
- use iCE40 EBR as ROM
- loaded from ROM into RAM during at start-up (8K cycles)

Results - Staging for future development

The design works OK on ModelSim
- Core ~2.9K LUTs which should fit in iCE40 (3K or 5K)
ModelSim COLD start - completed
- v1 - 10K cycles, ~/docs/eJ32_trace.txt
- v2 - 10K cycles, ~/docs/eJ32v2_trace_20240108.txt
ModelSim Dr. Ting's 6 embeded test cases - completed
- v1 - 600K+ cycles OK, ~/docs/eJ32_trace_full_20220414.txt.gz (from Dr. Ting's)
- v1 - 520K+ cycles OK, ~/docs/eJ32_trace_full_20231223.txt.gz (before modulization)
- v2 - 520K+ cycles OK, ~/docs/eJ32v2_trace_full_20240117.txt.gz (after modulization)

Statistics

For the 6 test cases Dr. Ting gave, they take ~520K cycles.

units	instructions (in K)	total cycles(in K)	note
AU only	108	159	mostly 1 cycle
BR only	10	20	jreturn
DP only	0.4	14	idiv,irem,imul,ishr
LS only	24	112	b/i/saload
AU + BR	50	145
AU + LS	14	69

So, within the total cycles. details here

Only 47 total opcodes are used.
Arithmetic takes about 1/3, mostly 1-cycle except bipush(2), pop2(2), dup2(4)
Branching takes about 1/3, all 3-cycle except jreturn 2-cycle.
Load/Store takes about 1/3, all multi-cycles (avg. 5/instructions)

TODO

check P16 variant here
learn how to really Map, Place & Route (here's the 1st try with JTAG + RGB, at 11.4MHz)
Consider memory clock at higher freq i.g. 4x CPU's (so 32-bit returns in 1 cycle)
Consider i-cache + branch prediction to reduce branching delay
Consider 32-bit and/or d-cache to reduce load/store delay
Consider Pipelined design (see bus design above)
- Note: Pure combinatory module (no clock) returns in 1 cycle but lengthen the path which slows down the max frequency. Pipeline does the opposite.
- build hardwired control table
- learn how to resolve Hazards
- learn CSR + Hyper Pipelining

Reference

IceStorm, open source synthesis, https://clifford.at/icestorm
Project-F, https://github.com/projf/projf-explore
- Verilator + SDL2, https://projectf.io/posts/verilog-sim-verilator-sdl/
Verilator
- part-1~4 https://itsembedded.com/dhd/verilator_1/ ...
SpinalHDL
RISC-V Pipeline design https://passlab.github.io/CSE564/notes/lecture09_RISCV_Impl_pipeline.pdf

Revision History

20220110 - Chen-hanson Ting: eJsv32k.v in Quartus II SystemVerilog-2005
20220209 - Chochain: rename to eJ32 for Lattice and future versions
20230216 - Chochain: consolidate ALU modules, tiddy macro tasks
20231216 - Chochain: modulization flat to hierarchical (v2.0)
20240108 - Chochain: use EBR for data/return stacks and eForth image

Name		Name	Last commit message	Last commit date
Latest commit History 386 Commits
config		config
docs		docs
orig		orig
source		source
test		test
.gitignore		.gitignore
README.md		README.md
eJsv32.rdf		eJsv32.rdf
ej321.sty		ej321.sty

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

eJ32 - a Forth CPU on FPGA that runs Java opcodes

Status

Adaptations of eJsv32k

Modulization, flat->hierarchical (v2)

Bus Design

Installation

Memory Map (128K bytes)

Limitations

Results - Staging for future development

Statistics

TODO

Reference

Revision History

About

Releases

Packages

Languages

chochain/eJsv32

Folders and files

Latest commit

History

Repository files navigation

eJ32 - a Forth CPU on FPGA that runs Java opcodes

Status

Adaptations of eJsv32k

Modulization, flat->hierarchical (v2)

Bus Design

Installation

Memory Map (128K bytes)

Limitations

Results - Staging for future development

Statistics

TODO

Reference

Revision History

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages