A fast, lightweight Japanese IME engine for Zig projects that converts romaji to hiragana, kanji, and full-width characters. Supports Google 日本語入力-style input patterns with an easy-to-use API.

egegungordu/jaime

Jaime

A headless Japanese IME (Input Method Editor) engine for Zig projects that provides:

  • Romaji to hiragana/katakana conversion

    eiennni → えいえんに

  • Full-width character conversion

    abc123 → ａｂｃ１２３

  • Dictionary-based word conversion

    かんじ → 漢字

  • Built-in cursor and buffer management

Based on Google 日本語入力 behavior.

On the terminal with libvaxis

View repository

Terminal demo

On the web with WebAssembly

Online demo

Web demo

Zig Version

The minimum Zig version required is 0.13.0.

Licensing Information

This project includes the IPADIC dictionary, which is provided under the license terms stated in the accompanying COPYING file. The IPADIC license imposes additional restrictions and requirements on its usage and redistribution. If your application cannot comply with the terms of the IPADIC license, consider using the ime_core module with a custom dictionary implementation instead.

Integrating jaime into your Zig Project

You can add jaime as a dependency in your build.zig.zon file in two ways:

Development Version

# Get the latest development version from main branch
zig fetch --save git+https://github.com/egegungordu/jaime

Release Version

# Get a specific release version (replace x.y.z with desired version)
zig fetch --save https://github.com/egegungordu/jaime/archive/refs/tags/vx.y.z.tar.gz

Then instantiate the dependency in your build.zig:

const jaime = b.dependency("jaime", .{});
exe.root_module.addImport("kana", jaime.module("kana"));         // For simple kana conversion
exe.root_module.addImport("ime_core", jaime.module("ime_core")); // For IME without dictionary
exe.root_module.addImport("ime_ipadic", jaime.module("ime_ipadic")); // For IME with IPADIC dictionary

Usage

The library provides three modules for different use cases:

1. Kana Module - Simple Conversions

For simple romaji to hiragana conversions without IME functionality:

const std = @import("std");
const kana = @import("kana");

test "romaji to hiragana" {
    // Using a provided buffer (no allocations)
    var buf: [100]u8 = undefined;
    const result = try kana.convertBuf(&buf, "konnnichiha");
    try std.testing.expectEqualStrings("こんにちは", result);

    // Using an allocator (returns an owned slice that the caller must free)
    const allocator = std.testing.allocator;
    const result2 = try kana.convert(allocator, "konnnichiha");
    defer allocator.free(result2);
    try std.testing.expectEqualStrings("こんにちは", result2);
}

2. IME IPADIC Module - Full Featured IME

For applications that want to use the full-featured IME with the IPADIC dictionary:

const std = @import("std");
const ime_ipadic = @import("ime_ipadic");

// Option A: owned buffer (with allocator)
var ime = ime_ipadic.Ime(.owned).init(allocator);
defer ime.deinit();

// Option B: borrowed buffer (fixed size, no allocations).
// This is an alternative to option A, so only declare one `ime`.
// var buf: [100]u8 = undefined;
// var ime = ime_ipadic.Ime(.borrowed).init(&buf);

// Common IME operations (the value returned by insert is discarded here)
_ = try ime.insert("k");
_ = try ime.insert("o");
_ = try ime.insert("n");
try std.testing.expectEqualStrings("こん", ime.input.buf.items());

// Dictionary Matches
if (ime.getMatches()) |matches| {
    // Get suggested conversions from the dictionary
    // Returns []WordEntry containing possible word matches
    _ = matches;
}
try ime.applyMatch();    // Apply the best dictionary match to the current input

// Cursor Movement and Editing
ime.moveCursorBack(1);   // Move cursor left n positions
ime.moveCursorForward(1);// Move cursor right n positions
_ = try ime.insert("y"); // Insert at cursor position
ime.clear();             // Clear the input buffer
try ime.deleteBack();    // Delete one character before cursor
try ime.deleteForward(); // Delete one character after cursor

Warning

The IPADIC dictionary is subject to its own license terms. If you need to use a different dictionary or want to avoid IPADIC's license requirements, use the ime_core module with your own dictionary implementation.

3. IME Core Module - Custom Dictionary

For applications that want to use IME functionality with their own dictionary implementation:

const std = @import("std");
const ime_core = @import("ime_core");

// Create your own dictionary loader that implements the required interface.
// `Dictionary` stands for the dictionary type that ime_core expects.
const MyDictLoader = struct {
    pub fn loadDictionary(allocator: std.mem.Allocator) !Dictionary {
        // Your dictionary loading logic here
    }

    pub fn freeDictionary(dict: *Dictionary) void {
        // Your dictionary cleanup logic here
    }
};

// Use the IME with your custom dictionary
var ime = ime_core.Ime(MyDictLoader).init(allocator);
defer ime.deinit();

WebAssembly Bindings

For web applications, you can build the WebAssembly bindings:

# Build the WebAssembly library
zig build

The WebAssembly library uses the IPADIC dictionary by default. For a complete example of how to use the WebAssembly bindings in a web application, check out the web example.

The WebAssembly library provides the following functions:

// Initialize the IME
init();

// Get pointer to input buffer for writing input text
getInputBufferPointer();

// Insert text at current position
// length: number of bytes to read from input buffer
insert(length);

// Get information about the last insertion
getDeletedCodepoints(); // Number of codepoints deleted
getInsertedTextLength(); // Length of inserted text in bytes
getInsertedTextPointer(); // Pointer to inserted text

// Cursor movement and editing
deleteBack(); // Delete character before cursor
deleteForward(); // Delete character after cursor
moveCursorBack(n); // Move cursor back n positions
moveCursorForward(n); // Move cursor forward n positions
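
Before calling any of these functions, it can help to confirm that the loaded module actually exposes them. The following is a minimal sketch, not part of jaime itself: `bindIme` is a hypothetical helper, the function names are copied from the list above, and the exported `memory` is an assumption based on the usage example reading from `memory.buffer`.

```javascript
// Function names copied from the list above.
const REQUIRED_FUNCTIONS = [
  "init",
  "getInputBufferPointer",
  "insert",
  "getDeletedCodepoints",
  "getInsertedTextLength",
  "getInsertedTextPointer",
  "deleteBack",
  "deleteForward",
  "moveCursorBack",
  "moveCursorForward",
];

// Hypothetical helper: checks the exports object and returns it unchanged.
function bindIme(exports) {
  const missing = REQUIRED_FUNCTIONS.filter(
    (name) => typeof exports[name] !== "function"
  );
  if (missing.length > 0) {
    throw new Error(`missing exports: ${missing.join(", ")}`);
  }
  // Assumption: the module exports its linear memory as `memory`.
  if (!(exports.memory instanceof WebAssembly.Memory)) {
    throw new Error("expected an exported WebAssembly.Memory named 'memory'");
  }
  return exports;
}
```

A caller would pass `instance.exports` from `WebAssembly.instantiate` or `WebAssembly.instantiateStreaming` to `bindIme`. The path of the built `.wasm` file and any imports it requires are not stated here, so they are left out of the sketch.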

Example usage in JavaScript:

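// Assumes `memory` and the functions listed above were taken from the
// WebAssembly instance's exports.

// Initialize
init();

// Get input buffer (this example views 64 bytes of it)
const inputPtr = getInputBufferPointer();
const inputBuffer = new Uint8Array(memory.buffer, inputPtr, 64);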

// Write and insert characters one by one
const text = "ka";
for (const char of text) {
  // Write single character to buffer
  const bytes = new TextEncoder().encode(char);
  inputBuffer.set(bytes);

  // Insert and get result
  insert(bytes.length);

  // Get the inserted text
  const insertedLength = getInsertedTextLength();
  const insertedPtr = getInsertedTextPointer();
  const insertedText = new TextDecoder().decode(
    new Uint8Array(memory.buffer, insertedPtr, insertedLength)
  );

  // Check if any characters were deleted
  const deletedCount = getDeletedCodepoints();

  console.log({
    inserted: insertedText,
    deleted: deletedCount,
  });
}
// Final result is "か"
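
The loop above only logs each edit. If the host page needs its own copy of the composed text, the two values can be replayed against a local string. This is a minimal sketch with a hypothetical `applyEdit` helper. It assumes that the deleted count refers to code points immediately before the cursor and that the inserted text lands at that same position. It also assumes that the first keystroke reports a full-width "ｋ", as the Features section suggests.

```javascript
// Hypothetical helper: mirrors the IME text on the JavaScript side.
// Assumption: `deleted` code points are removed directly before the cursor,
// then `inserted` is placed at that position.
function applyEdit(state, deleted, inserted) {
  const chars = Array.from(state.text); // split by code point, not UTF-16 unit
  const start = Math.max(0, state.cursor - deleted);
  const insertedChars = Array.from(inserted);
  chars.splice(start, state.cursor - start, ...insertedChars);
  return { text: chars.join(""), cursor: start + insertedChars.length };
}

// Replaying the edits assumed for typing "k" and then "a".
let state = { text: "", cursor: 0 };
state = applyEdit(state, 0, "ｋ"); // nothing deleted, "ｋ" inserted
state = applyEdit(state, 1, "か"); // "ｋ" deleted, "か" inserted
console.log(state);
```

Counting in code points rather than UTF-16 units keeps the mirror consistent with `getDeletedCodepoints()`, which reports code points.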

Testing

To run the test suite:

zig build test --summary all

Features

  • Romaji to hiragana/full-width character conversion based on Google 日本語入力 mapping
    • Basic hiragana (あ、い、う、え、お、か、き、く...)
      • a -> あ
      • ka -> か
    • Small hiragana (ゃ、ゅ、ょ...)
      • xya -> ゃ
      • li -> ぃ
    • Sokuon (っ)
      • tte -> って
    • Full-width characters
      • k -> ｋ
      • 1 -> １
    • Punctuation
      • . -> 。
      • ? -> ？
      • [ -> 「

Contributing

Contributions are welcome! Please feel free to open an issue or submit a Pull Request.

Acknowledgments

Further Reading & References

For those interested in the data structures and algorithms used in this project, or looking to implement similar functionality, the following academic papers provide excellent background:
