
POC of LLMDecoder on both Pt trace model and Onnx model (LMAdapter part) #2509

Closed
KexinFeng wants to merge 10 commits

Conversation


KexinFeng (Contributor) commented on Apr 7, 2023

Note: the interface StepGenerator will be renamed to LMAdapter.

Description

This is a POC showing that GPT-2 can be traced and exported into DJL for step-by-step inference. Moreover, this step inference supports the following:

  1. batch sequence input
  2. cached past_key_values input, which is a Tuple[Tuple[Tensor, Tensor], Tuple[Tensor, Tensor], ...] in Python (see the flattening sketch below).
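
For illustration, here is a minimal sketch (not this PR's code) of how that nested tuple flattens into a DJL NDList; the layer count, head count, and head dimension below are the standard GPT-2 small values:

import ai.djl.ndarray.NDList;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class KvCacheSketch {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            int batch = 2, numLayers = 12, numHeads = 12, pastSeqLen = 5, headDim = 64;
            // Each layer contributes a (key, value) pair of shape
            // (batch, numHeads, pastSeqLen, headDim); flattening the Python
            // Tuple[Tuple[Tensor, Tensor], ...] yields 2 * numLayers arrays.
            NDList pastKeyValues = new NDList();
            for (int layer = 0; layer < numLayers; layer++) {
                pastKeyValues.add(manager.zeros(new Shape(batch, numHeads, pastSeqLen, headDim)));
                pastKeyValues.add(manager.zeros(new Shape(batch, numHeads, pastSeqLen, headDim)));
            }
            System.out.println(pastKeyValues.size()); // 24 for GPT-2 small
        }
    }
}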

In this POC, both GPT2_init.pt and GPT2.pt are used, since they take different inputs: the former has no past_key_values input.

Run TextGeneration.java to test it.

Design

The new classes are the following:

public class JavaDecoder {
    private StepGenerator generator;    // wraps the underlying language model
    private DecodeParam decodingParams; // e.g. max length, search strategy

    public Text generateText() { ... }  // runs the token-by-token inference loop
}

JavaDecoder is engine-agnostic. The inference loop can be implemented inside generateText(), along the lines of the sketch below.
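
A minimal sketch of that loop with greedy search, in the same pseudocode register as above (the Token accessors and helper methods here are hypothetical):

public Text generateText() {
    // Prefill: run the whole prompt once; there is no cache yet.
    Token token = generator.stepGen(inputIds, positionIds, attentionMask, null);
    while (!token.isEos() && outputLength() < decodingParams.getMaxLength()) {
        appendToOutput(token);
        // Incremental step: feed only the newly generated token; the cached
        // past_key_values carry the attention state of the whole prefix.
        token = generator.stepGen(token.getId(), nextPositionId(), extendAttentionMask(), token.getPastKeyValues());
    }
    return buildText();
}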

public interface StepGenerator {
    // Each implementation holds a modelUrl, used to load its model file.

    Token stepGen(inputIds, positionIds, attentionMask, pastKeyValues);
    // One decoding step; it evaluates P(w_n | w_{n-1}, w_{n-2}, ...)
}

This interface is a wrapper over the model files from different sources, e.g. gpt2.pt, gpt2.onnx, etc.
It can be seen as a Java abstraction of a causal language model, i.e. the conditional probability p_\theta(v | x_1, ..., x_{t-1}): given the past tokens x_{<t}, it gives the probability that the next token x_t equals v, for v taken from a vocabulary set V. \theta denotes the model weights.
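
For reference, stepGen evaluates one factor of the standard autoregressive factorization

p_\theta(x_1, ..., x_T) = \prod_{t=1}^{T} p_\theta(x_t | x_{<t})

so generating a sequence is one stepGen call per position t. Caching past_key_values means the attention state over x_{<t} is computed once and reused at the next step, instead of being recomputed from the full prefix every time.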

It will be implemented separately for each backend, covering the following scenarios (a sketch of case 1 follows the list):

  1. Traced GPT2.pt + the PyTorch JNI. Its POC is done.
  2. GPT2.onnx + OnnxEngine. This will be the use case corresponding to the graph above.
  3. GPT2 + neuronx
  4. FasterTransformer + TensorRT engine
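
For a rough idea of case 1, here is a minimal sketch of a PyTorch-backed implementation built on DJL's standard Criteria API. The class name and NDList-based signature are hypothetical; the actual POC goes through the PyTorch JNI with IValue to keep the nested tuple structure, whereas this sketch simply flattens the cache:

import java.io.IOException;

import ai.djl.MalformedModelException;
import ai.djl.inference.Predictor;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDList;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ModelNotFoundException;
import ai.djl.repository.zoo.ZooModel;
import ai.djl.translate.NoopTranslator;
import ai.djl.translate.TranslateException;

public class PtStepGenerator implements AutoCloseable {

    private final ZooModel<NDList, NDList> model;
    private final Predictor<NDList, NDList> predictor;

    public PtStepGenerator(String modelUrl)
            throws ModelNotFoundException, MalformedModelException, IOException {
        Criteria<NDList, NDList> criteria = Criteria.builder()
                .setTypes(NDList.class, NDList.class)
                .optModelUrls(modelUrl) // e.g. the traced GPT2.pt
                .optEngine("PyTorch")
                .optTranslator(new NoopTranslator())
                .build();
        model = criteria.loadModel();
        predictor = model.newPredictor();
    }

    // One decoding step: token ids and the flattened KV cache in,
    // logits plus the updated cache out.
    public NDList stepGen(NDArray inputIds, NDArray positionIds,
            NDArray attentionMask, NDList pastKeyValues) throws TranslateException {
        NDList inputs = new NDList(inputIds, positionIds, attentionMask);
        if (pastKeyValues != null) {
            inputs.addAll(pastKeyValues);
        }
        return predictor.predict(inputs);
    }

    @Override
    public void close() {
        predictor.close();
        model.close();
    }
}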

Model tracing

The ONNX model gpt2.onnx is obtained following the guide at https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-using-past-keysvalues-in-the-decoder.
See also https://github.com/huggingface/optimum/releases.

The gpt2.pt is traced with the following scripts: https://gist.github.com/KexinFeng/4876c6bfb27f40abffe4d5a92c02acff

KexinFeng requested review from zachgk, frankfliu and a team as code owners on April 7, 2023 00:49
KexinFeng marked this pull request as draft on April 7, 2023 00:53
KexinFeng force-pushed the LLMDecoder branch 2 times, most recently from fe1aa0a to a74696a, on April 7, 2023 13:29
KexinFeng changed the title from "POC of LLMDecoder" to "POC of LLMDecoder on both Pt trace model and Onnx model" on Apr 14, 2023
KexinFeng changed the title to "POC of LLMDecoder on both Pt trace model and Onnx model (stepGenerator part)" on Apr 14, 2023
KexinFeng changed the title to "POC of LLMDecoder on both Pt trace model and Onnx model (LMAdapter part)" on Apr 14, 2023
KexinFeng mentioned this pull request on Apr 17, 2023
KexinFeng marked this pull request as ready for review on April 17, 2023 17:04
Review comment on the LMAdapter declaration:

 * range(|inputIds|). This means for each i, the output probability is conditional on the past
 * sequence up to i.
 */
public interface LMAdapter extends AutoCloseable {
Reviewer (Contributor):

Per our discussion, see if you can remake the LMAdapter to be a type of Block. Then, you can load it using Model.load() rather than requiring the special handling in Engine.

KexinFeng (Author):

The type change to Block is done, but the special handling in Engine is not avoided: GPT2PtLMBlock depends on the module pytorch-engines.main and cannot be used in the engine-agnostic frontend. GPT2PtLMBlock is engine-specific because it adapts to a particular engine and uses engine-specific types, like IValue for PyTorch.

Review comment on the example code:

    mainPt(args);
}

public static void mainOnnx(String[] args) {
Reviewer (Contributor):

I'm noticing that these examples are almost identical. Is it possible to refactor out a common helper, so that each one would just initialize the LMAdapter and then pass it into the helper? Maybe the helper would look something like public static void generateInternal(LMAdapter generator);
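
For concreteness, such a helper might be shaped like this (a sketch only; the inputs are arbitrary placeholders):

public static void generateInternal(LMAdapter generator) throws TranslateException {
    try (NDManager manager = NDManager.newBaseManager()) {
        // Per-case test inputs (the values here are placeholders).
        NDArray inputIds = manager.create(new long[] {40, 2883, 6155, 351}, new Shape(1, 4));
        NDArray positionIds = manager.arange(4).reshape(1, 4);
        NDArray attentionMask = manager.ones(new Shape(1, 4));
        // The shared decoding loop over generator.stepGen(...) would live here,
        // identical for the Pt-backed and Onnx-backed LMAdapter.
    }
}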

KexinFeng (Author), May 13, 2023:

The test data used here, like inputIds, positionIds, and attentionMask, are case-by-case, like the test examples in NDIndexTest.java, and are not likely to be reused. The remaining common parts have been factored out.

KexinFeng (Author):

See the latest PR for the updates.

KexinFeng (Author):

Merged in a different PR.
