Llama2 7b mmlu stdcase (FlagOpen#211)
* test

* finishfp32

* upd

* upd

* upd
shh2000 authored and zhoujiamin01 committed Aug 31, 2023
1 parent a9295d0 commit 1796b55
Showing 12 changed files with 417 additions and 0 deletions.
69 changes: 69 additions & 0 deletions inference/benchmarks/llama2_7b_mmlu/README.md
@@ -0,0 +1,69 @@
### 1. Inference Dataset

* Download: `https://huggingface.co/datasets/Stevross/mmlu/tree/main`
  1. Download data.tar from the repository
  2. Extract the .tar archive into a directory
  3. Place the extracted data directory at config.data_dir/config.mmlu_dir (a download/extract sketch follows this list)
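
A minimal sketch of steps 2 and 3, assuming data.tar is already in the working directory and the default mmlu_dir from parameters.yaml (the data_dir path is hypothetical):

```python
import os
import tarfile

data_dir = "/raid/dataset"        # hypothetical; use your config.data_dir
mmlu_dir = "mmlu_dataset/data"    # config.mmlu_dir (default in parameters.yaml)

# data.tar unpacks to a top-level "data/" directory; extracting under
# "mmlu_dataset" therefore lands the CSVs at <data_dir>/mmlu_dataset/data.
with tarfile.open("data.tar") as tar:
    tar.extractall(path=os.path.join(data_dir, "mmlu_dataset"))

# dataloader.py expects dev/ and test/ CSV folders under this directory.
assert os.path.isdir(os.path.join(data_dir, mmlu_dir, "dev"))
```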

### 2. Model and Weights

* Model implementation
  * pytorch: transformers.LlamaForCausalLM
* Weight loading
  * pytorch: LlamaForCausalLM.from_pretrained(config.data_dir/config.weight_dir)
* Obtaining the weights
  1. Fill out the request form to apply to Meta AI for the Llama 2 model weights, and accept the license agreement
  2. Download the llama2-7b weights (note: not the chat variant)
  3. Convert the weights to the Hugging Face format with the convert.py script provided by Hugging Face, and save them under config.data_dir/config.weight_dir (a load check is sketched below)
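
A minimal sanity check after conversion, assuming the converted weights sit at the default weight_dir from parameters.yaml (the data_dir path is hypothetical):

```python
import os
from transformers import AutoTokenizer, LlamaForCausalLM

data_dir = "/raid/dataset"     # hypothetical; use your config.data_dir
weight_dir = "llama2_7b_hf"    # config.weight_dir (default in parameters.yaml)
path = os.path.join(data_dir, weight_dir)

# dataloader.py and model.py both read from this directory, so the tokenizer
# and the model must load from the converted checkpoint without errors.
tokenizer = AutoTokenizer.from_pretrained(path)
model = LlamaForCausalLM.from_pretrained(path)
print(model.config.num_hidden_layers)  # 32 for llama2-7b
```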

### 3. Hardware/Software Configuration and Run Information

#### 3.1 Nvidia A100

- ##### Hardware environment
  - Machine and accelerator model: NVIDIA_A100-SXM4-40GB
  - Multi-node network type and bandwidth: InfiniBand, 200Gb/s

- ##### Software environment
  - OS version: Ubuntu 20.04
  - OS kernel version: 5.4.0-113-generic
  - Accelerator driver version: 470.129.06
  - Docker version: 20.10.16
  - Framework version: pytorch-2.1.0a0+4136153
  - Dependency versions:
    - cuda: 12.1

- Inference toolkit
  - Inductor (torch._dynamo) pytorch-2.1.0a0+4136153

- ##### Optimization strategies

  - None

- ##### Parallel strategies

  - None

### 4. Run Results (Llama2_7b_MMLU)

* Metric list

| Metric name | Metric key | Notes |
| ------------------ | ----------------- | ----------------------------------------------------------- |
| Numeric precision | precision | fp32 or fp16 |
| Device memory usage | mem | commonly called "GPU memory"; unit: GiB |
| End-to-end time | e2e_time | total time plus Perf initialization time, etc. |
| Validation overall throughput | p_val_whole | tokens validated divided by total validation time (the code reports tokens/s) |
| Validation compute throughput | p_val_core | excludes IO time |
| Inference overall throughput | p_infer_whole | tokens inferred divided by total inference time |
| **Inference compute throughput** | **\*p_infer_core** | excludes IO time |
| **Accelerator utilization** | **\*MFU** | model FLOPs utilization |
| Inference result | acc (inference/validation) | MMLU answer accuracy |

* Metric values (the MFU column is cross-checked in the sketch below)


| Inference tool | precision | e2e_time | p_val_whole | p_val_core | p_infer_whole | \*p_infer_core | \*MFU | acc | mem |
| -------------- | --------- | -------- | ----------- | ---------- | ------------- | -------------- | ----- | ----------- | --------- |
| inductor | fp16 | 2558 | 8596.9 | 8630.3 | 9230.8 | 10052.2 | 45.1% | 45.8%/45.8% | 28.0/40.0 |
| inductor | fp32 | 4143 | 5455.3 | 5469.4 | 5675.7 | 5951.8 | 53.4% | 45.8%/45.8% | 35.0/40.0 |
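
The MFU column is consistent with achieved token throughput times the per-token FLOPs from configurations.yaml (flops: 2*7e9), divided by the accelerator's peak rate; a minimal sketch, assuming A100 peaks of 312 TFLOPS (fp16) and 156 TFLOPS (tf32):

```python
FLOPS_PER_TOKEN = 2 * 7e9  # per-token forward FLOPs for llama2-7b

def mfu(tokens_per_second: float, peak_flops: float) -> float:
    """Model FLOPs utilization: achieved FLOPs over hardware peak."""
    return tokens_per_second * FLOPS_PER_TOKEN / peak_flops

print(f"{mfu(10052.2, 312e12):.1%}")  # fp16 p_infer_core -> 45.1%
print(f"{mfu(5951.8, 156e12):.1%}")   # fp32 (tf32) p_infer_core -> 53.4%
```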
5 changes: 5 additions & 0 deletions inference/benchmarks/llama2_7b_mmlu/pytorch/__init__.py
@@ -0,0 +1,5 @@
from .dataloader import build_dataloader
from .model import create_model
from .export import export_model
from .evaluator import evaluator
from .forward import model_forward, engine_forward
144 changes: 144 additions & 0 deletions inference/benchmarks/llama2_7b_mmlu/pytorch/dataloader.py
@@ -0,0 +1,144 @@
import os
import pandas as pd
from transformers import AutoTokenizer
import torch
from torch.utils.data import DataLoader, Dataset
from loguru import logger

TASKS = [
'abstract_algebra',
'anatomy',
'astronomy',
'business_ethics',
'clinical_knowledge',
'college_biology',
'college_chemistry',
'college_computer_science',
'college_mathematics',
'college_medicine',
'college_physics',
'computer_security',
'conceptual_physics',
'econometrics',
'electrical_engineering',
'elementary_mathematics',
'formal_logic',
'global_facts',
'high_school_biology',
'high_school_chemistry',
'high_school_computer_science',
'high_school_european_history',
'high_school_geography',
'high_school_government_and_politics',
'high_school_macroeconomics',
'high_school_mathematics',
'high_school_microeconomics',
'high_school_physics',
'high_school_psychology',
'high_school_statistics',
'high_school_us_history',
'high_school_world_history',
'human_aging',
'human_sexuality',
'international_law',
'jurisprudence',
'logical_fallacies',
'machine_learning',
'management',
'marketing',
'medical_genetics',
'miscellaneous',
'moral_disputes',
'moral_scenarios',
'nutrition',
'philosophy',
'prehistory',
'professional_accounting',
'professional_law',
'professional_medicine',
'professional_psychology',
'public_relations',
'security_studies',
'sociology',
'us_foreign_policy',
'virology',
'world_religions'
]
choices = ["A", "B", "C", "D"]

def format_subject(subject):
    # "college_biology" -> " college biology"; the leading space is kept to
    # preserve the exact prompt string of the reference MMLU implementation.
    s = ""
    for entry in subject.split("_"):
        s += " " + entry
    return s


def gen_prompt(train_df, subject, k=-1):
    # Build the few-shot header: an instruction line followed by k solved
    # examples from the dev split (k == -1 uses every dev row).
    prompt = "The following are multiple choice questions (with answers) about {}.\n\n".format(
        format_subject(subject))
    if k == -1:
        k = train_df.shape[0]
    for i in range(k):
        prompt += format_example(train_df, i)
    return prompt


def format_example(df, idx, include_answer=True):
    # Row layout in the MMLU CSVs: question, choices A..D, answer letter.
    prompt = df.iloc[idx, 0]
    k = df.shape[1] - 2
    for j in range(k):
        prompt += "\n{}. {}".format(choices[j], df.iloc[idx, j + 1])
    prompt += "\nAnswer:"
    if include_answer:
        prompt += " {}\n\n".format(df.iloc[idx, k + 1])
    return prompt
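
# Example with a hypothetical row (not from the dataset):
#   format_example(pd.DataFrame([["What is 2+2?", "3", "4", "5", "6", "B"]]), 0)
# returns "What is 2+2?\nA. 3\nB. 4\nC. 5\nD. 6\nAnswer: B\n\n".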


class mmlu(Dataset):

    def __init__(self, config):
        self.tokenizer = AutoTokenizer.from_pretrained(
            os.path.join(config.data_dir, config.weight_dir))
        self.records = []
        self.length = 0

        for task in TASKS:

            logger.debug("Loading " + str(config.few_shots) + "-shot " + str(task))

            # The dev split supplies the few-shot examples; the test split is scored.
            dev_df = pd.read_csv(os.path.join(config.data_dir, config.mmlu_dir,
                                              "dev", task + "_dev.csv"),
                                 header=None)[:config.few_shots]
            test_df = pd.read_csv(os.path.join(config.data_dir, config.mmlu_dir,
                                               "test", task + "_test.csv"),
                                  header=None)

            for i in range(test_df.shape[0]):
                k = config.few_shots
                prompt_end = format_example(test_df, i, include_answer=False)
                train_prompt = gen_prompt(dev_df, task, k)
                prompt = train_prompt + prompt_end
                # Drop few-shot examples one at a time until the prompt plus
                # the single answer token fits the 2048-token context window.
                while len(self.tokenizer.tokenize(prompt)) + 1 > 2048:
                    prompt_split = prompt.split("\n\n")
                    prompt_split.pop(1)
                    prompt = "\n\n".join(prompt_split)
                label = test_df.iloc[i, test_df.shape[1] - 1]
                token_prompt = self.tokenizer(prompt, return_tensors="pt")
                token_label = self.tokenizer([label], return_tensors="pt")
                self.records.append({"prompt": token_prompt,
                                     "answer": token_label.input_ids})
                self.length += 1

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return self.records[idx]


def build_dataloader(config):
    dataset = mmlu(config)
    # Prompts have varying lengths and are not padded, so batch_size must be 1.
    assert config.batch_size == 1
    loader = DataLoader(dataset,
                        batch_size=config.batch_size,
                        shuffle=False,
                        drop_last=False,
                        num_workers=config.num_workers,
                        pin_memory=True)

    return loader
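
if __name__ == "__main__":
    # Smoke test with a hypothetical config (normally supplied by the
    # FlagPerf harness); the dataset and converted weights must exist on disk.
    from types import SimpleNamespace
    config = SimpleNamespace(data_dir="/raid/dataset",
                             weight_dir="llama2_7b_hf",
                             mmlu_dir="mmlu_dataset/data",
                             few_shots=5, batch_size=1, num_workers=8)
    loader = build_dataloader(config)
    item = next(iter(loader))
    print(item["prompt"].input_ids.shape)  # (1, 1, seq_len) after batching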
11 changes: 11 additions & 0 deletions inference/benchmarks/llama2_7b_mmlu/pytorch/evaluator.py
@@ -0,0 +1,11 @@
import torch


def evaluator(pred, y):
    # y is the tokenized answer letter, shaped (1, 1, 2) after batching;
    # index 1 skips the BOS token and yields the ground-truth token id.
    gt = float(y[0][0][1])
    # Only the logits at the last prompt position (the next-token
    # prediction) are scored.
    predict = pred[:, -1, :]
    answer = float(torch.argmax(predict, dim=1))
    return 1 if answer == gt else 0
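
if __name__ == "__main__":
    # Toy check of the shape conventions with hypothetical token ids: a
    # vocabulary of 10 where the top next-token prediction is id 7.
    logits = torch.zeros(1, 4, 10)
    logits[0, -1, 7] = 5.0                 # peak logit at the last position
    answer_ids = torch.tensor([[[1, 7]]])  # [BOS, ground-truth token id]
    print(evaluator(logits, answer_ids))   # prints 1 (correct)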
9 changes: 9 additions & 0 deletions inference/benchmarks/llama2_7b_mmlu/pytorch/export.py
@@ -0,0 +1,9 @@
def export_model(model, config):
    # This benchmark performs no ONNX export of its own; if an exported
    # model already exists, its path is passed through, otherwise None.
    if config.exist_onnx_path is not None:
        return config.exist_onnx_path

    return None
117 changes: 117 additions & 0 deletions inference/benchmarks/llama2_7b_mmlu/pytorch/forward.py
@@ -0,0 +1,117 @@
from loguru import logger
import torch
import numpy as np
import time
from tools import torch_sync


def cal_perf(config, tokens, duration, core_time, str_prefix):
    # Overall throughput: processed tokens over wall-clock time (tokens/s).
    model_forward_perf = config.repeat * tokens / duration
    logger.info(str_prefix + "(" + config.framework + ") Perf: " +
                str(model_forward_perf) + " tps")
    # Compute-only throughput: the same token count over device compute time,
    # i.e. excluding dataloading and other IO.
    model_forward_core_perf = config.repeat * tokens / core_time
    logger.info(str_prefix + "(" + config.framework + ") core Perf: " +
                str(model_forward_core_perf) + " tps")
    return round(model_forward_perf, 3), round(model_forward_core_perf, 3)


def model_forward(model, dataloader, evaluator, config):
    if config.no_validation:
        return None, None, None
    start = time.time()
    core_time = 0.0

    token_cnt = 0
    correct = 0
    whole = 0

    for times in range(config.repeat):

        logger.debug("Repeat: " + str(times + 1))

        for step, item in enumerate(dataloader):
            if step % config.log_freq == 0:
                logger.debug("Step: " + str(step) + " / " +
                             str(len(dataloader)))

            tokens = item["prompt"].input_ids.cuda()[0]

            with torch.no_grad():

                torch_sync(config)  # synchronize the device so the timer brackets compute only
                core_time_start = time.time()

                y = model(tokens)

                torch_sync(config)
                core_time += time.time() - core_time_start

            token_cnt += len(tokens[0])

            pred = y[0]
            r = evaluator(pred, item["answer"])

            correct += r
            whole += 1

    logger.info("MMLU" + str(config.few_shots) + "-shots Acc: " +
                str(correct / whole))

    duration = time.time() - start
    model_forward_perf, model_forward_core_perf = cal_perf(
        config, token_cnt, duration, core_time, "Validation")

    return model_forward_perf, model_forward_core_perf, round(correct / whole, 3)


def engine_forward(model, dataloader, evaluator, config):
    if config.no_validation:
        return None, None, None
    start = time.time()
    core_time = 0.0
    foo_time = 0.0

    token_cnt = 0
    correct = 0
    whole = 0

    for times in range(config.repeat):

        logger.debug("Repeat: " + str(times + 1))

        for step, item in enumerate(dataloader):
            if step % config.log_freq == 0:
                logger.debug("Step: " + str(step) + " / " +
                             str(len(dataloader)))

            tokens = item["prompt"].input_ids[0]
            model_inputs = [tokens]

            with torch.no_grad():

                torch_sync(config)  # synchronize the device so the timer brackets compute only
                core_time_start = time.time()

                y = model(model_inputs)

                torch_sync(config)
                core_time += time.time() - core_time_start

            # The engine wrapper returns (outputs, overhead_time); the
            # overhead is excluded from the compute-only throughput below.
            foo_time += y[1]
            model_outputs = y[0]

            token_cnt += len(tokens[0])

            y = model_outputs[0]
            pred = y[0]
            r = evaluator(pred, item["answer"])

            correct += r
            whole += 1

    logger.info("MMLU" + str(config.few_shots) + "-shots Acc: " +
                str(correct / whole))

    duration = time.time() - start
    model_forward_perf, model_forward_core_perf = cal_perf(
        config, token_cnt, duration, core_time - foo_time, "Inference")

    return model_forward_perf, model_forward_core_perf, round(correct / whole, 3)
11 changes: 11 additions & 0 deletions inference/benchmarks/llama2_7b_mmlu/pytorch/model.py
@@ -0,0 +1,11 @@
from transformers import LlamaForCausalLM


def create_model(config):
    # Load the converted HF checkpoint in fp32, move it to the GPU in eval
    # mode, then cast to half precision if fp16 inference is configured.
    model = LlamaForCausalLM.from_pretrained(config.data_dir + "/" +
                                             config.weight_dir).eval().cuda().float()

    if config.fp16:
        model.half()

    return model
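
if __name__ == "__main__":
    # Smoke test with a hypothetical config (normally supplied by the
    # FlagPerf harness); loading a 7B checkpoint needs a large-memory GPU.
    from types import SimpleNamespace
    config = SimpleNamespace(data_dir="/raid/dataset",
                             weight_dir="llama2_7b_hf", fp16=True)
    model = create_model(config)
    print(next(model.parameters()).dtype)  # torch.float16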
@@ -0,0 +1 @@
transformers
16 changes: 16 additions & 0 deletions inference/configs/llama2_7b_mmlu/configurations.yaml
@@ -0,0 +1,16 @@
batch_size: 1
# FLOPs for one item (e.g. one sequence or one image).
# Note: for a transformer language model such as bert, one token costs about
# 2*params forward FLOPs, so a fixed 512-token sequence needs 2*512*0.33e9;
# here the value is the per-token FLOPs for llama2-7b.
# Format: a product a_1*a_2*...*a_n, e.g. 2*512*0.33e9 (bert) or 4.12e9 (resnet50).
flops: 2*7e9
fp16: true
compiler: inductor
num_workers: 8
log_freq: 100
repeat: 1
# Skip validation (this also skips create_model and the ONNX export);
# requires exist_onnx_path != null.
no_validation: false
# Set a real onnx path to reuse an existing export, or set anything non-null
# to skip exporting ONNX manually (e.g. for torch-tensorrt).
exist_onnx_path: null
# Set an existing engine-file path, e.g. resnet50.trt / resnet50.plan / resnet50.engine.
exist_compiler_path: null
3 changes: 3 additions & 0 deletions inference/configs/llama2_7b_mmlu/parameters.yaml
@@ -0,0 +1,3 @@
weight_dir: "llama2_7b_hf"
mmlu_dir: "mmlu_dataset/data"
few_shots: 5