Invalid output and errors using model = ipex.optimize(model): split master weight unsupported, Conv BatchNorm folding failed, Linear BatchNorm folding failed #302
Which GPU did you run on?

We will look into this issue.

Sorry, I should have mentioned that. Arc A770, latest drivers on Ubuntu. Thank you very much for looking into this, I really appreciate it!

Is there an ETA for someone to look at this? Just curious, as I have a project I'm trying to validate on Arc. Thanks!

We are looking into this issue and will update later. It seems some issues have been found.
Similar issue while trying to run openai-whisper on an A770, with this patch:

```diff
 from . import load_model
+import intel_extension_for_pytorch as ipex
 model = load_model(model_name, device=device, download_root=model_dir)
+model.eval()
+model = model.to('xpu')
+ipex.optimize(model)
```

Running `whisper --model tiny --language en --task transcribe --device xpu ...` results in warnings, and Whisper then fails to decode the tokens.

Environment setup:

```shell
. /opt/intel/oneapi/tbb/2021.8.0/env/vars.sh
. /opt/intel/oneapi/compiler/2022.2.0/env/vars.sh
. /opt/intel/oneapi/mkl/2022.2.0/env/vars.sh
```
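One thing worth noting about the snippet earlier in this comment: by default `ipex.optimize` returns the optimized module rather than modifying its argument, so calling it without reassigning the result leaves the original, unoptimized model in use. A minimal sketch of the difference, using only plain PyTorch on CPU (no IPEX required to run it):

```python
import torch
from torch import nn

model = nn.Linear(1, 1)

# nn.Module.to() and .eval() mutate the module in place and return it,
# so discarding their return value is harmless:
assert model.to(torch.float64) is model
print(model.weight.dtype)  # torch.float64 after the in-place .to()

# ipex.optimize() is different: with the default inplace=False it returns
# a NEW optimized module and leaves its argument untouched, so the result
# must be captured (sketch, assuming intel_extension_for_pytorch is installed):
#     model = ipex.optimize(model)
```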
Update: I have also tried this with an Intel i9-11900K CPU and an A770, with the same result. The first attempt used an AMD Threadripper. The code does not work on either platform. Is there a timeline for this issue? Thanks so much!
This issue will be fixed in the next release, coming soon.
Just a note: I have gotten bad results with every single model I've tried with XPU; it's not limited to this model. From my perspective, Arc has been unusable for almost 2 months now. I bought 6 Arc A770s for a project, and this has been a waste so far. I understand that I'm just one user and your team has its own plan. Can you give me anything to help me use these cards, though? Is there a branch I can try, or at least a release date, so I know whether I should continue with this hardware? Thanks very much!
This incorrect-output issue has been fixed in the latest code base. The next release is pending; in the meantime, you can try compiling from source with https://github.com/intel/intel-extension-for-pytorch/blob/xpu-master/scripts/compile_bundle.sh.
Hi, at this moment, please try compiling the latest code from source. Please refer to the comment above.
Compilation took hours and multiple attempts, but whisper is working with the xpu-master branch and even loads the large model into the 16 GB VRAM. Speed looks OK-ish, but given the warnings there is probably room for improvement. `intel_gpu_top` shows 52% Render, 75% Blitter, 24% unknown.

whisper patch:

```diff
diff --git a/whisper/transcribe.py b/whisper/transcribe.py
index ed6d820..0d9e3c8 100644
--- a/whisper/transcribe.py
+++ b/whisper/transcribe.py
@@ -429,8 +429,13 @@ def cli():
     torch.set_num_threads(threads)

     from . import load_model
+    import intel_extension_for_pytorch as ipex

     model = load_model(model_name, device=device, download_root=model_dir)
+    model.eval()
+    model = model.to(device)
+    if device == 'xpu':
+        ipex.optimize(model)

     writer = get_writer(output_format, output_dir)
     for audio_path in args.pop("audio"):
```
The warnings above go away. I found a metric to display GPU memory usage using `lsgpu`.

Normal usage:

```shell
> lsgpu -p | grep ^lmem_
lmem_avail_bytes : 16260284416
lmem_total_bytes : 17079205888
```

openai whisper large model loaded:
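For reference, the `lmem_*` counters above convert to a used-VRAM figure with a little arithmetic (sketch; the byte values are taken directly from the `lsgpu -p` output above):

```python
# Values as printed by `lsgpu -p | grep ^lmem_`
lmem_total_bytes = 17079205888
lmem_avail_bytes = 16260284416

# Used local memory is simply total minus available
used_bytes = lmem_total_bytes - lmem_avail_bytes
print(f"{used_bytes / 2**20:.0f} MiB in use")  # → 781 MiB in use
```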
Took hours to build, so I uploaded unofficial wheels of xpu-master here:
@leuc How much RAM does your computer have? It builds in around 20-25 min on my workstation, using slightly under 20 GB of memory. However, when attempting a build with a GitHub Actions workflow I made (per GitHub docs, the VM has 7 GB of memory) or a self-hosted runner on a laptop with 8 GB of RAM, I couldn't even get a build to finish. @jingxu10 Having something akin to a nightly beta build from Intel could be really useful here.
@fredlarochelle It wasn't a resource issue; the script just doesn't build well without conda. I may work on a PR for better portability, aimed at CI/CD and containers.
@leuc Yeah, I know about conda plus the GCC 11 requirement. However, I had no luck with GCC 11, it wasn't consistent at all; I got it working much better with GCC 9. We should probably have a look at the compiler flags used, too.
What are the error messages? I would recommend doing the compilation in a Docker container.
Addressed some build issues in PR #334.
I'm using a tiny test network that is just one linear layer. Using the updated build I still get:

```
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
```

I don't know how this is possible because there's no LSTM at all!

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
import math
import os
import glob
import random
import librosa
import soundfile as sf
import numpy as np
import intel_extension_for_pytorch as ipex

default_device = torch.device("xpu")

class DummyLayer(nn.Module):
    def __init__(self):
        super(DummyLayer, self).__init__()
        self.layer = nn.Linear(1, 1)

    def forward(self, src):
        src = src.unsqueeze(-1)
        src = self.layer(src)
        src = src.squeeze(-1)
        return src

model = DummyLayer()
model.to(default_device)
criterion = nn.MSELoss()
lr_factor = 0.1
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16, inplace=True)

target_sample_rate = 8000

def load_file(path):
    data, sample_rate = librosa.load(path, sr=target_sample_rate)
    data = torch.from_numpy(data)
    data = data.unsqueeze(0)
    data = torch.mean(data.to(default_device), dim=0).unsqueeze(0)
    return data

train = load_file("testrecording_8k.wav")
target = load_file("testrecording_target_8k.wav")

# Training loop
num_epochs = 150000
for epoch in range(num_epochs):
    print("running")
    # batch = batch.to(memory_format=torch.channels_last)
    # target = target.to(memory_format=torch.channels_last)
    train = train.bfloat16()
    target = target.bfloat16()
    optimizer.zero_grad()
    with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
        output = model(train)
        loss = criterion(output, target)
        print(f'Epoch: {epoch+1}/{num_epochs}, Step: {epoch+1}, Loss: {loss.item()}')
        print("output", output.cpu())
        print("target", target.cpu())
    loss.backward()
    optimizer.step()
    print(f'Epoch: {epoch+1}/{num_epochs}, Step: {epoch+1}, Loss: {loss.item()}')
    # every few steps save the output
    if (epoch+1) % 50 == 0:
        # Save the output to file
        output = torch.flatten(output, start_dim=0)
        print(output.size())
        sf.write("samples2/testrecording_8k_progress2_" + str(epoch) + ".wav", output.float().cpu().detach().numpy(), target_sample_rate)
```
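For comparison, a one-layer setup like this trains fine on CPU with stock PyTorch. A minimal sketch (no IPEX involved; synthetic data stands in for the wav files, which are not available here) — a working XPU backend should show the same steadily decreasing loss:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

# Synthetic regression target: y = 2x + 0.5
x = torch.linspace(-1, 1, 64).unsqueeze(-1)
y = 2 * x + 0.5

losses = []
for _ in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# On CPU the loss drops toward zero; a backend producing garbage
# activations or gradients would show a flat or diverging curve instead.
print(f"first={losses[0]:.4f} last={losses[-1]:.6f}")
```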
@zejun-chen Is this a known issue we already fixed? |
Hi, @turbobuilt @gujinghui |
Is this included in the latest drivers, or does it still need to be compiled? I just ordered two A770s to fine-tune and run Whisper and some other models.
I have found an interesting feature. This works:

```python
model = get_peft_model(model, peft_config)
model.eval()
model = ipex.optimize(model)
model.train()
```

instead of:

```python
model = get_peft_model(model, peft_config)
model, optimizer = ipex.optimize(model, optimizer=Lion(...))
```

Or maybe the cause of that is an unsupported optimizer?

P.S. I am also struggling with long-context forward passes due to >4 GB allocations. I already peeked into other issues and didn't find proper fixes there. (Arc A770 16 GB)
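On the >4 GB allocation point: when the failing allocation comes from one huge intermediate tensor, a generic workaround is to run the forward in chunks along the sequence dimension, provided the layer treats positions independently (so not across-chunk attention). A hypothetical sketch (`chunked_forward` is not an IPEX API, just an illustration, shown here on CPU):

```python
import torch
from torch import nn

def chunked_forward(module, x, chunk=1024, dim=1):
    # Apply `module` to slices of `x` along `dim` and re-concatenate,
    # so no single intermediate needs one enormous contiguous allocation.
    # Only valid when `module` is position-wise along `dim`.
    outs = [module(part) for part in x.split(chunk, dim=dim)]
    return torch.cat(outs, dim=dim)

proj = nn.Linear(64, 64)           # position-wise layer, safe to chunk
x = torch.randn(1, 4096, 64)
full = proj(x)
chunked = chunked_forward(proj, x, chunk=1024)
# Chunked and full-sequence results agree up to float rounding
assert torch.allclose(full, chunked, atol=1e-5)
```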
Hi, I'm trying to run inference with a pretrained OFA (OFA-huge) model according to these instructions:
https://github.com/OFA-Sys/OFA/blob/feature/add_transformers/transformers.md

This runs fine on both CPU and CUDA, but using XPU results in gibberish. I also get several warnings, which go away when

```python
model = ipex.optimize(model)
```

is commented out. With essentially the only change between CPU/CUDA and XPU being the `.to('xpu')` part, the model still outputs gibberish.

Warnings from `model = ipex.optimize(model)`:

XPU output:

```
[' this is the ch ch chaval all the is is the word for the band that is']
```
^ gibberish output

With CPU/CUDA:

```
[' a black and white photo of a wolf walking through the woods at night.']
```
^ correct output

I'm running Ubuntu 22.04 with 1.13.10+xpu; code is below:
Image: (attachment not rendered)
Thanks!
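A generic way to quantify the CPU-vs-XPU mismatch described above is to run the same input through a reference device and the device under test and compare the forward-pass outputs. A sketch (the helper name is mine, not a library API; the self-check below runs CPU vs CPU, and on an IPEX install one would pass `test_device="xpu"`):

```python
import torch
from torch import nn

def outputs_match(model, x, ref_device="cpu", test_device="cpu", atol=1e-3):
    # Run identical inputs on a reference device and the device under
    # test; on a healthy backend the outputs agree within `atol`.
    model = model.eval()
    with torch.no_grad():
        ref = model.to(ref_device)(x.to(ref_device)).cpu()
        out = model.to(test_device)(x.to(test_device)).cpu()
    return torch.allclose(ref, out, atol=atol)

# Trivial CPU self-check; a backend emitting gibberish would return False.
print(outputs_match(nn.Linear(8, 8), torch.randn(2, 8)))  # → True
```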