Merge pull request #171 from adrianeboyd/bugfix/bugfix/wordpiecer-truncation-169-0.5.x

Truncate on IDs rather than length
honnibal authored Apr 21, 2020
2 parents ad833b4 + 4fa3aa9 commit e52fcdf
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions spacy_transformers/pipeline/wordpiecer.py
@@ -98,8 +98,10 @@ def predict(self, docs):
         segment, seg_words, offset=offset
     )
     seg_words = seg_words[:max_seq_length]
-    for idx in range(max_seq_length, len(seg_align)):
-        seg_align[idx] = []
+    for i, align in enumerate(seg_align):
+        if len(align) >= 1 and align[-1] < max_seq_length:
+            continue
+        seg_align[i] = [x for x in align if x < max_seq_length]
     assert len(segment) == len(seg_align)
     sent_words.append(seg_words)
     sent_align.append(seg_align)
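
To illustrate what "truncate on IDs rather than length" changes, here is a minimal standalone sketch of the two loops from the hunk above. The function names `truncate_by_length` and `truncate_by_ids` and the sample data are hypothetical and not part of the repository. The sketch assumes that each entry of `seg_align` lists the wordpiece indices for one token in ascending order, and that no offset is applied to those indices.

```python
def truncate_by_length(seg_align, max_seq_length):
    """Removed loop: blank out entries by their position in seg_align."""
    out = [list(align) for align in seg_align]
    for idx in range(max_seq_length, len(out)):
        out[idx] = []
    return out


def truncate_by_ids(seg_align, max_seq_length):
    """Added loop: drop wordpiece IDs that fall outside the truncated sequence."""
    out = [list(align) for align in seg_align]
    for i, align in enumerate(out):
        # Fast path: the last ID is in range, so (assuming ascending IDs)
        # every ID in this entry is in range.
        if len(align) >= 1 and align[-1] < max_seq_length:
            continue
        out[i] = [x for x in align if x < max_seq_length]
    return out


# Hypothetical data: four tokens mapped to six wordpieces (IDs 0..5).
seg_align = [[0], [1, 2], [3, 4], [5]]
max_seq_length = 4  # seg_words[:4] keeps wordpiece IDs 0..3 only

by_length = truncate_by_length(seg_align, max_seq_length)
by_ids = truncate_by_ids(seg_align, max_seq_length)
print(by_length)
print(by_ids)
```

In this example the segment has only four tokens, so the position-based loop changes nothing and leaves IDs 4 and 5 pointing past the end of the truncated wordpiece list. The ID-based loop trims the third entry to `[3]` and empties the fourth, while keeping one entry per token so the `len(segment) == len(seg_align)` assertion still holds.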
