chunk length is incorrect for files less than min_size #20

Closed
Jwink3101 opened this issue Nov 21, 2023 · 0 comments · Fixed by #22


When a chunk is smaller than min_size, as with a small file or stream, the reported chunk length is incorrect.

Consider the following example:

import fastcdc

data = b'\x04\xc9KM\x8a\xeaiH\x83\xaf\x01{\xd6\xe1\xab(# \xdb\xaf' # from os.urandom(20)
print(f'{len(data) = }')

chunks = fastcdc.fastcdc(
    data, 
    min_size=1024, # 1 kb
    avg_size=4*1024, # 4 kb
    max_size=16*1024, # 16 kb
    fat=True, # for demo
)
chunk = next(chunks)

print(f'{chunk.length = }')
print(f'{len(chunk.data) = }')
print(f'{data == chunk.data = }')

print(f'{fastcdc.__version__ = }')

Out:

len(data) = 20
chunk.length = 1024
len(chunk.data) = 20
data == chunk.data = True
fastcdc.__version__ = '1.4.2'

As you can see, chunk.length is incorrect for a data stream of 20 bytes (20 is far below the 1024-byte min_size). With fat=True, I can determine the true size from len(chunk.data), but that needlessly uses extra memory.
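Until the reported length is corrected, one possible caller-side workaround that avoids fat=True is to clamp the reported length against the number of bytes remaining in the input. The sketch below is only an illustration: the helper name clamp_chunk_length is hypothetical, it assumes the caller knows the total input size, and it assumes each chunk exposes its starting offset (not shown in the example above). It is not the fix applied in the library.

```python
def clamp_chunk_length(offset: int, reported_length: int, total_size: int) -> int:
    """Return how many bytes a chunk can actually cover.

    Hypothetical helper (not part of fastcdc): a chunk starting at
    `offset` cannot extend past the end of an input of `total_size` bytes.
    """
    if offset < 0 or reported_length < 0 or total_size < 0:
        raise ValueError("offset, reported_length and total_size must be non-negative")
    remaining = max(total_size - offset, 0)  # bytes left in the input from this offset
    return min(reported_length, remaining)


# Values from the report above: 20 bytes of input, reported length of 1024.
print(clamp_chunk_length(offset=0, reported_length=1024, total_size=20))
```

For a chunk that lies fully inside the input, the helper returns the reported length unchanged, so it only affects the short final (or only) chunk.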
