Error when downloading records from the mimic3db database #452

Closed
Favourj-bit opened this issue May 12, 2023 · 26 comments

@Favourj-bit

Favourj-bit commented May 12, 2023

I'm trying to download records from the mimic3wdb:MIMIC-III Waveform Database using the code given in the demo

import os
import wfdb

cwd = os.getcwd()
dl_dir = os.path.join(cwd, 'tmp_dl_dir')

wfdb.dl_database('mimic3wdb', dl_dir=dl_dir)
display(os.listdir(dl_dir))

But I keep on getting the error shown:
image
Please how could I solve this?

@tompollard
Member

@Favourj-bit please could you provide code to reproduce the issue?

@Favourj-bit
Author

Favourj-bit commented May 12, 2023

Hi @tompollard I just did that. I got the same error when trying to access the 'mimic3wdb-matched', 'MIMIC-III Waveform Database Matched Subset' database

@tompollard
Member

Thanks for adding the content @Favourj-bit. The code in your example works okay for me. Please could you show me the output to the following code?

import wfdb
print(wfdb.__version__)

A quick fix might be to upgrade to the latest version (e.g. with pip install wfdb --upgrade)

@Favourj-bit
Author

Hi @tompollard
here is the output from the code:
image

@Favourj-bit
Author

Hi @tompollard, I have tried upgrading wfdb but I'm still getting the same error.

NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic3wdb/1.0/30/3000003/.hea

@tompollard
Member

tompollard commented May 12, 2023

thanks @Favourj-bit I'll take a look into this as soon as I have the opportunity. There is a problem with the path that is being generated, so you are getting a 404 not found error.

Side note, but the MIMIC-III database that you are looking to work with is hosted at: https://physionet.org/content/mimic3wdb/1.0/

@Favourj-bit
Author

Hi @tompollard , I wanted to find out if you have been able to look into this

@tompollard
Member

I'm sorry, not yet. I have some other commitments that I need to focus on, but will take a look at this when I can (if someone doesn't get there before me).

@briangow
Contributor

@Favourj-bit , are you still having trouble with this? I cannot reproduce your problem. The files are downloading properly for me using this code.

@Favourj-bit
Author

Favourj-bit commented May 26, 2023

@briangow
Yes, I am.
Which code is that?

@Favourj-bit
Author

@briangow

What app are you using?
I used jupyter lab for that code

@briangow
Contributor

I used Jupyter Notebook, could you give it a try?

@Favourj-bit
Author

ok, i will do that. thanks

@Favourj-bit
Author

Favourj-bit commented May 26, 2023

I just tried with Jupyter Notebook and it gives the same error. I wanted to let you know, however, that I'm just copying the code for my use case; I did not clone the notebook. Could that cause an issue?
image

@briangow
Contributor

You should be able to simply copy the code and have it work, so I don't think that is the problem. To properly debug this we'd need to inspect record_list and nested_records in the dl_database function here: https://github.com/MIT-LCP/wfdb-python/blob/main/wfdb/io/record.py#L2971 . Feel free to give that a try. These lists should point to the files here https://physionet.org/content/mimic3wdb/1.0/ (at the bottom). Your error shows a url which doesn't have the filename before the .hea, which isn't correct.

I won't be able to help with this again until later next week. If you are anxious to get started using the mimic3wdb files, I'd suggest you download them directly from the physionet.org link above. Keep in mind that these files do take a substantial amount of disk space.

If you don't need to download them locally I'd suggest reading directly from the database (without saving them locally). See this section in the demo.ipynb for an example on how to do this for the matched subset of MIMIC-III waveforms (https://physionet.org/content/mimic3wdb-matched/1.0/):

# Can also read the same files hosted on PhysioNet (takes long to stream the many large files)
signals, fields = wfdb.rdsamp('3269321_0001', pn_dir = 'mimic3wdb/matched/p00/p000878')
wfdb.plot_items(signal=signals, fs=fields['fs'], title='Record p000878/3269321_0001')
display((signals, fields))

@Favourj-bit
Author

Hi @briangow, thanks so much for your help. I will check out those functions and let you know about anything I find.
Also, I actually do need the data because I want to preprocess it. I'm doing a blood pressure monitoring system project at my college, which is why I need access to ECG, PPG and ABP signals. Is it possible to preprocess it without downloading it outright?

I will give the code a trial, I will also try to preprocess it if possible.
Unfortunately, I could not get the wfdb package working on my system no matter what I tried, hence why I decided to follow the demo provided by the python package.

I'll also try to check out methods to get the data locally. Anyways, I won't mind holding on till next week too in case none of the other methods I'll try works out.
Thanks again for your concern

@tompollard
Member

@Favourj-bit This also runs fine for me! Please could you:

  1. Add details of your operating system (Windows?)
  2. Post the output of pip freeze to show us which packages you have installed.
  3. Post the commands you are running as text, inside three backticks (```)
  4. Post any output that you see as text, inside three backticks (```)
  5. Post the full error message as text, inside three backticks (```)
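
Some of the details requested above can be gathered in one step. The following is a minimal sketch, not an official wfdb diagnostic: the helper name `environment_report` is hypothetical, and the only wfdb attribute it relies on is `wfdb.__version__`, which is used earlier in this thread. The path separator is included because it differs between Windows and other systems.

```python
import os
import platform
import sys


def environment_report():
    """Collect basic environment details for a bug report (hypothetical helper)."""
    try:
        import wfdb
        wfdb_version = wfdb.__version__
    except ImportError:
        # wfdb is not importable in this environment.
        wfdb_version = "not installed"

    return {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "path_separator": os.sep,
        "wfdb": wfdb_version,
    }


# Print one "key: value" line per entry so the output can be pasted as text.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```

The output of `pip freeze` is still needed separately to list every installed package.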

@Favourj-bit
Author

Favourj-bit commented May 26, 2023

Hi @tompollard

  1. Windows 10 Pro

image
  3. '''
import os
import wfdb

cwd = os.getcwd()
dl_dir = os.path.join(cwd, 'tmp_dl_dir')

wfdb.dl_database('mimic3wdb-matched', dl_dir=dl_dir)
display(os.listdir(dl_dir))
'''
4. The output is very long, so i'm posting only some of it. The beginning and the ending of the output.
''' Generating record list for: p00/p000020/
Generating record list for: p00/p000030/
Generating record list for: p00/p000033/
Generating record list for: p00/p000052/
Generating record list for: p00/p000079/
Generating record list for: p00/p000085/
Generating record list for: p00/p000107/
Generating record list for: p00/p000109/
Generating record list for: p00/p000123/
Generating record list for: p00/p000124/
Generating record list for: p00/p000125/
Generating record list for: p00/p000135/
Generating record list for: p00/p000138/
Generating record list for: p00/p000145/
Generating record list for: p00/p000154/
'''
''' Generating record list for: p09/p099836/
Generating record list for: p09/p099863/
Generating record list for: p09/p099865/
Generating record list for: p09/p099873/
Generating record list for: p09/p099880/
Generating record list for: p09/p099883/
Generating record list for: p09/p099894/
Generating record list for: p09/p099897/
Generating record list for: p09/p099913/
Generating record list for: p09/p099922/
Generating record list for: p09/p099946/
Generating record list for: p09/p099955/
Generating record list for: p09/p099982/
Generating record list for: p09/p099983/
Generating record list for: p09/p099992/
Generating record list for: p09/p099999/
Generating list of all files for: p00/p000020/
'''
5. ''' ---------------------------------------------------------------------------
NetFileNotFoundError Traceback (most recent call last)
in
5 dl_dir = os.path.join(cwd, 'tmp_dl_dir')
6
----> 7 wfdb.dl_database('mimic3wdb-matched', dl_dir=dl_dir)
8 display(os.listdir(dl_dir))

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\record.py in dl_database(db_dir, dl_dir, records, annotators, keep_subdirs, overwrite)
3064 dir_name, base_rec_name = os.path.split(rec)
3065 record = rdheader(
-> 3066 base_rec_name, pn_dir=posixpath.join(db_dir, dir_name)
3067 )
3068

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\record.py in rdheader(record_name, pn_dir, rd_segments)
1845 header_content = f.read()
1846 else:
-> 1847 header_content = download._stream_header(file_name, pn_dir)
1848
1849 # Separate comment and non-comment lines

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\download.py in _stream_header(file_name, pn_dir)
107 # Get the content of the remote file
108 with _url.openurl(url, "rb") as f:
--> 109 content = f.read()
110
111 return content.decode("iso-8859-1")

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\_url.py in read(self, size)
579 raise ValueError("invalid size: %r" % (size,))
580
--> 581 result = b"".join(self._read_range(start, end))
582 self._pos += len(result)
583 return result

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\_url.py in _read_range(self, start, end)
472 buffer_store = True
473
--> 474 with RangeTransfer(self._current_url, req_start, req_end) as xfer:
475 # Update current file URL.
476 self._current_url = xfer.response_url

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\_url.py in __init__(self, url, start, end)
166 self._content_iter = self._response.iter_content(4096)
167 try:
--> 168 self._parse_headers(method, self._response)
169 except Exception:
170 self.close()

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\_url.py in _parse_headers(self, method, response)
216 % (response.status_code, response.reason, response.url),
217 url=response.url,
--> 218 status_code=response.status_code,
219 )
220

NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic3wdb-matched/1.0/p00/p000020/.hea
'''

@briangow
Contributor

briangow commented Jun 1, 2023

@Favourj-bit , can you run this from Jupyter Notebook:

import wfdb

signals, fields = wfdb.rdsamp('3000003', pn_dir = 'mimic3wdb/1.0/30/3000003')
wfdb.plot_items(signal=signals, fs=fields['fs'], title='30/3000003/3000003')
display((signals, fields))

please post your output with any error messages.

@Favourj-bit
Author

@briangow , this code works perfectly.
image

@Favourj-bit
Author

I also noticed that when i clicked on the link in the error message:
image
It actually directs me to a page that shows 404 error which is shown below:
image

@Favourj-bit
Author

Favourj-bit commented Jun 1, 2023

Hi @briangow
so I was going through the function you suggested. I noticed something:
image
It seems to append .hea directly to every record entry, though I'm not sure.

The records in the database I'm trying to access are not just p00/p000020/.hea, so I'm guessing that may be where the issue is coming from.
This is the directory and the files present there:
image

@briangow
Contributor

briangow commented Jun 2, 2023

@Favourj-bit , yes, the path you are seeing ending in /.hea isn't pointing to an actual file which is causing your problem.
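
The empty file name is consistent with the `os.path.split(rec)` call visible in the traceback posted earlier. The sketch below reproduces it with the standard-library `ntpath` module (what `os.path` resolves to on Windows), so it behaves the same on any operating system. The helper `header_url` and the base URL constant are illustrative assumptions, not wfdb code, and the thread does not establish why a directory entry reached this step on Windows but not on other systems.

```python
import ntpath
import posixpath

# Assumed base URL, taken from the URL shown in the error message.
FILES_URL = "https://physionet.org/files"


def header_url(db_dir, rec):
    """Build a header URL the way the traceback suggests (hypothetical helper)."""
    # For an entry ending in "/", the base name comes back empty.
    dir_name, base_rec_name = ntpath.split(rec)
    return posixpath.join(FILES_URL, db_dir, dir_name, base_rec_name + ".hea")


# A directory entry, as printed by "Generating record list for: ..."
print(ntpath.split("p00/p000020/"))
print(header_url("mimic3wdb-matched/1.0", "p00/p000020/"))

# A record entry with a real base name produces a usable header URL.
print(header_url("mimic3wdb/1.0", "30/3000003/3000003"))
```

The second call yields the same `/.hea` URL that appears in the 404 error, while the third yields a path that ends in `3000003.hea`.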

Given what you need to do I'd suggest the following:

  1. Use wfdb.io.get_record_list to get a list of all of the records / files you're interested in processing
  2. Loop through the output from that and pass the relevant information to wfdb.io.rdsamp to read the record into your local memory
  3. Pre-process the data as needed
  4. Save the result to your local computer if needed by using wfdb.io.wrsamp
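
The four steps above could be sketched roughly as follows. This is an untested outline under several assumptions: `split_record_path` and `process_records` are hypothetical helpers, the `'16'` storage format is a placeholder choice, and whether `wfdb.get_record_list` returns record paths or directory entries for a nested database such as this one should be checked before relying on it. The only wfdb calls used are `get_record_list`, `rdsamp` and `wrsamp`.

```python
import os
import posixpath


def split_record_path(db_dir, rec):
    """Split 'sub/dir/name' into (pn_dir, record_name) for wfdb.rdsamp."""
    sub_dir, name = posixpath.split(rec)
    if not name:
        # Entries ending in "/" are directories, not records.
        raise ValueError(f"{rec!r} is a directory entry, not a record")
    pn_dir = posixpath.join(db_dir, sub_dir) if sub_dir else db_dir
    return pn_dir, name


def process_records(db_dir, records, out_dir, preprocess=None):
    """Stream each record, optionally preprocess it, and save it locally."""
    import wfdb  # imported here so split_record_path works without wfdb

    os.makedirs(out_dir, exist_ok=True)
    for rec in records:
        pn_dir, name = split_record_path(db_dir, rec)
        # Step 2: read the record into memory without downloading the files.
        signals, fields = wfdb.rdsamp(name, pn_dir=pn_dir)
        # Step 3: apply the caller's preprocessing, if any.
        if preprocess is not None:
            signals = preprocess(signals)
        # Step 4: write the result as a local WFDB record.
        wfdb.wrsamp(
            name,
            fs=fields["fs"],
            units=fields["units"],
            sig_name=fields["sig_name"],
            p_signal=signals,
            fmt=["16"] * signals.shape[1],  # assumed storage format
            write_dir=out_dir,
        )


# Step 1 (requires network access), for example:
#   import wfdb
#   records = wfdb.get_record_list("mimic3wdb")
#   process_records("mimic3wdb/1.0", records[:1], "processed")
```

For the record used earlier in this thread, `split_record_path("mimic3wdb/1.0", "30/3000003/3000003")` gives the same `pn_dir` and record name that were passed to `wfdb.rdsamp` in the working example.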

Details about these functions are available at https://wfdb.readthedocs.io/en/latest/index.html . Hopefully the wfdb.io.get_record_list will produce valid paths to the files for you. If not, you'll have to outsmart the code to create a valid path (ex: from https://physionet.org/files/mimic3wdb/1.0/30/3000003/.hea , create https://physionet.org/files/mimic3wdb/1.0/30/3000003/3000003.hea, etc.)
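
As an illustration of the path workaround mentioned above, here is a minimal sketch that rewrites a URL ending in `/.hea` by reusing the last directory name as the record name. The function `repair_header_url` is hypothetical, and the rule only matches the example given: it assumes the record is named after its directory, which is unlikely to hold for every database.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def repair_header_url(url):
    """Turn '.../3000003/.hea' into '.../3000003/3000003.hea' (hypothetical helper)."""
    parts = urlsplit(url)
    dir_path, file_name = posixpath.split(parts.path)
    if file_name != ".hea":
        # The URL already names a file, so leave it unchanged.
        return url
    # Assumption: the record shares the name of its parent directory.
    record_name = posixpath.basename(dir_path)
    fixed_path = posixpath.join(dir_path, record_name + ".hea")
    return urlunsplit(parts._replace(path=fixed_path))


print(repair_header_url("https://physionet.org/files/mimic3wdb/1.0/30/3000003/.hea"))
```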

The issue you are having with wfdb.io.dl_database appears to be a bug. I've marked this issue as such. We can leave this issue open until someone with a Windows machine can debug the problem.

@briangow briangow added the bug label Jun 2, 2023
@Favourj-bit
Author

@briangow
Thank you very much for your help. I will be sure to try out the functions you recommended and provide feedback.
That is great too. Maybe we could rename the issue in case someone with a Windows system comes across this. Hopefully, we will be able to figure it out.

@Favourj-bit
Author

@briangow
I realised the code works when using Google Colab. It takes a lot of time to run, so it is still running. It has reached the "Generating list of all files" stage and is still working through the files. I was just wondering why I did not use Colab before now, since I could always just download the results from there. I will let you know the outcome when I'm done. Thanks once again.

@bemoody
Collaborator

bemoody commented Sep 29, 2023

Should be fixed by pull #465.

@bemoody bemoody closed this as completed Sep 29, 2023
tompollard added a commit that referenced this issue Jan 21, 2025
This pull request adds a changelog for `v4.2.0`. The changelog is based
on the following auto-generated summary of merge commits generated by
GitHub:

```
## What's Changed

* bug-fix: Numpy ValueError when cheking empty list equality by @ajadczaksunriselabs in #459
* bug-fix: Pandas set indexing error by @ajadczaksunriselabs in #460
* fix for /issues/452 by @tecamenz in #465
* Use numpydoc to render documentation by @SnoopJ in #472
* build(deps): bump readthedocs-sphinx-search from 0.1.1 to 0.3.2 in /docs by @dependabot in #477
* Update style by @bemoody in #482
* Fix NaN handling in Record.adc, and other fixes by @bemoody in #481
* Set upper bound on Numpy version (numpy = ">=1.10.1,<2.0.0"). Ref #493. by @tompollard in #494
* Update actions to use actions/checkout@v3 and actions/setup-python@v4. by @tompollard in #495
* Fix: Indent code to ensure 'j' is within for-loop in GQRS algorithm by @tompollard in #499
* Add write_dir argument to csv_to_wfdb. Fixes #67. by @tompollard in #492
* Fix warnings by @cbrnr in #502
* README improvements by @bemoody in #503
* Change in type promotion. Fixes to annotation.py by @tompollard in #506
* Use uv by @cbrnr in #504
* Change in type promotion. Fixes to _signal.py by @tompollard in #507
* Test round-trip write/read of supported binary formats by @bemoody in #509
* Corrected typo and extended allowed types for MultiSegmentRecord by @agent3gatech in #514
* Allow expanded physical signal in `calc_adc_params` by @briangow in #512
* Add capability to write signal with unique `samps_per_frame` to `wfdb.io.wrsamp` by @briangow in #510
* Fix selection of channels when converting to EDF by @SamJelfs in #519
* Change in type promotion introduced in Numpy 2.0. Fixes to edf.py. by @tompollard in #527
* Bump dependencies for NumPy 2 compatibility by @cbrnr in #511
* Bump version to v4.2.0 and update notes on creating new releases by @tompollard in #497

## New Contributors

* @ajadczaksunriselabs made their first contribution in #459
* @tecamenz made their first contribution in #465
* @SnoopJ made their first contribution in #472
* @dependabot made their first contribution in #477
* @agent3gatech made their first contribution in #514
* @SamJelfs made their first contribution in #519

**Full Changelog**: v4.1.2...v4.2.0
```