Fix incomplete abstract and title
Closes gijswobben#23

In some cases the extracted title and/or abstract was incomplete: the text was cut off at the first inline markup tag.

Example: PMID 31689885
Title tag:
<ArticleTitle>Gamma Irradiated <i>Rhodiola sachalinensis</i> Extract Ameliorates [...]</ArticleTitle>
Result was: 'Gamma Irradiated ' (now is 'Gamma Irradiated Rhodiola sachalinensis Extract[...]')
Abstract tag:
<AbstractText>The effect of <i>Rhodiola sachalinensis</i> Boriss extract irradiated [...]</AbstractText>
Result was: 'The effect of '

Solution: strip HTML markup tags such as <i>, <sub>, <sup> from the response before it is parsed as XML.
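
The truncation described above can be reproduced with the standard library. This is a minimal sketch, assuming that `xml` in pymed/api.py refers to `xml.etree.ElementTree` (the diff does not show that import); the shortened title string is illustrative, not the actual PubMed record. In ElementTree, an element's `.text` holds only the text before its first child element, and the rest ends up in the child's `.text` and `.tail`.

```python
import xml.etree.ElementTree as ET

# Illustrative, shortened title in the same shape as the PMID 31689885 example.
raw = (
    "<ArticleTitle>Gamma Irradiated <i>Rhodiola sachalinensis</i> "
    "Extract</ArticleTitle>"
)

element = ET.fromstring(raw)

# .text stops at the first child element (<i>), which explains the cut-off title.
truncated = element.text

# The rest of the title is stored on the <i> child: its own text and its tail.
child = element[0]
italic_part = child.text
remainder = child.tail

print(repr(truncated), repr(italic_part), repr(remainder))
```

Reading only `.text` therefore drops everything from the first `<i>`, `<sub>`, or `<sup>` onward, which matches the 'Gamma Irradiated ' result reported in this commit message.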
iacopy committed Mar 22, 2020
1 parent 5273166 commit e2ae50d
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions pymed/api.py
@@ -1,4 +1,5 @@
import datetime
import re
import requests
import itertools

@@ -170,6 +171,9 @@ def _getArticles(self: object, article_ids: list) -> list:
url="/entrez/eutils/efetch.fcgi", parameters=parameters, output="xml"
)

# Remove html markup tags (<i>, <sub>, <sup>) to prevent text truncation
response = re.sub("</?i>|</?su[bp]>", "", response)

# Parse as XML
root = xml.fromstring(response)
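
The effect of the added `re.sub` call can be sketched in isolation. The snippet assumes that `xml` is `xml.etree.ElementTree`; the helper name `strip_inline_markup` and the sample title are hypothetical and not part of pymed. The pattern itself is the one used in this commit.

```python
import re
import xml.etree.ElementTree as ET


def strip_inline_markup(response: str) -> str:
    """Hypothetical helper wrapping the substitution added in this commit."""
    # Removes opening and closing <i>, <sub>, and <sup> tags; their text is kept.
    return re.sub(r"</?i>|</?su[bp]>", "", response)


# Illustrative input, not an actual PubMed response.
raw = (
    "<ArticleTitle>Gamma Irradiated <i>Rhodiola sachalinensis</i> "
    "Extract with H<sub>2</sub>O<sub>2</sub></ArticleTitle>"
)

cleaned = strip_inline_markup(raw)
title = ET.fromstring(cleaned).text

# Tags outside the pattern, such as <b>, pass through unchanged.
untouched = strip_inline_markup("<b>bold</b>")

print(title)
print(untouched)
```

Because the substitution runs on the raw response string before parsing, the title element no longer has child elements and `.text` returns the full string. Other inline tags are not covered by this pattern and would still truncate the text.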
