Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace usage of deprecated classad.parseOld #8285

Closed
Tracked by #8337
belforte opened this issue Mar 14, 2024 · 8 comments
Closed
Tracked by #8337

replace usage of deprecated classad.parseOld #8285

belforte opened this issue Mar 14, 2024 · 8 comments
Assignees

Comments

@belforte
Copy link
Member

belforte commented Mar 14, 2024

with reference to #8258 (comment)

In El9/Alma9 with HTCondor 10.2 , RetryJob badly crashes at line 108 here

for dag_retry in range(self.dag_retry):
job_ad_file = os.path.join(".", "finished_jobs", "job.%s.%d" % (self.job_id, dag_retry))
if os.path.isfile(job_ad_file):
try:
with open(job_ad_file, encoding='utf-8') as fd:
ad = classad.parseOld(fd)
except Exception: # pylint: disable=broad-except
msg = "Unable to parse classads from file %s. Continuing." % (job_ad_file)
self.logger.warning(msg)
continue
if ad:
self.ads.append(ad)
else:
msg = "File %s does not exist. Continuing." % (job_ad_file)
self.logger.warning(msg)

with a core dump.

This part of the code is only executed when resubmitting a job/

Trying to open and parse same file on vocms059, HTC 10.0.9 on CC7 I get instead

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
classad.ClassAdParseError: None

So the error sort of understood.
Things are broken inside classAd.pareseOld(), which is anyhow deprecated in HTC 10.0 , so can't ask Madison to fix it !

What I do not understand atm is how can work on CC7 since classad.parseOld() fails there too, even if not so explosively and in EL9.

Anyhow I will switch to the replacement classAd.parseAds which reads the out classAd file w/o errors on both CC7 and Alma9.

Given the complexity and obscurity of PostJob and RetryJob a strong validation will be important :-(

@belforte
Copy link
Member Author

Looking for a simple example to post here, I found that the problem seems to come from the empty line at the bottom of the SPOOL_DIR/finished_jobs/job.N.r files.

Here's the reproduces on A:

belforte@crab-sched-901/~> condor_version
$CondorVersion: 10.2.0 2023-01-05 BuildID: 621409 PackageID: 10.2.0-1 $
$CondorPlatform: x86_64_AlmaLinux9 $
belforte@crab-sched-901/~> cat ADS-no-emptyline
cat: ADS-no-emptyline: No such file or directory
belforte@crab-sched-901/~> cat > ADS-no-emptyline
ClusterId = 246
belforte@crab-sched-901/~> 
belforte@crab-sched-901/~> 
belforte@crab-sched-901/~> cat ADS-no-emptyline 
ClusterId = 246
belforte@crab-sched-901/~> cat ADS
ClusterId = 246

belforte@crab-sched-901/~> python3
Python 3.9.18 (main, Jan  4 2024, 00:00:00) 
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import classad
>>> with open('ADS-no-emptyline','r') as fh: classad.parseOld(fh)
... 
<stdin>:1: DeprecationWarning: ClassAd Deprecation: parseOld is deprecated; use parseOne, parseNext, or parseAds instead.
[ ClusterId = 246 ]
>>> with open('ADS','r') as fh: classad.parseOld(fh)
... 
/usr/include/c++/11/bits/basic_string.h:1059: std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::reference std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator[](std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>; std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::reference = char&; std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]: Assertion '__pos <= size()' failed.
Aborted (core dumped)
belforte@crab-sched-901/~> 

While on CC7:

belforte@vocms0199/~> condor_version
$CondorVersion: 10.0.9 2023-09-28 BuildID: 678199 PackageID: 10.0.9-1 $
$CondorPlatform: x86_64_CentOS7 $
belforte@vocms0199/~> python3
Python 3.6.8 (default, Nov 14 2023, 16:29:52) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import classad
>>> with open('ADS-no-emptyline','r') as fh: classad.parseOld(fh)
... 
[ ClusterId = 246 ]
>>> with open('ADS','r') as fh: classad.parseOld(fh)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
classad.ClassAdParseError: None
>>> 

our python code handes the 'None`, but not the assertion failure in C++

So we can start with removing the empty line at the bottom, and take a bit of time to fully validate replacing classad.parseOld() with classad.parseAds(), although in simple test this works perfectly fine.

@belforte
Copy link
Member Author

the extra empty line is a "feature" of the stdout from condor_q. Adding | grep = in here should do

cmd = 'condor_q -l %s > %s' % (self.dag_jobid, job_ad_file_name)

At least as a quick patch to progress with testing on alma9

@belforte
Copy link
Member Author

there's actually a mix of all manners of using classad parsing in current code base.
All of these are used in various files

  • the deprecated: parse(), parseOld - the former only support classAds in new formatn, the latter only in old format
  • the valid : parseOne(), parseAds() - parseAds automatically adapts to new/old format

so there is no need to do extensive validation of the new parseAds, all in all i

belforte added a commit to belforte/CRABServer that referenced this issue Mar 18, 2024
@belforte
Copy link
Member Author

the new classad.parseAds is an iterator while classad.parseOld was returning a full classAd object :-(

@belforte
Copy link
Member Author

looks like classad.parseOne(filehandle) is what we should use to replace classad.parseOld
https://htcondor.readthedocs.io/en/latest/apis/python-bindings/api/classad.html
image
to be compared with
image

belforte added a commit to belforte/CRABServer that referenced this issue Mar 19, 2024
@belforte
Copy link
Member Author

belforte commented Mar 19, 2024

testing new fix running StatusTracking again

@belforte
Copy link
Member Author

try again to tag and validate

@belforte
Copy link
Member Author

belforte commented Mar 25, 2024

@belforte belforte mentioned this issue Apr 15, 2024
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant