incompatibility with SAS9.4 LIN 64, V8 header record that does not declare number of obs #322

Open
DanteDT opened this issue Jan 18, 2025 · 8 comments

@DanteDT

DanteDT commented Jan 18, 2025

Please see Record Layout for a SAS Version 8 or 9 Data Set in SAS Transport Format, especially the "Observation header" containing "OBSV8 HEADER RECORD". This suggests that this header record is composed of zeros (0).

According to SAS 9.4 %XPT2LOC Autocall Macro, SAS expects a V8 format to declare number of observations in the OBSV8 header, rather than a string of 30 zeros (or 15? see code snippet, below).

SAS 9.04.01M8P022223 LIN 64 XPT2LOC code snippet:

     else if buffer=:'HEADER RECORD*******OBS     HEADER RECORD!!!!!!!'
          or buffer=:'HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!'
        then do;

        if substr(buffer,49,15) ne '000000000000000' then do;
           nobs=input(substr(buffer,49,15),15.);

           end;
        else do;

           nobs=.;
           /* this will be the last record we read before a new member or eof
              if there are 0 obs on a v5 format data set. */
           zeroObs=1;
           end;

So when reading these XPT files (OBSV8 without nobs in this header record), SAS 9.4 XPT2LOC creates a zero-obs data set: it stops after defining the table structure and skips reading the data.

A kludgy hack gets SAS to read the obs, but I'd rather that this "OBSV8" header record declares number of obs, as SAS now apparently expects (although this is not explicit in the Record Layout doc, above).
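
For readers who don't read SAS, the branch above can be sketched in Python. This is only an illustrative transliteration, not SAS or readstat code; the function name `read_obs_header` and its return shape are made up for this sketch.

```python
from typing import Optional, Tuple

# The two observation-header prefixes tested in the XPT2LOC snippet (48 bytes each).
OBS_PREFIXES = (
    b"HEADER RECORD*******OBS     HEADER RECORD!!!!!!!",
    b"HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!",
)


def read_obs_header(record: bytes) -> Optional[Tuple[Optional[int], bool]]:
    """Return (nobs, zero_obs) for an observation header, or None for any other record."""
    if not record.startswith(OBS_PREFIXES):
        return None
    # SAS substr(buffer, 49, 15) is 1-based: columns 49-63, i.e. slice [48:63].
    field = record[48:63]
    if field != b"0" * 15:
        text = field.strip()
        # A blank or non-numeric field is treated as missing here (None).
        return (int(text) if text.isdigit() else None, False)
    # All zeros: the macro sets zeroObs=1 and reads no observations.
    return (None, True)
```

With an all-zero field the second element comes back `True`, which corresponds to the `zeroObs=1` path that makes XPT2LOC skip the data.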

@DanteDT
Author

DanteDT commented Jan 18, 2025

Looking at this more closely, it seems that SAS 9.4 documentation is wrong, or the 9.4 XPT2LOC macro is wrong.

XPT2LOC handles XPT data sets based on header record keywords like those in readstat_xport_write.c#L363:

  • MEMBV8
  • DSCPTV8
  • NAMSTV8
  • LABELV8
  • OBSV8
  • LIBV8

So SAS versions like 6.06 in these header records seem irrelevant to how XPT2LOC operates.

I've submitted the basic issue to SAS. It seems that either

Nonetheless, in case it helps anyone: If you're using SAS 9.4 to read XPTs created by readstat, you need to modify XPT2LOC so that it does not assume that a data set has 0 obs when it finds the specified 80-char observation header:

  • "HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!000000000000000000000000000000  "

and does not STOP reading the obs.
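
To check whether a given file would be affected, one can look for an all-zero OBSV8 header that is still followed by data. A minimal Python sketch, assuming a single-member file whose records sit on 80-byte boundaries; `skipped_data_bytes` is a hypothetical helper, not part of readstat or pyreadstat:

```python
RECORD_LEN = 80
OBSV8_PREFIX = b"HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!"


def skipped_data_bytes(xpt: bytes) -> int:
    """Count the bytes that follow an all-zero OBSV8 header (first member only).

    A non-zero result means the file carries observation data that an
    unmodified XPT2LOC would not read.
    """
    for offset in range(0, len(xpt), RECORD_LEN):
        record = xpt[offset:offset + RECORD_LEN]
        if record.startswith(OBSV8_PREFIX):
            if record[48:63] == b"0" * 15:
                return len(xpt) - (offset + RECORD_LEN)
            return 0  # header declares a count, nothing is skipped
    return 0  # no OBSV8 header found


# Usage (file name is a placeholder):
# with open("dataset.xpt", "rb") as fh:
#     print(skipped_data_bytes(fh.read()))
```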

I attach a couple of annotated XPT header excerpts, because it's easier to see than to read.

sas94lin-xpt-header.txt
misread-xpt-header.txt

@DanteDT
Author

DanteDT commented Jan 21, 2025

Please feel free to close this issue as "not a readstat bug". For anyone needing more information, while SAS address defects:

@evanmiller
Contributor

Related to #316?

@DanteDT
Author

DanteDT commented Jan 22, 2025

Related to #316?

Absolutely the same SAS confusion / inconsistency. Apologies for not spotting that one. The SAS Communities discussion above details a minor tweak to the installed XPT2LOC macro that prevents it from skipping the valid observations.

The mistake by SAS (as I see it) is their inconsistency across SAS versions in the definition of an open V8 XPT format, which should not be sensitive to installed SAS version. V8 XPT record layout should be published (correctly :) in one definitive spot. Instead, they maintain multiple discrepant versions, like:

SAS 9.4 installations apparently include XPT2LOC code that requires number of observations in the OBSV8 headers, as documented only in SAS 9.4 / Viya 3.5 Docs, V8 Record Layout

Nonetheless,

  • majority of V8 record layout specs that I checked do not mention "number of observations" for OBSV8 headers
  • including the "latest", which seems like it should be the definitive spec for such a standard
  • and in fact "number of observations" is not needed for SAS 9.4 XPT2LOC to correctly read all V8 XPT records.
    • Craig's minimal tweak noted above works, to prevent XPT2LOC from stopping if number of obs is not in that header

@evanmiller
Contributor

Thanks for the additional research. It seems like including the observation count would be an easy fix. Reading the count will require some more extensive code changes, just given the way that the XPT reader is implemented.

Try this 4cae7c9
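
As a rough illustration of what "including the observation count" means at the byte level, here is a Python sketch that builds an 80-byte OBSV8 record with the count right-justified in columns 49-63, the field the XPT2LOC snippet above reads. The helper name `obsv8_header` and the blank padding in columns 64-80 are assumptions made for this sketch; the linked commit may lay out the rest of the record differently.

```python
OBSV8_PREFIX = b"HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!"  # 48 bytes


def obsv8_header(nobs: int) -> bytes:
    """Build an 80-byte OBSV8 header carrying the observation count.

    Assumption: the count occupies columns 49-63 (15 chars, right-justified)
    and the remaining 17 columns are blank.
    """
    if not 0 <= nobs < 10**15:
        raise ValueError("nobs must fit in 15 decimal digits")
    count = str(nobs).encode("ascii").rjust(15)
    return OBSV8_PREFIX + count + b" " * 17
```

Because XPT2LOC only compares the 15-character field against fifteen zeros, any non-zero count in that field should be enough to keep it on the branch that reads observations.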

@DanteDT
Author

DanteDT commented Jan 26, 2025

Thanks for that, Evan. I'm not sure how I can test that. I'm using pyreadstat, based on your project. I could make the same mods there, but am not sure I can recompile in my env. I'm having a look here, without luck so far.

@ofajardo

Hi @DanteDT, if you have access to a Unix machine, such as a Linux server, it is very easy to compile pyreadstat there.

If you don't have access to such a machine, write a short Python script with the test, and I can take the modifications, recompile, and run your test.

@DanteDT
Author

DanteDT commented Jan 27, 2025

Thank you for the guidance, @ofajardo. I found your compile recipes, which worked for me.

My simple test (write familiar iris data to a V8 XPT, and then extract in SAS9.4 env) was successful with the modified code. See attached components, modified to facilitate comparison, in case it helps.

Mainly, with the modified code, the XPT OBSV8 header has 80 chars including the number of observations:

  • HEADER RECORD*******OBSV8 HEADER RECORD!!!!!!! 150 0

In SAS 9.4, the installed XPT2LOC reads zero obs from the original XPT, and the expected 150 from the XPT based on the modified code:

  • XPT based on original (pyreadstat v1.2.8) unmodified code, SAS 9.4 XPT2LOC completes "successfully" but stops before reading records:

    • "NOTE: The data set WORK.DATASET has 0 observations and 5 variables."
  • XPT based on modified src/sas/readstat_xport_write.c, above, SAS 9.4 XPT2LOC succeeds, actually reading expected records:

    • "NOTE: The data set WORK.MODIFIED has 150 observations and 5 variables."

Thanks, again!

pyreadstat_test.py.txt
iris_df_orig.xpt.txt
iris_df_mod.xpt.txt
