[BUG] Spark Excel not reading whole columns and is only reading specific data address ranges #930
Comments
Hi @bitbythecron, as you are familiar with Java:
Hi, I will try this suggestion. Please give me until tomorrow and I will post the results.
Hi @nightscape, apologies it took a few more days than expected to get this to you. Please see this gist. You would need to modify the … When I use this line:
The program reads the file and correctly returns all the data between A2 and D7. But when I change that to just read all data in the columns:
It doesn't throw an error, but it comes back empty. With Apache POI being as mature as it is, I'm leaning towards this being expected behavior, albeit a disappointing one. Can you think of any way around this limitation? In production, my app will be given Excel files without knowing the exact addresses/areas/bounds of the data to read. This feels like a pretty big limitation of POI, to be honest: how often will you know, ahead of time, the exact rows of the Excel file you are processing? Thanks for any input/advice here!
Have you tried using a very high end address? I'm not 100% sure, but it might be that this just works.
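A sketch of that workaround (the helper name is mine, not the library's): an .xlsx worksheet can hold at most 1,048,576 rows, so a column-only range such as `'Sheet1'!A:D` can be rewritten as an explicit bounded range that ends at that maximum, which a range parser that requires row bounds should accept.

```java
public class DataAddressWiden {
    // Maximum row count of an .xlsx worksheet.
    static final int XLSX_MAX_ROWS = 1_048_576;

    // Hypothetical helper: turn a column-only range into an explicit
    // bounded range ending at the sheet's maximum possible row.
    public static String widenColumnRange(String sheet, char firstCol, int firstRow, char lastCol) {
        return String.format("'%s'!%c%d:%c%d", sheet, firstCol, firstRow, lastCol, XLSX_MAX_ROWS);
    }

    public static void main(String[] args) {
        // The result could then be passed as the dataAddress option value.
        System.out.println(widenColumnRange("Sheet1", 'A', 2, 'D'));
    }
}
```

For example, `widenColumnRange("Sheet1", 'A', 2, 'D')` yields `'Sheet1'!A2:D1048576`.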
Well, I guess that would work for most use cases, but I'm leaning more towards something like this:
I hate to read the same document twice, but I think that's what I'm stuck with if you can't think of anything else. No worries either way, and thanks for the quick responses so far!
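The two-pass idea can be sketched like this: a first pass discovers the actual extent of the data (for example via Apache POI's `Sheet.getLastRowNum()`, which returns a 0-based index), and a second pass reads with an exact dataAddress. The helper below only shows the address construction; the discovered row index is passed in, and the name is hypothetical.

```java
public class BoundedAddress {
    // Hypothetical helper for a two-pass read: a first pass (e.g. Apache
    // POI's Sheet.getLastRowNum(), which is 0-based) finds the last occupied
    // row, and this builds the exact 1-based dataAddress for the second pass.
    public static String fromLastRow(String sheet, String firstCell, char lastCol, int lastRowNum0Based) {
        return String.format("'%s'!%s:%c%d", sheet, firstCell, lastCol, lastRowNum0Based + 1);
    }

    public static void main(String[] args) {
        // Data occupying rows 2..7 has a 0-based last row index of 6.
        System.out.println(fromLastRow("Sheet1", "A2", 'D', 6));
    }
}
```

Here `fromLastRow("Sheet1", "A2", 'D', 6)` produces `'Sheet1'!A2:D7`, the address that was reported to work.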
If you want, you could create a PR and add a method to read the data address ranges from the WorkbookReader, as is already done for getting the sheetNames.
Am I using the newest version of the library?
Is there an existing issue for this?
Current Behavior
Java app here, using the Spark Excel library to read an Excel file into a Dataset<Row>. When I use the following configurations:

This works beautifully and my Dataset<Row> is instantiated without any issues whatsoever. But the minute I tell it to read all rows between columns A through D, it reads an empty Dataset<Row>:

This also happens if I set the sheetName and dataAddress separately:

And it also happens when, instead of providing the sheetName, I provide a sheetIndex:

My question: is this expected behavior of the Spark Excel library, or is it a bug I have discovered, or am I not using the Options API correctly here?
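One way to see what separates the configuration that works from the ones that come back empty: the working dataAddress (`'Sheet1'!A2:D7`) is row-bounded, while the failing one (`'Sheet1'!A:D`) is a column-only range. A small, hypothetical diagnostic (not part of the library) makes the distinction explicit:

```java
public class DataAddressCheck {
    // Hypothetical diagnostic: does a dataAddress carry explicit row bounds?
    // "'Sheet1'!A2:D7" does; "'Sheet1'!A:D" is a column-only range.
    public static boolean hasRowBounds(String dataAddress) {
        int bang = dataAddress.indexOf('!');
        String range = bang >= 0 ? dataAddress.substring(bang + 1) : dataAddress;
        for (String cell : range.split(":")) {
            if (!cell.matches(".*\\d.*")) return false; // no row digit in this corner
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasRowBounds("'Sheet1'!A2:D7")); // true
        System.out.println(hasRowBounds("'Sheet1'!A:D"));   // false
    }
}
```

If the underlying range parser requires bounded areas, a check like this could guard the read call and widen or reject column-only addresses up front.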
Expected Behavior
Explained above, I would have expected all three option configurations to work, but only the first one does.
Steps To Reproduce
Code is provided above. I am pulling in the following Gradle libraries:
I am using a Java application (not Scala).
Environment
Anything else?
No response