Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test all examples from library-user-guide & user-guide docs #14544

Merged
merged 77 commits into from
Feb 10, 2025

Conversation

ugoa
Copy link
Contributor

@ugoa ugoa commented Feb 7, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Library User Guide:

  • docs/source/library-user-guide/adding-udfs.md
  • docs/source/library-user-guide/api-health.md
  • docs/source/library-user-guide/building-logical-plans.md
  • docs/source/library-user-guide/catalogs.md
  • docs/source/library-user-guide/custom-table-providers.md
  • docs/source/library-user-guide/extending-operators.md
  • docs/source/library-user-guide/extensions.md
  • docs/source/library-user-guide/index.md
  • docs/source/library-user-guide/profiling.md
  • docs/source/library-user-guide/query-optimizer.md
  • docs/source/library-user-guide/using-the-dataframe-api.md
  • docs/source/library-user-guide/using-the-sql-api.md
  • docs/source/library-user-guide/working-with-exprs.md

User Guide:

  • docs/source/user-guide/example-usage.md
  • docs/source/user-guide/faq.md
  • docs/source/user-guide/introduction.md
  • docs/source/user-guide/cli/index.rst
  • docs/source/user-guide/cli/overview.md
  • docs/source/user-guide/cli/datasources.md
  • docs/source/user-guide/cli/usage.md
  • docs/source/user-guide/cli/installation.md
  • docs/source/user-guide/configs.md
  • docs/source/user-guide/concepts-readings-events.md
  • docs/source/user-guide/dataframe.md
  • docs/source/user-guide/explain-usage.md
  • docs/source/user-guide/expressions.md
  • docs/source/user-guide/crate-configuration.md
  • docs/source/user-guide/sql/sql_status.md
  • docs/source/user-guide/sql/index.rst
  • docs/source/user-guide/sql/scalar_functions.md
  • docs/source/user-guide/sql/write_options.md
  • docs/source/user-guide/sql/explain.md
  • docs/source/user-guide/sql/information_schema.md
  • docs/source/user-guide/sql/select.md
  • docs/source/user-guide/sql/special_functions.md
  • docs/source/user-guide/sql/dml.md
  • docs/source/user-guide/sql/operators.md
  • docs/source/user-guide/sql/ddl.md
  • docs/source/user-guide/sql/aggregate_functions.md
  • docs/source/user-guide/sql/data_types.md
  • docs/source/user-guide/sql/subqueries.md
  • docs/source/user-guide/sql/window_functions.md

Are these changes tested?

Yes

Are there any user-facing changes?

No

@@ -152,6 +154,8 @@ serde_json = { workspace = true }
sysinfo = "0.33.1"
test-utils = { path = "../../test-utils" }
tokio = { workspace = true, features = ["rt-multi-thread", "parking_lot", "fs"] }
dashmap = "6.1.0"
snmalloc-rs = "0.3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need snmalloc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment below I don't think we can use snmalloc in the dev dependencies without messing up the other builds -- can you please remove that (and maybe add a note to the example that tried to use it?)

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

Ah, I see the snmalloc comes from running

docs/source/user-guide/crate-configuration.md

I think the issue there is that snmalloc takes over the global allocator,

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @ugoa -- this is looking really cool

Let's try and get this PR ready for merge -- it is already an amazing improvement. I'll try and fix up the tests and get it in later today. Then maybe we can make additional improvements as follow ons

@@ -152,6 +154,8 @@ serde_json = { workspace = true }
sysinfo = "0.33.1"
test-utils = { path = "../../test-utils" }
tokio = { workspace = true, features = ["rt-multi-thread", "parking_lot", "fs"] }
dashmap = "6.1.0"
snmalloc-rs = "0.3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment below I don't think we can use snmalloc in the dev dependencies without messing up the other builds -- can you please remove that (and maybe add a note to the example that tried to use it?)

# specific language governing permissions and limitations
# under the License.

import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to DeepSeek generating this code snippet haha, too powerful

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

I also double checked the "hide lines starting with # and it looks great!"

Screenshot 2025-02-10 at 8 14 32 AM

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

I will push a few commits to fix this

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

Making the examples in the doc rendred with sql rather than ``` was a great idea. However, since those files are automatically generated from the source code we need to update the generation script, such as

And add the sql there (there are equivalents for the other types of functions)

Perhaps you can make a follow on PR?

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

I plan to merge this PR in once the tests pass

@ugoa
Copy link
Contributor Author

ugoa commented Feb 10, 2025

Making the examples in the doc rendred with sql rather than ``` was a great idea. However, since those files are automatically generated from the source code we need to update the generation script

Ah I didn't know that, so I can generate the doc by run cargo run --bin print_functions_docs -- scalar right?

And add the sql there (there are equivalents for the other types of functions)
Perhaps you can make a follow on PR?

Sure I can open another PR to address this, cheers.

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

Making the examples in the doc rendred with sql rather than ``` was a great idea. However, since those files are automatically generated from the source code we need to update the generation script

Ah I didn't know that, so I can generate the doc by run cargo run --bin print_functions_docs -- scalar right?

The way I do it is

./dev/update_function_docs.sh

And add the sql there (there are equivalents for the other types of functions)
Perhaps you can make a follow on PR?

Sure I can open another PR to address this, cheers.

🙏 thank you so much

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

Ah, I see now the issue is that we'll try and test all examples: https://github.com/apache/datafusion/actions/runs/13241873169/job/36958947894?pr=14544

So they need to be marked as SQL to avoid CI failures. I'll look into it

@ugoa
Copy link
Contributor Author

ugoa commented Feb 10, 2025

So they need to be marked as SQL to avoid CI failures. I'll look into it

Yeah, with just ``` it will be treated as rust code and be run and tested

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

Yeah, with just ``` it will be treated as rust code and be run and tested

I pushed a fix in e0b360c

Let's see if we can get a clean run

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

So close...

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

🚀

Thanks again @ugoa -- this is great work

FYI @tshauck who I think started this documentation many months ago

@alamb alamb merged commit 3f900ac into apache:main Feb 10, 2025
27 checks passed
@ugoa
Copy link
Contributor Author

ugoa commented Feb 10, 2025

My pleasure! I learned a lot during the time as well.

@ugoa ugoa deleted the run-test-from-library-user-guide-docs branch February 10, 2025 15:42
@ugoa
Copy link
Contributor Author

ugoa commented Feb 11, 2025

Hey @alamb , btw we probably need to remove the installation of cmake in the .github/actions/setup-builder/action.yaml since it was added for snmalloc-rs = "0.3" which we don't need anymore

@alamb
Copy link
Contributor

alamb commented Feb 11, 2025

Hey @alamb , btw we probably need to remove the installation of cmake in the .github/actions/setup-builder/action.yaml since it was added for snmalloc-rs = "0.3" which we don't need anymore

Makes sense -- thank you @ugoa

Can you make a PR to do so?

@ugoa
Copy link
Contributor Author

ugoa commented Feb 11, 2025

@alamb Sure, here it is #14606, please help review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hide boilerplate in documentation examples Run / Test all examples in Documentation
2 participants