
[ML] File upload api refactor #210865

Merged
merged 23 commits into from
Feb 26, 2025

Conversation

jgowdyelastic
Member

@jgowdyelastic jgowdyelastic commented Feb 12, 2025

Adds a v2 version of the file upload API which splits the upload initialisation step out from the data upload API.
Previously the import data API behaved differently depending on whether an ID was passed to it. If an ID was not present, the API would "initialize the upload" by creating the index, mappings and pipeline.
Subsequent calls to the API would then pass in an ID as well as the data. The presence of the ID meant the data would be ingested.
The ID had no purpose other than signifying whether this was the initial call to create the index or a subsequent call to ingest the data.
This change adds a new `initialize_import` API which is called first to create the index et al.
Subsequent calls to the `import` API behave as before and the data is ingested.

A temporary v1 version of the `import` API has been kept for backwards compatibility during upgrades.

The `initialize_import` API also creates multiple ingest pipelines by default, improving on the previous "hacked in" approach of passing two sets of pipelines to provide backwards compatibility.

@jgowdyelastic jgowdyelastic self-assigned this Feb 17, 2025
@jgowdyelastic jgowdyelastic added :ml release_note:skip Skip the PR/issue when compiling release notes Feature:File and Index Data Viz ML file and index data visualizer Feature:File Upload v9.1.0 v8.19.0 labels Feb 17, 2025
@jgowdyelastic jgowdyelastic marked this pull request as ready for review February 17, 2025 20:54
@jgowdyelastic jgowdyelastic requested review from a team as code owners February 17, 2025 20:55
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@jgowdyelastic jgowdyelastic removed the request for review from nreese February 17, 2025 20:55
@jgowdyelastic jgowdyelastic added the backport:version Backport to applied version labels label Feb 18, 2025
IndicesIndexSettings,
MappingTypeMapping,
} from '@elastic/elasticsearch/lib/api/typesWithBodyKey';
import {
Member

We can use import type here and above

Member Author

Good spot, I'm too used to the consistent-type-imports eslint rule we have in our other plugins.
I think I'll add that rule to the file_upload plugin too in a separate PR.

Member Author

Updated in 3b340af


@@ -13,56 +13,72 @@ import type {
MappingTypeMapping,
} from '@elastic/elasticsearch/lib/api/typesWithBodyKey';
import { INDEX_META_DATA_CREATED_BY } from '../common/constants';
import { ImportResponse, ImportFailure, InputData, IngestPipelineWrapper } from '../common/types';
import {
Member

We can use import type here

Member Author

Updated in 3b340af

settings: schema.maybe(schema.any()),
/** Mappings */
mappings: schema.any(),
/** Ingest pipeline definition */
ingestPipeline,
createPipelines: schema.maybe(schema.arrayOf(ingestPipeline)),
ingestPipelines: schema.arrayOf(ingestPipeline),
Member

Would ingestPipelines ever be a schema.maybe(schema.arrayOf(ingestPipeline))? Or is it always empty array if there's no ingest pipeline?

Member Author

The latter. It should always be present and just empty if there are no pipelines to create.

Member

@qn895 qn895 left a comment

Tested and LGTM 🎉

@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| fileUpload | 317 | 318 | +1 |

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| fileUpload | 96 | 93 | -3 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| dataVisualizer | 616.1KB | 616.1KB | +3.0B |
| fileUpload | 644.7KB | 644.6KB | -96.0B |
| total | | | -93.0B |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| fileUpload | 14.9KB | 15.0KB | +145.0B |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| fileUpload | 96 | 93 | -3 |

History

cc @jgowdyelastic

@jgowdyelastic jgowdyelastic merged commit 0121f4b into elastic:main Feb 26, 2025
10 checks passed
@kibanamachine
Contributor

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/13541986386

@kibanamachine
Contributor

💔 All backports failed

| Status | Branch | Result |
| --- | --- | --- |
| | 8.x | Backport failed because of merge conflicts |

Manual backport

To create the backport manually run:

node scripts/backport --pr 210865

Questions ?

Please refer to the Backport tool documentation

@jgowdyelastic
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

jgowdyelastic added a commit to jgowdyelastic/kibana that referenced this pull request Feb 26, 2025
(cherry picked from commit 0121f4b)
jgowdyelastic added a commit that referenced this pull request Feb 26, 2025
# Backport

This will backport the following commits from `main` to `8.x`:
- [[ML] File upload api refactor (#210865)](#210865)


### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)
