Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(aws_s3 source): Support vhost-style S3 bucket addressing #22475

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions changelog.d/s3-source-vhosts.feature.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Adds a `force_path_style` option to the `aws_s3` source, matching support added in the `aws_s3` sink previously, that allows users to configure usage of virtual host-style bucket addressing. The value defaults to `true` to maintain existing (path-based addressing) behavior.

Check warning on line 1 in changelog.d/s3-source-vhosts.feature.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`vhosts` is not a recognized word. (check-file-path)

Authors: sbalmos

Check warning on line 3 in changelog.d/s3-source-vhosts.feature.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`sbalmos` is not a recognized word. (unrecognized-spelling)
9 changes: 7 additions & 2 deletions src/sources/aws_s3/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,12 @@ pub struct AwsS3Config {
#[serde(default = "default_decoding")]
#[derivative(Default(value = "default_decoding()"))]
pub decoding: DeserializerConfig,

/// Specifies which addressing style to use.
///
/// This controls if the bucket name is in the hostname or part of the URL.
#[serde(default = "crate::serde::default_true")]
pub force_path_style: bool,
Copy link
Member

@pront pront Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It would be cleaner if this was an enum. I missed this #21999.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example:

/// The Amazon OpenSearch service type, either managed or serverless; primarily, selects the
/// correct AWS service to use when calculating the AWS v4 signature + disables features
/// unsupported by serverless: Elasticsearch API version autodetection, health checks
#[configurable_component]
#[derive(Clone, Debug, Eq, PartialEq)]
#[serde(deny_unknown_fields, rename_all = "lowercase")]
pub enum OpenSearchServiceType {
/// Elasticsearch or OpenSearch Managed domain
Managed,
/// OpenSearch Serverless collection
Serverless,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was going to say, it's to be consistent with what the sink does. Do you want me to cross-implement in both the source and sink?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can accept as is (with the bool) and in the future deprecate all the bools and introduce enums. You don't have to do the later, but it would be greatly appreciated if you did :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this change for the next version round, since the bool version was already approved for the sink, and the bool version for the source is approved here. I think if things are okay with the first version, I'll concurrently change the default on both the sink and source to use vhost format. It'd be a breaking change anyway to switch from bool to an enum, so changing the default from path to vhost is about the same level I'd think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree to keep this as is. Note here, that we can avoid breaking the behavior here by introducing new fields and deprecate the old ones. See https://github.com/vectordotdev/vector/blob/master/docs/DEPRECATION.md

}

const fn default_framing() -> FramingConfig {
Expand Down Expand Up @@ -230,11 +236,10 @@ impl AwsS3Config {
) -> crate::Result<sqs::Ingestor> {
let region = self.region.region();
let endpoint = self.region.endpoint();
let force_path_style_value: bool = true;

let s3_client = create_client::<S3ClientBuilder>(
&S3ClientBuilder {
force_path_style: Some(force_path_style_value),
force_path_style: Some(self.force_path_style),
},
&self.auth,
region.clone(),
Expand Down
9 changes: 9 additions & 0 deletions website/cue/reference/components/sources/base/aws_s3.cue
Original file line number Diff line number Diff line change
Expand Up @@ -410,6 +410,15 @@ base: components: sources: aws_s3: configuration: {
required: false
type: string: examples: ["http://127.0.0.0:5000/path/to/service"]
}
force_path_style: {
description: """
Specifies which addressing style to use.

This controls if the bucket name is in the hostname or part of the URL.
"""
required: false
type: bool: default: true
}
framing: {
description: """
Framing configuration.
Expand Down
Loading