GH-25025: [C++] Move non core compute kernels into separate shared library #45618

Draft, 22 commits to merge into base: main

Commits
1db5be5
GH-25025: [C++] Move non core compute kernels into separate shared li…
raulcd Feb 24, 2025
1864b2e
Fix build for some benchmarks and examples
raulcd Feb 25, 2025
12c3d15
Only link arrow_compute to benchmark if we are building benchmarks
raulcd Feb 25, 2025
edb0005
Rename arrow_compute target to arrow_compute_core
raulcd Feb 25, 2025
bcce3da
Try fixing arrow_compute_core target for Windows
raulcd Feb 25, 2025
80d83d7
Remove ARROW_EXPORT from arrow_compute in order to fix inconsistent l…
raulcd Feb 25, 2025
1d1bcca
Remove ARROW_EXPORT from arrow_compute in order to fix inconsistent l…
raulcd Feb 25, 2025
bde6fb7
Remove ARROW_EXPORT from arrow_compute in order to fix inconsistent l…
raulcd Feb 25, 2025
b61f2e5
Link arrow_compute with libarrow
raulcd Feb 25, 2025
b2b7e76
Some code reorganization and add ARROW_EXPORT to required function
raulcd Feb 25, 2025
b2da510
Add duplicated codegen_internal to arrow_compute
raulcd Feb 25, 2025
88c709b
Remove some more ARROW_EXPORT (this will have to be reverted)
raulcd Feb 25, 2025
794a8ca
Add ArrowCompute dependency to ArrowAcero and interface libs
raulcd Feb 25, 2025
11a3fea
Define ARROW_COMPUTE_EXPORT and add it to some of the deleted ARROW_E…
raulcd Feb 26, 2025
f76d3fb
Fix wrongly added ARROW_EXPORT for ARROW_COMPUTE_EXPORT
raulcd Feb 26, 2025
b0122dc
Add some more missing ARROW_COMPUTE_EXPORT
raulcd Feb 26, 2025
d6a7828
Export symbols for kernel registration and move outside of internal n…
raulcd Feb 26, 2025
bc6839e
Move problematic function to compute/codegen_internal instead of comp…
raulcd Feb 27, 2025
2d26f3d
Add ARROW_EXPORT to FirstType
raulcd Feb 27, 2025
de11ab2
Remove duplicated symbol, expose new required symbols
raulcd Feb 27, 2025
7240ca2
Add missing header
raulcd Feb 27, 2025
98b4608
Fix linked libs for arrow_compute
raulcd Feb 28, 2025
8 changes: 7 additions & 1 deletion cpp/examples/arrow/CMakeLists.txt
@@ -42,7 +42,13 @@ if(ARROW_SUBSTRAIT)
endif()

if(ARROW_COMPUTE AND ARROW_CSV)
add_arrow_example(compute_and_write_csv_example)
if(ARROW_BUILD_SHARED)
set(COMPUTE_KERNELS_LINK_LIBS arrow_compute_shared)
else()
set(COMPUTE_KERNELS_LINK_LIBS arrow_compute_static)
endif()
add_arrow_example(compute_and_write_csv_example EXTRA_LINK_LIBS
${COMPUTE_KERNELS_LINK_LIBS})
endif()

if(ARROW_FLIGHT)
2 changes: 2 additions & 0 deletions cpp/examples/arrow/compute_and_write_csv_example.cc
@@ -22,6 +22,7 @@
#include <arrow/io/api.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include "arrow/compute/kernels/registry.h"

#include <iostream>
#include <vector>
@@ -41,6 +42,7 @@
// in the current directory

arrow::Status RunMain(int argc, char** argv) {
ARROW_RETURN_NOT_OK(arrow::compute::RegisterComputeKernels());
// Make Arrays
arrow::NumericBuilder<arrow::Int64Type> int64_builder;
arrow::BooleanBuilder boolean_builder;
2 changes: 2 additions & 0 deletions cpp/examples/arrow/join_example.cc
@@ -23,6 +23,7 @@
#include <arrow/csv/api.h>
#include "arrow/acero/exec_plan.h"
#include "arrow/compute/expression.h"
#include "arrow/compute/kernels/registry.h"

#include <arrow/dataset/dataset.h>
#include <arrow/dataset/plan.h>
@@ -82,6 +83,7 @@ arrow::Result<std::shared_ptr<arrow::dataset::Dataset>> CreateDataSetFromCSVData
}

arrow::Status DoHashJoin() {
ARROW_RETURN_NOT_OK(arrow::compute::RegisterComputeKernels());
arrow::dataset::internal::Initialize();

ARROW_ASSIGN_OR_RAISE(auto l_dataset, CreateDataSetFromCSVData(true));
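Both example changes above add an explicit call to arrow::compute::RegisterComputeKernels() before any compute function is used. The underlying pattern can be sketched without Arrow as a registry that starts with only the core kernels and gains the rest through an explicit, repeatable registration call. Everything below (the demo namespace, Registry, GetRegistry, RegisterExtraKernels and the kernel names) is hypothetical and only illustrates the idea; it is not the PR's implementation.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

namespace demo {

using Kernel = std::function<int(int, int)>;

// Hypothetical stand-in for a function registry; not Arrow's FunctionRegistry.
class Registry {
 public:
  // Returns false if a kernel with this name is already registered.
  bool Add(const std::string& name, Kernel kernel) {
    return kernels_.emplace(name, std::move(kernel)).second;
  }

  // Returns std::nullopt when no kernel with this name is registered.
  std::optional<int> Call(const std::string& name, int a, int b) const {
    auto it = kernels_.find(name);
    if (it == kernels_.end()) return std::nullopt;
    return it->second(a, b);
  }

 private:
  std::map<std::string, Kernel> kernels_;
};

// "Core library" side: the registry is created holding only the core kernels.
inline Registry& GetRegistry() {
  static Registry registry = [] {
    Registry r;
    r.Add("core_add", [](int a, int b) { return a + b; });
    return r;
  }();
  return registry;
}

// "Separate library" side: callers must invoke this once before using the
// extra kernels. The function-local static makes repeated calls harmless.
inline bool RegisterExtraKernels() {
  static const bool registered = [] {
    return GetRegistry().Add("extra_multiply",
                             [](int a, int b) { return a * b; });
  }();
  return registered;
}

}  // namespace demo
```

In this sketch a lookup of extra_multiply returns an empty optional until RegisterExtraKernels() has been called, which mirrors why the examples now register the kernels at the top of RunMain and DoHashJoin.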
38 changes: 38 additions & 0 deletions cpp/src/arrow/ArrowComputeConfig.cmake.in
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# This config sets the following variables in your project::
#
# ArrowCompute_FOUND - true if Arrow Compute found on the system
#
# This config sets the following targets in your project::
#
# ArrowCompute::arrow_compute_shared - for linked as shared library if shared library is built
# ArrowCompute::arrow_compute_static - for linked as static library if static library is built

@PACKAGE_INIT@

include(CMakeFindDependencyMacro)
find_dependency(Arrow)

include("${CMAKE_CURRENT_LIST_DIR}/ArrowComputeTargets.cmake")

arrow_keep_backward_compatibility(ArrowCompute arrow_compute)

check_required_components(ArrowCompute)

arrow_show_details(ArrowCompute ARROW_COMPUTE)
114 changes: 89 additions & 25 deletions cpp/src/arrow/CMakeLists.txt
@@ -722,15 +722,14 @@ set(ARROW_COMPUTE_SRCS
compute/api_scalar.cc
compute/api_vector.cc
compute/cast.cc
compute/codegen_internal.cc
compute/exec.cc
compute/expression.cc
compute/function.cc
compute/function_internal.cc
compute/kernel.cc
compute/ordering.cc
compute/registry.cc
compute/kernels/chunked_internal.cc
compute/kernels/codegen_internal.cc
compute/kernels/ree_util_internal.cc
compute/kernels/scalar_cast_boolean.cc
compute/kernels/scalar_cast_dictionary.cc
@@ -740,8 +739,8 @@ set(ARROW_COMPUTE_SRCS
compute/kernels/scalar_cast_numeric.cc
compute/kernels/scalar_cast_string.cc
compute/kernels/scalar_cast_temporal.cc
compute/kernels/util_internal.cc
compute/kernels/vector_hash.cc
compute/kernels/vector_run_end_encode.cc
compute/kernels/vector_selection.cc
compute/kernels/vector_selection_filter_internal.cc
compute/kernels/vector_selection_internal.cc
@@ -750,13 +749,16 @@ set(ARROW_COMPUTE_SRCS
if(ARROW_COMPUTE)
# Include the remaining kernels
list(APPEND
ARROW_COMPUTE_SRCS
ARROW_COMPUTE_LIB_SRCS
compute/kernels/aggregate_basic.cc
compute/kernels/aggregate_mode.cc
compute/kernels/aggregate_quantile.cc
compute/kernels/aggregate_tdigest.cc
compute/kernels/aggregate_var_std.cc
compute/kernels/codegen_internal.cc # This is wrong but I am testing something
compute/kernels/chunked_internal.cc
compute/kernels/hash_aggregate.cc
compute/kernels/registry.cc
compute/kernels/scalar_arithmetic.cc
compute/kernels/scalar_boolean.cc
compute/kernels/scalar_compare.cc
@@ -770,13 +772,13 @@ if(ARROW_COMPUTE)
compute/kernels/scalar_temporal_binary.cc
compute/kernels/scalar_temporal_unary.cc
compute/kernels/scalar_validity.cc
compute/kernels/util_internal.cc
compute/kernels/vector_array_sort.cc
compute/kernels/vector_cumulative_ops.cc
compute/kernels/vector_nested.cc
compute/kernels/vector_pairwise.cc
compute/kernels/vector_rank.cc
compute/kernels/vector_replace.cc
compute/kernels/vector_run_end_encode.cc
compute/kernels/vector_select_k.cc
compute/kernels/vector_sort.cc
compute/kernels/vector_swizzle.cc
@@ -791,39 +793,101 @@ if(ARROW_COMPUTE)
compute/util.cc
compute/util_internal.cc)

append_runtime_avx2_src(ARROW_COMPUTE_SRCS compute/kernels/aggregate_basic_avx2.cc)
append_runtime_avx512_src(ARROW_COMPUTE_SRCS compute/kernels/aggregate_basic_avx512.cc)
append_runtime_avx2_src(ARROW_COMPUTE_SRCS compute/key_hash_internal_avx2.cc)
append_runtime_avx2_bmi2_src(ARROW_COMPUTE_SRCS compute/key_map_internal_avx2.cc)
append_runtime_avx2_src(ARROW_COMPUTE_SRCS compute/row/compare_internal_avx2.cc)
append_runtime_avx2_src(ARROW_COMPUTE_SRCS compute/row/encode_internal_avx2.cc)
append_runtime_avx2_bmi2_src(ARROW_COMPUTE_SRCS compute/util_avx2.cc)
append_runtime_avx2_src(ARROW_COMPUTE_LIB_SRCS compute/kernels/aggregate_basic_avx2.cc)
append_runtime_avx512_src(ARROW_COMPUTE_LIB_SRCS
compute/kernels/aggregate_basic_avx512.cc)
append_runtime_avx2_src(ARROW_COMPUTE_LIB_SRCS compute/key_hash_internal_avx2.cc)
append_runtime_avx2_bmi2_src(ARROW_COMPUTE_LIB_SRCS compute/key_map_internal_avx2.cc)
append_runtime_avx2_src(ARROW_COMPUTE_LIB_SRCS compute/row/compare_internal_avx2.cc)
append_runtime_avx2_src(ARROW_COMPUTE_LIB_SRCS compute/row/encode_internal_avx2.cc)
append_runtime_avx2_bmi2_src(ARROW_COMPUTE_LIB_SRCS compute/util_avx2.cc)

set(ARROW_COMPUTE_SHARED_PRIVATE_LINK_LIBS)
set(ARROW_COMPUTE_SHARED_LINK_LIBS)
set(ARROW_COMPUTE_STATIC_LINK_LIBS)
set(ARROW_COMPUTE_STATIC_INSTALL_INTERFACE_LIBS)
set(ARROW_COMPUTE_SHARED_INSTALL_INTERFACE_LIBS)

list(APPEND ARROW_COMPUTE_STATIC_INSTALL_INTERFACE_LIBS Arrow::arrow_static)
list(APPEND ARROW_COMPUTE_SHARED_INSTALL_INTERFACE_LIBS Arrow::arrow_shared)
list(APPEND ARROW_COMPUTE_STATIC_LINK_LIBS arrow_static)
list(APPEND ARROW_COMPUTE_SHARED_LINK_LIBS arrow_shared)

if(ARROW_USE_BOOST)
list(APPEND ARROW_COMPUTE_STATIC_LINK_LIBS Boost::headers)
list(APPEND ARROW_COMPUTE_SHARED_PRIVATE_LINK_LIBS Boost::headers)
endif()
if(ARROW_USE_XSIMD)
list(APPEND ARROW_COMPUTE_STATIC_LINK_LIBS ${ARROW_XSIMD})
list(APPEND ARROW_COMPUTE_SHARED_PRIVATE_LINK_LIBS ${ARROW_XSIMD})
endif()
if(ARROW_WITH_OPENTELEMETRY)
list(APPEND ARROW_COMPUTE_STATIC_LINK_LIBS ${ARROW_OPENTELEMETRY_LIBS})
list(APPEND ARROW_COMPUTE_SHARED_PRIVATE_LINK_LIBS ${ARROW_OPENTELEMETRY_LIBS})
endif()
if(ARROW_WITH_RE2)
list(APPEND ARROW_COMPUTE_STATIC_LINK_LIBS re2::re2)
list(APPEND ARROW_COMPUTE_SHARED_PRIVATE_LINK_LIBS re2::re2)
endif()
if(ARROW_WITH_UTF8PROC)
list(APPEND ARROW_COMPUTE_STATIC_LINK_LIBS utf8proc::utf8proc)
list(APPEND ARROW_COMPUTE_SHARED_PRIVATE_LINK_LIBS utf8proc::utf8proc)
endif()

add_arrow_lib(arrow_compute
CMAKE_PACKAGE_NAME
ArrowCompute
PKG_CONFIG_NAME
arrow-compute
SHARED_LINK_LIBS
${ARROW_COMPUTE_SHARED_LINK_LIBS}
SHARED_PRIVATE_LINK_LIBS
${ARROW_COMPUTE_SHARED_PRIVATE_LINK_LIBS}
SHARED_INSTALL_INTERFACE_LIBS
${ARROW_COMPUTE_SHARED_INSTALL_INTERFACE_LIBS}
STATIC_LINK_LIBS
${ARROW_COMPUTE_STATIC_LINK_LIBS}
STATIC_INSTALL_INTERFACE_LIBS
${ARROW_COMPUTE_STATIC_INSTALL_INTERFACE_LIBS}
OUTPUTS
ARROW_COMPUTE_LIBRARIES
SOURCES
${ARROW_COMPUTE_LIB_SRCS}
SHARED_LINK_FLAGS
${ARROW_VERSION_SCRIPT_FLAGS} # Defined in cpp/arrow/CMakeLists.txt
)
foreach(LIB_TARGET ${ARROW_COMPUTE_LIBRARIES})
target_compile_definitions(${LIB_TARGET} PRIVATE ARROW_COMPUTE_EXPORTING)
endforeach()
endif()

arrow_add_object_library(ARROW_COMPUTE ${ARROW_COMPUTE_SRCS})
arrow_add_object_library(ARROW_COMPUTE_CORE ${ARROW_COMPUTE_SRCS})
# TODO: Review whether the following (Boost, xsimd, opentelemetry, re2 and utf8proc) are required
# for the core compute library.
if(ARROW_USE_BOOST)
foreach(ARROW_COMPUTE_TARGET ${ARROW_COMPUTE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_TARGET} PRIVATE Boost::headers)
foreach(ARROW_COMPUTE_CORE_TARGET ${ARROW_COMPUTE_CORE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_CORE_TARGET} PRIVATE Boost::headers)
endforeach()
endif()
if(ARROW_USE_XSIMD)
foreach(ARROW_COMPUTE_TARGET ${ARROW_COMPUTE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_TARGET} PRIVATE ${ARROW_XSIMD})
foreach(ARROW_COMPUTE_CORE_TARGET ${ARROW_COMPUTE_CORE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_CORE_TARGET} PRIVATE ${ARROW_XSIMD})
endforeach()
endif()
if(ARROW_WITH_OPENTELEMETRY)
foreach(ARROW_COMPUTE_TARGET ${ARROW_COMPUTE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_TARGET} PRIVATE ${ARROW_OPENTELEMETRY_LIBS})
foreach(ARROW_COMPUTE_CORE_TARGET ${ARROW_COMPUTE_CORE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_CORE_TARGET}
PRIVATE ${ARROW_OPENTELEMETRY_LIBS})
endforeach()
endif()
if(ARROW_WITH_RE2)
foreach(ARROW_COMPUTE_TARGET ${ARROW_COMPUTE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_TARGET} PRIVATE re2::re2)
foreach(ARROW_COMPUTE_CORE_TARGET ${ARROW_COMPUTE_CORE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_CORE_TARGET} PRIVATE re2::re2)
endforeach()
endif()
if(ARROW_WITH_UTF8PROC)
foreach(ARROW_COMPUTE_TARGET ${ARROW_COMPUTE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_TARGET} PRIVATE utf8proc::utf8proc)
foreach(ARROW_COMPUTE_CORE_TARGET ${ARROW_COMPUTE_CORE_TARGETS})
target_link_libraries(${ARROW_COMPUTE_CORE_TARGET} PRIVATE utf8proc::utf8proc)
endforeach()
endif()

@@ -1032,7 +1096,7 @@ add_arrow_lib(arrow
${ARROW_SHARED_LINK_FLAGS}
SHARED_PRIVATE_LINK_LIBS
${ARROW_ARRAY_TARGET_SHARED}
${ARROW_COMPUTE_TARGET_SHARED}
${ARROW_COMPUTE_CORE_TARGET_SHARED}
${ARROW_CSV_TARGET_SHARED}
${ARROW_FILESYSTEM_TARGET_SHARED}
${ARROW_INTEGRATION_TARGET_SHARED}
@@ -1048,7 +1112,7 @@ add_arrow_lib(arrow
${ARROW_SYSTEM_LINK_LIBS}
STATIC_LINK_LIBS
${ARROW_ARRAY_TARGET_STATIC}
${ARROW_COMPUTE_TARGET_STATIC}
${ARROW_COMPUTE_CORE_TARGET_STATIC}
${ARROW_CSV_TARGET_STATIC}
${ARROW_FILESYSTEM_TARGET_STATIC}
${ARROW_INTEGRATION_TARGET_STATIC}
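The CMake change above defines ARROW_COMPUTE_EXPORTING privately on the new library's targets, and the commit list mentions a new ARROW_COMPUTE_EXPORT macro. A per-library export macro of this kind is commonly written along the following lines. This is a minimal sketch under assumptions: the DEMO_COMPUTE_* names and RegisterDemoKernels are hypothetical, and the header actually added by the PR is not shown on this page and may differ.

```cpp
// Normally supplied by the build system for the library's own sources only
// (for example through a PRIVATE compile definition on the library target).
// Defined here so that the sketch compiles as a single file.
#define DEMO_COMPUTE_EXPORTING

#if defined(_WIN32) || defined(__CYGWIN__)
#  if defined(DEMO_COMPUTE_STATIC)
     // Static builds need no import/export decoration.
#    define DEMO_COMPUTE_EXPORT
#  elif defined(DEMO_COMPUTE_EXPORTING)
     // Building the DLL itself: export the symbol.
#    define DEMO_COMPUTE_EXPORT __declspec(dllexport)
#  else
     // Consuming the DLL: import the symbol.
#    define DEMO_COMPUTE_EXPORT __declspec(dllimport)
#  endif
#else
   // GCC/Clang: keep the symbol visible even with -fvisibility=hidden.
#  define DEMO_COMPUTE_EXPORT __attribute__((visibility("default")))
#endif

namespace demo_export {

// A public entry point of the hypothetical library, decorated for export.
DEMO_COMPUTE_EXPORT int RegisterDemoKernels();

// Returns 0 to signal success in this sketch.
int RegisterDemoKernels() { return 0; }

}  // namespace demo_export
```

Giving each shared library its own macro matters mostly on Windows, where a symbol must be marked dllexport while its own library is built and dllimport everywhere else. Reusing another library's macro for symbols that moved would mark them the wrong way round, which is consistent with the "inconsistent l…" linkage fixes in the commit list.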
1 change: 1 addition & 0 deletions cpp/src/arrow/acero/ArrowAceroConfig.cmake.in
@@ -28,6 +28,7 @@

include(CMakeFindDependencyMacro)
find_dependency(Arrow)
find_dependency(ArrowCompute)

include("${CMAKE_CURRENT_LIST_DIR}/ArrowAceroTargets.cmake")

10 changes: 6 additions & 4 deletions cpp/src/arrow/acero/CMakeLists.txt
@@ -64,10 +64,12 @@ if(ARROW_WITH_OPENTELEMETRY)
list(APPEND ARROW_ACERO_STATIC_LINK_LIBS ${ARROW_OPENTELEMETRY_LIBS})
endif()

list(APPEND ARROW_ACERO_STATIC_INSTALL_INTERFACE_LIBS Arrow::arrow_static)
list(APPEND ARROW_ACERO_SHARED_INSTALL_INTERFACE_LIBS Arrow::arrow_shared)
list(APPEND ARROW_ACERO_STATIC_LINK_LIBS arrow_static)
list(APPEND ARROW_ACERO_SHARED_LINK_LIBS arrow_shared)
list(APPEND ARROW_ACERO_STATIC_INSTALL_INTERFACE_LIBS Arrow::arrow_static
ArrowCompute::arrow_compute_static)
list(APPEND ARROW_ACERO_SHARED_INSTALL_INTERFACE_LIBS Arrow::arrow_shared
ArrowCompute::arrow_compute_shared)
list(APPEND ARROW_ACERO_STATIC_LINK_LIBS arrow_static arrow_compute_static)
list(APPEND ARROW_ACERO_SHARED_LINK_LIBS arrow_shared arrow_compute_shared)

add_arrow_lib(arrow_acero
CMAKE_PACKAGE_NAME
6 changes: 6 additions & 0 deletions cpp/src/arrow/acero/aggregate_node_test.cc
@@ -24,6 +24,7 @@

#include "arrow/acero/test_util_internal.h"
#include "arrow/compute/api_aggregate.h"
#include "arrow/compute/kernels/test_util_internal.h"
#include "arrow/compute/test_util_internal.h"
#include "arrow/result.h"
#include "arrow/table.h"
@@ -33,8 +34,13 @@

namespace arrow {

using compute::ComputeKernelEnvironment;
using compute::ExecBatchFromJSON;

// Register the compute kernels
::testing::Environment* compute_kernels_env =
::testing::AddGlobalTestEnvironment(new ComputeKernelEnvironment);

namespace acero {

Result<std::shared_ptr<Table>> TableGroupBy(
6 changes: 6 additions & 0 deletions cpp/src/arrow/acero/asof_join_node_test.cc
@@ -42,6 +42,7 @@
#include "arrow/api.h"
#include "arrow/compute/api_scalar.h"
#include "arrow/compute/cast.h"
#include "arrow/compute/kernels/test_util_internal.h"
#include "arrow/compute/row/row_encoder_internal.h"
#include "arrow/compute/test_util_internal.h"
#include "arrow/testing/gtest_util.h"
@@ -67,11 +68,16 @@ using testing::UnorderedElementsAreArray;
namespace arrow {

using compute::Cast;
using compute::ComputeKernelEnvironment;
using compute::Divide;
using compute::ExecBatchFromJSON;
using compute::Multiply;
using compute::Subtract;

// Register the compute kernels
::testing::Environment* compute_kernels_env =
::testing::AddGlobalTestEnvironment(new ComputeKernelEnvironment);

namespace acero {

bool is_temporal_primitive(Type::type type_id) {
6 changes: 6 additions & 0 deletions cpp/src/arrow/acero/hash_aggregate_test.cc
@@ -40,6 +40,7 @@
#include "arrow/compute/exec_internal.h"
#include "arrow/compute/kernels/aggregate_internal.h"
#include "arrow/compute/kernels/codegen_internal.h"
#include "arrow/compute/kernels/test_util_internal.h"
#include "arrow/compute/registry.h"
#include "arrow/compute/row/grouper.h"
#include "arrow/table.h"
@@ -71,6 +72,7 @@ using internal::ToChars;

using compute::ArgShape;
using compute::CallFunction;
using compute::ComputeKernelEnvironment;
using compute::CountOptions;
using compute::default_exec_context;
using compute::ExecBatchFromJSON;
@@ -88,6 +90,10 @@ using compute::TDigestOptions;
using compute::ValidateOutput;
using compute::VarianceOptions;

// Register the compute kernels
::testing::Environment* compute_kernels_env =
::testing::AddGlobalTestEnvironment(new ComputeKernelEnvironment);

namespace acero {

TEST(AggregateSchema, NoKeys) {
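The three Acero test files above register the kernels through a namespace-scope call to ::testing::AddGlobalTestEnvironment, so that registration happens once before any test runs. The mechanism can be sketched without GoogleTest as follows. Environment, AddGlobalTestEnvironment and RunAllTests below are simplified, hypothetical stand-ins for the GoogleTest facilities, and this ComputeKernelEnvironment only flips a flag; the PR's real class lives in arrow/compute/kernels/test_util_internal.h and is not shown on this page.

```cpp
#include <memory>
#include <vector>

namespace demo_env {

// Simplified stand-in for ::testing::Environment.
class Environment {
 public:
  virtual ~Environment() = default;
  virtual void SetUp() {}
  virtual void TearDown() {}
};

// Function-local static, so it is safe to use during static initialization.
inline std::vector<std::unique_ptr<Environment>>& Environments() {
  static std::vector<std::unique_ptr<Environment>> envs;
  return envs;
}

// Takes ownership of env and returns the same pointer.
inline Environment* AddGlobalTestEnvironment(Environment* env) {
  Environments().emplace_back(env);
  return env;
}

// Stand-in for "the extra kernels are available in the registry".
inline bool& KernelsRegistered() {
  static bool registered = false;
  return registered;
}

class ComputeKernelEnvironment : public Environment {
 public:
  void SetUp() override { KernelsRegistered() = true; }
};

// Namespace-scope initializer: runs before main(), so the environment is
// queued without touching any individual test.
Environment* compute_kernels_env =
    AddGlobalTestEnvironment(new ComputeKernelEnvironment);

// Sets up every environment, runs one "test" that needs the kernels, then
// tears the environments down in reverse order. Returns the failure count.
inline int RunAllTests() {
  auto& envs = Environments();
  for (auto& env : envs) env->SetUp();
  int failures = KernelsRegistered() ? 0 : 1;
  for (auto it = envs.rbegin(); it != envs.rend(); ++it) (*it)->TearDown();
  return failures;
}

}  // namespace demo_env
```

The design choice illustrated here is that the environment is only queued at static-initialization time; the registration itself runs later, from SetUp, once the test runner starts.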