Add optional DUCC FFT support #463
Conversation
I think I managed to fix all build problems. Please double-check very carefully; many of my cmake-related changes are pure guesswork and can definitely be done more elegantly!
OK, I give up on the Windows build ... the rest seems to be OK.
@DiamonDinoia, I think this is ready for a first round of feedback!
Added some explanations and questions.
CMakeLists.txt (Outdated)
@@ -1,6 +1,7 @@
cmake_minimum_required(VERSION 3.19)

project(finufft VERSION 2.2.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 17)
If we use this here, we can remove the C++17 flags everywhere else. Not sure if this is worth it ... just a suggestion.
It is better to set it by target as this will influence projects that include this as a dependency.
include/finufft/defs.h (Outdated)
@@ -225,8 +231,12 @@ typedef struct FINUFFT_PLAN_S { // the main plan object, fully C++
FLT *phiHat2; // " y-axis.
FLT *phiHat3; // " z-axis.

FFTW_CPX *fwBatch; // (batches of) fine grid(s) for FFTW to plan
#ifdef FINUFFT_USE_DUCC0
If we don't need backwards compatibility in memory layout, these members can of course just go away.
backward compatibility to fftw?
Not to FFTW, but maybe for interfacing with other languages; I'm thinking about the Matlab interface, for example. If you have to describe the memory layout of the FINUFFT plan type there, it would not be good if it changes depending on which FFT library is used.
Maybe I'm just overly cautious here; should we just remove the entries when ducc is used and see if anything breaks?
p->fwBatch = FFTW_ALLOC_CPX(p->nf * p->batchSize); // the big workspace
#ifndef FINUFFT_USE_DUCC0
timer.restart();
p->fwBatch = (CPX *)FFTW_ALLOC_CPX(p->nf * p->batchSize); // the big workspace
This cast is guaranteed to work correctly, since C++ guarantees the memory layout of std::complex to be compatible with the FFTW type.
It's interesting that for DUCC0 you don't do the big workspace allocation (fwBatch) here at the plan stage, but seem to do it in execute. Wouldn't it be faster to do the allocation here once (like we had)? Any explanation we're missing here?
src/finufft.cpp (Outdated)
timer.restart();
#ifdef FINUFFT_USE_DUCC0
Question: I'm using opts.nthreads in the following, and if it is >1 but OpenMP is switched off, the code will still run in parallel. Do I need to safeguard against this case?
Yes, if OpenMP is switched off no parallelization should happen. It might conflict with users who parallelize by hand.
If I build without openMP I get:
libfinufft.so: undefined reference to `pthread_create'
libfinufft.so: undefined reference to `pthread_getaffinity_np'
libfinufft.so: undefined reference to `pthread_setaffinity_np'
libfinufft.so: undefined reference to `pthread_join'
I can have CMake link against pthread. I have a question: are you using pthread directly in ducc0 (it is not Windows compatible...)? Shall we have this wrapped in ifdefs, so that if pthread is not linked there is no parallelism? (We can move this discussion to ducc0.)
OK, I'll take extra care to switch off multithreading if OpenMP is not active!
I am using C++'s std::thread class for multithreading, which should in principle work out of the box. On Linux and MacOS this is built on pthreads.
The best way to get multithreading for ducc compiled in is to use -pthread for both compilation and linking; that should take care of everything. Not sure how it works on Windows, but it's possible that no flags are needed at all.
OK, I'm now using min<size_t>(MY_OMP_GET_MAX_THREADS(), p->opts.nthreads) as the number of threads; that should work as expected in all circumstances, as far as I can see.
Hi Martin, This looks good. I will merge this into my branch and change the cmakefile. I did not pass -march=native and other flags to ducc as I was thinking of writing a cmake for ducc and opening a PR; I am waiting for the right time for it. I would like one change to happen: I would like to have an fft.h (and an fft.cpp) that wraps all the fft functionality, so that inside finufft.cpp|h there are no FFT-specific #ifdefs; inside fft.h, when writing a wrapper function, we can have functions that do something like the following.
Basically passing the union of the parameters and relying on the compiler to throw away the unused ones. Cheers,
Hi Martin. I pushed some changes to https://github.com/DiamonDinoia/finufft/tree/switchable-fft; let me know how I can help.
I would not worry about CI for now. I will handle the cmake-related changes at the end. Some of the CI uses makefiles, so I would need to update those too.
Thank you for that! I'm happy to include this ... I just might have to come back to you for help from time to time, whenever something nontrivial changes :-) Concerning the separation of the FFT into a new file: I think this is a good plan, but I'm not absolutely sure whether the timing is optimal. Assuming that the switch from FFTW ...
Sorry, false alarm ... I had actually taken care of batched FFTs, phew!
ker is real-valued FLT array of length nf1/2+1.

Single thread only, but shouldn't matter since mostly data movement.

It has been tested that the repeated floating division in this inner loop
only contributes at the <3% level in 3D relative to the FFT cost (8 threads).
This could be removed by passing in an inverse kernel and doing mults.

todo: rewrite w/ C++-complex I/O, check complex divide not slower than
Remember to remove this todo post-merge.
src/finufft.cpp (Outdated)
@@ -1092,6 +1073,13 @@ int FINUFFT_EXECUTE(FINUFFT_PLAN p, CPX *cj, CPX *fk) {
CNTime timer;
timer.start();

#ifdef FINUFFT_USE_DUCC0
std::vector<CPX> fwBatch_(p->nf * p->batchSize); // the big workspace
See above comment - could you explain why this isn't better done in the plan stage?
The motivation is to reduce memory consumption of plans. I remember the HVOX people complaining at some point that finufft plans are huge, and with FFTW this is not so easy to avoid. With DUCC it's easier and I think it's worth the slight performance penalty.
I guess we could always use a caching allocator to avoid the issue. I'd rather keep the allocation in the plan for now and maybe switch to this with a flag or something. I do prefer having the same behavior for now.
If you are really sure ... for me this was one of the two big selling points of the patch (the other being not waiting for the FFTW planner).
It's reverted now ... but I sincerely hope that you will change your mind again! :-)
Ok, interesting that this was a selling point. Let's think about where this big malloc is done in the future, as a separate issue. We would like to understand how HVOX people are using the plan interface. The big malloc has to be done somewhere because the FFT is about to write into it! But if one were to keep multiple plans around that were not going to be done simultaneously, then, yes, it could be wasteful of RAM - but that seems like a weird use-case. Maybe people are using FINUFFT in a way I hadn't anticipated.
A very common situation would be iterative approximate inversion of a NUFFT via conjugate gradient or similar. For that you need (I think) two FINUFFT plans that are used alternatingly, and therefore there is always an overhead of a fine grid (and a copy of the coordinates of nonuniform points, but that's a different topic again).
The motivation is to reduce memory consumption of plans. I remember the HVOX people complaining at some point that finufft plans are huge, and with FFTW this is not so easy to avoid. With DUCC it's easier and I think it's worth the slight performance penalty.
@mreineck Just to clarify, we don't complain that FINUFFT plans are huge in HVOX, but rather that exploiting sparsity in HVOXv1 requires creating multiple plans of different sizes, which is time-consuming when planning 100+ NUFFTs since planning cannot be done in parallel due to it not being thread-safe. Also, depending on what your (x,z) point distribution is, the FFTs may indeed be huge.
That being said we don't need to do this anymore: a single plan suffices now in HVOXv2. [There is no HVOXv2 paper at the moment, but you can read this in my thesis soon. We'll draft a v2 paper proper afterwards.]
Thank you very much for the clarification! I'm definitely looking forward to reading about the improved algorithm!
add_library(finufft_f64 OBJECT ${FINUFFT_PRECISION_DEPENDENT_SOURCES})
target_compile_definitions(finufft_f64 PRIVATE)
set_finufft_options(finufft_f64)
target_link_libraries(finufft_f64 PUBLIC ${FINUFFT_FFTLIBS})
target_link_libraries(finufft_f64 PRIVATE xsimd)
@DiamonDinoia Marco, the reason Windows cannot find xsimd seems to be here: this line was moved out of the set_finufft_options function, and the if(WIN32) block below does not link xsimd.
Great, thanks for fixing it!
BTW, if you are running benchmarks on your hardware comparing the two FFTs within finufft I'd be extremely interested to see the results!
Did anyone run basic timing tests on this? I just ran benchmarks on a 5700U AMD laptop, 1 thread, but I must be doing something wrong, since ducc0 is 50x slower than the fftw version. This appears to be mostly due to spreading (which shouldn't be different!) but also the fft. I am confused! I cloned via
Here's using the makefile build (hence fftw), showing just the debug=2 output for a small 3d1:
Typical 1-thread performance, same as for master. Now for ducc I have to use cmake, so I did:
I see ducc0 is built (no error messages). Did I get the flags right?
So, I'm hoping bad math flags are in this cmake, or something? (The error is fine, but not the 50x slow-down...) :(
@@ -36,6 +36,7 @@ option(FINUFFT_USE_CUDA "Whether to build CUDA accelerated FINUFFT library (libc
option(FINUFFT_USE_CPU "Whether to build the ordinary FINUFFT library (libfinufft)." ON)
option(FINUFFT_STATIC_LINKING "Whether to link the static FINUFFT library (libfinufft_static)." ON)
option(FINUFFT_BUILD_DEVEL "Whether to build development executables" OFF)
option(FINUFFT_USE_DUCC0 "Whether to use DUCC0 for CPU fft library" ON)
Ah, this has to be OFF by default, since users will need to be eased into this over time, if it turns out to be useful for them.
I now used fresh
weirdly, it again downloads ducc and builds it (it should not, since fftw is default).
Now I get the FFTW version, which shows FFTW timing as expected, but again a crippling spread (50x slower):
I just rechecked master, and the cmake build also has an equally crippled spreader. So there is a cmake problem in master (the makefile is, as usual, fine).
Just curious: how do you build the ducc0 version using ...
Finally:
gives:
Spreading back to normal, and ducc0 fft w/ 3d-zero-patterns is 2.5x as fast as fftw - good!
Nice - that means the FFT must be intrinsically faster (for this particular setup) than FFTW, since the savings can never give you more than a factor of 2.
one thread, double-prec. using M=1e2 so as to make spread/interp negligible. Thus all time is FFT
etc. Thus FFT-dominated. (Since the timing of zeroing the array vs the fft could be mixed, I just did total times.) 3d1, upsampfac=1.25: ducc0_with_zero-pattern 3x faster than fftw (or 7x if looking at fft time in the debug=1 output, but I don't know if I believe that..) 8 threads (no HT): Excellent!
The particular timings depend very strongly on the prime factors of the array sizes ... there won't be a uniform runtime ratio between ducc0.fft and FFTW, since they are internally really different.
2D, double-prec: 1 thread: 8 threads: 1D, double-prec: 1 thread: 8 threads: So, pretty good story for ducc0 so far, in the case of a single fftw run with the FFTW_ESTIMATE plan time included.
Hi Martin,
Don't expect miracles from ducc in 1D; this is where it's worst. Batched 1D might be better though.
The output of next235even should already be pretty good for ducc; I don't think there's much to be gained there.
batched 1d (8 threads, batchsize=8):
Shows that for nf1=12500000 ducc0 is 0.7x the speed of fftw (i.e. slower), but at nf1=20000000 ducc0 is 1.3x (faster). But it is quite exciting for 2d, 3d.
I will have a shot at adding ducc0 to my makefile, but I see no reason not to merge this branch now. I will set the cmake default to fftw for now. We also need to update the docs to reflect the ducc0 option. Best, Alex