Setting sync-offset and sync-size has no effect. #129

Open · Jeason-Hu opened this issue Jan 20, 2025 · 10 comments

@Jeason-Hu

Hi ikwzm,

I have encountered a problem where setting sync-offset and sync-size has no effect. The PL writes data cyclically to the physical address range 0x87B000000 to 0x87F000000, and that address space is split into 16 slices. The device tree is defined as follows:

reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;
		run_buffer: run_buffer@87B000000 {
			compatible = "shared-dma-pool";
			reusable;
			reg = <0x00000008 0x7B000000 0x00000000 0x04000000>;	// 0x8_7B000000, size 0x04000000 (64 MiB)
			label = "run_buffer";
		};
};
udmabuf@0 {
		compatible = "ikwzm,u-dma-buf";
		device-name = "runlength";
		size = <0x04000000>;
		dma-mask = <40>;				// 40-bit DMA address
		sync-mode = <2>;
		//sync-offset = <0x00000000>;
		//sync-size = <0x04000000>;
		sync-direction = <2>;			// from device (DMA_FROM_DEVICE)
		quirk-mmap-auto;
		memory-region = <&run_buffer>;	// use the reserved CMA region above
};

When I simply run system("echo 1 > /sys/class/u-dma-buf/runlength/sync_for_cpu"); to synchronize the entire buffer, there is no cache-consistency problem, but the synchronization takes a relatively long time.
When I instead synchronize only the current slice by passing the sync offset and size myself (see below), I run into cache-consistency problems.

	int fd;
	if ((fd = open("/dev/runlength", O_RDWR)) != -1) {
		// sync_size is the slice buffer size, frameIdx is the slice buffer index
		unsigned long sync_offset    = frameIdx * RUN_V3_MAX_NUM * sizeof(tRUN_V3);
		unsigned long sync_size      = RUN_V3_MAX_NUM * sizeof(tRUN_V3);
		unsigned int  sync_direction = 2;	// from device (DMA_FROM_DEVICE)
		uint64_t sync_for_cpu = ((uint64_t)(sync_offset    & 0xFFFFFFFF) << 32) |
		                        ((uint64_t)(sync_size      & 0xFFFFFFF0) <<  0) |
		                        ((uint64_t)(sync_direction & 0x00000003) <<  2) |
		                        0x00000001;

		ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC_FOR_CPU, &sync_for_cpu);
		close(fd);
	}

The same problem occurs when I use the sysfs interface instead:

	int  fd;
	char attr[32];
	unsigned int sync_offset    = frameIdx * RUN_V3_MAX_NUM * sizeof(tRUN_V3);
	unsigned int sync_size      = RUN_V3_MAX_NUM * sizeof(tRUN_V3);
	unsigned int sync_direction = 2;	// from device (DMA_FROM_DEVICE)
	unsigned int sync_for_cpu   = 1;
	if ((fd = open("/sys/class/u-dma-buf/runlength/sync_for_cpu", O_WRONLY)) != -1) {
		sprintf(attr, "0x%08X%08X", (sync_offset & 0xFFFFFFFF),
		        (sync_size & 0xFFFFFFF0) | (sync_direction << 2) | sync_for_cpu);
		write(fd, attr, strlen(attr));
		close(fd);
	}
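
For what it's worth, a partial sync ultimately performs cache maintenance, which on ARMv8-A cores works at cache-line granularity (typically 64 bytes), while the interface above only carries the size at 16-byte granularity (the & 0xFFFFFFF0 mask). Below is a minimal sketch of widening the sync window to cache-line boundaries before packing the sync_for_cpu word, assuming a 64-byte line size and the same packing format as the snippets above; the helper name is made up here:

	#include <stdint.h>

	#define CACHE_LINE_SIZE 64UL	/* assumption: 64-byte cache lines */

	/* Pack a sync_for_cpu value whose window is widened to cache-line boundaries. */
	static uint64_t pack_sync_for_cpu(unsigned long offset, unsigned long size,
	                                  unsigned int direction)
	{
		unsigned long start = offset & ~(CACHE_LINE_SIZE - 1);
		unsigned long end   = (offset + size + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);

		return ((uint64_t)(start         & 0xFFFFFFFF) << 32) |
		       ((uint64_t)((end - start) & 0xFFFFFFF0) <<  0) |
		       ((uint64_t)(direction     & 0x00000003) <<  2) |
		       0x00000001;	/* sync_for_cpu = 1 */
	}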
@ikwzm (Owner) commented Jan 20, 2025

Thanks for the issue.

Please provide more detailed information.

What is the CPU architecture?
What is the Linux Kernel version?
What is the value of RUN_V3_MAX_NUM?
What is the value of sizeof(tRUN_V3)?
What is the return value of ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC_FOR_CPU,&sync_for_cpu)?

@Jeason-Hu (Author)

Hi ikwzm,

Thank you for your reply.
What is the CPU architecture?
-- Xilinx ZU5EV, quad-core Cortex-A53
What is the Linux Kernel version?
-- Linux 5.15.0
What is the value of RUN_V3_MAX_NUM?
-- #define RUN_V3_MAX_NUM 0x3FFFF
What is the value of sizeof(tRUN_V3)?
-- sizeof(tRUN_V3) = 16
What is the return value of ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC_FOR_CPU,&sync_for_cpu)?
-- return = 0x0
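
For reference, these values make each slice sync_size = RUN_V3_MAX_NUM × sizeof(tRUN_V3) = 0x3FFFF × 16 = 0x3FFFF0 bytes (16 bytes short of 4 MiB), so sync_offset = frameIdx × 0x3FFFF0 is 16-byte aligned but, for most values of frameIdx, not aligned to a 64-byte cache line or a 0x1000-byte page.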

@ikwzm (Owner) commented Jan 24, 2025

We could not reproduce the situation here.
Please give us some more time to investigate.

@Jeason-Hu (Author)

Ok, thank you.
I'll investigate using the HPC port, which keeps the cache coherent with the hardware.

@pmdaye commented Jan 29, 2025

Any updates, @Jeason-Hu? Did you find a fix? We have a similar situation, and we also have a cache-synchronization issue...

@Jeason-Hu (Author)

> Any updates, @Jeason-Hu? Did you find a fix? We have a similar situation, and we also have a cache-synchronization issue...

Not yet. I have to handle some urgent tasks.

@pierredaye

Maybe a tip: we realized that if the size of the packet sent by the AXI DataMover was smaller than 0x1000, then we had cache issues, independently of the type of cache flush/invalidation we were doing.
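
To illustrate, a minimal sketch of the kind of padding this tip implies, assuming the observed 0x1000 threshold corresponds to the 4 KiB page size (the macro and function names are made up):

	#include <stddef.h>

	#define MIN_CHUNK_SIZE 0x1000UL	/* threshold observed above; assumed to match the 4 KiB page size */

	/* Round a transfer length up to the next MIN_CHUNK_SIZE boundary. */
	static size_t pad_transfer_len(size_t len)
	{
		return (len + MIN_CHUNK_SIZE - 1) & ~(MIN_CHUNK_SIZE - 1);
	}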

@Jeason-Hu (Author)

Yes, I also observed this phenomenon. Sometimes the amount of data output by the FPGA algorithm is quite small, and adjustments to the output are necessary to address the issue. Furthermore, if the data sent exceeds 0x1000 in size after adjustment, then the udmabuf driver is no longer needed.

@guerricmeurice

I confirm that when packets are sent by the AXI DataMover (i.e. from the PL) to the PS DDR4 SDRAM and are not of size 0x1000, we can see issues. The issue is visible when trying to read the data back from Linux.
When writing and then reading from the PL into the PL DDR4 SDRAM with a DataMover, there is no issue with any packet size.

@ikwzm (Owner) commented Feb 11, 2025

@Jeason-Hu @pierredaye @guerricmeurice, thank you all for your valuable opinions and discussions.
However, we have not been able to reproduce the issue that you have pointed out.
At this point, we would like to organize and clarify everyone's situation. Would that be acceptable to you?

If possible, we would appreciate it if you could provide more details about the environment where the issue occurs.
Additionally, if feasible, sharing materials such as the internal block diagram of the FPGA, the device tree, the source code of the application program, the data used, as well as logs of both the expected and actual results would be extremely helpful.
