rga2: sync one cmd end time 2414 // Prints the RGA hardware working time, in us.
```

- multi-rga

> Versions below 1.3.0

```
rga3_reg: set cmd use time = 196 // Time from the start of request processing to register configuration.
rga_job: hw use time = 554 // Time from hardware start to the hardware interrupt return.
rga_job: (pid:3197) job done use time = 751 // Time from the start of request processing to completion of the request.
rga_job: (pid:3197) job clean use time = 933 // Time from the start of request processing to completion of the request's resource cleanup.
```

> Version 1.3.0 and above

```
rga_mm: request[3300], get buffer_handle info cost 188 us
rga3_reg: request[3300], generate register cost time 2 us
rga3_reg: request[3300], set register cost time 301 us
rga_job: request[3300], hardware[RGA3_core0] cost time 539 us
rga_mm: request[3300], put buffer_handle info cost 153 us
rga_job: request[3300], job done total cost time 1023 us
rga_job: request[3300], job cleanup total cost time 1030 us
```
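
To see how much of a request is spent outside the hardware, the 1.3.0+ timing lines can be parsed and summed. The following is a minimal sketch, not part of the RGA tooling: it assumes the log lines look exactly like the sample above, and the names `parse_timing` and `summarize` are hypothetical helpers introduced only for illustration.

```python
import re

# Sample kernel log lines in the 1.3.0+ format shown above.
SAMPLE_LOG = """\
rga_mm: request[3300], get buffer_handle info cost 188 us
rga3_reg: request[3300], generate register cost time 2 us
rga3_reg: request[3300], set register cost time 301 us
rga_job: request[3300], hardware[RGA3_core0] cost time 539 us
rga_mm: request[3300], put buffer_handle info cost 153 us
rga_job: request[3300], job done total cost time 1023 us
rga_job: request[3300], job cleanup total cost time 1030 us
"""

# Matches "<module>: request[<id>], <stage> cost [time] <n> us".
# The format is assumed from the sample; other driver versions may differ.
LINE_RE = re.compile(
    r"^(?P<module>\w+): request\[(?P<req>\d+)\], "
    r"(?P<stage>.+?) cost (?:time )?(?P<us>\d+) us$"
)


def parse_timing(log_text):
    """Return {request_id: {stage: microseconds}} for every matching line."""
    result = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            result.setdefault(int(m["req"]), {})[m["stage"]] = int(m["us"])
    return result


def summarize(stages):
    """Split the total job time into hardware time and everything else."""
    hardware = sum(us for stage, us in stages.items()
                   if stage.startswith("hardware["))
    total = stages["job done total"]
    buffer_handle = (stages.get("get buffer_handle info", 0)
                     + stages.get("put buffer_handle info", 0))
    return {
        "hardware_us": hardware,
        "total_us": total,
        "buffer_handle_us": buffer_handle,
        "non_hardware_us": total - hardware,
    }


timing = parse_timing(SAMPLE_LOG)
summary = summarize(timing[3300])
print(summary)
```

For the sample request, the hardware accounts for 539 us of the 1023 us total, and the two buffer_handle stages add up to 341 us.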
##### Version Information Query
@@ -807,7 +821,7 @@ This section introduces common questions about RGA in the form of Q&A. If the pr

**Q1.4**: The efficiency of RGA cannot meet the needs of our products. Is there any way to improve it?

**A1.4**: In the factory firmware of some chips (before 2021), the RGA frequency is not set to the highest supported frequency. For example, the RGA on chips such as the 3399 and 1126 can run at up to 400 MHz. The RGA frequency can be raised in the following two ways:

- Set by command (temporary modification; the frequency is restored when the device restarts)
@@ -919,6 +933,54 @@ Therefore, for this scenario, it is recommended to apply for memory within 4G to

**Q1.10**: Why is the time consumed by the API call higher than the hardware time printed in the log?

**Q1.10.1**: The "TIME" runtime log shows that map/unmap of the buffer takes too much time.

**Q1.10.2**: Comparing kernel log timestamps reveals a large gap between the "MSG" log and the "REG" log.

**Q1.10.3**: With the same parameter configuration, merely using a different memory allocator produces a large difference in running time.

**A1.10**: The abnormal time consumption in all of these cases is caused by the memory mapping behavior (map/unmap) of the external buffer. Every external buffer must be mapped and bound to the RGA driver so that the hardware can ultimately access it. Allocators differ in their underlying implementations, so the time the driver spends mapping and binding the memory differs as well, which can make the API time look much larger than the hardware time. Common dma-buf allocators with high extra time consumption include ION and V4L2. These differences are usually related to cache synchronization, and this type of problem can be confirmed by comparing the time consumed when using different allocators.

This type of issue can usually be optimized in the following ways:

1). Choose a memory allocator whose map/unmap process is more reasonable in terms of time consumption. Common ones are dma_heap, DRM, and the corresponding wrapper memory allocators. The following is sample code for calling RGA using memory allocated by these memory allocators:

2). This problem typically arises when rga_buffer_t is wrapped through wrapbuffer_fd(), or when importbuffer_fd() is used to run only one frame and is followed immediately by releasebuffer_handle(). This is normal for temporary tests or for scenarios where the buffer changes every frame. In actual products, however, repeatedly reallocating buffers performs poorly and is unreasonable in itself, so it is recommended to optimize the buffer flow as a whole.

We generally recommend designing the overall process as follows:

> 1. Construct a buffer_pool and allocate <n> buffers to be used as rotation buffers. Choose <n> according to the actual scenario.
> 2. Import these buffers into RGA through importbuffer_fd() and obtain the RGA buffer_handle of each one.
> 3. Use the rotating buffer_handle values to call RGA for image operations, cycling through them repeatedly.
> 4. When a buffer in this buffer_pool is no longer needed, call releasebuffer_handle() to release RGA's reference to it, so that the buffer can subsequently be released and destroyed.
> 5. Release the buffers in buffer_pool that are no longer needed.

With this design, even if the allocator's map/unmap behavior is abnormally time-consuming, the cost is confined to the importbuffer_fd()/releasebuffer_handle() calls and no longer affects each frame at runtime. This is a good way to avoid performance differences caused by differences in memory allocator implementations.
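
The difference between importing per frame and importing once can be illustrated with a small model. This is a sketch only: `FakeRga` is a hypothetical stand-in that merely counts calls, its method names mirror importbuffer_fd()/releasebuffer_handle() for readability but do not call librga, and the fd values are made up.

```python
from itertools import cycle


class FakeRga:
    """Hypothetical stand-in for the RGA driver; it only counts import/release calls."""

    def __init__(self):
        self.import_calls = 0
        self.release_calls = 0
        self._next_handle = 1
        self._live = set()

    def importbuffer_fd(self, fd):
        # Models the expensive map step: one call per imported buffer.
        self.import_calls += 1
        handle = self._next_handle
        self._next_handle += 1
        self._live.add(handle)
        return handle

    def releasebuffer_handle(self, handle):
        # Models the unmap step.
        self._live.remove(handle)
        self.release_calls += 1

    def process(self, handle):
        # A frame can only be processed with a handle that is still imported.
        assert handle in self._live, "handle must be imported first"


def per_frame(rga, fds, frames):
    """Import, process one frame, release: the map/unmap cost is paid every frame."""
    ring = cycle(fds)
    for _ in range(frames):
        handle = rga.importbuffer_fd(next(ring))
        rga.process(handle)
        rga.releasebuffer_handle(handle)


def pooled(rga, fds, frames):
    """Import the whole pool once, rotate the handles, release at the end."""
    handles = [rga.importbuffer_fd(fd) for fd in fds]  # steps 1 and 2
    try:
        ring = cycle(handles)
        for _ in range(frames):                        # step 3
            rga.process(next(ring))
    finally:
        for handle in handles:                         # step 4
            rga.releasebuffer_handle(handle)


naive, reuse = FakeRga(), FakeRga()
per_frame(naive, fds=[10, 11, 12], frames=300)
pooled(reuse, fds=[10, 11, 12], frames=300)
print(naive.import_calls, reuse.import_calls)  # prints "300 3"
```

In this model the per-frame approach performs one import per frame, while the pooled approach performs one import per buffer regardless of how many frames are processed.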

3). For scenarios where neither the memory allocator nor the business process can be changed, the only remaining optimization is to modify the map/unmap process of the memory allocator in use. This is very risky, and you must make sure you understand every use of that memory. After confirming the behavior of the allocator module, submit a request on redmine to consult the corresponding memory allocator maintainer for technical support.

**Q1.11**: Why is the importbuffer_fd()/importbuffer_virtualaddr() call time-consuming? Why do we need to call this API?

**A1.11**: For the usage and description of this interface, see "[Image Buffer Preprocessing](./Rockchip_Developer_Guide_RGA_EN.md#Image Buffer Preprocessing)" in the "Overview" chapter of ["Rockchip_Developer_Guide_RGA_EN"](./Rockchip_Developer_Guide_RGA_EN.md), located in the docs folder of the source code directory. The function of importbuffer_xx() is to import an external buffer into the RGA driver so that every subsequent RGA call can quickly access the buffer through its buffer_handle. Importing an external buffer is time-consuming because the buffer must be mapped into the RGA driver and the corresponding physical address and buffer information must be saved. This step is indispensable for calling RGA.
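
Because the import happens once per buffer rather than once per frame, its cost is amortized over the frames that reuse the handle. The following sketch shows the arithmetic; the function name and all timing values are hypothetical and are not measurements from any RGA platform.

```python
def per_frame_cost_us(import_us, release_us, frame_us, frames):
    """Average cost per frame when one import/release pair serves `frames` frames."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return frame_us + (import_us + release_us) / frames


# Hypothetical numbers: 2000 us to import, 1000 us to release, 500 us per frame.
single = per_frame_cost_us(2000, 1000, 500, frames=1)
reused = per_frame_cost_us(2000, 1000, 500, frames=300)
print(single, reused)  # prints "3500.0 510.0"
```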

**Q1.12**: Does RGA support parallel operations? Why does the time consumption of individual frames increase or double when calling RGA from multiple threads?

**A1.12**: The RGA API supports parallel calls from multiple threads/processes, but whether image operations actually execute in parallel on the hardware depends on the number of RGA cores on the chip. The number of cores is the maximum number of tasks that can run in parallel. Tasks beyond that number wait until a core becomes idle. Therefore, when the number of parallel calls exceeds the maximum the hardware supports, some calls take longer because they include time spent waiting for the hardware to become idle. You can obtain the number of cores and the supported functions of the current chip through the following debugging node (for specific instructions, see the "Hardware Information Query" section in the "Drive Debugging Node" section):

```shell
/# cat hardware
```
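
The waiting behavior described in A1.12 can be modeled with a simple scheduler. This is a sketch under simplifying assumptions that the source does not state: all jobs are submitted at time 0, any core can run any job, jobs are dispatched in submission order, and `completion_times` is a hypothetical helper rather than part of the RGA driver.

```python
import heapq


def completion_times(num_cores, durations_us):
    """Finish time of each job when each core runs one job at a time, in FIFO order."""
    cores = [0] * num_cores  # time at which each core next becomes idle
    heapq.heapify(cores)
    finished = []
    for duration in durations_us:
        start = heapq.heappop(cores)  # earliest idle core takes the next job
        end = start + duration
        finished.append(end)
        heapq.heappush(cores, end)
    return finished


# Four simultaneous 500 us jobs on a hypothetical two-core chip.
print(completion_times(2, [500, 500, 500, 500]))  # prints "[500, 500, 1000, 1000]"
```

In this model the third and fourth jobs take twice as long from the caller's point of view, even though each job's hardware time is unchanged, which matches the doubling described in Q1.12.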
### Functions Consulting

**Q2.1**: How do I know what version of RGA is available on my current chip platform and what functions are available?