Here's my #OpenCL implementation: https://github.com/ProjectPhysX/FluidX3D/blob/master/src/kernel.cpp#L1924-L1993
9.3.2025 18:56

#FluidX3D #CFD v3.2 is out! I've implemented the much-requested #GPU summation for object force/torque; it's ~20x faster than #CPU #multithreading.
Horizontal sum in #OpenCL was a nice exercise - first a local-memory reduction, then a hardware-supported atomic floating-point add in VRAM, all in a single-stage kernel. Hammering atomics isn't too bad, as each of the ~10-340 workgroups in flight at a time does only a single atomic add. A sketch of the pattern follows below.
Also improved volumetric #raytracing!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.2
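For illustration, here's a minimal sketch of that pattern - not the actual FluidX3D kernel (that one is at the link above); it assumes the cl_ext_float_atomics extension (FP32 global atomic add, typically with -cl-std=CL3.0) and a workgroup size of 64:

#pragma OPENCL EXTENSION cl_ext_float_atomics : enable // FP32 global atomic add

kernel void sum_float(const global float* data, global float* result, const uint n) { // host zeroes result[0] beforehand
	local float cache[64]; // one slot per work-item, 64 = assumed workgroup size
	const uint lid = get_local_id(0);
	const uint gid = get_global_id(0);
	cache[lid] = gid<n ? data[gid] : 0.0f; // guard padding work-items
	barrier(CLK_LOCAL_MEM_FENCE);
	for(uint s=32u; s>0u; s>>=1u) { // tree reduction in local memory
		if(lid<s) cache[lid] += cache[lid+s];
		barrier(CLK_LOCAL_MEM_FENCE);
	}
	if(lid==0u) atomic_fetch_add_explicit((volatile global atomic_float*)result, cache[0], memory_order_relaxed, memory_scope_device); // only 1 atomic add to VRAM per workgroup
}

The tree reduction halves the number of active work-items every step, so almost all traffic stays in fast local memory and only a single global atomic per workgroup ever touches VRAM.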
Hot Aisle's 8x AMD #MI300X server is the fastest computer I've ever tested in #FluidX3D #CFD, achieving a peak #LBM performance of 205 GLUPs/s and a combined VRAM bandwidth of 23 TB/s.
The #RTX 5090 looks like a toy in comparison.
MI300X beats even Nvidia's GH200 94GB. This marks a fascinating inflection point in #GPGPU: #CUDA is no longer the performance leader.
You need a cross-vendor language like #OpenCL to leverage its power.
FluidX3D on #GitHub: https://github.com/ProjectPhysX/FluidX3D
3.3.2025 22:19

Mr. and Mrs. Woodpecker were both at the log of snacks today
1.3.2025 18:25

I added hardware dp4a support in #OpenCL for Intel and AMD #GPUs as well.
6 (!) spec/driver bugs needed workarounds (a version-check sketch follows the list):
- CL_DEVICE_OPENCL_C_VERSION is unreliable: it reports 1.2 if 3.0 is supported but 2.X is not
- CL_DEVICE_OPENCL_C_ALL_VERSIONS is broken on AMD
- CL_DEVICE_INTEGER_DOT_PRODUCT.. triggers undefined behavior on old Intel drivers
- the dp4a feature macro is only defined for -cl-std=CL3.0, and is always falsely set on old Intel drivers
- dot_acc_sat(a,b,c) on Intel gets translated to slow add_sat(dot(a,b),c); it must be c+dot(a,b)
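To illustrate the first workaround, here's a hedged host-side sketch - a blunt heuristic, not my exact detection code: parse CL_DEVICE_OPENCL_C_VERSION, but cross-check it against CL_DEVICE_VERSION and take the larger of the two:

#include <CL/cl.h>
#include <stdio.h>

// returns the usable OpenCL C version as a float, e.g. 3.0f
float opencl_c_version(cl_device_id device) {
	char s[1024] = "";
	clGetDeviceInfo(device, CL_DEVICE_OPENCL_C_VERSION, sizeof(s), s, NULL); // e.g. "OpenCL C 1.2" (may underreport, see bug above)
	float c_version = 0.0f;
	sscanf(s, "OpenCL C %f", &c_version);
	clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(s), s, NULL); // e.g. "OpenCL 3.0 ..."
	float device_version = 0.0f;
	sscanf(s, "OpenCL %f", &device_version);
	return device_version>c_version ? device_version : c_version; // workaround for the 1.2-instead-of-3.0 report
}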
I'm doing a podcast about #FluidX3D today with Improbable Matter, going live in 30 minutes!
https://youtu.be/csGLVZqr0SE
The 4x #Nvidia #H100 SXM5 server in the new Festus cluster at Uni Bayreuth is the fastest system I've ever tested in #FluidX3D #CFD, achieving 78 GLUPs/s #LBM performance at ~1650W #GPU power draw.
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#multi-gpu-benchmarks
https://www.hpc.uni-bayreuth.de/clusters/festus/#__tabbed_1_3
My OpenCL-Benchmark now uses the dp4a instruction on supported hardware (#Nvidia Pascal, #Intel #Arc, #AMD RDNA, or newer) to benchmark INT8 throughput.
dp4a is not exposed in #OpenCL C, but it can still be used via inline PTX assembly and compiler pattern recognition. Even Nvidia's compiler will turn the emulation implementation into dp4a, but in some cases it does so with a bunch of unnecessary shifts/permutations on the inputs, so it's better to use inline PTX directly (sketch below the link).
https://github.com/ProjectPhysX/OpenCL-Benchmark/releases/tag/v1.8
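Roughly, the dual-path pattern looks like this hedged sketch - not the exact OpenCL-Benchmark code, and USE_NV_PTX is a hypothetical define set from the host when the device is Nvidia:

// packed INT8 dot product with INT32 accumulator: returns dot(a,b)+c
inline int dp4a(const int a, const int b, const int c) {
#ifdef USE_NV_PTX // hypothetical host-side define for Nvidia devices
	int d;
	asm volatile("dp4a.s32.s32 %0, %1, %2, %3;" : "=r"(d) : "r"(a), "r"(b), "r"(c));
	return d;
#else // emulation; compilers pattern-match this into a hardware dp4a where one exists
	const char4 a4 = as_char4(a), b4 = as_char4(b);
	return c + a4.x*b4.x + a4.y*b4.y + a4.z*b4.z + a4.w*b4.w; // keep c in front for Intel (see the driver-bug list above)
#endif
}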
#FluidX3D #CFD v3.1 is out! I have updated the #OpenCL headers for better device-specs detection via device ID and #Nvidia compute capability, fixed broken voxelization on some #GPUs, and added a workaround for a CPU compiler bug that corrupted rendering. Also, #AMD GPUs will now show up with their correct name (no idea why AMD can't report it as CL_DEVICE_NAME like every other sane vendor and instead needs the CL_DEVICE_BOARD_NAME_AMD extension...). A sketch of that name workaround is below.
Have fun!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.1
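The name workaround boils down to something like this hedged host-side sketch (assuming the cl_amd_device_attribute_query extension header; not the exact FluidX3D code):

#include <CL/cl.h>
#include <CL/cl_ext.h> // defines CL_DEVICE_BOARD_NAME_AMD (cl_amd_device_attribute_query)
#include <string.h>

// query the marketing name on AMD, fall back to CL_DEVICE_NAME everywhere else
void device_name(cl_device_id device, char* name, const size_t size) {
	name[0] = '\0';
	const cl_int err = clGetDeviceInfo(device, CL_DEVICE_BOARD_NAME_AMD, size, name, NULL);
	if(err!=CL_SUCCESS||strlen(name)==0) clGetDeviceInfo(device, CL_DEVICE_NAME, size, name, NULL); // non-AMD device or extension missing
}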
German streets are littered with election posters, and none of the parties speak to me, as they all only represent pensioners. But this one poster caught my eye: this party is for the '90s generation; I immediately feel heard and represented. Instead of dumb election promises, the poster even invites you to a local information event. Shall I go there?
/s
Uuhhh, my university has its new #HPC cluster online, with a couple of dual Intel Xeon Platinum 8480+ nodes with 2TB RAM and quad #GPUs each. Guess who still has access!
#OpenCL specs: https://opencl.gpuinfo.org/displayreport.php?id=4731
Cluster website: https://www.hpc.uni-bayreuth.de/clusters/festus/#compute-nodes
RTX 5090 performance numbers for #FluidX3D are in - thanks to @phoronix! And I finally found a way to format the performance chart on the #FluidX3D #GitHub page a bit better, especially with a larger font size. Hacking the mermaid gantt chart is currently the only way to embed a compact bar chart directly into markdown without an extra image file.
The mermaid language is still horrific - inconsistent, and half the styling commands don't even work. There's no way yet to color the bars blue.
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#single-gpucpu-benchmarks
The woodpecker is at it again!
25.1.2025 14:13

Woodpecker at the bell of snacks
18.1.2025 16:33

It's 2025 and LibreOffice is still garbage
Microsoft Office 365 is garbage too, except it also comes with an expensive subscription fee. I guess I'll do presentations with an overhead projector then...
3 different #GPUs, 1 #CFD simulation - #FluidX3D "SLI"-ing Intel A770 + Intel B580 + Nvidia Titan Xp for 678 million grid cells in 36GB of combined VRAM
https://www.youtube.com/watch?v=9VP3fruwnXc
Finally 2¹² ⭐ for #FluidX3D on #GitHub!
18.12.2024 19:22

Did you know? #Battlemage / #Intel Arc B580 adds support for (a little bit of) #FP64, with an FP64:FP32 ratio of 1:16.
This helps a lot with application compatibility - FP64 support was absent on Arc Alchemist. (A runtime detection sketch follows below the links.)
https://github.com/ProjectPhysX/OpenCL-Benchmark
#Intel Arc B580 #OpenCL specs:
- Windows: https://opencl.gpuinfo.org/displayreport.php?id=4564
- Linux: https://opencl.gpuinfo.org/displayreport.php?id=4562
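For apps that want to check this at runtime, here's a minimal hedged sketch (standard OpenCL host API, nothing B580-specific):

#include <CL/cl.h>
#include <stdbool.h>

// FP64 is supported iff CL_DEVICE_DOUBLE_FP_CONFIG is non-zero
// (equivalent to the cl_khr_fp64 extension being listed)
bool supports_fp64(cl_device_id device) {
	cl_device_fp_config fp64_config = 0;
	clGetDeviceInfo(device, CL_DEVICE_DOUBLE_FP_CONFIG, sizeof(fp64_config), &fp64_config, NULL);
	return fp64_config!=0;
}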
Dual #Intel Arc B580 go brrrr #OpenCL #Battlemage
13.12.2024 18:22