Here's my #OpenCL implementation: https://github.com/ProjectPhysX/FluidX3D/blob/master/src/kernel.cpp#L1924-L1993
9.3.2025 18:56

#FluidX3D #CFD v3.2 is out! I've implemented the much-requested #GPU summation for object force/torque; it's ~20x faster than #CPU #multithreading.
Horizontal sum in #OpenCL was a nice exercise - first a local-memory reduction, then a hardware-supported atomic floating-point add in VRAM, all in a single-stage kernel. Hammering atomics isn't too bad, as each of the ~10-340 workgroups in flight at a time does only a single atomic add. A sketch of the pattern follows below.
Also improved volumetric #raytracing!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.2
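For illustration, here's a minimal sketch of that pattern - not the actual FluidX3D kernel (that one is at the link above); it assumes the cl_ext_float_atomics extension (FP32 global atomic add, typically with -cl-std=CL3.0) and a workgroup size of 64:

#pragma OPENCL EXTENSION cl_ext_float_atomics : enable // FP32 global atomic add

kernel void sum_float(const global float* data, global float* result, const uint n) { // host zeroes result[0] beforehand
	local float cache[64]; // one slot per work-item, 64 = assumed workgroup size
	const uint lid = get_local_id(0);
	const uint gid = get_global_id(0);
	cache[lid] = gid<n ? data[gid] : 0.0f; // guard padding work-items
	barrier(CLK_LOCAL_MEM_FENCE);
	for(uint s=32u; s>0u; s>>=1u) { // tree reduction in local memory
		if(lid<s) cache[lid] += cache[lid+s];
		barrier(CLK_LOCAL_MEM_FENCE);
	}
	if(lid==0u) atomic_fetch_add_explicit((volatile global atomic_float*)result, cache[0], memory_order_relaxed, memory_scope_device); // only 1 atomic add to VRAM per workgroup
}

The tree reduction halves the number of active work-items every step, so almost all traffic stays in fast local memory and only a single global atomic per workgroup ever touches VRAM.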
Hot Aisle's 8x AMD #MI300X server is the fastest computer I've ever tested in #FluidX3D #CFD, achieving a peak #LBM performance of 205 GLUPs/s and a combined VRAM bandwidth of 23 TB/s.
The #RTX 5090 looks like a toy in comparison.
MI300X beats even Nvidia's GH200 94GB. This marks a fascinating inflection point in #GPGPU: #CUDA is no longer the performance leader.
You need a cross-vendor language like #OpenCL to leverage its power.
FluidX3D on #GitHub: https://github.com/ProjectPhysX/FluidX3D
3.3.2025 22:19

Mr. and Mrs. Woodpecker were both at the log of snacks today
1.3.2025 18:25

I added hardware dp4a support in #OpenCL for Intel and AMD #GPUs as well.
6 (!) spec/driver bugs needed workarounds (a version-check sketch follows the list):
- CL_DEVICE_OPENCL_C_VERSION is unreliable: it reports 1.2 if 3.0 is supported but 2.X is not
- CL_DEVICE_OPENCL_C_ALL_VERSIONS is broken on AMD
- CL_DEVICE_INTEGER_DOT_PRODUCT.. triggers undefined behavior on old Intel drivers
- the dp4a feature macro is only defined for -cl-std=CL3.0, and is always falsely set on old Intel drivers
- dot_acc_sat(a,b,c) on Intel gets translated to slow add_sat(dot(a,b),c); it must be c+dot(a,b)
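To illustrate the first workaround, here's a hedged host-side sketch - a blunt heuristic, not my exact detection code: parse CL_DEVICE_OPENCL_C_VERSION, but cross-check it against CL_DEVICE_VERSION and take the larger of the two:

#include <CL/cl.h>
#include <stdio.h>

// returns the usable OpenCL C version as a float, e.g. 3.0f
float opencl_c_version(cl_device_id device) {
	char s[1024] = "";
	clGetDeviceInfo(device, CL_DEVICE_OPENCL_C_VERSION, sizeof(s), s, NULL); // e.g. "OpenCL C 1.2" (may underreport, see bug above)
	float c_version = 0.0f;
	sscanf(s, "OpenCL C %f", &c_version);
	clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(s), s, NULL); // e.g. "OpenCL 3.0 ..."
	float device_version = 0.0f;
	sscanf(s, "OpenCL %f", &device_version);
	return device_version>c_version ? device_version : c_version; // workaround for the 1.2-instead-of-3.0 report
}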
I'm doing a podcast about #FluidX3D today with Improbable Matter, going live in 30 minutes!
https://youtu.be/csGLVZqr0SE
The 4x #Nvidia #H100 SXM5 server in the new Festus cluster at Uni Bayreuth is the fastest system I've ever tested in #FluidX3D #CFD, achieving 78 GLUPs/s #LBM performance at ~1650W #GPU power draw.
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#multi-gpu-benchmarks
https://www.hpc.uni-bayreuth.de/clusters/festus/#__tabbed_1_3
My OpenCL-Benchmark now uses the dp4a instruction on supported hardware (#Nvidia Pascal, #Intel #Arc, #AMD RDNA, or newer) to benchmark INT8 throughput.
dp4a is not exposed in #OpenCL C, but it can still be used via inline PTX assembly and compiler pattern recognition. Even Nvidia's compiler will turn the emulation implementation into dp4a, but in some cases it does so with a bunch of unnecessary shifts/permutations on the inputs, so it's better to use inline PTX directly (sketch below the link).
https://github.com/ProjectPhysX/OpenCL-Benchmark/releases/tag/v1.8
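Roughly, the dual-path pattern looks like this hedged sketch - not the exact OpenCL-Benchmark code, and USE_NV_PTX is a hypothetical define set from the host when the device is Nvidia:

// packed INT8 dot product with INT32 accumulator: returns dot(a,b)+c
inline int dp4a(const int a, const int b, const int c) {
#ifdef USE_NV_PTX // hypothetical host-side define for Nvidia devices
	int d;
	asm volatile("dp4a.s32.s32 %0, %1, %2, %3;" : "=r"(d) : "r"(a), "r"(b), "r"(c));
	return d;
#else // emulation; compilers pattern-match this into a hardware dp4a where one exists
	const char4 a4 = as_char4(a), b4 = as_char4(b);
	return c + a4.x*b4.x + a4.y*b4.y + a4.z*b4.z + a4.w*b4.w; // keep c in front for Intel (see the driver-bug list above)
#endif
}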
#FluidX3D #CFD v3.1 is out! I have updated the #OpenCL headers for better device-specs detection via device ID and #Nvidia compute capability, fixed broken voxelization on some #GPUs, and added a workaround for a CPU compiler bug that corrupted rendering. Also, #AMD GPUs will now show up with their correct name (no idea why AMD can't report it as CL_DEVICE_NAME like every other sane vendor and instead needs the CL_DEVICE_BOARD_NAME_AMD extension...). A sketch of that name workaround is below.
Have fun!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.1
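The name workaround boils down to something like this hedged host-side sketch (assuming the cl_amd_device_attribute_query extension header; not the exact FluidX3D code):

#include <CL/cl.h>
#include <CL/cl_ext.h> // defines CL_DEVICE_BOARD_NAME_AMD (cl_amd_device_attribute_query)
#include <string.h>

// query the marketing name on AMD, fall back to CL_DEVICE_NAME everywhere else
void device_name(cl_device_id device, char* name, const size_t size) {
	name[0] = '\0';
	const cl_int err = clGetDeviceInfo(device, CL_DEVICE_BOARD_NAME_AMD, size, name, NULL);
	if(err!=CL_SUCCESS||strlen(name)==0) clGetDeviceInfo(device, CL_DEVICE_NAME, size, name, NULL); // non-AMD device or extension missing
}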
German streets are littered with election posters, and none of the parties speak to me, as they all only represent pensioners. But this one poster caught my eye: this party is for the '90s generation; I immediately feel heard and represented. Instead of dumb election promises, the poster even invites you to a local information event. Shall I go there?
/s
Uuhhh, my university has its new #HPC cluster online, with a couple of dual Intel Xeon Platinum 8480+ nodes with 2TB RAM and quad #GPUs each. Guess who still has access!
#OpenCL specs: https://opencl.gpuinfo.org/displayreport.php?id=4731
Cluster website: https://www.hpc.uni-bayreuth.de/clusters/festus/#compute-nodes
RTX 5090 performance numbers for #FluidX3D are in - thanks to @phoronix! And I finally found a way to format the performance chart on the #FluidX3D #GitHub page a bit better, especially with a larger font size. Hacking the mermaid gantt chart is currently the only way to embed a compact bar chart directly into markdown without an extra image file.
The mermaid language is still horrific - inconsistent, and half the styling commands don't even work. There's no way yet to color the bars blue.
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#single-gpucpu-benchmarks
The woodpecker is at it again!
25.1.2025 14:13

Woodpecker at the bell of snacks
18.1.2025 16:33

It's 2025 and LibreOffice is still garbage
Microsoft Office 365 is garbage too, except it also comes with an expensive subscription fee. I guess I'll do presentations with an overhead projector then...
3 different #GPUs, 1 #CFD simulation - #FluidX3D "SLI"-ing Intel A770 + Intel B580 + Nvidia Titan Xp for 678 million grid cells in 36GB of combined VRAM
https://www.youtube.com/watch?v=9VP3fruwnXc
Finally 2¹² ⭐ for #FluidX3D on #GitHub!
18.12.2024 19:22

Did you know? #Battlemage / #Intel Arc B580 adds support for (a little bit of) #FP64, with an FP64:FP32 ratio of 1:16.
This helps a lot with application compatibility - FP64 support was absent on Arc Alchemist. (A runtime detection sketch follows below the links.)
https://github.com/ProjectPhysX/OpenCL-Benchmark
#Intel Arc B580 #OpenCL specs:
- Windows: https://opencl.gpuinfo.org/displayreport.php?id=4564
- Linux: https://opencl.gpuinfo.org/displayreport.php?id=4562
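For apps that want to check this at runtime, here's a minimal hedged sketch (standard OpenCL host API, nothing B580-specific):

#include <CL/cl.h>
#include <stdbool.h>

// FP64 is supported iff CL_DEVICE_DOUBLE_FP_CONFIG is non-zero
// (equivalent to the cl_khr_fp64 extension being listed)
bool supports_fp64(cl_device_id device) {
	cl_device_fp_config fp64_config = 0;
	clGetDeviceInfo(device, CL_DEVICE_DOUBLE_FP_CONFIG, sizeof(fp64_config), &fp64_config, NULL);
	return fp64_config!=0;
}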
Dual #Intel Arc B580 go brrrr #OpenCL #Battlemage
13.12.2024 18:22