ProjectPhysX - Network

Here's my #OpenCL implementation: https://github.com/ProjectPhysX/FluidX3D/blob/master/src/kernel.cpp#L1924-L1993
9.3.2025 18:56
https://mast.hpc.social/@Project...

#FluidX3D #CFD v3.2 is out! I've implemented the much-requested #GPU summation for object force/torque; it's ~20x faster than #CPU. 🖖😋
Horizontal sum in #OpenCL was a nice exercise - first a local memory reduction, then a hardware-supported atomic floating-point add in VRAM, all in a single-stage kernel. Hammering atomics isn't too bad, as each of the ~10-340 workgroups dispatched at a time does only a single atomic add.
Also improved volumetric !
github.com/ProjectPhysX/FluidX

9.3.2025 17:56
https://mast.hpc.social/@Project...
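
The single-stage reduction described in the v3.2 post can be modeled on the CPU to show why the atomics stay cheap: each workgroup first reduces its values in local memory, then issues exactly one global atomic add. This is a hypothetical Python sketch, not the FluidX3D kernel; `workgroup_tree_reduce` and `gpu_style_sum` are made-up names, workgroups run sequentially, barriers are implied by the loop structure, and the global atomic is modeled as an ordinary `+=` with a counter.

```python
def workgroup_tree_reduce(values):
    """Tree reduction as done in local memory by one workgroup.

    Assumes len(values) is a power of two (typical for workgroup sizes).
    """
    buf = list(values)  # stands in for a __local float array
    stride = len(buf) // 2
    while stride > 0:
        # In a kernel, work-items i < stride do this step, then hit a barrier.
        for i in range(stride):
            buf[i] += buf[i + stride]
        stride //= 2
    return buf[0]


def gpu_style_sum(data, workgroup_size=64):
    """Sum data: one local reduction per workgroup, then one global atomic add each."""
    global_sum = 0.0  # stands in for the accumulator in VRAM
    atomic_adds = 0   # number of global atomic operations issued
    for start in range(0, len(data), workgroup_size):
        chunk = list(data[start:start + workgroup_size])
        # Out-of-range work-items in the last workgroup contribute zero.
        chunk += [0.0] * (workgroup_size - len(chunk))
        global_sum += workgroup_tree_reduce(chunk)  # atomic_add in the real kernel
        atomic_adds += 1
    return global_sum, atomic_adds


total, n_atomics = gpu_style_sum([1.0] * 1000, workgroup_size=64)
print(total, n_atomics)  # 1000.0 16
```

The number of global atomics equals the number of workgroups, not the number of elements. Because the order in which workgroups finish varies between runs, accumulating floats with atomics is generally not bit-reproducible: the rounding of the final sum can differ slightly from run to run.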

Hot Aisle's 8x AMD #MI300X server is the fastest computer I've ever tested in #FluidX3D #CFD, achieving a peak #LBM performance of 205 GLUPs/s and a combined VRAM bandwidth of 23 TB/s. 🖖🤯
The RTX 5090 looks like a toy in comparison.

MI300X beats even Nvidia's GH200 94GB. This marks a very fascinating inflection point in : is not the performance leader anymore. 🖖😛
You need a cross-vendor language like #OpenCL to leverage its power.

FluidX3D on #GitHub: github.com/ProjectPhysX/FluidX

3.3.2025 22:19
https://mast.hpc.social/@Project...
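
For a rough feel of what the two peak figures mean together, dividing bandwidth by lattice updates per second gives the average VRAM traffic per cell update implied by the post, and dividing by 8 gives the per-GPU share. This is plain arithmetic on the quoted values, assuming both refer to the same run; it says nothing definitive about the storage format used, and the helper name is hypothetical.

```python
def implied_bytes_per_update(bandwidth_bytes_per_s, lups_per_s):
    """Average bytes of memory traffic per lattice update implied by two measurements."""
    return bandwidth_bytes_per_s / lups_per_s


# Values quoted in the post for 8x MI300X.
peak_lups = 205e9    # 205 GLUPs/s
combined_bw = 23e12  # 23 TB/s
num_gpus = 8

print(f"{implied_bytes_per_update(combined_bw, peak_lups):.1f} bytes per cell update")  # 112.2
print(f"{peak_lups / num_gpus / 1e9:.1f} GLUPs/s per GPU")  # 25.6
```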

Mr. and Mrs. Woodpecker were both at the log of snacks today 🖖😍

1.3.2025 18:25
https://mast.hpc.social/@Project...

I added hardware dp4a support in #OpenCL also for Intel/AMD #GPUs.
6 (!) spec/driver bugs needed workarounds:
- CL_DEVICE_OPENCL_C_VERSION is unreliable: it reports 1.2 if 3.0 is supported but 2.X is not
- CL_DEVICE_OPENCL_C_ALL_VERSIONS is broken for AMD
- CL_DEVICE_INTEGER_DOT_PRODUCT.. causes undefined behavior on old Intel drivers
- the dp4a feature macro is only supported for -cl-std=CL3.0, and is always falsely set on old Intel drivers
- on Intel, dot_acc_sat(a,b,c) gets translated to the slow add_sat(dot(a,b),c), so it must be written as c+dot(a,b)

1.3.2025 08:42
https://mast.hpc.social/@Project...
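
A small integer model of the last workaround: dp4a multiplies four packed 8-bit lanes pairwise, sums the products, and adds a 32-bit accumulator. The saturating form clamps on overflow while the plain form wraps, so the two agree as long as the result stays inside the int32 range. This hedged sketch assumes signed int8 lanes and uses hypothetical helper names; it models the arithmetic only, not what any particular compiler emits.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def unpack_int8x4(word):
    """Split a 32-bit word into four signed 8-bit lanes (lane 0 = lowest byte)."""
    lanes = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def dot4(a, b):
    """Signed int8x4 dot product, the core of dp4a."""
    return sum(x * y for x, y in zip(unpack_int8x4(a), unpack_int8x4(b)))


def acc_saturating(a, b, c):
    """add_sat(dot(a,b), c): clamp the 32-bit result instead of wrapping."""
    return max(INT32_MIN, min(INT32_MAX, dot4(a, b) + c))


def acc_plain(a, b, c):
    """c + dot(a,b) with ordinary two's-complement 32-bit wrap-around."""
    return (c + dot4(a, b) + 2**31) % 2**32 - 2**31


# No overflow: both forms agree.
print(acc_saturating(0x01020304, 0x01010101, 5), acc_plain(0x01020304, 0x01010101, 5))  # 15 15
# Accumulator already at INT32_MAX: saturating clamps, plain wraps to a negative value.
print(acc_saturating(0x7F7F7F7F, 0x7F7F7F7F, INT32_MAX), acc_plain(0x7F7F7F7F, 0x7F7F7F7F, INT32_MAX))
```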

I'm doing a podcast about #FluidX3D today with Improbable Matter, going live in 30 minutes! 🖖🤠
https://youtu.be/csGLVZqr0SE

23.2.2025 13:28
https://mast.hpc.social/@Project...

The 4x #Nvidia #H100 SXM5 server in the new Festus cluster at Uni Bayreuth is the fastest system I've ever tested in #FluidX3D #CFD, achieving 78 GLUPs/s performance at ~1650W power draw. 🖖😋🖥️🔥
github.com/ProjectPhysX/FluidX
hpc.uni-bayreuth.de/clusters/f

23.2.2025 08:48
https://mast.hpc.social/@Project...
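
The two numbers in this post translate directly into energy efficiency. A quick calculation using only the quoted 78 GLUPs/s and ~1650 W, so the result is only as precise as the power reading; the helper name is hypothetical.

```python
def efficiency(lups_per_s, watts):
    """Return (MLUPs/s per watt, nanojoules per lattice update)."""
    mlups_per_watt = lups_per_s / watts / 1e6
    nj_per_update = watts / lups_per_s * 1e9
    return mlups_per_watt, nj_per_update


# Values quoted in the post: 4x H100 SXM5, 78 GLUPs/s at ~1650 W.
mlups_w, nj = efficiency(78e9, 1650.0)
print(f"{mlups_w:.1f} MLUPs/s per W, {nj:.1f} nJ per cell update")  # 47.3 MLUPs/s per W, 21.2 nJ per cell update
```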

My OpenCL-Benchmark now uses the dp4a instruction on supported hardware (#Nvidia Pascal, #Intel #Arc, #AMD RDNA, or newer) to benchmark INT8 throughput.
dp4a is not exposed in #OpenCL C, but can still be used via inline PTX assembly and compiler pattern recognition. Even Nvidia's compiler will turn the emulation implementation into dp4a, but in some cases it does so with a bunch of unnecessary shifts/permutations on the inputs, so it's better to use inline PTX directly. 🖖🧐
github.com/ProjectPhysX/OpenCL

22.2.2025 20:27
https://mast.hpc.social/@Project...
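
Turning dp4a instruction counts into an INT8 throughput figure needs an operation-counting convention. A hedged sketch, assuming each dp4a counts as four multiply-adds (8 ops), analogous to counting an FMA as 2 FLOPs; this is not necessarily how OpenCL-Benchmark scores its results, and the timing values below are hypothetical.

```python
def int8_tops(dp4a_per_work_item, work_items, seconds, ops_per_dp4a=8):
    """INT8 throughput in TOPS.

    Assumes each dp4a counts as 4 multiply-adds = 8 integer operations,
    the same way an FMA counts as 2 FLOPs.
    """
    total_ops = dp4a_per_work_item * work_items * ops_per_dp4a
    return total_ops / seconds / 1e12


# Hypothetical timing: 1024 dp4a per work-item, 2**20 work-items, 1 ms kernel runtime.
print(f"{int8_tops(1024, 2**20, 1e-3):.2f} TOPS")  # 8.59 TOPS
```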

#FluidX3D #CFD v3.1 is out! I have updated the #OpenCL headers for better device specs detection via device ID and #Nvidia compute capability, fixed broken voxelization on some #GPUs, and added a workaround for a CPU compiler bug that corrupted rendering. Also, #AMD GPUs will now show up with their correct name (no idea why AMD can't report it as CL_DEVICE_NAME like every other sane vendor and instead needs the CL_DEVICE_BOARD_NAME_AMD extension...)
Have fun! 🖖😉
github.com/ProjectPhysX/FluidX

8.2.2025 14:03
https://mast.hpc.social/@Project...
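
As background on why compute capability helps with specs detection: OpenCL reports the number of compute units (SMs on Nvidia) and a clock frequency, but not the FP32 cores per SM, which depend on the architecture. A partial sketch of such a lookup, built from publicly documented Nvidia architectures; the table and function names are hypothetical and not the FluidX3D implementation, and the clock passed in is whichever value (base or boost) you choose to trust.

```python
# Partial, hypothetical table: FP32 cores per SM by (major, minor) compute capability.
CORES_PER_SM = {
    (3, 0): 192, (3, 5): 192, (3, 7): 192,              # Kepler
    (5, 0): 128, (5, 2): 128, (5, 3): 128,              # Maxwell
    (6, 0): 64, (6, 1): 128, (6, 2): 128,               # Pascal
    (7, 0): 64, (7, 2): 64, (7, 5): 64,                 # Volta, Turing
    (8, 0): 64, (8, 6): 128, (8, 7): 128, (8, 9): 128,  # Ampere, Ada
    (9, 0): 128,                                        # Hopper
}


def fp32_cores(major, minor, compute_units):
    """Total FP32 cores = cores per SM (from the table) x number of SMs."""
    cores = CORES_PER_SM.get((major, minor))
    if cores is None:
        raise ValueError(f"compute capability {major}.{minor} not in table")
    return cores * compute_units


def peak_fp32_tflops(major, minor, compute_units, clock_mhz):
    """Peak FP32 TFLOPs/s, counting one FMA per core per clock as 2 FLOPs."""
    return fp32_cores(major, minor, compute_units) * 2 * clock_mhz * 1e6 / 1e12


# Example: Titan Xp (compute capability 6.1, 30 SMs, 1582 MHz boost clock).
print(fp32_cores(6, 1, 30), f"{peak_fp32_tflops(6, 1, 30, 1582):.2f}")  # 3840 12.15
```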

German streets are littered with election posters, and none of the parties speak to me as they all only represent pensioners. But this one poster caught my eye: this party is for the 90's generation; I immediately feel heard and represented. Instead of dumb election promises, the poster even invites people to a local information event. Shall I go there? 🖖🧐🗳️
/s

7.2.2025 20:28
https://mast.hpc.social/@Project...

Uuhhh my university has the new #HPC cluster online, with a couple of dual Intel Xeon Platinum 8480+ nodes with 2TB RAM and quad #GPUs each. Guess who still has access! 🖖😋
specs: opencl.gpuinfo.org/displayrepo
Cluster website: hpc.uni-bayreuth.de/clusters/f

3.2.2025 21:32
https://mast.hpc.social/@Project...

RTX 5090 performance numbers for #FluidX3D are in - thanks to @phoronix! And I finally found a way to format the performance chart on the page a bit better, especially with a larger font size. Hacking the mermaid gantt chart is currently the only way to embed a compact bar chart directly into markdown without an extra image file.
The mermaid language is still horrific: inconsistent, and half the styling commands don't even work. No way yet to color bars blue.
github.com/ProjectPhysX/FluidX

26.1.2025 16:10
https://mast.hpc.social/@Project...
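
The gantt hack works by abusing a numeric time axis: every bar is a task that starts at 0 and ends at the value to be plotted. A hedged Python sketch that generates such a chart from (label, value) pairs; the function name and data are made up, and the `dateFormat X` / `axisFormat %s` settings and the `crit` task tag are assumptions about mermaid's gantt syntax that may need adjusting for a given renderer. The printed source goes inside a mermaid code block in the markdown file.

```python
def mermaid_bar_chart(title, bars, tag="crit"):
    """Return mermaid gantt source that renders as a horizontal bar chart.

    Each (label, value) pair becomes its own section with one task spanning
    0..value on a numeric axis: dateFormat X parses plain numbers as Unix
    timestamps, and axisFormat %s prints them back as plain numbers.
    """
    lines = ["gantt", f"title {title}", "dateFormat X", "axisFormat %s"]
    for label, value in bars:
        lines.append(f"section {label}")
        lines.append(f"{value} :{tag}, 0, {value}")  # task named after its own value
    return "\n".join(lines)


# Made-up example data, for illustration only.
print(mermaid_bar_chart("Performance [MLUPs/s]", [("GPU A", 5000), ("GPU B", 12000)]))
```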

The woodpecker is at it again! 🖖😋

25.1.2025 14:13
https://mast.hpc.social/@Project...

Woodpecker at the bell of snacks 🖖😋

18.1.2025 16:33
https://mast.hpc.social/@Project...

It's 2025 and LibreOffice is still garbage 🖖😬
Microsoft Office 365 is garbage too, except with an expensive subscription fee. I guess I'll do presentations with an overhead projector then...

6.1.2025 11:34
https://mast.hpc.social/@Project...

3 different #GPUs, 1 #CFD simulation - #FluidX3D "SLI"-ing (Intel A770 + Intel B580 + Nvidia Titan Xp) for 678 million grid cells in 36GB combined VRAM
youtube.com/watch?v=9VP3fruwnX

23.12.2024 21:21
https://mast.hpc.social/@Project...

Finally 2¹² ⭐ for #FluidX3D on #GitHub! 🖖🤓

18.12.2024 19:22
https://mast.hpc.social/@Project...

Did you know? #Battlemage / #Intel Arc B580 adds support for (a little bit of) #FP64, with an FP64:FP32 ratio of 1:16. 🖖🧐
This helps a lot with application compatibility. FP64 support was absent on Arc Alchemist.
github.com/ProjectPhysX/OpenCL

16.12.2024 21:21
https://mast.hpc.social/@Project...

#Intel Arc B580 #OpenCL specs:
- Windows: https://opencl.gpuinfo.org/displayreport.php?id=4564
- Linux: opencl.gpuinfo.org/displayrepo

13.12.2024 19:12
https://mast.hpc.social/@Project...

Dual #Intel Arc B580 go brrrr 🖖😋 #OpenCL #Battlemage

13.12.2024 18:22
https://mast.hpc.social/@Project...