I have found a way to add the appropriate metadata to my output rasters (for the curious, you need to add several metadata items in the GEOLOCATION domain, see <https://gdal.org/en/stable/development/rfc/rfc4_geolocate.html> for the details about which ones).
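For reference, here is a minimal sketch of what this looks like through the GDAL C API; the file names, band numbers, offsets and steps are made-up placeholders, not the values from my actual pipeline.

```cpp
// Sketch (placeholder values): attach GEOLOCATION-domain metadata to a raster,
// pointing at a sidecar dataset that holds the longitude/latitude arrays.
#include "gdal.h"
#include "cpl_string.h"
#include "ogr_srs_api.h"

int main()
{
    GDALAllRegister();

    // raster to be georeferenced, opened in update mode
    GDALDatasetH ds = GDALOpen("output.tif", GA_Update);

    // WKT for the CRS of the geolocation arrays (lat/lon WGS84)
    OGRSpatialReferenceH srs = OSRNewSpatialReference(nullptr);
    OSRSetWellKnownGeogCS(srs, "WGS84");
    char *wkt = nullptr;
    OSRExportToWkt(srs, &wkt);

    // GEOLOCATION items per RFC 4: which dataset/bands hold the X (lon)
    // and Y (lat) arrays, and how they map onto the raster grid
    char **md = nullptr;
    md = CSLSetNameValue(md, "SRS", wkt);
    md = CSLSetNameValue(md, "X_DATASET", "geoloc.tif"); // placeholder name
    md = CSLSetNameValue(md, "X_BAND", "1");
    md = CSLSetNameValue(md, "Y_DATASET", "geoloc.tif");
    md = CSLSetNameValue(md, "Y_BAND", "2");
    md = CSLSetNameValue(md, "PIXEL_OFFSET", "0");
    md = CSLSetNameValue(md, "LINE_OFFSET", "0");
    md = CSLSetNameValue(md, "PIXEL_STEP", "1"); // geoloc arrays at full raster resolution
    md = CSLSetNameValue(md, "LINE_STEP", "1");

    GDALSetMetadata(ds, md, "GEOLOCATION");

    CSLDestroy(md);
    CPLFree(wkt);
    OSRDestroySpatialReference(srs);
    GDALClose(ds);
    return 0;
}
```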
It works, in the sense that e.g. gdalwarp can reproject/rectify my output rasters correctly. However, QGIS still doesn't use the information, per
https://github.com/qgis/QGIS/issues/51758
15.4.2025 12:08

I have a #GDAL / #QGIS question for the #GIS community in the Fediverse #askFedi concerning the use of geolocation arrays in raster datasets, i.e. bands that hold (for example) the latitude and longitude of each cell in the raster.
I'm used to producing datasets in UTM format, for which adding the appropriate georeference metadata is simple. However, in this case I'm producing a raster dataset where the cells are distributed over a “warped” area (think aerial imagery with a non-affine transform from data matrix to geo coordinates). I have arrays holding respectively the latitude and longitude of each pixel and can add them to the dataset, and I can specify the GCRS (lat/lon WGS84), but I can't seem to find the metadata to add to the raster so that the geolocation arrays are correctly identified as such.
Online there seems to be only sparse documentation about this, generally implying that it's only supported for specific dataset types in specific formats (usually, the standard formats of common satellite data), and the only pages I've found on how to “roll your own” refer to a single old-ish mailing list contribution that recommends the use of a sidecar VRT file. Are there any more modern solutions possible?
29.3.2025 11:53

BTW, this is exactly the reason why I always challenge the students to try and come up with their own solution, even if I already know the answer (or rather: *an* answer). It's easy to get lazy on an already-answered question, while coming at the problem with a fresh mind is always an opportunity to find a different answer, and even when that answer is not necessarily better (and in this case it *was* better), it's still interesting and stimulating.
(It's also a good way to gauge the students' attention, but TBH, this is secondary for me.)
3/3
25.3.2025 08:07

Another approach, which is the one I've been using and that I've also found commonly used elsewhere, is to compute D as
D = (N + L - 1)/L
This works because integer division always rounds down, so:
1. if N is a multiple of L, we only increment the numerator by “just barely enough” to *not* go to the next multiple
2. if N is not a multiple, we will get to *at least* the next multiple, and possibly overshoot it, but not enough to go to the “next next” multiple.
Example: assume L = 8. With N = 7, 8, 9 we have N + L - 1 = 14, 15, 16, and dividing by L = 8 we have 1, 1, 2 which is exactly what we want.
But before I could propose this idea to the students, one of them came up with a different one:
D = (N - 1)/L + 1
The idea here is to “always underestimate”, even when N is a multiple of L. The way to do this is to decrease N by one, which ensures we “miss the mark” if N is a multiple, but never drop to the “previous previous” multiple if it's not.
Fun fact: the two expressions are equivalent, but in computing they fail at opposite ends.
With the expression I've been using, there's a risk of *overflow*: if N is close to the largest representable integer of the data type, N + L - 1 can overflow and wrap around to either a very small value or a negative value, depending on the signedness of the data type.
The formula proposed by my student, OTOH, only fails if N = 0 (due to the underflow of N - 1). And honestly, that's actually much better (and easier to check for).
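Here's a small sketch (a hypothetical test harness, not something from the class) showing both versions side by side and where each one breaks, assuming unsigned arithmetic:

```cpp
// Compare the two round-up divisions and their failure modes (unsigned ints).
#include <cstdio>
#include <limits>

// my version: can overflow when N is close to the maximum representable value
unsigned div_up_a(unsigned N, unsigned L) { return (N + L - 1)/L; }

// the student's version: only fails for N = 0 (N - 1 wraps around)
unsigned div_up_b(unsigned N, unsigned L) { return (N - 1)/L + 1; }

int main()
{
    const unsigned L = 8;

    // normal range: both agree (1, 1, 2 for N = 7, 8, 9)
    for (unsigned N = 7; N <= 9; ++N)
        std::printf("N=%u  a=%u  b=%u\n", N, div_up_a(N, L), div_up_b(N, L));

    // near the top of the range: N + L - 1 wraps around, so a is wrong, b is fine
    const unsigned big = std::numeric_limits<unsigned>::max() - 2;
    std::printf("N=%u  a=%u  b=%u\n", big, div_up_a(big, L), div_up_b(big, L));

    // N = 0: N - 1 wraps around, so b is wrong (a correctly gives 0)
    std::printf("N=0  a=%u  b=%u\n", div_up_a(0, L), div_up_b(0, L));
}
```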
2/n
25.3.2025 07:49

I'm liking the class this year. Students are attentive and participating, and the discussion is always productive.
We were discussing the rounding up of the launch grid in #OpenCL, to avoid the catastrophic performance drops that come from the inability to divide the “actual” work size by anything smaller than the maximum device local work size, and how to compute the “rounded up” work size.
The idea is this: given the work size N and the local size L, we have to round N up to the smallest multiple of L that is not smaller than N. This effectively means computing D = ceil(N/L) and then using D*L.
There are several ways to compute D, but on a computer, working only with integers and knowing that integer division always rounds down, what is the “best way”?
D = N/L + 1 works well if N is not a multiple of L, but gives us 1 more than the intended result if N *is* a multiple of L. So we want to add the extra 1 only if N is not a multiple. This can be achieved for example with
D = N/L + !!(N % L)
which leverages the fact that !! (double logical negation) turns any non-zero value into 1, leaving zero as zero. So we round *down* (which is what the integer division does) and then add 1 if (and only if) there is a remainder from the division.
This is ugly not so much because of the !!, but because the modulus operation % is slow.
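In code, that version looks something like this minimal sketch (the helper name div_up is mine, just for illustration):

```cpp
// Round-up division via the modulo check discussed above.
#include <cstdio>

unsigned div_up(unsigned N, unsigned L)
{
    // round down (integer division), then add 1 only if there was a remainder
    return N/L + !!(N % L);
}

int main()
{
    // the rounded-up launch grid is then D*L work-items
    const unsigned L = 8;
    for (unsigned N = 7; N <= 9; ++N) {
        const unsigned D = div_up(N, L);
        std::printf("N=%u -> D=%u, launch %u work-items\n", N, D, D*L);
    }
}
```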
1/n
25.3.2025 07:34

The big question remains: did we let ourselves (OK, that was mainly *my*self) get carried away by the “power of abstraction”? I'll confess that some of the choices I've made in the past about how to abstract things were driven by a “vision of the future” rather than by immediate needs.
Still, maybe I was just lucky, or maybe after years of working on the codebase my perception of what the future could become was sufficiently in focus. I can't say that any of the work done at the time was really wasted: it has all turned out to be useful in the end, and in fact it was often insufficient (i.e. it required some additional abstractions to smooth things out better). But it all started from something that was already well focused and working.
And this is the impression I get from the comments on that famous thread I mentioned earlier: do not get carried away by possibilities when developing your own game engine, and do let the actual game you intend to develop be your guide.
But, I'll add, dare yourself to dream bigger when that's done.
12/12
10.3.2025 22:42

Most if not all of the tricks we adopted are well known to expert C++ programmers. But they aren't as easily accessible to less expert developers, and even less so to researchers without a strong software development background.
This makes the codebase less accessible to the “casual” developer or contributor. And this, IMO, is a *huge* negative.
In my experience, everybody can be taught how to add some new feature to GPUSPH, and again IME once onboarded their work is massively simplified. But the onboarding itself takes time, which we would like to reduce.
(Some of that could be achieved if we could raise our C++ requirements to C++17, since that would allow us to simplify quite a few metatemplate usages; but we support older versions of CUDA that can hardly push beyond C++11. Hopefully after we release GPUSPH 6.0 we can drop those requirements and move towards a simplification.)
11/
10.3.2025 22:35

By Tesler's law of conservation of complexity
https://en.wikipedia.org/wiki/Law_of_conservation_of_complexity
there's a lower bound on how much you can reduce complexity. Beyond that, you're only moving complexity from one aspect to another.
In the case of #GPUSPH, this has materialized in the fact that the exponential complexity of variant support has been converted into what is largely a *linear* complexity of interaction functions. You can find an example in my #SPHERIC2019 presentation:
https://www.gpusph.org/presentations/spheric/2019/bilotta-spheric2019/#9.0
Those slides (if you want you can start at the beginning here <https://www.gpusph.org/presentations/spheric/2019/bilotta-spheric2019/>) also give you an idea of what happens to the code. And probably also give you a hint about what the issue is.
10/
10.3.2025 22:29

How can we support such an extensive range of options? The short answer is C++ template metaprogramming, and you can find some details in our preprint
https://doi.org/10.48550/arXiv.2207.11328
(part of it has been published as https://doi.org/10.1016/j.advengsoft.2024.103711, but you'll want to refer to the preprint for some of the juicy details that have been removed from the published paper and will be prepared for a separate publication.)
The long answer is that a lot of effort has gone into designing an abstraction that allows us to assemble the full particle-particle interaction from small composable functions, and the details of the specific variant are delegated as far down the stack as possible, in order to preserve genericity at the higher level.
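To give a flavor of the general idea, here's a toy illustration of this kind of compile-time composition; it's not GPUSPH code, and all the names and formulas below are made up.

```cpp
// Toy example of composing a particle-particle interaction from small
// policies selected at compile time; names and formulas are invented.
#include <cstdio>

struct DemoKernel {
    // hypothetical smoothing-kernel contribution
    static float weight(float r, float h) { return r < h ? 1.f - r/h : 0.f; }
};

struct NoViscosity {
    // variant without a viscous term: trivially optimized away
    static float term(float, float) { return 0.f; }
};

struct DemoViscosity {
    // variant with a (made-up) viscous term
    static float term(float r, float h) { return 0.1f*(h - r); }
};

// The generic interaction is assembled from the chosen policies; adding a
// new formulation means writing a new small policy, not a new monolithic kernel.
template<typename Kernel, typename Viscosity>
float interact(float r, float h)
{
    return Kernel::weight(r, h) + Viscosity::term(r, h);
}

int main()
{
    // each combination of template arguments is one "variant"
    std::printf("%g\n", interact<DemoKernel, NoViscosity>(0.5f, 1.f));
    std::printf("%g\n", interact<DemoKernel, DemoViscosity>(0.5f, 1.f));
}
```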
9/
10.3.2025 22:20

I'm not going to claim that we found the perfect balance in #GPUSPH, but one thing I can say is that I often find myself thanking my past self for insisting on pushing for this or that abstraction over more ad hoc solutions, because it has made a lot of later development easier *and* more robust.
AFAIK, our software is the SPH implementation that supports the widest range of SPH formulations. Last time I tried counting how many variants were theoretically possible with all option combinations, it was somewhere in the neighborhood of 9 *billion*, taking into account the combinatorial explosion of numerical and physical modeling options. Even if not all of them are actually supported in the code, the real number is still huge, and it's the main reason why we switched from trying to compile all of them (and then letting the user choose whatever they wanted at runtime) to forcing the user to make some compile-time choices when defining a test case.
8/
10.3.2025 22:14

The second point, if you remember what I wrote in the first post of this thread <https://fediscience.org/@giuseppebilotta/114140164838336955>, is about the handling of multiple formulations, more complex physics and so on.
This is actually a place where I'm more cautious about embracing the preference expressed in the original thread and the comments, particularly the concerns about abstraction.
#GPUSPH has been in development for over a decade now, and it has grown organically from something *very* hard coded to something with a lot of abstraction. Finding the right balance between the two is really not that simple. There are three issues at hand:
1. how much abstraction is actually needed?
2. what is the cost of introducing an abstraction when it *becomes* needed?
3. what is the cost of introducing the *wrong* abstraction?
Writing generic code without a specific scope tends to scatter your efforts, and can lead to extreme(ly unnecessary) complexity.
OTOH, *not* writing generic code leads to a lot of repetition, which can make composability and bugfixing harder (did you fix that same bug in all occurrences of the code?)
7/
10.3.2025 22:07

Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <https://doi.org/10.1002/cpe.8313>
This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <https://doi.org/10.1016/j.jcp.2022.111413>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and it's tightly integrated with the rest of it (including its support for multi-GPU), and refactoring it into a library like cuBLAS is
a. too much effort
b. probably not worth it.
Again, following @eniko's original thread, it's really not that hard to roll your own, and probably less time consuming than trying to wrangle your way through an API that may or may not fit your needs.
6/
10.3.2025 21:59

I believe our approach to “develop what we need when we need it”, which has been a staple in the development of #GPUSPH, has been a strong point. We *do* have a few external dependencies, but most of the code has been developed “in-house”.
Fun fact: the only “hard dependency” for GPUSPH is NVIDIA's Thrust library, which we depend on for particle sorting (and in a few other places, but for optional features, like the segmented reductions used to collect body forces when doing #FSI i.e. #FluidStructureInteraction aka #FluidSolidInteraction).
And this hard dependency has been a *pain*.
We first had issues in the Maxwell architecture era, which stalled our work for months because all simulations would consistently lead to a hardware lock-up
https://github.com/NVIDIA/thrust/issues/742
and a few years later we had another issue —luckily one we could work around within GPUSPH this time:
https://github.com/NVIDIA/thrust/issues/1341
5/
10.3.2025 21:48

We *could* rely on some external CSG library, which would possibly be more powerful than what we have, but then we'd have to make it work with the rest of the system (the particle discretization thing in particular). Or we could have focused on the CSG part (including the particle discretization thing), but that would have distracted us from the core of GPUSPH (i.e. the SPH on GPU, which doesn't really care *how* particles were generated, just that whatever generated them included all the properties needed for SPH). We might not even have a GPUSPH yet if we had focused on the CSG, and most definitely not one as rich as the one we do have.
(That being said, hey, if anyone wants to help us improve the CSG in GPUSPH, feel free to contact me and we can discuss details, I'll most definitely not drive you away! ;-))
4/
10.3.2025 21:31

And of course, if possible, you want to let the user choose arbitrary inter-particle spacings, possibly at runtime. This is e.g. the reason why #GPUSPH has a built-in simple #CSG (#ConstructiveSolidGeometry) system: it's not needed for #SPH, but it allows users to set up test cases even with relatively complex geometries without resorting to external preprocessing stages that wouldn't give the same flexibility in terms of resolution choices.
And here's the thing: the CSG in GPUSPH has a *lot* of room for improvement, but it's not really the “core” of GPUSPH: how much developer time should be dedicated to it, especially considering that the number of developers is small, and most have more competence in the mathematical and physical aspects of SPH than in CSG?
The end result is that things get implemented “as needed”, which brings me back to the gamedev thread I was mentioning earlier:
3/
10.3.2025 21:27

Point 1. (non-trivial geometries) is one of the reasons why most CFD systems have a separate, heavy-duty preprocessing stage dedicated to helping users design said geometries and make them “accessible” to the method. The problem for us is that the vast majority of the “tools of the trade” are dedicated to mesh-based methods (especially the finite element method, FEM) and are thus not useful for particle methods such as #SPH.
There's still a lot of ongoing research on how to “optimally” fill arbitrary geometries with particles for these applications (see for example a recent work by the #SPHinXsys team on the topic <https://doi.org/10.1016/j.cpc.2023.108744>): although superficially one could think that this is a simple extension of what is used for standard meshes (which is almost true as long as one sticks to a single layer of boundary particles), things devolve pretty quickly when multiple layers are needed (which is the case for many solid boundary models in #SmoothedParticleHydrodynamics), or when a complete fill has to be achieved in a way that respects a number of equilibrium conditions that should be satisfied at the beginning of the simulation. (And remember: you might need to do both an “outer” and an “inner” fill for the same geometry, putting fluid particles on one side, and boundary particles on the other.)
2/
10.3.2025 21:21

This thread about writing games vs writing game engines <https://peoplemaking.games/@eniko/114137694102639793> by @eniko, and the comments within by many other people, is a fascinating read for me, particularly in relation to the similarities and differences with our experience in the development of what is, for all intents and purposes, a #CFD engine, as well as of many of the test cases it has been used for.
For many #ComputationalFluidDynamics methods, it's actually pretty simple to write an implementation for a “trivial” test case (straight walls, right angles if any at all, periodic boundary conditions, etc). We did that for example in a couple of hours during the MODCLIM 2016 training school https://modclim.ulpgc.es/index.php/events/8-modclim-events
Things quickly become non-trivial as soon as you start needing
1. non-trivial geometries
and
2. more complex physics and/or more sophisticated methods.
1/
10.3.2025 21:04

#askFedi quick question for anyone who has had the opportunity to buy an #NVIDIA #L40: I see that it has a power connector different from the standard one, but mine was delivered without an adapter. Is this normal? #HPC #GPU
6.3.2025 15:24

First day of the #GPGPU course at #UniCT. Class is small, but students seem curious, which gave me the opportunity to discuss in more detail some things that usually go unmentioned. Hopefully it'll hold.
Only negative side, I had to take a longer route home because the park between my house and the university was closed 8-(
3.3.2025 18:57