https://logarithmic.net/pfh-files/random/grad.mov
9.3.2025 02:03

Happy little accidents...
9.3.2025 01:52

I'm really liking this course on generative diffusion models. They seem to have boiled many years of confusing development of ideas down to a simple approach.
https://diffusion.csail.mit.edu/
8.3.2025 08:44

The optimizer tries to find the lowest energy. This closely resembles "maximum likelihood" or "maximum a posteriori" estimation in statistics. We might hope this finds the most representative estimate. Clearly, here it does not! The estimate is smoother than most samples from the distribution.
Also, the optimizer has not found the lowest energy state, which would be all-dark or all-light. It might take a very long time to reach one of these optima!
23.2.2025 04:19

Second, sampling from the distribution with a Langevin Dynamics simulation. The algorithm is almost identical to gradient descent with momentum, but we add just the right amount of noise to the momentum at each step.
23.2.2025 04:18

Comparison of optimization and sampling from a distribution defined by an energy function. I use a continuous version of the Ising model spin lattice energy.
First, optimization from a random initial state by gradient descent with momentum, using the SGD optimizer in PyTorch.
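
A rough base-R sketch of the two loops (the original used PyTorch's SGD optimizer; the energy below is my guess at a continuous Ising energy, with a double-well term keeping each spin near -1 or +1, and not necessarily the one in the video). For simplicity the sampling loop uses the momentum-free, overdamped form of Langevin dynamics, where "just the right amount of noise" is a Gaussian with sd sqrt(2 * lr) per step.

```r
set.seed(1)
n <- 64

neighbour_sum <- function(x) {
    # Sum of the four lattice neighbours, with wrap-around boundaries.
    x[c(2:n, 1), ] + x[c(n, 1:(n - 1)), ] + x[, c(2:n, 1)] + x[, c(n, 1:(n - 1))]
}

# Energy E(x) = -(sum over neighbour pairs of x_i*x_j) + sum of (x_i^2 - 1)^2
grad_E <- function(x) -neighbour_sum(x) + 4 * x * (x^2 - 1)

lr <- 0.01

# Optimization: gradient descent with momentum, from a random start.
x <- matrix(rnorm(n * n, sd = 0.1), n, n)
v <- 0
for (step in 1:5000) {
    v <- 0.9 * v - lr * grad_E(x)
    x <- x + v
}

# Sampling: Langevin dynamics. A gradient step plus Gaussian noise with
# sd sqrt(2 * lr) samples, as lr -> 0, from the distribution
# proportional to exp(-E(x)).
x <- matrix(rnorm(n * n, sd = 0.1), n, n)
for (step in 1:5000) {
    x <- x - lr * grad_E(x) + matrix(rnorm(n * n, sd = sqrt(2 * lr)), n, n)
}
```

The optimized x ends up smooth and stuck in a local optimum, while the Langevin run keeps the rough texture typical of samples from the distribution.
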
23.2.2025 04:14

A common pattern for me is a "chunked flow" (better name desperately needed).
R very much leans towards operations on complete datasets. A flow of data feels more Python-ish than R-ish. Working with chunks in R retains most of the efficiency.
In a local({ }) block I have one or more data generators producing chunks, which are processed and then saved to one or more outputs (often parquet files).
A further refinement is to farm out processing of the chunks with multiprocessing.
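
A minimal sketch of the shape of this pattern, assuming a large CSV as input and a directory of parquet files as output (the filenames and chunk size here are made up):

```r
library(arrow)

local({
    dir.create("output", showWarnings = FALSE)
    con <- file("big_input.csv", "r")
    on.exit(close(con), add = TRUE)
    header <- readLines(con, n = 1)
    i <- 0
    repeat {
        # Generator: produce the next chunk, stop when the input is exhausted.
        lines <- readLines(con, n = 100000)
        if (length(lines) == 0) break
        chunk <- read.csv(text = c(header, lines))

        # ... per-chunk processing would go here ...

        # Output: save the processed chunk.
        i <- i + 1
        write_parquet(chunk, sprintf("output/chunk-%04d.parquet", i))
    }
})
```

arrow::open_dataset("output") then treats the pile of chunk files as a single queryable dataset.
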
15.2.2025 02:05

Also, each "with_" function has a corresponding "local_" function that stacks into your current environment (a function body or local({ ... })).
This is syntactic sugar to flatten nested resource usage, much like the pipe %>% flattens nested function calls. Makes coding easier for humans.
15.2.2025 01:59

I think I'm a little bit in love with the {{withr}} package in R.
Similar to "with" in Python, you can guarantee to properly clean up when using a resource such as a connection to a file, a temporary directory, or a temporary global setting change.
15.2.2025 01:56

Some reflections on the series here.
https://logarithmic.net/pfh/blog/01735193695
26.12.2024 07:13

I've been watching the 2023 Statistical Rethinking lecture series by Richard McElreath. These cover a complete approach to statistics based on causal reasoning and Bayesian analysis. They are excellent, highly recommend.
https://www.youtube.com/playlist?list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus
26.12.2024 07:12

Ideas I've stopped believing in: effective altruism.
This is a utilitarian approach to altruism: the most good can be done by earning as much money as possible and then donating it. For example, Sam Bankman-Fried attempted this. He didn't get to the donating step, but in principle it could have been a good gamble that didn't pay off. Next we think about the long term, the most good for future generations, including being born at all. Finally we imagine those future generations being a certain race.
24.12.2024 23:11

The plural of anecdote is anecdotes.
19.12.2024 21:43

https://pfh.github.io/rezbaz/myplot.html
27.11.2024 08:42

I just ran my 1.5 hour HTML+SVG+JavaScript+D3 workshop at ResBaz Victoria 2024. Touched on a minimal set of ideas needed for an interactive web-page data visualization, and even uploaded to GitHub Pages. #resbaz
27.11.2024 08:40

So I'm thinking it might be possible to take a policy like BH, which is maybe not quite right in a particular setup, and *recalibrate* it based on simulation or a resampling scheme. For example, a Tukey all-pairs comparison version of FDR, where the test statistics aren't independent. Or gene-set enrichment.
18.11.2024 20:39

Pondering FDR control. Procedures like BH represent a certain policy, which can be confirmed to work by simulation for any given number of true discoveries. The FDR guarantee actually achieved is the FDR obtained in the worst case.
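
For BH itself the simulation check is only a few lines. A sketch for one configuration of true effects (all the numbers here are made up):

```r
# Simulated FDR of Benjamini-Hochberg: m0 true nulls out of m tests.
set.seed(1)
m <- 1000; m0 <- 900; alpha <- 0.05

fdp <- replicate(1000, {
    z <- c(rnorm(m0), rnorm(m - m0, mean = 3))    # nulls first, then real effects
    p <- pnorm(z, lower.tail = FALSE)
    hits <- which(p.adjust(p, method = "BH") <= alpha)
    # False discovery proportion: the fraction of discoveries that are nulls.
    if (length(hits) == 0) 0 else mean(hits <= m0)
})

mean(fdp)   # estimated FDR; with independent tests BH guarantees <= alpha * m0 / m
```

Recalibration would then mean adjusting alpha (or the procedure itself) until the worst case over simulations like this achieves the desired FDR.
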
18.11.2024 20:38

Something that should exist:
geom_principal_curve()
If x and y have a symmetric relationship, e.g. both x and y are noisy measurements of an underlying hidden variable, geom_smooth will underestimate the slope, just like linear regression.
Prompted by a case where the slope really should have been 1. geom_smooth made it look smaller, and even made it seem like the data should be broken into groups.
    ⟋
  ⟋
⟋
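
Until it exists, the attenuation and a symmetric fix are easy to demonstrate. In the linear case, the first principal component treats x and y symmetrically (total least squares), where regressing y on x does not. A sketch with simulated data:

```r
set.seed(1)
hidden <- rnorm(200)
x <- hidden + rnorm(200, sd = 0.5)
y <- hidden + rnorm(200, sd = 0.5)     # the true slope is 1

coef(lm(y ~ x))["x"]                   # attenuated: noticeably below 1

# Slope of the principal axis (direction of the first principal component).
v <- prcomp(cbind(x, y))$rotation[, 1]
unname(v["y"] / v["x"])                # close to 1
```
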
sgdGMF is a general purpose matrix factorization package. Like PCA, but better, e.g. works well with count data. Should be fast, e.g. suitable for scRNA-Seq. Presented by @drisso at #abacbs2024.
https://github.com/CristianCastiglione/sgdGMF
7.11.2024 20:57

The Knockoff Framework looks very interesting as a broadly applicable method of controlling the FDR. (Guannan Yang talked about this at #abacbs2024)
https://web.stanford.edu/group/candes/knockoffs/index.html
6.11.2024 09:24