My self-advice for writing grants is to write each sentence in the voice of David Attenborough.
Good writing is a story. A story has an arc. A story has a theme. A story has characters. A story has resolution. The best stories, though, are also narrated by a kind British man with a passion for nature and education.
10.1.2024 15:47

I'm crowdsourcing career advice. I want to study ⭐ What humans find easy or hard to learn ⭐ Tell me: what does this bring to mind for you? Whose research? What approaches?
I'm open to suggestions spanning all fields, including:
- learning science
- critical period & controlled rearing research
- deep learning theory
- dev. psych
⭐ What defines the line between easy vs. hard tasks?
⭐ When can brain areas change specialties (think chess experts, blind individuals), and what determines their new specialty?
⭐ How do learning biases sculpt the adult brain?
Help me build a reading list or find mentors!
23.8.2023 13:55

I felt called out by this, as a scientist:
"Our success – my success – is the community's success. Your talent, your skill, I will celebrate it because I also see that as mine – even though you are the one that is performing that song. Because we are so interconnected as a community, I am practicing to see your joy as my joy. So there’s freedom there, there’s a freedom in sharing the happiness. There’s freedom in sharing the success and in the growth also."
– Br. Pháp Hữu
I'd love to feel more of this sentiment in science. What of one's work and ideas is truly and solely one's own?
4.6.2023 19:20

Interpretable AI really wants to understand what neurons in LLMs are doing. But this effort is very likely to fail – and it's not the right approach to understand what AI is doing and why.
Like, today, there's weirdly a lot of press about how OpenAI just showed that "Language models can explain neurons in language models" (https://openai.com/research/language-models-can-explain-neurons-in-language-models). But look at the metrics – this was a failed effort. GPT-4 *cannot explain* what neurons in GPT-2 are doing.
More importantly, single-unit interpretability in LLMs is not the same as understanding why and what LLMs as a whole are doing. Even if you did understand when a handful of units activate, you would never be able to stitch these together into a general understanding of why an LLM says the words that it does.
LLMs may someday be able to explain themselves in plain language. But describing (in plain language) when each neuron fires is not going to get us there.
#interpretableAI #LLMs #openai
10.5.2023 15:03

I love this preprint from Tzuhsuan Ma and Ann Hermundstad for its point that, as a theorist, you can't separate "optimal" sensory representations from "optimal" behavior. The optimal action depends on the constraints upon the sensory system. (https://www.biorxiv.org/content/10.1101/2022.08.10.503471v1)
For background, there's lots of theory about the optimal way an animal can update its beliefs about the world (a sensory problem) and, separately, the optimal way to act given one's beliefs (an action problem). This separation is fine as long as one has optimal beliefs. But biology is constrained. Sub-optimality means that the action problem is no longer disjoint from the sensory problem – evolution must tailor representations for action.
The analysis is beautiful. One sees first-hand how Bayesian-like behavior does not imply a truly Bayesian program.
8.5.2023 15:54

I hear you asking, "What makes us unique as humans? Where did language come from, evolutionarily speaking?" No answers here, but I just learned that chimpanzees have a homolog of Wernicke's area – complete with leftward asymmetry – and also a "Broca's area" that activates during communication.
e.g.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880147
Why is your body tense right now? But really, the deep why. Why do we store emotions in our bodies?
We must waste loads of ATP each day and night sustaining muscle tension in this way. There ought to be some reason we evolved this way.
I wrote a highly speculative blogpost positing a reason why: our bodies (read: musculature) act to store information about our behavioral state and situation. This is why intervening in this state-storage loop with a massage, sauna, or other body-centered practice can change your mood. https://aribenjamin.github.io/embodied-emotion/
29.1.2023 22:47

I imagine that the future of cell biology will rely heavily on large foundation models trained to predict all the -omes (mRNA levels, DNA methylation, ATAC-seq, and the genome) from huge, collective datasets. It's great to see some progress towards that future: https://www.biorxiv.org/content/10.1101/2023.01.11.523679v1
17.1.2023 15:18

Working memory helps us make dumplings! From Yoo & Collins (2022), "How Working Memory and Reinforcement Learning Are Intertwined"
This was a real Aha moment for me — the fact that BCI tasks are easier when they align with dimensions of large variance in neural activity ("on-manifold" rather than "off-manifold") is an expected behavior of gradient descent. Very cool. I remember being fascinated, and puzzled, by that finding back when I first heard of it in 2016.
https://www.biorxiv.org/content/10.1101/2022.12.08.519453v1.full.pdf
For those who followed my last thread, it's the same math & result: gradient descent learns faster in the directions that correspond to the input dimensions that have high variance.
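A toy illustration of that claim (my own sketch, not from the paper): linear regression trained by gradient descent on two uncorrelated inputs whose variances differ 100-fold. The weight along the high-variance direction is learned far faster than the weight along the low-variance one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Two input dimensions: standard deviation 10 vs. 1
X = rng.standard_normal((n, 2)) * np.array([10.0, 1.0])
true_w = np.array([1.0, 1.0])
y = X @ true_w

w = np.zeros(2)
lr = 1e-3
for _ in range(50):
    grad = X.T @ (X @ w - y) / n   # gradient of the mean squared error
    w -= lr * grad

print(np.round(w, 3))   # first weight ~1, second still near 0
```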
5.1.2023 15:28

These are just a few of the results in this paper – and they're somewhat buried in Fig. 5. You can find the rest here: https://www.nature.com/articles/s41467-022-35659-7
I'll end this thread with a quote from the paper's conclusion:
"Optimality can be defined in two ways. It can characterize the maximum achievable code quality, in an information-theoretic sense, given some number of neurons and their biological limitations. Alternatively, one might also describe responses that are optimal given the limited experience by which to learn the statistics of the world. Even ideal observers must learn from limited data, and successful learning from limited data must be constrained."
If we measure how W filters frequencies over the course of training, and construct a sensitivity threshold to simulate a "network acuity" of the output 𝑊𝑋, we observe a linear increase in acuity with training, just as in humans. Learning naturally reflects environmental statistics at all stages of learning. As this demo shows, this happens in extremely simple systems you can code and imagine in a few lines.
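Here is one way to sketch that acuity measurement (my own construction, with an assumed 1/f variance spectrum, not the paper's code): treat each frequency f as a principal component with variance 1/f, use the closed-form gradient-descent gain per component, and define acuity as the highest frequency whose gain clears a threshold.

```python
import numpy as np

freqs = np.arange(1, 101)     # spatial frequencies of the input PCs
variances = 1.0 / freqs       # assumed 1/f variance spectrum
lr, threshold = 0.05, 0.5

def acuity(t):
    # Closed-form gain of gradient descent after t steps, per frequency:
    # each PC's gain follows 1 - (1 - lr * variance)^t
    gains = 1.0 - (1.0 - lr * variances) ** t
    passed = freqs[gains > threshold]
    return passed.max() if passed.size else 0

for t in (100, 200, 400, 800):
    print(t, acuity(t))
```

With this assumed spectrum, the cutoff frequency climbs roughly in proportion to training time, mirroring the linear acuity growth described above.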
4.1.2023 15:45

Luckily, there's lots of work on gradient descent in learning theory. It turns out that in this system, gradient descent causes W to learn each principal component of the inputs at different rates. In fact, it learns the PCs in order of their variance. (Specifically, we're measuring the projection 𝐯ᵢᵀ𝑊𝐯ᵢ for PC 𝐯ᵢ.) And since the first PCs of natural images contain lower spatial frequencies, the result is that W acts like a lowpass filter that gets less strict as W learns.
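The math above can be sketched in a few lines (my own demo, with an assumed covariance, not the paper's code): train W by full-batch gradient descent on the reconstruction loss, where the inputs have PCs of very different variances, and track the projection 𝐯ᵢᵀ𝑊𝐯ᵢ during training.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
variances = np.array([10.0, 1.0, 0.1, 0.01])      # PC variances, descending
V = np.linalg.qr(rng.standard_normal((d, d)))[0]  # random orthonormal PCs
C = V @ np.diag(variances) @ V.T                  # input covariance

W = np.zeros((d, d))
lr = 0.02
snapshots = []
for t in range(400):
    # Full-batch gradient on E||x - Wx||^2 depends only on the covariance C
    W += lr * (C - W @ C)
    if t in (10, 100, 399):
        snapshots.append([V[:, i] @ W @ V[:, i] for i in range(d)])

# Early in training, only the high-variance PC has been learned
# (projection near 1); the rest follow in order of their variance.
for s in snapshots:
    print(np.round(s, 3))
```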
4.1.2023 15:44

One way to interpret this result is with efficient coding, the idea that neurons optimally represent the world despite unresolvable constraints (like noise). To explain an increase in acuity, one then needs to say that 1) perception is optimal at every age, and 2) neural noise decreases over time (e.g. Kiorpes & Movshon (1998)).
We wanted to examine a different possibility: what if the brain's learning algorithm simply prefers to learn low-frequency information first? To demonstrate this, we created a very, very simple model system – matrix multiplication – and asked what gradient descent would learn first if trained to reconstruct natural images.
Note that the "optimal solution" is trivial – let W be the identity matrix – and does not better represent any aspect of the data.
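The setup can be sketched like so (my own reconstruction under assumptions, not the paper's code; random Gaussian vectors stand in for image patches): a single matrix W trained by gradient descent on the reconstruction loss ||𝑋 − 𝑊𝑋||², whose trivial optimum is W = identity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 500
X = rng.standard_normal((d, n))   # stand-in for (whitened) image patches

W = np.zeros((d, d))              # start with no filter at all
lr = 0.01
for _ in range(2000):
    err = X - W @ X               # reconstruction error
    W += lr * (err @ X.T) / n     # gradient step on ||X - W X||^2

print(np.allclose(W, np.eye(d), atol=1e-2))   # W converges to the identity
```

The interesting part is not the endpoint but the path W takes to get there, which the gradient-descent analysis in the next toot makes precise.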
4.1.2023 15:42

Time for a #tootprint to celebrate a publication! I want to share a simple demo of the paper's main concept here — five-lines-of-Python simple.
Back in the '80s, researchers Luisa Mayer and Velma Dobson (among others) found that babies and young children slowly get better at seeing fine details as they age. This improvement is *linear* in time. (This is visual acuity: the finest spacing of a grating that can be resolved before it appears pure gray.) What explains this steady improvement?
4.1.2023 15:35