And some classes were recorded: https://www.youtube.com/watch?v=E8qVzi0nBII&list=PLVYZ2jULLUDu8UhB8INpfEp9jukxAvzpM
1.5.2025 23:54

What is taught in a modern software security course? And how much can the students do with the learned concepts? Find out at this link. Hint: 18 student teams = 18 0-days discovered: https://marcusbotacin.github.io/teaching/sw-security
1.5.2025 23:53

New grant! New collaboration! Take a look at what we will be working on next: https://engineering.tamu.edu/news/2025/02/chatgpt-for-computer-security.html
12.3.2025 00:56

The SaTML program is out, and our paper "ML-Based Behavioral Malware Detection Is Far From a Solved Problem" is in it! Check out the preprint: https://arxiv.org/abs/2405.06124 It was amazing to work with great researchers on this project!
11.3.2025 22:12

Proud advisor moment! Congrats to Nhat Nguyen for successfully defending his MSc thesis! Nhat is the first of my advised students to graduate! Thanks to Dr. Peeples and Dr. Hamilton for serving on the committee. Stay tuned for some cool upcoming papers on automatic YARA rule generation!
11.3.2025 16:58

[New Paper] "On the uniqueness of AntiVirus labels: How many labels do we need to fingerprint an AV?" https://link.springer.com/article/10.1007/s11416-024-00541-1
22.11.2024 19:09

[New Paper] "Fuzzing and Symbolic Execution for Multipath Malware Tracing: Bridging Theory and Practice via Survey and Experiments" https://dl.acm.org/doi/10.1145/3700147
11.10.2024 19:08

The video of my talk "GPThreats: Fully-automated AI-generated malware and its security risks" at HOU.SEC.CON is out. Check out how ChatGPT, Copilot, and GANs create functional, evasive malware samples: https://www.youtube.com/watch?v=5lk_xklzcMg
3.10.2024 15:33

New paper, also appearing at the RAID conference this week: "What do malware analysts want from academia? A survey on the state-of-the-practice to guide research developments" Paper: https://marcusbotacin.github.io/publication/2024-10-10-paper-survey-36
30.9.2024 02:03

New paper, appearing at the RAID conference this week: "Cross-Regional Malware Detection via Model Distilling and Federated Learning". Paper: https://marcusbotacin.github.io/publication/2024-10-10-paper-distill-35
30.9.2024 02:02

I presented the talk "GPThreats: Fully-automated AI-generated malware and its security risks" at the Houston Security Conference (HOU.SEC.CON). Check out the slides at https://marcusbotacin.github.io/talks/housec
27.9.2024 17:51

Want to know more? Take a look at our paper. And let's chat about it!
20.9.2024 18:55

Once again, pseudo-labels help mitigate the effects of limited queue sizes, a constraint of many real-world pipelines!
20.9.2024 18:54

Where do the delays come from? In addition to sandbox delays, another source of delay is limited buffer size. When the buffer is limited, not all samples are processed. Limiting the number of samples considered in the retraining process causes the same effect as label delays.
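To make the buffer argument concrete, here is a minimal Python sketch. It assumes a FIFO analysis queue that can label at most `capacity` samples per time step. The function name `sandbox_label_delays` and the per-step capacity model are hypothetical and not taken from our pipeline. When arrivals outpace capacity, label delays creep up and a backlog of unprocessed samples is left over. From the retraining side, that looks exactly like delayed labels.

```python
from collections import deque
from typing import Deque, List, Tuple


def sandbox_label_delays(arrivals: List[int], capacity: int) -> Tuple[List[int], int]:
    """Simulate a hypothetical FIFO analysis queue with limited throughput.

    arrivals[t] is the number of new samples arriving at time step t;
    at most `capacity` queued samples are labeled per step.
    Returns the label delay (in steps) of every labeled sample, in the
    order they were labeled, plus the number still waiting at the end.
    """
    queue: Deque[int] = deque()
    delays: List[int] = []
    for t, n_new in enumerate(arrivals):
        queue.extend([t] * n_new)  # remember each sample's arrival step
        for _ in range(min(capacity, len(queue))):
            delays.append(t - queue.popleft())  # oldest sample is labeled first
    return delays, len(queue)


# Arrivals exceed capacity: delays grow and a backlog remains.
print(sandbox_label_delays([3, 3, 3], capacity=2))  # ([0, 0, 1, 0, 1, 1], 3)
```

With `capacity=3`, every delay is 0 and the backlog is empty, so in this toy model the limit alone creates the delay.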
20.9.2024 18:54

A drawback is that pseudo-labels must be short-lived to be beneficial. The secondary classifier must be updated when drift is detected; otherwise, the outdated pseudo-label generator starts to poison the main classifier, degrading its performance.
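One way to make "short-lived" concrete is sketched below in Python. It assumes that each pseudo-label records the time step when it was issued and the version of the secondary classifier that issued it. Before retraining, labels older than a `ttl`, or labels issued by a model version that was replaced after drift, are dropped. `PseudoLabel`, `usable_pseudo_labels` and the TTL/version rule are illustrative assumptions, not the exact mechanism from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PseudoLabel:
    label: int
    issued_at: int       # time step when the pseudo-label was produced
    model_version: int   # secondary-classifier version that produced it


def usable_pseudo_labels(labels: Dict[str, PseudoLabel], now: int,
                         ttl: int, current_version: int) -> Dict[str, int]:
    """Keep pseudo-labels that are still fresh and were issued by the
    current secondary model; everything else is dropped before retraining."""
    return {
        sid: pl.label
        for sid, pl in labels.items()
        if now - pl.issued_at <= ttl and pl.model_version == current_version
    }


labels = {
    "a": PseudoLabel(1, issued_at=0, model_version=1),  # too old
    "b": PseudoLabel(0, issued_at=8, model_version=1),  # fresh, but from the stale model
    "c": PseudoLabel(1, issued_at=9, model_version=2),  # fresh, from the current model
}
# Drift was detected, so the secondary model was refreshed to version 2.
print(usable_pseudo_labels(labels, now=10, ttl=5, current_version=2))  # {'c': 1}
```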
20.9.2024 18:54

Using pseudo-labels is beneficial not because it increases the detection rate, but because it changes the drift dynamics: different drift points are observed when pseudo-labels and delayed labels are used.
20.9.2024 18:53

The key to mitigating the impact of truly delayed labels (e.g., from a sandbox) is a mechanism (e.g., a more powerful, cloud-based, static classifier) that provides temporary labels (pseudo-labels). These help reduce the response time, which has a significant effect.
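A minimal Python sketch of the idea follows. It assumes a secondary model that labels each sample as soon as it is observed, and a true label (e.g., a sandbox verdict) that arrives later and overrides it. The `LabelStore` class and the toy prefix-based secondary classifier are hypothetical stand-ins, not the system evaluated in the paper.

```python
from typing import Callable, Dict


class LabelStore:
    """Hypothetical store mixing immediate pseudo-labels with delayed ground truth."""

    def __init__(self, secondary: Callable[[str], int]) -> None:
        # Assumed secondary model, e.g. a stronger cloud-based static classifier.
        self.secondary = secondary
        self.pseudo: Dict[str, int] = {}
        self.truth: Dict[str, int] = {}

    def observe(self, sample_id: str) -> None:
        """A new sample arrives: label it right away with the secondary model."""
        self.pseudo[sample_id] = self.secondary(sample_id)

    def confirm(self, sample_id: str, label: int) -> None:
        """The delayed true label (e.g. a sandbox verdict) finally arrives."""
        self.truth[sample_id] = label

    def training_labels(self) -> Dict[str, int]:
        """Best label currently known per sample; ground truth overrides pseudo-labels."""
        labels = dict(self.pseudo)
        labels.update(self.truth)
        return labels


# Toy secondary classifier (hypothetical): flags ids starting with "mal".
store = LabelStore(lambda sid: 1 if sid.startswith("mal") else 0)
for sid in ["mal1", "unk1", "mal2"]:
    store.observe(sid)
print(store.training_labels())  # {'mal1': 1, 'unk1': 0, 'mal2': 1}
store.confirm("unk1", 1)        # the sandbox later reveals unk1 is malicious
print(store.training_labels())  # {'mal1': 1, 'unk1': 1, 'mal2': 1}
```

The main classifier can be retrained on `training_labels()` at any time instead of waiting for every sandbox verdict.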
20.9.2024 18:53

Another unrealistic assumption is that the true labels (ground truth) are immediately available, which is not the case in the real world. If the labels are delayed, retraining is delayed, and the exposure grows.
If the labels are delayed for a long time, drift-triggered retraining is no longer effective, and performance degrades to that of the case with no drift detection.
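The effect of label delay on exposure can be illustrated with a deliberately simplified Python model. It assumes that each new malicious sample is missed until its true label arrives `delay` steps later, and that the retrained model detects it immediately afterwards. The function name and the one-unit-per-step exposure accounting are assumptions for illustration only.

```python
from typing import List


def exposure_with_label_delay(first_seen: List[int], delay: int, horizon: int) -> int:
    """Total exposure under a simplified, hypothetical model: each new malicious
    sample is missed until its true label arrives `delay` steps later, and the
    retrained model detects it immediately afterwards."""
    total = 0
    for start in first_seen:
        detected = start + delay  # retraining must wait for the ground truth
        total += max(0, min(detected, horizon) - start)
    return total


first_seen = [0, 2, 4]
for delay in (0, 3, 20):
    print(delay, exposure_with_label_delay(first_seen, delay, horizon=10))
# 0 0
# 3 9
# 20 24  (same as never detecting the samples within the horizon)
```

Once the delay exceeds the observation horizon, the result matches the no-detection case, mirroring the degradation described above.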
20.9.2024 18:52

The problem with most evaluations is that they assume ideal conditions, and the real world is much harder. With a traditional metric, we cannot see the real impact of these restrictions, but the new metric highlights it.
A first restriction is the amount of data one can keep in the history queue to retrain the models. If we consider a limited queue (triggered only by drift warnings) rather than the entire queue, there is a noticeable impact in the long term.
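The two queue policies can be sketched in Python as follows. The sketch assumes that a limited queue starts collecting samples only after a drift warning and keeps at most `max_len` of them, whereas the full queue keeps the entire history. `HistoryQueue`, the `(sample id, label)` representation and the warning step are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Sample = Tuple[str, int]  # hypothetical (sample id, label) pair


class HistoryQueue:
    """Hypothetical retraining buffer for a drift-aware malware detector.

    full_history=True keeps every sample (the common lab assumption).
    full_history=False starts collecting only after a drift warning and
    keeps at most max_len samples, mimicking a resource-limited pipeline.
    """

    def __init__(self, full_history: bool, max_len: Optional[int] = None) -> None:
        self.collecting = full_history  # a limited queue waits for a warning
        self.buffer: Deque[Sample] = deque(maxlen=None if full_history else max_len)

    def on_warning(self) -> None:
        """Called when the drift detector raises a warning."""
        self.collecting = True

    def add(self, sample: Sample) -> None:
        if self.collecting:
            self.buffer.append(sample)  # deque evicts the oldest entry when full

    def retraining_set(self) -> List[Sample]:
        """Samples available if drift is confirmed now."""
        return list(self.buffer)


stream = [(f"s{i}", i % 2) for i in range(6)]
full, limited = HistoryQueue(True), HistoryQueue(False, max_len=2)
for step, sample in enumerate(stream):
    if step == 3:  # hypothetical drift warning at step 3
        full.on_warning()
        limited.on_warning()
    full.add(sample)
    limited.add(sample)
print(len(full.retraining_set()))  # 6
print(limited.retraining_set())    # [('s4', 0), ('s5', 1)]
```

The limited model is retrained on a small, recent slice of the stream. Over many drift events, this is where the long-term gap can open up.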
20.9.2024 18:52

We believe we should take a cumulative view, because the same sample keeps threatening users over time if it is missed. We propose the exposure metric, which highlights the effect of false negatives (FNs). It makes clear that concept drift detection is a very efficient approach in the long term.
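Here is a minimal Python sketch of a cumulative, exposure-style curve. It assumes that every malicious sample that has appeared but is not yet detected adds one unit of exposure per time step. This accounting and the function name `cumulative_exposure` are illustrative assumptions, not the exact definition from the paper.

```python
from typing import Dict, List


def cumulative_exposure(first_seen: Dict[str, int], detected_at: Dict[str, int],
                        horizon: int) -> List[int]:
    """Hypothetical exposure curve over time steps 0..horizon-1.

    Every malicious sample that has appeared but is not yet detected adds one
    unit of exposure per time step; samples absent from detected_at are never
    detected. The running total makes repeated misses accumulate.
    """
    curve: List[int] = []
    total = 0
    for t in range(horizon):
        missed_now = sum(
            1
            for s, start in first_seen.items()
            if start <= t and detected_at.get(s, horizon) > t
        )
        total += missed_now
        curve.append(total)
    return curve


first_seen = {"a": 0, "b": 1, "c": 2}
detected_at = {"a": 0, "b": 3}  # "c" is never detected
print(cumulative_exposure(first_seen, detected_at, horizon=5))  # [0, 1, 3, 4, 5]
```

The per-step miss count here never exceeds two, while the exposure curve keeps climbing for as long as `c` stays undetected. That is the effect a non-cumulative FN metric hides.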
20.9.2024 18:50