And check out the full paper in the open-access @journalqd, with Ryan McGrady, me, Rebecca Curran, Jason Baumgartner, and @ethanz, for more on our sampling methods and analysis.
https://journalqd.org/article/view/4066
Check out Ryan McGrady's summary on the @iDPI_UMass blog!
https://publicinfrastructure.org/2023/12/21/notes-from-random-youtube-coding/
5. There are an awful lot of video games
Our hand-coding task found that nearly 30% of videos were about video games. Other topics represented a smaller part of our sample but were nonetheless surprising, like religious content, which made up about 3% of the videos we hand-coded!
4. Not everyone is participating in the “creator economy”
Low-effort videos of still photos, homework assignments, inaudible three-second clips of ceremonies, and Zoom meeting recordings: YouTube's vast "dark matter" is more varied and stranger than the YouTube we're used to.
3. Most of YouTube doesn’t get many views
Odds are, the videos you watch have more than 1,000 views, but such videos were just 13% of our sample, and 4.9% don't have any views at all. The picture is even starker for comments and likes: 72.6% have no comments and 88.7% have no likes.
2. YouTube is mostly not in English
Our current best estimate is that 32% of videos where we can detect spoken language are in English, with 10.5% in Hindi, 8% in Spanish, slightly fewer in Portuguese, and just over 6% in Arabic.
1. YouTube hosts over 13 billion public videos (as of this month)
Our sampling method allows us to estimate YouTube's total size and growth. This current estimate is up from our paper's year-old estimate of ~10 billion. Check out https://tubestats.org for our latest data!
Our team's paper, "Dialing for Videos: A Random Sample of YouTube," is out now in @journalqd. We analyzed 10,000 random YouTube videos through metadata analysis, a spoken language identification model, and hand-coding. Here are our 5 main takeaways:
https://journalqd.org/article/view/4066