New blog post: Communicating Ontology – Technical approaches for facilitating use of our Wikibase data
https://semlab.io/blog/communicating-ontology
A look at some tools made to help communicate research data stored in our Wikibase, including property usage visualizations and JSON-LD bulk data downloads.
5.3.2025 20:14
New blog post: three interfaces to explore the 50K HathiTrust resources from 1929 that entered the public domain last month:
https://thisismattmiller.com/post/hathi-pd-2025/
Including this one, which lets you find literature/fiction books by genre and LCSH.
6.2.2025 17:49
New publication: “Knowledge Graphing Art Archives: Methods and Tools from the Semantic Lab’s E.A.T. Project”
Highlighting work creating a knowledge graph for archival materials from the avant-garde movement, Experiments in Art and Technology (E.A.T.).
https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.268
28.1.2025 16:14
With TikTok probably shutting down, I made some scripts to download your liked and favorited TikTok videos and build a local web interface for them:
https://github.com/thisismattmiller/tiktok-shutdown
It downloads the videos locally. I had 2,200 videos, which took up about 20 GB.
13.1.2025 01:54
A new post on using models like Segment Anything 2 and LLaVA on 14,000 woodcut images from the Plantin-Moretus Museum: https://thisismattmiller.com/post/woodblockshop/
I used the results to make a little toy that lets you mash up elements from the woodcuts into new images: https://woodblockshop.glitch.me/
17.12.2024 21:36
For Banned Books Week I took a look at the metadata for 1,500 titles on PEN America’s list of banned and challenged books, analyzing the subject headings used and other data.
https://thisismattmiller.com/post/banned-metadata/
27.9.2024 17:02
New post looking at using the Whisper speech-to-text model on 400+ folk songs collected by Alan Lomax in 1938.
I look at transcription quality, building a lyric-focused web component player, a search interface, and LLM enrichment:
https://thisismattmiller.com/post/lomax-whisper/
13.9.2024 16:30
Played a small part in this new Atlantic article looking at diversity in publishing:
https://www.theatlantic.com/books/archive/2024/06/diversity-publishing-backlash-study/678734/
(my part being supplying the book metadata)
20.6.2024 19:43
I had some nice examples I wrote of using the new WorldCat /v2/ API endpoints, but I guess I'd better keep those off GitHub; wouldn't want them to be used as evidence of some imaginary offense in the future. Talk about a stupid chilling effect.
16.4.2024 14:24
If you have 11+ million names, like in the LC Name Authority File, how many of them are anagrams of each other? A lot: https://thisismattmiller.com/post/lcnaf-anagrams/
1.4.2024 18:24
Wrote a tutorial on how to migrate your data to a new server if you use Dockerized Wikibase:
https://thisismattmiller.com/post/migrating-your-docker-wikibase/
Very niche, but it would have saved me a ton of time if it had existed.
21.3.2024 20:37
Browse HathiTrust's books from 1928 that entered the public domain this week, by popularity.
I made a couple of interfaces that let you browse and explore them by Library of Congress Classification:
https://thisismattmiller.github.io/hathi-pd-2024/
3.1.2024 19:54
I wrote a blog post about political GIFs in the Library of Congress Web Archives (https://thisismattmiller.com/post/animated-gifs-in-us-elections/) and included some examples, and now, years later, I'm getting shaken down by a copyright troll over one of the images.
Is an Obama Blingee gonna cost me $400? 🙃
14.11.2023 21:30
They're moving us into another building at work, and everyone is throwing away their old stuff. And I found a printout of the LC homepage from 2001.
I'm a bit of a web archivist myself...
29.8.2023 14:14
I bought a .zip TLD: https://thisismattmiller.zip
It's like malware, if malware was a zip file of my homepage. Trying to get a copy of this baby on every hard drive out there.
Growing the thisismattmiller family of #brands, joining:
https://thisismattmiller.com (original flavor)
https://thisismattmiller.club ("very cool, very fun")
A look at using GPT-3/3.5/4 on library and archives collections, with the crowdsourced transcriptions of the Susan B. Anthony Papers from the Library of Congress as a use case.
Using GPT/LLMs to manipulate and extract metadata, and the types of interfaces/datasets that this makes possible:
https://thisismattmiller.com/post/using-gpt-on-library-collections/
30.3.2023 17:38
Made some improvements to: https://pomodoro.semlab.io/
An OCR tool for complicated docs that lets you manually select what text to extract. You can now structure the text into fields and download it as JSON. It now also supports multipage PDFs, and there's a new tutorial video on the home page.
31.1.2023 20:40
Updated http://pfch.nyc with new class projects from last semester.
The course is "Programming for Cultural Heritage," which I teach at Pratt Institute. It's mostly MILS students learning to work with data programmatically.
Many of them go from zero coding experience to a complete project by the end of the semester, and there are a lot of interesting project ideas.
The office I work in at the Library of Congress is hiring two Linked Data Applications Technical Analysts. GS-13, $112,015 to $145,617. We work onsite 2 days a week. You'd be working on BIBFRAME, http://id.loc.gov, etc. A great opportunity for junior technical folks who want to focus on linked data. Happy to answer questions.
https://usajobs.gov/job/698606100
A little viz to browse the HathiTrust resources flipping to the public domain today. Narrow the 58K by LCC, then scroll for the list of titles.
https://thisismattmiller.github.io/hathi-pd-2023/
#publicdomain