Thom Vaughan

London, UK

Engineer and researcher with 20+ years across data infrastructure, media production, and open source technology. Currently focused on web-scale data pipelines and open data at petabyte scale. Background spans studio engineering, IT management, web infrastructure consultancy, and policy research.

Principal Engineer

Common Crawl Foundation · Contract · Remote · USA

Data pipelines, web archiving, and graph analysis at petabyte scale. Responsible for crawl infrastructure, data quality, and the systems that make Common Crawl's open datasets available to researchers worldwide. Cloud operations across AWS.

Director

London Pixel Exchange · Self-employed · London, UK

Web infrastructure consultancy specialising in open data and ML integration. Project scoping, architecture, cloud security, and delivery of bespoke systems for clients across media, technology, and research.

Founder

Vaughan Type · Part-time · London, UK

Independent type foundry. Designer of Wumpus Mono, a monospaced typeface for programmers, currently ranked #4 globally in the monospaced-font topic on GitHub.

Chief Engineer

Telehack Foundation · Part-time · Remote · USA

Core maintainer of Telehack, a multi-user simulation of a stylised ARPANET/Usenet (c. 1985–1990) with 26,600+ simulated hosts. Built the Dojo code-challenge platform. Ongoing development of the simulation engine, network topology, and community tooling.

Project Manager

High Score Productions · Full-time · London, UK

Managed audio production projects including voice casting, voice direction, and pronunciation work. Cast and directed the voice of Google across 20+ locales. Oversaw computational linguistics workflows and studio IT infrastructure.

Studio Engineer

Wardour Studios · Full-time · London, UK

Recording, editing, and mixing across spoken word, music, and broadcast. System administration for studio infrastructure. Composition and sound design.

Technical Development Coordinator

Metropolis Studios · Full-time · London, UK

Technical systems development and IT infrastructure for one of Europe's largest recording studio complexes. Shell scripting, automation, and staff training.

IT Development Manager

Academy of Contemporary Music · Full-time · Guildford, UK

IT development and systems infrastructure for a music education institution. Software deployment, studio technology, and technical training programmes.

Music Composer & Sound Engineer

North Kingdom / The North Alliance · Freelance · Stockholm, Sweden

Composition and sound design for interactive projects at the award-winning digital agency. Game audio and voice direction.

Vocabulary for Expressing Content Preferences for AI Training

Thom Vaughan

Colour Contrast on the Web: A WCAG 2.1 Level AA Compliance Audit of Common Crawl's Top 500 Domains

Thom Vaughan, Pedro Ortiz Suarez

CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data

Pedro Ortiz Suarez, Laurie Burchell, Catherine Arnett, Rafael Mosquera-Gómez, Sara Hincapie-Monsalve, Thom Vaughan et al.

Building Data Infrastructure for Low-Resource Languages

Sarah K. K. Luger, Rafael Mosquera, Pedro Ortiz Suarez, Thom Vaughan

Languages: Go, Python, Rust, Perl, JavaScript, Shell

Infrastructure: AWS, S3, IAM, Cloud Security, DevOps, System Administration

Spoken languages: English (native), Swedish (fluent)

Type design, amateur radio, electronic music, literature and philosophy, science fiction.

Further details and references available on request.