Curiosity · Clarity · Connection
A Baltimore studio applying epidemiology and causal-inference rigor to messy, real-world data, and to the systems that teach people how to reason through it. We design data architecture, build data-linkage and synthetic-data pipelines, and engineer privacy-first, agentic-AI workflows that turn chaotic data into authoritative, decision-ready intelligence.
End-to-end blueprints, relational schemas, and multi-system integrations built for regulated environments, so analytics and AI sit on a foundation that holds.
Reconstructing fragmented records and generating synthetic datasets to develop and stress-test methods safely, without exposing sensitive data.
Isolating confounding, controlling bias, and validating model inputs: the difference between a correlation that ships and a result that survives scrutiny.
Local LLM orchestration and agentic workflows that run behind internal data firewalls. Advanced automation on open-weights models, without the data-leakage and compliance risk of sending sensitive data to externally hosted APIs.
One method runs through all of it: take a tangled, confounded problem and make it legible, without flattening what makes it real.
That means starting from a world model, not a quick correlation: mapping what drives the data before trusting what it appears to say. It's why the work holds up under audit and regulatory review, and on the kind of messy, high-stakes data that breaks generic AI tools.
Working notes from the studio's own builds, across public-health, pharma, and learning data. First entries publishing soon.
Hands-on experience with record linkage and confounding control on messy, real-world data: administrative and adverse-event records in public-health and pharma/biotech settings. Fragmented records, maternal–infant linkages that don't resolve cleanly, systems too outdated to support probabilistic matching. That adds up to working knowledge of where data integration actually breaks, and of the exact friction an agentic approach targets.
Grounded in doctoral epidemiology training (Johns Hopkins), paired with a modern data science and architecture stack: Python, SQL, AWS, agentic-AI tooling (Claude Code, MCP), and data visualization (Tableau, Flourish, Streamlit).
The same instinct shapes how the Labs builds learning systems: investigative environments on realistic, messy data, where answers aren't clean and learners reason under uncertainty the way analysts do.
Explore the learning systems work