The Palantir Builders

Four engineers who taught machines to see connections

By VastBlue Editorial · 2026-03-26 · 28 min read

Series: The Inventors · Episode 5


The Names You Do Not Know

If you follow technology news, you know the names associated with Palantir Technologies: Peter Thiel, the PayPal co-founder who provided the initial funding and the libertarian brand; Alex Karp, the Stanford PhD philosopher who became CEO and whose dishevelled appearance became a running media fascination; Joe Lonsdale, the Stanford Review co-founder who provided early strategic direction. These are the names on magazine covers. These are the names in Senate hearing transcripts. These are the names that define the public narrative of Palantir as either a necessary guardian of national security or a dangerous extension of the surveillance state, depending on which editorial page you read.

This episode is not about any of them. It is about four engineers — Nathan Gettings, Anthony Nassar, John Doyle, and David LeBlanc — whose names appear on foundational Palantir patents and whose architectural decisions still shape how the platform operates twenty years later. They are the builders. The founders told stories. The builders wrote code.

Understanding what they built requires understanding why it needed to exist. And understanding why it needed to exist requires going back to the intelligence failure that created the political will, the funding, and the moral urgency for a system like Palantir to be built at all.

The Intelligence Failure That Changed Everything

On the morning of September 11, 2001, the United States government possessed, across its various agencies, virtually every piece of information necessary to prevent the attacks. The CIA knew that two of the hijackers — Nawaf al-Hazmi and Khalid al-Mihdhar — had attended a January 2000 meeting in Kuala Lumpur with known al-Qaeda operatives. The NSA had intercepted communications referencing an upcoming operation. The FBI's Phoenix field office had flagged suspicious flight-school enrolments by Middle Eastern men. Immigration records showed that al-Hazmi and al-Mihdhar had entered the United States legally, using their real names. The information was there. It was scattered across databases that could not be queried together, owned by agencies that did not share with each other, formatted in systems that had no common data model.

The 9/11 Commission Report, published in July 2004, laid out the failure in unsparing detail. Chapter 11 documented how the CIA's Counterterrorist Center tracked al-Mihdhar to the Kuala Lumpur meeting but failed to notify the FBI or place his name on a watchlist until August 2001 — twenty months later. Chapter 13 described "the biggest impediment to all-source analysis": the systemic inability to fuse data across agency boundaries. The FBI maintained case files in the Automated Case Support system, a 1980s-era application that could not perform full-text searches. The CIA stored intelligence in databases partitioned by classification level. The NSA's signals intelligence existed in yet another system. There was no mechanism for an analyst in one agency to discover that a related record existed in another.

The Commission identified specific instances where connection would have changed outcomes. Al-Mihdhar's CIA file contained his passport photograph and his association with the Cole bombing suspects. Had that record been searchable by FBI agents investigating the Cole case, they would have discovered that a known al-Qaeda associate was living openly in San Diego under his real name. The data existed. The databases could not talk to each other. Nineteen men exploited that gap.

80+ Separate databases across US intelligence agencies pre-9/11 — Each with its own schema, access controls, and definition of what a "person" or "event" meant. No common query layer existed.

"Connecting the dots" — the phrase the Commission used — sounds like a metaphor. In data terms, it was a precise technical requirement. It meant building a system that could take a name, an alias, or a passport number from any agency's database and find every related record in every other agency's database, regardless of format, schema, or classification level. It meant resolving identities across systems where the same person might appear as "Khalid al-Mihdhar," "Khalid Almihdhar," "خالد المحضار," or simply a document number. It meant doing this while respecting compartmentalisation rules — not every analyst should see every piece of intelligence. The problem was not secrecy itself. It was the inability to reveal the existence of relevant connections without violating classification boundaries.

The Problem: Data That Cannot Talk to Itself

A person in an immigration database is a passport number and a name string. The same person in a financial database is an account number and a taxpayer ID. In a communications metadata system, they are a phone number or an email address. In a travel manifest, they are a booking reference and a seat assignment. In each system, the same human being is represented by a different data structure, using different identifiers, formatted according to different conventions. There is no common key. There is no shared ontology. The data exists, but it cannot talk to itself.

Traditional relational databases assume a shared schema — you design tables, define columns, and load conforming data. But what happens when schemas were designed independently by different agencies, in different decades, for different purposes? When "name" in one system is a single string and in another it is five separate fields (given name, family name, patronymic, tribal affiliation, preferred alias)? The relational model breaks down precisely at the point where intelligence analysis begins: at the boundary between systems.

Nathan Gettings: The Architecture

Nathan Gettings is sometimes listed as Palantir's fifth co-founder — the one who wrote the first working prototype. He was twenty-five. While Thiel provided capital and Karp provided philosophical direction, Gettings wrote the code that would become the foundation of Palantir Gotham.

Gettings' fundamental design decision was to build Palantir around a flexible object model — an ontology — rather than a fixed relational schema. In a traditional database, you define tables with fixed columns before you load data. If you want to add a new type of entity or a new relationship, you alter the schema. In Palantir's ontology, any entity — a person, a vehicle, a bank account, a phone call, a location, a document — is a typed object with properties and relationships that can be extended without breaking existing objects.

How the Ontology Actually Works

Consider a concrete example. In a relational database designed for immigration, you might have a table called PERSONS with columns: passport_number (VARCHAR), given_name (VARCHAR), family_name (VARCHAR), date_of_birth (DATE), nationality (CHAR(2)). This table has a fixed schema. If you later discover that some countries issue multiple passports to the same person, or that certain naming conventions do not distinguish between given and family names, you must alter the table — and every query, every application, every downstream system that depends on that table must be updated accordingly.

In Palantir's ontology, the same person is represented as an object of type "Person" with a set of properties: { name_variants: ["Khalid al-Mihdhar", "Khalid Almihdhar"], document_ids: [{ type: "passport", issuer: "SA", number: "..." }], date_of_birth: "1975-05-16", associated_locations: [...] }. The critical difference is that the property set is extensible. When a new data source provides a previously unknown property — say, a biometric hash or a frequent-flyer number — that property can be added to the object without altering the object type definition. Other objects of the same type are unaffected. No schema migration is required. No downstream queries break.
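The extensibility the paragraph describes can be sketched in a few lines of Python. This is an illustrative model, not Palantir's actual implementation: an entity is just a type name plus an open-ended property map, so a new property arrives as data rather than as a schema change.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """Schema-on-read sketch: a typed object with an open-ended property set."""
    entity_type: str
    properties: dict = field(default_factory=dict)

    def add_property(self, key, value):
        # Attaching a previously unknown property affects only this object;
        # other "Person" objects and existing queries are untouched.
        self.properties[key] = value

person = Entity("Person", {
    "name_variants": ["Khalid al-Mihdhar", "Khalid Almihdhar"],
    "document_ids": [{"type": "passport", "issuer": "SA", "number": "..."}],
})

# A new data source contributes a biometric hash: no ALTER TABLE, no migration,
# no downstream query breaks.
person.add_property("biometric_hash", "a3f9...")
```

Contrast this with the relational case above, where the same addition would ripple through every table definition, query, and downstream consumer.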

Relationships between objects are first-class entities in the ontology, not foreign keys. A relationship between two Person objects — "known associate," "family member," "co-traveller" — is itself an object with properties: a type, a confidence score, a provenance chain documenting which source system established the relationship and when. This means the graph of relationships can be traversed, queried, and filtered independently of the entities themselves. An analyst can ask: "Show me all relationships of type 'financial transaction' involving this person with confidence above 0.7, sourced from at least two independent systems." That query is native to the ontology. In a relational database, it would require a complex multi-table join that the database designer would have had to anticipate at schema design time.
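The analyst's query in the paragraph above can be expressed directly once relationships are objects in their own right. A minimal sketch, with invented identifiers, source names, and thresholds; the point is that the filter runs over relationship properties without touching the entities:

```python
# Relationships as first-class objects: each carries a type, a confidence
# score, and a provenance list naming the source systems that established it.
relationships = [
    {"type": "financial transaction", "source": "person:17", "target": "person:42",
     "confidence": 0.82, "provenance": ["treasury_feed", "bank_export"]},
    {"type": "financial transaction", "source": "person:17", "target": "person:99",
     "confidence": 0.55, "provenance": ["bank_export"]},
    {"type": "co-traveller", "source": "person:17", "target": "person:42",
     "confidence": 0.91, "provenance": ["airline_manifest"]},
]

def query(rels, rel_type, min_confidence, min_sources):
    """The analyst's question from the text, as a native filter over the graph."""
    return [r for r in rels
            if r["type"] == rel_type
            and r["confidence"] > min_confidence
            and len(r["provenance"]) >= min_sources]

# "All 'financial transaction' relationships above 0.7 confidence,
# sourced from at least two independent systems."
matches = query(relationships, "financial transaction", 0.7, 2)
```

Only the first relationship satisfies all three conditions; the second fails on confidence and source count, and the third is the wrong type.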

This is the schema-on-read versus schema-on-write distinction, and it was the single most consequential architectural choice in Palantir's history. Schema-on-write — the relational model — forces you to decide what questions you will ask before you collect the data. Schema-on-read — Palantir's approach — allows you to decide what questions to ask after the data has been collected. For intelligence analysis, where the most important question is always the one you did not know to ask, schema-on-read was not just a technical preference. It was a philosophical commitment to the idea that data should be shaped by the analyst's investigation, not by the database administrator's assumptions.

Anthony Nassar: The Integration Pipelines

Anthony Nassar solved the problem that sits upstream of the ontology: how do you get the data in? Government agencies and large enterprises do not have a single data export format. They have hundreds. CSV files with inconsistent delimiters. XML feeds with agency-specific schemas. Mainframe data dumps in fixed-width EBCDIC encoding. Real-time message queues in proprietary binary formats. Legacy databases with undocumented column meanings. The data integration challenge was not a technology problem — it was an archaeology problem. You had to understand what each source system meant by each field, how it encoded null values, how it handled character sets, how it versioned records, and how its timestamps aligned with other systems' timestamps.

Nassar designed the ingestion architecture that could consume data from arbitrary sources and map it onto Palantir's ontology. The technical challenge went beyond standard ETL (extract, transform, load). It was semantic alignment. A "person" in one database might be a full name string ("John Smith"). In another, it might be a numerical ID (47291). In a third, it might be a biometric hash. Nassar's pipelines resolved these different representations to a single ontological entity, maintaining provenance records that tracked which source systems contributed which properties, when they were last updated, and with what confidence level.

The provenance tracking was not decorative. In intelligence analysis, knowing where a piece of information came from is as important as the information itself. A name that appears in two independent databases is far more significant than a name that appears in two tables within the same database. Nassar's system preserved this distinction, allowing analysts to weight connections by the independence and reliability of their sources.
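A toy sketch of the mapping-plus-provenance idea, with invented system names and field mappings (real pipelines involve far messier encodings than this): each source system's fields are translated into ontology properties, and every value is tagged with the system it came from.

```python
# Two source systems representing the same person in different shapes.
source_records = [
    {"system": "immigration_db", "full_name": "John Smith", "passport": "P1234"},
    {"system": "financial_db", "customer_id": 47291, "name_on_account": "J. Smith"},
]

def ingest(records):
    entity = {"type": "Person", "properties": {}}
    for rec in records:
        system = rec["system"]
        # Per-source mapping rules: semantic alignment, not just ETL.
        if system == "immigration_db":
            mapped = {"name": rec["full_name"], "passport_number": rec["passport"]}
        elif system == "financial_db":
            mapped = {"name": rec["name_on_account"], "account_id": rec["customer_id"]}
        else:
            continue  # unknown source: skip rather than guess
        for key, value in mapped.items():
            # Every property value carries its provenance.
            entity["properties"].setdefault(key, []).append(
                {"value": value, "source": system})
    return entity

person = ingest(source_records)
```

After ingestion, "name" holds two values attributed to two independent source systems, which is exactly the distinction the text says matters: two independent databases agreeing is evidence; two tables in one database agreeing is not.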

4 Engineers behind the core Palantir platform — Gettings (ontology architecture), Nassar (data integration), Doyle (unstructured data), LeBlanc (multilanguage systems). Their patent filings tell the story their public profiles do not.

The Entity Resolution Problem

Cutting across all four engineers' work is entity resolution — the challenge of determining that two records in different databases refer to the same real-world entity when there is no shared unique identifier.

Consider the name "Ahmed." In an immigration database, there is an "Ahmed Hassan" who entered through JFK on 3 March. In a financial database, there is an "A. Hassan" who wired money from a Dubai account on 5 March. In a phone metadata database, there is a number registered to "Ahmad Hasan" in Brooklyn. Are all three records the same person? Two the same and one different? Three different people? The answer matters enormously — and getting it wrong in either direction has severe consequences. A false positive (merging records that belong to different people) could lead to the surveillance or detention of an innocent person. A false negative (failing to merge records that belong to the same person) could mean missing the connections that would prevent an attack.

Palantir's entity resolution uses probabilistic matching rather than deterministic rules. Instead of requiring an exact match on a single field, the system computes similarity scores across multiple attributes: phonetic similarity of names (using algorithms like Soundex and specialised Arabic/Cyrillic transliteration models), temporal proximity of events, geographic co-location, shared associates, and transactional patterns. Each attribute contributes a weighted probability, producing a composite confidence score reflecting the likelihood that two records refer to the same entity.

The confidence score is not binary. It exists on a spectrum, and the system is designed to surface ambiguous matches for human review rather than resolve them automatically. An analyst might see: "Two records identified as potential matches at 73% confidence. Name phonetics match (0.89), geographic locations consistent (0.81), date-of-birth fields differ by three years (0.34). Review recommended." The machine narrows the search space; the human makes the determination. This was not a limitation of early 2000s technology — it was a deliberate design choice reflecting a commitment to human-in-the-loop analysis that persists in Palantir's current systems.
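The scoring-and-triage flow described above can be sketched as a weighted sum plus a decision band. The weights, thresholds, and attribute scorers here are invented stand-ins for real phonetic, temporal, and geographic models; the structure is what matters: ambiguous scores route to a human, not to an automatic merge.

```python
# Hypothetical attribute weights (must sum to 1.0 for scores in [0, 1]).
WEIGHTS = {"name_phonetics": 0.4, "geography": 0.3, "date_of_birth": 0.3}

def composite_confidence(attribute_scores):
    """Weighted sum of per-attribute similarity scores, each in [0, 1]."""
    return sum(WEIGHTS[attr] * score for attr, score in attribute_scores.items())

def triage(attribute_scores, auto_merge=0.95, review=0.6):
    """Surface ambiguous matches for human review instead of auto-resolving."""
    score = composite_confidence(attribute_scores)
    if score >= auto_merge:
        return score, "merge"
    if score >= review:
        return score, "human review"
    return score, "no match"

# The case from the text: strong name phonetics, consistent geography,
# but date-of-birth fields that disagree.
score, decision = triage({"name_phonetics": 0.89, "geography": 0.81,
                          "date_of_birth": 0.34})
# 0.4*0.89 + 0.3*0.81 + 0.3*0.34 = 0.701 → falls in the review band.
```

The machine narrows the search space to a score; the band boundaries encode the policy choice of when a human must make the determination.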

At scale, the challenge compounds quadratically: naïve pairwise comparison of a hundred million records would require roughly five quadrillion comparisons, which is computationally infeasible. Palantir's approach uses blocking strategies — pre-filtering records into candidate groups based on coarse attributes (same country of origin, same first initial, same decade of birth) before applying fine-grained similarity scoring within each block. This reduces computational complexity from quadratic to roughly linear, making entity resolution feasible across datasets containing billions of records.
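A minimal sketch of blocking, with simplistic stand-in blocking keys: records are first grouped by coarse attributes, and the expensive pairwise scoring runs only within each block.

```python
from collections import defaultdict
from itertools import combinations

records = [
    {"id": 1, "name": "Ahmed Hassan", "country": "EG", "birth_decade": 1970},
    {"id": 2, "name": "Ahmad Hasan",  "country": "EG", "birth_decade": 1970},
    {"id": 3, "name": "John Smith",   "country": "US", "birth_decade": 1980},
    {"id": 4, "name": "Jon Smyth",    "country": "US", "birth_decade": 1980},
]

def candidate_pairs(recs):
    # Coarse blocking key: same country of origin, same decade of birth.
    blocks = defaultdict(list)
    for r in recs:
        blocks[(r["country"], r["birth_decade"])].append(r)
    pairs = []
    for block in blocks.values():
        # Fine-grained similarity scoring would run on these pairs only.
        pairs.extend(combinations(block, 2))
    return pairs

pairs = candidate_pairs(records)
# 2 candidate pairs instead of the 6 a naive all-pairs comparison produces.
```

With four records the saving is trivial; with a hundred million, cross-block pairs that blocking discards are the difference between quadratic and roughly linear work.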

John Doyle: The Text Problem

Structured data — the kind that lives in database rows with typed columns — is the minority of intelligence-relevant information. The majority exists in unstructured text: field reports, diplomatic cables, intercepted communications, news articles, legal filings, social media posts, corporate emails. This text contains names, dates, locations, organisations, and events, but they are embedded in natural language rather than structured fields. Extracting them requires natural language processing at scale — and in 2004, NLP was neither as capable nor as fast as it is today.

John Doyle designed the entity extraction and cross-referencing systems that could identify named entities in free text and link them to structured records in the ontology. His system could read a field report mentioning "Ahmed" meeting "Khalid" at a "café in the Kreuzberg district" on "14 March," extract those four entities (two persons, one location, one date), disambiguate them against known entities in the ontology (which Ahmed? which Khalid?), and surface all related structured records — travel manifests, financial transactions, phone metadata — for both individuals.
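The extract-then-link flow can be caricatured in a few lines. This is a deliberately toy sketch: the alias table, the tiny ontology, and the regex matching are all invented, and a real system would use trained NER models and scored disambiguation rather than a fixed alias map.

```python
import re

# A miniature ontology: canonical persons and their structured records.
ontology = {
    "Ahmed Hassan":      {"type": "Person", "records": ["travel:TX-881", "finance:W-104"]},
    "Khalid al-Mihdhar": {"type": "Person", "records": ["immigration:IM-3321"]},
}

# Stand-in for disambiguation: which canonical entity a mention resolves to.
ALIASES = {"Ahmed": "Ahmed Hassan", "Khalid": "Khalid al-Mihdhar"}

def extract_and_link(report):
    """Find person mentions in free text and surface their structured records."""
    hits = {}
    for mention, canonical in ALIASES.items():
        if re.search(r"\b" + re.escape(mention) + r"\b", report):
            # Real disambiguation would score candidates ("which Ahmed?")
            # instead of assuming a one-to-one alias mapping.
            hits[canonical] = ontology[canonical]["records"]
    return hits

report = "Ahmed met Khalid at a café in the Kreuzberg district on 14 March."
linked = extract_and_link(report)
```

The output links both mentions to their travel, financial, and immigration records; the cross-referencing step, not the extraction, is where Doyle's systems did the hard work.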

The cross-referencing was the critical innovation. Entity extraction alone was a known NLP task. But connecting extracted entities to structured data across multiple independent databases — and doing so with confidence scoring that reflected the ambiguity of natural language — required a system that operated simultaneously across structured and unstructured data domains. Doyle's work is documented in Palantir's patents on unstructured data processing, and it remains one of the least publicly appreciated components of the platform.

David LeBlanc: The NATO Problem

David LeBlanc solved a problem that was as much diplomatic as technical. NATO operations involved data from dozens of allied nations, each with different languages, different data formats, different classification levels, and different legal constraints on data sharing. A French intelligence report about a suspect in Marseille might reference the same individual as a German police file from Frankfurt, but the French report would be in French, classified under French national markings, and subject to French data-sharing regulations restricting which fields could be released to which partner nations.

LeBlanc designed the multilanguage ontology system that allowed Palantir to operate across these linguistic and jurisdictional boundaries. His system maintained parallel representations of the same entity across multiple languages, handled character sets from Arabic to Cyrillic to CJK, and enforced access controls that respected national classification markings. A French analyst could see French-classified data linked to American-classified data without seeing the American data itself — the system would reveal the existence of a connection (there is related intelligence from a partner nation) without revealing its content (you are not cleared to see what it says).
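The "reveal the existence, not the content" behaviour can be sketched directly. The markings and clearance labels below are invented placeholders, not real NATO classifications; the point is that the restriction is enforced in the data layer at read time.

```python
records = [
    {"id": "fr-17", "marking": "FR-SECRET", "content": "Rapport Marseille ..."},
    {"id": "us-42", "marking": "US-SECRET", "content": "Frankfurt linkage ..."},
]
# The ontology records that these two pieces of intelligence are connected.
links = [("fr-17", "us-42")]

def view(record, clearances):
    """Return content if cleared; otherwise acknowledge only that it exists."""
    if record["marking"] in clearances:
        return record["content"]
    return "[access restricted: related intelligence from a partner nation exists]"

french_analyst = {"FR-SECRET"}
for a, b in links:
    rec_a = next(r for r in records if r["id"] == a)
    rec_b = next(r for r in records if r["id"] == b)
    # The French analyst sees the French report in full, and sees THAT a
    # linked American record exists, but not WHAT it says.
    shown = (view(rec_a, french_analyst), view(rec_b, french_analyst))
```

Because the check lives in the read path rather than in a policy manual, an analyst cannot forget to apply it, which is the "trust architecture" point of the next paragraph.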

This was not a software feature. It was a trust architecture. NATO interoperability depends on each member nation's confidence that sharing data through a common platform will not violate their national security regulations. LeBlanc's system provided that confidence through technical enforcement rather than procedural agreement — the access controls were embedded in the data layer, not in a policy manual that an analyst might forget to consult.

From Gotham to Foundry: The Commercial Evolution

Palantir Gotham was built for intelligence agencies — air-gapped networks, SCIF-compatible infrastructure, cleared personnel. But the underlying engineering problem — integrating data from disparate systems into a coherent analytical layer — was not unique to intelligence. Every large organisation has the same problem. Hospitals have patient data in electronic health records, imaging systems, pharmacy databases, and laboratory information systems. Manufacturers have data in ERP systems, SCADA controllers, supply-chain platforms, and IoT sensor networks. The data is siloed not because anyone planned it that way, but because each system was procured independently, often decades apart, by different departments with different requirements.

Palantir Foundry, launched in 2016, was the commercial rewrite. It preserved the ontology architecture, the entity resolution engine, the provenance tracking, and the access control model, but it reimagined the user experience for non-intelligence customers. Where Gotham assumed analysts trained in intelligence tradecraft, Foundry was designed for supply-chain managers, manufacturing engineers, clinical researchers, and financial analysts who needed to integrate data without writing code.

The commercial use cases proved the universality of the architecture. Airbus used Foundry to integrate data across its global supply chain — tracking parts from hundreds of suppliers across dozens of countries. The system ingested supplier delivery schedules, customs data, shipping manifests, and quality inspection records, mapping them onto an ontology that could answer questions like: "If this titanium supplier in Japan is delayed by two weeks, which aircraft assemblies are affected, and what alternative suppliers could deliver equivalent parts within the certification window?" That question spans five data systems and three organisational boundaries. Without an ontology layer, it would take analysts days of manual work. With Foundry, it became a query.
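The supply-chain question above is, structurally, a graph traversal over the ontology. A hedged sketch with invented part, supplier, and aircraft identifiers: suppliers link to parts, parts to assemblies, assemblies to aircraft, and a delay propagates along those edges.

```python
from collections import deque

# Ontology edges: what each node feeds into downstream.
edges = {
    "supplier:titanium_jp": ["part:fastener_t7"],
    "part:fastener_t7": ["assembly:wing_box", "assembly:pylon"],
    "assembly:wing_box": ["aircraft:A350-0117"],
    "assembly:pylon": [],
}

def downstream(node, graph):
    """Breadth-first traversal: everything affected by a delay at `node`."""
    affected, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected

impact = downstream("supplier:titanium_jp", edges)
# The delayed supplier reaches one part, two assemblies, and one aircraft.
```

In Foundry's case the edges span five data systems and three organisational boundaries; once they are mapped onto one ontology, the days of manual work collapse into a traversal like this one.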

In healthcare, Foundry integrated clinical trial data across research sites and regulatory databases. A multi-site trial might have patient data in different formats at US, UK, and Japanese hospitals — each subject to different privacy regulations (HIPAA, GDPR, APPI) constraining which data elements could be shared. The ontology architecture that LeBlanc designed for NATO classification markings mapped directly onto healthcare data sovereignty requirements. The technical problem — revealing the existence of a relevant connection without revealing protected content — was identical. Only the regulatory framework had changed.

$21B Palantir's valuation at its September 2020 direct listing — Revenue, once almost entirely from government contracts, reached a roughly even split between government and commercial customers by 2023, validating the Gotham-to-Foundry architectural bet.

The Ethical Architecture

No discussion of Palantir is complete without addressing the ethical dimension. Every architectural choice the four builders made embedded values — whether they intended to or not.

An ontology that can integrate immigration records, financial transactions, phone metadata, social media activity, and geospatial tracking into a single queryable graph is, by definition, a surveillance tool of extraordinary power. The same system that could have identified the 9/11 hijackers could — and critics argue, has been — used to track immigrants, monitor protesters, and enable predictive policing that disproportionately affects marginalised communities.

Palantir's engineering response to this tension was access control — the same architecture that LeBlanc built for NATO. The access control layer, Palantir argues, is what distinguishes a surveillance tool from an analytical platform. Every query is logged. Every data access is attributed to a specific user with specific permissions. Every connection carries a provenance chain documenting who found it, when, under what authority, and for what stated purpose. This audit trail, the company argues, makes its system more accountable than the ad hoc intelligence sharing that preceded it.

Critics are not persuaded. The ACLU and other civil liberties organisations have argued that access controls are only as strong as the policies that govern them — and that once a system capable of mass surveillance exists, the political and institutional pressures to expand its use are difficult to resist. Capabilities built for narrow, well-defined purposes tend to expand to broader applications over time. The NSA's bulk metadata collection program, initially authorised for counter-terrorism, expanded to encompass communications metadata for millions of ordinary Americans.

The most valuable data is always the data you did not know you had, connected to data you did not know was related. The engineering achievement is not the connection — it is making the connection discoverable. The ethical question is: discoverable by whom, under what authority, and with what accountability?

The central tension in Palantir's architecture

The privacy-by-design debate cuts to the heart of what it means to be an engineer building systems of consequence. The four builders made architectural decisions that prioritised analytical power. They also embedded accountability — provenance tracking, granular access controls, audit logging, classification-aware boundaries — into the system's infrastructure from the beginning. Whether those mechanisms are sufficient is not an engineering question. It is a political and moral one. But the fact that accountability was architectural rather than an afterthought reflects a design philosophy that takes dual-use seriously.

What They Built Together

Individually, each of these contributions was a significant engineering achievement. Together, they constituted something that did not previously exist: a system that could ingest any data, from any source, in any format, in any language, at any classification level, and make it searchable, linkable, and analysable through a single interface with full provenance and access controls.

Gettings' ontology meant new data sources could be integrated without schema migrations. Nassar's pipelines meant any format could be mapped without manual transformation. The entity resolution system meant records describing the same real-world entity could be linked across system boundaries with quantified confidence. Doyle's unstructured data processing meant intelligence buried in free text could be extracted and connected to structured records. LeBlanc's multilanguage and access-control system meant the platform could operate across national boundaries without violating data sovereignty requirements.

Palantir went public through a direct listing in September 2020 at a market valuation of roughly $21 billion. Its S-1 registration statement, filed that August, described the technology in commercial terms — "software platforms for human-driven analysis of real-world data." The patents describe it in engineering terms — ontology systems, entity resolution, provenance tracking, multilanguage data fusion. The Foundry platform proved that the architecture designed for intelligence agencies could solve the same class of problems in commercial settings — supply chain integration, clinical data harmonisation, manufacturing optimisation. The data integration problem, it turned out, was universal. Only the classification markings were different.

The four engineers described in this episode are the ones who turned a thesis about connected data into a system that governments and corporations trust with their most sensitive operations. They built a tool of extraordinary analytical power, embedded accountability mechanisms into its architecture, and left the question of whether those mechanisms are sufficient to the societies that deploy it.

None of them are on magazine covers. All of them are in the patents.

Sources

  1. Palantir Technologies Inc. S-1 Registration Statement, SEC filing, August 25, 2020 — https://www.sec.gov/Archives/edgar/data/1321655/000119312520230013/d904406ds1.htm
  2. US Patent 8,515,912 — "Data integration and analysis system" (Palantir Technologies) — https://patents.google.com/patent/US8515912B2
  3. US Patent 9,652,510 — "System for processing unstructured data and cross-referencing" (Palantir) — https://patents.google.com/patent/US9652510B1
  4. US Patent 10,198,515 — "Multilanguage ontology system" (Palantir Technologies) — https://patents.google.com/patent/US10198515B1
  5. National Commission on Terrorist Attacks Upon the United States, "The 9/11 Commission Report," 2004 — https://www.9-11commission.gov/report/
  6. Palantir Engineering Blog — technical architecture publications, 2015-2023 — https://blog.palantir.com
  7. ACLU, "How is Palantir Helping ICE Deport Immigrants?" analysis of surveillance deployment, 2020 — https://www.aclu.org/news/immigrants-rights/how-is-palantir-helping-ice-deport-immigrants
  8. Airbus SE annual report 2019, referencing digital transformation initiatives including data integration across global supply chain
  9. US Patent 8,688,573 — "Entity resolution and relationship discovery" (Palantir Technologies) — https://patents.google.com/patent/US8688573B1