Methodology
Technical notes on data sources, classification rules, metrics, and known limitations for the ATMP Research Platform.
What are ATMPs?
Advanced Therapy Medicinal Products (ATMPs) are a class of medicines regulated under EU Regulation 1394/2007. They fall into four sub-classes:
| Abbreviation | Full name | Description |
|---|---|---|
| GTMP | Gene Therapy Medicinal Product | Recombinant nucleic acid used to regulate, repair, replace, add, or delete a genetic sequence |
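| sCTMP | Somatic Cell Therapy Medicinal Product | Cells or tissues that have been substantially manipulated, or are not intended for the same essential function in the recipient, used to treat, prevent, or diagnose disease through pharmacological, immunological, or metabolic action |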
| TEP | Tissue Engineered Product | Contains or consists of engineered cells or tissues; intended to repair, regenerate, or replace human tissue |
| cATMP | Combined ATMP | An ATMP that incorporates one or more medical devices as an integral part of the product, in addition to cells or tissues |
This platform covers research publications related to all four ATMP classes, identified via MeSH term classification rather than regulatory approval status. The corpus therefore includes basic research, translational studies, and clinical investigations, not only EMA-approved products.
MeSH ATMP classification
Source vocabulary
The identification strategy uses the Medical Subject Headings (MeSH) 2026 descriptor vocabulary (31,110 terms), parsed from the NLM XML release (desc2026.xml). Publications are classified as ATMP-related if indexed in Dimensions under one or more of 112 classified MeSH descriptors.
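A minimal sketch of reading canonical descriptor names and tree numbers from the NLM descriptor XML with Python's standard-library `xml.etree.ElementTree`. The element paths (`DescriptorRecord`, `DescriptorUI`, `DescriptorName/String`, `TreeNumberList/TreeNumber`) follow the published MeSH XML layout; the sample record and the function name `load_descriptors` are illustrative assumptions, not the platform's own code.

```python
import io
import xml.etree.ElementTree as ET

# Tiny illustrative record in the MeSH descriptor XML layout (not real MeSH data).
SAMPLE = """<DescriptorRecordSet LanguageCode="eng">
  <DescriptorRecord>
    <DescriptorUI>D999001</DescriptorUI>
    <DescriptorName><String>Example Gene Technique</String></DescriptorName>
    <TreeNumberList><TreeNumber>E05.999.100</TreeNumber></TreeNumberList>
    <ConceptList><Concept><TermList>
      <Term><String>Example Synonym</String></Term>
    </TermList></Concept></ConceptList>
  </DescriptorRecord>
</DescriptorRecordSet>"""


def load_descriptors(source):
    """Return one dict per descriptor with its UI, canonical name and tree numbers.

    Entry terms (ConceptList/.../Term/String) are read past but not kept,
    because only the canonical DescriptorName is used for querying.
    """
    records = []
    # iterparse streams the input, which keeps memory low for the full desc2026.xml
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "DescriptorRecord":
            continue
        records.append({
            "ui": elem.findtext("DescriptorUI"),
            "name": elem.findtext("DescriptorName/String"),
            "tree_numbers": [t.text for t in elem.findall("TreeNumberList/TreeNumber")],
        })
        elem.clear()  # release the processed record's children
    return records


records = load_descriptors(io.BytesIO(SAMPLE.encode("utf-8")))
```

For the real release, the same function can be called with the file path, e.g. `load_descriptors("desc2026.xml")`.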
Classification methodology
MeSH terms were selected in two passes:
- Keyword matching: terms were screened against a curated list of ATMP-relevant keywords spanning gene therapy, cell therapy, tissue engineering, vectors, editing tools, and regulatory biology
- Tree-number matching: MeSH tree numbers were used to capture entire sub-trees of related concepts (e.g., all descendants of D020871 - Gene Transfer Techniques)
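The two passes can be sketched as a union of a name-based filter and a tree-prefix filter. The keyword list, the tree roots, and the records below are hypothetical placeholders (the tree numbers are deliberately fake), not the platform's curated inputs; the output is a candidate set for the tiering step.

```python
# Hypothetical inputs: the real keyword list and tree roots are curated by the
# platform and are not reproduced here.
KEYWORDS = ["gene therapy", "crispr", "stem cell", "tissue engineering", "vector"]
TREE_ROOTS = ["E05.999"]  # illustrative tree-number prefix, not a real MeSH code

descriptors = [
    {"name": "CRISPR-Cas Systems", "tree_numbers": ["X01.001"]},
    {"name": "Example Transfer Method", "tree_numbers": ["E05.999.100.200"]},
    {"name": "Headache", "tree_numbers": ["X02.002"]},
]


def keyword_pass(records, keywords):
    """Pass 1: case-insensitive substring match on the canonical name."""
    kws = [k.lower() for k in keywords]
    return {r["name"] for r in records if any(k in r["name"].lower() for k in kws)}


def tree_pass(records, roots):
    """Pass 2: keep every descriptor with a tree number at or below a root."""
    def under(tree_number, root):
        # Match the root itself or a dotted descendant (so E05.99 never matches E05.999)
        return tree_number == root or tree_number.startswith(root + ".")

    return {
        r["name"]
        for r in records
        if any(under(tn, root) for tn in r["tree_numbers"] for root in roots)
    }


candidates = keyword_pass(descriptors, KEYWORDS) | tree_pass(descriptors, TREE_ROOTS)
```

Matching on `root + "."` rather than a bare prefix avoids sibling codes that merely share leading characters.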
This produced a three-tier classification:
- Certain ATMP (51 terms) — unambiguous ATMP concepts (e.g., CAR T-Cell Therapy, CRISPR-Cas9, Induced Pluripotent Stem Cells)
- Edge Cases (59 terms) — concepts with significant ATMP overlap but also broader use (e.g., Stem Cells, Viral Vectors, Tissue Engineering); included by default
- Non-ATMP (~31,000 remaining terms)
Expert validation of the classification is ongoing. The "Edge Cases" tier may be refined in future versions.
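Because the Edge Cases tier may change, one option is to keep the tier as data and derive the query set from it with a switch. This is a sketch under that assumption; the `Tier` enum, `query_terms`, and the `include_edge_cases` flag are hypothetical names, and the term assignments use only examples named above (plus one illustrative non-ATMP term).

```python
from enum import Enum


class Tier(Enum):
    CERTAIN = "certain"
    EDGE = "edge"
    NON_ATMP = "non_atmp"


# Illustrative assignments using examples from the text (not the full list)
TIERS = {
    "CAR T-Cell Therapy": Tier.CERTAIN,
    "Induced Pluripotent Stem Cells": Tier.CERTAIN,
    "Stem Cells": Tier.EDGE,
    "Tissue Engineering": Tier.EDGE,
    "Headache": Tier.NON_ATMP,
}


def query_terms(tiers, include_edge_cases=True):
    """Return sorted descriptor names to query; Edge Cases are included by default."""
    keep = {Tier.CERTAIN} | ({Tier.EDGE} if include_edge_cases else set())
    return sorted(name for name, tier in tiers.items() if tier in keep)
```

Calling `query_terms(TIERS, include_edge_cases=False)` would give a Certain-only sensitivity check.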
Note: Only canonical `DescriptorName` values are queried in Dimensions, never entry terms (synonyms). Dimensions indexes publications by canonical descriptor; querying synonyms would be redundant and introduce false positives.
Scheme D technology domains
For Scheme D analysis, ATMP publications are further grouped into 8 technology domains, four Fundamental (F1–F4) and four Enabling (E1–E4), aligned with the VR/VINNOVA Excellence Clusters for Groundbreaking Technologies grant framework:
| Code | Domain | Description |
|---|---|---|
| F1 | DNA tailoring | Gene editing, CRISPR, DNA repair and modification |
| F2 | Identity & Fate reprogramming | iPSCs, cell differentiation, epigenetic reprogramming |
| F3 | Delivery | Vectors, nanoparticles, delivery vehicles |
| F4 | Sensing & Control systems | Synthetic biology, gene switches, biosensors |
| E1 | Phenotyping | Omics, single-cell analysis, biomarkers |
| E2 | Bioprocessing | Cell culture, bioreactors, growth factors |
| E3 | Preclinical modelling | Animal models, organoids, disease models |
| E4 | Manufacturing | GMP processes, quality control, scale-up |
Each of the 112 MeSH terms is assigned a primary Scheme D domain (and optionally a secondary domain for multi-domain terms). Long-format assignments allow a paper to contribute to multiple domains if it is indexed under terms from different domains.
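The long-format expansion can be sketched as turning each paper's indexed terms into (paper, domain) rows. Two points are assumptions not stated above: secondary domains are counted alongside primary ones, and a paper counts once per domain even if several of its terms map to that domain. The term-to-domain map and paper data are hypothetical.

```python
# Hypothetical term -> (primary domain, optional secondary domain)
TERM_DOMAINS = {
    "CRISPR-Cas Systems": ("F1", None),
    "Induced Pluripotent Stem Cells": ("F2", "E2"),
    "Genetic Vectors": ("F3", None),
}

# Hypothetical paper -> indexed ATMP terms
PAPER_TERMS = {
    "pub.1": ["CRISPR-Cas Systems", "Genetic Vectors"],
    "pub.2": ["Induced Pluripotent Stem Cells"],
    "pub.3": ["CRISPR-Cas Systems"],
}


def long_format(paper_terms, term_domains):
    """Expand papers into unique (pub_id, domain) rows."""
    rows = set()  # a set, so a paper counts once per domain (assumption)
    for pub_id, terms in paper_terms.items():
        for term in terms:
            primary, secondary = term_domains[term]
            rows.add((pub_id, primary))
            if secondary:
                rows.add((pub_id, secondary))
    return sorted(rows)


def papers_per_domain(rows):
    """Count papers contributing to each domain."""
    counts = {}
    for _pub_id, domain in rows:
        counts[domain] = counts.get(domain, 0) + 1
    return counts
```

Domain counts built this way sum to more than the number of papers, which is the intended effect of the long format.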
Publication data
Source
Publications are retrieved from the Dimensions API (Digital Science) using institution-level queries filtered by MeSH descriptor. Dimensions aggregates publications from PubMed, Crossref, and other bibliographic databases.
Query strategy
Each of the 112 ATMP MeSH descriptors is queried independently. Results are deduplicated by Dimensions publication ID (pub_id). A publication qualifies as ATMP-related if it carries at least one of the 112 classified MeSH terms in its Dimensions metadata.
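Since each descriptor is queried separately, the same publication can appear in several result sets. A minimal sketch of the merge, assuming each result record carries a `pub_id` field; the result sets are hypothetical. Keeping the matched terms per paper is a design choice that also supports the Scheme D domain assignment.

```python
from collections import defaultdict

# Hypothetical per-descriptor result sets, as returned by separate queries
results_by_term = {
    "CRISPR-Cas Systems": [{"pub_id": "pub.1", "year": 2020}, {"pub_id": "pub.2", "year": 2021}],
    "Genetic Vectors": [{"pub_id": "pub.2", "year": 2021}, {"pub_id": "pub.3", "year": 2019}],
}


def deduplicate(results_by_term):
    """Merge per-term results into one record per pub_id, remembering matched terms."""
    pubs = {}
    matched = defaultdict(set)
    for term, records in results_by_term.items():
        for rec in records:
            pubs.setdefault(rec["pub_id"], rec)  # keep the first copy seen
            matched[rec["pub_id"]].add(term)
    return pubs, dict(matched)


pubs, matched_terms = deduplicate(results_by_term)
```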
Country and institution attribution
Country and institutional affiliations are taken directly from Dimensions' parsed affiliation data. Each publication can have multiple country attributions, one for each country among its affiliated institutions. The unit of analysis in cross-country comparisons is therefore the paper-country pair, not the unique paper:
- A paper with Swedish and German co-authors counts once for Sweden and once for Germany
- International co-authorship rates and "leadership" (first-author country) are computed from per-paper affiliation arrays, where author order follows the Dimensions JSON array order (which reflects the published byline order)
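The counting rules above can be sketched from per-paper author arrays in byline order. The data layout (a list of country-code lists per author) and the handling of a first author with several affiliations (all of their countries count as leading) are assumptions for illustration.

```python
# Hypothetical per-paper author arrays in byline order; each author lists the
# country codes of their affiliations.
papers = {
    "pub.1": [["SE"], ["DE"], ["SE"]],
    "pub.2": [["US"], ["US", "CA"]],
    "pub.3": [["SE"]],
}


def paper_country_pairs(papers):
    """Unique (pub_id, country) pairs: a paper counts once per country."""
    return {(pid, c) for pid, authors in papers.items() for affs in authors for c in affs}


def is_international(authors):
    """True if the paper's affiliations span more than one country."""
    return len({c for affs in authors for c in affs}) > 1


def lead_countries(authors):
    """Countries of the first author (byline position 0); may be several."""
    return set(authors[0]) if authors else set()


def international_rate(papers, country):
    """Share of a country's papers that are internationally co-authored."""
    own = [a for a in papers.values() if any(country in affs for affs in a)]
    return sum(is_international(a) for a in own) / len(own) if own else 0.0
```

Here the two Swedish authors on `pub.1` still produce a single (`pub.1`, `SE`) pair.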
Known limitations
- Affiliation parsing is imperfect; some affiliations are unresolved or incorrectly attributed
- Conference papers and preprints are included if indexed in Dimensions with MeSH terms
- MeSH indexing in PubMed typically lags 6–18 months for recent publications; 2023–2024 data may be undercounted
Citation metrics
Citation counts
Raw citation counts are from Dimensions and represent forward citations to each focal publication as of the download date (2026). Self-citations are not excluded.
Relative Citation Ratio (RCR)
The Relative Citation Ratio (Hutchins et al., 2016) is a field- and time-normalised citation metric from the NIH iCite database. It is benchmarked against NIH-funded papers: an RCR of 1.0 means a paper has been cited at the same rate per year as the median NIH-funded paper in its field, and an RCR above 1.0 indicates citation above that benchmark.
- Coverage: ~93% of ATMP papers published ≤2022 have an RCR value; papers published in 2023–2024 are largely excluded (iCite requires ≥2 years of citation accumulation)
- Computation: iCite computes RCR by comparing citation rates against a co-citation network of papers in the same field, accounting for both field and year of publication
- Source: NIH iCite API, joined to publications via DOI
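For orientation, the construction published by Hutchins et al. (2016) can be restated roughly as follows. This is a simplified summary, not iCite's implementation: $c_i$ is the paper's citation count, $t_i$ the years since publication, $\mathrm{FCR}_i$ the field citation rate derived from the paper's co-citation network, and the coefficients for publication year $y$ come from regressing article citation rates on field citation rates for NIH R01-funded benchmark papers from that year.

```latex
\mathrm{ACR}_i = \frac{c_i}{t_i}, \qquad
\mathrm{ECR}_i = \hat{\beta}_{0,y} + \hat{\beta}_{1,y}\,\mathrm{FCR}_i, \qquad
\mathrm{RCR}_i = \frac{\mathrm{ACR}_i}{\mathrm{ECR}_i}
```

ACR is the article citation rate and ECR the expected citation rate; iCite's own documentation remains the authoritative definition.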
Commercial potential score (compot)
The commercial potential score (compot) is a proprietary metric provided by Dimensions (Digital Science). It estimates the likelihood that a publication will be cited in a patent, based on the citation patterns of similar papers in the Dimensions network.
- Scale: continuous, typically 0–1 (papers without a score have `compot = NULL` or `compot = 0`)
- Coverage: approximately 60–70% of ATMP publications have a non-zero compot score; papers without a score are excluded from compot analysis
- Science potential score (scipot): also a Dimensions metric, but not available in the current dataset; all values are `NULL`
Papers without a compot score are excluded from all commercial potential analysis. Results represent the subset of papers for which Dimensions has computed a score, which may not be a random sample.
Altmetric data
Source
Altmetric mentions are retrieved from the Altmetric Details API (Altmetric.com / Digital Science) for all publications with a DOI. Coverage depends on Altmetric having indexed the publication.
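Joins to Altmetric (and to iCite) depend on DOIs matching across sources. DOI names are case-insensitive and often stored with different prefixes, so normalising them first is a common safeguard. The sketch below assumes the prefix variants shown; `normalize_doi` and `join_on_doi` are hypothetical helpers, not the platform's code.

```python
import re

# Common prefixes seen in DOI fields; the exact variants in the platform's
# data are an assumption.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw):
    """Return a lower-cased bare DOI ('10.xxxx/...'), or None if unusable."""
    if not raw:
        return None
    doi = _PREFIX.sub("", raw.strip()).lower()
    return doi if doi.startswith("10.") else None


def join_on_doi(publications, metrics):
    """Attach metric records (keyed by raw DOI) to publications via normalised DOI."""
    by_doi = {normalize_doi(k): v for k, v in metrics.items() if normalize_doi(k)}
    joined = {}
    for pub_id, raw_doi in publications.items():
        key = normalize_doi(raw_doi)
        if key in by_doi:
            joined[pub_id] = by_doi[key]
    return joined
```

Publications without a usable DOI simply drop out of the join, which matches the coverage caveat above.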
Coverage types
| Type | Description |
|---|---|
| News | Mentions in news outlets tracked by Altmetric |
| Blogs | Mentions in research or science blogs |
| Patents | Patent applications or grants that cite the publication (via USPTO, EPO, WIPO, and national patent offices) |
| Policy documents | Government policy documents and reports citing the publication |
| Clinical guidelines | Clinical practice guidelines citing the publication |
| Clinical trials | Registered clinical trials (ClinicalTrials.gov and WHO ICTRP) that cite the publication |
Patent jurisdiction classification
Patent citations are classified into jurisdiction groups based on the filing office:
| Group | Offices included |
|---|---|
| US | USPTO (United States Patent and Trademark Office) |
| EP | EPO (European Patent Office) |
| WIPO | PCT international applications |
| CN | CNIPA (China National Intellectual Property Administration) |
| JP | JPO (Japan Patent Office) |
| KR | KIPO (Korean Intellectual Property Office) |
| RestEurope | All other European national patent offices (including SE, which is also counted in the separate SE column) |
| RestWorld | All remaining offices |
Sweden is additive: Swedish patents (`SE` jurisdiction) are counted in both `RestEurope` and the separate `SE` column. This is intentional: it allows Sweden to be compared with its regional peer group without double-subtraction. As a consequence, the `SE` column should not be added to the group totals.
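A sketch of the grouping and the additive SE column, assuming filing offices arrive as two-letter codes with `WO` for PCT applications. The European office set is abbreviated and illustrative, not the platform's full list.

```python
from collections import Counter

# Assumed office codes: two-letter jurisdiction codes, with "WO" for PCT filings.
DIRECT = {"US": "US", "EP": "EP", "WO": "WIPO", "CN": "CN", "JP": "JP", "KR": "KR"}
# Abbreviated, illustrative set of European national offices (not exhaustive).
EUROPE_NATIONAL = {"SE", "DE", "FR", "GB", "DK", "FI", "NO", "ES", "IT", "NL"}


def patent_group(office):
    """Map one filing office to exactly one jurisdiction group."""
    if office in DIRECT:
        return DIRECT[office]
    return "RestEurope" if office in EUROPE_NATIONAL else "RestWorld"


def tabulate(offices):
    """Count citations per group, plus an additive SE column.

    SE citations are already inside RestEurope; the SE column repeats them,
    so SE must never be summed with the group columns.
    """
    counts = Counter(patent_group(o) for o in offices)
    counts["SE"] = sum(1 for o in offices if o == "SE")
    return counts
```

Every citation lands in exactly one group, so the group columns (excluding SE) always sum to the total.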
Policy and guideline source classification
242 unique policy sources and 997 unique guideline sources were manually classified by scope:
| Classification | Description |
|---|---|
| `international` | WHO, UN, OECD, ICH, and other supranational bodies |
| `eu_regional` | EMA, European Commission, ECDC, and EU bodies |
| `national` | National health ministries, agencies, and regulatory bodies |
Each source was assigned a certainty level (high / medium). 235 of 242 policy sources and 933 of 997 guideline sources were classified at high certainty. The SE column in policy/guideline tables uses `location == "SE"` (Altmetric's reported document location), not the source name.
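A sketch of how scope counts and the SE column can be derived separately: scope comes from the manual source lookup, while SE comes only from the reported document location. The lookup entries are hypothetical, and the `unclassified` fallback is an illustrative safeguard, not a category used by the platform.

```python
from collections import Counter

# Hypothetical manual lookup: source name -> (scope, certainty)
SOURCE_SCOPE = {
    "World Health Organization": ("international", "high"),
    "European Medicines Agency": ("eu_regional", "high"),
    "Example national agency": ("national", "medium"),
}


def summarise_mentions(mentions):
    """Count mentions by scope; the SE column uses the document location only."""
    by_scope = Counter()
    se = 0
    for m in mentions:
        # fallback for sources missing from the lookup (hypothetical)
        scope, _certainty = SOURCE_SCOPE.get(m["source"], ("unclassified", None))
        by_scope[scope] += 1
        if m.get("location") == "SE":  # Altmetric-reported location, not the source name
            se += 1
    return by_scope, se
```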
Clinical trials
Source
Clinical trial metadata is retrieved from Dimensions via the clinical_trials endpoint. Dimensions links publications to registered trials via citation and metadata matching.
Trial phase classification
Trials are grouped into three phase groups, plus a Not Reported category:
| Group | Included phases |
|---|---|
| Early | Phase 1, Phase 1/2 |
| Mid | Phase 2, Phase 2/3 |
| Late | Phase 3, Phase 3/4, Phase 4 |
| Not Reported | Phase not specified or "N/A" |
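A sketch of the phase mapping, assuming raw labels arrive as free text such as `Phase 1/Phase 2`; the exact spellings in the Dimensions data are an assumption. Labels outside the table (e.g. `Early Phase 1`) fall back to Not Reported here, which may differ from the platform's handling.

```python
# Assumed normalised keys; raw spellings in the source data may vary.
PHASE_GROUPS = {
    "phase 1": "Early", "phase 1/2": "Early",
    "phase 2": "Mid", "phase 2/3": "Mid",
    "phase 3": "Late", "phase 3/4": "Late", "phase 4": "Late",
}


def phase_group(raw):
    """Map a raw phase label to Early / Mid / Late / Not Reported."""
    if not raw:
        return "Not Reported"
    # Normalise variants like "Phase 1/Phase 2" or "PHASE2/PHASE3" to "phase 1/2"
    key = raw.lower().replace(" ", "").replace("phase", "")
    key = "phase " + key if key else ""
    # Anything not in the table (including "N/A") is treated as Not Reported
    return PHASE_GROUPS.get(key, "Not Reported")
```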
Trial geography
Trial geography is based on the country of the registering organisation (`trials_orgs.csv`). A trial can be attributed to multiple countries. "Translation rate" is the number of publications linked to ≥1 trial divided by the total number of publications in that group.
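The translation rate can be sketched as below, assuming the "group" is defined by publication country attribution; the inputs and trial identifiers are hypothetical.

```python
# Hypothetical inputs: pub_id -> countries, and pub_id -> linked trial IDs
pub_countries = {"pub.1": {"SE"}, "pub.2": {"SE", "DE"}, "pub.3": {"DE"}}
pub_trials = {"pub.1": {"trial-A"}, "pub.3": set()}


def translation_rate(pub_countries, pub_trials, country):
    """Publications linked to >= 1 trial divided by all publications for the country."""
    pubs = [p for p, cs in pub_countries.items() if country in cs]
    if not pubs:
        return None  # undefined for a group with no publications
    linked = sum(1 for p in pubs if pub_trials.get(p))
    return linked / len(pubs)
```

Returning `None` rather than 0 for an empty group keeps "no publications" distinct from "no translation".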
Funder classification
Extraction
Funder names are extracted from Dimensions publication metadata. Each paper can have multiple funders. 6,092 unique funder name strings were extracted from the ATMP corpus.
Classification taxonomy
| Category | Description |
|---|---|
| `public_se` | Swedish public research councils and grant agencies (VR, Vetenskapsrådet, FORMAS, FORTE, MISTRA, Vinnova, KAW) |
| `public_eu` | EU funding bodies (Horizon 2020, Horizon Europe, ERC, Marie Curie) |
| `public_foreign` | Public research councils and government agencies outside Sweden and the EU |
| `corporate` | Private companies and industry funders |
| `foundation` | Private philanthropic foundations (Wellcome Trust, Gates Foundation, etc.) |
| `unknown` | Unclassified or unrecognised funder name |
Coverage note
252 of 6,092 unique funders have been classified (48 public_se, 108 foundation, 83 public_foreign, 7 public_eu, 6 corporate). The long tail of funders (≤15 papers each) is intentionally left as unknown. Classification coverage is concentrated in the high-volume funders that drive the majority of funded papers.
Dimensions funder name strings often differ from assumed short forms (e.g., "Wellcome Trust Ltd" not "Wellcome Trust"). Name matching is therefore imperfect, and some known funders may be missed due to string variation.
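One way to reduce misses from string variation is to normalise names before lookup. This sketch is not the platform's matching: the lookup entries are illustrative, and the suffix list and character folding are assumptions. Unmatched names fall through to `unknown`.

```python
import re

# Hypothetical lookup keyed by normalised name; the real table covers 252 funders.
FUNDER_CATEGORY = {
    "vetenskapsradet": "public_se",
    "swedish research council": "public_se",
    "wellcome trust": "foundation",
    "european research council": "public_eu",
}

# Illustrative legal-form suffixes stripped before lookup (an assumption)
_SUFFIX = re.compile(r"\b(ltd|limited|inc|gmbh|ab)\.?$")


def normalise(name):
    """Lower-case, fold å/ä/ö, drop punctuation and a trailing legal suffix."""
    s = name.lower().translate(str.maketrans("åäö", "aao"))
    s = re.sub(r"[^\w\s]", " ", s)
    s = re.sub(r"\s+", " ", s).strip()
    return _SUFFIX.sub("", s).strip()


def classify_funder(name):
    return FUNDER_CATEGORY.get(normalise(name), "unknown")
```

With this, "Wellcome Trust Ltd" resolves to the same key as "Wellcome Trust", although variants such as added acronyms would still be missed.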
Sample restrictions
- Publications with `year IS NULL` are excluded from all time-series analysis
- Publications with `compot IS NULL OR compot = 0` are excluded from commercial potential analysis
- Publications with `rcr IS NULL OR rcr = 0` are excluded from RCR analysis
- Country-level comparisons exclude records with `country_code IS NULL`
- The global baseline in Sweden-vs-global comparisons excludes Swedish papers to avoid contamination of the reference distribution
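A sketch of these restrictions applied to paper-country rows, ending in a Sweden-vs-global RCR comparison. Two choices are assumptions: any paper with a Swedish affiliation is removed from the global baseline (including co-authored papers), and each paper counts once per group. Field names follow the restrictions above; the rows are hypothetical.

```python
from statistics import median

# Hypothetical paper-country rows with the fields named in the restrictions
rows = [
    {"pub_id": "p1", "year": 2019, "country_code": "SE", "rcr": 1.8, "compot": 0.4},
    {"pub_id": "p2", "year": 2020, "country_code": "US", "rcr": 0.9, "compot": 0.0},
    {"pub_id": "p3", "year": None, "country_code": "DE", "rcr": 1.2, "compot": None},
    {"pub_id": "p4", "year": 2021, "country_code": None, "rcr": 0.0, "compot": 0.7},
    {"pub_id": "p5", "year": 2018, "country_code": "SE", "rcr": 1.5, "compot": 0.2},
    {"pub_id": "p5", "year": 2018, "country_code": "FR", "rcr": 1.5, "compot": 0.2},
]


def in_time_series(r):
    return r["year"] is not None  # year IS NULL -> excluded


def in_compot_analysis(r):
    return r["compot"] is not None and r["compot"] != 0  # NULL or 0 -> excluded


def valid_rcr(r):
    return r["rcr"] is not None and r["rcr"] != 0  # NULL or 0 -> excluded


def sweden_vs_global_rcr(rows):
    """Median RCR for Swedish papers vs. a global baseline that excludes them."""
    usable = [r for r in rows if r["country_code"] is not None and valid_rcr(r)]
    se_ids = {r["pub_id"] for r in usable if r["country_code"] == "SE"}
    # Dicts keyed by pub_id so each paper counts once per group
    se = {r["pub_id"]: r["rcr"] for r in usable if r["pub_id"] in se_ids}
    world = {r["pub_id"]: r["rcr"] for r in usable if r["pub_id"] not in se_ids}
    return median(se.values()), median(world.values())
```

In this toy data the Swedish-French paper `p5` contributes to the Swedish median but not to the baseline.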
Key decisions log
| Date | Decision |
|---|---|
| 2026-05-07 | Query Dimensions by DescriptorName only, not entry terms |
| 2026-05-17 | Policy/guideline sources classified manually (242 + 997 sources) |
| 2026-05-17 | SE additive in patent and trial tables (SE ⊂ RestEurope, SE ⊂ Europe) |
| 2026-05-19 | Scheme D taxonomy finalised (F1–F4 Fundamental + E1–E4 Enabling) |
| 2026-05-22 | All 18 formerly-Misfit Scheme D terms reclassified by domain expert |
| 2026-05-25 | RCR used as primary quality metric; scipot unavailable (all NULL) |
Citation
If citing this platform or its outputs, please use:
ATMP Research Platform (2026). Descriptive analysis of global ATMP research output and Sweden's position. Developed in support of the VR/VINNOVA Excellence Clusters for Groundbreaking Technologies proposal. Stockholm: Stockholm School of Economics.
Data: Dimensions (publications, citations, clinical trials) · Altmetric (patent, policy, guideline, news, blog mentions) · NIH iCite (RCR) · MeSH 2026 ATMP classification: 112 terms, expert-validated. Platform built with Observable Framework 1.13.4 and Apache DuckDB.