CENTRAL creation details

CENTRAL comprises records retrieved from PubMed/MEDLINE, Embase, ClinicalTrials.gov, KoreaMed, all Review Groups' Specialized Registers, and records identified by handsearching various biomedical sources, as described below.

Identifying PubMed/MEDLINE RCTs and CCTs for inclusion in CENTRAL

Metaxis, on behalf of Cochrane, identify all the records in PubMed that are indexed with the Publication Type (PT) randomized controlled trial or controlled clinical trial (or both) in humans. To identify ‘human’ records, they include all records that are indexed as humans, as both humans and animals, or as neither humans nor animals, and exclude only those records that are indexed as animals but not also as humans. In addition to identifying ‘new’ records that are added to PubMed and meet the above criteria, they also replace any records that have undergone corrections or updates (such as corrections to the citation, updating of indexing, or links to retractions or errata).

The PubMed search string used by Metaxis to generate the record set is:
((("randomized controlled trial"[Publication Type] OR "controlled clinical trial"[Publication Type]) NOT (ANIMALS[MH] NOT HUMANS[MH])) AND (("start"[LR] : "end"[LR]) OR ("start"[CRDT] : "end"[CRDT])))
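As an illustration only, the search string above can be assembled from a date window as follows. This is a minimal sketch, not Metaxis's actual tooling: the function name `build_pubmed_query` is hypothetical, and the expected date format (PubMed's YYYY/MM/DD) is an assumption.

```python
def build_pubmed_query(start: str, end: str) -> str:
    """Fill the "start"/"end" placeholders of the search string shown above.

    Assumption: dates are supplied in PubMed's YYYY/MM/DD form.
    """
    trial_types = (
        '("randomized controlled trial"[Publication Type] '
        'OR "controlled clinical trial"[Publication Type])'
    )
    # Exclude only records indexed as animals but not also as humans.
    human_filter = "NOT (ANIMALS[MH] NOT HUMANS[MH])"
    # The window is applied to two date fields ([LR] and [CRDT]) so that both
    # revised records and newly created records are retrieved.
    date_window = (
        f'(("{start}"[LR] : "{end}"[LR]) OR ("{start}"[CRDT] : "{end}"[CRDT]))'
    )
    return f"(({trial_types} {human_filter}) AND {date_window})"


if __name__ == "__main__":
    print(build_pubmed_query("2017/08/01", "2017/08/31"))
```

Searching on both date fields matches the two tasks described above: picking up records new to PubMed and replacing records that have since been corrected or updated.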

To reflect changes in healthcare practice and to conform with terminology in current use, the MEDLINE/PubMed thesaurus (MeSH) is updated annually by the US National Library of Medicine (NLM), with new terms added, some old terms removed and some terms replaced.

At the end of each year, the new MeSH thesaurus is released by NLM, at which point Metaxis download all CENTRAL records that have PubMed IDs and refresh all fields including the MeSH field. The updated thesaurus is added to the Cochrane Library’s search interface (usually in January of the following year).

A substantial proportion of the MEDLINE records with randomized controlled trial or controlled clinical trial in the Publication Type field have been indexed as such as a result of Cochrane’s trial identification work (2011). In the past, handsearch results from Cochrane groups, centres and fields were sent to the NLM, so that the MEDLINE records could be re-tagged with the publication types randomized controlled trial or controlled clinical trial as appropriate. In addition, the US Cochrane Center and the UK Cochrane Centre searched MEDLINE for randomized controlled trials and quasi-randomized controlled trials using the Cochrane Highly Sensitive Search Strategy for the following years: US Cochrane Center (1966–1984; 1998–2004) and UK Cochrane Centre (1985–1997) (Lefebvre et al 2011). All of the trials identified through this search are also included in CENTRAL. This project to re-tag randomized controlled trials and quasi-randomized controlled trials in MEDLINE ceased, due to lack of funding, once the 2004 data had been processed. MEDLINE records tagged with the randomized controlled trial or controlled clinical trial Publication Type continue, however, to be used as a core building block for the development of CENTRAL and the Cochrane Register of Studies (CRS).

Identifying Embase RCTs and CCTs for inclusion in CENTRAL

Background

A retrospective search for reports of trials in Embase was completed by the UK Cochrane Centre (UKCC) for the years 1980 to 2009. For 1980 to 2008 the free-text terms searched were: random$; factorial$; crossover$; placebo$; doubl$ adj blind$; singl$ adj blind$; assign$; allocat$; volunteer$; and the index terms (known as Emtree terms) searched were: crossover-procedure; double-blind procedure; randomized controlled trial; single-blind procedure. For 2009 the same searches were run but the terms trial and comparison were limited to the title only and the terms factorial$, assign$ and volunteer$ were no longer included.

A separate search for the years 1974 to 1979 inclusive was completed using the free-text terms: random$; factorial$; crossover$ and placebo$.

The UKCC continued to process Embase records until 2011; the last set comprised all records published in 2010. Records for this set were identified using the following index terms: crossover procedure, double-blind procedure, single-blind procedure and randomized controlled trial; and the following free-text terms were searched limited to the title, abstract and original title fields only: crossover$, cross over$, placebo$, doubl$ adj blind$, allocat$, random$. The term trial$ was searched limited to the title only. The 2010 data set was added to CENTRAL in two batches: the first batch of approximately 1,700 records was added to CENTRAL in October 2011 (Issue 4) and the second batch of approximately 3,100 records was added in January 2012 (Issue 1).

In March 2013, the contract to identify Embase records was awarded to a consortium made up of Metaxis Ltd, the Cochrane Dementia and Cognitive Improvement Group, and York Health Economics Consortium (YHEC). The new contract covered both clearing the backlog of records that had not been added to CENTRAL between the end of the UKCC contract and the start of the new contract, and the ongoing addition of new Embase records. Some Embase records go directly into CENTRAL as described below, and others need to be screened to determine whether they meet the criteria for inclusion in CENTRAL. A description of the screening process is provided below.

Clearing the backlog

A search of Embase covering January 2011 to December 2013 was run via Ovid SP using the Emtree headings Randomized Controlled Trial (RCT) or Controlled Clinical Trial (CCT), from which 28,442 unique Embase records were identified and published in CENTRAL in January 2014 (Issue 1). It is estimated that this search, using only these two headings, identified around two-thirds of the eligible records from the 2011-2013 Embase backlog.

The same search was used to retrieve conference abstracts published between 2010 and October 2014 and resulted in the publication of 9,193 records in CENTRAL in October 2014 (Issue 10).

A further 20,655 records were identified through a screening process covering January 2011 to December 2013, inclusive, for journal publications, and January 2010 to December 2013, inclusive, for conference records. These records had been identified through the UKCC search strategy and did not have the Emtree headings RCT or CCT. They were published in CENTRAL in December 2014 (Issue 12).

The infographic below details the numbers retrieved and the numbers published for the backlog period for both journal records and conference records.

[Infographic: Backlog records that need to be screened]

[Infographic: Backlog records that were automatic]

Ongoing search and retrieval of records

Since the beginning of March 2014, eligible Embase records have been identified prospectively. This process entails the identification of two sets of records each month: Set 1 is identified using the Emtree heading Randomized Controlled Trial (RCT), and Set 2 is identified by a newly developed search outlined below.

Between the beginning of March 2016 and September 2016, two key Emtree terms formed the Set 1 records: Randomized Controlled Trial and Controlled Clinical Trial. After an analysis of the records being fed directly into CENTRAL as part of this process, it was decided to drop the Controlled Clinical Trial term from the direct feed and instead add the term to the Set 2 search, because the term was contributing a high proportion of non-RCT records directly into CENTRAL.

Set 1: Records with Publication Type RCT

The first set of records retrieved each month, using the Randomized Controlled Trial (RCT) Emtree heading, has the following filter applied to help identify animal studies:

  1. exp experimental organism/
  2. animal tissue/
  3. animal cell/
  4. exp animal disease/
  5. exp carnivore disease/
  6. exp bird/
  7. exp experimental animal welfare/
  8. exp animal husbandry/
  9. animal behavior/
  10. exp animal cell culture/
  11. exp mammalian disease/
  12. exp mammal/
  13. exp marine species/
  14. nonhuman/
  15. animal.hw.
  16. or/1-15
  17. 16 not human/
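The logic of lines 16 and 17 (match any animal-related heading, then keep only records not also indexed as human) can be sketched as below. This is a simplified illustration, not the Ovid implementation. It assumes each record's index headings have already been expanded, so that the "exp" (explode) behaviour is not reproduced, and the names `ANIMAL_HEADINGS` and `is_animal_only` are hypothetical.

```python
# Headings taken from lines 1-14 of the filter above (explosion not modelled).
ANIMAL_HEADINGS = {
    "experimental organism", "animal tissue", "animal cell", "animal disease",
    "carnivore disease", "bird", "experimental animal welfare",
    "animal husbandry", "animal behavior", "animal cell culture",
    "mammalian disease", "mammal", "marine species", "nonhuman",
}


def is_animal_only(headings: set[str]) -> bool:
    """Return True when a record matches line 17: (or/1-15) not human/."""
    lowered = {h.lower() for h in headings}
    # Lines 1-14: any of the listed headings is present.
    heading_match = bool(lowered & ANIMAL_HEADINGS)
    # Line 15 (animal.hw.): the word "animal" appears in any heading.
    heading_word_match = any("animal" in h.split() for h in lowered)
    # Line 17: drop the match if the record is also indexed as human.
    return (heading_match or heading_word_match) and "human" not in lowered


if __name__ == "__main__":
    print(is_animal_only({"mammal", "randomized controlled trial"}))
    print(is_animal_only({"mammal", "human"}))
```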

Records with the publication type RCT are loaded into CENTRAL in the issue following the month they appeared in Embase. For example, records that appeared in Embase during February are retrieved in March and appear in the March issue of CENTRAL.

Set 2: Records retrieved by the new search

The search strategy used to retrieve the second set of records each month evolved during 2014-17. The following string has been used since January 2015, with one amendment made in September 2017: the Controlled clinical trial Emtree term was removed from Set 1 and added to Set 2, as shown below:

  1. Randomized controlled trial/
  2. Controlled clinical study/
  3. Random$.ti,ab.
  4. randomization/
  5. intermethod comparison/
  6. placebo.ti,ab.
  7. (compare or compared or comparison).ti.
  8. ((evaluated or evaluate or evaluating or assessed or assess) and (compare or compared or comparing or comparison)).ab.
  9. (open adj label).ti,ab.
  10. ((double or single or doubly or singly) adj (blind or blinded or blindly)).ti,ab.
  11. double blind procedure/
  12. parallel group$1.ti,ab.
  13. (crossover or cross over).ti,ab.
  14. ((assign$ or match or matched or allocation) adj5 (alternate or group$1 or intervention$1 or patient$1 or subject$1 or participant$1)).ti,ab.
  15. (assigned or allocated).ti,ab.
  16. (controlled adj7 (study or design or trial)).ti,ab.
  17. (volunteer or volunteers).ti,ab.
  18. human experiment/
  19. trial.ti.
  20. or/2-19
  21. 20 not 1
  22. (random$ adj sampl$ adj7 ("cross section$" or questionnaire$1 or survey$ or database$1)).ti,ab. not (comparative study/ or controlled study/ or randomi?ed controlled.ti,ab. or randomly assigned.ti,ab.)
  23. Cross-sectional study/ not (randomized controlled trial/ or controlled clinical study/ or controlled study/ or randomi?ed controlled.ti,ab. or control group$1.ti,ab.)
  24. (((case adj control$) and random$) not randomi?ed controlled).ti,ab.
  25. (Systematic review not (trial or study)).ti.
  26. (nonrandom$ not random$).ti,ab.
  27. "Random field$".ti,ab.
  28. (random cluster adj3 sampl$).ti,ab.
  29. (review.ab. and review.pt.) not trial.ti.
  30. "we searched".ab. and (review.ti. or review.pt.)
  31. "update review".ab.
  32. (databases adj4 searched).ab.
  33. (rat or rats or mouse or mice or swine or porcine or murine or sheep or lambs or pigs or piglets or rabbit or rabbits or cat or cats or dog or dogs or cattle or bovine or monkey or monkeys or trout or marmoset$1).ti. and animal experiment/
  34. Animal experiment/ not (human experiment/ or human/)
  35. or/22-34
  36. 21 not 35

This set of records is retrieved and screened in the month after they appeared in Embase, before being added to CENTRAL the month after that. For example, records that do not have the RCT heading and appeared in Embase during February are retrieved and screened in March and will appear in the April issue of CENTRAL.
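The two publication lags described for Set 1 and Set 2 (one month and two months after a record appears in Embase, respectively) can be sketched as a small date calculation. The function name `central_issue` is hypothetical, and the sketch assumes issues are identified simply by calendar year and month.

```python
def central_issue(embase_year: int, embase_month: int, record_set: int) -> tuple[int, int]:
    """Return the (year, month) of the CENTRAL issue a record is expected in.

    record_set 1: records with the RCT heading, loaded the following month.
    record_set 2: screened records, loaded two months after appearing in Embase.
    """
    lag_in_months = {1: 1, 2: 2}[record_set]
    # Work with a zero-based running month count so that year ends roll over.
    index = embase_year * 12 + (embase_month - 1) + lag_in_months
    return index // 12, index % 12 + 1


if __name__ == "__main__":
    # Records that appeared in Embase during February 2017.
    print(central_issue(2017, 2, record_set=1))  # March issue
    print(central_issue(2017, 2, record_set=2))  # April issue
```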

Current screening process (from January 2017)

Records for which CENTRAL eligibility is unclear (i.e. records from Set 2) go through a two-stage screening process using Cochrane’s RCT machine classifier and Cochrane’s new platform, Cochrane Crowd, which have been built as part of Project Transform’s Evidence Pipeline. In the first stage, the machine classifier estimates the likelihood that the record describes a randomized trial. Records with a likelihood score of 10% or less are discarded. In the second stage, records with a likelihood score of 11% or more are sent to Cochrane Crowd to be screened by humans. Performance evaluations show over 99% accuracy at the thresholds described above. Within Cochrane Crowd, every record is screened at least twice, with all disagreements resolved by two experienced expert screeners.
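A minimal sketch of this two-stage flow is given below. It is illustrative only and does not reflect the internals of the classifier or of Cochrane Crowd. It assumes scores are expressed as fractions between 0 and 1, that any score above 0.10 counts as "11% or more", and that Crowd decisions arrive as a list of yes/no votes. The names `screen_record` and `resolve` are hypothetical, with `resolve` standing in for the expert resolution step.

```python
from typing import Callable, Sequence

DISCARD_THRESHOLD = 0.10  # "10% or less" is discarded without human screening


def screen_record(
    score: float,
    crowd_votes: Sequence[bool],
    resolve: Callable[[], bool],
) -> bool:
    """Return True if the record is judged eligible for CENTRAL."""
    # Stage 1: the machine classifier discards low-likelihood records.
    if score <= DISCARD_THRESHOLD:
        return False
    # Stage 2: every remaining record is screened at least twice by the crowd.
    if len(crowd_votes) < 2:
        raise ValueError("each record must be screened at least twice")
    if all(crowd_votes) or not any(crowd_votes):
        return crowd_votes[0]  # unanimous decision
    # Disagreements go to the expert screeners (hypothetical callback).
    return resolve()


if __name__ == "__main__":
    print(screen_record(0.05, [], resolve=lambda: True))
    print(screen_record(0.60, [True, False], resolve=lambda: True))
```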

Identifying ClinicalTrials.gov RCTs and CCTs for inclusion in CENTRAL

From August 2017, eligible ClinicalTrials.gov (CT.gov) records are being identified and systematically added to CENTRAL through Cochrane’s Centralised Search Service project.

Process description

All CT.gov records go through Cochrane’s RCT machine classifier, and some also go through Cochrane Crowd (crowd.cochrane.org). The classifier provides a likelihood score for each record being a randomized or quasi-randomized trial report. Records with a likelihood score of 80% or more are submitted directly to CENTRAL. Records with a likelihood score of 10% or less are rejected without any further action. Records that score 11%-79% are sent to Cochrane Crowd to be screened by humans. Performance evaluations show over 99% accuracy at the thresholds described above.
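The three-way routing described above can be sketched as a single function. This is an illustration under stated assumptions rather than the production logic: scores are taken to be fractions between 0 and 1, everything strictly between the two thresholds is routed to the crowd, and the name `triage_ctgov` and the returned labels are hypothetical.

```python
DIRECT_THRESHOLD = 0.80  # "80% or greater": submitted directly to CENTRAL
REJECT_THRESHOLD = 0.10  # "10% or less": rejected without further action


def triage_ctgov(score: float) -> str:
    """Route a ClinicalTrials.gov record according to its classifier score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a fraction between 0 and 1")
    if score >= DIRECT_THRESHOLD:
        return "central"
    if score <= REJECT_THRESHOLD:
        return "reject"
    return "crowd"  # the 11%-79% band is screened by humans


if __name__ == "__main__":
    for example in (0.95, 0.45, 0.03):
        print(example, triage_ctgov(example))
```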

Backlog

The backlog stood at a total of 248,928 records as of the end of August 2017. Of these, 72,030 have a classifier score of 10% or less; these records will be rejected. A further 74,801 have a score of 80% or more; these records will be de-duplicated against CENTRAL, and unique records will then be added to CENTRAL in September 2017 (available in Issue 9). The remaining 102,097 records have a likelihood score of 11%-79% and will be screened by Cochrane Crowd. It is estimated that this backlog will be cleared by the end of March 2018.
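As a quick consistency check, the three bands reported above account for the whole backlog. The short sketch below uses only the figures given in the text; the variable names are illustrative.

```python
backlog_total = 248_928
bands = {
    "rejected": 72_030,   # classifier score of 10% or less
    "direct": 74_801,     # classifier score of 80% or more
    "crowd": 102_097,     # classifier score of 11%-79%
}

# The three bands should partition the backlog exactly.
assert sum(bands.values()) == backlog_total

# Share of the backlog in each band, as a percentage.
shares = {name: 100 * count / backlog_total for name, count in bands.items()}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share:.1f}%")
```

On these figures, roughly 29% of the backlog is rejected outright, about 30% is eligible for direct submission (before de-duplication), and about 41% needs human screening.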

Prospective workflow

The prospective workflow is the same as the workflow described above in terms of classifier use and thresholds applied. Both the backlog and CT.gov records added since the beginning of August 2017 will be processed in parallel until the backlog is cleared.

Field mappings

The CT.gov records contain a number of fields, but not all fields will be displayed in CENTRAL. The fields that will be displayed in CENTRAL are the Public and Scientific titles, the URL of the registry record, the brief summary of the trial, MeSH, and the “date first received” (i.e. the date the record was first processed by ClinicalTrials.gov). The following data fields from ClinicalTrials.gov have not been republished in CENTRAL: Recruitment status, Study results, Condition, Intervention, Sponsor, Gender, Age, Phase, Enrolment, Funded by, Study type, Study design, Other IDs, Start date, Completion date, Last updated, Last verified, Acronym, Primary completion date, Outcome measures.

Identifying KoreaMed RCTs and CCTs for inclusion in CENTRAL

KoreaMed (koreamed.org) is a database provided by the Korean Association of Medical Journal Editors that contains citations to articles published in Korean medical, dental, nursing and nutrition related journals. This database is now routinely searched and records systematically added to CENTRAL through Cochrane’s Centralised Search Service project.

Process description

Inception to December 2013
A project led by Cochrane Australia, in partnership with KoreaMed, sought to identify all unique reports of randomized trials across all dates within the database. As part of this work a search strategy was developed and run in KoreaMed. The search strategy was:

placebo*[ALL] OR randomi*[ALL] OR randomly[ALL] OR trial*[ALL] OR ((singl* OR doubl* OR tripl* OR trebl*) AND (blind OR mask)) OR "randomized controlled trial"[PT] OR "clinical trial"[PT] OR "double blind method"[MH] OR "single blind method"[MH]

That work identified approximately 3,300 unique reports of randomized trials, which were published in CENTRAL in April 2015.

January 2014 to July 2017
From January 2014 up to and including June 2017, all records added to KoreaMed were manually screened by the Centralised Search Service team, and 1,100 records were submitted to CENTRAL during this time.

August 2017 onwards
From August 2017, a new process has been implemented. All KoreaMed records go through Cochrane’s RCT machine classifier and Cochrane Crowd (crowd.cochrane.org). The classifier provides a likelihood score for each record being a randomized trial report. Records that receive a score of 10% or less are automatically rejected. Records that receive a score of 11% or above are sent to Cochrane Crowd for manual screening.

To identify records from KoreaMed within CENTRAL, use the All Text field and the search term: HS-KOREAMED

Specialized Registers

Each of the 52 Cochrane Review Groups is responsible for developing a Specialized Register, which contains reports of RCTs and CCTs that fit the Group’s scope. Many of these records are published in CENTRAL via the Cochrane Register of Studies (CRS). The Information Specialist of the Review Group is responsible for ensuring that all eligible records from the Group’s Specialized Register are published in CENTRAL. Each record that has been submitted as part of a Specialized Register is assigned a Review Group code in CENTRAL. (See Appendix for a list of groups and codes.)

Handsearch results

Some Cochrane Centres search the general healthcare literature of their country or region. In addition, some Cochrane Review Groups and Fields/Networks will handsearch specialist literature in their areas of interest, which is not indexed in bibliographic databases. Identified trial reports that are not relevant to a Review Group's scope and thus are not appropriate for their Specialized Register are submitted to CENTRAL as ‘handsearch’ records and are assigned the HS-HANDSEARCH code as well as being assigned the handsearch code from the Appendix list below. 


Building CENTRAL

Each month, CENTRAL is re-built using records from the four sources mentioned above, in the following order of precedence: (1) MEDLINE, (2) Embase, (3) handsearch results and (4) Specialized Registers. For example, if a Specialized Register record matches an existing MEDLINE or Embase record, the MEDLINE or Embase source record will be published in preference. In these cases, the relevant Specialized Register code will be appended to the MEDLINE or Embase record in the CENTRAL ‘Cochrane Group Code’ field. No other information from the record, as originally submitted through the Specialized Register, will be added to the corresponding MEDLINE or Embase record that is published in CENTRAL.
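The precedence rule can be sketched as a merge over matched records, as below. This is a simplified illustration, not the CRS build process. How records are matched is not described here, so the `match_key` field is a hypothetical stand-in, as are the `Record` class and `build_central` function. The sketch also assumes that codes from any lower-precedence duplicate are carried over, whereas the text only states this for Specialized Register codes.

```python
from dataclasses import dataclass, field

# Lower number means higher precedence, following the order given above.
PRECEDENCE = {"MEDLINE": 1, "Embase": 2, "handsearch": 3, "Specialized Register": 4}


@dataclass
class Record:
    match_key: str  # hypothetical key identifying the same underlying report
    source: str
    title: str
    group_codes: set[str] = field(default_factory=set)


def build_central(records: list[Record]) -> list[Record]:
    """Publish one record per match_key, preferring higher-precedence sources."""
    published: dict[str, Record] = {}
    for rec in sorted(records, key=lambda r: PRECEDENCE[r.source]):
        winner = published.get(rec.match_key)
        if winner is None:
            # First (highest-precedence) record seen for this report.
            published[rec.match_key] = Record(
                rec.match_key, rec.source, rec.title, set(rec.group_codes)
            )
        else:
            # Only the group codes are carried over; nothing else is merged.
            winner.group_codes |= rec.group_codes
    return list(published.values())


if __name__ == "__main__":
    central = build_central([
        Record("k1", "Specialized Register", "Register copy", {"SR-STROKE"}),
        Record("k1", "MEDLINE", "MEDLINE copy"),
    ])
    print(central[0].source, central[0].title, central[0].group_codes)
```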

CENTRAL was originally created as an immediate repository for all citations to reports of trials identified by Cochrane. Because of the required quick turn-around time and relative lack of quality control, CENTRAL inevitably contains some typographical errors, duplicates, and reports of non-trials. Advances in machine learning and the advent of crowd-sourcing present new opportunities to ‘clean’ CENTRAL. In future, efforts will be made to remove duplicates and non-eligible study designs.

References

Lefebvre C, Manheimer E, Glanville J. Chapter 6: Searching for studies. In: Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 [updated March 2011]. The Cochrane Collaboration, 2011. http://www.cochrane-handbook.org/.


Appendix: Review Group or Field/Network Specialised Register and Handsearch Codes

SR code  HS code  Group name
SR-ADDICTN HS-ADDICTN Drugs and Alcohol Group
SR-AIRWAYS HS-AIRWAYS Airways Group
SR-ANAESTH HS-ANAESTH Anaesthesia, Critical and Emergency Care Group
SR-ARI HS-ARI Acute Respiratory Infections Group
SR-BACK HS-BACK Back Group
SR-BEHAV HS-BEHAV Developmental, Psychosocial and Learning Problems Group
SR-BEHAVMED HS-BEHAVMED Behavioural Medicines Field
SR-BREASTCA HS-BREASTCA Breast Cancer Group
SR-CANCER HS-CANCER Cancer Network
SR-CF HS-CF Genetic Disorders Group
SR-CHILD HS-CHILD Child Health Field
SR-CHILDCA HS-CHILDCA Childhood Cancer Group
SR-COLOCA HS-COLOCA Colorectal Cancer Group
SR-COMMUN HS-COMMUN Consumers and Communication Group
SR-COMPMED HS-COMPMED Complementary Medicine Field
SR-DEMENTIA HS-DEMENTIA Dementia and Cognitive Improvement Group
SR-DEPRESSN HS-DEPRESSN Common Mental Disorders Group
SR-ENDOC HS-ENDOC Metabolic and Endocrine Disorders Group
SR-ENT HS-ENT ENT Group
SR-EPILEPSY HS-EPILEPSY Epilepsy Group
SR-EPOC HS-EPOC Effective Practice and Organisation of Care Group
SR-EYES HS-EYES Eyes and Vision Group
SR-FERTILREG HS-FERTILREG Fertility Regulation Group
SR-GYNAECA HS-GYNAECA Gynaecological, Neuro-oncology and Orphan Cancer Group
SR-HAEMATOL HS-HAEMATOL Haematological Malignancies Group
SR-HIV HS-HIV HIV/AIDS Group
SR-HTN HS-HTN Hypertension Group
SR-IBD HS-IBD Inflammatory Bowel Disease Group
SR-INCONT HS-INCONT Incontinence Group
SR-INFECTN HS-INFECTN Infectious Diseases Group
SR-INJ HS-INJ Injuries Group
SR-LIVER HS-LIVER Hepato-Biliary Group
SR-LUNGCA HS-LUNGCA Lung Cancer Group
SR-MENSTR HS-MENSTR Gynaecology and Fertility Group
SR-MOVEMENT HS-MOVEMENT Movement Disorders Group
SR-MS HS-MS Multiple Sclerosis Group
SR-MUSKEL HS-MUSKEL Musculoskeletal Group
SR-MUSKINJ HS-MUSKINJ Bone, Joint and Muscle Trauma Group
SR-NEONATAL HS-NEONATAL Neonatal Group
SR-NEUROMUSC HS-NEUROMUSC Neuromuscular Group
SR-ORAL HS-ORAL Oral Health Group
- HS-PRECENTRL Handsearch records lost from CENTRAL before 2000 issue 1
SR-PREG HS-PREG Pregnancy and Childbirth Group
SR-PROSTATE HS-PROSTATE Urology Group
SR-PVD HS-PVD Vascular Group
SR-PUBHLTH (also SR-HEALTHP) HS-PUBHLTH (also HS-HEALTHP) Public Health Group
SR-REHAB HS-REHAB Rehabilitation and Related Therapies Field
SR-RENAL HS-RENAL Kidney and Transplant Group
SR-SCHIZ HS-SCHIZ Schizophrenia Group
SR-SKIN HS-SKIN Skin Group
SR-SPECTR HS-SPECTR Social, Psychological, and Educational Controlled Trials Register
SR-STD HS-STD Sexually Transmitted Infections Group
SR-STROKE HS-STROKE Stroke Group
SR-SYMPT HS-SYMPT Pain, Palliative and Supportive Care Group
SR-TOBACCO HS-TOBACCO Tobacco Addiction Group
SR-UPPERGI HS-UPPERGI Upper Gastrointestinal and Pancreatic Diseases Group
SR-VASC HS-VASC Heart Group
SR-WOUNDS HS-WOUNDS Wounds Group

Centre Handsearch codes

HS code  Centre
HS-ACC Cochrane Australia 
HS-BCC Cochrane Brazil
HS-CANCC Cochrane Canada
HS-CHINESECC Cochrane China
HS-DCC Cochrane Netherlands
HS-GCC Cochrane Germany
HS-IBEROCC Cochrane Iberoamerica 
HS-ITALCC Cochrane Italy
HS-NCC Cochrane Nordic
HS-SACC Cochrane South Africa
HS-SASIANCC Cochrane South Asia
HS-TCN Cochrane Thailand
HS-UKCC Cochrane UK
HS-USCC Cochrane United States
HS-NBSSI National Blood Service Systematic Review Initiative