CDISC SDTM Mapping

Introduction

Timely, clear and actionable data insights allow important clinical decisions to be made in real time, and earlier insights mean life-changing or life-enhancing medicines can reach those in need faster. When it comes to clinical trials, data is everything. Data analytics answers questions such as: Is the drug working? Do side effects exist? Is it safe to continue with the trial?

To conduct clear, comprehensive data analytics, quality Study Data Tabulation Model (SDTM) datasets need to be produced. This can mean a lot of manual spreadsheet work and programming to get the clinical data into the format required by the US Food and Drug Administration (FDA). The process of mapping raw data to the SDTM format is known as SDTM mapping.

The CDISC SDTM Structure:

CDISC standards were put in place to allow the FDA to quickly and easily analyze clinical trial data to understand the safety and efficacy of new drugs. For clinical trial data to be accepted by regulatory reviewers, the submission must be in the correct format. If it is not, a lot of time, money and work may be wasted. Not providing quality data to the agency could jeopardize the entire trial.

Implementing CDISC standards provides the submitter with several benefits:

  • Better data quality.
  • Greater trial process efficiency.
  • Reduced trial timeline and costs.
  • Streamlined processes.
  • Easier sharing of data.
  • Full traceability in the clinical research process from beginning to end.
  • Greater scope for innovation.

Retrospective SDTM Mapping Challenges:

Typically, data is aligned with the SDTM format at the end of the trial, after all patient data has been collected. Working with non-standardized data poses several challenges: inconsistent terminology, inconsistent data structures, and inconsistencies across studies. The manual work required to map non-standardized data to SDTM is time-consuming and often leads to submission delays.

SDTM Mapping Specification Document:

SDTM mapping is a complicated task, so plan and organize using an SDTM mapping specification document. This document specifies how the raw data is to be converted and is used by the SDTM programmer and the testing team. During the data mapping process, there’s a risk that data can be lost or distorted. Consider a scenario where your source data is in text format, and your target data should be enumerated. Unless you provide a logical specification for how the text values should be mapped to the required values, you might get errors in the resulting data. The specification document helps you identify and resolve these potential errors before you start data mapping.
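The text-to-enumerated scenario can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the mapping table SEX_MAP, the helper map_value, and the raw values are all hypothetical, and a real specification would list every raw value expected in the study.

```python
# Hypothetical mapping rule taken from a specification document:
# free-text raw values on the left, enumerated target values on the right.
SEX_MAP = {"male": "M", "female": "F", "unknown": "U"}


def map_value(raw, mapping):
    """Return (mapped_value, error); exactly one of the two is None."""
    key = raw.strip().lower()
    if key in mapping:
        return mapping[key], None
    # No rule for this value: report it instead of guessing a target value.
    return None, f"no mapping rule for raw value {raw!r}"


# Hypothetical raw data, including one value the specification does not cover.
raw_values = ["Male", "female ", "F", "Unknown"]
results = [map_value(value, SEX_MAP) for value in raw_values]

mapped = [value for value, _ in results]
errors = [error for _, error in results if error is not None]
```

Here the raw value "F" has no rule, so it is reported in errors rather than silently dropped or passed through. Running a check like this against the raw data before mapping starts is one way to surface gaps in the specification early.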

The specification document can be created manually as follows:

1. Determine the SDTM domains needed: examine the CRFs and raw data, and identify which SDTM domains you need.

2. Note the relevant raw datasets: against each SDTM domain, note which raw dataset will provide the input data.

3. List and describe the programming for all variables: against each SDTM domain, list all variables and describe how they are to be programmed.
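The three steps above can also be captured in a structured form rather than a spreadsheet. The sketch below is only one possible layout: the class names VariableRule and DomainSpec, the raw dataset name demog, its variable names, and the rule wording are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VariableRule:
    sdtm_variable: str  # target SDTM variable name
    source: str         # "raw_dataset.variable", or "" when assigned or derived
    rule: str           # plain-language programming instruction


@dataclass
class DomainSpec:
    domain: str                  # step 1: the SDTM domain needed
    raw_datasets: list           # step 2: raw datasets that feed the domain
    variables: list = field(default_factory=list)  # step 3: variable rules


# Hypothetical specification entry for the Demographics (DM) domain.
dm_spec = DomainSpec(
    domain="DM",
    raw_datasets=["demog"],
    variables=[
        VariableRule("USUBJID", "demog.subjid", "Concatenate study ID and subject ID"),
        VariableRule("SEX", "demog.gender", "Rename and map text to code list values"),
        VariableRule("AGE", "demog.birthdt", "Derive from date of birth and study start date"),
    ],
)


def unsourced(spec):
    """List SDTM variables whose source dataset is not declared for the domain."""
    return [
        rule.sdtm_variable
        for rule in spec.variables
        if rule.source and rule.source.split(".")[0] not in spec.raw_datasets
    ]
```

A helper such as unsourced gives the programmer and the testing team a quick consistency check: every variable rule should point at a raw dataset that was noted against the domain in step 2.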

The Mapping Process:

To be CDISC compliant, raw datasets must be mapped step by step from the structure used in your clinical data management system (or another database) to the CDISC SDTM structure.

  1. Identify the datasets you want to map.
  2. Identify the SDTM datasets that correspond to those datasets.
  3. Gather the metadata of the datasets and the corresponding SDTM metadata.
  4. Map the variables in the datasets from step 1 to SDTM domain variables.
  5. Create custom domains for any datasets that don’t have corresponding SDTM datasets.

Typical Mapping Scenarios:

A typical SDTM mapping process involves nine common scenarios.

  1. The direct carry forward: These are variables already SDTM compliant. These can be directly carried forward to the SDTM datasets and don’t need to be modified.
  2. The variable rename: Some variables need to be renamed to map to the corresponding SDTM variable. For example, if the original variable is GENDER, it should be renamed SEX to comply with CDISC SDTM standards.
  3. The variable attribute change: As well as variable names, variable attributes must be mapped. Attributes such as label, type, length, and format must comply with the SDTM attributes.
  4. The reformat: The value that is represented doesn’t change, but the format it’s stored in does. For example, converting a SAS date to an ISO 8601 format character string.
  5. The combine: In some cases, multiple variables must be combined to form a single SDTM variable.
  6. The derivation: Some SDTM variables are obtained by deriving a conclusion from data in the non-SDTM dataset. For example, using date of birth and study start date to derive a patient’s age instead of manually entering the age upfront.
  7. The split: A non-SDTM variable might need to be split into two or more SDTM variables to comply with SDTM standards.
  8. The variable value map and new code list application: Some variable values need to be recoded or mapped to match the values of a corresponding SDTM variable. This mapping is recommended for variables with a code list attached that has non-extensible controlled terminology. It’s also advisable to map all values in the controlled terminology, rather than just the values present in the dataset. This covers values that are not in the dataset currently but may arrive in future dataset updates.
  9. The horizontal to vertical data structure transpose: If the structure of the non-CDISC dataset is completely different from its corresponding SDTM dataset, it may need to be transformed to one that is SDTM-compliant. The Vital Signs dataset is a good example. When data is collected in wide form, every test and recorded value is stored in separate variables. As SDTM requires data to be stored in a vertical form, the dataset must be transposed to have the tests, values, and units under three variables.

If there are variables that cannot be mapped to an SDTM variable, they go into supplemental qualifiers. More than one type of mapping may be needed to create a single SDTM variable.
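Three of the scenarios above (the reformat, the derivation, and the transpose) can be sketched together in Python. This is an illustrative sketch under stated assumptions: the raw record layout, the raw column names sysbp and diabp, and the TESTS lookup are hypothetical, and the SDTM variable names and test codes shown should be checked against the SDTM-IG and the current controlled terminology.

```python
from datetime import date


def to_iso8601(d):
    """Reformat: a date object becomes an ISO 8601 character string."""
    return d.isoformat()


def derive_age(birth, reference):
    """Derivation: age in whole years at the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


# Hypothetical wide-form vital signs record: one column per test.
wide = {"subjid": "001", "visit_date": date(2024, 3, 5), "sysbp": 120, "diabp": 80}

# Hypothetical lookup from raw column to (test code, test name, unit).
TESTS = {
    "sysbp": ("SYSBP", "Systolic Blood Pressure", "mmHg"),
    "diabp": ("DIABP", "Diastolic Blood Pressure", "mmHg"),
}


def transpose_vitals(record):
    """Transpose: one wide record becomes one vertical row per test."""
    rows = []
    for column, (code, name, unit) in TESTS.items():
        if record.get(column) is None:
            continue  # test not collected for this record, so no row is created
        rows.append({
            "USUBJID": record["subjid"],  # simplified: raw subject ID used as-is
            "VSTESTCD": code,
            "VSTEST": name,
            "VSORRES": str(record[column]),
            "VSORRESU": unit,
            "VSDTC": to_iso8601(record["visit_date"]),
        })
    return rows


rows = transpose_vitals(wide)
```

The single wide record yields two vertical rows, one per test, with the test, the result, and the unit each held in its own variable. The date column also shows how more than one mapping type (here a transpose plus a reformat) can feed one output dataset.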

SDTM Mapping Best Practice:

  a. Implement SDTM from the start of the trial. When data is aligned with SDTM at the end of the trial, trying to make it fit the SDTM structure retrospectively is difficult. It takes a lot of time and manual work to retrospectively map data, and this can lead to submission delays. Best practice is to align with CDISC standards before patient data is collected. When designing case report forms (CRFs), consider the SDTM format. Doing it this way means you’ll save a lot of time and effort manually mapping the data down the line.

  b. Map the data upfront. Create simulated datasets before collecting any patient data and check that the datasets will meet regulatory requirements.

  c. Validate the CDISC SDTM datasets. Validate SDTM dataset designs against CDISC standards (SDTM-IG and NCI controlled terms). NCI controlled terms are a set of code lists and valid values used to standardize terminology in clinical trials. The National Cancer Institute (NCI) and the Clinical Data Interchange Standards Consortium (CDISC) collaborate to develop and maintain these terms. The goal is to ensure that the same information is represented in the same way across studies.

     NCI code lists are approved lists of controlled terms that can be reused across questions. New versions of the controlled terms are released quarterly. The controlled terms are available for download from the NCI FTP site in a number of formats, including Excel, text, and PDF. New terms can be suggested for review and inclusion in the controlled terms. The terms must be valid and unique. Some code lists are extensible, meaning that users can add their own terms in addition to the existing ones. If a requested term is denied by the terminology teams, the reason for denial is included in a list on the CDISC website.

     The FDA and Japan’s Pharmaceuticals and Medical Devices Agency (PMDA) publish compliance rules for clinical data submissions, including SDTM. The output must ultimately be compared against the rules, and any deviations must be resolved or fully documented. Validation is a crucial way to increase data quality while reducing time spent in development and review. Key factors such as correct use of controlled terms must be built into the electronic data capture (EDC) CRFs and the SDTM conversion process. This includes EDC edit check programming and data management checks programmed to highlight data issues early.
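A controlled terminology check of the kind described above can be sketched as follows. This is a simplified illustration and not a substitute for the published compliance rules or a validation tool: the CODELISTS table holds small illustrative subsets, its extensibility flags are assumptions to be confirmed against the current controlled terminology release, and the function check_controlled_terms is hypothetical.

```python
# Illustrative subsets only; real code lists come from the published
# controlled terminology, and the "extensible" flags here are assumptions.
CODELISTS = {
    "SEX": {"terms": {"M", "F", "U"}, "extensible": False},
    "VSORRESU": {"terms": {"mmHg", "beats/min"}, "extensible": True},
}


def check_controlled_terms(rows):
    """Return (row index, variable, value, severity) for each off-list value."""
    findings = []
    for index, row in enumerate(rows):
        for variable, codelist in CODELISTS.items():
            value = row.get(variable)
            if value is None or value in codelist["terms"]:
                continue
            # Extensible code lists may legitimately carry sponsor-added terms,
            # so an off-list value is flagged for review, not rejected.
            severity = "warning" if codelist["extensible"] else "error"
            findings.append((index, variable, value, severity))
    return findings


# Hypothetical records to validate.
records = [{"SEX": "M"}, {"SEX": "Male"}, {"VSORRESU": "cmH2O"}]
findings = check_controlled_terms(records)
```

The value "Male" is reported as an error because the code list is treated as non-extensible, while the unlisted unit is reported as a warning to be reviewed and documented. The same logic could be applied earlier, as an EDC edit check, so that issues are highlighted before the SDTM conversion.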