Data Expectations
NEST CoreDev Team
10.11.2022
data-expectations.Rmd
Introduction
teal.goshawk expects to be provided with ADSL and an accompanying ADLB clinical trials data set in ADaM format. For more information on ADaM, please see the ADaM standards.
The package provides ready-to-use teal modules you can embed in your teal application. The modules generate highly customizable plots and outputs often used in exploratory data analysis, e.g.:
- box plots - tm_g_gh_boxplot()
- correlation and scatter plots - tm_g_gh_correlationplot() and tm_g_gh_scatterplot()
- density distribution plots - tm_g_gh_density_distribution_plot()
- line plots - tm_g_gh_lineplot()
- spaghetti plots - tm_g_gh_spaghettiplot()
Data Expectations
ADSL
This is a subject-level data set with one record per subject. It includes any variables that are intended to be used for plot splitting.
- For example, if the intention is to split the line plot by some outcome variable, then that outcome variable must be in ADSL.
  - e.g. ABCWK24, which represents a two-level outcome variable with values of “Y” and “N” at Week 24.
ADLB
This is a Basic Data Structure (BDS) data set, meaning it contains multiple records per subject per assay (PARAM) across unique time points. Additional variables that are intended to be used for plot splitting should be joined to ADLB.
- See the ADSL example above, where ABCWK24 would need to be joined to ADLB.
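The join described above could be sketched as follows with dplyr. The toy data, the study and subject identifiers, and the choice of STUDYID and USUBJID as join keys are assumptions for illustration only; ABCWK24 is the example variable from the text.

```r
library(dplyr)

# Toy ADSL: one record per subject, carrying the example split variable.
adsl <- tibble(
  STUDYID = "XYZ01",
  USUBJID = c("XYZ01-001", "XYZ01-002"),
  ABCWK24 = c("Y", "N")
)

# Toy ADLB: multiple records per subject per assay across visits.
adlb <- tibble(
  STUDYID = "XYZ01",
  USUBJID = rep(c("XYZ01-001", "XYZ01-002"), each = 2),
  PARAMCD = "ALT",
  AVISIT = rep(c("BASELINE", "WEEK 24"), times = 2),
  AVAL = c(20, 25, 31, 28)
)

# Bring only the split variable over so that no other ADSL columns
# are duplicated in ADLB.
adlb_split <- adlb %>%
  left_join(
    select(adsl, STUDYID, USUBJID, ABCWK24),
    by = c("STUDYID", "USUBJID")
  )
```

Every ADLB record for a subject then carries that subject's ABCWK24 value, and the number of ADLB records is unchanged.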
Other Basic Data Structures
Other BDS data sets could be provisioned to teal.goshawk, such as ADQS, which contains multiple records per subject per question (PARAM) across unique time points. However, in all cases other than ADLB, workarounds are likely needed.
- For example, the concept of assay units, stored in AVALU, is not really relevant to a BDS like ADQS, which contains questionnaire data. Because teal.goshawk expects an AVALU variable and uses its values in the plot title and y-axis label, AVALU would need to be added to ADQS with some appropriate value, perhaps “Q”. If a value is not provided, the “()” portion of the title and y-axis label will be empty.
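A minimal sketch of this workaround, assuming a toy ADQS data set whose contents are hypothetical; the value “Q” is the placeholder suggested in the text.

```r
library(dplyr)

# Toy ADQS: questionnaire records without a unit variable.
adqs <- tibble(
  USUBJID = c("XYZ01-001", "XYZ01-001"),
  PARAMCD = "Q01",
  AVISIT = c("BASELINE", "WEEK 4"),
  AVAL = c(3, 2)
)

# Add a constant AVALU so the "()" portion of titles and labels is not empty.
adqs <- adqs %>%
  mutate(AVALU = "Q")
```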
Required Variables
Several variables are required to realize the full functionality of teal.goshawk.
TRTORD
Definition: This variable orders treatment values in the legend
Rationale: Allows for congruent ordering as compared to other outputs being generated by a study team
Alternative: Variable is required
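One possible way to derive TRTORD is to look up each treatment value in a vector that lists the treatments in the desired legend order. This is only a sketch: the ARM variable as the treatment variable, the arm labels, and the chosen order are all hypothetical.

```r
library(dplyr)

# Toy data with hypothetical treatment arm labels.
adlb <- tibble(
  USUBJID = c("XYZ01-001", "XYZ01-002", "XYZ01-003"),
  ARM = c("Drug X 100mg", "Placebo", "Combination")
)

# Desired legend order (an assumption; align it with other study outputs).
arm_order <- c("Placebo", "Drug X 100mg", "Combination")

# TRTORD is the position of each arm in the desired order.
adlb <- adlb %>%
  mutate(TRTORD = match(ARM, arm_order))
```

Any arm missing from arm_order receives NA, which makes unmapped treatment values easy to spot.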
AVISITCD
Definition: This variable contains abbreviated AVISIT values
Rationale: Many AVISIT values are long and, in some cases, contain arguably superfluous information. Using these long values as x-axis tick labels takes up space that would otherwise be available for the plot. Thoughtful abbreviations convey chronology with no substantive loss of information and maximize the area available for the plot.
Alternative: If creating abbreviations is not helpful, simply set AVISITCD <- AVISIT
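As an illustration, abbreviations could be derived with case_when(). The AVISIT values and the abbreviation scheme ("SCR", "BL", "W <week>") are assumptions; a study's actual visit names will differ.

```r
library(dplyr)

# Toy data with hypothetical long visit names.
adlb <- tibble(
  AVISIT = c("SCREENING", "BASELINE", "WEEK 2 DAY 15", "WEEK 24 DAY 169")
)

adlb <- adlb %>%
  mutate(
    AVISITCD = case_when(
      AVISIT == "SCREENING" ~ "SCR",
      AVISIT == "BASELINE" ~ "BL",
      # Keep only the week number, e.g. "WEEK 2 DAY 15" becomes "W 2".
      grepl("^WEEK [0-9]+", AVISIT) ~ sub("^WEEK ([0-9]+).*$", "W \\1", AVISIT),
      # Visits that match no rule are left missing so they can be reviewed.
      TRUE ~ NA_character_
    )
  )
```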
AVISITCDN
Definition: This variable contains the numeric portion of AVISITCD values
Rationale: AVISITN often contains values that do not reflect the proportional chronology of visits. Once AVISITCD is created, it is helpful to derive numeric values from it that intuitively reflect the visit chronology, for example 0, 2, 4, 12, 24, 56, 84 etc. for weeks or 0, 14, 28, 84, 168, 392, 588 etc. for days. With these values, the x-axis of a longitudinal visualization reflects the proportional distances between visits.
Alternative: If creating a more intuitive numeric chronology is not helpful, simply set AVISITCDN <- AVISITN
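A sketch of deriving AVISITCDN from AVISITCD, assuming abbreviations of the form "SCR", "BL" and "W <week>". Both the abbreviation scheme and the choice of -2 to place Screening before Baseline are assumptions, not values prescribed by the package.

```r
library(dplyr)

# Toy data with hypothetical abbreviated visit codes.
adlb <- tibble(AVISITCD = c("SCR", "BL", "W 2", "W 24"))

adlb <- adlb %>%
  mutate(
    AVISITCDN = case_when(
      AVISITCD == "SCR" ~ -2, # assumed position of Screening before Baseline
      AVISITCD == "BL" ~ 0,
      # Strip the "W " prefix and keep the week number. The conversion is
      # evaluated for every row, so coercion warnings from the non-week
      # codes are suppressed.
      grepl("^W [0-9]+$", AVISITCD) ~
        suppressWarnings(as.numeric(sub("^W ", "", AVISITCD))),
      TRUE ~ NA_real_
    )
  )
```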
AVALU
Definition: Analysis Value Unit
Rationale: Used in the plot title and y-axis labels. Please see the “Other Basic Data Structures” section above
Alternative: Variable is required.
LBSTRESC
Definition: From SDTM Character Result/Finding in Std Format
Rationale: As a character type variable, LBSTRESC contains values that include those below or above limits of quantitation (LOQ). These might look like “2.1<” or “>20.7”. When this is the case, AVAL is often missing. To still capture these values, the following derivation is used, and the LOQFL variable should be set to “Y” to signify that the AVAL value for the record has been derived.
- For values below the limit of quantitation, AVAL is set to the numeric portion of LBSTRESC divided by 2.
- For values above the limit of quantitation, AVAL is set to the numeric portion of LBSTRESC.
Alternative: Variable is required
LOQFL
Definition: This is not an ADaM standard variable but represents the Limit of Quantitation Flag
Rationale: Set to “Y” when the LBSTRESC value is used to populate AVAL and is either below or above the limit of quantitation for the assay. Derivations for AVAL and LOQFL could look like the following in a mutate() statement. Note that this AVAL expression takes the numeric portion of LBSTRESC as-is and does not include the division by 2 for values below the limit.
- AVAL = if_else(grepl("<|>", LBSTRESC), as.numeric(gsub("[^0-9, .]+", "", LBSTRESC)), AVAL)
- LOQFL = if_else(grepl("<|>", LBSTRESC), "Y", "N")
Alternative: If the limit of quantitation concept is not relevant, please set LOQFL <- "N"
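The LOQ rule described under LBSTRESC (half the numeric portion for values below the limit, the numeric portion itself for values above) could be sketched as follows. This is one possible implementation under stated assumptions, not the package's own code; the toy values and the helper column loq_num are hypothetical.

```r
library(dplyr)

# Toy data: AVAL is missing where LBSTRESC carries an LOQ symbol.
adlb <- tibble(
  LBSTRESC = c("12.4", "<2.1", ">20.7", "7"),
  AVAL = c(12.4, NA, NA, 7)
)

adlb <- adlb %>%
  mutate(
    # Numeric portion of LBSTRESC (hypothetical helper column).
    loq_num = suppressWarnings(as.numeric(gsub("[^0-9.]+", "", LBSTRESC))),
    # Flag records whose AVAL is derived from LBSTRESC.
    LOQFL = if_else(grepl("<|>", LBSTRESC), "Y", "N"),
    AVAL = case_when(
      grepl("<", LBSTRESC) ~ loq_num / 2, # below LOQ: half the limit
      grepl(">", LBSTRESC) ~ loq_num,     # above LOQ: the limit itself
      TRUE ~ AVAL                         # otherwise keep AVAL as provided
    )
  ) %>%
  select(-loq_num)
```

Because grepl() matches the symbol anywhere in the string, values written with a trailing symbol such as “2.1<” are handled the same way as “<2.1”.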
BASE2
Definition: This is not an ADaM standard variable but represents the assay value at Screening
Rationale: When change-from-Screening analyses are needed, this variable contains the assay value at Screening
Alternative: If Screening visit analyses are not relevant, please set BASE2 <- NA
CHG2
Definition: This is not an ADaM standard variable but represents the change from Screening assay value
Rationale: When change-from-Screening analyses are needed, this variable contains the change in assay value between Screening and a subsequent visit
Alternative: If Screening visit analyses are not relevant, please set CHG2 <- NA
PCHG2
Definition: This is not an ADaM standard variable but represents the percent change from Screening assay value
Rationale: When percent-change-from-Screening analyses are needed, this variable contains the percent change in assay value between Screening and a subsequent visit
Alternative: If Screening visit analyses are not relevant, please set PCHG2 <- NA
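The three Screening-based variables could be derived together, as in the sketch below. It assumes that Screening records are identified by AVISITCD == "SCR", that there is at most one Screening record per subject and parameter, and that change and percent change follow the usual difference and ratio formulas. The toy data are hypothetical.

```r
library(dplyr)

# Toy ADLB: three visits per subject for a single assay.
adlb <- tibble(
  USUBJID = rep(c("XYZ01-001", "XYZ01-002"), each = 3),
  PARAMCD = "ALT",
  AVISITCD = rep(c("SCR", "BL", "W 2"), times = 2),
  AVAL = c(20, 22, 25, 40, 38, 30)
)

adlb <- adlb %>%
  group_by(USUBJID, PARAMCD) %>%
  mutate(
    # Screening value for the subject and parameter; NA if there is none.
    BASE2 = AVAL[AVISITCD == "SCR"][1],
    # Change and percent change from Screening.
    CHG2 = AVAL - BASE2,
    PCHG2 = 100 * CHG2 / BASE2
  ) %>%
  ungroup()
```

In this sketch the Screening record itself gets CHG2 and PCHG2 of 0; whether those should instead be left missing on that record is a study-level decision.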