Data Expectations
NEST CoreDev Team
10.11.2022
Source: vignettes/data-expectations.Rmd
Introduction
teal.goshawk expects to be provided ADSL
and an accompanying ADLB clinical trials data set in
ADaM format. For more information on ADaM,
please see the ADaM standards.
The package provides ready-to-use teal modules you can
embed in your teal application. The modules generate highly
customizable plots and outputs often used in exploratory data analysis,
e.g.:
- box plots - tm_g_gh_boxplot()
- correlation and scatter plots - tm_g_gh_correlationplot() and tm_g_gh_scatterplot()
- density distribution plots - tm_g_gh_density_distribution_plot()
- line plots - tm_g_gh_lineplot()
- spaghetti plots - tm_g_gh_spaghettiplot()
Data Expectations
ADSL
This is a subject-level data set with one record per subject. It includes any variables that are intended to be used for plot splitting.
- For example, if the intention is to split the line plot by some outcome variable, then that outcome variable must be in ADSL.
  - e.g. ABCWK24, which represents a two-level outcome variable with values of "Y" and "N" at Week 24.
ADLB
This is a Basic Data Structure (BDS) data set meaning
multiple records per subject per assay (PARAM) across
unique time points. Additional variables that are intended to be used
for plot splitting should be joined to ADLB.
- See the ADSL example above, where ABCWK24 would need to be joined to ADLB.
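The join described above can be sketched as follows. This is a minimal illustration, assuming dplyr is available and that STUDYID and USUBJID are the join keys; the toy data values are made up, and real ADSL and ADLB data sets carry many more variables.

```r
library(dplyr)

# Hypothetical toy ADSL: one record per subject, with the splitting variable
adsl <- data.frame(
  STUDYID = "S1",
  USUBJID = c("S1-001", "S1-002"),
  ABCWK24 = c("Y", "N"),
  stringsAsFactors = FALSE
)

# Hypothetical toy ADLB: multiple records per subject per assay
adlb <- data.frame(
  STUDYID = "S1",
  USUBJID = c("S1-001", "S1-001", "S1-002"),
  PARAMCD = "ALT",
  AVISIT = c("BASELINE", "WEEK 2", "BASELINE"),
  AVAL = c(20.1, 22.4, 31.0),
  stringsAsFactors = FALSE
)

# Bring only the splitting variable over, keeping every ADLB record
adlb <- adlb %>%
  left_join(
    select(adsl, STUDYID, USUBJID, ABCWK24),
    by = c("STUDYID", "USUBJID")
  )
```

Selecting only the keys and the needed variable before joining avoids duplicated columns when ADLB already contains other ADSL variables.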
Other Basic Data Structures
Other BDS data sets could be provisioned to
teal.goshawk, like ADQS, which contains multiple
records per subject per question (PARAM) across unique time
points. However, in all cases other than ADLB, workarounds
are likely needed.
- For example, the concept of assay units, stored in AVALU, is not really relevant to a BDS like ADQS, which contains questionnaire data. Given that teal.goshawk expects an AVALU variable and uses its values in the plot title and y-axis label, AVALU would need to be added to ADQS with some appropriate value, perhaps "Q". If a value is not provided, the "()" portion of the title and y-axis label will be empty.
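As a rough sketch of this workaround, assuming dplyr and a made-up ADQS extract, the placeholder unit can be added in a single mutate() call. The value "Q" is the one suggested above; any other short label would work the same way.

```r
library(dplyr)

# Hypothetical toy ADQS without a unit variable
adqs <- data.frame(
  USUBJID = c("S1-001", "S1-002"),
  PARAMCD = "Q01",
  AVAL = c(3, 4),
  stringsAsFactors = FALSE
)

# Add a constant placeholder unit so the "()" part of labels is not empty
adqs <- adqs %>%
  mutate(AVALU = "Q")
```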
Required Variables
Several variables are required to realize the full functionality of
teal.goshawk.
TRTORD
Definition: This variable orders treatment values in the legend
Rationale: Allows for congruent ordering as compared to other outputs being generated by a study team
Alternative: Variable is required
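One way TRTORD could be derived is sketched below, assuming dplyr and a treatment variable named ARM. The arm labels and the chosen order are hypothetical; in practice they would follow the ordering used in the study team's other outputs.

```r
library(dplyr)

# Hypothetical toy data with three treatment arms
adlb <- data.frame(
  ARM = c("Placebo", "Drug 10 mg", "Drug 20 mg", "Placebo"),
  stringsAsFactors = FALSE
)

# Assign the legend order explicitly; unknown arms become NA
adlb <- adlb %>%
  mutate(
    TRTORD = case_when(
      ARM == "Placebo" ~ 1,
      ARM == "Drug 10 mg" ~ 2,
      ARM == "Drug 20 mg" ~ 3,
      TRUE ~ NA_real_
    )
  )
```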
AVISITCD
Definition: This variable contains abbreviated
values of AVISIT values
Rationale: Many AVISIT values are long
and contain arguably superfluous information in some cases. Using these
long values as x-axis tick labels can really chew into the real estate
area available for the plot. Using thoughtful abbreviations conveys
chronology with no substantive loss of information and maximizes the
area available for the plot.
Alternative: If creating abbreviations is
not helpful, simply set AVISITCD <- AVISIT
AVISITCDN
Definition: This variable contains the numeric
portion of AVISITCD values
Rationale: Often AVISITN contains
values that are not particularly helpful to reflect the proportional
chronology of visits. Once AVISITCD is created then it is
helpful to create the numeric values from AVISITCD values
that can be seen as intuitively reflecting the visit chronology. For
example: 0, 2, 4, 12, 24, 56, 84 etc. for weeks or 0, 14, 28, 84, 168,
392, 588 etc. for days. Using these, the longitudinal visualization
x-axis will nicely reflect proportional distances between visits.
Alternative: If creating a more intuitive
numeric chronology is not helpful, simply set
AVISITCDN <- AVISITN
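The derivations of AVISITCD and AVISITCDN could be sketched as below. This assumes dplyr and AVISIT values of the form "SCREENING", "BASELINE" and "WEEK n DAY m"; the abbreviations "SCR", "BL" and "W n", and the value -2 for Screening, are illustrative choices rather than requirements of the package.

```r
library(dplyr)

# Hypothetical toy visit values
adlb <- data.frame(
  AVISIT = c("SCREENING", "BASELINE", "WEEK 2 DAY 15",
             "WEEK 4 DAY 29", "WEEK 12 DAY 85"),
  stringsAsFactors = FALSE
)

adlb <- adlb %>%
  mutate(
    # Abbreviated visit label for x-axis ticks
    AVISITCD = case_when(
      AVISIT == "SCREENING" ~ "SCR",
      AVISIT == "BASELINE" ~ "BL",
      grepl("^WEEK", AVISIT) ~
        paste("W", sub("^WEEK ([0-9]+).*$", "\\1", AVISIT)),
      TRUE ~ NA_character_
    ),
    # Numeric part of the abbreviation, in weeks, for proportional spacing
    AVISITCDN = case_when(
      AVISITCD == "SCR" ~ -2,
      AVISITCD == "BL" ~ 0,
      grepl("^W ", AVISITCD) ~ as.numeric(gsub("[^0-9]", "", AVISITCD)),
      TRUE ~ NA_real_
    )
  )
```

Placing Screening at a negative value keeps it to the left of Baseline on a longitudinal x-axis; the exact offset is a judgment call.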
AVALU
Definition: Analysis Value Unit
Rationale: Used in the plot title and y-axis labels.
Please see the "Other Basic Data Structures" section above.
Alternative: Variable is required.
LBSTRESC
Definition: From SDTM Character
Result/Finding in Std Format
Rationale: As a character type variable, this
variable contains values that include those below or above limits of
quantitation (LOQ). These might look like
"2.1<" or ">20.7". When this is the case,
AVAL is often missing. To still capture these values, the following
derivation is used, and whenever it is applied the LOQFL variable
should be set to "Y". This signifies that the AVAL value for
this record has been derived.
- For values below the limit of quantitation, AVAL is set to the numeric portion of LBSTRESC divided by 2.
- For values above the limit of quantitation, AVAL is set to the numeric portion of LBSTRESC.
Alternative: Variable is required
LOQFL
Definition: This is not an ADaM
standard variable but represents the Limit of Quantitation Flag
Rationale: Set to "Y" when the
LBSTRESC value is used to populate AVAL and the
LBSTRESC value is either below or above the limit of quantitation
for the assay. Derivations for AVAL and LOQFL could look like the
following in a mutate() statement:
- AVAL = if_else(grepl("<|>", LBSTRESC), as.numeric(gsub("[^0-9, .]+", "", LBSTRESC)), AVAL)
- LOQFL = if_else(grepl("<|>", LBSTRESC), "Y", "N")
Alternative: If the limit of quantitation concept is
not relevant then please set LOQFL <- "N"
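The rule described under LBSTRESC halves the value for results below the limit, whereas the inline derivation above takes the numeric portion unchanged in both cases. A minimal sketch that applies the halving rule, assuming dplyr and made-up values, might look like this. The helper columns loq_hit and loq_num are hypothetical names, and the narrower pattern "[^0-9.]+" is a choice made here so that stray spaces or commas do not reach as.numeric().

```r
library(dplyr)

# Hypothetical toy data: one regular result, one below LOQ, one above LOQ,
# and one record with no result at all
adlb <- data.frame(
  LBSTRESC = c("15.2", "2.1<", ">20.7", NA),
  AVAL = c(15.2, NA, NA, NA),
  stringsAsFactors = FALSE
)

adlb <- adlb %>%
  mutate(
    # TRUE when LBSTRESC carries a "<" or ">" qualifier (NA counts as FALSE)
    loq_hit = !is.na(LBSTRESC) & grepl("<|>", LBSTRESC),
    # Numeric portion: keep digits and the decimal point only
    loq_num = as.numeric(gsub("[^0-9.]+", "", LBSTRESC)),
    AVAL = case_when(
      loq_hit & grepl("<", LBSTRESC) ~ loq_num / 2, # below LOQ: half the limit
      loq_hit ~ loq_num,                            # above LOQ: the limit itself
      TRUE ~ AVAL                                   # otherwise keep AVAL as is
    ),
    LOQFL = if_else(loq_hit, "Y", "N")
  ) %>%
  select(-loq_hit, -loq_num)
```

Records without a qualifier keep their original AVAL and receive LOQFL "N", which matches the alternative of setting LOQFL to "N" when the concept does not apply.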
BASE2
Definition: This is not an ADaM
standard variable but represents the assay value at Screening
Rationale: When change-from-Screening analyses are needed, this variable provides the assay value at Screening
Alternative: If Screening visit analyses are not
relevant then please set BASE2 <- NA
CHG2
Definition: This is not an ADaM
standard variable but represents the change from Screening assay
value
Rationale: When change-from-Screening analyses are needed, this variable provides the change in assay value between Screening and each subsequent visit
Alternative: If Screening visit analyses are not
relevant then please set CHG2 <- NA
PCHG2
Definition: This is not an ADaM
standard variable but represents the percent change from Screening assay
value
Rationale: When percent-change-from-Screening analyses are needed, this variable provides the percent change in assay value between Screening and each subsequent visit
Alternative: If Screening visit analyses are not
relevant then please set PCHG2 <- NA
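The three Screening-based variables could be derived together, as in the sketch below. It assumes dplyr, a Screening record identified by AVISIT == "SCREENING", and one Screening value per subject and assay; leaving CHG2 and PCHG2 missing on the Screening record itself is a choice made here, not something the package prescribes.

```r
library(dplyr)

# Hypothetical toy data: two subjects, one assay, three visits each
adlb <- data.frame(
  USUBJID = rep(c("S1-001", "S1-002"), each = 3),
  PARAMCD = "ALT",
  AVISIT = rep(c("SCREENING", "BASELINE", "WEEK 2"), times = 2),
  AVAL = c(20, 22, 25, 40, 38, 30),
  stringsAsFactors = FALSE
)

adlb <- adlb %>%
  group_by(USUBJID, PARAMCD) %>%
  mutate(
    # Screening value per subject and assay (NA if no Screening record)
    BASE2 = AVAL[AVISIT == "SCREENING"][1],
    # Absolute and percent change from Screening
    CHG2 = if_else(AVISIT == "SCREENING", NA_real_, AVAL - BASE2),
    PCHG2 = if_else(
      AVISIT == "SCREENING", NA_real_, 100 * (AVAL - BASE2) / BASE2
    )
  ) %>%
  ungroup()
```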