Data Expectations
NEST CoreDev Team
10.11.2022
Introduction
teal.goshawk expects to be provided with an ADSL data set and an accompanying ADLB clinical trials data set in ADaM format. For more information on ADaM, please see the ADaM standards.
The package provides ready-to-use teal modules you can embed in your teal application. The modules generate highly customizable plots and outputs often used in exploratory data analysis, e.g.:
- box plots - tm_g_gh_boxplot()
- correlation and scatter plots - tm_g_gh_correlationplot() and tm_g_gh_scatterplot()
- density distribution plots - tm_g_gh_density_distribution_plot()
- line plots - tm_g_gh_lineplot()
- spaghetti plots - tm_g_gh_spaghettiplot()
Data Expectations
ADSL
This is a subject-level data set with one record per subject. It includes any variables that are intended to be used for plot splitting.
- For example, if the intention is to be able to split the line plot by some outcome variable, then that outcome variable must be in ADSL, e.g. ABCWK24, which represents a two-level outcome variable with values "Y" and "N" at Week 24.
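The two ADSL expectations (one record per subject, split variables present) can be checked in base R before the data reach the app. This is a minimal sketch. It assumes subjects are identified by the ADaM key USUBJID, and the helper check_adsl() and the toy data are hypothetical. The named stopifnot() messages need R 4.0 or later.

```r
# Toy ADSL: one record per subject, with the example split variable ABCWK24
adsl <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  ABCWK24 = c("Y", "N", "Y")
)

# Hypothetical helper: stops with a message if an expectation is not met
check_adsl <- function(adsl, split_vars) {
  stopifnot(
    "ADSL must have one record per subject" = !anyDuplicated(adsl$USUBJID),
    "split variables must be present in ADSL" = all(split_vars %in% names(adsl))
  )
  invisible(TRUE)
}

check_adsl(adsl, split_vars = "ABCWK24")
```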
ADLB
This is a Basic Data Structure (BDS) data set, meaning it has multiple records per subject per assay (PARAM) across unique time points. Additional variables that are intended to be used for plot splitting should be joined to ADLB.
- See the ADSL example above, where ABCWK24 would need to be joined to ADLB.
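The join itself can be sketched with base R merge(); dplyr::left_join() would serve equally well. The toy values are invented for illustration. The assumption is that STUDYID and USUBJID, the usual ADaM subject keys, identify a subject in both data sets.

```r
# Toy data; STUDYID and USUBJID are assumed to be the subject keys
adsl <- data.frame(
  STUDYID = "STUDY1",
  USUBJID = c("S1", "S2"),
  ABCWK24 = c("Y", "N")
)
adlb <- data.frame(
  STUDYID = "STUDY1",
  USUBJID = c("S1", "S1", "S2"),
  PARAMCD = "ALT",
  AVISITCD = c("BL", "W4", "BL"),
  AVAL = c(20, 25, 31)
)

# Left join: keep every ADLB record and attach the subject-level split variable
adlb <- merge(
  adlb,
  adsl[, c("STUDYID", "USUBJID", "ABCWK24")],
  by = c("STUDYID", "USUBJID"),
  all.x = TRUE
)
```

Because ADSL holds one record per subject, this left join attaches ABCWK24 without duplicating any ADLB records.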
Other Basic Data Structures
Other BDS data sets could be provided to teal.goshawk, such as ADQS, which contains multiple records per subject per question (PARAM) across unique time points. However, in all cases other than ADLB, workarounds are likely to be needed.
- For example, the concept of assay units, stored in AVALU, is not really relevant to a BDS like ADQS, which contains questionnaire data. Because teal.goshawk expects an AVALU variable and uses its values in the plot title and y-axis label, AVALU would need to be added to ADQS with some appropriate value, perhaps "Q". If a value is not provided, the "()" portion of the title and y-axis label will be empty.
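Adding a placeholder unit to ADQS can be sketched as follows. The helper add_placeholder_unit() and the toy data are hypothetical. "Q" is simply the example value suggested above. Existing non-blank AVALU values are left untouched.

```r
# Toy questionnaire data without an AVALU column
adqs <- data.frame(
  USUBJID = c("S1", "S1"),
  PARAMCD = c("Q01", "Q02"),
  AVAL = c(3, 5)
)

# Hypothetical helper: fill missing or blank AVALU values with a placeholder unit
add_placeholder_unit <- function(data, unit = "Q") {
  if (!"AVALU" %in% names(data)) {
    data$AVALU <- NA_character_
  }
  data$AVALU[is.na(data$AVALU) | data$AVALU == ""] <- unit
  data
}

adqs <- add_placeholder_unit(adqs)
```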
Required Variables
Several variables are required to realize the full functionality of teal.goshawk.
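Before launching an app, it can help to list which of the variables described in this section are absent. This is a minimal sketch. It assumes the check runs on the ADLB-like data set passed to the modules after any ADSL variables have been joined. missing_required_vars() is a hypothetical helper, not part of teal.goshawk.

```r
# Variables described in this section
required_vars <- c(
  "TRTORD", "AVISITCD", "AVISITCDN", "AVALU", "LBSTRESC",
  "LOQFL", "BASE2", "CHG2", "PCHG2"
)

# Hypothetical helper: return the required variables a data set is missing
missing_required_vars <- function(data, vars = required_vars) {
  setdiff(vars, names(data))
}

# Toy ADLB-like data set that still lacks the Screening variables
adlb <- data.frame(
  TRTORD = 1L, AVISITCD = "BL", AVISITCDN = 0, AVALU = "U/L",
  LBSTRESC = "25", LOQFL = "N"
)
missing_required_vars(adlb)
# returns c("BASE2", "CHG2", "PCHG2")
```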
TRTORD
Definition: This variable orders treatment values in the legend.
Rationale: Allows treatments to be ordered consistently with other outputs being generated by the study team.
Alternative: Variable is required.
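One way to derive TRTORD is to record the position of each treatment in an agreed ordering. This is a minimal sketch. It assumes the treatment is stored in ARM and that the study team's preferred order is known in advance. The arm labels and derive_trtord() are hypothetical.

```r
# Hypothetical treatment labels, in the order the study team uses elsewhere
trt_order <- c("Placebo", "Drug X 10 mg", "Drug X 20 mg")

# Hypothetical helper: TRTORD is the position of each treatment in trt_order
derive_trtord <- function(trt, levels) {
  ord <- match(trt, levels)
  if (anyNA(ord[!is.na(trt)])) stop("treatment values missing from the ordering")
  ord
}

adsl <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  ARM = c("Drug X 20 mg", "Placebo", "Drug X 10 mg")
)
adsl$TRTORD <- derive_trtord(adsl$ARM, trt_order)
```

Stopping on unknown treatments prevents a typo in an arm label from silently producing an NA in the legend order.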
AVISITCD
Definition: This variable contains abbreviated AVISIT values.
Rationale: Many AVISIT values are long and, in some cases, contain arguably superfluous information. Using these long values as x-axis tick labels can take up much of the area available for the plot. Thoughtful abbreviations convey chronology with no substantive loss of information and maximize the area available for the plot.
Alternative: In cases where creating abbreviations is not helpful, simply set AVISITCD <- AVISIT
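A lookup table is one simple way to map long AVISIT labels to abbreviations. The labels and codes below are hypothetical examples, and derive_avisitcd() is an illustrative helper. Visits without a defined abbreviation fall back to the original label, which is the AVISITCD <- AVISIT alternative applied record by record.

```r
# Hypothetical lookup from long AVISIT labels to short codes
visit_codes <- c(
  "SCREENING" = "SCR",
  "BASELINE" = "BL",
  "WEEK 4 DAY 29" = "W4",
  "WEEK 12 DAY 85" = "W12"
)

derive_avisitcd <- function(avisit, lookup = visit_codes) {
  out <- unname(lookup[avisit])
  # Fall back to the original label where no abbreviation is defined
  no_code <- is.na(out)
  out[no_code] <- avisit[no_code]
  out
}

derive_avisitcd(c("BASELINE", "WEEK 4 DAY 29", "UNSCHEDULED"))
# returns c("BL", "W4", "UNSCHEDULED")
```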
AVISITCDN
Definition: This variable contains the numeric portion of AVISITCD values.
Rationale: AVISITN often contains values that do not reflect the proportional chronology of visits well. Once AVISITCD is created, it is helpful to derive numeric values from the AVISITCD values that intuitively reflect the visit chronology, for example 0, 2, 4, 12, 24, 56, 84, etc. for weeks or 0, 14, 28, 84, 168, 392, 588, etc. for days. Using these, the x-axis of longitudinal visualizations will reflect the proportional distances between visits.
Alternative: In cases where creating a more intuitive numeric chronology is not helpful, simply set AVISITCDN <- AVISITN
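Taking the numeric portion of AVISITCD can be sketched as below. It assumes codes such as "W4" or "D14" that carry their number directly. Codes with no number need an explicit value. Here 0 for "BL" and -1 for "SCR" are hypothetical placeholders, not a teal.goshawk rule, and derive_avisitcdn() is an illustrative helper.

```r
# Hypothetical values for codes that carry no number of their own
special_codes <- c(SCR = -1, BL = 0)

derive_avisitcdn <- function(avisitcd, special = special_codes) {
  out <- rep(NA_real_, length(avisitcd))
  # Codes such as "BL" are looked up in the special table
  is_special <- avisitcd %in% names(special)
  out[is_special] <- special[avisitcd[is_special]]
  # Otherwise keep only the digits and decimal point, e.g. "W12" -> 12
  has_digits <- !is_special & grepl("[0-9]", avisitcd)
  out[has_digits] <- as.numeric(gsub("[^0-9.]", "", avisitcd[has_digits]))
  out
}

derive_avisitcdn(c("SCR", "BL", "W4", "W12", "W24"))
# returns c(-1, 0, 4, 12, 24)
```

Codes that are neither in the special table nor contain digits (for example an unscheduled visit) are left as NA, so they can be reviewed rather than silently placed on the axis.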
AVALU
Definition: Analysis Value Unit.
Rationale: Used in the plot title and y-axis labels. Please see the comments in "Other Basic Data Structures" above.
Alternative: Variable is required.
LBSTRESC
Definition: From SDTM, Character Result/Finding in Std Format.
Rationale: As a character variable, LBSTRESC contains values that include those below or above the limits of quantitation (LOQ). These might look like "<2.1" or ">20.7". When this is the case, AVAL is often missing. It is important to still capture these values, so the following derivation is used. Whenever it is applied, the LOQFL variable should be set to "Y" to signify that the AVAL value for this record has been derived.
- For values below the limit of quantitation, AVAL is set to the numeric portion of LBSTRESC divided by 2.
- For values above the limit of quantitation, AVAL is set to the numeric portion of LBSTRESC.
Alternative: Variable is required.
LOQFL
Definition: This is not an ADaM standard variable but represents the Limit of Quantitation Flag.
Rationale: Set to "Y" when the LBSTRESC value is used to populate AVAL and the LBSTRESC value is either below or above the limit of quantitation for the assay. Derivations for AVAL and LOQFL that follow the rules described under LBSTRESC could look like the following in a dplyr mutate() statement:
- AVAL = case_when(grepl("<", LBSTRESC) ~ as.numeric(gsub("[^0-9.]", "", LBSTRESC)) / 2, grepl(">", LBSTRESC) ~ as.numeric(gsub("[^0-9.]", "", LBSTRESC)), TRUE ~ AVAL)
- LOQFL = if_else(grepl("<|>", LBSTRESC), "Y", "N")
Alternative: If the limit of quantitation concept is not relevant, please set LOQFL <- "N"
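For readers not using dplyr, the same LOQ rules can be sketched in base R. derive_loq() is a hypothetical helper. It assumes LOQ results are marked with "<" or ">" in LBSTRESC and use a period as the decimal separator.

```r
# Hypothetical helper implementing the LBSTRESC rules described above
derive_loq <- function(adlb) {
  is_below <- grepl("<", adlb$LBSTRESC, fixed = TRUE)
  is_above <- grepl(">", adlb$LBSTRESC, fixed = TRUE)
  is_loq <- is_below | is_above

  # Numeric portion of the character result, e.g. "<2.1" -> 2.1
  loq_value <- rep(NA_real_, nrow(adlb))
  loq_value[is_loq] <- as.numeric(gsub("[^0-9.]", "", adlb$LBSTRESC[is_loq]))

  # Below LOQ: half the limit; above LOQ: the limit itself
  adlb$AVAL[is_below] <- loq_value[is_below] / 2
  adlb$AVAL[is_above] <- loq_value[is_above]
  adlb$LOQFL <- ifelse(is_loq, "Y", "N")
  adlb
}

adlb <- data.frame(
  LBSTRESC = c("<2.1", ">20.7", "5.4"),
  AVAL = c(NA, NA, 5.4),
  stringsAsFactors = FALSE
)
derive_loq(adlb)
```

Records without "<" or ">" keep their original AVAL and get LOQFL = "N".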
BASE2
Definition: This is not an ADaM standard variable but represents the assay value at Screening.
Rationale: When change-from-Screening analyses are needed, this variable contains the assay value at Screening.
Alternative: If Screening visit analyses are not relevant, please set BASE2 <- NA
CHG2
Definition: This is not an ADaM standard variable but represents the change from the Screening assay value.
Rationale: When change-from-Screening analyses are needed, this variable contains the change in assay value between Screening and each subsequent visit.
Alternative: If Screening visit analyses are not relevant, please set CHG2 <- NA
PCHG2
Definition: This is not an ADaM standard variable but represents the percent change from the Screening assay value.
Rationale: When percent-change-from-Screening analyses are needed, this variable contains the percent change in assay value between Screening and each subsequent visit.
Alternative: If Screening visit analyses are not relevant, please set PCHG2 <- NA
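The three Screening variables can be sketched in base R by following the same pattern as the ADaM CHG (AVAL - BASE) and PCHG (100 * CHG / BASE) variables. This assumes one Screening record per USUBJID and PARAMCD, identified by the hypothetical AVISITCD code "SCR". derive_screening_change() is an illustrative helper, not part of teal.goshawk.

```r
# Hypothetical helper: derive BASE2, CHG2 and PCHG2 from the Screening record
derive_screening_change <- function(adlb, screening_visit = "SCR") {
  # which() drops records whose AVISITCD is NA
  scr <- adlb[which(adlb$AVISITCD == screening_visit), c("USUBJID", "PARAMCD", "AVAL")]
  names(scr)[names(scr) == "AVAL"] <- "BASE2"
  out <- merge(adlb, scr, by = c("USUBJID", "PARAMCD"), all.x = TRUE)
  out$CHG2 <- out$AVAL - out$BASE2
  out$PCHG2 <- 100 * out$CHG2 / out$BASE2
  out
}

adlb <- data.frame(
  USUBJID = "S1",
  PARAMCD = "ALT",
  AVISITCD = c("SCR", "BL", "W4"),
  AVAL = c(20, 25, 30)
)
derive_screening_change(adlb)
```

Subjects without a Screening value get NA for all three variables. A Screening value of 0 would make PCHG2 infinite, so such cases may need explicit handling.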