Python TealDatasetConnector
python_dataset_connector.Rd
Create a TealDatasetConnector from .py file or through python code supplied directly.
Create a CDISCTealDatasetConnector from .py file or through python code supplied directly.
Usage
python_dataset_connector(
dataname,
file,
code,
object = dataname,
keys = character(0),
label = character(0),
mutate_code = character(0),
mutate_script = character(0),
vars = list(),
metadata = NULL
)
python_cdisc_dataset_connector(
dataname,
file,
code,
object = dataname,
keys = get_cdisc_keys(dataname),
parent = if (identical(dataname, "ADSL")) character(0L) else "ADSL",
mutate_code = character(0),
mutate_script = character(0),
label = character(0),
vars = list(),
metadata = NULL
)
Arguments
- dataname
(character) A given name for the dataset. It may not contain spaces.
- file
(character) Path to the file containing the python script used to generate the object.
- code
(character) String containing the python code to be run using reticulate. Carefully consider indentation to follow proper python syntax.
- object
(character) Name of the object from the python script that is assigned to the dataset to be used.
- keys
(character, optional) Vector of the dataset's primary key column names.
- label
(character) Label to describe the dataset.
- mutate_code
(character) String containing the code used to mutate the object after it is produced.
- mutate_script
(character) Alternative to mutate_code: location of the file containing modification code. Can't be used simultaneously with mutate_code.
- vars
(named list) When this object's code depends on other TealDataset object(s) or other constant values, these should be included as named element(s) of the list. For example, if this object's code needs the ADSL object, specify vars = list(ADSL = <adsl object>). It's recommended to include TealDataset or TealDatasetConnector objects in the vars list to preserve reproducibility. Please note that vars are included in this object as local vars and they cannot be modified within another dataset.
- metadata
(named list, NULL or CallableFunction) Field containing either the metadata about the dataset (each element of the list should be atomic and length one) or a CallableFunction to pull the metadata from a connection. This should return a list or an object which can be converted to a list with as.list.
- parent
(character, optional) Parent dataset name.
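Because the code argument is passed as a single string, indentation mistakes only surface once the code is run. The following sketch, in plain python and independent of teal and reticulate, shows one way such a string could be checked beforehand. check_python_code is a hypothetical helper introduced for illustration, not part of any package.

```python
import textwrap


def check_python_code(code):
    """Return None if `code` compiles, otherwise a short error description.

    Hypothetical helper for illustration; not part of teal or reticulate.
    """
    try:
        # compile() parses the source without executing it
        compile(code, "<connector code>", "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return f"{type(err).__name__}: {err.msg}"
    return None


# A function body that lost its indentation does not compile.
broken = "def makedata():\nreturn 1\n"

# Code pasted with uniform leading whitespace fails until it is dedented.
indented = "    x = 1\n    data = [x, 2]\n"
dedented = textwrap.dedent(indented)

print(check_python_code(broken))
print(check_python_code(indented))
print(check_python_code(dedented))
```

Only the last call is expected to report no problem. The exact wording of the error messages depends on the python version.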
Details
Note that in addition to the reticulate package, support for python requires an
existing python installation. By default, reticulate will attempt to use the
location Sys.which("python"), however the path to the python installation can be
supplied directly via reticulate::use_python.
The teal API for delayed data requires the python code or script to return a
data.frame object. For this, the pandas package is required. This can be installed
using reticulate::py_install("pandas").
Please see the package documentation for more details.
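When it is unclear which python installation is in use or whether pandas is installed there, a short check run inside that interpreter can help. The sketch below uses only the python standard library. module_available is a hypothetical helper name, not part of teal or reticulate.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


# The interpreter that is actually running this code
print("python executable:", sys.executable)

# Whether pandas could be imported from this interpreter
print("pandas available:", module_available("pandas"))
```

If pandas is reported as unavailable, installing it into that same interpreter, for example with reticulate::py_install("pandas") as mentioned above, should resolve it.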
Note
Raises an error when code and file are passed at the same time.
When using code, keep in mind that when using reticulate with delayed data, python
functions do not have access to other objects in the code and must be self contained.
In the following example, the function makedata() doesn't have access to variable x:
import pandas as pd
x = 1
def makedata():
    return pd.DataFrame({'x': [x, 2], 'y': [3, 4]})
data = makedata()
When using custom functions, the function environment must be entirely self contained:
def makedata():
    import pandas as pd
    x = 1
    return pd.DataFrame({'x': [x, 2], 'y': [3, 4]})
data = makedata()
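The behaviour described in this note resembles what plain python does when code is executed with separate global and local namespaces. The sketch below reproduces that effect with exec. That the connector runs code this way is an assumption made for illustration, not something this page confirms. Plain dictionaries stand in for pandas.DataFrame so the snippet has no dependencies, and run_isolated is a hypothetical helper.

```python
def run_isolated(code):
    """Execute `code` with separate global and local namespaces.

    Hypothetical stand-in for how delayed-data code might be run;
    returns the local namespace holding the top-level names.
    """
    local_ns = {}
    exec(code, {}, local_ns)
    return local_ns


# Top-level `x` lands in the local namespace, which makedata() cannot see.
leaky = """
x = 1
def makedata():
    return {'x': [x, 2], 'y': [3, 4]}
data = makedata()
"""

# Everything makedata() needs is defined inside the function.
self_contained = """
def makedata():
    x = 1
    return {'x': [x, 2], 'y': [3, 4]}
data = makedata()
"""

try:
    run_isolated(leaky)
    leaky_error = None
except NameError as err:
    leaky_error = str(err)

result = run_isolated(self_contained)["data"]
print(leaky_error)
print(result)
```

The first snippet fails with a NameError for x, while the second produces the data. For the same reason, the self-contained version in this note places import pandas as pd inside the function.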
Additional reticulate considerations:
- Note that when using pull vars, R objects referenced in the python code or script have to be prefixed with r..
- reticulate isn't able to convert POSIXct objects. Please take extra care when working with datetime variables.
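One possible way to sidestep the POSIXct limitation is to hand timestamps over as ISO 8601 strings and parse them on the python side. The sketch below shows only the python half. It assumes the R side has already formatted the values as text, and parse_timestamps is a hypothetical helper, not part of teal or reticulate.

```python
from datetime import datetime


def parse_timestamps(values):
    """Parse ISO 8601 strings (e.g. '2021-03-04T10:30:00') into datetimes.

    Hypothetical helper; assumes the values arrive as plain strings.
    """
    return [datetime.fromisoformat(v) for v in values]


stamps = parse_timestamps(["2021-03-04T10:30:00", "2021-03-05T08:00:00"])
print(stamps[0].year, stamps[1].hour)
```

When the result is a pandas data frame, pandas.to_datetime offers a comparable conversion for whole columns.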
Please read the official documentation for the reticulate package for additional
features and current limitations.
Examples
if (FALSE) {
  library(reticulate)
  # supply python code directly in R
  # (the python string continues at column 0 so that its indentation stays valid)
  x <- python_dataset_connector(
    "ADSL",
    code = "import pandas as pd
data = pd.DataFrame({'STUDYID': [1, 2], 'USUBJID': [3, 4]})",
    object = "data"
  )
  x$pull()
  x$get_raw_data()
  # supply an external python script
  python_file <- tempfile(fileext = ".py")
  writeLines(
    text = "import pandas as pd
data = pd.DataFrame({'STUDYID': [1, 2], 'USUBJID': [3, 4]})",
    con = python_file
  )
  x <- python_dataset_connector(
    "ADSL",
    file = python_file,
    object = "data"
  )
  x$pull()
  x$get_raw_data()
  # supply pull `vars` from R; R objects are referenced with the `r.` prefix
  y <- 8
  x <- python_dataset_connector(
    "ADSL",
    code = "import pandas as pd
data = pd.DataFrame({'STUDYID': [r.y], 'USUBJID': [r.y]})",
    object = "data",
    vars = list(y = y)
  )
  x$pull()
  x$get_raw_data()
}