Python TealDatasetConnector
python_dataset_connector.Rd
Create a TealDatasetConnector from a .py file or through python code supplied directly.
Create a CDISCTealDatasetConnector from a .py file or through python code supplied directly.
Usage
python_dataset_connector(
  dataname,
  file,
  code,
  object = dataname,
  keys = character(0),
  label = character(0),
  mutate_code = character(0),
  mutate_script = character(0),
  vars = list(),
  metadata = NULL
)
python_cdisc_dataset_connector(
  dataname,
  file,
  code,
  object = dataname,
  keys = get_cdisc_keys(dataname),
  parent = if (identical(dataname, "ADSL")) character(0L) else "ADSL",
  mutate_code = character(0),
  mutate_script = character(0),
  label = character(0),
  vars = list(),
  metadata = NULL
)
Arguments
- dataname
  (character) A given name for the dataset; it may not contain spaces.
- file
  (character) Path to the file containing the python script used to generate the object.
- code
  (character) String containing the python code to be run using reticulate. Carefully consider indentation to follow proper python syntax.
- object
  (character) Name of the object from the python script that is assigned to the dataset to be used.
- keys
  optional, (character) Vector of the dataset's primary key column names.
- label
  (character) Label to describe the dataset.
- mutate_code
  (character) String containing the code used to mutate the object after it is produced.
- mutate_script
  (character) Alternative to mutate_code: location of the file containing modification code. Cannot be used simultaneously with mutate_code.
- vars
  (named list) If this object's code depends on other TealDataset object(s) or other constant values, these objects should be included as named elements of the list. For example, if this object's code needs the ADSL object, specify vars = list(ADSL = <adsl object>). It is recommended to include TealDataset or TealDatasetConnector objects in the vars list to preserve reproducibility. Note that vars are included in this object as local vars and cannot be modified within another dataset.
- metadata
  (named list, NULL or CallableFunction) Field containing either the metadata about the dataset (each element of the list should be atomic and of length one) or a CallableFunction to pull the metadata from a connection. This should return a list or an object which can be converted to a list with as.list.
- parent
  (character, optional) Parent dataset name.
Details
Note that in addition to the reticulate package, support for python requires an existing python installation. By default, reticulate will attempt to use the location Sys.which("python"), however the path to the python installation can be supplied directly via reticulate::use_python.
The teal API for delayed data requires the python code or script to return a data.frame object. For this, the pandas package is required. This can be installed using reticulate::py_install("pandas").
Please see the package documentation for more details.
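Because the connector relies on pandas being available in the python installation that reticulate uses, a quick pre-flight check run in that interpreter can catch a missing package before a pull fails. The following is a minimal sketch only: the helper name missing_modules is hypothetical and is not part of teal or reticulate, and it uses nothing beyond the standard-library importlib.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical usage: pandas is the only python requirement named on this page.
required = ["pandas"]
not_found = missing_modules(required)
if not_found:
    print("Install before pulling data:", ", ".join(not_found))
else:
    print("All required modules are importable.")
```

If pandas is reported as missing, it can be installed from R as described above.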
Note
Raises an error when code and file are passed at the same time.
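To make the relationship between code, file and object concrete, here is a plain-python analogue. It is a sketch under stated assumptions, not the connector's implementation (the connector itself is written in R): load_object is a hypothetical helper, a plain dictionary stands in for a pandas data frame so that only the standard library is needed, and raising an error when neither source is given is an assumption of this sketch.

```python
import os
import runpy
import tempfile


def load_object(object_name, code=None, file=None):
    """Run python source (a string or a script path) and return one named object."""
    if code is not None and file is not None:
        # Mirrors the documented rule: 'code' and 'file' are mutually exclusive.
        raise ValueError("'code' and 'file' cannot be supplied at the same time")
    if code is None and file is None:
        # Assumption of this sketch: there must be something to run.
        raise ValueError("one of 'code' or 'file' is required")
    if file is not None:
        namespace = runpy.run_path(file)  # runs the script, returns its globals
    else:
        namespace = {}
        exec(code, namespace)
    return namespace[object_name]


SOURCE = "data = {'STUDYID': [1, 2], 'USUBJID': [3, 4]}\n"

# The same source supplied directly ...
from_code = load_object("data", code=SOURCE)

# ... and through a temporary .py file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(SOURCE)
    script_path = handle.name
try:
    from_file = load_object("data", file=script_path)
finally:
    os.remove(script_path)
```

Either route yields the same object; the object argument only names which variable of the executed source becomes the dataset.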
When using code, keep in mind that when using reticulate with delayed data, python functions do not have access to other objects in the code and must be self contained. In the following example, the function makedata() doesn't have access to variable x:
import pandas as pd
x = 1
def makedata():
    return pd.DataFrame({'x': [x, 2], 'y': [3, 4]})
data = makedata()
When using custom functions, the function environment must be entirely self contained:
def makedata():
    import pandas as pd
    x = 1
    return pd.DataFrame({'x': [x, 2], 'y': [3, 4]})
data = makedata()
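The scoping behaviour described in this note can be reproduced in plain python by executing code with separate global and local namespaces. This is offered as an analogy, under the assumption that the delayed-data run behaves like such a local execution; it does not describe how reticulate or teal is implemented. Plain dictionaries stand in for pandas data frames so that only the standard library is needed, and run_locally and try_run are hypothetical helper names.

```python
BROKEN = """
x = 1
def makedata():
    return {'x': [x, 2], 'y': [3, 4]}
data = makedata()
"""

SELF_CONTAINED = """
def makedata():
    x = 1
    return {'x': [x, 2], 'y': [3, 4]}
data = makedata()
"""


def run_locally(code):
    """Execute code with distinct global and local namespaces; return the locals."""
    global_ns = {}
    local_ns = {}
    # Top-level assignments land in local_ns, but functions defined by the code
    # resolve free names through global_ns, so they cannot see those assignments.
    exec(code, global_ns, local_ns)
    return local_ns


def try_run(code):
    """Return ('ok', data) on success or ('error', exception name) on NameError."""
    try:
        return ("ok", run_locally(code)["data"])
    except NameError as err:
        return ("error", type(err).__name__)


broken_result = try_run(BROKEN)          # makedata() cannot find x
fixed_result = try_run(SELF_CONTAINED)   # everything lives inside the function
print(broken_result)
print(fixed_result)
```

The first run fails with a NameError because x is not visible inside makedata(), while the self-contained version succeeds.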
Additional reticulate considerations:
- Note that when using pull vars, R objects referenced in the python code or script have to be prefixed with r. (for example, r.y).
- reticulate isn't able to convert POSIXct objects. Please take extra care when working with datetime variables.
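One possible way to take that extra care, which is an assumption of this sketch and not something this page prescribes, is to exchange date-times as ISO 8601 strings and parse them on the receiving side. The helper names to_iso_utc and from_iso are hypothetical; only the standard-library datetime module is used.

```python
from datetime import datetime, timezone


def to_iso_utc(stamps):
    """Convert timezone-aware datetimes to ISO 8601 strings expressed in UTC."""
    out = []
    for stamp in stamps:
        if stamp.tzinfo is None:
            # Naive datetimes are ambiguous, so this sketch refuses them.
            raise ValueError("naive datetime: attach a timezone first")
        out.append(stamp.astimezone(timezone.utc).isoformat())
    return out


def from_iso(strings):
    """Parse ISO 8601 strings produced by to_iso_utc back into datetimes."""
    return [datetime.fromisoformat(text) for text in strings]


visits = [datetime(2021, 3, 14, 9, 30, tzinfo=timezone.utc)]
as_text = to_iso_utc(visits)
round_trip = from_iso(as_text)
print(as_text)
```

Text columns of this kind carry the time zone explicitly, which leaves the choice of date-time class to whichever side parses them.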
Please read the official documentation for the reticulate package for additional features and current limitations.
Examples
if (FALSE) {
  library(reticulate)

  # supply python code directly in R
  x <- python_dataset_connector(
    "ADSL",
    code = "import pandas as pd
data = pd.DataFrame({'STUDYID': [1, 2], 'USUBJID': [3, 4]})",
    object = "data"
  )
  x$pull()
  x$get_raw_data()

  # supply an external python script
  python_file <- tempfile(fileext = ".py")
  writeLines(
    text = "import pandas as pd
data = pd.DataFrame({'STUDYID': [1, 2], 'USUBJID': [3, 4]})",
    con = python_file
  )
  x <- python_dataset_connector(
    "ADSL",
    file = python_file,
    object = "data"
  )
  x$pull()
  x$get_raw_data()

  # supply pull `vars` from R
  y <- 8
  x <- python_dataset_connector(
    "ADSL",
    code = "import pandas as pd
data = pd.DataFrame({'STUDYID': [r.y], 'USUBJID': [r.y]})",
    object = "data",
    vars = list(y = y)
  )
  x$pull()
  x$get_raw_data()
}