
[Experimental] Create a TealDatasetConnector from .py file or through python code supplied directly.

[Experimental] Create a CDISCTealDatasetConnector from .py file or through python code supplied directly.

Usage

python_dataset_connector(
  dataname,
  file,
  code,
  object = dataname,
  keys = character(0),
  label = character(0),
  mutate_code = character(0),
  mutate_script = character(0),
  vars = list(),
  metadata = NULL
)

python_cdisc_dataset_connector(
  dataname,
  file,
  code,
  object = dataname,
  keys = get_cdisc_keys(dataname),
  parent = if (identical(dataname, "ADSL")) character(0L) else "ADSL",
  mutate_code = character(0),
  mutate_script = character(0),
  label = character(0),
  vars = list(),
  metadata = NULL
)

Arguments

dataname

(character)
A name for the dataset; it may not contain spaces.

file

(character)
Path to the file location containing the python script used to generate the object.

code

(character)
string containing the python code to be run using reticulate. Carefully consider indentation to follow proper python syntax.

object

(character)
name of the object from the python script that is assigned to the dataset to be used.

keys

optional, (character)
Vector of the dataset's primary key column names.

label

(character)
Label to describe the dataset.

mutate_code

(character)
String containing the code used to mutate the object after it is produced.

mutate_script

(character)
Alternative to mutate_code: location of the file containing the modification code. Can't be used simultaneously with mutate_code.

vars

(named list)
If this object's code depends on other TealDataset objects or on constant values, include them as named elements of the list. For example, if the code needs the ADSL object, specify vars = list(ADSL = <adsl object>). Including TealDataset or TealDatasetConnector objects in vars is recommended to preserve reproducibility. Note that vars are attached to this object as local variables and cannot be modified from within another dataset.

metadata

(named list, NULL or CallableFunction)
Field containing either the metadata about the dataset (each element of the list should be atomic and of length one) or a CallableFunction to pull the metadata from a connection. The function should return a list or an object that can be converted to a list with as.list.

parent

(character, optional) parent dataset name
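
The indentation caveat for the code argument can be illustrated in plain Python. The sketch below is only an illustration and is not part of the teal API: the helper name compiles() is hypothetical. It shows that a string whose top-level statements carry leading whitespace (as can happen when a multi-line string is written inside an indented R call) fails to compile, and that textwrap.dedent() is one way to normalise it.

```python
import textwrap

# A code string whose top-level statements are indented, e.g. because the
# string literal was written inside an indented function call.
indented = """
    import math
    value = math.sqrt(16)
"""


def compiles(source):
    """Hypothetical helper: True if `source` compiles as a Python module."""
    try:
        compile(source, "<code>", "exec")
        return True
    except IndentationError:
        return False


# The raw string is rejected because of the unexpected leading indentation.
print(compiles(indented))
# Removing the common leading whitespace makes the same code valid.
print(compiles(textwrap.dedent(indented)))
```

Writing the code string flush with the left margin, as in the Examples section, avoids the problem without any extra processing.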

Details

Note that in addition to the reticulate package, Python support requires an existing Python installation. By default, reticulate attempts to use the location returned by Sys.which("python"); however, the path to the Python installation can be supplied directly via reticulate::use_python.

The teal API for delayed data requires the python code or script to return a data.frame object. For this, the pandas package is required. This can be installed using reticulate::py_install("pandas").

Please see the package documentation for more details.

Note

Raises an error when code and file are passed at the same time.

When using code with reticulate and delayed data, keep in mind that Python functions do not have access to other objects in the code and must be self-contained. In the following example, the function makedata() doesn't have access to variable x:

import pandas as pd

x = 1
def makedata():
  return pd.DataFrame({'x': [x, 2], 'y': [3, 4]})

data = makedata()

When using custom functions, the function environment must be entirely self contained:

def makedata():
  import pandas as pd
  x = 1
  return pd.DataFrame({'x': [x, 2], 'y': [3, 4]})

data = makedata()
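
The difference between the two versions can be reproduced in plain Python without reticulate. The sketch below assumes, purely for illustration, that the code is evaluated with separate global and local namespaces; whether reticulate does exactly this internally is an assumption. The helper run_isolated() is hypothetical, and plain dictionaries stand in for pandas data frames so the sketch has no dependencies.

```python
# Top-level variable referenced from inside a function.
not_self_contained = """
x = 1
def makedata():
    return {'x': [x, 2], 'y': [3, 4]}

data = makedata()
"""

# Everything the function needs is defined inside it.
self_contained = """
def makedata():
    x = 1
    return {'x': [x, 2], 'y': [3, 4]}

data = makedata()
"""


def run_isolated(source):
    """Hypothetical helper: run `source` with separate globals and locals.

    Top-level assignments land in the local namespace, while functions look
    up free names in the global namespace, so `x` is invisible to makedata().
    Returns the local namespace, or the NameError that was raised.
    """
    global_ns, local_ns = {}, {}
    try:
        exec(source, global_ns, local_ns)
    except NameError as err:
        return err
    return local_ns


first = run_isolated(not_self_contained)
second = run_isolated(self_contained)
print(type(first).__name__)   # the first version fails with a NameError
print(second["data"])         # the second version produces the data
```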
  

Additional reticulate considerations:

  1. Note that when using pull vars, R objects referenced in the Python code or script have to be prefixed with r. (for example, r.y).

  2. reticulate isn't able to convert POSIXct objects. Please take extra care when working with datetime variables.
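
One possible way to sidestep the datetime limitation is to exchange datetimes as ISO 8601 text and parse them on the Python side. The sketch below is an assumption-laden illustration, not a documented teal or reticulate recipe: the variable name visit_date and the column names are hypothetical, and a SimpleNamespace stands in for reticulate's r object so the snippet runs outside R.

```python
from datetime import datetime
from types import SimpleNamespace

# Stand-in for reticulate's `r` object (assumption for this sketch only).
# In a connector, the value would be supplied from R as a character string,
# e.g. vars = list(visit_date = "2020-01-15T08:30:00").
r = SimpleNamespace(visit_date="2020-01-15T08:30:00")

# Parse the ISO 8601 string into a Python datetime for any date arithmetic.
visit = datetime.fromisoformat(r.visit_date)

# Convert back to text before the result is handed to R, so that no
# datetime conversion is needed at the language boundary.
rows = {"USUBJID": [1], "VISITDT": [visit.isoformat()]}
print(rows)
```

The text column can then be converted to a date-time class in R, for instance through mutate_code.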

Please read the official documentation for the reticulate package for additional features and current limitations.

Examples

if (FALSE) {
library(reticulate)

# supply python code directly in R

x <- python_dataset_connector(
  "ADSL",
  code = "import pandas as pd
data = pd.DataFrame({'STUDYID':  [1, 2], 'USUBJID': [3, 4]})",
  object = "data"
)

x$pull()
x$get_raw_data()

# supply an external python script

python_file <- tempfile(fileext = ".py")
writeLines(
  text = "import pandas as pd
data = pd.DataFrame({'STUDYID':  [1, 2], 'USUBJID': [3, 4]})",
  con = python_file
)

x <- python_dataset_connector(
  "ADSL",
  file = python_file,
  object = "data"
)

x$pull()
x$get_raw_data()

# supply pull `vars` from R

y <- 8
x <- python_dataset_connector(
  "ADSL",
  code = "import pandas as pd
data = pd.DataFrame({'STUDYID':  [r.y], 'USUBJID': [r.y]})",
  object = "data",
  vars = list(y = y)
)

x$pull()
x$get_raw_data()
}