danfo.streamCsvTransformer
A pipeline transformer that streams a CSV file from a local or remote path, transforms it with a custom transformer function, and writes the result to an output stream. Only available in Node.js.
danfo.streamCsvTransformer(inputFilePath, transformer, options)
Returns:
A promise that resolves when the pipeline transformation is complete.
The streamCsvTransformer can be used to incrementally transform a CSV file. This is done by:
Streaming a CSV file from a local or remote path.
Passing each corresponding row as a DataFrame to the specified transformer function.
Writing the result to an output stream.
In the example below, we stream a local CSV file (titanic.csv), apply a transformer function, and write the output to titanicOutLocal.csv.
The transformer takes the Name column of each row, extracts the person's title from it, and stores the title in a new column.
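A minimal sketch of this local pipeline follows. The helper names (`extractTitle`, `addTitleColumn`, `runLocalPipeline`), the new column name `Title`, and the file locations are assumptions made for illustration. It also assumes names are formatted like "Braund, Mr. Owen Harris", with the title between the comma and the first period.

```javascript
const path = require("path");

// Assumes names look like "Braund, Mr. Owen Harris": the title sits between
// the comma and the first period. Returns "" when no title is found.
function extractTitle(name) {
  const match = /,\s*([^.]+)\./.exec(String(name));
  return match ? match[1].trim() : "";
}

// Transformer: receives a DataFrame, adds a "Title" column, and returns it.
function addTitleColumn(df) {
  const titles = df["Name"].values.map(extractTitle);
  df.addColumn("Title", titles, { inplace: true });
  return df;
}

async function runLocalPipeline() {
  // Loaded here so the helpers above can be reused without danfojs-node.
  const dfd = require("danfojs-node");

  // Assumed locations: both files live in the current working directory.
  const inputFilePath = path.join(process.cwd(), "titanic.csv");
  const outputFilePath = path.join(process.cwd(), "titanicOutLocal.csv");

  await dfd.streamCsvTransformer(inputFilePath, addTitleColumn, {
    outputFilePath,
    inputStreamOptions: { header: true }, // Papaparse: first row holds column names
  });
}
```

Call `runLocalPipeline()` from your script's entry point to run the pipeline. The returned promise resolves once the transformation is complete.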
In the example below, we stream a remote CSV file (titanic.csv), apply a transformer function, and write the output to the titanicOutLocal file.
The transformer takes the Name column of each row, extracts the person's title from it, and stores the title in a new column.
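The remote case can be sketched the same way, with a URL in place of the local input path. The URL below is a hypothetical placeholder, not a real dataset location. The function names and the `Title` column name are also assumptions for illustration.

```javascript
const path = require("path");

// Hypothetical placeholder URL: replace it with the real location of your CSV file.
const REMOTE_CSV_URL = "https://example.com/data/titanic.csv";

// Assumes names like "Cumings, Mrs. John Bradley": the title is the text
// between the comma and the first period.
function extractTitle(name) {
  const match = /,\s*([^.]+)\./.exec(String(name));
  return match ? match[1].trim() : "";
}

// Transformer: adds a "Title" column to the incoming DataFrame and returns it.
function addTitleColumn(df) {
  df.addColumn("Title", df["Name"].values.map(extractTitle), { inplace: true });
  return df;
}

async function runRemotePipeline(url = REMOTE_CSV_URL) {
  const dfd = require("danfojs-node"); // loaded lazily, only when the pipeline runs
  await dfd.streamCsvTransformer(url, addTitleColumn, {
    // The output is still written to local storage.
    outputFilePath: path.join(process.cwd(), "titanicOutLocal.csv"),
    inputStreamOptions: { header: true },
  });
}
```

Only the input location changes. The transformer and the output options are the same as for a local file.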
If you need custom control over the output writer, you can provide a custom pipeable writer. See https://www.freecodecamp.org/news/node-js-streams-everything-you-need-to-know-c9141306be93/ for an introduction to Node.js streams.
In the example below, we add a custom writer that logs each row. You can extend this to upload each chunk to a database, or any other function you need.
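A minimal sketch of such a writer, built on Node's `stream.Writable` in object mode, is shown here. It assumes that `customCSVStreamWriter` is called with no arguments and must return a writable stream that receives each transformed chunk. The names `createLoggingWriter` and `runWithCustomWriter` are hypothetical.

```javascript
const { Writable } = require("stream");

// Returns a writable stream that logs a message for every chunk it receives.
// `log` is injectable so the writer can be exercised without a console.
function createLoggingWriter(log = console.log) {
  let count = 0;
  return new Writable({
    objectMode: true, // chunks are objects (transformed rows), not buffers
    write(chunk, _encoding, callback) {
      count += 1;
      // Do anything with the chunk here, for example insert it into a database.
      log(`Chunk ${count} written`);
      callback(); // signal that this chunk has been handled
    },
  });
}

async function runWithCustomWriter(inputFilePath, transformer) {
  const dfd = require("danfojs-node"); // loaded lazily, only when the pipeline runs
  await dfd.streamCsvTransformer(inputFilePath, transformer, {
    customCSVStreamWriter: createLoggingWriter, // assumed to be invoked by danfo
    inputStreamOptions: { header: true },
  });
}
```

Because the writer replaces the default one, no output file is produced unless the writer creates it itself.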
Parameters | Type | Description |
---|---|---|
inputFilePath | string | The path to the CSV file to stream from. |
transformer | Function | The transformer function to apply to each row. Each row of the CSV file is passed to the transformer function as a single-row DataFrame, and the transformer function is expected to return a transformed DataFrame. |
options | object | Configuration options for the pipeline. The supported options are listed below. |
The options object supports the following:
outputFilePath: The local file path to write the transformed CSV file to.
customCSVStreamWriter: A custom CSV stream writer function. This is applied at the end of each transform. If not provided, a default CSV stream writer is used, which writes to local storage.
inputStreamOptions: Configuration options for the input stream. Supports all Papaparse CSV reader config options.
outputStreamOptions: Configuration options for the output stream. These are only applied when using the default CSV stream writer. Supports all toCSV options.
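As a rough illustration of how these options fit together, the sketch below builds an options object. The function name `buildPipelineOptions` and the chosen values are assumptions. `header` and `skipEmptyLines` are Papaparse config options, and `sep` is assumed here to be the toCSV separator option.

```javascript
const path = require("path");

// Builds an options object for streamCsvTransformer. Values are illustrative.
function buildPipelineOptions(outputDir) {
  return {
    // Where the default writer stores the transformed CSV.
    outputFilePath: path.join(outputDir, "titanicOutLocal.csv"),
    inputStreamOptions: {
      header: true,         // Papaparse: treat the first row as column names
      skipEmptyLines: true, // Papaparse: ignore blank lines in the input
    },
    outputStreamOptions: {
      sep: ";", // toCSV separator, only used by the default CSV stream writer
    },
  };
}
```

The result can be passed as the third argument, for example `danfo.streamCsvTransformer(inputFilePath, transformer, buildPipelineOptions(process.cwd()))`.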