danfo.streamCsvTransformer
A pipeline transformer that streams a CSV file from local or remote storage, transforms each row with a custom transformer function, and writes the result to an output stream. Only available in Node.js.
danfo.streamCsvTransformer(inputFilePath, transformer, options)
Parameters | Type | Description
---|---|---
inputFilePath | string | The path to the CSV file to stream from. Can be a local path or a remote URL.
transformer | Function | The transformer function to apply to each row. Each row of the CSV file is passed to the transformer as a single-row DataFrame, and the transformer is expected to return a transformed DataFrame.
options | object | Configuration options for the pipeline.
Returns:
A promise that resolves when the pipeline transformation is complete.
The streamCsvTransformer can be used to incrementally transform a CSV file. This is done by:

- Streaming a CSV file from a local or remote path.
- Passing each row as a single-row DataFrame to the specified transformer function.
- Writing the result to an output stream.
Stream processing a local file
In the example below, we stream a local CSV file (titanic.csv), apply a transformer function, and write the output to titanicOutLocal.csv. The transformer takes each Name column, splits out the person's title, and creates a new column from it.
Stream processing of remote file
In the example below, we stream a remote CSV file (titanic.csv), apply a transformer function, and write the output to the titanicOutLocal file. The transformer takes each Name column, splits out the person's title, and creates a new column from it.
Stream processing with a custom writer
If you need custom control of the output writer, you can provide a pipe-able custom writer. See https://www.freecodecamp.org/news/node-js-streams-everything-you-need-to-know-c9141306be93/
In the example below, we add a custom writer that logs each row. You can extend this to upload each chunk to a database, or any other function you need.