ERDOS Package Reference

erdos.connect(op_type, config, read_streams, *args, **kwargs)

Registers the operator and its connected streams on the dataflow graph.

The operator is created as follows: op_type(*read_streams, *write_streams, *args, **kwargs)

Parameters
  • op_type (type) – The operator class. Should inherit from erdos.Operator.

  • config (OperatorConfig) – Configuration details required by the operator.

  • read_streams – the streams from which the operator processes data.

  • args – arguments passed to the operator.

  • kwargs – keyword arguments passed to the operator.

Returns

ReadStreams corresponding to the

WriteStreams returned by the operator’s connect.

Return type

read_streams (list of ReadStream)

erdos.reset()

Resets internal seed and creates a new dataflow graph.

Note that no streams or operators can be re-used safely.

erdos.run(graph_filename=None, start_port=9000)

Instantiates and runs the dataflow graph.

ERDOS will spawn 1 process for each python operator, and connect them via TCP.

Parameters
  • graph_filename (str) – the filename to which to write the dataflow graph as a DOT file.

  • start_port (int) – the port on which to start. The start port is the lowest port ERDOS will use to establish TCP connections between operators.

erdos.run_async(graph_filename=None, start_port=9000)

Instantiates and runs the dataflow graph asynchronously.

ERDOS will spawn 1 process for each python operator, and connect them via TCP.

Parameters
  • graph_filename (str) – the filename to which to write the dataflow graph as a DOT file.

  • start_port (int) – the port on which to start. The start port is the lowest port ERDOS will use to establish TCP connections between operators.

erdos.add_watermark_callback(read_streams, write_streams, callback)

Adds a watermark callback across several read streams.

Parameters
  • read_streams (list of ReadStream) – streams on which the callback is invoked.

  • write_streams (list of WriteStream) – streams on which the callback can send messages.

  • callback (timestamp, list of WriteStream -> None) – a low watermark callback.

class erdos.NodeHandle(py_node_handle, processes)

Used to shutdown a dataflow created by run_async.

shutdown()

Shuts down the dataflow.