
Commit 58f8349

doc(map): call out unary, batch, and streaming (#3164)
Signed-off-by: Vigith Maurice <vigith@gmail.com>
1 parent 39e34c5 commit 58f8349


1 file changed: +23 -30 lines changed

docs/user-guide/user-defined-functions/map/map.md

Lines changed: 23 additions & 30 deletions

````diff
@@ -2,16 +2,6 @@

 Map in a Map vertex takes an input and returns 0, 1, or more outputs (also known as flat-map operation). Map is an element wise operator.

-## Build Your Own UDF
-
-You can build your own UDF in multiple languages.
-
-Check the links below to see the UDF examples for different languages.
-
-- [Python](https://github.com/numaproj/numaflow-python/tree/main/packages/pynumaflow/examples/map/)
-- [Golang](https://github.com/numaproj/numaflow-go/tree/main/examples/mapper/)
-- [Java](https://github.com/numaproj/numaflow-java/tree/main/examples/src/main/java/io/numaproj/numaflow/examples/map/)
-
 After building a docker image for the written UDF, specify the image as below in the vertex spec.

 ```yaml
````

````diff
@@ -23,44 +13,45 @@ spec:
           image: my-python-udf-example:latest
 ```

-### Streaming Mode
+Map supports three modes: [Unary](#unary-mode), [Streaming](#streaming-mode), and [Batch](#batch-mode).

-In cases the map function generates more than one output (e.g., flat map), the UDF can be
-configured to run in a streaming mode instead of batching, which is the default mode.
-In streaming mode, the messages will be pushed to the downstream vertices once generated
-instead of in a batch at the end.
+## Unary Mode

-Note that to maintain data orderliness, we restrict the read batch size to be `1`.
+Unary Map is the default mode where each input message is processed individually and returns 0, 1, or more outputs.

-```yaml
-spec:
-  vertices:
-    - name: my-vertex
-      limits:
-        # mapstreaming won't work if readBatchSize is != 1
-        readBatchSize: 1
-```
+Check the links below to see the UDF examples for different languages.
+
+- [Python](https://github.com/numaproj/numaflow-python/tree/main/packages/pynumaflow/examples/map/)
+- [Golang](https://github.com/numaproj/numaflow-go/tree/main/examples/mapper/)
+- [Java](https://github.com/numaproj/numaflow-java/tree/main/examples/src/main/java/io/numaproj/numaflow/examples/map/)
````
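
The new Unary Mode text describes the default contract: one call per input message, returning 0, 1, or more outputs. A minimal plain-Python sketch of that contract follows. It does not use a Numaflow SDK; `Datum`, `Message`, `unary_handler`, and `run_unary` are hypothetical stand-ins, and splitting on commas is only an example flat-map.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Datum:
    """Hypothetical stand-in for one input message (not the SDK type)."""
    keys: List[str]
    value: bytes


@dataclass
class Message:
    """Hypothetical stand-in for one output message (not the SDK type)."""
    value: bytes
    keys: List[str]


def unary_handler(datum: Datum) -> List[Message]:
    """Handle exactly one input and return 0, 1, or more outputs."""
    text = datum.value.decode("utf-8").strip()
    if not text:
        # Empty input: emit nothing (0 outputs).
        return []
    # Example flat-map: one output per comma-separated token, keys preserved.
    return [Message(value=tok.encode("utf-8"), keys=datum.keys) for tok in text.split(",")]


def run_unary(inputs: List[Datum]) -> List[List[Message]]:
    """Hypothetical driver: the handler is invoked separately for every input."""
    return [unary_handler(d) for d in inputs]


if __name__ == "__main__":
    batch = [Datum(["k"], b"a,b,c"), Datum(["k"], b"x"), Datum(["k"], b"")]
    for d, outs in zip(batch, run_unary(batch)):
        print(d.value, "->", [m.value for m in outs])
```

Running it prints one line per input, with three, one, and zero outputs respectively.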

````diff
+
+## Streaming Mode
+
+In cases the map function generates more than one output (e.g., flat map), the UDF can be
+configured to run in a streaming mode where the messages will be pushed to the downstream vertices as
+soon as the output is generated instead of collecting all the responses and then sending them
+together at the end when the function returns.

 Check the links below to see the UDF examples in streaming mode for different languages.

 - [Python](https://github.com/numaproj/numaflow-python/tree/main/packages/pynumaflow/examples/mapstream/flatmap_stream/)
 - [Golang](https://github.com/numaproj/numaflow-go/tree/main/examples/mapstreamer/flatmap_stream/)
 - [Java](https://github.com/numaproj/numaflow-java/tree/main/examples/src/main/java/io/numaproj/numaflow/examples/mapstream/flatmapstream/)
````
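
The reworded Streaming Mode paragraph is about *when* outputs leave the UDF. The following minimal sketch in plain Python uses a generator as a stand-in for a streaming handler and records the order in which outputs are produced and sent downstream. The function names are hypothetical, not SDK APIs.

```python
from typing import Iterator, List


def flat_map_stream(value: str, log: List[str]) -> Iterator[str]:
    """Hypothetical streaming handler: yields each output as soon as it exists."""
    for token in value.split():
        log.append(f"produced {token}")
        yield token


def send_collected(value: str, log: List[str]) -> None:
    """Non-streaming delivery: gather every output, then send them together."""
    outputs = list(flat_map_stream(value, log))
    for out in outputs:
        log.append(f"sent {out}")


def send_streaming(value: str, log: List[str]) -> None:
    """Streaming delivery: forward each output downstream as soon as it is yielded."""
    for out in flat_map_stream(value, log):
        log.append(f"sent {out}")


if __name__ == "__main__":
    for sender in (send_collected, send_streaming):
        log: List[str] = []
        sender("a b c", log)
        print(sender.__name__, log)
```

With collected delivery, the first output waits until the last one is produced. With streaming, downstream receives `a` before `b` has even been computed.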

````diff

-### Batch Map Mode
+## Batch Mode

 BatchMap is an interface that allows developers to process multiple data items in a UDF single call,
 rather than each item in separate calls.

 The BatchMap interface can be helpful in scenarios where performing operations on a group of data can be more efficient.

-#### Important Considerations
+### Important Considerations

 When using BatchMap, there are a few important considerations to keep in mind:

-- Ensure that the BatchResponses object is tagged with the correct request ID.
+- Ensure that the BatchResponses object is tagged with the correct request ID.
 Each Datum has a unique ID tag, which will be used by Numaflow to ensure correctness.
-- Ensure that the length of the BatchResponses list is equal to the number of requests received. This means that for
+- Ensure that the length of the BatchResponses list is equal to the number of requests received. This means that for
 every input data item, there should be a corresponding response in the BatchResponses list.
 - The total batch size can be up to `readBatchSize` long.

````
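
The BatchMap considerations reduce to two invariants: every response carries the ID of the request it answers, and there is exactly one response per request. The following is a minimal plain-Python sketch of a batch handler plus a check for both invariants. `Datum`, `BatchResponse`, `batch_handler`, and `validate` are hypothetical stand-ins, not the SDK types, and upper-casing is only placeholder work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Datum:
    """Hypothetical stand-in for one input; each carries a unique id."""
    id: str
    value: bytes


@dataclass
class BatchResponse:
    """Hypothetical stand-in: all outputs for one input, tagged with its id."""
    id: str
    values: List[bytes] = field(default_factory=list)


def batch_handler(batch: List[Datum]) -> List[BatchResponse]:
    """Process the whole batch in a single call, producing one response per input."""
    # A real UDF might make one bulk call here (for example a single DB round trip);
    # this sketch just upper-cases each value.
    return [BatchResponse(id=d.id, values=[d.value.upper()]) for d in batch]


def validate(batch: List[Datum], responses: List[BatchResponse]) -> None:
    """Check the two invariants from the considerations list."""
    if len(responses) != len(batch):
        raise ValueError(f"expected {len(batch)} responses, got {len(responses)}")
    # With unique request ids and equal lengths, set equality also rules out
    # duplicated or missing response ids.
    if {r.id for r in responses} != {d.id for d in batch}:
        raise ValueError("response ids do not match request ids")


if __name__ == "__main__":
    reqs = [Datum("id-1", b"hello"), Datum("id-2", b"world")]
    resps = batch_handler(reqs)
    validate(reqs, resps)
    print([(r.id, r.values) for r in resps])
```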

````diff
@@ -71,7 +62,7 @@ Check the links below to see the UDF examples in batch mode for different langua
 - [Java](https://github.com/numaproj/numaflow-java/tree/main/examples/src/main/java/io/numaproj/numaflow/examples/batchmap/)
 - [Rust](https://github.com/numaproj/numaflow-rs/tree/main/examples/batchmap-cat/)

-### Available Environment Variables
+## Available Environment Variables

 Some environment variables are available in the user-defined function container, they might be useful in your own UDF implementation.

````

````diff
@@ -81,7 +72,9 @@ Some environment variables are available in the user-defined function container,
 - `NUMAFLOW_PIPELINE_NAME` - Name of the pipeline.
 - `NUMAFLOW_VERTEX_NAME` - Name of the vertex.

````
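
Reading these variables inside a UDF needs only the standard library. This is a minimal Python sketch. The `vertex_info` helper and the `"unknown"` fallback (useful when running outside a pipeline, for example in local tests) are assumptions, not something Numaflow provides.

```python
import os
from typing import Dict


def vertex_info() -> Dict[str, str]:
    """Read the pipeline and vertex names exposed to the UDF container."""
    # os.environ.get avoids a KeyError when the variables are absent,
    # e.g. when the function is run locally outside Numaflow (assumed fallback).
    return {
        "pipeline": os.environ.get("NUMAFLOW_PIPELINE_NAME", "unknown"),
        "vertex": os.environ.get("NUMAFLOW_VERTEX_NAME", "unknown"),
    }


if __name__ == "__main__":
    info = vertex_info()
    print(f"running in pipeline={info['pipeline']} vertex={info['vertex']}")
```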

````diff
-### Configuration
+## Configuration
+
+To achieve ordering, please set `readBatchSize` to 1.

 Configuration data can be provided to the UDF container at runtime multiple ways.

````
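
The added Configuration line ties ordering to `readBatchSize`. A toy simulation in plain Python illustrates why this can matter. It rests on an assumption the diff does not state: that messages within one read batch are processed concurrently. The `emission_order` function and the per-message `cost` values are hypothetical, and this is not a model of Numaflow's actual scheduler.

```python
from typing import Dict, List


def emission_order(messages: List[str], cost: Dict[str, int], read_batch_size: int) -> List[str]:
    """Toy model of the order in which outputs leave a map vertex.

    Assumptions (illustrative, not taken from the Numaflow docs): every message
    in one read batch is processed concurrently and finishes after cost[m]
    ticks, and the next batch is read only once the current one is done.
    """
    if read_batch_size < 1:
        raise ValueError("read_batch_size must be >= 1")
    emitted: List[str] = []
    for start in range(0, len(messages), read_batch_size):
        batch = messages[start:start + read_batch_size]
        # Concurrent work completes cheapest-first; sorted() is stable, so ties
        # keep their input order.
        emitted.extend(sorted(batch, key=lambda m: cost[m]))
    return emitted


if __name__ == "__main__":
    msgs = ["m1", "m2", "m3"]
    cost = {"m1": 3, "m2": 1, "m3": 2}
    print(emission_order(msgs, cost, read_batch_size=3))  # ['m2', 'm3', 'm1']
    print(emission_order(msgs, cost, read_batch_size=1))  # ['m1', 'm2', 'm3']
```

In this model, a batch size of 1 leaves only one message in flight, so outputs always follow input order. Larger batches let faster messages overtake slower ones.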
