CoGroupByKey

通过其键聚合所有输入元素，并允许下游处理使用与该键关联的所有值。虽然 GroupByKey 在单个输入集合上执行此操作，因此仅对单个类型的输入值执行此操作，但 CoGroupByKey 在多个输入集合上执行此操作。因此，每个键的结果是每个输入集合中与该键关联的值的元组。

在 Beam 编程指南中查看更多信息。

示例

示例 1：假设您有两个包含用户数据的不同文件；一个文件包含姓名和电子邮件地址，另一个文件包含姓名和电话号码。

您可以使用用户名作为公共键，以及其他数据作为关联值来联接这两个数据集。联接后，您将获得一个包含与每个姓名关联的所有信息（电子邮件地址和电话号码）的数据集。

PCollection<KV<UID, Integer>> pt1 = /* ... */;
PCollection<KV<UID, String>> pt2 = /* ... */;

final TupleTag<Integer> t1 = new TupleTag<>();
final TupleTag<String> t2 = new TupleTag<>();
PCollection<KV<UID, CoGBKResult>> result =
  KeyedPCollectionTuple.of(t1, pt1).and(t2, pt2)
    .apply(CoGroupByKey.create());
result.apply(ParDo.of(new DoFn<KV<K, CoGbkResult>, /* some result */>() {
  @ProcessElement
  public void processElement(ProcessContext c) {
    KV<K, CoGbkResult> e = c.element();
    CoGbkResult result = e.getValue();
    // Retrieve all integers associated with this key from pt1
    Iterable<Integer> allIntegers = result.getAll(t1);
    // Retrieve the string associated with this key from pt2.
    // Note: This will fail if multiple values had the same key in pt2.
    String string = e.getOnly(t2);
    ...
}));

示例 2

GroupByKey 接受一个输入集合。

最后更新于 2024/10/31

您是否找到了您要查找的所有内容？

所有内容是否都实用且清晰？您想更改什么吗？请告诉我们！

CoGroupByKey

示例

相关转换

您是否找到了您要查找的所有内容？