OceanRep
Empirical Scalability Evaluation of Window Aggregation Methods in Distributed Stream Processing.
Vonheiden, Björn (2021) Empirical Scalability Evaluation of Window Aggregation Methods in Distributed Stream Processing. (Master thesis), Kiel University, Kiel, 94 pp.
Preview |
Text
msc_bjoern-vonheiden_thesis.pdf - Published Version Download (15MB) | Preview |
Abstract
Nowadays, data stream processing is a paradigm that is used to process large amounts of data in real time. Hopping window (also called sliding window) aggregations are a core operation in distributed stream processing.
In this thesis, we empirically evaluate the scalability of hopping window aggregations. Therefore, we benchmark different window aggregation methods. These are the native hopping window implementations of the stream processing engines Kafka Streams, Apache Flink, and Apache Spark. Further, we benchmark the window aggregation method from the Scotty framework that uses slicing. With sliding windows, Kafka Streams provides another window aggregation method that we evaluate. To empirically evaluate the scalability, we use the Theodolite benchmarking method. We apply two benchmark applications that implement the different windowed aggregation methods. One is an application benchmark from Theodolite and one is a microbenchmark from the Open Stream Processing Benchmark. In our evaluation, we execute benchmarks with different window configurations and evaluate the scalability of the window aggregation methods.
Our results show that all the methods are scalable. With the native hopping window implementations, Spark can process the highest loads, followed by Flink, and Kafka Streams can process the lowest loads. The number of overlapping windows influences the resource demand of the native hopping window implementations. Scotty can process higher loads than the native hopping window implementations of Kafka Streams and Flink. The sliding period of the hopping window influences the resource demand of Scotty. In the sliding window implementation of Kafka Streams, the rate of the processed data and the time difference determine the resource demand. If the number of overlapping windows is the same, the sliding window implementation of Kafka Streams can process higher loads than the native hopping window implementations of Kafka Streams and Flink.
Document Type: | Thesis (Master thesis) |
---|---|
Thesis Advisor: | Hasselbring, Wilhelm and Henning, Sören |
Keywords: | Stream Processing, Benchmark, Scalability, Theodolite |
Research affiliation: | Kiel University > Software Engineering |
Date Deposited: | 14 Jan 2022 14:54 |
Last Modified: | 14 Jan 2022 15:16 |
URI: | https://oceanrep.geomar.de/id/eprint/54892 |
Actions (login required)
View Item |
Copyright 2023 | GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel | All rights reserved
Questions, comments and suggestions regarding the GEOMAR repository are welcomed
at bibliotheksleitung@geomar.de !