Conference papers

Virtual Log-Structured Storage for High-Performance Streaming

Abstract : Over the past decade, given the growing number of data sources (e.g., Cloud applications, Internet of Things) and critical business demands, Big Data has transitioned from batch-oriented to real-time analytics. Stream storage systems, such as Apache Kafka, are well known for their increasing role in real-time Big Data analytics. For scalable stream data ingestion and processing, they logically split a data stream topic into multiple partitions. Stream storage systems keep multiple copies of a data stream to protect against data loss, implementing each stream partition as a replicated log. This architectural choice simplifies development, but it trades cluster size against performance and the number of streams that can be optimally managed. This paper introduces a shared virtual log-structured storage approach for improving the cluster throughput when multiple producers and consumers write and consume data streams in parallel. Stream partitions are associated with shared replicated virtual logs transparently to the user, effectively separating the implementation of stream partitioning (and data ordering) from data replication (and durability). We implement the virtual log technique in the KerA stream storage system. Compared with Apache Kafka, KerA improves the cluster ingestion throughput (for replication factor three) by up to 4x when multiple producers write over hundreds of data streams.
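
The separation described in the abstract, where partitions own ordering and shared virtual logs own replication, can be illustrated with a small in-memory sketch. This is not KerA's implementation: the class names `VirtualLog` and `StreamStore`, the round-robin assignment of partitions to logs, and the use of Python lists as stand-ins for replica nodes are all assumptions made for illustration only.

```python
from collections import defaultdict


class VirtualLog:
    """A shared log replicated as one unit (replicas are simulated in memory)."""

    def __init__(self, log_id, replication_factor=3):
        self.log_id = log_id
        # Each list stands in for one replica node (hypothetical model).
        self.replicas = [[] for _ in range(replication_factor)]

    def append(self, partition_key, record):
        # Durability concern: copy the entry to every replica.
        for replica in self.replicas:
            replica.append((partition_key, record))
        # Position of the entry inside the shared log.
        return len(self.replicas[0]) - 1


class StreamStore:
    """Maps many stream partitions onto a small, fixed set of virtual logs."""

    def __init__(self, num_virtual_logs, replication_factor=3):
        self.logs = [VirtualLog(i, replication_factor)
                     for i in range(num_virtual_logs)]
        self.assignment = {}                     # partition -> virtual log
        self.partition_index = defaultdict(list)  # partition -> log positions

    def append(self, stream, partition, record):
        key = (stream, partition)
        if key not in self.assignment:
            # Assumed policy: round-robin assignment on first write.
            self.assignment[key] = self.logs[len(self.assignment) % len(self.logs)]
        position = self.assignment[key].append(key, record)
        # Ordering concern: the partition only remembers where its records live.
        self.partition_index[key].append(position)
        return len(self.partition_index[key]) - 1  # offset within the partition

    def read(self, stream, partition):
        key = (stream, partition)
        if key not in self.assignment:
            return []
        entries = self.assignment[key].replicas[0]
        return [entries[pos][1] for pos in self.partition_index[key]]


if __name__ == "__main__":
    store = StreamStore(num_virtual_logs=2)
    for i in range(3):
        store.append("clicks", 0, f"click-{i}")
        store.append("orders", 0, f"order-{i}")
        store.append("alerts", 0, f"alert-{i}")
    print(store.read("alerts", 0))
```

In this sketch three partitions share two replicated logs, so the number of replicated logs no longer grows with the number of partitions, while each partition still reads its own records in append order.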

https://hal.inria.fr/hal-03300796
Contributor : Ovidiu-Cristian Marcu
Submitted on : Friday, July 30, 2021 - 10:02:51 AM
Last modification on : Thursday, May 5, 2022 - 10:19:29 AM

File

virtual_log_KerA30072021.pdf
Files produced by the author(s)

Licence

Copyright

Identifiers

  • HAL Id : hal-03300796, version 2

Citation

Ovidiu-Cristian Marcu, Alexandru Costan, Bogdan Nicolae, Gabriel Antoniu. Virtual Log-Structured Storage for High-Performance Streaming. Cluster 2021 - IEEE International Conference on Cluster Computing, Sep 2021, Portland / Virtual, United States. pp.1-11. ⟨hal-03300796v2⟩

Metrics

Record views : 112
Files downloads : 12219