Virtual Log-Structured Storage for High-Performance Streaming - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2021

Virtual Log-Structured Storage for High-Performance Streaming

Ovidiu-Cristian Marcu
Alexandru Costan
Bogdan Nicolae
Gabriel Antoniu

Abstract

Over the past decade, driven by a growing number of data sources (e.g., Cloud applications, the Internet of Things) and critical business demands, Big Data has transitioned from batch-oriented to real-time analytics. Stream storage systems, such as Apache Kafka, are well known for their increasing role in real-time Big Data analytics. For scalable stream data ingestion and processing, they logically split a data stream topic into multiple partitions. To protect against data loss, stream storage systems keep multiple copies of each data stream, implementing each stream partition as a replicated log. This architectural choice simplifies development, but trades cluster size against performance and against the number of streams that can be managed optimally. This paper introduces a shared virtual log-structured storage approach that improves cluster throughput when multiple producers and consumers write and read data streams in parallel. Stream partitions are associated with shared replicated virtual logs transparently to the user, effectively separating the implementation of stream partitioning (and data ordering) from data replication (and durability). We implement the virtual log technique in the KerA stream storage system. Compared with Apache Kafka, KerA improves the cluster ingestion throughput (for replication factor three) by up to 4x when multiple producers write over hundreds of data streams.
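
The separation described in the abstract can be illustrated with a small in-memory sketch. This is not KerA's implementation or API: the class names (`VirtualLog`, `Broker`), the modulo mapping from partitions to virtual logs, and the synchronous copy to backups are all assumptions made for illustration. The sketch only shows the idea that each partition keeps its own record order while replication is handled by a smaller number of shared logs.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualLog:
    """Hypothetical shared log that is replicated as one unit."""
    replication_factor: int = 3
    entries: list = field(default_factory=list)   # primary copy: (partition_id, offset) references
    replicas: list = field(default_factory=list)  # backup copies, filled in __post_init__

    def __post_init__(self):
        # One primary copy plus (replication_factor - 1) backups.
        self.replicas = [[] for _ in range(self.replication_factor - 1)]

    def append(self, partition_id, offset):
        ref = (partition_id, offset)
        self.entries.append(ref)
        # Assumption: backups are updated synchronously, in memory.
        for backup in self.replicas:
            backup.append(ref)


class Broker:
    """Hypothetical broker: many partitions share a few virtual logs."""

    def __init__(self, num_virtual_logs, replication_factor=3):
        self.partitions = {}  # partition_id -> records in append order
        self.virtual_logs = [
            VirtualLog(replication_factor) for _ in range(num_virtual_logs)
        ]

    def _log_for(self, partition_id):
        # Assumption: a static modulo mapping from partition to virtual log.
        return self.virtual_logs[partition_id % len(self.virtual_logs)]

    def append(self, partition_id, record):
        # Ordering is a property of the partition ...
        records = self.partitions.setdefault(partition_id, [])
        records.append(record)
        offset = len(records) - 1
        # ... while durability is delegated to the shared virtual log.
        self._log_for(partition_id).append(partition_id, offset)
        return offset

    def read(self, partition_id, offset=0):
        return self.partitions.get(partition_id, [])[offset:]


broker = Broker(num_virtual_logs=2)
for pid in range(6):
    broker.append(pid, f"p{pid}-first")
    broker.append(pid, f"p{pid}-second")
```

In this toy setup six partitions are backed by two replicated logs instead of six, which is the kind of decoupling the abstract refers to. How KerA actually assigns partitions to virtual logs and ships data to backups is described in the paper, not here.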
Main file
virtual_log_KerA30072021.pdf (882.86 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03300796 , version 1 (27-07-2021)
hal-03300796 , version 2 (30-07-2021)

License

Copyright (All rights reserved)

Identifiers

  • HAL Id : hal-03300796 , version 2

Cite

Ovidiu-Cristian Marcu, Alexandru Costan, Bogdan Nicolae, Gabriel Antoniu. Virtual Log-Structured Storage for High-Performance Streaming. Cluster 2021 - IEEE International Conference on Cluster Computing, Sep 2021, Portland / Virtual, United States. pp.1-11. ⟨hal-03300796v2⟩
166 views
9429 downloads