Storage and Ingestion Systems in Support of Stream Processing: A Survey - INRIA - Institut National de Recherche en Informatique et en Automatique
Report (Technical Report) Year: 2018

Storage and Ingestion Systems in Support of Stream Processing: A Survey

Abstract

Under the pressure of massive, exponentially increasing amounts of heterogeneous data that are generated faster and faster, Big Data analytics applications have shifted from batch processing to stream processing, which can dramatically reduce the time needed to obtain meaningful insight. Stream processing is particularly well suited to the challenges of fog/edge computing: much of this massive data comes from Internet of Things (IoT) devices and needs to be continuously funneled through an edge infrastructure towards centralized clouds. It is therefore natural to process data in transit as much as possible rather than wait for streams to accumulate in the cloud. Unfortunately, state-of-the-art stream processing systems are not well suited for this role: the data are accumulated (ingested), processed, and persisted (stored) separately, often using different services hosted on different physical machines or clusters. Furthermore, there is only limited support for advanced data manipulations, which often forces application developers to introduce custom solutions and workarounds. In this survey article, we characterize the main state-of-the-art stream storage and ingestion systems. We identify the key aspects and discuss limitations and missing features in the context of stream processing for fog/edge and cloud computing. The goal is to help practitioners understand and prepare for potential bottlenecks when using such state-of-the-art systems. In particular, we discuss both functional aspects (partitioning, metadata, search support, message routing, backpressure support) and non-functional aspects (high availability, durability, scalability, latency vs. throughput). As a conclusion of our study, we advocate for a unified stream storage and ingestion system to speed up data management and reduce I/O redundancy (both in terms of storage space and network utilization).
Main file
RT-0501v2.pdf (1.1 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01939280, version 1 (29-11-2018)
hal-01939280, version 2 (14-12-2018)

Identifiers

  • HAL Id: hal-01939280, version 2

Cite

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu, María S Pérez-Hernández, Radu Tudoran, et al. Storage and Ingestion Systems in Support of Stream Processing: A Survey. [Technical Report] RT-0501, INRIA Rennes - Bretagne Atlantique and University of Rennes 1, France. 2018, pp.1-33. ⟨hal-01939280v2⟩
