morph-streams

morph-streams is an ontology-based data access system that evaluates SPARQL-Stream queries over a range of data streaming systems, which are mapped to an ontology using the W3C R2RML language. The current version of morph-streams provides wrappers for:

  • The complex event processing engine Esper.
  • The sensor network middleware GSN.
  • The data stream management system SNEE.

Previous versions of morph-streams also supported the API of Pachube (now Xively), but this support has been deprecated.

Morph-streams supports two modes of operation:

  • Direct querying: a SPARQL-Stream query is submitted to an R2RML-wrapped data source. The query is rewritten into the query language or REST API of the underlying system and executed there, and the results are translated back using the same set of R2RML mappings.
  • Continuous query registration: a SPARQL-Stream continuous query is registered over an R2RML-wrapped data source. Consumers can subscribe to it and receive updated results as soon as they are evaluated.
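The difference between the two modes can be illustrated with a small, self-contained sketch. This is not the morph-streams API: `ToyStreamSource`, `query`, `register` and `push` are hypothetical names, the data source is an in-memory list, and a plain Python predicate stands in for a rewritten SPARQL-Stream query. It only shows the contrast between a one-off evaluation and a registered query that notifies subscribers.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Query = Callable[[Row], bool]        # stand-in for a rewritten query (assumption)
Subscriber = Callable[[Row], None]


class ToyStreamSource:
    """Hypothetical in-memory stand-in for a wrapped streaming source."""

    def __init__(self) -> None:
        self._rows: List[Row] = []
        self._registered: List[Tuple[Query, List[Subscriber]]] = []

    def query(self, q: Query) -> List[Row]:
        """Mode 1: evaluate the query once over the data currently held."""
        return [r for r in self._rows if q(r)]

    def register(self, q: Query) -> Callable[[Subscriber], None]:
        """Mode 2: register a continuous query and return a subscribe function."""
        subscribers: List[Subscriber] = []
        self._registered.append((q, subscribers))
        return subscribers.append

    def push(self, row: Row) -> None:
        """New data arrives: store it and notify subscribers of matching queries."""
        self._rows.append(row)
        for q, subscribers in self._registered:
            if q(row):
                for notify in subscribers:
                    notify(row)


source = ToyStreamSource()
source.push({"sensor": "s1", "temp": 18.0})

hot = lambda r: r["temp"] > 20.0     # toy query: readings above 20 degrees

received: List[Row] = []
subscribe = source.register(hot)     # continuous query registration
subscribe(received.append)           # a consumer subscribes to its results

source.push({"sensor": "s2", "temp": 25.5})

one_off = source.query(hot)          # direct, one-off evaluation
```

In this toy run both `one_off` and `received` end up holding the single reading from `s2`; the first was pulled by the caller, the second was pushed to the subscriber when the data arrived.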

The morph-streams project repository can be found at https://github.com/oeg-upm/morph-streams, together with instructions on how to install and use it. In addition, a live deployment of morph-streams with several types of streaming data sources can be found at http://streams.linkeddata.es/.

An efficient RDF processing engine for heterogeneous data streams

The purpose of this research is to design and implement an engine that supports complex queries over heterogeneous data streams in near real time and at Web scale.

A growing number of applications depend on real-time spatiotemporal data, making it possible to move from the usual three levels of decision making (strategic, tactical, and operational) to real-time decision making. One example is real-time geomarketing, where decisions on offering discount coupons to customers may be made within very short time windows by combining a set of spatiotemporal data streams coming from different providers, e.g. public transport card validations or weather information. Extracting information from these streams is complex because of the heterogeneity of the data, the rate at which it is generated, and its volume. To tap these data sources effectively and obtain relevant information, scalable processing infrastructures are required, as well as approaches for data integration and fusion.

Our plan is to build a distributed stream processing engine capable of adapting to changing conditions while serving complex continuous queries. First, adapters for various formats are used to convert heterogeneous streams to Linked Data streams. Then, Adaptive Query Processing (AQP) allows adjusting the query execution plan to varying conditions of the data input, the incoming queries, and the system.
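As a rough illustration of what adjusting an execution plan to changing input can mean, the sketch below reorders the predicates of a conjunctive filter according to how often each one has been observed to pass. This is only one simple form of adaptivity, chosen for brevity; it is not taken from the project, and `AdaptiveFilter`, its parameters and the record fields are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Predicate = Callable[[Record], bool]


class AdaptiveFilter:
    """Conjunction of predicates, periodically reordered by observed pass rate."""

    def __init__(self, predicates: Dict[str, Predicate], reorder_every: int = 100) -> None:
        self.predicates = predicates
        self.order: List[str] = list(predicates)          # current evaluation order
        self.seen = {name: 0 for name in predicates}      # evaluations per predicate
        self.passed = {name: 0 for name in predicates}    # successes per predicate
        self.reorder_every = reorder_every
        self.count = 0

    def accept(self, record: Record) -> bool:
        self.count += 1
        result = True
        for name in self.order:
            self.seen[name] += 1
            if self.predicates[name](record):
                self.passed[name] += 1
            else:
                result = False
                break  # short-circuit: later predicates are not evaluated
        if self.count % self.reorder_every == 0:
            # Lowest pass rate first, so that most records are rejected early.
            self.order.sort(key=lambda n: self.passed[n] / max(self.seen[n], 1))
        return result


f = AdaptiveFilter(
    {
        "in_city": lambda r: r["lat"] > 0.0,       # always passes in this demo
        "raining": lambda r: r["rain_mm"] > 5.0,   # passes for 4 records out of 10
    },
    reorder_every=10,
)
records = [{"lat": 40.4, "rain_mm": float(i % 10)} for i in range(10)]
accepted = [r for r in records if f.accept(r)]
```

After the first ten records the filter has seen that `raining` rejects most input, so it moves that predicate to the front. Because of the short-circuit, statistics for later predicates are gathered only on records that passed the earlier ones; a real engine would need to correct for that bias.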

Our engine will address real-time processing following the principles of the Lambda architecture. Lambda is a three-layer architecture designed to alleviate the complexities of Big Data management: a batch layer stores all the incoming data in an immutable master dataset and pre-computes batch views; a serving layer indexes views on the master dataset; and a speed layer manages the real-time processing issues and requests data views depending on incoming queries. We will follow this design together with AQP techniques and compressed RDF data structures that reduce both access time in large datasets and data transmission time among processing nodes.
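The interplay of the three layers can be sketched in a few lines. The following is a deliberately simplified, single-process illustration and not the planned engine: `ToyLambda` and the event shape (a stop identifier with a number of card validations) are assumptions made for the example, and the speed layer is reduced to an incremental counter over events that arrived since the last batch run.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, int]  # (stop_id, validations): hypothetical event shape


class ToyLambda:
    def __init__(self) -> None:
        self.master: List[Event] = []          # batch layer: append-only master dataset
        self.batch_view: Counter = Counter()   # serving layer: precomputed batch view
        self.speed_view: Counter = Counter()   # speed layer: events since last batch run

    def ingest(self, event: Event) -> None:
        """Store the event immutably and update the real-time view."""
        self.master.append(event)
        stop, n = event
        self.speed_view[stop] += n

    def run_batch(self) -> None:
        """Recompute the batch view from the whole master dataset."""
        view: Counter = Counter()
        for stop, n in self.master:
            view[stop] += n
        self.batch_view = view
        self.speed_view.clear()                # now covered by the batch view

    def query(self, stop: str) -> int:
        """Answer a query by merging the batch view with the real-time view."""
        return self.batch_view[stop] + self.speed_view[stop]


engine = ToyLambda()
engine.ingest(("A", 3))
engine.ingest(("B", 1))
engine.run_batch()           # batch view now covers the first two events
engine.ingest(("A", 2))      # arrives after the batch run, held by the speed layer
total_a = engine.query("A")  # merges both views
```

Here `total_a` combines the 3 validations already in the batch view with the 2 still held only by the speed layer. The design choice worth noting is that the batch view is always rebuilt from the full master dataset, so an error in the incremental path is corrected at the next batch run.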

Created under Creative Commons License - 2015 OEG.