Today I put a first version of EMF fragments on Google Code. My first publication on EMF fragments will be presented at this year's MODELS conference. I will make the paper available soon. This article is a little preview from a less scientific perspective.
EMF fragments is an Eclipse Modeling Framework (EMF) persistence layer for distributed data stores such as NoSQL databases (e.g. MongoDB, Hadoop/HBase) or distributed file systems.
What Problem Does It Solve?
EMF is designed to programmatically create, edit, and analyze software models. It provides generated type-safe APIs to manipulate models based on a schema (i.e. a metamodel). This is similar to XML and XML schemas with JAXB-like APIs. EMF works fine as long as you really use it for software models and your models fit into main memory. If you use EMF for other kinds of data, e.g. sensor data, geo data, or social data, you soon run out of main memory and things become more difficult.
Why would I use EMF for this kind of data? EMF provides very good generated APIs, generated GUI tools to visualize data, and a series of strong model transformation languages. These are all things one wants to apply to structured data in general. Data in EMF is described through metamodels, similar to XML schemas or entity-relationship diagrams. This makes EMF applicable to a wide range of applications.
To use larger models in EMF, we need a way to persist models that does not require us to load complete models into memory. Existing solutions include ORM mappings (e.g. Eclipse's CDO). These solutions have three drawbacks:
- ORM mappings store data slowly because data is indexed and stored at a very fine granularity
- ORM mappings are slow when structures are traversed because data is loaded piece by piece even though it is used in larger aggregates
- SQL databases are not easily distributed
EMF fragments is designed to store large object-oriented data models (typed, labeled, bidirectional graphs) efficiently and scalably. It emphasizes fast storage of new data and fast navigation of persisted models. The requirements for this framework come from storing and analyzing large amounts of sensor data in real time.
How Does EMF Fragments Work?
EMF fragments differs from frameworks based on object-relational mappings (ORM) like Connected Data Objects (CDO). While ORM mappings map single objects, attributes, and references to database entries, EMF fragments maps larger chunks of a model (fragments) to URIs. This makes it possible to store models on a wide range of distributed data stores, including distributed file systems and key-value stores (think NoSQL databases like MongoDB or HBase). EMF fragments therefore provides storage for typed structured data that supports analyses based on the map-reduce or bulk synchronous parallel (BSP) paradigm (e.g. for cloud computing).
The above figure shows the impact of fragmentation when a connected part of the model (an aggregate) of a certain size is loaded. If I load large chunks of my model, I should use a more coarse-grained fragmentation, and if I load small parts of my model, I should use a more fine-grained fragmentation. Full fragmentation (as with ORMs) or no fragmentation (as with XMI) is not a good solution in most instances.
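The trade-off can be made tangible with a toy cost model. This is only a sketch under stated assumptions, not a measurement and not taken from the paper: the class `FragmentationCostSketch`, the fixed cost per store access, the cost per object, and all numbers are hypothetical, and the aggregate is assumed to start at a fragment boundary.

```java
// Toy cost model for loading one aggregate. All costs are assumed, relative units.
final class FragmentationCostSketch {

    /** Number of fragments that must be fetched to load aggregateSize contiguous objects. */
    static long fragmentsLoaded(long aggregateSize, long fragmentSize) {
        return (aggregateSize + fragmentSize - 1) / fragmentSize; // ceiling division
    }

    /** Objects read, including objects in fetched fragments that were not asked for. */
    static long objectsLoaded(long aggregateSize, long fragmentSize) {
        return fragmentsLoaded(aggregateSize, fragmentSize) * fragmentSize;
    }

    /** Hypothetical cost: a fixed cost per store access plus a cost per object read. */
    static long cost(long aggregateSize, long fragmentSize, long costPerFetch, long costPerObject) {
        return fragmentsLoaded(aggregateSize, fragmentSize) * costPerFetch
                + objectsLoaded(aggregateSize, fragmentSize) * costPerObject;
    }

    public static void main(String[] args) {
        long modelSize = 1_000_000;          // fragment size == model size means "no fragmentation"
        long aggregate = 1_000;              // size of the part of the model we want to load
        long perFetch = 100, perObject = 1;  // assumed relative costs
        for (long fragmentSize : new long[] {1, 100, 1_000, 10_000, modelSize}) {
            System.out.println("fragment size " + fragmentSize + " -> cost "
                    + cost(aggregate, fragmentSize, perFetch, perObject));
        }
    }
}
```

In this model both extremes are expensive for different reasons: a fragment size of 1 pays the access cost once per object, while a single fragment for the whole model reads every object just to reach a small aggregate.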
The EMF fragments framework fragments models automatically and transparently in the background. Clients designate the types of references at which models are fragmented. This lets you control fragmentation without having to trigger it manually. Fragments are managed automatically: when you create, delete, move, or edit model elements, new fragments are created and elements are distributed among those fragments on the fly. Fragments (i.e. resources) are identified by URIs. The framework lets you map URIs to (distributed) data stores (e.g. NoSQL databases or distributed file systems).
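The idea of cutting a model at designated reference types can be sketched in plain Java without EMF. This is an illustration, not the framework's implementation: `Node`, `Ref`, `fragment`, the reference type names, and the `memory://` URI scheme are all made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FragmentationSketch {

    /** Toy model element; stands in for an EMF object. */
    static final class Node {
        final String name;
        final List<Ref> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(String refType, Node child) {
            children.add(new Ref(refType, child));
            return this;
        }
    }

    /** A containment reference of a given type. */
    static final class Ref {
        final String type;
        final Node target;
        Ref(String type, Node target) { this.type = type; this.target = target; }
    }

    /** Splits the containment tree into fragments, cutting at the designated reference types. */
    static Map<String, List<String>> fragment(Node root, Set<String> fragmentingRefs) {
        Map<String, List<String>> fragments = new LinkedHashMap<>();
        collect(root, openFragment(fragments), fragmentingRefs, fragments);
        return fragments;
    }

    /** Registers a new, empty fragment under a made-up URI and returns its content list. */
    private static List<String> openFragment(Map<String, List<String>> fragments) {
        List<String> content = new ArrayList<>();
        fragments.put("memory://model/fragment_" + fragments.size(), content);
        return content;
    }

    private static void collect(Node node, List<String> current,
                                Set<String> fragmentingRefs, Map<String, List<String>> fragments) {
        current.add(node.name);
        for (Ref ref : node.children) {
            // A child reached through a fragmenting reference starts its own fragment;
            // every other child stays in the fragment of its container.
            List<String> target = fragmentingRefs.contains(ref.type) ? openFragment(fragments) : current;
            collect(ref.target, target, fragmentingRefs, fragments);
        }
    }

    public static void main(String[] args) {
        Node day1 = new Node("day1").add("readings", new Node("r1")).add("readings", new Node("r2"));
        Node day2 = new Node("day2").add("readings", new Node("r3"));
        Node log = new Node("log").add("days", day1).add("days", day2);
        // Fragment only at "days": each day and its readings end up in one fragment.
        fragment(log, java.util.Collections.singleton("days"))
                .forEach((uri, content) -> System.out.println(uri + " " + content));
    }
}
```

Designating `days` keeps each day together with its readings, which suits clients that load a day at a time. Designating `readings` as well would yield one fragment per reading, the ORM-like extreme.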
How Is EMF Fragments Used?
Using EMF fragments is simple if you are used to EMF. You create EMF metamodels as usual, e.g. using Ecore. You generate APIs and tools as usual using normal genmodels with two specific parameters. First, you have to configure your genmodels to use reflective feature delegation. Second, you have to use a specific base class. You use the generated APIs and tools as usual. EMF fragments provides a specific ResourceSet implementation. Resources are managed automatically in the background and you do not have to create them.
EMF fragments provides an abstract interface to map resource (fragment) URIs to physical storage. An implementation for Apache HBase is provided.
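To give an idea of what such a mapping involves, here is a minimal sketch of a URI-keyed store with an in-memory backend. The names `FragmentStore`, `InMemoryFragmentStore`, and the `memory://` URIs are hypothetical and are not the actual EMF fragments interface; an HBase-backed variant would read and write table rows instead of map entries.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class FragmentStoreSketch {

    /** Hypothetical storage abstraction: one serialized fragment per URI. */
    interface FragmentStore {
        /** Returns the serialized fragment, or null if no fragment exists for the URI. */
        byte[] read(URI fragmentUri);
        void write(URI fragmentUri, byte[] serializedFragment);
        void delete(URI fragmentUri);
    }

    /** In-memory backend that behaves like a simple key-value store. */
    static final class InMemoryFragmentStore implements FragmentStore {
        private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

        @Override
        public byte[] read(URI fragmentUri) {
            byte[] data = entries.get(fragmentUri.toString());
            return data == null ? null : data.clone(); // defensive copy
        }

        @Override
        public void write(URI fragmentUri, byte[] serializedFragment) {
            entries.put(fragmentUri.toString(), serializedFragment.clone());
        }

        @Override
        public void delete(URI fragmentUri) {
            entries.remove(fragmentUri.toString());
        }
    }

    public static void main(String[] args) {
        FragmentStore store = new InMemoryFragmentStore();
        URI uri = URI.create("memory://model/fragment_1");
        store.write(uri, "<day name=\"day1\"/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.read(uri), StandardCharsets.UTF_8));
        store.delete(uri);
        System.out.println(store.read(uri) == null);
    }
}
```

Because the interface only needs read, write, and delete by key, any store that can hold a blob under a key is a candidate backend.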
A brief example and tutorial will be available soon. Promise.
How Does It Perform?
You can look at more detailed information (i.e. numbers) when the paper is out. We compared EMF fragments to CDO, XMI, and Morsa (an object-by-object persistence layer on MongoDB) based on Java models taken from the GraBaTs 2009 workshop. These models contain a few million objects. Since EMF fragments basically stores a model as a set of resources, it performs similarly to XMI without running out of main memory. It is considerably faster than CDO or Morsa when models are stored, since no indexes have to be created and data is stored at a coarser granularity. EMF fragments is also faster when the model is navigated as a whole. It shows similar performance to CDO and Morsa when the models are queried.