October 9, 2015
by Markus

XRaw – JSON and REST without Boilerplate

Check XRaw on GitHub for the source code and an up-to-date version of this article.

If you, as a Java programmer, think that plain JSON is just not type-safe enough, or that Jackson and co. are just too heavy and stiff to be compatible with your Scala or xTend coding style, you should read on.

XRaw provides a set of active annotations that simplify the development of type-safe Java wrappers for JSON data, RESTful API calls, and MongoDB interfaces. It provides helpful features for creating social-media-aware apps and backends with Java (and xTend).

Active annotations are an xTend feature that allows us to semantically enrich simple data object declarations with functionality that transparently (un)marshals Java objects to JSON data, encodes REST requests, or accesses a database.
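To see the boilerplate that these annotations remove, a hand-written wrapper for a single JSON field could look roughly like this (a plain-Java sketch based on the org.json API, not XRaw's actual generated code):

import org.json.JSONObject;

// Hand-written boilerplate that a @JSON annotation is meant to replace (sketch only).
public class BookWrapper {
  private final JSONObject json;

  public BookWrapper(JSONObject json) {
    this.json = json;
  }

  public String getTitle() {
    // returns null instead of throwing if the key is absent
    return json.optString("title", null);
  }

  public void setTitle(String title) {
    json.put("title", title);
  }
}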

JSON example

The following small xTend file demonstrates the use of XRaw annotations to create wrapper types for some typical JSON data:

@JSON class Library {
  List<Book> books
  String adress
  @Name("count") int entries
}

@JSON class Book {
  String title
  String isbn
  List<String> authors
  @WithConverter(UtcDateConverter) Date publish_date
}

Based on this data description, we can now simply use the class Library to wrap corresponding JSON data into Java POJOs:

val library = new Library(new JSONObject('''{
  books: [{
    title: "Pride and Prejudice",
    authors: ["Jane Austin"],
    isbn: "96-2345-33123-32",
    publish_date: "1813-04-12T12:00:00Z"
  }, {
    title: "SAP business workflow",
    authors: ["Ulrich Mende", "Andreas Berthold"]
  }],
  adress: "Unter den Linden 6, 1099 Berlin, Germany",
  count: 2
}'''))

For example, we can use xTend to find all “old” books:

val oldBooks = library.books.filter[it.publishDate.year < 1918]

Since xTend compiles to Java, we can also use the wrapper types in Java programs:

public long countBooksBefore(Library library, int year) {
  return library.getBooks().stream().filter(book -> book.getPublishDate().getYear() < year).count();
}

REST example

This is a simple “script” written against a Twitter API wrapper created with XRaw.

// For starters, use XRawScript to interactively create a Twitter instance 
// with a pre-configured HTTPService that deals with all OAuth related issues.
val twitter = XRawScript::get("data/store.json", "markus", Twitter)

// REST API endpoints are structured and accessed via fluent interface
val userTimelineReq = twitter.statuses.userTimeline

// Parameters can also be added fluently.
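// (Hypothetical illustration only; the actual fluent setter names are defined by
// the generated Twitter wrapper and may differ, e.g. userTimelineReq.count(10).)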

// xResult will execute the request and wrap the returned JSON data.
val userTimelineResult = userTimelineReq.xResult

// Use xTend and its iterable extensions to navigate the results.
userTimelineResult.filter[it.retweetCount > 4].forEach[
    println(it.text)
]

// Or as a "one liner".
twitter.statuses.userTimeline.xResult
    .filter[it.retweetCount > 4].forEach[println(it.text)]

This is written in xTend. You could also use Scala, Java, or any other JVM/bytecode-based language.

Get started

git clone git@github.com:markus1978/xraw.git xraw
cd xraw/de.scheidgen.xraw/
mvn compile

Look at the examples.


XRaw is early in development. There is no release yet, and XRaw is not yet available via Maven Central.


  • JSON
    • wrappers for existing JSON data or for creating new JSON
    • support for primitive values, arrays, objects
    • converters to convert complex types to and from strings
    • different key names in JSON and Java to adapt to existing code
  • REST
    • wrappers for GET and POST requests
    • with URL and body parameters
    • with parameters encoded in URL path
    • with array and object JSON results
    • customizable HTTP implementation, e.g. to integrate with existing signing and OAuth solutions
    • customizable response types, e.g. to use API specific data passed through HTTP header, HTTP status codes, etc.
  • MongoDB
    • a simple database wrapper for uni-typed collections of JSON data


I need you to try XRaw and check the existing API snippets (we have some for Twitter, Facebook, YouTube, Twitch, and Tumblr). Tell us what works and what doesn’t. What annotations do you need?

August 13, 2015
by Markus

Generation of Random Software Models for Benchmarks

Abstract—Since model driven engineering (MDE) is applied to larger and more complex systems, the memory and execution time performance of model processing tools and frameworks has become important. Benchmarks are a valuable tool to evaluate performance and hence assess scalability. But benchmarks rely on reasonably large models that are unbiased, can be shaped to distinct use-case scenarios, and are "real" enough (e.g. non-uniform) to cause real-world behavior (especially when mechanisms that exploit repetitive patterns like caching, compression, JIT-compilation, etc. are involved). Creating large models is expensive and error-prone, and neither existing models nor uniform synthetic models cover all three of the wanted properties. In this paper, we use randomness to generate unbiased, non-uniform models. Furthermore, we use distributions and parametrization to shape these models to simulate different use-case scenarios. We present a meta-model-based framework that allows us to describe and create randomly generated models based on a meta-model and a description written in a specifically developed generator DSL. We use a random code generator for an object-oriented programming language as a case study and compare our results to non-randomly and synthetically created code, as well as to existing Java code.

Keywords: EMF, Benchmarks, Generation, Large models


Download Paper
RandomEMF at GitHub


@inproceedings{scheidgen2015randommodels,
  author = {Scheidgen, Markus},
  booktitle = {Proceedings of the 3rd Workshop on Scalable Model Driven Engineering},
  editor = {Kolovos, Dimitris S. and Ruscio, Davide Di and Matragkas, Nicholas and Cuadrado, Jes\'{u}s S\'{a}nchez and Rath, Istvan and Tisi, Massimo},
  pages = {1--10},
  publisher = {CEUR},
  title = {{Generation of Large Random Models for Benchmarking}},
  url = {http://ceur-ws.org/Vol-1406/paper1.pdf},
  year = {2015}
}

September 30, 2014
by Markus

Model-Based Mining of Source Code Repositories

Abstract—The Mining Software Repositories (MSR) field analyzes the rich data available in source code repositories (SCR) to uncover interesting and actionable information about software system evolution. Major obstacles in MSR are the heterogeneity of software projects and the amount of data that is processed. Model-driven software engineering (MDSE) can deal with heterogeneity by abstraction as its core strength, but only recent efforts in adopting NoSQL databases for persisting and processing very large models made MDSE a feasible approach for MSR. This paper is a work-in-progress report on srcrepo: a model-based MSR system. Srcrepo uses the NoSQL-based EMF model persistence layer EMF-Fragments and Eclipse’s MoDisco reverse engineering framework to create EMF models of whole SCRs that comprise all code of all revisions at an abstract syntax tree (AST) level. An OCL-like language is used as an accessible way to finally gather information such as software metrics from these SCR models.

Keywords: EMF, Mining Software Repositories, Metrics, OCL, Software Evolution


@inproceedings{scheidgen2014srcrepo,
 booktitle={System Analysis and Modeling: Models and Reusability},
 series={Lecture Notes in Computer Science},
 editor={Amyot, Daniel and Fonseca i Casas, Pau and Mussbacher, Gunter},
 title={Model-Based Mining of Source Code Repositories},
 publisher={Springer International Publishing},
 author={Scheidgen, Markus and Fischer, Joachim},
 year={2014}
}

June 2, 2014
by Markus

Reference Representation Techniques for Large Models

Abstract—If models consist of more and more objects, the time and space required to process these models become an issue. To solve this, we can employ different existing frameworks that use different model representations (e.g. trees in XMI or relational data with CDO). Based on the observation that these frameworks reach different performance measures for different operations and different model characteristics, we raise the question whether and how different model representations can be combined to mitigate performance issues of individual representations.

In this paper, we analyze different techniques to represent references, which are one important aspect of processing large models efficiently. We present the persistence framework EMF-Fragments, which combines the representation of references as source-object-contained sets of target objects (e.g. in XMI) with their representation as relations similar to those in relational databases (e.g. with CDO). We also present a performance evaluation for both representations and discuss the use of both representations in three applications: models for source-code repositories, scientific data, and geo-spatial data.

Keywords: EMF, persistence, databases


@inproceedings{scheidgen2013reference,
 author = {Scheidgen, Markus},
 title = {Reference Representation Techniques for Large Models},
 booktitle = {Proceedings of the Workshop on Scalability in Model Driven Engineering},
 series = {BigMDE '13},
 year = {2013},
 isbn = {978-1-4503-2165-5},
 location = {Budapest, Hungary},
 pages = {5:1--5:9},
 articleno = {5},
 numpages = {9},
 url = {http://doi.acm.org/10.1145/2487766.2487769},
 doi = {10.1145/2487766.2487769},
 acmid = {2487769},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {EMF, big data, meta-modeling, mining software repositories, model persistence},
}

June 24, 2013
by Markus

SrcRepo: A model-based framework for analyzing large scale software repositories

This is a brief introduction to a new research subject that I recently started working on. It serves as a case study for very large EMF models and for applying big data techniques to EMF, which is my current research subject. I also covered this subject in this talk:

Problem: Is Software Engineering a Science?

Science is defined as a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe. But how testable are typical theses of software engineering:

    • DSLs allow domain experts to develop software more effectively and efficiently than with GPLs.
    • Static type systems lead to safer programming and fewer bugs.
    • Functional programming leads to less performant programs.
    • Scrum allows teams to develop programs faster.
    • My framework allows developing … more, faster … with less, fewer

The reasons for the lack of quantitative empirical research in software engineering are manifold and include issues like data quality, scalability, and heterogeneity. To elaborate on these issues, we should first look at the fields in software engineering that explicitly cover the empirical analysis of software.

Related Fields: Mining Software Repositories (MSR) and Metrics

Software repositories (i.e. source code repositories) contain more than source code. Market-basket-analysis-style reasoning, e.g. “programmers that changed code X also changed code Y”, can be used to extract implicit dependencies from revision histories. This information (that is otherwise opaque) is used by traditional MSR [1] approaches to address issues within individual repositories: (1) visualize implicit dependencies [2], (2) find or predict bugs [3,4], (3) identify architectural flaws, or (4) mine for API usage patterns [5].
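To make this concrete, a naive co-change count over per-commit file sets could look like the following Java sketch (generic code, not tied to any particular MSR tool or repository API):

import java.util.*;

// Sketch: count how often two files are changed in the same commit, i.e. the raw
// data behind "programmers that changed code X also changed code Y".
public class CoChangeCounter {
  public static Map<String, Integer> coChanges(List<Set<String>> commits) {
    Map<String, Integer> counts = new HashMap<>();
    for (Set<String> changedFiles : commits) {
      List<String> files = new ArrayList<>(changedFiles);
      Collections.sort(files);
      for (int i = 0; i < files.size(); i++) {
        for (int j = i + 1; j < files.size(); j++) {
          counts.merge(files.get(i) + " & " + files.get(j), 1, Integer::sum);
        }
      }
    }
    return counts;
  }
}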

The MSR community lacks a common technology that allows it to apply all developed techniques uniformly [1]. Instead, individual teams seem to build their own proprietary systems that are then only applicable to a specific MSR technique. Aside from apparent reasons like concrete repository software or dependencies on specific programming languages (issue 1: abstractions), this is mainly due to the resource intensiveness of MSR. Therefore, only very specialized systems can provide the performance needed (issue 2: scalability).

Software metrics are used to measure certain properties of software (e.g. size, complexity) to assess costs (e.g. to maintain or develop software). Similar to MSR, metrics are language-dependent (issue 1: abstractions), and calculating metrics over the evolution of software (or many software projects) is computationally expensive (issue 2: scalability).

The presented issues make it hard to apply MSR to large-scale software repositories (repositories with 100-1000 projects, e.g. Apache, Eclipse). But I believe that (if these issues are overcome) MSR can be applied in a larger context, where many projects are analysed to learn something about software engineering itself. Traditional software metrics and their evolution over revision history, as well as new metrics that include implicit dependency information, can be used to empirically analyse (1) engineering methodologies, (2) programming languages, or (3) API design (patterns).

Approach: A Framework for the Analysis of Large-Scale Software Repositories

Programming APIs for source code repositories, reverse engineering software code into models (i.e. AST-level models of code), and frameworks for persisting large models allow us to examine a software repository as meta-model based data (e.g. an EMF model). Our tool srcrepo [6,7] already does this. It uses jGit, MoDisco, and emf-fragments [10] to create AST-level models of the revision histories in git-repositories of eclipse projects. This framework could be extended for other languages and source code repositories due to its (meta-)model-based nature. This abstraction can solve issue 1.

For a metrics-based analysis of such source code models, we need techniques to effectively describe and execute aggregation queries. To navigate within the extracted data effectively, all queries need to be managed and all accumulated data has to be associated with its source. The (meta-)modeling community has a large variety of appropriate model transformation and query technologies in store.
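As a toy illustration of such an aggregation query, assume a drastically simplified AST-level repository model (the Revision, CompilationUnit, and Method classes below are hypothetical stand-ins for srcrepo's much richer MoDisco-based metamodel):

import java.util.List;

// Sketch: a simple aggregation over an AST-level model, computing the average
// number of methods per compilation unit in one revision.
class Method {}
class CompilationUnit { List<Method> methods; }
class Revision { List<CompilationUnit> compilationUnits; }

public class MethodsPerUnit {
  static double averageMethodsPerUnit(Revision revision) {
    return revision.compilationUnits.stream()
        .mapToInt(cu -> cu.methods.size())
        .average()
        .orElse(0.0);
  }
}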

Applying MSR to a large number of source code repositories requires a lot of computation time. The rationale is that model persistence techniques and query languages can be identified/developed that allow us to execute MSR on large computing clusters that are governed by modern cloud-computing frameworks (e.g. Hadoop). emf-fragments [10] already uses Hadoop’s HBase to persist models in manageable chunks (fragments). It seems reasonable that we can tailor an OCL-like language to execute on these fragments in a map/reduce fashion. This would solve issue 2.
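Conceptually, the map/reduce split might look like this (plain Java; the Fragment type is a hypothetical stand-in and none of this uses the actual EMF-Fragments or Hadoop APIs): each map step derives a partial result from one fragment, and the reduce step combines the partial results into the repository-wide value.

import java.util.List;

// Conceptual sketch of a fragment-wise map/reduce metric computation.
public class FragmentMapReduce {
  // Hypothetical stand-in for one persisted chunk of the model.
  static class Fragment {
    long methodCount;
    Fragment(long methodCount) { this.methodCount = methodCount; }
  }

  // "map": compute a partial metric for a single fragment
  static long map(Fragment fragment) {
    return fragment.methodCount;
  }

  // "reduce": combine the partial results into the global metric
  static long reduce(List<Fragment> fragments) {
    return fragments.stream().mapToLong(FragmentMapReduce::map).sum();
  }
}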

First Case-Studies

Our framework srcrepo already allows us to create EMF models from git repositories containing eclipse (Java) projects. The eclipse source repositories (git.eclipse.org) provide over 300 such repositories, containing software projects of varying sizes, including eclipse itself.

To verify the conceptual soundness of “model-based MSR”, we can apply existing MSR algorithms and techniques. Candidates are:

  • 1. [8] Here, implicit dependencies are used to identify cross-cutting concerns in a software repository. Measurements on many repositories could be used to reason about the effectiveness of AOP or refactoring techniques.
  • 2. [9] Here, the evolution of modularity in large code bases is analysed using Design Structure Matrices (DSM). The researchers try to estimate the impact of refactoring efforts on the cohesion of modules.

Interesting research tracks

Metrics for revision histories

We have metrics for software code and software models, and there are also fundamental metrics for software repositories. But there are no metrics that combine both. In particular, there are no metrics that involve the implicit dependencies hidden within source code repositories. Furthermore, with these dependencies, metrics become uncertain and represent statistical processes rather than exact numbers.

Comparing languages and methodologies

Language evangelists have fought for decades over which language is the “best” and which development process is the most efficient. MSR allows us to model development efforts precisely and, more importantly, promises to find the sources of avoidable costs or to estimate the impact of certain tasks (e.g. refactoring). To correlate certain properties with the programming languages or methodologies used, we need a large base of different (open source) software projects, and the techniques used need to scale accordingly.


  1. Ahmed E. Hassan: The Road Ahead for Mining Software Repositories, 2008
  2. Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, Andreas Zeller: Mining Version Histories to Guide Software Changes, 2005
  3. Nachiappan Nagappan, Thomas Ball, Andreas Zeller: Mining Metrics to Predict Component Failures, 2006
  4. Sunghun Kim, E. James Whitehead, Jr., Yi Zhang : Classifying Software Changes: Clean or Buggy? 2008
  5. CC Williams, JK Hollingsworth: Automatic mining of source code repositories to improve bug finding techniques, 2005
  6. Markus Scheidgen: Reference Representation Techniques for Large Models; BigMDE 2013
  7. http://github.com/markus1978/srcrepo
  8. Silvia Breu, Thomas Zimmermann, Christian Lindig: Mining Eclipse for Cross-Cutting Concerns, 2006
  9. Alan MacCormack, John Rusnak, Carliss Baldwin: Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code, 2005
  10. Markus Scheidgen, Anatolij Zubow, Joachim Fischer, Thomas H. Kolbe: Automated and Transparent Model Fragmentation for Persisting Large Models; MODELS 2012, Innsbruck.

June 24, 2013
by Markus

Refactorings in Language Development with Asymmetric Bidirectional Model Transformations

Abstract—Software language descriptions comprise several heterogeneous interdependent artifacts that cover different aspects of languages (abstract syntax, notation and semantics). The dependencies between those artifacts demand the simultaneous adaptation of all artifacts when the language is changed. Changes to a language that do not change semantics are referred to as refactorings. This class of changes can be handled automatically by applying predefined types of refactorings. Refactorings are therefore considered a valuable tool for evolving a language.

We present a model transformation based approach for the refactoring of software language descriptions. We use asymmetric bidirectional model transformations to synchronize the various artifacts of language descriptions with a refactoring model that contains all elements that are changed in a particular refactoring. This allows for automatic, type-safe refactorings that also include the language tooling. We apply this approach to an Ecore-, Xtext-, and Xtend-based language description and describe the implementation of a non-trivial refactoring.

Keywords: DSL evolution, language description, refactoring, bidirectional model transformations


@inproceedings{schmidt2013refactorings,
  author    = {Martin Schmidt and
               Arif Wider and
               Markus Scheidgen and
               Joachim Fischer and
               Sebastian von Klinski},
  title     = {Refactorings in Language Development with Asymmetric Bidirectional
               Model Transformations},
  booktitle = {SDL Forum},
  year      = {2013},
  pages     = {222-238},
  ee        = {http://dx.doi.org/10.1007/978-3-642-38911-5_13},
  crossref  = {DBLP:conf/sdl/2013},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

@proceedings{DBLP:conf/sdl/2013,
  editor    = {Ferhat Khendek and
               Maria Toeroe and
               Abdelouahed Gherbi and
               Rick Reed},
  title     = {SDL 2013: Model-Driven Dependability Engineering - 16th
               International SDL Forum, Montreal, Canada, June 26-28, 2013.},
  booktitle = {SDL Forum},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {7916},
  year      = {2013},
  isbn      = {978-3-642-38910-8},
  ee        = {http://dx.doi.org/10.1007/978-3-642-38911-5},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

June 24, 2013
by Markus

EMF Modeling in Traffic Surveillance Experiments

Abstract—We use a wireless sensor network equipped with acceleration sensors to measure seismic waves caused by rolling traffic. In this paper, we report on our experiences in applying an EMF-based data infrastructure to these experiments. We built an experimentation infrastructure that replaces unstructured text-file based management of data with a model-based approach. We use EMF to represent sensor data and corresponding analysis results; we use an extension of EMF’s resource API to persist data in a database; and we use model transformations to describe data analysis. We conclude that a model based approach leads to safer, better documented, and more reproducible experiments.

Keywords: Traffic surveillance, Wireless Sensor Networks, EMF, Smart City


@inproceedings{scheidgen2012emfmodeling,
 author = {Scheidgen, Markus and Zubow, Anatolij},
 title = {EMF modeling in traffic surveillance experiments},
 booktitle = {Proceedings of the Modelling of the Physical World Workshop},
 series = {MOTPW '12},
 year = {2012},
 isbn = {978-1-4503-1808-2},
 location = {Innsbruck, Austria},
 pages = {5:1--5:6},
 articleno = {5},
 numpages = {6},
 url = {http://doi.acm.org/10.1145/2491617.2491622},
 doi = {10.1145/2491617.2491622},
 acmid = {2491622},
 publisher = {ACM},
 address = {New York, NY, USA},
}

March 22, 2013
by Markus

MAC Diversity in IEEE 802.11n MIMO Networks

Abstract—Opportunistic Routing (OR) is a novel routing technique for wireless mesh networks that exploits the broadcast nature of the wireless medium. OR combines frames from multiple receivers and therefore creates a form of Spatial Diversity, called MAC Diversity [1]. The gain from OR is especially high in networks where the majority of links has a high packet loss probability. The updated IEEE 802.11n standard improves the physical layer with the ability to use multiple transmit and receive antennas, i.e. Multiple-Input and Multiple-Output (MIMO), and therefore already offers spatial diversity on the physical layer, i.e. called Physical Diversity, which improves the reliability of a wireless link by reducing its error rate. In this paper we quantify the gain from MAC diversity as utilized by OR in the presence of PHY diversity as provided by a MIMO system like 802.11n. We experimented with an IEEE 802.11n indoor testbed and analyzed the nature of packet losses. Our experiment results show negligible MAC diversity gains for both interference-prone 2.4 GHz and interference-free 5 GHz channels when using 802.11n. This is different from the observations made with single antenna systems based on 802.11b/g [1], as well as in initial studies with 802.11n [2].

Keywords: IEEE 802.11n, MAC Diversity, Opportunistic Routing, PHY Diversity, Research, Testbed, Wireless Networks

@inproceedings{zubow2012macdiversity,
  author    = {Anatolij Zubow and
               Robert Sombrutzki and
               Markus Scheidgen},
  title     = {MAC diversity in IEEE 802.11n MIMO networks},
  booktitle = {Wireless Days},
  year      = {2012},
  pages     = {1-8},
  ee        = {http://dx.doi.org/10.1109/WD.2012.6402802},
  crossref  = {DBLP:conf/wd/2012},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

@proceedings{DBLP:conf/wd/2012,
  title     = {Proceedings of the IFIP Wireless Days Conference 2012, Dublin,
               Ireland, November 21-23, 2012},
  booktitle = {Wireless Days},
  publisher = {IEEE},
  year      = {2012},
  isbn      = {978-1-4673-4402-9},
  ee        = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6387977},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

March 22, 2013
by Markus

Map/Reduce on EMF Models

Abstract—Map/Reduce is the programming model in cloud computing. It enables the processing of data sets of unprecedented size, but it also delegates the handling of complex data structures completely to its users. In this paper, we apply Map/Reduce to EMF-based models to cope with complex data structures in the familiar, easy-to-use, and type-safe EMF fashion, combining the advantages of both technologies. We use our framework EMF-Fragments to store very large EMF models in distributed key-value stores (Hadoop’s HBase). This allows us to build Map/Reduce programs that use EMF’s generated APIs to process those very large EMF models. We present our framework and two example Map/Reduce jobs for querying software models and for analyzing sensor data represented as EMF models.

Keywords: EMF, big data, cloud computing, map/reduce, meta-modeling

@inproceedings{scheidgen2012mapreduce,
 author = {Scheidgen, Markus and Zubow, Anatolij},
 title = {Map/reduce on EMF models},
 booktitle = {Proceedings of the 1st International Workshop on Model-Driven Engineering for High Performance and CLoud computing},
 series = {MDHPCL '12},
 year = {2012},
 isbn = {978-1-4503-1810-5},
 location = {Innsbruck, Austria},
 pages = {7:1--7:5},
 articleno = {7},
 numpages = {5},
 url = {http://doi.acm.org/10.1145/2446224.2446231},
 doi = {10.1145/2446224.2446231},
 acmid = {2446231},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {EMF, big data, cloud computing, map/reduce, meta-modeling},
}