Creating and Analyzing Source Code Repository Models – A Model-based Approach to Mining Software Repositories

Abstract: Mining software repositories (MSR) analyzes the rich data created over the whole evolution of one or more software projects. One major obstacle in MSR is the heterogeneity and complexity of source code as a data source. Model-based technology in general, and reverse engineering in particular, lets us use abstraction to overcome this obstacle. But this raises a new question: can existing reverse engineering frameworks, which were designed to create models from a single revision of a software system, be applied to analyze all revisions of such a system at once? This paper presents a framework that combines EMF, the reverse engineering framework MoDisco, a NoSQL-based model persistence framework, and OCL-like expressions to create and analyze fully resolved AST-level model representations of whole source code repositories. We evaluated the feasibility of this approach with a series of experiments on the Eclipse code base.

Keywords: EMF, MSR, Reverse Engineering, Large Models

Presentation

Download Paper

Official Publication

SrcRepo at GitHub

BibTeX

@conference{scheidgenFischerSchmidth2017,
author={Markus Scheidgen and Martin Schmidt and Joachim Fischer},
title={Creating and Analyzing Source Code Repository Models - A Model-based Approach to Mining Software Repositories},
booktitle={Proceedings of the 5th International Conference on Model-Driven Engineering and Software Development - Volume 1: MODELSWARD},
year={2017},
pages={329-336},
doi={},
isbn={978-989-758-210-3},
}
