Extensible Compiler Construction

Ekman, Torbjörn

Extensible Compiler Construction

Mark

Ekman, Torbjörn ^LU (2006)

Abstract: Processing of programs is a core area in computer science. A compiler that translates source text to machine language is the most well-known kind of tool in this area, but there are numerous other kinds of related applications: source-to-source translators, refactoring tools, reengineering tools, metrics tools, consistency checkers, etc. These tools perform similar analyses and can therefore benefit from shared infrastructure. This thesis addresses the problem of how to build program-processing tools much more easily, providing high-level concise ways of programming the tools, and providing good modularity and extensibility, allowing for a high degree of reuse between tools. We present Rewritable Reference Attributed Grammars (ReRAGs), a... (More); Processing of programs is a core area in computer science. A compiler that translates source text to machine language is the most well-known kind of tool in this area, but there are numerous other kinds of related applications: source-to-source translators, refactoring tools, reengineering tools, metrics tools, consistency checkers, etc. These tools perform similar analyses and can therefore benefit from shared infrastructure. This thesis addresses the problem of how to build program-processing tools much more easily, providing high-level concise ways of programming the tools, and providing good modularity and extensibility, allowing for a high degree of reuse between tools. We present Rewritable Reference Attributed Grammars (ReRAGs), a technique that builds on well-known software development techniques such as object-orientation, aspect-oriented software development, declarative programming, attribute grammars, and transformation systems. ReRAGs combine mechanisms from these areas into one coherent framework with synergistic effects on modularity and extensibility. These mechanisms support several different decomposition criteria for modularization that enable re-use: as separate computations on the program model, as a base language and language extensions, and the same decomposition

as used in a language specification for traceability. ReRAGs allow such modules to be decoupled from each other and automatically resolves complex context-sensitive dependences.

An evaluation algorithm for the formalism is presented and implemented in the JastAdd tool which combines ReRAGs with Java. We have implemented a complete Java 1.4 compiler to demonstrate the full potential of the JastAdd system and to evaluate its mechanisms for modularization and extensibility. We show how name analysis for Java with its complex visibility rules involving nested scopes, inheritance, qualified access, and syntactic ambiguities can be modularized in the same way as the informal Java language specification. We have also extended our Java compiler with non-null types for detection of possible null-pointer violations at compile time. The extension is completely modular and the technique allows for so called pluggable type systems that can be enabled at will.

The techniques scale to real languages and large applications. Our generated Java compiler passes as many tests as production use compilers during compliance testing, compiles applications larger than 100.000 lines of code, and the executable specification is less than two-thirds the size of handwritten compilers. (Less)
Abstract (Swedish): Popular Abstract in Swedish

Automatisk analys av programkod är ett kärnområde inom datavetenskap. Det mest typiska verktyget inom detta område är en kompilator som översätter källkod till maskinkod, men det finns många likartade verktyg, exempelvis översättning mellan språkdialekter, kontroll av kodkonventioner samt automatisk omstrukturering av kod. Eftersom dessa verktyg gör liknande analyser finns det stora möjligheter att återanvända kod. Denna avhandling behandlar problemställningen hur man kan utveckla verktyg så att god modularitet och utvidgningsbarhet uppnås, med målsättningen att låta flera verktyg utnyttja gemensam programkod.

I avhandlingen presenteras en ny teknik, Rewritable Reference... (More); Popular Abstract in Swedish

Automatisk analys av programkod är ett kärnområde inom datavetenskap. Det mest typiska verktyget inom detta område är en kompilator som översätter källkod till maskinkod, men det finns många likartade verktyg, exempelvis översättning mellan språkdialekter, kontroll av kodkonventioner samt automatisk omstrukturering av kod. Eftersom dessa verktyg gör liknande analyser finns det stora möjligheter att återanvända kod. Denna avhandling behandlar problemställningen hur man kan utveckla verktyg så att god modularitet och utvidgningsbarhet uppnås, med målsättningen att låta flera verktyg utnyttja gemensam programkod.

I avhandlingen presenteras en ny teknik, Rewritable Reference Attributed Grammars (ReRAGs), som bygger på ett flertal framgångsrika programvarutekniker: objektorientering, aspektorientering, deklarativ programmering, attributgrammatiker och transformationssystem. ReRAGs kombinerar mekanismer från dessa områden i ett sammanhängande ramverk vilket möjliggör god modularitet och utvidgningsbarhet. Detta gör att man kan dela upp mjukvaran i moduler baserat på olika kriterier, vilket i sin tur möjliggör återanvändning av moduler. Exempel på uppdelningskriterier är: separata analyser som utförs på programkoden, språkutvidgningar som förändrar ett basspråk eller samma indelning som i en språkspecifikation för att underlätta spårbarhet. Med hjälp av ReRAGs kan man beskriva dessa moduler var för sig och automatiskt lösa komplicerade kontextkänsliga beroenden. Vi beskriver en beräkningsalgoritm för ReRAGs som vi har implementerat i kompilatorverktyget JastAdd, där ReRAGs kombineras med Java. Dessutom har vi implementerat en fullständig Java 1.4 kompilator för att utvärdera de möjligheter för modularisering och utvidgningsbarhet som finns i ReRAGs. Vi visar hur man trots svårigheter som blockstruktur, arv, kvalificerade namn och syntaktiska tvetydigheter kan modularisera namnanalys i Java på samma sätt som i den informella språkspecifikationen. Vi har även utvidgat Javakompilatorn med så kallade non-null types som upptäcker felaktig användning av icke-initialiserade pekare redan vid kompileringen. Utvidgningen är helt modulär vilket gör att tekniken kan användas för valfria typsystem där användare själva kan välja önskade kontroller.

Teknikerna skalar upp till både fullständiga programspråk och stora program. Vår genererade Javakompilator uppfyller lika många testfall, ur en kompilatortestsvit, som de mest använda kompilatorerna och kan kompilera program som är större än 100.000 rader kod. Dessutom är vår exekverbara specifikation bara två tredjedelar så stor som traditionella handskrivna kompilatorer. (Less)

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/546867

author

Ekman, Torbjörn ^LU

supervisor

Görel Hedin ^LU

opponent

Dr. Bracha, Gilad, Sun Microsystems

organization

publishing date

2006

type

Thesis

publication status

published

subject

Computer Sciences

keywords

numerical analysis, Computer science, attribute grammars, extensible compilers, context-sensitive transformations, declarative object-oriented programming, kontroll, system, numerisk analys, Datalogi, control, systems

pages

151 pages

publisher

Department of Computer Science, Lund University

defense location

Room E:B, E-building, Ole Römers Väg 3, Faculty of Engineering, Lund University, Sweden

defense date

2006-06-05 13:15:00

ISBN

91-628-6839-X

language

English

LU publication?

yes

id

a082e224-87f3-4181-9e38-49f34ccdd2c9 (old id 546867)

date added to LUP

2016-04-01 15:45:23

date last changed

2025-04-04 15:29:48

@phdthesis{a082e224-87f3-4181-9e38-49f34ccdd2c9,
  abstract     = {{Processing of programs is a core area in computer science. A compiler that translates source text to machine language is the most well-known kind of tool in this area, but there are numerous other kinds of related applications: source-to-source translators, refactoring tools, reengineering tools, metrics tools, consistency checkers, etc. These tools perform similar analyses and can therefore benefit from shared infrastructure. This thesis addresses the problem of how to build program-processing tools much more easily, providing high-level concise ways of programming the tools, and providing good modularity and extensibility, allowing for a high degree of reuse between tools. We present Rewritable Reference Attributed Grammars (ReRAGs), a technique that builds on well-known software development techniques such as object-orientation, aspect-oriented software development, declarative programming, attribute grammars, and transformation systems. ReRAGs combine mechanisms from these areas into one coherent framework with synergistic effects on modularity and extensibility. These mechanisms support several different decomposition criteria for modularization that enable re-use: as separate computations on the program model, as a base language and language extensions, and the same decomposition<br/><br>
<br/><br>
as used in a language specification for traceability. ReRAGs allow such modules to be decoupled from each other and automatically resolves complex context-sensitive dependences.<br/><br>
<br/><br>
An evaluation algorithm for the formalism is presented and implemented in the JastAdd tool which combines ReRAGs with Java. We have implemented a complete Java 1.4 compiler to demonstrate the full potential of the JastAdd system and to evaluate its mechanisms for modularization and extensibility. We show how name analysis for Java with its complex visibility rules involving nested scopes, inheritance, qualified access, and syntactic ambiguities can be modularized in the same way as the informal Java language specification. We have also extended our Java compiler with non-null types for detection of possible null-pointer violations at compile time. The extension is completely modular and the technique allows for so called pluggable type systems that can be enabled at will.<br/><br>
<br/><br>
The techniques scale to real languages and large applications. Our generated Java compiler passes as many tests as production use compilers during compliance testing, compiles applications larger than 100.000 lines of code, and the executable specification is less than two-thirds the size of handwritten compilers.}},
  author       = {{Ekman, Torbjörn}},
  isbn         = {{91-628-6839-X}},
  keywords     = {{numerical analysis; Computer science; attribute grammars; extensible compilers; context-sensitive transformations; declarative object-oriented programming; kontroll; system; numerisk analys; Datalogi; control; systems}},
  language     = {{eng}},
  publisher    = {{Department of Computer Science, Lund University}},
  school       = {{Lund University}},
  title        = {{Extensible Compiler Construction}},
  year         = {{2006}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES

Extensible Compiler Construction