Domain Knowledge Driven Program Analysis

Today I drove to Cambridge (it’s just 1h each way) to attend a talk given by Daniel Ratiu, who is visiting Microsoft Research this week. He presented a summary of his PhD work, which he will defend in precisely one month. Daniel’s main message was that program analyses can benefit from the use of some domain knowledge, instead of being mostly based on the syntax and structure of the code (e.g. call graphs).

His approach consists of first extracting a graph of the program’s identifiers and how they are related (e.g. X is-a Y if class X extends class Y). Given a similar graph representing the domain’s ontology, he then uses the identifiers and the graph’s structure to try to match the program’s concepts to the domain’s concepts. This then allows one to see the conceptual coverage of a program, its conceptual redundancy, the logical cohesion of an architecture, etc.
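To make the matching idea concrete, here is a minimal sketch in Python. It is not Daniel's actual algorithm: it matches on identifier words only and ignores the graph structure that his approach also uses. The function names (`split_identifier`, `match_concepts`, `coverage`) and the toy class names are invented for illustration.

```python
import re

def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]

def match_concepts(program_triples, domain_concepts):
    """Map each domain concept to the program identifiers that mention it.

    program_triples: iterable of (node, relation, node) tuples.
    domain_concepts: set of lowercase concept names (assumed single words).
    """
    identifiers = {node for s, _, o in program_triples for node in (s, o)}
    matches = {}
    for ident in identifiers:
        for word in split_identifier(ident):
            if word in domain_concepts:
                matches.setdefault(word, set()).add(ident)
    return matches

def coverage(matches, domain_concepts):
    """Fraction of domain concepts that at least one identifier refers to."""
    return len(matches) / len(domain_concepts)

# Toy program graph: X is-a Y if class X extends class Y.
program = {
    ("MenuBar", "is-a", "Component"),
    ("PushButton", "is-a", "Component"),
}
domain = {"menu", "button", "table"}

found = match_concepts(program, domain)
for concept in sorted(domain):
    print(concept, sorted(found.get(concept, set())))
print("coverage:", round(coverage(found, domain), 2))
```

In this toy example no identifier mentions "table", so the concept shows up as uncovered, which is the kind of gap the conceptual coverage view is meant to reveal.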

However, there are usually no domain ontologies available. So he and his colleagues used the approach to extract the concept graph from several APIs for the same domain (e.g. GUIs) and then computed pairwise intersections: any concept belonging to at least two APIs was added to the ontology. In this way they obtained, for example, an ontology of GUI artefacts (menus, buttons, etc.) and could see which implementations don’t cover important artefacts (e.g. AWT doesn’t provide tables). The approach of course relies on the program using good identifiers (not just cryptic abbreviations) and on those identifiers reflecting the domain’s vocabulary.
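The pairwise-intersection step can be sketched as follows, assuming each API's concept graph has already been reduced to a plain set of concept names. The toolkit names and their contents are made up for illustration and do not describe any real API; `build_ontology` and `missing_concepts` are hypothetical helper names.

```python
from itertools import combinations

def build_ontology(api_concepts):
    """Union of all pairwise intersections: concepts found in at least two APIs."""
    ontology = set()
    for concepts_a, concepts_b in combinations(api_concepts.values(), 2):
        ontology |= concepts_a & concepts_b
    return ontology

def missing_concepts(api_concepts, ontology):
    """For each API, the ontology concepts it does not cover."""
    return {name: ontology - concepts for name, concepts in api_concepts.items()}

# Hypothetical concept sets, one per API (not real data).
apis = {
    "toolkit_a": {"menu", "button", "table"},
    "toolkit_b": {"menu", "button", "canvas"},
    "toolkit_c": {"menu", "table"},
}

ontology = build_ontology(apis)
print(sorted(ontology))
for name, missing in sorted(missing_concepts(apis, ontology).items()):
    print(name, "lacks", sorted(missing))
```

A concept that appears in only one API ("canvas" here) stays out of the ontology, while an API that lacks a shared concept is flagged, in the same way the talk flagged the missing tables in AWT.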

Another example he showed was a visualisation of the packages or classes of a program that refer to a certain concept. This can expose non-optimal modularizations of the code w.r.t. the domain’s concepts.

Daniel also gave me a demo of his tool, which is available as an Eclipse plugin. I especially liked the fact that the extraction of the concept graph and the matching of graphs can be parametrized in a simple configuration file. For example, one can choose WordNet to make identifiers uniform (e.g. children is mapped to child), although WordNet has several shortcomings for the technical domains found in most software, e.g. concepts related to GUIs. I also liked that Daniel made the concept graphs he extracted publicly available in two easy-to-process formats: OWL and a text file with one triple (node-relation-node) per arc.
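A one-triple-per-line text file is easy to load with a few lines of Python. The sketch below assumes the three fields are separated by whitespace and contain no spaces themselves; the actual layout of Daniel's files may differ, and the sample lines are invented.

```python
def parse_triples(lines):
    """Parse 'node relation node' lines into (node, relation, node) tuples."""
    triples = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(parts)}")
        triples.append(tuple(parts))
    return triples

def related(triples, relation):
    """Group the targets of one relation by their source node."""
    result = {}
    for source, rel, target in triples:
        if rel == relation:
            result.setdefault(source, set()).add(target)
    return result

# Invented sample content; a real file would be read with open(path).
sample = [
    "Button is-a Component",
    "",
    "Menu has-a MenuItem",
]
graph = parse_triples(sample)
print(graph)
print(related(graph, "is-a"))
```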

Overall, quite a nice talk and body of work; a pity it was very poorly attended.

As an added bonus of taking the time to go to Cambridge, the Microsoft Research reception hall has some goodies for visitors: copies of the DVD of the Royal Institution’s Christmas 2008 lecture and of the 2020 Science report. I’m already looking forward to watching and reading them.
