
3. Intermediate representation

Nuno Saavedra edited this page May 26, 2023 · 6 revisions

Intermediate Representation

The intermediate representation used by GLITCH captures similar concepts from different IaC technologies while remaining expressive enough to support analyses that identify code smells. Because the analyses run on the intermediate representation rather than on each technology's own syntax, they generalize across technologies. The intermediate representation also keeps information about the original source code, which makes it possible to report where in the code each issue occurs.

Abstract Syntax

A simplified abstract syntax of the intermediate representation GLITCH uses is shown below:

<S> ::= <project> 
      | <module> 
      | <unitblock>

<project> ::= 
    Project {
        name: <str>,
        modules: <module>*,
        blocks: <unitblock>*
    }

<module> ::= 
    Module {
        name: <str>,
        blocks: <unitblock>*
    }

<unitblock> ::=
    UnitBlock {
        name: <str>,
        atomic_units: <atomicunit>*,
        variables: <variable>*,
        attributes: <attribute>*,
        comments: <comment>*,
        conditions: <condition>*,
        unit_blocks: <unitblock>*
    }

<atomicunit> ::=
    AtomicUnit {
        name: <str>,
        type: <id>,
        attributes: <attribute>*
    }
    
<attribute> ::=
    Attribute {
        name: <id>,
        value: <value>,
        has_variable: <bool>,
        attributes: <attribute>*
    }
    
<condition> ::=
    ConditionStatement {
        type: IF | SWITCH,
        condition: <str>,
        else_statement: <condition>,
        is_default: <bool>
    }

<variable> ::=
    Variable {
        name: <id>,
        value: <value>,
        has_variable: <bool>,
        variables: <variable>*
    }

<comment> ::=
    Comment {
        content: <str>
    }

<value> ::= <str> | <number> | <bool> | <value>* | <id> | null
<id> ::= ;sequence of alphanumerics which starts with a letter and can contain underscores
<str> ::= "<character>*"    
<number> ::= ;integer or double
<bool> ::= True | False
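
To make the grammar more concrete, here is a minimal sketch of how part of it could be mirrored in Python. These dataclasses are illustrative stand-ins written for this page, not GLITCH's actual classes; the field names follow the abstract syntax above, while the helper `iter_attributes` and the sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Attribute:
    name: str
    value: Any
    has_variable: bool = False
    # Attributes can nest, as in the <attribute> production.
    attributes: List["Attribute"] = field(default_factory=list)


@dataclass
class AtomicUnit:
    name: str
    type: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class UnitBlock:
    name: str
    atomic_units: List[AtomicUnit] = field(default_factory=list)
    unit_blocks: List["UnitBlock"] = field(default_factory=list)


def iter_attributes(attributes):
    """Yield every attribute, including nested ones, depth first."""
    for attribute in attributes:
        yield attribute
        yield from iter_attributes(attribute.attributes)


# A hand-built unit block resembling a script with one file-management task.
task = AtomicUnit(
    name="Create config file",
    type="file",
    attributes=[
        Attribute("path", "/etc/app.conf"),
        Attribute("mode", "0777"),
        Attribute(
            "owner",
            None,
            attributes=[Attribute("name", "{{ user }}", has_variable=True)],
        ),
    ],
)
script = UnitBlock(name="main.yml", atomic_units=[task])

attribute_names = [
    attribute.name
    for unit in script.atomic_units
    for attribute in iter_attributes(unit.attributes)
]
print(attribute_names)
```

An analysis written against this shape only needs to know about atomic units and attributes, not about the syntax of the technology the script came from.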

Components' description

Some of the components defined in the abstract syntax are described below:

  • Project - Represents a generic folder that may contain several modules and unit blocks. In IaC technologies, each project usually gets its own folder with a recommended structure (e.g. Ansible documents a best-practice directory layout).
  • Module - The top-level component of each code structure (e.g. Roles in Ansible, Cookbooks in Chef, or Modules in Puppet). A module groups the scripts needed to provide a specific functionality. Modules are file system folders, usually with a specific organization (e.g. a role in Ansible usually has a tasks folder and a vars folder, which define the tasks and the variables for the role, respectively).
  • Unit Block - Corresponds to an IaC script itself or to a group of atomic units (e.g. Classes in Puppet).
  • Atomic Unit - The building block of IaC scripts (e.g. Tasks in Ansible and Resources in Chef and Puppet). Atomic units define the system components we want to change and the actions we want to perform on them.
  • Condition Statement - Some IaC technologies define condition statements in their languages (e.g. if, switch, unless), which this component abstracts.
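
As an illustration of the condition statement component, the sketch below models an if / else-if / else chain by linking statements through `else_statement`. It is one possible reading of the abstract syntax, not GLITCH's implementation: the class is a stand-in, and treating `is_default` as the marker for the final else branch is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ConditionType(Enum):
    IF = "IF"
    SWITCH = "SWITCH"


@dataclass
class ConditionStatement:
    condition: str
    type: ConditionType
    is_default: bool = False  # assumed to mark an else/default branch
    else_statement: Optional["ConditionStatement"] = None


def branches(statement: Optional[ConditionStatement]) -> List[str]:
    """Walk the else_statement chain and return each branch's condition."""
    result = []
    current = statement
    while current is not None:
        result.append("<default>" if current.is_default else current.condition)
        current = current.else_statement
    return result


# Stands for: if $os == 'Debian' ... elsif $os == 'RedHat' ... else ...
chain = ConditionStatement(
    "$os == 'Debian'",
    ConditionType.IF,
    else_statement=ConditionStatement(
        "$os == 'RedHat'",
        ConditionType.IF,
        else_statement=ConditionStatement("", ConditionType.IF, is_default=True),
    ),
)

print(branches(chain))
```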

Other components that are not yet in use are also defined in the intermediate representation:

  • Dependency - Abstracts the concept of dependencies between IaC scripts (e.g. include, require, import, contain...). Every time a statement of this type appears in a script, a Dependency instance should be created.

  • Folder and File - These components were introduced to track the folders and files that contain the other components. For example, they could be used to check whether a project follows the structure considered best practice.
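
The kind of structure check that Folder and File could enable might look like the sketch below. The `Folder` and `File` classes, the `content` field, and the `missing_subfolders` helper are hypothetical; the expected folder names come from the Ansible role example given earlier.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class File:
    name: str


@dataclass
class Folder:
    name: str
    content: List[Union["Folder", File]] = field(default_factory=list)


def missing_subfolders(folder: Folder, expected: List[str]) -> List[str]:
    """Return the expected subfolder names that are not directly inside folder."""
    present = {item.name for item in folder.content if isinstance(item, Folder)}
    return [name for name in expected if name not in present]


# A role folder with a tasks folder but no vars folder.
role = Folder(
    "webserver",
    [Folder("tasks", [File("main.yml")]), File("README.md")],
)

missing = missing_subfolders(role, ["tasks", "vars"])
print(missing)
```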

How to extend the intermediate representation

The code for the intermediate representation is in the package repr. Each element in GLITCH's representation is implemented as a class. Components present in scripts should inherit from the class CodeElement. If those components can contain other statements from the languages being abstracted, they should inherit from the class Block. Components that are not part of scripts, such as a Module or a Project, do not need to inherit from any class.
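
The inheritance rules above can be pictured with the following sketch. It is a simplified stand-in rather than the contents of the repr package: the class names come from this page, but the fields `line`, `code` and `statements` and the method `add_statement` are assumptions made for illustration.

```python
from typing import List


class CodeElement:
    """Base for components that appear in scripts."""

    def __init__(self) -> None:
        self.line = -1  # assumed field: line in the original script
        self.code = ""  # assumed field: original source text


class Block(CodeElement):
    """Base for components that can contain other statements."""

    def __init__(self) -> None:
        super().__init__()
        self.statements: List[CodeElement] = []

    def add_statement(self, statement: CodeElement) -> None:
        self.statements.append(statement)


class Comment(CodeElement):
    """Appears in scripts but cannot contain statements."""

    def __init__(self, content: str) -> None:
        super().__init__()
        self.content = content


class UnitBlock(Block):
    """Appears in scripts and can contain other statements."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name


class Module:
    """Not part of a script, so it inherits from neither base class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: List[UnitBlock] = []


script = UnitBlock("site.pp")
script.add_statement(Comment("# managed by hand"))
module = Module("webserver")
module.blocks.append(script)
```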

To extend the intermediate representation, there are three main steps:

  1. A class for the new component should be created.
  2. The parsers for the IaC technologies should be changed to handle the new component, if the component exists in that technology.
  3. The RuleVisitor, from which every analysis inherits, should be changed to handle the new component. Either a default behavior should be defined in the RuleVisitor, or the behavior should be implemented in every existing analysis.
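
The three steps can be sketched end to end as follows, using Dependency as the new component. Everything here is a toy written for this page: the `CodeElement` and `RuleVisitor` classes are simplified stand-ins for GLITCH's own, and the `parse` function, the method names `check` and `check_dependency`, and the `UnpinnedDependencyAnalysis` rule are hypothetical.

```python
from typing import List


class CodeElement:
    def __init__(self) -> None:
        self.line = -1


# Step 1: a class for the new component.
class Dependency(CodeElement):
    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name


# Step 2: a toy parser that emits the new component when it sees one.
def parse(script: str) -> List[CodeElement]:
    elements: List[CodeElement] = []
    for number, text in enumerate(script.splitlines(), start=1):
        words = text.split()
        if len(words) == 2 and words[0] in ("include", "require"):
            dependency = Dependency(words[1])
            dependency.line = number
            elements.append(dependency)
    return elements


# Step 3: the visitor gets a default behavior, so analyses that do not
# care about the new component keep working unchanged.
class RuleVisitor:
    def check(self, elements: List[CodeElement]) -> List[str]:
        errors: List[str] = []
        for element in elements:
            if isinstance(element, Dependency):
                errors += self.check_dependency(element)
        return errors

    def check_dependency(self, dependency: Dependency) -> List[str]:
        return []  # default: report nothing


class ExistingAnalysis(RuleVisitor):
    """An analysis written before Dependency existed; needs no changes."""


class UnpinnedDependencyAnalysis(RuleVisitor):
    """Hypothetical rule: flag dependencies without an '@version' suffix."""

    def check_dependency(self, dependency: Dependency) -> List[str]:
        if "@" not in dependency.name:
            return [f"line {dependency.line}: '{dependency.name}' has no pinned version"]
        return []


elements = parse("include nginx\nrequire db@1.2\nnotice hello")
print(ExistingAnalysis().check(elements))
print(UnpinnedDependencyAnalysis().check(elements))
```

Defining the default in the visitor is the less invasive of the two options in step 3, since no existing analysis has to be edited; overriding per analysis is only needed where the new component matters.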