Java programs are built using types and classes, which consist of class members such as methods and fields. Tai-e assigns a unique identifier, known as a signature, to each type, class, and class member. These signatures enable users to easily configure and specify the behavior of program analyzers for specific elements, such as in taint configuration (see How to Use Taint Analysis?). Additionally, they allow analysis developers to easily retrieve and manipulate program elements through Tai-e’s convenient APIs.

In some cases, it may be necessary to specify a large number of related classes or class members within a configuration or when implementing a particular program analysis. To streamline this process, we have designed and implemented various signature patterns and matchers for classes, methods, and fields, enabling you to specify and retrieve multiple elements using a single signature pattern.

This documentation will guide you through the format of signatures for types, classes, and class members, as well as the APIs for accessing these program elements via their signatures.

Since generic types are erased in Java, type signatures, along with class and class member signatures, do not include type parameters.

1. Type Signatures

In this section, we introduce the signatures for various Java types, including primitive types, reference types, and the void type.

1.1. Primitive Types

The signatures for the eight Java primitive types are simply their names: byte, short, int, long, float, double, char, and boolean.

1.2. Reference Types

Java reference types include class types (encompassing interfaces and enums) and array types. The signature formats for these types are outlined below.

1.2.1. Class Types (Including Interfaces and Enums)

The signature for a class type is its fully-qualified class name, which includes the package name. For an inner class, insert a $ between the outer class name and the inner class name. Here are some examples:

  • java.lang.String

  • pascal.taie.Main

  • org.example.MyClass

  • java.util.Map$Entry

1.2.2. Array Types

An array type signature consists of its base type followed by one or more [], with the number of [] indicating the array’s dimensions. Here are some examples:

  • java.lang.String[]

  • org.example.MyClass[][]

  • char[]

1.3. Void Type

The signature for the void type is simply void. This appears in Method Signatures for methods that do not return a value.

1.4. Programmatically Accessing a Type via Signature

For analysis developers, Tai-e provides convenient APIs to access various types. All the classes related to types, mentioned below, are located in the pascal.taie.language.type package.

In Tai-e, the TypeSystem class (accessible via World.get().getTypeSystem()) offers APIs to retrieve all types (except void, which is discussed later):

  • TypeSystem.getPrimitiveType(String): Retrieves a primitive type by its signature.

  • TypeSystem.getClassType(String): Retrieves a class type by its signature.

  • TypeSystem.getArrayType(Type,int): Retrieves an array type by its base type and the number of dimensions.

  • TypeSystem.getType(String): Retrieves a primitive type, class type, or array type by its signature.

Additionally, primitive types and the void type are implemented as enums in Tai-e, and can be directly accessed through their respective classes, such as IntType.INT and VoidType.VOID.

2. Class and Class Member Signatures

In this section, we introduce the signatures for classes and their members, specifically methods and fields. While constructors are typically considered class members, in Tai-e, they are treated as methods with a special name <init>, as explained in Method Signatures.

2.1. Class Signatures

Unsurprisingly, the format for class signatures is identical to that of class types, so we won’t repeat the details here.

2.2. Method Signatures

The format of a method signature is as follows:

<CLASS_TYPE: RETURN_TYPE METHOD_NAME(PARAMETER_TYPES)>
  • CLASS_TYPE: The signature of the class in which the method is declared.

  • RETURN_TYPE: The signature of the method’s return type.

  • METHOD_NAME: The name of the method.

  • PARAMETER_TYPES: A ,-separated list of parameter type signatures (Do not insert spaces around the ,!). If the method has no parameters, use ().

Here are some examples of method signatures:

<java.lang.Object: java.lang.String toString()>
<java.lang.Object: boolean equals(java.lang.Object)>
<java.util.Map: java.lang.Object put(java.lang.Object,java.lang.Object)>

As mentioned earlier, constructors are treated as methods in Tai-e. Each constructor has the name <init>, and its return type is always void. For example, the constructor signatures for ArrayList are:

<java.util.ArrayList: void <init>()>
<java.util.ArrayList: void <init>(int)>
<java.util.ArrayList: void <init>(java.util.Collection)>

Another special class member is the static initializer (also known as the class initializer), which is treated as a method with no arguments and no return value in Tai-e. The method name for a static initializer is <clinit>. For example, the signature of static initializer for Object is <java.lang.Object: void <clinit>()>.

2.3. Field Signatures

Like methods, field signatures uniquely identify fields within a Java program. The format of a field signature is as follows:

<CLASS_TYPE: FIELD_TYPE FIELD_NAME>
  • CLASS_TYPE: The signature of the class where the field is declared.

  • FIELD_TYPE: The signature of the field’s type.

  • FIELD_NAME: The name of the field.

For example, the signature for the field info in the following code:

package org.example;

class MyClass {
    String info;
}

is:

<org.example.MyClass: java.lang.String info>

2.4. Programmatically Accessing a Class or Member via Signature

Tai-e offers convenient APIs through the pascal.taie.language.classes.ClassHierarchy class, allowing analysis developers to access a class or member by its signature. The available methods are:

  • ClassHierarchy.getClass(String): Retrieves a class (JClass) by its signature.

  • ClassHierarchy.getMethod(String): Retrieves a method (JMethod) by its signature.

  • ClassHierarchy.getField(String): Retrieves a field (JField) by its signature.

3. Signature Patterns

Sometimes, users need to specify multiple related classes or members in a configuration, such as in taint analysis. To simplify this process, we have designed and implemented the signature pattern mechanism, similar to regular expressions but specifically tailored for classes and members. This allows users to conveniently specify multiple related classes or members using a single signature pattern.

In this section, we will introduce the formats of signature patterns and explain how to use them in analysis development.

3.1. Name Wildcards

Signatures are composed of various names, including class names, method names, field names, and type names within method and field signatures. To simplify specifying these names, we introduce the concept of name wildcards, which form the foundation of signature patterns. A name wildcard is any name that contains zero or more * characters, where each * can match any sequence of characters.

Here are some examples:

  • java.util.* matches all classes in the java.util package and its sub-packages (like java.util.regex)

  • get* matches all method names that start with get (like getName or getKey)

  • Names without any * characters match exactly (like toString only matches the toString methods)

3.2. Class Signature Pattern

Class signature patterns come in two forms:

  1. Basic Pattern: A name wildcard that directly matches class names.

    • Example: java.util.* matches all classes in the java.util package

    • Example: java.util.HashMap matches exactly that class

  2. Subclass Pattern: A name wildcard followed by ^ that matches both the specified classes and all their subclasses.

    • Example: java.util.List^ matches List and all classes that extend or implement it

    • Example: java.lang.*Exception^ matches all exception classes in the java.lang package and their subclasses, including classes like RuntimeException, IllegalArgumentException, and any custom exceptions that extend these classes

The subclass pattern is particularly useful when you need to capture an entire class hierarchy without listing each class individually.

3.3. Method Signature Pattern

Method signature patterns follow a format similar to method signatures but with added flexibility to match multiple methods. The general format is:

<CLASS_PATTERN: RETURN_TYPE_PATTERN METHOD_NAME_PATTERN(PARAMETER_TYPE_PATTERNS)>

Each component of the method signature pattern supports different matching mechanisms:

  • CLASS_PATTERN: Can be a class signature pattern (basic or subclass pattern).

  • RETURN_TYPE_PATTERN: A type signature pattern.

  • METHOD_NAME_PATTERN: Can be a name wildcard.

  • PARAMETER_TYPE_PATTERNS: A ,-separated list of type signature patterns (no spaces around ,), which also supports parameter wildcards.

Type Signature Patterns:

  • For class types, they are equivalent to class patterns.

  • For other types, they use simple name wildcard matching.

Parameter Wildcards: Method signature patterns support parameter wildcards, allowing you to specify repetition of type signature patterns. There are three types of repetition:

  1. Repeat exactly N times: TYPE_PATTERN{N}

  2. Repeat at least N times: TYPE_PATTERN{N+}

  3. Repeat between M and N times: TYPE_PATTERN{M-N}

Here are some examples of method signature patterns:

<java.util.List^: * get*(*)>

This pattern matches all methods in List and its implementations that start with get and have one parameter of any type.

<java.lang.*: void set*(java.lang.String,*)>

This pattern matches all methods in classes directly under the java.lang package that start with set, return void, and have two parameters: a String and any other type.

<*: java.lang.String toString()>

This pattern matches toString methods that return String and have no parameters, in any class.

<java.util.Map^: * *(java.lang.Object^,*)>

This pattern matches all methods in Map and its implementations that have two parameters: the first being Object or any of its subclasses, and the second being any type.

<java.lang.String: * format(java.lang.String,java.lang.Object^{0+})>

This pattern matches format methods in the String class that take a String parameter followed by zero or more Object (or subclass) parameters.

<java.util.Arrays: * asList(java.lang.Object{1-5})>

This pattern matches asList methods in the Arrays class that take between 1 and 5 Object parameters.

Method signature patterns provide a powerful way to specify groups of related methods across multiple classes, greatly simplifying configuration in various analyses. The addition of parameter wildcards further enhances this flexibility, allowing for precise matching of methods with varying numbers of parameters.

3.4. Field Signature Pattern

Field signature patterns follow a format similar to field signatures but with added flexibility to match multiple fields. The format of a field signature pattern is:

<CLASS_PATTERN: FIELD_TYPE_PATTERN FIELD_NAME_PATTERN>

This format is simpler than the method signature pattern, as field signatures do not include a parameter list. Each component (CLASS_PATTERN, FIELD_TYPE_PATTERN, and FIELD_NAME_PATTERN) supports the same matching mechanisms as in method signature patterns.

Example:

<java.util.List^: * size>

This pattern matches the size field in java.util.List and its subclasses, regardless of the field’s type.

3.5. Programmatically Accessing Multiple Classes or Members via Signature Pattern

Tai-e provides convenient APIs for analysis developers to retrieve multiple classes or members using signature patterns. To use these, developers first create a pascal.taie.language.classes.SignatureMatcher object, passing a ClassHierarchy as an argument. They can then use the following APIs:

  • SignatureMatcher.getClasses(String): Retrieves classes (JClass) based on the specified class signature pattern.

  • SignatureMatcher.getMethods(String): Retrieves methods (JMethod) based on the specified method signature pattern.

  • SignatureMatcher.getFields(String): Retrieves fields (JField) based on the specified field signature pattern.