LLVM 指令关系框架

本文概述了 TableGen 新的“关系”框架。该 TableGen 功能用于描述用户定义的指令之间关系，于 2012 年 10 月添加到 LLVM。

动机

该功能的动机源自 Hexagon 后端。与其他处理器类似，Hexagon 为许多指令提供了多种变体。在机器指令过程中，在同一指令的不同格式之间切换是一个常见的要求。例如，考虑一个加法指令，它有条件真 (Add_pt) 和条件假 (Add_pf) 两种形式。假设在目标降低过程中选择了非条件加法指令。然而，在条件转换过程中，优化过程可能会决定将非条件加法指令更改为条件真Add_pt形式。这些转换需要一个框架来将非条件形式与相应的条件形式联系起来。如果没有这样的框架，这种转换通常通过大型 switch 语句实现。使用 switch 语句方法有很多不足。switch 语句子句的手动实现需要非常高的维护成本。由于 switch 语句实现不完整，它还会导致优化机会的丢失。缺乏关系模型导致大约 15% 的 Hexagon 后端代码专门用于 switch 语句，其中一些函数的代码行数超过数千行。

这个问题促使我们探索了一些替代方案。我们开始寻找一个易于维护、灵活、可扩展且不易出错的框架。在 Hexagon 小组进行了一些初步讨论和集思广益后，我们决定修改 TableGen 以表达指令关系。最初的设计提交到 LLVM-dev 邮件列表以供审查。Jakob Stoklund 给出了宝贵的建议，并帮助设计了关系框架。这个想法是添加一种查询语言，用于定义不同类型的指令之间关系。TableGen 扩展以解析关系模型。它使用这些信息构建表格，这些表格被查询以确定与关系对应的新的操作码。Hexagon 后端严重依赖关系框架，它显着提高了我们目标代码的质量。

在深入了解实现细节之前，让我们考虑一个 API，该 API 以指令操作码作为输入，并返回其条件真/假形式。我们将首先查看基于 switch 语句的解决方案，然后将其与基于关系的实现进行比较。



short getPredicatedTrue(short opcode) {
switch (opcode) {
default:
  return -1;
case Hexagon::Add:
  return Hexagon::Add_pt;
case Hexagon::Sub:
  return Hexagon::Sub_pt;
case Hexagon::And:
  return Hexagon::And_pt;
case Hexagon::Or:
  return Hexagon::Or_pt;
case ... :
  return ...
}

short getPredicatedFalse(short opcode) {
switch (opcode) {
default:
  return -1;
case Hexagon::Add:
  return Hexagon::Add_pf;
case Hexagon::Sub:
  return Hexagon::Sub_pf;
case Hexagon::And:
  return Hexagon::And_pf;
case Hexagon::Or:
  return Hexagon::Or_pf;
case ... :
  return ...
}

short getPredicatedOpcode(short opcode, bool predSense) {
return predSense ? getPredicatedTrue(opcode)
                 : getPredicatedFlase(opcode);
}

由于案例数量众多，基于 switch 语句的方法变得非常笨拙。此外，随着新指令的添加，它需要持续维护。当一条指令具有多种关系时，问题变得更加复杂，因为每个 API 都必须更新。

关系框架为这个问题提供了一个非常系统的解决方案。它要求指令对其属性进行建模并对其组进行分类。例如，可以使用名为“PredSense”的字段来记录指令是否为条件指令及其条件类型。组中的每条指令都必须是唯一的，因此任何两条指令都不能共享所有相同的属性。至少必须有一个字段的值不同。通过为组中的所有指令分配一个共同的基名称来对指令组进行建模。这种方法的最大优势之一是，通过对这些属性和组进行一次建模，我们可以用很少的精力定义多种关系。

有了关系框架，getPredicatedOpcodeAPI 可以如下实现



short getPredicatedOpcode(short opcode, bool predSense) {
return predSense ? getPredicated(opcode, PredSenseTrue)
                 : getPredicated(opcode, PredSenseFalse);
}

这里，getPredicated()函数由关系框架自动生成。它对相应的关联表（也由框架生成）执行查询，并返回匹配的条件操作码（如果找到）。

架构

整个框架由一个名为InstrMapping的类驱动。TableGen 后端已扩展以包含一个用于使用InstrMapping类实现的关系模型的新解析器。任何关系模型都必须派生自此类并将所有类成员分配给适当的值。这是该类的样子



class InstrMapping {
  // Used to reduce search space only to the instructions using this relation model
  string FilterClass;

  // List of fields/attributes that should be same for all the instructions in
  // a row of the relation table. Think of this as a set of properties shared
  // by all the instructions related by this relationship.
  list RowFields = [];
  // List of fields/attributes that are same for all the instructions
  // in a column of the relation table.
  list ColFields = [];

  // Values for the fields/attributes listed in 'ColFields' corresponding to
  // the key instruction. This is the instruction that will be transformed
  // using this relation model.
  list KeyCol = [];

  // List of values for the fields/attributes listed in 'ColFields', one for
  // each column in the relation table. These are the instructions a key
  // instruction will be transformed into.
  list > ValueCols = [];
}

现在，让我们重新审视getPredicatedAPI。如前所述，此函数可以由关系框架自动生成。它要求我们定义一个关系模型，该模型可以将非条件指令与其条件形式联系起来



def getPredicated : InstrMapping {
  // Choose a FilterClass that is used as a base class for all the instructions modeling
  // this relationship. This is done to reduce the search space only to these set of instructions.
  let FilterClass = "PredRel";

  // Instructions with same values for all the fields in RowFields form a row in the resulting
  // relation table.
  // For example, if we want to relate 'Add' (non-predicated) with 'Add_pt'
  // (predicated true) and 'Add_pf' (predicated false), then all 3
  // instructions need to have a common base name, i.e., same value for BaseOpcode here. It can be 
  // any unique value (Ex: XYZ) and should not be shared with any other instruction not related to 'Add'.
  let RowFields = ["BaseOpcode"];

  // List of attributes that can be used to define key and column instructions for a relation.
  // Here, key instruction is passed as an argument to the function used for querying relation tables.
  // Column instructions are the instructions they (key) can transform into.
  //
  // Here, we choose 'PredSense' as ColFields since this is the unique attribute of the key
  // (non-predicated) and column (true/false) instructions involved in this relationship model.
  let ColFields = ["PredSense"];

  // The key column contains non-predicated instructions.
  let KeyCol = ["none"];

  // Two value columns - first column contains instructions with PredSense=true while the second
  // column has instructions with PredSense=false.
  let ValueCols = [["true"], ["false"]];
}

此关系模型被处理，并且信息用于构建一个表格以及查询该表格的 API。所有这些都将在XXXInstrInfo.inc文件中发出。但是，随着迄今为止所做的更改，我们可能会得到一个空的关联表，因为我们尚未定义PredSense和BaseOpcode用于任何指令。



multiclass ALU32_Pred {
  let PredSense = !if(PredNot, "false", "true"), isPredicated = 1 in
  def NAME : ALU32_rr<(outs IntRegs:$dst),
            (ins PredRegs:$src1, IntRegs:$src2, IntRegs: $src3),
            !if(PredNot, "if (!$src1)", "if ($src1)")#
            " $dst = "#mnemonic#"($src2, $src3)",
            []>;
}

multiclass ALU32_base {
  let BaseOpcode = BaseOp in {
    let isPredicable = 1 in
    def NAME : ALU32_rr<(outs IntRegs:$dst),
            (ins IntRegs:$src1, IntRegs:$src2),
            "$dst = "#mnemonic#"($src1, $src2)",
            [(set (i32 IntRegs:$dst), (OpNode (i32 IntRegs:$src1),
                                              (i32 IntRegs:$src2)))]>;

      defm pt : ALU32_Pred; // Predicate true
      defm pf : ALU32_Pred; // Predicate false
  }
}

let isCommutable = 1 in {
  defm Add : ALU32_base<"add", "ADD", add>, PredRel;
  defm And : ALU32_base<"and", "AND", and>, PredRel;
  defm Xor: ALU32_base<"xor", "XOR", xor>, PredRel;
  defm Or  : ALU32_base<"or", "OR", or>, PredRel;
}
defm Sub : ALU32_base<"sub", "SUB", sub>, PredRel;

以蓝色突出显示的字段仅用于关系框架。这里，PredRel是一个过滤器类，用于提取可能使用getPredicated关系模型相关的指令。使用此模型的所有指令都应派生自PreRel类。BaseOpcode用于将相关的指令分组在一起。在上面的示例中，加法指令Add, Add_pt, Add_pf的所有变体都将将其 BaseOpcode 设置为ADD。类似地，BaseOpcode对于Sub的所有变体，SUB。它可以是所有组中唯一的字符串。PredSense用于在每个组中识别指令。

借助这些额外信息，TableGen 能够构建以下 API。它提供与基于 switch 语句的方法相同的功能，并显着降低了维护开销



int getPredicated(uint16_t Opcode, enum PredSense inPredSense) {
static const uint16_t getPredicatedTable[][3] = {
  { Hexagon::Add, Hexagon::Add_pt, Hexagon::Add_pf },
  { Hexagon::And, Hexagon::And_pt, Hexagon::And_pf },
  { Hexagon::Or, Hexagon::Or_pt, Hexagon::Or_pf },
  { Hexagon::Sub, Hexagon::Sub_pt, Hexagon::Sub_pf },
  { Hexagon::Xor, Hexagon::Xor_pt, Hexagon::Xor_pf },
}; // End of getPredicatedTable

  unsigned mid;
  unsigned start = 0;
  unsigned end = 5;
  while (start < end) {
    mid = start + (end - start)/2;
    if (Opcode == getPredicatedTable[mid][0]) {
      break;
    }
    if (Opcode < getPredicatedTable[mid][0])
      end = mid;
    else
      start = mid + 1;
  }
  if (start == end)
    return -1; // Instruction doesn't exist in this table.

  if (inPredSense == PredSense_true)
    return getPredOpcodeTable[mid][1];
  if (inPredSense == PredSense_false)
    return getPredicatedTable[mid][2];
  return -1;
}

一旦指令被定义为适当地对它们的属性进行建模，定义新的指令映射就变得非常容易。现在，假设我们想要一个 API，该 API 允许我们将条件真指令转换为其条件假形式。这可以通过定义一个新的关系模型来完成。对于此模型，我们不需要修改任何指令定义，因为它们已经包含所有必要的的信息。



//===------------------------------------------------------------------===//
// Generate mapping table to relate predicate-true instructions with their
// predicate-false forms
//
def getFalsePredOpcode : InstrMapping {
  let FilterClass = "PredRel";
  let RowFields = ["BaseOpcode"];
  let ColFields = ["PredSense"];
  let KeyCol = ["true"];
  let ValueCols = [["false"]];
}

结论

我希望本文能够提供有关该框架的一些有用信息。Hexagon 后端广泛使用此功能，可以作为关系框架入门参考资料。