From 8273d0d72c68804f9cc952cf61e0f05a8f7e0e87 Mon Sep 17 00:00:00 2001
From: Mehdi Amini
Date: Wed, 12 Jun 2019 15:26:09 -0700
Subject: [PATCH 1/2] MLIR Dialects for TensorFlow

---
 rfcs/20190612-mlir-dialect.md | 335 ++++++++++++++++++++++++++++++++++
 1 file changed, 335 insertions(+)
 create mode 100644 rfcs/20190612-mlir-dialect.md

diff --git a/rfcs/20190612-mlir-dialect.md b/rfcs/20190612-mlir-dialect.md
new file mode 100644
index 000000000..61611af88
--- /dev/null
+++ b/rfcs/20190612-mlir-dialect.md
@@ -0,0 +1,335 @@
+# TensorFlow MLIR Dialects
+
+|Status        | Proposed                                  |
+|:------------ | :-----------------------------------------|
+|**Author(s)** | Mehdi Amini (aminim@google.com)           |
+|              | Tatiana Schpeisman (shpeisman@google.com) |
+|              | Chris Lattner (clattner@google.com)       |
+|**Sponsor**   | Alexandre Passos (apassos@google.com)     |
+|              | Jacques Pienaar (jpienaar@google.com)     |
+|**Updated**   | 2019-06-10                                |
+
+## Objective
+
+[MLIR](https://medium.com/tensorflow/mlir-a-new-intermediate-representation-and-compiler-framework-beba999ed18d)
+is the intermediate representation and compiler framework we are investing in to
+build the compiler infrastructure for TensorFlow. The TensorFlow representation
+described in this document is what future high-level
+transformations will operate on.
+
+We make use of two different dialects to model TensorFlow graphs in MLIR: first
+the `tf_executor` dialect that represents the execution model of the TensorFlow
+executor (e.g. control dependencies, deadness propagation) and the `tf` dialect
+which represents the regular operations in a TensorFlow graph (the ones that
+don’t have a special contract with the executor).
+
+One intent of this design is that TensorFlow 2.x features can choose to target
+just the `tf` dialect, allowing us to phase out the `tf_executor` dialect in
+subsequent TensorFlow releases. The combination of the two dialects can
+represent arbitrary existing TensorFlow graphs.
+
+The representation in this document does not address the specific needs of
+accelerators or "custom backends" for TensorFlow. We plan to provide a generic
+infrastructure for replacing the TF/XLA bridge with a more flexible and reusable
+system across targets. A later design proposal will address these aspects. Also,
+this representation does not address shape inference; an independent design
+exploration is being conducted separately at the moment.
+
+## TensorFlow Dialect
+
+The TensorFlow dialect in MLIR is an open dialect (it allows operations that
+MLIR doesn't know about) that can contain any TensorFlow operation that does not
+need specific handling by the executor. These operations don’t operate on dead
+values, don’t have control dependencies, and execute conceptually in program
+order. The form used in this dialect aligns with the direction taken by
+TensorFlow 2.0 with tf.function and autograph, as well as with the needs of
+other frontends. This should ease the development of analyses and
+transformations: optimizations operate on simpler semantics and local graph
+transformations can be validated in a local scope. Simple patterns like folding
+`x-x` into a constant 0 do not need to update any control dependencies. It
+should also be easily lowerable towards multiple accelerators and heterogeneous
+systems in general.
+
+Operations in this dialect usually operate on tensor and scalar types defined in
+the standard dialect. The extra defined types are specific to TensorFlow: `QINT`
+types like `!tf.qint8` (etc), `QUINT` types like `!tf.quint8`, all of the `REF`
+types like `!tf.uint8ref`, as well as `!tf.string`, `!tf.resource`, and `!tf.variant`,
+which correspond to the TensorFlow types of the same name.
+
+### Example:
+
+Below is an example of a function operating on the TensorFlow dialect:
+
+```mlir {.mlir}
+/// This is a regular function, taking inputs by value and returning a new value.
+/// The body is a regular CFG.
+func some_function(%input : tensor<*xf32>) -> tensor<*xf32> {
+  // TensorFlow operations are not variadic: this `tf.Add` operation always
+  // takes two inputs and returns a single output. This simplifies
+  // pattern-matching, verification and rewriting.
+  %added = tf.Add %input, %input : tensor<*xf32>
+  // Operations have sequential execution semantics in a basic block, there are
+  // no control dependencies. The compiler can reorder operations according to
+  // the as-if rule ( https://en.wikipedia.org/wiki/As-if_rule ).
+  %three = constant splat<tensor<f32>, 3.0>
+  %mul = tf.Mul %input, %three : (tensor<*xf32>, tensor<f32>) -> tensor<*xf32>
+
+  // Only control flow v2 is supported in TF dialect.
+  // The tf.If operation takes three functions that accept the same
+  // arguments: the condition returns a bool and the two branches must return
+  // the same type, which is also the return of the tf.If.
+  %value = "tf.If"(%added, %mul)
+             {cond: @cond_func, true_branch: @func_foo, false_branch: @func_bar}
+                 : (tensor<*xf32>, tensor<*xf32>) -> tensor<*xf32>
+
+  return %value : tensor<*xf32>
+}
+```
+
+## TensorFlow Executor Dialect
+
+The `tf_executor` dialect is intended to model the current TensorFlow executor
+semantics and (when combined with the `tf` dialect) can represent arbitrary
+TensorFlow 1.x and 2.x graphs. As such it follows the executor model, including
+deadness propagation, concurrent semantics, and control dependencies. The
+`tf_executor` dialect defines two dialect-specific types:
+
+*   `!tf_executor.control` to represent control dependencies.
+*   `!tf_executor.token` to represent the pair of operations modeling the
+    NextIteration operation.
+
+The `tf_executor` dialect is closed (operations are all known to MLIR) as there
+are only 8 TensorFlow ops with specific graph executor behavior and 4 additional
+operations to represent islands of predictability.
+
+This dialect models the TensorFlow executor semantics; as such, a large part of
+the defined operations mirror the
+[TensorFlow Control Flow Ops](https://www.tensorflow.org/api_docs/cc/group/control-flow-ops)
+and
+[implement Control Flow In TensorFlow](http://download.tensorflow.org/paper/white_paper_tf_control_flow_implementation_2017_11_1.pdf).
+Also, almost all the operations accept a variadic number of control tokens and
+return an extra control token as output. Except for `tf_executor.Merge` and
+`tf_executor.ControlTrigger`, operations propagate deadness: if any of the
+inputs (control or non-control) is dead, all the outputs (control and
+non-control) are dead as well. For `tf_executor.Merge`, the output is dead only
+when either an input control token is dead or all of the regular inputs are
+dead. For `tf_executor.ControlTrigger`, a live control output is always produced
+even when some control inputs are dead.
+
+### `tf_executor.graph` Operation
+
+The `tf_executor.graph` operation contains a region with a single block that
+lists the operations in a TensorFlow graph. The operations are topologically
+sorted in-order (no cycles are allowed in the SSA values). The execution model
+for operations in this block follows the TensorFlow executor semantics:
+
+1.  Operations that don’t have any transitive dependencies through the SSA
+    def/use chains may be executed in parallel
+    (`tf_executor.NextIteration.Source` is the exception).
+2.  SSA values in this block can be implicitly dead. This means that every SSA
+    value defined in a `tf_executor.graph` can be considered implicitly wrapped
+    in a conceptual `dead_or` structure, and includes a runtime flag
+    indicating if the value is dead or present. Operations may have special case
+    handling of dead values.
+3.  Operations in this dialect return a value of type `!tf_executor.control` as
+    last returned value (exceptions are `tf_executor.NextIteration.Sink` and
+    `tf_executor.fetch` which don’t return any value).
+
+The `tf_executor.graph` op only allows specific `tf_executor` dialect operations
+in its body: the `tf_executor.graph` verifier will reject any unknown operation.
+In order to execute standard `tf` dialect operations (like `tf.Add`) they must
+be wrapped in the `tf_executor.island` operation.
+
+The `tf_executor.graph` operation does not accept any operands; inputs are
+implicitly captured by the region, representing the feeds to the graph.
+
+The region attached to `tf_executor.graph` is terminated by a
+`tf_executor.fetch` operation. The non-control operands of the terminator
+correspond to the result values (or fetches) of the `tf_executor.graph`
+operation. The behavior is undefined if any of the operands of the
+`tf_executor.fetch` is dead.
+
+```mlir {.mlir}
+%fetches = tf_executor.graph : tensor<*xf32> {
+  // Operations in the current block execute when their inputs are ready,
+  // possibly concurrently.
+  // Only operations in the tf_executor dialect are expected here.
+  // Ops can return multiple outputs and a control token for control
+  // dependencies.
+  // We don’t mention the control token in the return type here, it is implicit.
+  %0, %ctl0 = tf_executor.opA %feed#0, %feed#1 : tensor<*xf32>
+  %1, %ctl1 = tf_executor.opB : tensor<*xf32>
+  %2, %ctl2 = tf_executor.opC %1, %ctl0 : tensor<*xf32>
+  %3, %ctl3 = tf_executor.opD %2 : tensor<*xf32>
+  tf_executor.fetch %3 : tensor<*xf32>
+} // end of the "tf_executor.graph" operation/region
+```
+
+### `tf_executor.island` Operation
+
+The `tf_executor.graph` operation does not allow `tf` dialect operations to be
+immediately nested underneath it. The `tf_executor.island` is introduced as a
+wrapper for general computation (for example, all the `tf` dialect operations):
+this results in a more consistent representation which makes analysis and
+transformation simpler.
+
+The `tf_executor.island` operation has a single region with a single block
+attached (only functional control flow is allowed). The block is terminated by a
+`tf_executor.yield` operation. The operands of the terminator correspond to the
+result values of the `tf_executor.island` operation. An extra result of type
+`!tf_executor.control` is always produced by every `tf_executor.island`.
+
+Within an island, execution semantics follow standard sequential behavior
+consistent with the direction of TensorFlow 2.0 and autograph, and desirable for
+compiler analyses and transformations. Values in an island can’t be dead. Other
+nested `tf_executor.graph` operations can be present in the region (or called
+functions) to re-enable the TensorFlow executor behavior for a subsection of the
+code. This is important for the following reasons:
+
+*   Initially the functional control flow operations are calling functions
+    involving nested graphs; if `tf_executor.graph` weren’t allowed in an
+    island, these operations would need to have an equivalent in the
+    `tf_executor` dialect.
+*   Nesting also allows islands to be formed without involving inter-procedural
+    analyses: any function call may involve a callee with a graph.
+
+The `tf_executor.island` region allows implicit capture. If any value captured
+by a `tf_executor.island` is dead, the whole region does not execute and every
+produced value is marked as dead as well.
+
+An arbitrary number of `!tf_executor.control` operands are accepted by a
+`tf_executor.island` operation. If any operand is dead, the region is not
+executed and dead values are immediately returned for every result.
+
+```mlir {.mlir}
+// The island is capturing implicitly %0 and %1. It is also taking a control
+// dependency %ctl0 as input. It produces a tensor<*xf32> value matching the
+// argument of the yield terminator, as well as an extra control token.
+%2, %ctl2 = tf_executor.island (%ctl0)
+                  : (tensor<*xf32>, !tf_executor<"control">) -> tensor<*xf32> {
+  %added = tf.Add %1, %0 : tensor<*xf32>
+  %mul = tf.Mul %added, %1 : tensor<*xf32>
+
+  // The yield terminator operands are the result values of the island.
+  tf_executor.yield %mul : tensor<*xf32>
+}
+```
+
+The case where a single operation is wrapped inside an island can even be
+compressed by inferring the terminator to be the returned value of the
+operation. If the example above contained only the addition with implicit
+capture, it would be displayed as:
+
+```mlir {.mlir}
+%2, %ctl2 = tf_executor.island(%ctl0) wraps tf.Add %1, %0 : tensor<*xf32>
+```
+
+### `tf_executor.Switch` Operation
+
+[`tf_executor.Switch`](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/switch):
+takes two inputs, `predicate` and `data`, and returns two regular
+outputs, `true_output` and `false_output`. The `data` input is copied
+to `true_output` if `predicate` evaluates to true; otherwise it is copied
+to `false_output`. The other output is marked as dead. If one of the inputs or a
+control token is dead, then all of the outputs are marked as dead as well.
+
+### `tf_executor.SwitchN` Operation
+
+[`tf_executor.SwitchN`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/control_flow_ops.cc#L49-L53):
+takes two inputs, `data` and `index`, and an integer attribute `num_outs` indicating
+the number of outputs. The `data` input is copied to the output indicated by
+the `index` input. The other outputs are marked as dead. If one of the inputs or
+a control token is dead, then all of the outputs are marked as dead as well.
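
The `Switch` and `SwitchN` rules above can be modeled outside MLIR. The following is a minimal Python sketch and is not part of the proposal: `Value`, `DEAD`, `switch`, and `switch_n` are hypothetical names, and the extra control-token output is left out. Rejecting an out-of-range `index` is an assumption of this sketch, since the text above does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in for the conceptual `dead_or` wrapper."""
    payload: Any = None
    dead: bool = False


DEAD = Value(dead=True)


def switch(predicate: Value, data: Value,
           controls: Sequence[Value] = ()) -> Tuple[Value, Value]:
    """Return (true_output, false_output) following the Switch rule."""
    # A dead input or a dead control token kills every output.
    if predicate.dead or data.dead or any(c.dead for c in controls):
        return DEAD, DEAD
    # Otherwise exactly one output is live; the other is dead.
    if predicate.payload:
        return data, DEAD
    return DEAD, data


def switch_n(data: Value, index: Value, num_outs: int,
             controls: Sequence[Value] = ()) -> List[Value]:
    """Return `num_outs` outputs; only the one selected by `index` is live."""
    if data.dead or index.dead or any(c.dead for c in controls):
        return [DEAD] * num_outs
    # Assumption of this sketch: an out-of-range index is an error.
    if not 0 <= index.payload < num_outs:
        raise ValueError("index out of range")
    outputs = [DEAD] * num_outs
    outputs[index.payload] = data
    return outputs
```

For example, `switch(Value(True), Value(42))` yields a live `true_output` carrying 42 and a dead `false_output`, while passing `DEAD` as the predicate makes both outputs dead.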
+
+### `tf_executor.Merge` Operation
+
+[`tf_executor.Merge`](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/merge):
+takes a variadic number of inputs, and returns a single output. The output is
+defined as a non-dead input (selected in an unspecified way if multiple inputs
+are non-dead). If all inputs are dead, the output is also dead.
+
+### NextIteration: `tf_executor.NextIteration.Source` and `tf_executor.NextIteration.Sink` Operation
+
+The TensorFlow
+[`NextIteration`](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/next-iteration)
+op is modeled using these two paired operations. Since _NextIteration_ is
+intended for modeling the loop back-edges, splitting it into two different
+operations keeps the representation a structural
+DAG. `tf_executor.NextIteration.Source` does not take any operand and produces two
+results: one regular value corresponding to the TensorFlow graph, and a second
+value of type `!tf_executor.token`. This token is consumed by the
+paired `tf_executor.NextIteration.Sink` operation alongside the value that is
+passed through the back-edge. No value is returned
+by `tf_executor.NextIteration.Sink`. The type of the result of the source must
+match the type of the value operand of the sink.
+
+`tf_executor.NextIteration.Source` is an exception in the executor model in the
+sense that it executes after the paired `tf_executor.NextIteration.Sink` even
+though there is no data dependency between them.
+
+### `tf_executor.LoopCond` Operation
+
+[`tf_executor.LoopCond`](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/loop-cond):
+forwards its boolean input to its output,
+[it acts as a `pivot` for marking the loop termination condition](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/control_flow_ops.h#L115-L118).
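
As a hedged illustration of the `tf_executor.Merge` rule described earlier in this document, here is a minimal Python sketch that is not part of the proposal. `Value`, `DEAD`, and `merge` are hypothetical names. Picking the first live input is only one possible choice, because the text leaves the selection unspecified when several inputs are live.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in for the conceptual `dead_or` wrapper."""
    payload: Any = None
    dead: bool = False


DEAD = Value(dead=True)


def merge(inputs: Sequence[Value], controls: Sequence[Value] = ()) -> Value:
    """Forward one live input; dead if a control is dead or no input is live."""
    # A dead control token makes the output dead.
    if any(c.dead for c in controls):
        return DEAD
    # Assumption of this sketch: pick the first live input.
    for value in inputs:
        if not value.dead:
            return value
    # Every regular input was dead.
    return DEAD
```

Unlike most executor operations, `merge([DEAD, Value(7)])` stays live here and forwards 7. This is what lets a loop header combine the initial value with the back-edge value while one of the two is still dead.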
+
+### `tf_executor.Enter` Operation
+
+[`tf_executor.Enter`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/control_flow_ops.h#L77-L79):
+takes a single input and a `name` string attribute that identifies the execution
+frame. It forwards its input to its output in the new execution frame.
+
+### `tf_executor.Exit` Operation
+
+[`tf_executor.Exit`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/control_flow_ops.h#L90-L92):
+forwards its single input to its output, exiting the current execution frame.
+
+### `tf_executor.ControlTrigger` Operation
+
+[`tf_executor.ControlTrigger`](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/control-trigger):
+it is similar to
+[a no-op](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/control_flow_ops.h#L23-L26)
+that acts as a placeholder for control dependencies. It always produces a live
+control output even when some control inputs are dead.
+
+### `tf_executor.Send` Operation
+
+[`tf_executor.Send`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/sendrecv_ops.h#L24):
+matches TensorFlow semantics.
+
+### `tf_executor.Recv` Operation
+
+[`tf_executor.Recv`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/sendrecv_ops.h#L37):
+matches TensorFlow semantics.
+
+## Example
+
+Below is an example of a loop that decrements an initial `%count.init` integer
+until it reaches 0 and returns the last value in the loop.
+
+```mlir {.mlir}
+// Loop `%count.init` times and return the last counter (always zero)
+%fetches = tf_executor.graph {
+
+  %loop.init, %ctl0 = tf_executor.Enter %count.init : i32
+
+  %next_count, %tok = tf_executor.NextIteration.Source : i32
+
+  %loop.body.init, %ctlMerge = tf_executor.Merge %loop.init, %next_count : i32
+
+  %dec_count, %ctlAdd = tf_executor.island
+    wraps tf.Add %loop.body.init, -1 : (i32, i32) -> i32
+
+  %loop_cond, %ctlNE = tf_executor.island
+    wraps tf.NotEqual %dec_count, 0 : (i32, i32) -> i1
+
+  %true, %false, %ctlSwitch = tf_executor.Switch %loop_cond, %dec_count : i32
+
+  tf_executor.NextIteration.Sink[%tok] %true : i32
+
+  %exit_count, %ctlExit = tf_executor.Exit %false : i32
+
+  tf_executor.fetch %exit_count : i32
+} // end of the "tf_executor.graph" operation/region
+```
+
From 3c11a3ea44f3bf9623de4078ab871dd40771fd29 Mon Sep 17 00:00:00 2001
From: Edd Wilder-James
Date: Mon, 11 Nov 2019 10:36:19 -0800
Subject: [PATCH 2/2] Update 20190612-mlir-dialect.md

Change status to Accepted
---
 rfcs/20190612-mlir-dialect.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/20190612-mlir-dialect.md b/rfcs/20190612-mlir-dialect.md
index 61611af88..388b44e24 100644
--- a/rfcs/20190612-mlir-dialect.md
+++ b/rfcs/20190612-mlir-dialect.md
@@ -1,6 +1,6 @@
 # TensorFlow MLIR Dialects
 
-|Status        | Proposed                                  |
+|Status        | Accepted                                  |
 |:------------ | :-----------------------------------------|
 |**Author(s)** | Mehdi Amini (aminim@google.com)           |
 |              | Tatiana Schpeisman (shpeisman@google.com) |