HoCL: High level specification of dataflow graphs

We introduce HoCL (Higher order Coordination Language), a domain specific language (DSL) for specifying hierarchical, parameterized dataflow graphs. HoCL leverages a purely functional semantics to allow graph structures to be described in an abstract and concise manner. Generic graph patterns, in particular, can be encapsulated as user-definable higher-order functions. HoCL descriptions are independent of the underlying dataflow model of computation, and the HoCL compiler is intended to be used as a front-end to existing dataflow visualization, analysis and implementation tools. HoCL and its documentation are freely available on GitHub [17].


INTRODUCTION
* Also with IETR, UMR 6164 Université Rennes 1, INSA Rennes.
ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Dataflow modeling is used extensively for designing digital signal processing (DSP) systems. With this approach, applications to be implemented are described as graphs of persistent processing entities, named actors, connected by first in, first out (FIFO) channels and performing processing ("firing") when their incoming FIFOs contain enough data tokens. By varying the semantics of these firing rules, many dataflow models of computation (MoCs) can be defined, offering different trade-offs between expressivity and predictability, while keeping the key property of dataflow models: their ability to naturally express the intrinsic parallelism of DSP applications. As a result, a wide variety of dataflow-based design tools have been developed, such as Ptolemy [3], LabView [10] or Preesm [13], for the specification, simulation and synthesis, for hardware or software implementation, of dataflow-oriented applications.
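As a rough illustration of the firing rule described above, the following Python sketch (not HoCL) models an actor that fires whenever each of its input FIFOs holds enough tokens. The names Actor, can_fire, fire and run, and the fixed per-input rates, are assumptions made for this example only; they do not correspond to any particular MoC or tool.

```python
from collections import deque


class Actor:
    """A toy actor: fires when every input FIFO holds enough tokens."""

    def __init__(self, name, rates, func):
        self.name = name
        self.rates = rates      # tokens required on each input per firing
        self.func = func        # computation applied to the consumed tokens

    def can_fire(self, inputs):
        return all(len(q) >= r for q, r in zip(inputs, self.rates))

    def fire(self, inputs, output):
        # Consume the required tokens from each FIFO, then produce one result.
        consumed = [[q.popleft() for _ in range(r)]
                    for q, r in zip(inputs, self.rates)]
        output.append(self.func(*consumed))


def run(actor, inputs, output):
    """Fire the actor for as long as its firing rule is satisfied."""
    firings = 0
    while actor.can_fire(inputs):
        actor.fire(inputs, output)
        firings += 1
    return firings


a, b, out = deque([1, 2, 3, 4]), deque([10, 20]), deque()
adder = Actor("add", rates=[2, 1], func=lambda xs, ys: sum(xs) + sum(ys))
n = run(adder, [a, b], out)
```

Changing the rates or the condition tested in can_fire is, in this toy setting, what "varying the semantics of the firing rules" amounts to.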
With these tools, the specification of the application is typically carried out textually, using some form of graph notation, or graphically, using a dedicated Graphical User Interface (GUI). In both cases, the specification of large and/or complex graphs quickly becomes tedious and error-prone.
In this paper, we propose HoCL, a domain-specific language (DSL) aimed at simplifying and streamlining the description of large and/or complex dataflow graphs. The key feature of this language is the ability to describe graph structures as functions, so that several well-known and powerful concepts drawn from functional programming languages, such as polymorphic typing and higher order functions, can be applied both to ease and to secure the task of describing these graphs.
The rest of this paper is organized as follows. Section 2 presents the main features of the HoCL language, by means of small examples. Section 3 gives some insights into its formal semantics. Section 4 describes the current implementation, in particular the available backends. Section 5 describes the design, with HoCL, of a complete DSP application. Section 6 is a short review of related work and Section 7 concludes the paper.

THE HOCL LANGUAGE
A small example program is given in Listing 1, with the corresponding DFG in Fig. 1.
The abstract syntax of the HoCL language is given in Fig. 2. The meta-syntax is classical: keywords are denoted in bold, other terminals in italics; non-terminals are enclosed in angle brackets. Vertical bars are used to indicate alternatives. Constructs enclosed in brackets are optional. An asterisk (*) indicates zero or more repetitions of the previous element, and a plus (+) indicates one or more repetitions.
Type expressions denote types attached to actor input and output ports. A type can be either a base type τ or the constructed type τ param. Base types are associated with "regular" data flows, param types with parameters (see Sec. 2.6). Base types are limited to the predefined types int and bool, abstract types (introduced by an explicit type declaration) and type variables. Type variables are used for declaring polymorphic actors. The type list is predefined but cannot be used in type expressions.

    type t;
    node f in (i: t) out (o1: t, o2: t);
    node k in (i: t) out (o: t);
    node h in (i1: t, i2: t) out (o: t);

Node declarations (introduced by the node keyword) define node models. A node model is made of an interface and a description. The interface gives the name of the node and the name and type of each input and output. In the example, for simplification, all inputs and outputs have type t. The description can be either empty (as for nodes f, k and h) or be given as a list of definitions (as for node g). In the first case, the declaration describes an atomic actor. Instances of such nodes are viewed as black boxes at the specification level. In the second case, the given definitions describe a subgraph.
Definitions (introduced by the val keyword) bind names to values, denoted by expressions. The syntax of expressions is classical. The usual boolean and arithmetic operations (+, ...) are predefined. The data constructors :: and [] (cons and nil, for building and manipulating lists) and () (unit) are also predefined.
Full application of the function associated with a node then instantiates this node in the graph. In Fig. 1, for example, the definition given for node g specifies that the corresponding subgraph is built by
• instantiating node k twice,
• connecting the input of the first instance to the input of the subgraph,
• connecting the input of the second instance to the output of the first instance,
• connecting the output of the second instance to the output of the subgraph.
Graph declarations (introduced by the graph keyword) are also made up of an interface and a description but, unlike node declarations, the description is always a subgraph and the corresponding graph is automatically (implicitly) instantiated. In the example of Listing 1, the toplevel graph top is built by
• first instantiating actor f, connecting its input i,
• then instantiating subgraph g twice, connecting the first (resp. second) output of node f to the input of the first (resp. second) instance,
• finally instantiating actor h, connecting its first (resp. second) input to the output of the first (resp. second) instance of subgraph g.
It must be noted that the binding(s) performed by a val definition actually depend on the name(s) occurring in the left-hand side. If a name refers to the output of a (sub)graph, then this output is simply connected to the value denoted by the right-hand side (this value must correspond to a (sub)graph input or a node output port, which is checked by the typing phase). This is the case, for example, for output o in the definition of subgraph g. Otherwise, the val definition is simply used to name intermediate values (like let declarations in functional programming languages). This is the case, for example, in the definition of graph top, where the first val definition is used to distinguish the two outputs of node f.
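The construction steps listed above can be mimicked in a few lines of Python (not HoCL), under the assumption that each node is modeled as a function whose full application adds a box and its input wires to a graph under construction. The Graph class, the instantiate helper and the ("top", "i") location are hypothetical names introduced for this sketch, and subgraph g is inlined here rather than kept as a hierarchical node.

```python
import itertools


class Graph:
    """Flat graph under construction: boxes plus wires between (box, port) locations."""

    def __init__(self):
        self.boxes = {}                 # box id -> name of the instantiated node
        self.wires = []                 # (source location, destination location)
        self._ids = itertools.count(1)

    def instantiate(self, name, n_out, *args):
        """Full application of a node: add a box and wire each argument to an input."""
        b = next(self._ids)
        self.boxes[b] = name
        for port, src in enumerate(args, start=1):
            self.wires.append((src, (b, port)))
        outs = tuple((b, sel) for sel in range(1, n_out + 1))
        return outs[0] if n_out == 1 else outs


def build_top():
    gr = Graph()

    def f(i):
        return gr.instantiate("f", 2, i)

    def k(i):
        return gr.instantiate("k", 1, i)

    def h(i1, i2):
        return gr.instantiate("h", 1, i1, i2)

    def g(i):
        # Subgraph g: two instances of k in sequence (inlined in this sketch).
        return k(k(i))

    x1, x2 = f(("top", "i"))            # name the two outputs of f
    o = h(g(x1), g(x2))
    return gr, o


graph, out = build_top()
```

The tuple binding of x1 and x2 plays the role of the first val definition of graph top, which only names intermediate values.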

Labeled arguments
As evidenced by the type signature (1), HoCL supports label-based passing of arguments. For example, the three following applications of node h (as declared in Listing 1) are all valid and equivalent:

    val o1 = h x y
    val o2 = h i1:x i2:y
    val o3 = h i2:y i1:x

This is useful for nodes having a large number of inputs, when passing the arguments to the corresponding function "in the right order" may become error-prone. This is especially true if a large proportion of these inputs have the same type, because the resulting error(s) will not be caught by the type checker in this case.
As will be shown in Sec. 2.6, labeled arguments are also useful for relaxing the constraints on parameterized node signatures.
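Python keyword arguments give a loose analogy for label-based argument passing. In the sketch below, h is a hypothetical stand-in for node h that merely records which wire reaches which input; it is not HoCL code.

```python
def h(i1, i2):
    """Stand-in for node h: records which wire reaches which input port."""
    return {"i1": i1, "i2": i2}


o1 = h("x", "y")            # positional, in declaration order
o2 = h(i1="x", i2="y")      # labeled, in declaration order
o3 = h(i2="y", i1="x")      # labeled, in any order

# Swapping two positional arguments of the same type goes unnoticed,
# which is the kind of error that labels help to avoid.
swapped = h("y", "x")
```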

IO-less nodes
Nodes with no input (resp. no output) can be described by specifying an empty input (resp. output) list. In applications, these nodes can represent data sources (resp. sinks). This is exemplified in Listing 2. The ▷ and ▶ operators used in Listing 2 are written |> and |-> respectively in the HoCL concrete syntax.
Functions like twice or iter may be viewed as a means of capturing wiring patterns in dataflow graphs. For this reason, we call them wiring functions, to distinguish them from "ordinary" functions operating on scalar values.
HoCL comes with a standard library defining several useful wiring functions encapsulating classical graph patterns. Figures 4 and 5, for example, give the code, and the interpretation as a graph pattern, of the pipe and map higher order wiring functions. An important feature is that all these functions are defined using regular HoCL declarations, i.e. within the language itself (technically, in file lib/hocl/stdlib.hcl of the distribution). The set of available higher order graph patterns is therefore not fixed but can be freely modified and extended by the application programmer to suit her specific needs. This is in strong contrast with most dataflow-based design tools (those listed in the introduction in particular), in which similar abstraction mechanisms, when available, rely on a predefined and fixed set of patterns.
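To give an idea of what such wiring functions capture, here is a Python sketch (not the HoCL standard library code) of pipe and map. It assumes that pipe chains a list of nodes in list order and that map applies one node to each wire of a list; node is a hypothetical helper that renders an instantiation symbolically as a string.

```python
from functools import reduce


def node(name):
    """Symbolic node: applying it to a wire yields the text name(wire)."""
    return lambda x: f"{name}({x})"


def pipe(fs):
    """Chain the nodes of fs in sequence: pipe([f, g, h])(x) is h(g(f(x)))."""
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)


def map_wires(f, xs):
    """Apply the same node to every wire of xs, in parallel."""
    return [f(x) for x in xs]


f1, f2, f3 = node("f1"), node("f2"), node("f3")
chained = pipe([f1, f2, f3])("x")
parallel = map_wires(f1, ["x1", "x2", "x3"])
```

Both functions are ordinary user-level definitions, which is the point made above: nothing prevents a programmer from adding further patterns of the same kind.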

Recursive graphs
In a dataflow context, a recursive graph is a graph in which the refinement of some specific nodes is the graph itself. A typical example is provided by Lee and Parks in their classical paper on dataflow process networks [11].
This example is an analysis/synthesis filter bank. The corresponding dataflow graph has a regular structure which can be characterized by its "depth". Fig. 6, for example, shows a graph of depth three. The type of the data tokens has here been arbitrarily set to int.

Cyclic graphs
In most dataflow models, cycles in a graph (i.e. sequences of edges connecting the output of an actor to one of its inputs) denote dependencies between two successive activations of this graph. For example, the graph in Fig. 8 describes a recursive filter, taking as input a sequence x_1, x_2, ... of tokens and producing as output the sequence y_1, y_2, .... The special actor named delay is used here to store the z value between two successive activations and to provide the initial value z_0.
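The behavior of such a cyclic graph can be sketched in Python as follows. The sketch assumes that each activation computes (y_i, z_i) = f(x_i, z_{i-1}), with z_0 supplied by the delay; this recurrence and the example function acc are assumptions made for illustration and are not taken from Fig. 8.

```python
def run_filter(f, xs, z0):
    """Activate f once per input token; z carries state between activations."""
    z = z0                      # initial value provided by the delay actor
    ys = []
    for x in xs:
        y, z = f(x, z)          # the new z is stored by the delay for the next activation
        ys.append(y)
    return ys


def acc(x, z):
    """Example actor: running sum, fed back through the delay."""
    y = x + z
    return y, y


ys = run_filter(acc, [1, 2, 3, 4], 0)
```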

Figure 9: A DFG showing mutual recursion between nodes
It is important to note that HoCL describes dataflow graphs without making any assumption on their underlying dynamic semantics. A definition like

    val rec (o,z) = f i z

for example, is accepted and gives a graph in which the second output of node f is directly connected to its second input. Such a graph clearly does not satisfy the liveness property of the synchronous dataflow (SDF) model, for example (in the absence of an initial token on its second input, the f actor will never fire). The reason is that checking this kind of property ultimately depends on the underlying model of computation, something that HoCL, as a pure coordination language, does not take into account, leaving the corresponding analysis to dedicated tools taking its output as input.
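As an indication of the kind of analysis left to downstream tools, the Python sketch below looks for cycles that do not pass through a delay actor. This is a deliberately simplified check, given here as an assumption about what an SDF-oriented tool might test; it is not part of HoCL, and an actual liveness analysis would also account for token rates and initial tokens.

```python
def has_delay_free_cycle(edges, delays):
    """edges maps each actor to its successors; delays is the set of delay actors."""
    # Removing delay actors breaks every cycle that contains an initial token.
    graph = {u: [v for v in vs if v not in delays]
             for u, vs in edges.items() if u not in delays}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GREY
        for v in graph.get(u, []):
            c = color.get(v, WHITE)
            if c == GREY or (c == WHITE and visit(v)):
                return True     # back edge found: a cycle without delay
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


direct_loop = {"f": ["f"]}                          # output of f wired to its own input
through_delay = {"f": ["delay"], "delay": ["f"]}    # same loop, broken by a delay

blocked = has_delay_free_cycle(direct_loop, delays=set())
live = not has_delay_free_cycle(through_delay, delays={"delay"})
```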

Parameterized graphs
The term parameterized dataflow was introduced in [1] to describe a meta-model which, when applied to a given dataflow model of computation (MoC), extends this model by adding dynamically reconfigurable actors. Reconfigurations occur when values are dynamically assigned to parameters of such actors, causing changes in the computation they perform and/or their consumption and production rates. The precise nature of changes triggered by reconfigurations and the instants at which these reconfigurations can occur both depend on the target MoC. HoCL offers a MoC-agnostic interface to this feature using a dedicated type to distinguish parameters from "regular" data flows.
Consider, for example, the delay actor occurring in Fig. 8. This actor can be parameterized by the value initially produced on its output. For this, it is declared as follows:

    node delay in (init: α param, i: α) out (o: α)

where α designates a type variable (written 'a in ASCII-encoded source code).
This makes delay a polymorphic actor with type:

    init: α param → i: α → α

The example given in Fig. 8 can be reformulated using the configurable version of the delay actor, as shown in Listing 4 and Fig. 10. In Fig. 10, parameter values are drawn as house-shaped nodes and parameter dependencies as dashed lines. In Listing 4, the value of the init parameter has been set to 0. The enclosing quotes are used here to turn a value of type int into a value of type int param.
It can be noted that, with this approach, actor (re)configuration is interpreted as the partial application of the corresponding function. The definition

    val delay0 = delay '0'

for example, defines an actor delay0, with type int → int, taking and producing flows of integers and producing 0 as its initial value.
It can be further noted that, using labeled arguments, this interpretation holds even when the parameters are not specified in first position in the list of node inputs. For example, had the delay actor been defined as

    node delay in (i: α, init: α param) out (o: α)

the definition of the delay0 actor would still be possible by writing

    val delay0 = delay init:'0'
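The reading of configuration as partial application has a direct counterpart in Python's functools.partial. In the sketch below, delay and delay_alt are toy stand-ins operating on whole lists of tokens (an assumption made to keep the example short), with the parameter in first and in last position respectively.

```python
from functools import partial


def delay(init, i):
    """Toy delay on a finite stream: emit init first, then the input shifted by one."""
    return [init] + list(i)[:-1]


def delay_alt(i, init):
    """Same actor, with the parameter declared after the data input."""
    return [init] + list(i)[:-1]


# Parameter in first position: plain partial application.
delay0 = partial(delay, 0)

# Parameter not in first position: the label selects which input is fixed.
delay0_alt = partial(delay_alt, init=0)

out1 = delay0([1, 2, 3])
out2 = delay0_alt([1, 2, 3])
```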

Parameters and hierarchy
When a parameterized node is refined as a subgraph, the value of the parameter(s) can be used to parameterize the nodes of the subgraph, either directly or by means of some dependent computations. This allows parameters to be propagated across graph hierarchies. This is illustrated by the following program, which expands into the graphs depicted in Fig. 11.
In graph sub, k is viewed as an input parameter (drawn as a dashed input port in Fig. 11) and used to parameterize both instances of the mult actor, the first directly and the second through the parameter expression k+1. It is important to note that, although this could make sense in this particular example, parameter expressions are not statically evaluated by the HoCL compiler, since their interpretation ultimately depends on the target MoC (which controls, in particular, when parameters are evaluated to trigger the reconfiguration of the dependent actors).
Parameter dependencies create dependency trees. The roots of these trees can be either constants, as in the previous example, or specified as toplevel input parameters, as illustrated in the following program, which is an equivalent reformulation of the previous example. Note that, unlike node parameters, toplevel parameters must be given a value.
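One way to picture unevaluated parameter expressions is as small expression trees that are carried along with the graph and only evaluated by whoever knows the target MoC. The Python sketch below illustrates this idea; the Const, Param and Add classes and the show and evaluate helpers are hypothetical and do not describe the HoCL compiler's internal representation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Param:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Param, Add]


def show(e):
    """Render an expression symbolically, without evaluating it."""
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Param):
        return e.name
    return f"{show(e.left)}+{show(e.right)}"


def evaluate(e, env):
    """Evaluation, as a back-end or runtime might perform it later on."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Param):
        return env[e.name]
    return evaluate(e.left, env) + evaluate(e.right, env)


k = Param("k")
sub = [("mult", k), ("mult", Add(k, Const(1)))]   # the two mult instances of graph sub
labels = [show(p) for _, p in sub]
values = [evaluate(p, {"k": 2}) for _, p in sub]
```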

SEMANTICS
The semantics of the HoCL language is specified in natural (big step) style. It gives the interpretation of HoCL programs, described with the abstract syntax given in Fig. 2, as a set of dataflow graphs, where each graph is defined as a set of boxes connected by wires. The formulation given here assumes that the program has been successfully type checked (the type checking rules are classical and are not discussed here). This semantics is built upon the semantic domain described in Fig. 12, using the following meta-syntax: the + symbol denotes union; tuples are written between angle brackets (⟨...⟩) and sets between curly brackets ({...}); an asterisk (*) (resp. a plus sign (+)) in superscript position denotes zero (resp. one) or more repetitions of the scripted element.
Values in the category Loc denote graph locations. Such locations are made of a box index and a selector. Boxes correspond to actor instances. Selectors are used to distinguish inputs (resp. outputs) when the box has several of them. Valid selectors start at 1. The selector value 0 is used for incomplete box definitions.
Nodes are described by
• a node category, indicating whether the node is a toplevel graph or an ordinary node (this avoids having two distinct but almost identical semantic values for nodes and toplevel graphs),
• a list of inputs, each with an attached value (these values are used to handle partial application),
• a list of outputs,
• an implementation, which is either empty (in the case of opaque actors) or given as a graph.
Boxes are described by
• a box category,
• an input environment, mapping selector values (1, 2, ...) to wire identifiers,
• an output environment, mapping selector values to sets of wire identifiers (a box output can be broadcast to several other boxes),
• an optional value.

Box categories separate boxes
• resulting from the instantiation of a node,
• materializing graph inputs and outputs,
• materializing graph input parameters,
• materializing graph local parameters.
The box category rec is used internally for building cyclic graphs. The box value is only meaningful for local parameters bound to constants or for toplevel input parameters (giving in this case the bound value).
Wires are pairs of graph locations: one for the source box and the other for the destination box.

The environments E, B and W respectively bind
• identifiers to semantic values,
• box indices to box descriptions,
• wire indices to wire descriptions.
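The semantic domain described above can be transcribed, as a rough sketch, into Python data classes. The field names, the "actor" category string and the consistent helper are assumptions introduced for this example; Fig. 12 remains the reference definition.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Loc:
    box: int        # box index
    sel: int        # selector; valid selectors start at 1


Wire = Tuple[Loc, Loc]          # (source location, destination location)


@dataclass
class Box:
    cat: str                                                # box category
    ins: Dict[int, int] = field(default_factory=dict)       # selector -> wire id
    outs: Dict[int, Set[int]] = field(default_factory=dict) # selector -> set of wire ids
    value: Optional[int] = None                             # only used by parameter boxes


def consistent(B, W):
    """Check that every wire is recorded by both its source and destination boxes."""
    for w, (src, dst) in W.items():
        if w not in B[src.box].outs.get(src.sel, set()):
            return False
        if B[dst.box].ins.get(dst.sel) != w:
            return False
    return True


# Box 1 broadcasts its single output to boxes 2 and 3.
B = {1: Box("actor", outs={1: {1, 2}}),
     2: Box("actor", ins={1: 1}),
     3: Box("actor", ins={1: 2})}
W = {1: (Loc(1, 1), Loc(2, 1)),
     2: (Loc(1, 1), Loc(3, 1))}
ok = consistent(B, W)
```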
In this paper, the description of the semantics is deliberately limited to a subset of the associated inference rules. A complete version is available on the GitHub repository [18]. In these rules, all environments are viewed as partial maps from keys to values. If E is an environment, the domain of E is denoted by Dom(E). The empty environment is written ∅. [x → y] denotes the singleton environment mapping x to y. E(x) denotes the result of applying the underlying map to x (e.g. if E is [x → y] then E(x) = y) and E ⊕ E′ the environment obtained by adding the mappings of E′ to those of E, assuming that E and E′ are disjoint.
Figure 14: Semantics, part 1 (node declarations)
The semantics of value declarations (either at the program top level or within node declarations) is described in Fig. 15. Rule VDecls gives the semantics of a sequence of such declarations and rule VDecl that of a single declaration. Declarations are interpreted in order. Each declaration updates the value, box and wire environments. Rule Binding gives the semantics of bindings occurring in definitions. The ⊕ operator topped with a left arrow (written ←⊕ here) used in this rule merges box descriptors. If a box appears in both argument environments, the resulting environment contains a single occurrence of this box in which the respective input and output environments have been merged.
Fig. 16 describes the semantics of application, the most salient feature of the HoCL language. Rule EAppC deals with the application of closures and follows the classical call-by-value strategy (the closure body is evaluated in an environment augmented with the bindings resulting from matching the pattern against the value of the argument). Rules EAppNP and EAppNF deal with the application of nodes. The former is used for partial application. The value resulting from the evaluation of the argument (which must be a graph location) is simply "pushed" on the list of supplied inputs. The latter describes the full application of a node.
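The two environment operations can be approximated in Python with dictionaries. The sketch below is one possible reading, under the assumption that merging two descriptors of the same box takes the union of their input maps and, selector by selector, the union of their output wire sets; disjoint_union and merge_boxes are hypothetical names, and the full rules are in [18].

```python
def disjoint_union(e1, e2):
    """E ⊕ E': add the mappings of e2 to those of e1, which must be disjoint."""
    common = e1.keys() & e2.keys()
    if common:
        raise ValueError(f"environments overlap on {sorted(common)}")
    return {**e1, **e2}


def _copy(box):
    return {"ins": dict(box["ins"]),
            "outs": {s: set(ws) for s, ws in box["outs"].items()}}


def merge_boxes(b1, b2):
    """Merge two box environments; boxes present in both are merged field by field."""
    merged = {k: _copy(v) for k, v in b1.items()}
    for k, v in b2.items():
        if k not in merged:
            merged[k] = _copy(v)
            continue
        merged[k]["ins"].update(v["ins"])
        for s, ws in v["outs"].items():
            merged[k]["outs"].setdefault(s, set()).update(ws)
    return merged


b1 = {1: {"ins": {1: 10}, "outs": {1: {20}}}}
b2 = {1: {"ins": {2: 11}, "outs": {1: {21}}},
      2: {"ins": {}, "outs": {}}}
merged = merge_boxes(b1, b2)
env = disjoint_union({"x": 1}, {"y": 2})
```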
Here, a new box b is created and its inputs are connected to wires w_j representing the arguments. The function cat used in rule EAppNF is trivially defined as cat(actor) = actor and cat(Graph) = graph. For simplicity, the formulation of the rule EAppNF assumes that single values and tuples of size one are semantically equivalent (see footnote 18). Note that the outputs of the inserted box are left unconnected at this level. They are connected when the result of the application is bound by the rule Binding described in Fig. 15.
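The difference between partial and full application of a node can be sketched in Python as follows. The Node and Builder classes are hypothetical, nodes are restricted to a single output for brevity, and the sketch is meant to convey the spirit of rules EAppNP and EAppNF rather than to transcribe them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    name: str
    n_in: int
    supplied: Tuple = ()        # locations supplied so far by partial applications


class Builder:
    def __init__(self):
        self.boxes = {}         # box index -> node name
        self.wires = []         # (source location, destination location)

    def apply(self, node, arg):
        args = node.supplied + (arg,)
        if len(args) < node.n_in:
            # Partial application: push the argument, create nothing yet.
            return Node(node.name, node.n_in, args)
        # Full application: create a box and connect all of its inputs.
        b = len(self.boxes) + 1
        self.boxes[b] = node.name
        for sel, src in enumerate(args, start=1):
            self.wires.append((src, (b, sel)))
        return (b, 1)           # output location, left unconnected for now


bld = Builder()
h = Node("h", n_in=2)
partial_h = bld.apply(h, (0, 1))
boxes_after_partial = dict(bld.boxes)
result = bld.apply(partial_h, (0, 2))
```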

IMPLEMENTATION
A prototype compiler, implementing the semantics described in the previous section, has been written in OCaml and is available on GitHub [17]. The distribution includes a command-line compiler, turning HoCL source files into various dataflow graph representations, and a toplevel interpreter, supporting interactive building of dataflow graphs.
The command-line compiler currently supports four distinct backends: DOT, DIF, Preesm and SystemC.
The DOT backend produces graphical representations of the generated graphs in .dot format, to be visualized with the graphviz [5] set of tools. All the graph representations used in this paper have been produced by this backend from the corresponding programs.
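For readers unfamiliar with the .dot format, the following Python sketch prints a small graph in DOT syntax. It only illustrates the target notation; the to_dot function, the node naming scheme and the chosen attributes are assumptions and do not reproduce the output of the actual HoCL backend.

```python
def to_dot(name, boxes, wires):
    """Render boxes (id -> label) and wires (pairs of ids) as a DOT digraph."""
    lines = [f"digraph {name} {{"]
    for b, label in sorted(boxes.items()):
        lines.append(f'  n{b} [shape=box, label="{label}"];')
    for src, dst in wires:
        lines.append(f"  n{src} -> n{dst};")
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot("top", {1: "f", 2: "k", 3: "h"}, [(1, 2), (2, 3)])
print(dot_text)
```

The resulting text can be passed to the graphviz tools, for instance the dot command, to obtain a drawing.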
The DIF backend produces representations in the Dataflow Interchange Format. DIF [9] provides a standard, textual notation for dataflow graphs aimed at fostering tool cooperation. By using DIF as an intermediate format, graphs specified in HoCL can be passed to a variety of tools for analysis, optimization and implementation.
Footnote 18: i.e. ⟨Loc ⟨l, 1⟩⟩ ≡ Loc ⟨l, 1⟩.
The Preesm backend generates code for PREESM [13], an open source prototyping tool for implementing dataflow-based signal processing applications on heterogeneous multi/many-core embedded systems. Using this backend is illustrated in Sec. 5.
The SystemC backend generates executable SystemC code for the simulation of simple DDF (Dynamic DataFlow) and SDF (Synchronous DataFlow) graphs (for which the behavior of the actors is described in C or C++).
A short video illustrating the use of the toplevel interpreter is available online [16].

A COMPLETE EXAMPLE
In order to demonstrate the gain in abstraction and programmer productivity offered by the HoCL language, we consider a small DSP application consisting of applying, in parallel, a sequence of three filters to a single data stream and selecting the "best" output according to a given criterion. Apart from the fact that it is typical of the kind of processing performed in the DSP domain, this application was chosen because we already had a working implementation, obtained with the Preesm [13] tool.
The dataflow graph, initially specified "by hand" using the Preesm GUI, is depicted in Fig. 13. In this figure:
• gray boxes denote actors,
• orange boxes denote dedicated broadcasting nodes,
• blue triangle-shaped boxes denote parameter sources,
• black arrows denote data wires and
• dashed, blue arrows denote parameter wires.
Input data, generated by the src node, is passed through the bcast node to three parallel chains of nodes. In the first chain (bottom), data goes first through filter f1, then f2 and finally f3. In the second (middle), the order is f3, then f1 and finally f2. In the third (top), it is f2, f3, f1. The respective output data are finally given as input to the select node. Each filter node f takes a parameter input named p. For simplicity, the value of this parameter has been considered here as constant for all filters. The select node also takes a parameter, named thr.

Listing 5: A description of the graph depicted in Fig. 13 in HoCL
Lines 3-14 declare the atomic actors involved. It has been assumed here that all processed data has type f16 (a shorthand for the fix16 type used in the original implementation). Both the p parameter of the f1, f2 and f3 actors and the thr parameter of the select actor are declared here as int.
The graph itself is described in the top declaration, lines 16-26. The global parameters p and thr, with a default value (here arbitrarily set to 2 and 128), are declared as input parameters of this graph.
The value fs, defined at line 20, is a list made of the three filters, with their supplied parameter.
The wiring function chain, defined at line 21, is used to build the horizontal chains of filters depicted in Fig. 13. It takes a list of integers s and an input wire x and connects x to the sequence of nodes obtained by permuting the elements of the fs list. Permutation is done by the shuffle function and chaining by the pipe function. The pipe function was introduced earlier, with the standard library of wiring functions.
The wiring function sel, defined at lines 22-23, encodes the main graph pattern: it applies its arguments c1, c2 and c3 in parallel to its argument x and routes the three results to the select actor.
The top level graph is built, at lines 24-26, by applying the sel function to the three chains of filters, themselves obtained by applying the chain function to the corresponding lists of permutation indices.
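The structure just described can be replayed symbolically in Python (not HoCL). The sketch below assumes 0-based permutation indices, renders each node instantiation as text, and orders the three chains bottom, middle, top as select inputs; the actual index base, argument order and parameter handling of Listing 5 may differ, and node, shuffle, chain and sel are stand-ins written for this example.

```python
from functools import partial, reduce


def node(name):
    """Symbolic node: applying it to a wire yields the text name(wire)."""
    return lambda x: f"{name}({x})"


def pipe(fs):
    """Chain the nodes of fs in list order."""
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)


def shuffle(s, xs):
    """Permute xs according to the list of indices s (0-based here)."""
    return [xs[i] for i in s]


fs = [node("f1"), node("f2"), node("f3")]


def chain(s, x):
    """Connect wire x to the filters of fs, taken in the order given by s."""
    return pipe(shuffle(s, fs))(x)


def sel(c1, c2, c3, x):
    """Apply three chains in parallel to x and route the results to select."""
    return f"select({c1(x)}, {c2(x)}, {c3(x)})"


top = sel(partial(chain, [0, 1, 2]),    # bottom chain: f1, f2, f3
          partial(chain, [2, 0, 1]),    # middle chain: f3, f1, f2
          partial(chain, [1, 2, 0]),    # top chain:    f2, f3, f1
          "x")
```

Adding a fourth chain or changing a permutation only requires editing the call to sel, which is the flexibility argued for in this section.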
Writing the program in Listing 5 took less than 15 minutes and the resulting dataflow graph was obtained immediately. By contrast, describing the initial version of the graph using the Preesm GUI took more than one hour. This time includes the definition of the node interfaces, the placement of the nodes on the canvas and, above all, the manual, cumbersome drawing of the connections between the nodes. This represents a fourfold increase in productivity. Moreover, and most importantly, whereas it is straightforward, with the HoCL formulation, to modify the graph (adding or modifying the number of chains, changing the permutation choices, etc.) to test new application configurations, this task is much more tedious and error-prone with the purely GUI-based representation.

RELATED WORK
The idea of describing dataflow graphs using a functional programming language goes back to the VAL [12] and SISAL [6] dataflow languages. Since then, it has been exploited in languages such as Lava [2], CLaSH [7] (for designing digital circuits), Fran [4] (in the context of functional reactive programming) or in synchronous programming languages such as Lustre [8] or Lucid Synchrone [14]. Like HoCL, these functional programming languages offer the possibility to encode graph patterns using higher order functions. But, because their goal is to assign both a static and a dynamic semantics to programs (in other words, to describe not only the topology of dataflow graphs but also their behavior), they do not really meet the needs of programmers when the goal is simply, and pragmatically, to avoid the "manual", GUI-based specification of large dataflow graphs, to be passed further to existing analysis and implementation tools. Hence the need for a simple coordination language, acting as a front-end for such existing tools, which is precisely the goal of HoCL.
In their seminal paper on dataflow process networks [11], Lee and Parks noted that the replication of a given actor on parallel streams can be denoted using the map higher order function. But no attempt was made to generalize the correspondence between functional expressions and graph structures beyond the particular pattern captured by the map HOF. The work of Sane et al. [15] is more closely related to ours. They proposed an extension to the DIF [9] notation supporting the use of so-called topological patterns for explicit and scalable representation of regular structures in DFGs. The definition of these patterns explicitly relies on an indexing mechanism for nodes and edges. HoCL is more general in the sense that any dependency pattern can be described, and not only those based on explicit indexing. Moreover, in the work described in [15], patterns are built-in and the set of available patterns is therefore fixed. By contrast, patterns are first class values in HoCL, and can therefore be defined directly by the programmer, within the language.
The HoCL language was inspired, in part, by the network description language used in the caph language for dataflow-based high-level synthesis [19]. Some design decisions were also motivated by conclusions of a retrospective assessment of the caph project reported in [20]. The idea of a language playing the role of a front-end to existing analysis and implementation tools, in particular, can be viewed as an answer to the "invasiveness" problem mentioned in [20].

CONCLUSION
The design and development of the HoCL language started recently and this paper should be viewed more as a draft specification than as a definitive language reference.
Work is under way to reformulate in HoCL complex DSP applications initially developed with tools using lower-order specification formalisms, such as Ptolemy, DIF or Preesm, in order to further assess the gain in expressivity and the reduction in the effort required to specify the input dataflow graph.
An important issue which remains to be investigated, in particular, is whether MoC-specific features can be "injected" into the language without compromising its use as a general and MoC-agnostic coordination language. The current version, for example, allows actor ports to be annotated with production and consumption rates, to be used by backends supporting an SDF semantics (DIF or Preesm for example). It is still uncertain, however, whether such an annotation-based approach is always feasible or whether some specific MoCs may require deeper changes to the syntax or semantics of the language itself.
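To indicate how a backend with an SDF semantics might exploit such rate annotations, the Python sketch below solves the usual SDF balance equation for a single edge, q_src × prod = q_dst × cons, for the smallest positive repetition counts. This is standard SDF material given as an illustration only; the repetitions function is hypothetical and is not part of HoCL, which merely passes the annotations along.

```python
from math import gcd


def repetitions(prod, cons):
    """Smallest (q_src, q_dst) such that q_src * prod == q_dst * cons."""
    g = gcd(prod, cons)
    return cons // g, prod // g


# The source produces 2 tokens per firing, the destination consumes 3.
q_src, q_dst = repetitions(2, 3)
balanced = q_src * 2 == q_dst * 3
```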