Using Model Learning for the Generation of Mock Components

Abstract. Mocking objects is a common technique that substitutes parts of a program to simplify test case development, to increase test coverage or to speed up performance. Today, mocks are almost exclusively used with object-oriented programs, but mocks could offer the same benefits with communicating systems to make them more reliable. This paper proposes a model-based approach to help developers generate mocks for this kind of system, i.e. systems made up of components interacting with each other over data networks and whose communications can be monitored. The approach combines model learning to infer models from event logs, quality metric measurements to help choose the components that may be replaced by mocks, and mock generation and execution algorithms to reduce the mock development time. The approach has been implemented as a tool chain with which we performed experiments to evaluate its benefits in terms of usability and efficiency.


Introduction
A technique commonly used in the context of crafting tests for software applications consists of replacing a software component (typically a class) with a test-specific version called a mock, which behaves in a predefined and controlled way while satisfying some behaviours of the original. Mocks are often used by developers to make test development easier or to increase test coverage. Mocks may indeed be used to simplify the dependencies that make testing difficult (e.g., infrastructure or environment related dependencies [21,3]). Besides, mocks are used to increase test efficiency by replacing slow-to-access components. This paper addresses the generation of mocks for communicating systems and proposes a model-based mock generation. When reviewing the literature, it is particularly noticeable that mocks are often developed for testing object-oriented programs and are usually written by hand, although some papers have focused on the automatic generation of mocks. Related Work: the idea of simulating real components (most of the time objects in the literature) with mocks for testing is not new in software engineering. The notion of mock object originates from the paper of Mackinnon et al. [14] and has since been continuously investigated, e.g., in [10,11,21,3]. Some of these works pointed out the distinctions between mocks and other related terms such as stub or fake. In this paper, we use the term mock to denote a component that mimics an original component and whose behaviours can be verified by tests to ensure that it is invoked as expected by the components being tested.
A few works related to mock generation have been proposed afterwards. Saff et al. [16] proposed to automatically replace some objects instantiated within test cases by mocks to speed up the test execution or to isolate other objects to make bug detection easier. The mock generation is performed by instrumenting Java classes to record both method calls and responses in transcripts, which are then used as specifications of mock objects. Tillmann and Schulte proposed to generate mocks by means of a symbolic analysis of .NET code [24]. These mocks represent variables, which can be given to a robustness test generator for producing unexpected but admissible input values. Galler et al. generate mocks from Design by Contract specifications, which allow developers to establish the semantics of objects with pre-conditions, post-conditions and invariants [12]. These conditions and invariants are used as specifications of mocks. Alshahwan et al. also proposed mock generation from method post-conditions, but additionally from test coverage measurements [2].
Apart from some guides or good practices dealing with Web service mocking, we did not find any attempt to mock other kinds of components in the literature, yet the need for replacing components by mocks for testing other kinds of systems under test (SUT) remains. As with object-oriented programs, the use of mocks for testing communicating systems could help experiment in isolation with components having dependencies that make testing difficult. Using mocks could also help increase test coverage. After our literature review, we believe that four main obstacles currently prevent the use of mocks for testing communicating systems: the lack of specification (if no component specification is provided, developing interoperable mocks becomes long and difficult); the difficulty of maintaining mocks when the SUT is updated; the difficulty of choosing the mockable components, that is, those that may be replaced by mocks; and the lack of tools to help generate mock components.
Contributions: this paper addresses these obstacles and proposes an approach for helping developers in: the analysis of a communicating system to classify its components; the choice of mockable components; and the mock generation. In our context, the mock components can be used for several purposes, e.g., for increasing test coverage, for security testing, or for testing new systems made up of reusable components during the development activity. To reach these purposes, our approach combines model learning, quality metrics evaluation and mock generation. Model learning is used to infer models, which encode the behaviours of every component of a communicating system and its architecture. On these models, we evaluate 6 quality metrics mostly related to Auditability, Testability and Dependability. These metrics allow us to classify components into 4 categories: "Mock", "Test", "Test in Isolation" and "Code Review". We finally propose model-based algorithms to help generate and execute mocks.
This approach has been implemented as a tool chain available in [18]. We performed a preliminary experiment on a home automation system composed of smart devices to assess its benefits in terms of usability and efficiency. Paper organisation: Section 2 recalls some preliminary definitions and notations. Section 3 presents our approach: we give an overview of our model learning algorithm called CkTail; we define quality metrics and show how to classify components with them; and we introduce the mock generation and execution algorithms. The next section introduces an empirical evaluation. Section 5 summarises our contributions and draws some perspectives for future work.

Preliminaries
We express the behaviours of communicating components with Input Output Labelled Transition Systems.This model is defined in terms of states and transitions labelled by input or output actions, taken from a general action set L, which expresses what happens.
Definition 1 (IOLTS). An Input Output Labelled Transition System (IOLTS) is a 4-tuple ⟨Q, q0, Σ, →⟩ where: Q is a finite set of states; q0 ∈ Q is the initial state; Σ ⊆ L is the finite set of actions, where Σ^I ⊆ Σ is the finite set of input actions and Σ^O ⊆ Σ is the finite set of output actions, with Σ = Σ^I ∪ Σ^O; → ⊆ Q × Σ × Q is the transition relation. We also define the following notations: (q1, a, q2) ∈ → ⇔def q1 −a→ q2; q −a→ ⇔def ∃q2 ∈ Q : q −a→ q2. Furthermore, to better match the functioning of communicating systems, an action has the form a(α), with a a label and α an assignment of parameters in P, with P the set of parameter assignments. For example, the action switch(from := c1, to := c2, cmd := on) is made up of the label "switch" followed by parameter assignments expressing the components involved in the communication and the switch command. We use the following notations on action sequences: the concatenation of two action sequences σ1, σ2 ∈ L* is denoted σ1.σ2; ε denotes the empty sequence. A run q0 a1(α1) q1 ... qn of the IOLTS L is an alternate sequence of states and actions starting from the initial state q0. A trace is a finite sequence of actions in L*.
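To make Definition 1 concrete, an IOLTS can be sketched as a small Python structure. The class and helper names below are our illustration only, not part of the paper's tooling:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Action:
    label: str            # e.g. "switch"
    params: tuple = ()    # parameter assignments, e.g. (("from", "c1"), ("cmd", "on"))
    kind: str = "output"  # "input" or "output", i.e. membership of Sigma_I or Sigma_O

@dataclass
class IOLTS:
    states: set
    initial: str
    transitions: set = field(default_factory=set)  # subset of Q x Sigma x Q

    def add(self, q1, action, q2):
        self.states.update({q1, q2})
        self.transitions.add((q1, action, q2))

    def fireable(self, q):
        """Actions a such that q -a-> holds (the second notation of Definition 1)."""
        return {a for (q1, a, q2) in self.transitions if q1 == q}

# A toy IOLTS holding the switch example of Definition 1
switch = Action("switch", (("from", "c1"), ("to", "c2"), ("cmd", "on")))
m = IOLTS(states={"q0"}, initial="q0")
m.add("q0", switch, "q1")
```

A run of this IOLTS alternates states and actions, e.g. q0 switch(...) q1.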
The dependencies among the components of a communicating system are captured with a Directed Acyclic Graph (DAG), where component identifiers are labelled on vertices.
Definition 2 (Directed Acyclic Graph). A DAG Dg is a 2-tuple ⟨V_Dg, E_Dg⟩ where V_Dg is the finite set of vertices and E_Dg ⊆ V_Dg × V_Dg is the finite set of edges. λ denotes a labelling function mapping each vertex v ∈ V_Dg to a label λ(v).

Our approach is structured into 3 main steps, illustrated in Figure 1. A model learning technique is firstly applied to a given event log collected from a system denoted SUT. For every component c1 of SUT, it generates one IOLTS L(c1) expressing the behaviours of c1, along with one dependency graph Dg(c1) expressing how c1 interacts with some other components of SUT. The second step computes quality metrics on these models and assists the developer in the component classification under the categories "Mock", "Test", "Test in Isolation" and "Code Review". Once the mockable components are identified, the third step helps the developer in the mock generation by means of the IOLTSs produced previously. It is worth noting that mocks often increase test coverage along with the generation of more logs, which may later be used to generate more precise IOLTSs and re-evaluate the metrics. This cycle may help produce mocks that better simulate real components. These steps are detailed in the following.
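A dependency graph in the sense of Definition 2 can be sketched as follows; the class and its method names are illustrative, not the paper's implementation:

```python
class DAG:
    """Directed acyclic graph with labelled vertices (Definition 2)."""
    def __init__(self):
        self.vertices = set()
        self.edges = set()      # subset of V x V
        self.labels = {}        # the labelling function lambda

    def add_edge(self, v1, v2, label1=None, label2=None):
        self.vertices.update({v1, v2})
        self.edges.add((v1, v2))
        if label1: self.labels[v1] = label1
        if label2: self.labels[v2] = label2

    def successors(self, v):
        """Vertices that v points to, i.e. the components v depends on."""
        return {v2 for (v1, v2) in self.edges if v1 == v}

# Dg(d1) of the later example (Figure 3): d1 depends on the gateway G
dg_d1 = DAG()
dg_d1.add_edge("d1", "G", label1="d1", label2="G")
```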

Fig. 2: Model learning with the CkTail approach
We proposed a model learning approach called Communicating system kTail, shortened CkTail, to learn models of communicating systems from event logs. We summarise here the functioning of CkTail and refer to [17] for the technical details. The CkTail algorithms rely on some assumptions, which are required to interpret the communications among the components of SUT in event logs. These are given below:

Fig. 3: Example of model generation with CkTail

- A1 Event log: we consider the components of SUT as black boxes whose communications can be monitored. Event logs are collected in a synchronous environment. Furthermore, the messages include timestamps given by a global clock for ordering them. We consider having one event log;
- A2 Message content: components produce messages that include parameter assignments allowing to identify the source and the destination of every message. Other parameter assignments may be used to encode data. Besides, a message is identified as either a request or a response;
- A3 Component collaboration: the components of SUT can run in parallel and communicate with each other, but they have to follow this strict behaviour: they cannot run multiple instances, and requests are processed by a component on a first-come, first-served basis. Besides, every response is associated with the last request w.r.t. the request-response exchange pattern.
The assumption A3 helps segment an event log into sessions, i.e. temporary message interchanges among components forming some behaviours of SUT from one of its initial states to one of its final states.
Figure 2 illustrates the 4 steps of CkTail. The event log is firstly formatted into a sequence of actions of the form a(α), with a a label and α some parameter assignments, by using tools or regular expressions. The second step relies on A3 to recognise sessions in the action sequence and to extract traces. In the meantime, this step detects dependencies among the components of SUT. It returns the trace set Traces(SUT), the set of components C and the set Deps(SUT), which gathers component dependencies under the form of component lists c1 ... ck. We have defined the notion of component dependency by means of three expressions formulating when a component relies on another one. Intuitively, the first two expressions state that a component c1 depends on another component c2 when c1 queries c2 with a request or by means of successive nested requests. The last expression deals with data dependency. The third step builds one dependency graph Dg(c1) for every component c1 ∈ C. These graphs show in a simple way how the components interact together, and help identify central components that might have a strong negative impact on SUT when they contain faults. The last step builds one IOLTS, denoted L(c1), for every component c1 ∈ C. The IOLTSs are reduced by calling the kTail algorithm [4], which merges the (equivalent) states having the same k-future, i.e. the same event sequences of maximum length k.
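The session recognition of the second step can be sketched under assumption A3. The simplification below (a session closes once every pending request has received its response) is our illustration of the idea, not CkTail's actual algorithm:

```python
def split_sessions(actions):
    """Naive session recognition under assumption A3 (our simplification):
    requests are served first-come first-served and every response answers
    the last pending request, so a session closes when no request is pending."""
    sessions, current, pending = [], [], 0
    for act in actions:  # act = (kind, src, dst) with kind in {"req", "resp"}
        kind = act[0]
        current.append(act)
        pending += 1 if kind == "req" else -1
        if pending == 0:
            sessions.append(current)
            current = []
    if current:  # an unterminated exchange still forms a (partial) session
        sessions.append(current)
    return sessions

# two interleaved exchanges: d1 <-> G, then a nested d2 -> G -> d1 exchange
log = [("req", "d1", "G"), ("resp", "G", "d1"),
       ("req", "d2", "G"), ("req", "G", "d1"), ("resp", "d1", "G"), ("resp", "G", "d2")]
```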
Figure 3 illustrates a simple example of model generation performed by CkTail. The top of the figure shows an action sequence obtained after the first step. For simplicity, the labels directly show whether an action encodes a request or a response. CkTail covers this action sequence, detects three components and builds three dependency graphs. For instance, Dg(d1) shows that d1 depends on G because the action sequence includes some requests from d1 to G. Thereafter, CkTail generates three IOLTSs whose transitions are labelled by input or output actions. For instance, the action req1(from:=d1,to:=G,...) has been doubled into an output !req1 and an input ?req1. The former is labelled on the transition q0 → q1 of L(d1) to express the sending of the request by d1; the latter is labelled on the transition q0 → q1 of L(G) to express that G expects to receive the input ?req1.
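The doubling of monitored actions into per-component input/output actions can be sketched as a simple projection; the function name and trace encoding below are ours:

```python
from collections import defaultdict

def double_actions(trace):
    """Project a monitored trace onto per-component action sequences: every
    action a(from:=c1, to:=c2) yields an output !a for the sender c1 and an
    input ?a for the receiver c2 (a sketch of CkTail's last step, before the
    kTail state merging)."""
    per_component = defaultdict(list)
    for (label, src, dst) in trace:
        per_component[src].append("!" + label)
        per_component[dst].append("?" + label)
    return dict(per_component)

# the req1/resp1 exchange of Figure 3
views = double_actions([("req1", "d1", "G"), ("resp1", "G", "d1")])
```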

Quality Metrics
Component Categories

Table 1: Classification of a component in component categories w.r.t. quality attributes. X stands for "is member of". X, X+, X++ denote 3 levels of interest.
Some quality attributes can now be automatically evaluated for all the components of SUT. By means of these attributes, we propose to classify the components into the 4 categories "Mock", "Test", "Test in Isolation" and "Code Review" to help developers draw up their test plans. To select relevant quality attributes, we firstly studied the papers dealing with the use of mocks for testing, e.g., [10,11,21,3,16,12,2,23,22]. In particular, we drew on the conclusions of the recent surveys of Spadini et al. [22,21], which report that developers often use mocks to replace the components that are difficult to interpret, complex, not testable, or those that are called by others (e.g., external components like Web services). Then, we studied some papers related to Testability [7,19,8,9] and Dependability [20,5]. We finally selected 6 attributes which, when used together, help classify a component into the previous categories. We kept the attributes dedicated to: evaluating the degree to which a component of SUT is understandable and reachable through a PO or PCO (point of control and observation), for which we consider Understandability and Accessibility; selecting the components that can be tested, where Testability often refers to two other attributes called Observability and Controllability; and identifying the dependencies among components. For the latter, we distinguish between dependent and dependee components: intuitively, the former depend on other components; the latter are required by other components. With regard to these two kinds of components, we consider In- and Out-Dependability.
Quality attribute measurement is usually performed on specifications with metrics. But in the present work, we have models inferred by a model learning technique. They generalise what we observed about SUT, but may expose more behaviours than those possible (over-approximation) or may ignore behaviours that can occur (under-approximation). As a consequence, we shall talk about fuzzy metrics in the remainder of the paper. We hence measure quality with the 6-tuple ⟨Acc_f, Und_f, InDeps_f, OutDeps_f, Obs_f, Cont_f⟩. This notion of fuzzy metric, albeit unusual, reflects the fact that the quality measurement may evolve as we gather more data by testing SUT and updating the models.
Table 1 summarises our literature study and our interpretation of the relationships between a component and the four component categories studied in this paper with respect to quality metric measurements. We use the imprecise terms "weak" and "strong" to express two ranges of values whose definition is left to the user's knowledge of SUT. For instance, the ranges weak < 0.5 and strong ≥ 0.5 are a possible choice, but not suitable for every system. The relations expressed in Table 1 are discussed per category below.

Mock category: Table 1 brings out two kinds of mockable components:
- accessible and dependee components, which could be replaced by mocks to test more deeply how dependent components interact with them. When the Observability or Controllability of a dependee component is weak, the developer has to assess how the lack of Testability may impede the interpretation of testing results. Furthermore, if a dependee component is also a dependent one, the mock may be more difficult to devise;
- accessible, uncontrollable and dependent-only components are also good candidates because such components cannot be experimented on with tests although they trigger interactions with other components. Mocking out those components should allow deeper testing of SUT. As previously, the developer needs to consider Observability and In-Dependability to assess the difficulty of replacing these components with mocks.
Test, Test in Isolation categories: a testable component has to expose both Observability and Controllability. Out-Dependability (with OutDeps_f > 0) is used here to make the distinction between the categories Test and Test in Isolation. In Table 1, the level of interest is the lowest when a component exposes weak Observability or Controllability. Here, the developer needs to assess whether testing should be conducted.
Code Review category: Table 1 shows that many kinds of components belong to this category. These components either are unreachable, have unreadable behaviours, or are not testable (weak Obs_f or weak Cont_f). We now define the fuzzy quality metrics in the remainder of this section. The metrics for Understandability and Observability are taken from the papers [7,19,8,9] and adapted to our models. The metrics for Accessibility, Dependability and Controllability are revisited to take into account some specificities of communicating systems.
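Putting the three category discussions together, one possible reading of Table 1 can be sketched as a classification function. The 0.5 threshold and the exact rule set below are only our illustrative interpretation, not the paper's definitive table:

```python
def classify(acc, und, in_deps, out_deps, obs, cont, strong=0.5):
    """Illustrative reading of Table 1: the `strong` threshold and the
    precise rules are our assumptions. Returns the set of categories."""
    cats = set()
    if acc >= strong and und >= strong:
        if in_deps > 0:                               # accessible dependee component
            cats.add("Mock")
        if cont < strong and in_deps == 0 and out_deps > 0:
            cats.add("Mock")                          # uncontrollable, dependent-only
        if obs >= strong and cont >= strong:          # testable component
            cats.add("Test in Isolation" if out_deps > 0 else "Test")
    if und < strong or acc < strong or obs < strong or cont < strong:
        cats.add("Code Review")                       # unreachable, unreadable or untestable
    return cats
```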
Component Understandability evaluates how much component information can be interpreted and recognised [1,15]. In our context, this attribute mainly depends on the clearness and interpretability of the actions. As these are made up of parameter assignments, we say that Understandability depends on how interpretable the assignments are, which we evaluate with the boolean expression isReadable. For instance, the latter may be implemented by calling tools that detect whether parameter values are encrypted. Given a component c1 ∈ C, the metric which assesses the Understandability of c1 is given below. The closer Und_f(c1) is to 1, the more interpretable the IOLTS L(c1) is.
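As a hedged sketch, Und_f can be read as the proportion of actions of L(c1) whose parameter assignments satisfy isReadable; this reconstruction and the toy readability check are our assumptions, not the paper's exact formula:

```python
def und_f(actions, is_readable):
    """Sketch of the Understandability metric Und_f(c1): the proportion of
    actions of L(c1) whose parameter assignments satisfy isReadable
    (our reconstruction of the metric)."""
    if not actions:
        return 1.0
    return sum(1 for a in actions if is_readable(a)) / len(actions)

# toy readability check: an action is unreadable if all its values look hex-encoded
readable = lambda a: not all(v.startswith("0x") for v in a.values())
acts = [{"from": "d1", "to": "G"}, {"from": "0xa1", "to": "0xb2"}]
```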
Component Accessibility is usually expressed through the accesses of Points of Control and Observation (PCO). Several PCOs may be required to bring full access to a communicating system. Accessibility may be hampered by diverse restrictions applied on the component interfaces, e.g., security policies, or by the nature of the protocols used. We evaluate the ability to interact with a component c1 ∈ C through its interfaces with Acc_f(c1).

Component Dependability helps better understand the architecture of a component-based system, and may also be used to evaluate or refine other attributes, e.g., Reusability [20,5,13]. The metrics given below rely upon the DAGs generated by CkTail, from which the sets of dependent and dependee components can be extracted. This separation offers the advantage of defining two metrics, OutDeps_f and InDeps_f, which help better evaluate whether a component is mockable. The degree to which a component requires other components to function is measured by OutDeps_f; InDeps_f defines the degree to which a component is needed by other ones. The closer to 1 OutDeps_f(c1) and InDeps_f(c1) are, the more important c1 is in the architecture of SUT and its functioning.

Component Observability evaluates how the specified inputs affect the outputs [9]. For a component c1 modelled with the IOLTS L(c1) = ⟨Q, q0, Σ, →⟩, Observability is measured with Obs_f(c1).

Component Controllability refers to the capability of a component to reach one of its internal states by means of a specified input that forces it to give a desired output. We denote by ContD(c1) the metric that evaluates how a component c1 can be directly controlled through queries sent to its interfaces. This metric depends on the Accessibility of c1. But, when some interfaces are not accessible, we propose another way to measure the capability of controlling c1 by considering interactions through a chain of components calling each other.
In this case, we define another metric denoted ContI. The Controllability of c1 is measured with Cont_f(c1), which evaluates the best way to control c1, either directly or through a chain of components.
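Of these metrics, the two Dependability measures can be sketched directly from the dependency edges. The normalisation by |C| − 1 below is only our assumption for illustration; the paper defines the exact formulas:

```python
def deps_f(component, edges, components):
    """Sketch of the Dependability metrics derived from the CkTail DAGs.
    Returns (OutDeps_f, InDeps_f) for `component`; normalising by |C| - 1
    is our assumption."""
    others = len(components) - 1
    out_deps = {d for (s, d) in edges if s == component}   # dependees of component
    in_deps = {s for (s, d) in edges if d == component}    # dependents on component
    return (len(out_deps) / others, len(in_deps) / others)

# the system of Figure 3: d1 and d2 both depend on the gateway G
edges = {("d1", "G"), ("d2", "G")}
g_metrics = deps_f("G", edges, {"d1", "d2", "G"})
```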

Mock Generation and Execution
In reference to [14,22], we recall that developing a mock comes down to creating a component that mimics the behaviours of another real component (H1). A mock should be easily created, easily set up, and directly queryable (H2). In the tests, the developer has to specify how the mock ought to be exercised (H3). Besides, a mock can be handled by tests to verify that it runs as expected (H4). If the mock is not exercised as expected, it should return an error so that tests fail (H5). With regard to these requirements, and to take advantage of the models inferred previously, we have designed a mock for communicating systems as a Mock runner, which is responsible for running behaviours encoded in a Mock model.
For a component c1 ∈ C, a Mock model is a specialised IOLTS L that expresses some behaviours used to simulate c1 (H1). It is specialised in the sense that every action a(α) has to include new assignments of the parameters weight, repetition and delay, so that it may be used as a mock specification by the Mock runner. The parameter weight, which is initialised to 0, will be used to better cover the outgoing transitions of a nondeterministic state q, instead of randomly firing one of the transitions of q. The parameter repetition will be used to repeat the sending of an output action a large number of times without altering the readability of L. The parameter delay expresses a legal period of inactivity and will be used to detect quiescent states. With an output action, delay expresses a waiting time before the sending of the action. With an input action, it sets the period of time after which the action cannot be received anymore.

Definition 4 (Mock model).
A Mock model for c1 ∈ C is an IOLTS ⟨Q, q0, Σ, →⟩ such that Q is the finite set of states, q0 is the initial state, → is the transition relation, Qt ⊆ Q is the non-empty set of terminal states, and Σ is the set of actions of the form a(α) such that α includes the assignments of the parameters weight, repetition and delay. weight(a(α)) = w, repetition(a(α)) = r and delay(a(α)) = d denote these parameter assignments.

Fig. 4: Quality metrics and model example for the system of Figure 3

A Mock model for c1 may be written from scratch, but we strongly recommend deriving it from the IOLTS L(c1) (H2). For instance, for conformance testing, L might correspond to L(c1) with some of its paths pruned. Mocks are also used with other testing types. With robustness testing, a Mock model might be automatically generated by injecting faults into L(c1), e.g., transition removal, transition duplication, action alteration, etc. With security testing, the Mock model might be automatically generated from L(c1) by injecting sequences of transitions expressing attack scenarios. If we take back our example of Figure 3, the quality metrics given in Figure 4a reveal that d1 and d2 are good candidates as mockable components. Figure 4b shows a mock example for d1. This IOLTS was written by hand from the IOLTS L(d1) of Figure 3. It aims at exercising G with unexpected and high temperature values.
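A minimal encoding of these specialised actions, together with the weight-based transition choice used by the Mock runner, could look as follows; the Python names are ours, not the paper's implementation:

```python
class MockAction:
    """An action a(alpha) of a Mock model (Definition 4): alpha carries the
    extra parameters weight, repetition and delay used by the Mock runner."""
    def __init__(self, label, direction, weight=0, repetition=1, delay=5.0):
        self.label = label
        self.direction = direction    # "!" for output, "?" for input
        self.weight = weight          # initialised to 0, bumped after each firing
        self.repetition = repetition  # how many times an output is re-sent
        self.delay = delay            # waiting time (output) / receipt deadline (input)

def pick_transition(outgoing):
    """On a nondeterministic state, fire the transition whose action has the
    smallest weight, then increase that weight so that another transition
    of the state is chosen later."""
    chosen = min(outgoing, key=lambda a: a.weight)
    chosen.weight += 1
    return chosen

a1, a2 = MockAction("temp", "!"), MockAction("humidity", "!")
```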
A Mock runner is a generic piece of software whose design and implementation depend on the type of system considered. For instance, it may be implemented as a Web service for HTTP components. The Mock runner is implemented by Algorithm 1. It takes as input a Mock model L, which specifies the mock behaviours (H3). Then, it creates instances, i.e. concrete executions following the paths of L from its initial state. We chose to create one instance at a time to make the test results more easily interpretable (H2). As a consequence, if an incoming action is received but cannot be consumed in the current instance, it is stored in "inputFifoqueue" to be processed later. The Mock runner starts an instance either by processing an incoming action in inputFifoqueue (line 3) or by sending an action if an output action may be fired from the initial state of L (line 9). In both cases, if the initial state is nondeterministic, the Mock runner chooses the transition whose action includes the smallest weight. The weight of this action is then increased so that another transition will be fired later. In line 8, if the Mock runner receives an unexpected action, it inserts an error in its log, so that the test which handles the Mock runner may fail (H5).
The Mock runner creates an instance given under the form of a couple (r, time), with r a run of L and time the current time returned by the clock of the Mock runner. This last parameter is used to compute waiting times before sending actions, or time delays during which the Mock runner allows the receipt of input actions. When the Mock runner creates an instance that starts with an output action (line 12), it sends it as many times as specified by the parameter repetition. The run r is updated accordingly.
Once a new run is created, the Mock runner calls the procedure treatInstance to process a run q0 a0(α0) ... q until it expires. For simplicity, the run expiration (line 17) is not detailed in the procedure. We say that a run q0 a0(α0) ... q expires if either q ∈ Qt is a terminal state, or q is a quiescent state (∀ q −a(α)→ q2 : a(α) ∈ Σ^I ∧ now() − time > delay(a(α))). The procedure logs every run update (lines 18, 33) so that the mock behaviours can be verified by tests (H4). The remainder of the procedure is very similar to Algorithm 1: it either waits for the receipt of an action a(α), or sends an output action if one may be fired from the state q. The procedure updates the run r and time for every received or sent action.
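The core loop of treatInstance can be sketched as follows. This is a deliberately simplified illustration under our own encoding of Mock-model paths; the real procedure also handles weights, quiescence delays and the input FIFO queue:

```python
def run_instance(path, send, receive, log):
    """Simplified sketch of treatInstance: follow one path of the Mock model,
    sending outputs `repetition` times and checking that every received input
    matches the expected action; an unexpected input is logged as an error so
    that the controlling test fails (H5)."""
    for (direction, label, repetition, delay) in path:
        if direction == "!":
            for _ in range(repetition):      # re-send the output `repetition` times
                send(label)
            log.append("!" + label)          # log the run update (H4)
        else:
            got = receive(timeout=delay)     # wait at most `delay` for the input
            if got != label:
                log.append("error: unexpected " + str(got))
                return False
            log.append("?" + label)
    return True

# toy mock path: expect ?req1, then answer with !resp1 twice
sent, log = [], []
path = [("?", "req1", 1, 5.0), ("!", "resp1", 2, 0.0)]
ok = run_instance(path, send=sent.append, receive=lambda timeout: "req1", log=log)
```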

Preliminary Evaluation
Our approach is implemented as a prototype tool chain, which gathers the model learning tool CkTail, a tool to compute quality metrics on IOLTSs and DAGs, along with two Mock runners [18]. The first is implemented as a Java Web service that can be deployed on Web servers. The second Mock runner is implemented as a C++ Web service that can be installed on some embedded boards (Arduino compatibles). The latter can replace real devices more easily, as these boards can be placed anywhere, but their memory and computing capabilities are limited. At the moment, both Mock runners are implemented with a slightly simplified version of the algorithm proposed in Section 3.3, as they take IOLTS paths as inputs, given under the form of rules. However, both Mock runners offer the capability to execute on demand some robustness tests (addition or removal of messages, injection of unexpected values in HTTP verbs and contents) and some security tests (injection of denial-of-service (DoS) or Cross-site scripting (XSS) attacks). This prototype tool chain was employed to begin evaluating the usability of our approach through the questions given below. The study has been conducted on a real home automation system. We firstly monitored it for 5 minutes and collected an event log of 240 HTTP messages involving 12 components. From the event log, we generated 12 IOLTSs along with 12 DAGs and evaluated the quality metrics. Table 2 provides the IOLTS sizes, 4 quality measures (Acc_f = Und_f = 1 for all the components) and the recommendations given by Table 1.
and can completely replace real devices. The mocks of the other components provided between 62% and 72% of valid traces. After inspection, we observed that these mocks, which simulate more complex components, received messages composed of unexpected values, e.g., temperature orders, and replied with error messages. These results confirm that the precision of the IOLTSs used to build mocks is important. Here, the IOLTSs are under-approximated (they exclude correct behaviours).
Besides replicating a real device, a mock aims to be called by a test case to set and verify expectations on the interactions with a given component under test. We implemented the Mock runners with these purposes in mind. The Mock runners are services whose methods can be called from test cases. The mock initialisation is carried out by a method taking a rule set as a parameter. Besides, a test case can access the 10 last messages received or sent by the mock to verify its behaviour. We successfully wrote security test cases with the 6 previous mocks to check whether the gateway is vulnerable to some DoS or XSS attacks. In these test cases, the mocks are initialised with rules extracted from IOLTSs, and are then called to inject predefined attacks in these rules. We observed in this experiment that the gateway was vulnerable to the receipt of multiple long messages, provoking slowdowns and finally unresponsiveness.
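A test case following this workflow could be sketched as below. Every name here (`mock.send`, `mock.last_messages`, `gateway_ping`) is a hypothetical interface invented for illustration, not the actual API of the Mock runners:

```python
def dos_test(mock, gateway_ping, n_messages=100, payload="A" * 4096):
    """Sketch of a DoS test case in the style described above: flood the
    gateway through the mock with long messages, then check whether the
    gateway still answers. `mock` and `gateway_ping` are hypothetical
    interfaces, not the paper's actual API."""
    for _ in range(n_messages):
        mock.send(payload)
    # the mock exposes its last messages so the test can verify its behaviour
    assert len(mock.last_messages(10)) <= 10
    return gateway_ping()  # False reveals unresponsiveness

class FakeMock:
    """Stand-in recording sent messages, mimicking the 10-last-messages access."""
    def __init__(self): self.msgs = []
    def send(self, m): self.msgs.append(m)
    def last_messages(self, n): return self.msgs[-n:]

# simulate an unresponsive gateway after the flood
result = dos_test(FakeMock(), gateway_ping=lambda: False, n_messages=3, payload="x")
```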
This study has also revealed several limitations that need to be investigated in the future. Although the Java Mock runner accepts large rule files and can replace complex components, the second Mock runner only supports rule files having up to 40 actions on account of the memory limitations of the board. In general, mocks are also implemented to speed up the testing stage. The Java Mock runner can indeed be used to provide HTTP responses more quickly, but not the second Mock runner. Our current mock implementation does not support data flow management, which is another strong limitation. The data flows of the mocks do not follow any distribution and do not meet any temporal pattern. For instance, the mock of the light meter periodically sends luminance measurements, which are arbitrarily chosen. The data flow exposes unexpected peaks and falls, which corresponds to an incorrect behaviour for this kind of component.

Conclusion
We have proposed a model-based mock generation approach, which combines model learning, quality metrics evaluation and mock generation to assist developers in testing communicating systems. Given an event log, model learning allows us to obtain models, which can be automatically analysed with quality metrics to help classify every component of a communicating system and choose the best candidates for mocking out. The models are also used to ease the mock generation. As future work, we firstly plan to evaluate our approach on further kinds of systems, e.g., Web service compositions. We also intend to consider further quality metrics to refine the levels of interest for the mockable components. As suggested in our evaluation, we need to improve the Mock runner algorithms so that mocks might provide consistent data flows, e.g., by following predefined distributions or temporal patterns.

Fig. 1: Approach Overview

Fig. 5: Proportion of valid traces of the mocks