Model generation of component-based systems

This paper presents COnfECt, a model learning approach that aims at recovering the functioning of a component-based system from its execution traces. We refer here to non-concurrent systems whose internal interactions among components are not observable from the environment. COnfECt is specialised in the detection of the components of a black-box system and in the inference of models called systems of labelled transition systems (LTSs). COnfECt tries to detect components and their specific behaviours in traces, then it generates, for every discovered component, a LTS that captures its behaviours. Besides, it synchronises the LTSs together to express the functioning of the whole system. COnfECt relies on machine learning techniques to build models: it uses the notion of correlation among actions in traces to detect component behaviours and exploits a clustering technique to merge similar LTSs and synchronise them. We describe the three steps of COnfECt and the related algorithms in this paper. Then, we present some preliminary experiments.


Introduction
The effort required for writing (formal) models has been a strong barrier to the widespread adoption of model-based testing or verification approaches in industry. Most of today's developers indeed feel that writing models is a difficult, long and error-prone task. This obstacle can be overcome with model learning approaches (Angluin 1987;Biermann and Feldman 1972;Ernst et al. 1999;Meinke and Sindhu 2011;Lorenzoli et al. 2008;Ohmann et al. 2014;Durand and Salva 2015;Pastore et al. 2017), which have proven to be valuable for recovering models that can be exploited in several software engineering steps. The inferred models can be seen as documentation that is useful for understanding the functioning of a system; they can be completed and improved to become formal specifications; several papers also showed that model learning can be employed within effective bug finding techniques (Mariani and Pastore 2008;Hangal and Lam 2002;Tappler et al. 2017), or can be used to directly generate test cases (Dallmeier et al. 2012;Shahbaz and Groz 2013;Durand and Salva 2015).
A substantial part of these approaches is specialised in inferring models from black-box systems. These approaches have the advantage of remaining usable when the application code is not available or when the internal state of the system cannot be known. They capture the behaviours of systems interacting with their environments in a variety of models, e.g. state machines (Angluin 1987;Biermann and Feldman 1972) or invariants (Ernst et al. 1999;Meinke and Sindhu 2011). The model generation is performed either by interacting with the system (active approaches) or by analysing a set of execution traces resulting from the monitoring of the system (passive approaches). In this paper, we focus on the second category.
In this context, several papers recently proposed innovative solutions for designing new learning algorithms and tools, which have the capability to infer symbolic models, resource-aware models (Beschastnikh et al. 2011;Ohmann et al. 2014) and timed models (Pastore et al. 2017), and which can be applied to increasingly complex systems. However, few works (Groz et al. 2008;Mariani and Pastore 2008;Beschastnikh et al. 2014) dealt with the generation of models from integrated systems made up of components. Yet many present-day systems are made up of reusable features or components that interact with one another. Inferring models that encode the functioning of every component into a sub-model, together with the way the components interact, would greatly ease the readability and analysis of the whole system. Furthermore, such models would make it possible to concentrate bug detection efforts on specific sub-parts of the system. These observations motivate this work, which addresses two research challenges:
Challenge 1: given a system under learning SUL, how to learn a model from its execution traces, in such a way that the model captures the behaviours of the SUL components and their synchronisations?
Challenge 2: how to manage the level of generalisation of the models, and how to synchronise the sub-models of components?
To address these challenges, we designed a new method called COnfECt (COrrelate Extract Compose) and a corresponding tool for learning models of component-based systems. COnfECt is a passive model learning approach, which generates a system of LTSs (labelled transition systems) from execution traces of non-concurrent systems whose internal interactions among components are not observable from the environment. This is often the case for systems having a tightly coupled architecture (strong dependency among components), e.g. embedded systems (vending machines, electronic toys), Internet of Things (IoT) devices (smart thermostat, security camera), or software made up of modules or subprograms that depend upon each other.
COnfECt is composed of three main steps called Trace Recovery, Trace Analysis & Extraction and LTS synchronisation. The first step derives formatted traces from raw messages; the second analyses the traces and tries to identify distinctive sub-sequences in traces and to link them to separate components. The last step generates a system of LTSs by means of three strategies. This model encodes the behaviours of every component by a LTS and shows how they are synchronised together. The strategies adapt the LTS synchronisation and provide several systems of LTSs having different levels of generalisation. These steps rely on machine learning techniques to detect the behaviours of components: traces are analysed with correlation factors based on string similarity metrics and algorithms; the LTS synchronisation step relies on a clustering technique to group similar LTSs.
We have implemented a prototype tool to experiment with COnfECt and evaluate its benefits. We provide a preliminary evaluation in the paper, which assesses the correctness of the component detection, the relevance and size of the models and the efficiency/scalability of the algorithm, compared with two other model learning approaches, kTail and CSight. We also examine potential threats to the validity of our evaluation.
Paper organisation: Section 2 presents some papers related to our approach and our motivations. Section 3 recalls some definitions about the LTS model. Section 4 gives an overview of the functioning of COnfECt with an example. Section 5 describes the steps of the approach. The next section shows the results of the experimentation of COnfECt and discusses the threats to validity. Finally, Section 8 summarises our contributions and draws some perspectives for future work.

Related work
Model learning can be defined as a set of methods that infer a specification by gathering and analysing system executions and concisely summarising the frequent interaction patterns as state machines that capture the system behaviour (Ammons et al. 2002). These models, even if partial, can serve many purposes, e.g., they can be used as documentation, examined by designers to find bugs, or can be given to testing methods for the test case generation. Models can be generated from different kinds of data samples such as affirmative/negative answers (Angluin 1987), execution traces (Krka et al. 2010), source code (Pradel and Gross 2009) or network traces (Antunes et al. 2011).
Most of the approaches fall into two categories called active and passive model learning, although some works cover both (Petrenko et al. 2017). Active learning approaches, e.g. Angluin (1987), Dupont (1996), Raffelt et al. (2005), Alur et al. (2005), Berg et al. (2006), Howar et al. (2012) and Hossen et al. (2014), repeatedly query systems or humans to collect positive or negative observations, which are studied to build models. Many existing active techniques have been conceived upon two concepts, the L* algorithm (Angluin 1987) and incremental learning (Dupont 1996). This model learning category is actively studied to make the approaches more effective and efficient. For instance, some researchers recently proposed optimisations to reduce the number of queries, while others tackled systems having specific constraints (Hossen et al. 2014). Active learning cannot be applied to every system, though. For instance, uncontrollable systems cannot be queried easily, and the use of active testing techniques may lead a system to abnormal functioning because it has to be reset many times.
This brings us to the second category, which includes the techniques that passively generate models from a given set of samples, e.g. a set of execution traces. These techniques are said to be passive since there is no direct interaction with the system. Models are often constructed here by encoding sample sets with automata whose equivalent states are merged. The state equivalence is usually defined by means of state-based abstractions or event sequence abstractions. The approaches that use state-based abstractions, e.g. Meinke and Sindhu (2011), generate state-based invariants to define equivalence classes of states that are combined together to form final models. The Daikon tool (Ernst et al. 1999) was originally proposed to infer invariants composed of data values and variables found in execution traces. With event sequence abstractions, the abstraction level of the models is raised by merging equivalent states (Biermann and Feldman 1972;Mariani and Pezze 2007). In the kTail approach (Biermann and Feldman 1972), the equivalent states are those having the same k-future, i.e. the same event sequences of maximum length k. kTail was later enhanced with Gk-tail to generate Extended Finite State Machines encoding data constraints (Lorenzoli et al. 2008;Mariani et al. 2017). The methods Synoptic (Beschastnikh et al. 2011) and Perfume (Ohmann et al. 2014) also reuse kTail. The former generates more precise models by means of the generation of temporal invariants from logs, which have to be satisfied by the models. The latter, which is an improvement of Synoptic, infers resource-aware models capturing behavioural executions that differ in resource consumption. More recently, Pastore et al. (2017) proposed Tk-tail to support the learning of timed automata.

Key observations and motivations
After studying the literature, we first observed that few papers have tackled Challenge 1 or 2. Groz et al. (2008) proposed to generate a controllable approximation of components through active testing. Unlike our approach, the components are learnt in isolation, i.e. there is no detection of components as these are known and studied one after the other. Mariani and Pastore (2008) proposed an automatic detection of failures in log files by means of model learning. Their approach offers the possibility to split the original log file into as many files as components. Components are distinguished in logs by means of regular expressions, which have to be written by end-users. Once the trace set is segmented by component, the models are generated in isolation. Unlike the LTSs provided by COnfECt, these models do not show how components are synchronised.
CSight seems to be the major approach that shares several purposes with COnfECt. CSight infers models of concurrent communicating systems, which communicate through synchronous channels. It is assumed that the channels and components are known. CSight also requires specific trace sets capturing this notion of channels: the trace set is segmented into one subset (called process trace set) per component. The exchanged messages are observable and composed of input and output events. CSight has five main stages: (1) log parsing and mining of invariants that must hold in the models; (2) generation of a concrete FSM that captures the functioning of the whole system by recomposing the traces of the different components; (3) generation of a more concise abstract FSM; (4) model refinement with invariants; (5) generation of Communicating FSMs (CFSMs). The latter show the synchronisations of the concurrent components by means of the channels and of the inputs/outputs, e.g. when the emitter sends an output, the receiver gets an input with the same symbol and vice versa. COnfECt aims at learning models from traces of component-based systems, where the component interactions are hidden. In the paper, we use as example a connected thermostat integrating several components. With this kind of system, the component interactions are not observable. As a consequence, CSight cannot infer a model per component, whereas COnfECt decomposes traces to recover the component behaviours and infer models. Furthermore, we consider that the number of components is unknown and that traces are not segmented. Hence, the assumptions required by CSight and COnfECt are quite different. But COnfECt needs other assumptions on the system under learning.
COnfECt rather targets systems having a tightly coupled architecture, whereas CSight seems to target systems having a loosely coupled architecture, where components can remain autonomous and let middleware software manage the communication between them. The models given by CSight should be more precise than those given by COnfECt because CFSMs have to be compliant with the behaviours of the traces (thanks to the mining and satisfiability of invariants). At the moment, we do not focus on the satisfiability of mined invariants, as this topic has been studied in several papers, e.g. Beschastnikh et al. (2014) and Ohmann et al. (2014). However, this could be implemented in COnfECt in future work. Instead, our approach proposes three strategies to adapt the model generalisation (from a model that is compliant with the behaviours found in traces to a more general model that may call its related components at each of its states). We believe that this notion of generalisation level is important, as the original traces may only capture a part of the real behaviours of a system.
Prior to this paper, we laid the first stone of the approach in , in which we proposed to complement Gk-tail for the generation of models of component-based systems. We defined the CEFSM model (Callable Extended Finite State Machine), which is composed of variables and constraints. CEFSMs cannot be composed together though, which reduces their reusability. Besides, we had neither implemented nor evaluated the given algorithms. We also proposed an overview of this work in . As in this paper, we considered the LTS model so that we could reuse the LTS theoretical background. We introduced the general functioning of COnfECt and started an evaluation of the component detection. In this paper, we define the correlation coefficient used to recognise the calls of components in traces. We define the LTS similarity coefficient on which several LTS synchronisation strategies are built. Furthermore, we present the algorithms implementing the steps of COnfECt and provide the results of a more thorough evaluation carried out to assess the relevance of the models generated by COnfECt and its efficiency.

Preliminary definitions
COnfECt aims to generate models of component-based systems, where the interactions among components are not observable. The specific notions of communications or channels, which can be found in specific models such as CFSMs, are not required here. As in Falcone et al. (2011) and van der Bijl et al. (2004), we propose to express the behaviours of atomic components with the well-established labelled transition system (LTS) model. Using LTSs makes it possible to exploit the definitions related to LTS composition, for instance those given by van der Bijl et al. (2004). A composite model, which we denote system of LTSs, is defined with respect to the LTS parallel composition, which synchronises LTSs on their shared actions, called synchronisation actions.
The LTS model is first defined in terms of states and transitions labelled by actions, taken from a general action set $L$, which expresses what happens. $\tau$ is a special symbol encoding an internal (unobservable) action; it is common to denote the set $L \cup \{\tau\}$ by $L_\tau$.

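Definition 1 (LTS) A labelled transition system (LTS) is a 4-tuple $\langle Q, q_0, \Sigma, \rightarrow \rangle$ where $Q$ is a finite set of states; $q_0 \in Q$ is the initial state; $\Sigma \subseteq L$ is the finite set of actions; $\rightarrow \subseteq Q \times (\Sigma \cup \{\tau\}) \times Q$ is the transition relation. A transition $(q, a, q')$ is also written $q \xrightarrow{a} q'$.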
We use the generalised transition relation $\rightarrow$ to represent LTS paths: $q \xrightarrow{a_1 \ldots a_n} q' =_{def} \exists q_0 \ldots q_n,\ q = q_0 \xrightarrow{a_1} q_1 \ldots q_{n-1} \xrightarrow{a_n} q_n = q'$. The concatenation of two action sequences $\sigma_1, \sigma_2 \in L_\tau^*$ is denoted $\sigma_1.\sigma_2$, and $\epsilon$ denotes the empty sequence. Finally, we define the runs and traces of a LTS:
Definition 2 (Runs and traces) Let $\mathcal{L} = \langle Q, q_0, \Sigma, \rightarrow \rangle$ be a LTS.
1. A run $q_0 a_1 \ldots q_{k-1} a_k q_k$ is an alternate sequence of states and actions such that $q_{i-1} \xrightarrow{a_i} q_i$ for every $1 \le i \le k$. $Runs(\mathcal{L})$ is the set of runs found in $\mathcal{L}$. $Runs_F(\mathcal{L})$ is the set of runs that end in a state of $F$, with $F \subseteq Q$.
2. The trace of a run $r = q_0 a_1 \ldots q_{k-1} a_k q_k$, denoted $Trace(r)$, is the sequence $a_1 \ldots a_k$.
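To make Definitions 1 and 2 concrete, here is a minimal Python sketch. It assumes an LTS is stored as a set of (source, action, target) triples. The class name `LTS` and the `max_len` bound are illustrative choices and are not part of the approach. The bound is only needed because a cyclic LTS has infinitely many runs.

```python
from dataclasses import dataclass, field


@dataclass
class LTS:
    """A minimal LTS <Q, q0, Sigma, ->; transitions are (source, action, target) triples."""
    states: set
    initial: object
    actions: set
    transitions: set = field(default_factory=set)

    def successors(self, q):
        """Pairs (a, q') such that q -a-> q'."""
        return [(a, t) for (s, a, t) in self.transitions if s == q]

    def runs(self, max_len):
        """Runs q0 a1 q1 ... ak qk from the initial state, with at most max_len actions."""
        result, stack = [], [(self.initial,)]
        while stack:
            run = stack.pop()
            result.append(run)
            if (len(run) - 1) // 2 < max_len:  # number of actions already in the run
                for a, t in self.successors(run[-1]):
                    stack.append(run + (a, t))
        return result

    def runs_ending_in(self, final, max_len):
        """Runs_F: the runs whose last state belongs to F."""
        return [r for r in self.runs(max_len) if r[-1] in final]


def trace(run):
    """Trace(r): the actions a1 ... ak of an alternating run, states dropped."""
    return tuple(run[1::2])
```

Take an LTS with transitions 0 -a-> 1 -b-> 2. Here `runs(5)` returns the three runs (0,), (0, a, 1) and (0, a, 1, b, 2), and `trace` maps the last run to (a, b).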
The integration of two components $C_1$ and $C_2$ modelled with LTSs is often defined in the literature by two operations. The first one is the parallel composition of $C_1$ and $C_2$, denoted $C_1 \parallel C_2$, which synchronises them on their synchronisation actions. This composition is often followed by the hiding of the communications between $C_1$ and $C_2$, to express that only the communications with the environment are observable. This operation is denoted $\mathit{hide}\ S\ \mathit{in}\ C_1 \parallel C_2$, with $S$ the set of synchronisation actions. We refer to van der Bijl et al. (2004) for the definitions of these two LTS operators. This principle of LTS composition leads to a model called system of LTSs, which describes a component-based system:
Definition 3 (System of LTSs) A system of LTSs $SC$ is a pair $\langle S, C \rangle$ with $C = \{C_1, \ldots, C_n\}$ a non-empty set of LTSs and $S$ a set of synchronisation actions.
$Traces(SC)$ denotes the trace set $Traces(\mathit{hide}\ S\ \mathit{in}\ (C_1 \parallel C_2 \parallel \cdots \parallel C_n))$.
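The two operators can be illustrated with a small Python sketch. It encodes each LTS as a tuple (initial state, action set, transition set) and follows the usual CSP-style parallel operator: shared actions fire jointly and all other actions interleave. The function names are hypothetical. For the exact definitions, van der Bijl et al. (2004) remain the reference.

```python
TAU = "tau"  # stands for the internal action τ


def compose(c1, c2):
    """Parallel composition C1 || C2 of LTSs given as (initial, actions, transitions).
    Shared actions (other than τ) are synchronised; all other actions interleave."""
    (i1, acts1, tr1), (i2, acts2, tr2) = c1, c2
    shared = (acts1 & acts2) - {TAU}
    init = (i1, i2)
    seen, todo, trans = {init}, [init], set()
    while todo:
        q1, q2 = todo.pop()
        moves = []
        for (s, a, t) in tr1:
            if s != q1:
                continue
            if a in shared:  # C1 and C2 must fire a together
                moves += [(a, (t, u)) for (s2, b, u) in tr2 if s2 == q2 and b == a]
            else:            # C1 moves alone
                moves.append((a, (t, q2)))
        for (s, a, t) in tr2:
            if s == q2 and a not in shared:  # C2 moves alone
                moves.append((a, (q1, t)))
        for a, target in moves:
            trans.add(((q1, q2), a, target))
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return init, acts1 | acts2, trans


def hide(S, lts):
    """hide S in C: every action of S becomes the internal action τ."""
    init, acts, trans = lts
    return init, (acts - S) | {TAU}, {(s, TAU if a in S else a, t) for (s, a, t) in trans}


def observable_traces(lts, max_steps):
    """Traces of the runs with at most max_steps transitions, τ removed."""
    init, _, trans = lts
    result, stack = set(), [(init, (), 0)]
    while stack:
        q, obs, n = stack.pop()
        result.add(obs)
        if n < max_steps:
            for (s, a, t) in trans:
                if s == q:
                    stack.append((t, obs if a == TAU else obs + (a,), n + 1))
    return result
```

Suppose a caller and a callee both know `call C2` and `return C2`. After composing them and hiding these two actions, the observable traces contain only the actions exchanged with the environment.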

COnfECt overview
COnfECt (COrrelate Extract Compose) first aims at answering Challenge 1: how to infer a system of LTSs SC from the traces of SUL, in such a way that SC captures the behaviours of the SUL components and their synchronisations? Initially, COnfECt requires a set of raw messages collected from SUL. SUL may be non-deterministic, uncontrollable or have cycles among its internal states. However, to address this research problem, we assume that SUL obeys the following restrictions:
- H1: SUL as a black box. SUL is a black box including components from which only the communications with the environment can be observed. The interactions among the components are not observable.
- H2: Synchronous execution. The behaviours of the SUL components are not carried out in parallel. One component is executed at a time, from its initial state to one of its final states. Furthermore, we assume that messages include timestamps.
- H3: Single root component. We suppose that the traces of Traces(SUL) capture the behaviours of one first component calling other components.
Given assumptions H1-H3, we assume that components follow a procedural behaviour: a component C1 calls another component C2 and waits for the end of its execution. C2 starts its execution at its initial state and performs some actions. Once the execution of C2 has completed, C1 resumes its execution. These assumptions lay the foundations of a component detection algorithm working on traces. They are discussed in Section 7.5. Relaxing them remains part of future work. COnfECt has three main successive steps, illustrated in Fig. 1. The first step takes raw messages given by monitoring tools or found in log files, and transforms them into formatted traces. The second step, called Trace Analysis & Extraction, tries to detect component behaviours in Traces(SUL) and partitions it into a set of trace sets called STraces. Each trace set of STraces captures some behaviours of one component. The last step, called LTS Synchronisation, takes STraces and starts by generating one LTS for each trace set of STraces. This step also proposes three LTS synchronisation strategies to generate a system of LTSs SC. Figure 3 illustrates these steps on an example system, a connected thermostat device from which HTTP traces may be collected. The user gives as inputs: a file of messages, regular expressions to build traces, two factors expressing similarities among actions and among components, two thresholds for these factors and a LTS synchronisation strategy. Figure 2 lists 4 HTTP messages collected from the device taken as example. COnfECt starts by parsing these messages with regular expressions to produce the set Traces(SUL). An example of regular expression is also given in Fig. 2. It extracts parameters and a label, which shows what happens.
Figure 3 gives, in Traces(SUL), a complete formatted trace composed of 16 actions, which was extracted by means of 4 regular expressions that assign the called URL and the HTTP responses to labels, and keep some data, e.g. the temperature with the parameter svalue. The second step of COnfECt tries to identify distinctive component behaviours in the traces of Traces(SUL). It processes traces and computes a correlation coefficient with the factor f, which assesses the degree of correlation of successive actions in a trace. Intuitively, when two successive action sub-sequences are not correlated, we consider that they come from two distinct components. We propose several means to define the correlation coefficient in Section 5. Let us suppose that the trace of Fig. 3 has been analysed and that COnfECt has detected the 3 sub-sequences in bold. This means that a first component has produced the first action of the trace. Then, a second component has been invoked. The latter has produced the two actions in bold (/json.htm and Response) and has terminated its execution. The first component has then resumed its execution, and so forth. These special sequences in bold are extracted and replaced by the synchronisation actions call and return, which express that a component has been invoked. The extracted sub-sequences are placed into new trace sets. In our example, 4 trace sets T1-T4 are built. At the end of this step, COnfECt returns a set STraces composed of these trace sets.
At the beginning of the third step (STEP 3A LTS Generation in the figure), every set of STraces is transformed into a LTS by converting traces into LTS paths, which are then joined on the initial state only. Together, these LTSs form an initial system of LTSs. In our example, as we have 4 trace sets, we obtain a system of 4 LTSs. These LTSs include synchronisation actions starting with call and return. Given two LTSs $C_1$ and $C_2$, when the transition $q \xrightarrow{call\ C_2} q'$ of the LTS $C_1$ is fired, we say that $C_1$ calls the LTS $C_2$. This action means that the current execution is paused while $C_2$ starts its execution at its initial state. When the transition $q \xrightarrow{call\ C_2} q'$ of the LTS $C_2$ is fired, we say that $C_2$ is called.
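STEP 3A can be sketched in Python as follows, assuming that traces are tuples of action labels. The paths are joined on the initial state only, so no prefix sharing happens at this point; merging states is left to the later kTail step. The function names are illustrative.

```python
def traces_to_lts(trace_set):
    """Turn a trace set into a LTS made of one path per trace.
    The paths share the initial state 0 only; every other state is fresh.
    Returns (states, initial, actions, transitions)."""
    initial = 0
    states, actions, transitions = {initial}, set(), set()
    next_state = 1
    for trace in trace_set:
        current = initial
        for action in trace:
            target, next_state = next_state, next_state + 1
            states.add(target)
            actions.add(action)
            transitions.add((current, action, target))
            current = target
    return states, initial, actions, transitions


def initial_system(straces):
    """Initial system of LTSs: the trace set T_i of STraces yields the LTS C_i."""
    return {f"C{i}": traces_to_lts(T) for i, T in enumerate(straces, start=1)}
```

Given two traces of lengths 4 and 3, the sketch produces an LTS with 1 + 4 + 3 = 8 states and two transitions leaving the initial state.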
The execution of $C_2$ ends once the transition $q \xrightarrow{return\ C_2} q'$ is fired. COnfECt then proposes three strategies for synchronising these LTSs together. The main purpose of these strategies is to address Challenge 2, which stems from the fact that the traces collected from SUL usually do not capture all of its valid behaviours. The strategies make it possible to generate models that accept more behaviours by adapting the component integration in different ways. The first strategy, called strict synchronisation, only reduces the LTSs by applying the kTail approach to merge equivalent states. The second strategy, called weak synchronisation, tries to detect the components that have similar behaviours by means of a LTS similarity coefficient, defined by the factor f. In addition, some transitions are replaced by loops to allow repetitive component calls. Figure 3 illustrates the use of this strategy. The LTSs of the components C2-C4 are detected as similar and are joined, which gives the new LTS C234. Then, the LTS C1 is modified to allow repetitive component calls (STEP 3B). Finally, kTail is applied on the resulting LTSs to merge equivalent states. Several states are merged in this example, e.g., q3, q4 and q5 (STEP 3C). These final LTSs capture more behaviours than those given by the strict strategy. The last strategy, called strong synchronisation, provides even more general LTSs by returning systems of LTSs such that every LTS allows the invocation of its components at any of its states. In the example of Fig. 3, the LTS C1 calls the LTSs C2-C4. Therefore, the strong strategy adds transition loops to the states of C1 so that C2-C4 can be called from any of them. The LTSs C2-C4 are unmodified as they do not call other LTSs.
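Both the strict and weak strategies rely on kTail. The Python sketch below shows a simplified, single-pass version of the kTail merging criterion. It assumes that the k-future of a state is the set of action sequences of length at most k leaving that state. Full kTail implementations typically iterate the merge and may determinise the result; this sketch does neither.

```python
from collections import defaultdict


def k_future(q, k, transitions):
    """Set of action sequences of length at most k that can be performed from state q."""
    seqs = {()}
    if k > 0:
        for (s, a, t) in transitions:
            if s == q:
                seqs |= {(a,) + rest for rest in k_future(t, k - 1, transitions)}
    return frozenset(seqs)


def ktail_merge(states, initial, transitions, k):
    """One pass of kTail-style merging: states with equal k-futures form one block,
    represented by its smallest state. No determinisation, no fixpoint iteration."""
    blocks = defaultdict(list)
    for q in sorted(states):
        blocks[k_future(q, k, transitions)].append(q)
    rep = {q: group[0] for group in blocks.values() for q in group}
    merged = {(rep[s], a, rep[t]) for (s, a, t) in transitions}
    return set(rep.values()), rep[initial], merged
```

With k = 1, the two paths 0 -a-> 1 -b-> 2 and 0 -a-> 3 -b-> 4 collapse into the single path 0 -a-> 1 -b-> 2, because states 1 and 3 (and states 2 and 4) have the same 1-future.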
These steps are detailed in the next sections.

Step 1: Trace formatting
COnfECt takes raw messages that are totally ordered by their timestamps. These messages are first parsed and analysed with regular expressions to retrieve the actions performed by SUL and their related data. We consider that these expressions transform a message into an action of the form a(α), with a a label and α an assignment of some parameters. For example, the action switch(id := 115, cmd := on) is made up of the label "switch" followed by the assignment of two parameters. These regular expressions may also be used to filter out irrelevant messages. End-users often have to analyse the messages manually to derive regular expressions. Although this task may be carried out with little effort on messages collected from small systems, it is known to become impractical when SUL is large or complex. Several works addressed this problem (Mariani and Pastore 2008;Fu et al. 2009;Makanju et al. 2012;Vaarandi and Pihelgas 2015;Messaoudi et al. 2018;Zhu et al. 2018) and proposed approaches and tools that automatically mine patterns from log files. These patterns may be used to quickly derive regular expressions.
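As an illustration, the Python sketch below uses the standard `re` module to turn raw messages into actions a(α). The message layout is an assumption made for this example: a numeric timestamp, an HTTP method, a path used as the label, and a query string holding the parameters. It is not the format of the messages shown in Fig. 2. Messages that match no expression are filtered out.

```python
import re

# Hypothetical layout: "<timestamp> <GET|POST> <path>?<p1>=<v1>&<p2>=<v2>"
REQUEST = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?) (?:GET|POST) (?P<label>/[\w./-]*)(?:\?(?P<query>\S*))?$"
)


def parse(message, pattern=REQUEST):
    """Map a raw message to (timestamp, label, assignment), or None to filter it out."""
    m = pattern.match(message)
    if m is None:
        return None  # the message matches no expression: it is considered irrelevant
    assignment = {}
    if m.group("query"):
        for pair in m.group("query").split("&"):
            name, _, value = pair.partition("=")
            assignment[name] = value
    return float(m.group("ts")), m.group("label"), assignment
```

For instance, `parse("1552000000.5 GET /json.htm?type=command&svalue=21")` yields the label `/json.htm` with the assignment {type := command, svalue := 21}.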
Then, COnfECt proposes four ways to split a list of actions into traces: by using a trace identifier, by inspecting timestamps, or by combining these two options in either order. The first mode, proposed by several model learning approaches, groups actions having the same identifier into the same execution trace. The second mode analyses the timestamps of every pair of successive actions and computes the mean of the time intervals. Then, it searches for gaps (distinctively longer durations), which are usually observed when an execution trace ends and another one begins. These gaps are used to recognise and extract the traces.
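The first two modes can be sketched in Python as shown below. The gap criterion is only one possible reading of "distinctively longer durations": a delay counts as a gap when it is larger than `factor` times the mean delay. `factor` is a hypothetical parameter with a default of 3, and the tool may apply a different test.

```python
from collections import defaultdict
from statistics import mean


def split_by_id(actions):
    """Mode 1: group (trace_id, action) pairs sharing the same identifier into one trace."""
    traces = defaultdict(list)
    for trace_id, action in actions:
        traces[trace_id].append(action)
    return list(traces.values())


def split_by_gaps(actions, factor=3.0):
    """Mode 2: actions are (timestamp, action) pairs sorted by timestamp.
    A new trace starts when the delay since the previous action exceeds factor * mean delay."""
    if len(actions) < 2:
        return [[a for _, a in actions]] if actions else []
    delays = [t2 - t1 for (t1, _), (t2, _) in zip(actions, actions[1:])]
    threshold = factor * mean(delays)
    traces, current = [], [actions[0][1]]
    for delay, (_, action) in zip(delays, actions[1:]):
        if delay > threshold:  # a gap: the current trace ends here
            traces.append(current)
            current = []
        current.append(action)
    traces.append(current)
    return traces
```

Take actions at times 0, 1, 2, 50 and 51. The mean delay is 12.75, so only the 48-unit delay exceeds the threshold of 38.25, and the list is split into two traces. The combined modes could chain both functions, provided the timestamps are kept alongside the actions.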
At the end of this step, we obtain a trace set denoted Traces(SUL), which gathers traces of the form $a_1(\alpha_1) \ldots a_k(\alpha_k)$.

Step 2: Trace analysis and extraction
This step identifies component behaviours in the traces of Traces(SUL); it splits them and returns a set $STraces = \{T_1, \ldots, T_n\}$ such that each trace set T of STraces includes traces of one component. Algorithm 1, which implements this step, is mostly based on two procedures. The procedure Inspect covers the traces of Traces(SUL) and segments them into sub-sequences. The procedure Extract then extracts these sequences and places them into new trace sets of STraces. The trace sets of STraces will later produce LTSs. These procedures are explained below.

Trace analysis (procedure Inspect)
The fundamental idea of COnfECt is that a component should be recognisable by its behaviour in comparison with the behaviours of the other components. We hence cover the traces of SUL with a correlation coefficient, which helps recognise different component behaviours. This coefficient evaluates the correlation of action sequences in the traces of Traces(SUL), i.e. the degree to which successive actions are related according to all the traces of Traces(SUL). We want a flexible coefficient that can be adapted to the kind of system under learning and to the knowledge we have about this system. We define the correlation coefficient between two actions by means of a utility function, which involves a weighting process for representing user priorities and preferences. We have chosen the simple additive weighting (SAW) technique (Yoon and Hwang 1995), which allows the interpretation of these preferences with weights:
Definition 4 (Correlation coefficient) Let $a_1(\alpha_1), a_2(\alpha_2)$ be two actions and $f_1, \ldots, f_k$ be correlation factors taking values in $[0, 1]$. $Corr(a_1(\alpha_1), a_2(\alpha_2))$ is a utility function, defined as $0 \le Corr(a_1(\alpha_1), a_2(\alpha_2)) = \sum_{i=1}^{k} w_i f_i(a_1(\alpha_1), a_2(\alpha_2)) \le 1$, where $w_i \in [0, 1]$ is the weight of the factor $f_i$ and $\sum_{i=1}^{k} w_i = 1$.
The factors can be general or established with regard to a specific context, e.g. network systems and web applications. We give below two factor examples:
- $f_1(a_1(\alpha_1), a_2(\alpha_2)) = 1$ if $\alpha_1$ and $\alpha_2$ include the same assignment of the parameters that identify every component. Otherwise, $f_1(a_1(\alpha_1), a_2(\alpha_2)) = 0$. When this factor is used, it is assumed that components are identified with a parameter set and that this set is known and given.
- $f_2(a_1(\alpha_1), a_2(\alpha_2)) = \max\left(\frac{freq(a_1 a_2)}{freq(a_1)}, \frac{freq(a_1 a_2)}{freq(a_2)}\right)$, with $freq(a_1 a_2)$ the frequency of having the two labels $a_1, a_2$ one after the other in Traces(SUL) and $freq(a_1)$ the frequency of the label $a_1$. This factor, used in text mining, relates the frequency of the term $a_1 a_2$ in Traces(SUL) to the frequency of $a_1$ and to that of $a_2$, which avoids the bias of getting a low factor when $a_1$ (resp. $a_2$) is very frequent.
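The correlation coefficient and the two factors can be prototyped in a few lines of Python. This sketch makes three assumptions. Actions are encoded as (label, assignment) pairs. `freq` is an occurrence count over Traces(SUL). f2 keeps the larger of the two ratios. The helper names and the encoding are hypothetical.

```python
from collections import Counter


def label_frequencies(traces):
    """Count labels, and pairs of successive labels, over all traces (data ignored)."""
    single, pairs = Counter(), Counter()
    for trace in traces:
        labels = [label for label, _ in trace]
        single.update(labels)
        pairs.update(zip(labels, labels[1:]))
    return single, pairs


def make_f1(id_params):
    """f1: 1 when both actions assign the same values to the component-identifying parameters."""
    def f1(x, y):
        (_, alpha1), (_, alpha2) = x, y
        same = all(p in alpha1 and p in alpha2 and alpha1[p] == alpha2[p] for p in id_params)
        return 1.0 if same else 0.0
    return f1


def make_f2(traces):
    """f2: max(freq(a1 a2) / freq(a1), freq(a1 a2) / freq(a2)) on the label sequences."""
    single, pairs = label_frequencies(traces)

    def f2(x, y):
        a1, a2 = x[0], y[0]
        both = pairs[(a1, a2)]
        if both == 0:
            return 0.0
        return max(both / single[a1], both / single[a2])
    return f2


def corr(x, y, weighted_factors):
    """SAW utility: sum of w_i * f_i(x, y); weights summing to 1 keep Corr in [0, 1]."""
    if abs(sum(w for w, _ in weighted_factors) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * f(x, y) for w, f in weighted_factors)
```

Each occurrence of the pair a1 a2 is also an occurrence of a1 and of a2. Both ratios therefore stay in [0, 1], and the weighted sum does too.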
The first factor requires some knowledge about SUL, while the second one is more general. Other factors could also be defined. The choice or definition of the factors should be made by an expert of SUL. If he/she has a good knowledge of SUL, he/she can choose the factor that detects components most precisely. In contrast, if no information about SUL is known, we recommend the factor $f_2$. This factor choice may be seen as a disadvantage of the approach; this is discussed in Section 7.5. Other factors might be defined with regard to the action syntax. For instance, string similarities could be used as factors to correlate actions on their common characters. We refer to Cohen et al. (2003) for the presentation and definition of some of them.
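Following the remark on string similarities, a factor can compare the labels of two actions character by character. The sketch below is not taken from Cohen et al. (2003). It offers two alternatives: the Ratcliff/Obershelp ratio of Python's standard `difflib.SequenceMatcher`, and a Jaccard index on character bigrams. Both return values in [0, 1], so either can be plugged into the SAW coefficient. Actions are assumed to be (label, assignment) pairs.

```python
from difflib import SequenceMatcher


def f_ratio(x, y):
    """Ratcliff/Obershelp similarity of the two labels (1.0 for identical labels)."""
    return SequenceMatcher(None, x[0], y[0]).ratio()


def bigrams(s):
    """Set of the character bigrams of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def f_jaccard(x, y):
    """Jaccard index of the character bigrams of the two labels."""
    g1, g2 = bigrams(x[0]), bigrams(y[0])
    if not g1 and not g2:  # labels shorter than two characters
        return 1.0 if x[0] == y[0] else 0.0
    return len(g1 & g2) / len(g1 | g2)
```

For example, the labels abcd and abce share two of their four distinct bigrams, so `f_jaccard` gives 0.5.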
From this correlation coefficient, we define two relations to express the notions of strong and weak correlation of actions and action sequences. We say that strong-corr($\sigma_1$) holds when all the successive actions of $\sigma_1$ strongly correlate. We also define the weak correlation of two action sequences: $\sigma_1$ weak-corr $\sigma_2$ holds when the last action of $\sigma_1$ does not strongly correlate with the first one of $\sigma_2$. In data and text mining, these notions often depend on the considered context, which is why we use a threshold X in the definition given below. This threshold takes a value between 0 and 1 and needs to be tuned by an expert, for instance after some iterative attempts.

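Definition (Strong and weak correlation) Let $X \in [0, 1]$ be a threshold, and let $\sigma_1 = a_1(\alpha_1) \ldots a_k(\alpha_k)$ and $\sigma_2 = b_1(\beta_1) \ldots b_m(\beta_m)$ be two action sequences.
- strong-corr$(\sigma_1)$ holds iff $Corr(a_i(\alpha_i), a_{i+1}(\alpha_{i+1})) \ge X$ for every $1 \le i < k$;
- $\sigma_1$ weak-corr $\sigma_2$ holds iff $Corr(a_k(\alpha_k), b_1(\beta_1)) < X$.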
The trace analysis is performed by the procedure Inspect, given in Algorithm 2. It covers every trace σ of Traces(SUL) and may segment σ into (sub-)sequences such that each sequence $\sigma_1$ is strongly correlated and is weakly correlated with the next sequence $\sigma_2$. We consider that two such successive sequences $\sigma_1 \sigma_2$ express the behaviours of two components: a component produces $\sigma_1$ and calls a second component, which produces $\sigma_2$. In Fig. 3 (STEP 2 Trace Analysis), 3 distinctive sub-sequences have been detected within the trace by means of the factor $f_2$. We consider that these sequences reflect the behaviours of other components that produce their own actions among the actions of a first component, which invokes them.
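A possible Python reading of the procedure Inspect and of the two relations follows. It assumes that `corr` is any function returning the correlation coefficient of two actions. It also assumes that strong correlation means a coefficient at least equal to the threshold X. Algorithm 2 may differ in its details. The toy coefficient in the usage example is purely hypothetical: two actions correlate when their names start with the same letter.

```python
def strong_corr(seq, corr, x):
    """strong-corr(seq): every pair of successive actions correlates at least at threshold x."""
    return all(corr(a, b) >= x for a, b in zip(seq, seq[1:]))


def weak_corr(seq1, seq2, corr, x):
    """seq1 weak-corr seq2: the last action of seq1 correlates below x with the first of seq2."""
    return corr(seq1[-1], seq2[0]) < x


def inspect_trace(trace, corr, x):
    """Cut the trace wherever two successive actions correlate below x."""
    if not trace:
        return []
    segments = [[trace[0]]]
    for prev, nxt in zip(trace, trace[1:]):
        if corr(prev, nxt) >= x:
            segments[-1].append(nxt)  # still the same component
        else:
            segments.append([nxt])    # weak correlation: a new segment starts
    return segments
```

Using that toy coefficient with X = 0.5, the trace d1 j1 j2 d2 is cut into [d1], [j1, j2] and [d2]. Each segment is strongly correlated and weakly correlated with the next one.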

Trace extraction (procedure Extract)
The procedure takes the traces of Traces(SUL) and extracts the sub-sequences detected previously. Intuitively, the procedure splits two successive sequences that are weakly correlated and adds synchronisation actions of the form call $C_i$ and return $C_i$ to model component calls, with $C_i$ referring to a future LTS.
The procedure Extract(σ, Tc, STraces) is given in Algorithm 2. It takes a trace σ, splits it and stores the resulting trace into the trace set Tc. Given a sequence σi of the trace σ = σ1 ... σk, the procedure Extract tries to find the next sequence σj such that strong-corr(σi.σj) holds. The sequence σ' = σi+1 ... σj−1 (or σ' = σi+1 ... σk when σj is not found) is extracted, as it exposes the behaviour of other components that are called by the current one. If σ' is composed of only one sub-sequence, it is added to a new trace set Tn of STraces. Otherwise, Extract is called recursively with Extract(σ', Tn, STraces). In σ, the sequence σ' is removed and replaced by the actions call Cn.return Cn. After every sub-sequence of σ has been covered, Extract finally checks whether σ needs to be completed to express that it was produced by a component called by another one: if Tc is not equal to T1, the trace σ is surrounded with call Cc and return Cc to express that σ stems from a component that was previously called by another one. Otherwise, σ remains unchanged.
Let us illustrate the functioning of the procedure Extract with the example of Fig. 4a, which reuses the trace of Fig. 3. This trace was segmented into seven weakly correlated sequences. We start at σ1 = /devices() (i := 1). The next sequence σ2 = /json.htm.Response expresses the behaviours of another component, so the algorithm has to detect when this component invocation ends. To do so, it looks for the next sequence in the trace that is strongly correlated with σ1; this sequence represents the resumption of the first component's execution after the invocation of another component. In our example, this next sequence is σ3 = Response, which represents the receipt of a response after the action /devices(). The sequence σ2 is extracted and replaced by the actions call C2 return C2. The procedure is not called recursively, as σ2 is not composed of several weakly correlated action sequences. The sequence σ2 is then surrounded with the actions call C2 and return C2 to prepare the LTS synchronisation, and the resulting sequence is added to the new trace set T2. We go back to the trace σ at the sub-sequence σ3 (i := 3). The same process is applied to σ4 and later to σ6 until the algorithm reaches the end of σ (with i := 7). The trace σ becomes σ1.call C2 return C2.σ3.call C3 return C3.σ5.call C4 return C4.σ7. The trace σ comes from Traces(SUL), which means that σ captures the behaviour of the root component (assumption H3), which has not been called by another component. Hence, at the end of the procedure, this trace is not surrounded by synchronisation actions, and σ is placed into the trace set T1. We have thus recovered the hierarchical structure of components depicted in Fig. 4b, and we get four trace sets, gathered into the set STraces given in Fig. 3.
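The extraction step can be sketched in Python, assuming each trace has already been segmented into weakly correlated segments and that trace sets are named "C1", "C2", ... with "C1" standing for T1. The data layout, the names `extract`/`extract_all` and the toy `component_corr` coefficient are hypothetical; one simplification is that a single extracted segment also goes through the recursive call, which then merely surrounds it with call/return, matching the behaviour described above.

```python
from typing import Callable, Dict, List, Sequence

Corr = Callable[[str, str], float]
Segment = List[str]
STraces = Dict[str, List[List[str]]]  # trace set name ("C1", "C2", ...) -> traces


def strong_corr(seq: Sequence[str], corr: Corr, x: float) -> bool:
    return all(corr(a, b) >= x for a, b in zip(seq, seq[1:]))


def extract(segments: List[Segment], tc: str, straces: STraces, corr: Corr, x: float) -> None:
    """Rewrite a segmented trace and store it in straces[tc]; the behaviours
    of called components are moved into new trace sets."""
    out: List[str] = []
    i = 0
    while i < len(segments):
        out.extend(segments[i])
        # Next segment strongly correlated with segments[i]: the current component resumes.
        j = next((k for k in range(i + 1, len(segments))
                  if strong_corr(segments[i] + segments[k], corr, x)), len(segments))
        called = segments[i + 1:j]
        if called:
            name = f"C{len(straces) + 1}"   # reserve a new trace set Tn
            straces[name] = []
            # One segment is just surrounded by call/return in the recursive call;
            # several segments are extracted further.
            extract(called, name, straces, corr, x)
            out += [f"call {name}", f"return {name}"]
        i = j
    if tc != "C1":  # produced by a called component: surround with call/return
        out = [f"call {tc}"] + out + [f"return {tc}"]
    straces[tc].append(out)


def extract_all(segmented_traces: List[List[Segment]], corr: Corr, x: float) -> STraces:
    straces: STraces = {"C1": []}  # T1 receives the rewritten traces of the root component
    for segments in segmented_traces:
        extract(segments, "C1", straces, corr, x)
    return straces


def component_corr(a: str, b: str) -> float:
    """Toy coefficient (hypothetical): 1 if both labels share the 'component:' prefix."""
    return 1.0 if a.split(":")[0] == b.split(":")[0] else 0.0


# Seven segments shaped like the example of Fig. 4a (labels are made up).
segs = [["A:a1"], ["B:b1"], ["A:a2"], ["B:b2"], ["A:a3"], ["B:b3"], ["A:a4"]]
st = extract_all([segs], component_corr, 1.0)
print(st["C1"][0])
print(sorted(st))  # ['C1', 'C2', 'C3', 'C4']
```

As in the worked example, the root trace becomes a1.call C2 return C2.a2.call C3 return C3.a3.call C4 return C4.a4, and four trace sets are produced.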
Once the procedure Extract terminates, Algorithm 1 yields the set STraces = {T1, T2, ..., Tn}, where T2, ..., Tn are sets each including one action sequence and T1 is a set of modified traces originating from Traces(SUL).
Fig. 4 a, b Sequence extraction example

Step 3: LTS synchronisation
This step lifts the traces of STraces to the level of LTSs and proposes three LTS synchronisation strategies, which provide systems of LTSs with different levels of generalisation.
Given a trace set T ∈ STraces, a trace σ = a1 ... ak of T is transformed into the LTS path $q_0 \xrightarrow{a_1 \dots a_k} q_k$, where the states q1, ..., qk are new states. These paths are joined by a disjoint union on the state q0 to build an LTS in tree form:

Definition 6 (LTS generation) Let T = {σ1, ..., σm} be a trace set. C = ⟨Q, q0, Σ, →⟩ is the LTS derived from T, where:
- q0 is the initial state;
- Q, Σ and → are defined by the following rule:

$$\frac{\sigma_i = a_1 \dots a_k \in T}{q_0 \xrightarrow{a_1} q_{i,1} \xrightarrow{a_2} q_{i,2} \cdots q_{i,k-1} \xrightarrow{a_k} q_{i,k}}$$

Once every trace set of STraces is transformed into an LTS, we have a first system of LTSs SC = ⟨S, C⟩, with C the set of LTSs derived from STraces and S the set of synchronisation actions of the form call Ci and return Ci found in the action sets of the LTSs.
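Definition 6 can be sketched as follows, assuming states are represented by integers with q0 = 0 and transitions by (source, action, target) triples; the `LTS` class and `lts_from_traces` are hypothetical names. Note that every trace receives fresh states, so the paths share only q0 (a disjoint union), unlike a prefix tree acceptor, which would also share common prefixes.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Transition = Tuple[int, str, int]  # (source state, action, target state)


@dataclass
class LTS:
    initial: int = 0
    states: Set[int] = field(default_factory=lambda: {0})
    transitions: Set[Transition] = field(default_factory=set)

    @property
    def actions(self) -> Set[str]:
        return {a for _, a, _ in self.transitions}


def lts_from_traces(traces: List[List[str]]) -> LTS:
    """Each trace becomes a path from q0 through fresh states; paths only share q0."""
    lts = LTS()
    next_state = 1
    for trace in traces:
        src = lts.initial
        for action in trace:
            dst = next_state
            next_state += 1
            lts.states.add(dst)
            lts.transitions.add((src, action, dst))
            src = dst
    return lts


c = lts_from_traces([["a", "b"], ["a", "c", "d"]])
print(len(c.states), len(c.transitions))  # 6 5
```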

The previous step of COnfECt has segmented and extracted the traces of Traces(SUL) in such a way that they include synchronisation actions. These actions were added to prepare the synchronisation of components with LTSs.
We now propose three strategies, which adapt the transitions labelled by synchronisation actions to address challenge 2. These are implemented in Algorithm 3.

Strict synchronisation
Algorithm 1 has previously segmented every trace of Traces(SUL) into sub-sequences of actions. When a sub-sequence is extracted, it is placed into a new trace set in STraces and replaced by the actions call Ci.return Ci. The LTSs of SC, derived from STraces, do not repetitively call other LTSs and are composed of acyclic paths only. We call this LTS configuration strict synchronisation. This strategy, which is mostly and implicitly implemented in Algorithm 1, eventually calls the kTail algorithm to merge the similar states found in the LTSs of SC. It limits over-generalisation, i.e. the generation of models expressing more behaviours than those given in the initial trace set Traces(SUL). This is captured more formally by the following proposition, which states that, before kTail is called, the traces of SC leading to final states are exactly the traces of Traces(SUL).
Proposition 1 Let SC = ⟨S, C⟩ be a system of LTSs obtained with the strict synchronisation strategy (before the call of kTail), with C = {C1, ..., Cn}, and let QF be the set of final states of the composition of the LTSs C1, C2, ..., Cn. Then $Traces_{QF}(SC) = Traces(SUL)$.

Weak synchronisation
This strategy aims at reducing the number of LTSs and allows repetitive component calls. Algorithm 1 may indeed have refined Traces(SUL) too much; hence, the system of LTSs SC might include several LTSs modelling the functioning of the same component. This strategy attempts to gather these LTSs by means of an LTS similarity coefficient, which evaluates the similarity of two LTSs. Like the correlation coefficient, the LTS similarity is defined with a utility function and factors so as to be compatible with different sorts of systems:

Definition 7 (LTS similarity coefficient) Let Ci = ⟨Qi, q0i, Σi, →i⟩ (i = 1, 2) be two LTSs of the system of LTSs SC = ⟨S, C⟩, and let f1, ..., fk be LTS similarity factors. The LTS similarity of C1 and C2 is defined as a utility function of the factor values f1(C1, C2), ..., fk(C1, C2).

We provide two similarity factors below. The first one again refers to the component identification, just like the correlation factor f1. The second factor measures the similarity of two LTSs with regard to the actions they share.
f1(C1, C2) = 1 iff ∀ a1(α1), a2(α2) ∈ (Σ1 ∪ Σ2) \ S, Id(α1) = Id(α2), with Id(α) the assignment in α of the parameters that identify every component. Otherwise, f1(C1, C2) = 0. This implies that two similar LTSs must have actions carrying the same component identification. The factor is not applied to the synchronisation actions of S, which were added by the previous step of COnfECt.
f2(C1, C2) = Overlap(Σ1 \ S, Σ2 \ S), with the overlap of two sets A and B defined by |A ∩ B| / min(|A|, |B|). Several general coefficients are available in the literature for comparing the similarity and diversity of sets, e.g. the Jaccard or SMC coefficients (Tan et al. 2005). We have chosen the Overlap coefficient because the action sets of two LTSs may have different sizes: the coefficient equals 1 whenever one action set is included in the other, so an LTS capturing only part of a component's behaviour can still be found similar to a larger LTS of the same component.
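Both factors can be sketched over plain action sets. The sketch assumes that synchronisation actions are recognisable by their "call " / "return " prefix and that component identification is supplied as a function `component_id`; both are hypothetical conventions, as is returning 0 for an empty set in `overlap`.

```python
from typing import Callable, Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Overlap (Szymkiewicz-Simpson) coefficient |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / min(len(a), len(b))


def is_sync(action: str) -> bool:
    """Assumed naming of synchronisation actions: 'call Ci' / 'return Ci'."""
    return action.startswith(("call ", "return "))


def non_sync(actions: Set[str]) -> Set[str]:
    return {a for a in actions if not is_sync(a)}


def f2(actions1: Set[str], actions2: Set[str]) -> float:
    """Factor f2: overlap of the two action sets, synchronisation actions excluded."""
    return overlap(non_sync(actions1), non_sync(actions2))


def f1(actions1: Set[str], actions2: Set[str], component_id: Callable[[str], str]) -> float:
    """Factor f1: 1 iff all non-synchronisation actions carry the same component identification."""
    ids = {component_id(a) for a in non_sync(actions1 | actions2)}
    return 1.0 if len(ids) <= 1 else 0.0


# {a, b} is included in {a, b, c}: the overlap is 1 although the sets differ in size.
print(f2({"a", "b", "call C2"}, {"a", "b", "c", "return C3"}))  # 1.0
```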
The weak synchronisation strategy is implemented in Algorithm 3, lines 3-16. It computes the LTS similarity of every pair of LTSs of SC. The similar LTSs are then grouped by means of a clustering technique that uses the LTS similarity coefficients, and the LTSs of the same cluster are joined with a disjoint union. Furthermore, the labels of the transitions $q_1 \xrightarrow{\text{call } C} q_2$ and $q_1 \xrightarrow{\text{return } C} q_2$ are updated accordingly so that the correct LTSs are called (Algorithm 3, lines 11-14). In addition, every sequence $q_1 \xrightarrow{\text{call } C \,.\, \text{return } C} q_2$ is replaced with a loop on $q_1$ by merging both states $q_1$ and $q_2$.
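The state merging that turns call/return sequences into loops can be sketched with a small union-find over integer states. The function name `loop_call_returns`, the triple representation of transitions and the "call C" / "return C" label convention are assumptions for illustration; the relabelling of call targets after clustering (Algorithm 3, lines 11-14) is not covered.

```python
from typing import Dict, Set, Tuple

Transition = Tuple[int, str, int]  # (source state, action, target state)


def loop_call_returns(transitions: Set[Transition], initial: int = 0) -> Set[Transition]:
    """Merge q2 into q1 for every path q1 -call C-> m -return C-> q2,
    so that the call/return pair becomes a loop on q1."""
    parent: Dict[int, int] = {}

    def find(q: int) -> int:
        while parent.get(q, q) != q:
            q = parent[q]
        return q

    for q1, call, mid in transitions:
        if not call.startswith("call "):
            continue
        callee = call[len("call "):]
        for src, ret, q2 in transitions:
            if src == mid and ret == f"return {callee}":
                r1, r2 = find(q1), find(q2)
                if r2 == find(initial):  # keep the initial state as class representative
                    r1, r2 = r2, r1
                if r1 != r2:
                    parent[r2] = r1
    return {(find(s), a, find(d)) for s, a, d in transitions}


t = {(0, "a", 1), (1, "call C2", 2), (2, "return C2", 3), (3, "b", 4)}
print(sorted(loop_call_returns(t)))
# [(0, 'a', 1), (1, 'b', 4), (1, 'call C2', 2), (2, 'return C2', 1)]
```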

Strong synchronisation
This strategy aims at providing more general models than the weak strategy, by assuming that a component C1, which requests services from other components, may repetitively call them in any of its states. We denote by R the set of LTSs modelling components that are invoked by C1. We say that C1 is callable-complete when C1 may call any LTS C2 of R in any of its states:

Definition 8 (Callable-complete LTS) Let SC = ⟨S, C⟩ be a system of LTSs and C1 = ⟨Q1, q01, Σ1, →1⟩ ∈ C. R stands for the set of LTSs sharing synchronisation actions with C1. C1 is callable-complete iff, for every state q ∈ Q1 and every C2 ∈ R, q has an outgoing transition sequence labelled by call C2.return C2.
The strategy is implemented in Algorithm 3, lines 3-21. As with the weak synchronisation strategy, the similar LTSs of SC are assembled into bigger LTSs and the transitions labelled by synchronisation actions are updated accordingly. Additionally, every state q of the LTSs is completed with new outgoing transitions of the form $q \xrightarrow{\text{call } C \,.\, \text{return } C} q$ so that the LTSs of SC become callable-complete.
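The completion step can be sketched as below, assuming integer states, (source, action, target) transitions and the "call C" / "return C" labels; `make_callable_complete` and `has_call_loop` are hypothetical helpers. Each missing loop is built through a fresh intermediate state, and only the states given as input are completed, so the freshly created intermediate states do not receive loops themselves.

```python
from typing import Iterable, Set, Tuple

Transition = Tuple[int, str, int]  # (source state, action, target state)


def has_call_loop(transitions: Set[Transition], q: int, callee: str) -> bool:
    """True if q -call callee-> m -return callee-> q already exists for some m."""
    mids = {d for s, a, d in transitions if s == q and a == f"call {callee}"}
    return any(s in mids and a == f"return {callee}" and d == q for s, a, d in transitions)


def make_callable_complete(states: Set[int], transitions: Set[Transition],
                           callees: Iterable[str]) -> Tuple[Set[int], Set[Transition]]:
    """Add a loop q -call C-> m -return C-> q (m fresh) for every given state q
    and every callee C that q cannot call yet."""
    callees = list(callees)
    new_states, new_transitions = set(states), set(transitions)
    next_state = max(new_states) + 1
    for q in sorted(states):  # fresh intermediate states are not completed themselves
        for callee in callees:
            if not has_call_loop(new_transitions, q, callee):
                new_states.add(next_state)
                new_transitions |= {(q, f"call {callee}", next_state),
                                    (next_state, f"return {callee}", q)}
                next_state += 1
    return new_states, new_transitions


s, t = make_callable_complete({0, 1}, {(0, "a", 1)}, ["C2"])
print(len(s), len(t))  # 4 5
```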
The weak and strong synchronisation strategies produce more general systems of LTSs than the strict strategy. This is captured by the following proposition:

Proposition 2 Let SC = ⟨S, C⟩ be a system of LTSs obtained with the weak or strong synchronisation strategy (before the call of kTail), with C = {C1, ..., Cn}, and let QF be the set of final states of the composition of the LTSs C1, C2, ..., Cn. Then $Traces_{QF}(SC) \supset Traces(SUL)$.
For the three strategies, the LTSs of SC = ⟨S, C⟩ may include equivalent states, which should be merged to generate more concise models. As stated previously, we use the kTail approach, which merges the states that share the same k-future. We use k = 2 as recommended by Lorenzoli et al. (2008) and Lo et al. (2012).

Figure 3 illustrates the use of the weak strategy. Each trace set of STraces is first transformed into an LTS (STEP 3A). As the trace sets are composed of only one action sequence, we get LTSs having one path. Then, the similar LTSs have to be grouped. To define the LTS similarity coefficient, we choose the factor f2, and we compute a similarity matrix by means of the LTS similarity coefficient. Figure 5 shows the matrix obtained with the four LTSs of our example. If we set the LTS similarity threshold Y to 0.5, two classes of similar LTSs emerge in this matrix: (C1) and (C2, C3, C4). A clustering technique, e.g. Ward's method (Willett 1988), can help automate this grouping of similar LTSs. The similar LTSs are then joined by means of a disjoint union. As we chose the weak synchronisation strategy, the transition sequences of the form $q_1 \xrightarrow{\text{call } C_m \,.\, \text{return } C_m} q_2$ have been replaced with loops in C1. We finally obtain two LTSs (STEP 3B): C1 expresses the use of the Web interface, and C234 models the component that sends data (temperature, motion detection) to a server. The LTS C234 holds three classes of equivalent states, (q3, q4, q5), (q6, q7, q8) and (q9, q10, q11), which are merged with kTail (STEP 3C).
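The state-merging idea of kTail can be sketched as follows, assuming integer states and (source, action, target) transitions. This is a deliberately simplified, single-pass variant written for illustration: it computes each state's k-future (label sequences of length k, or shorter when a state without outgoing transitions is reached) and merges states with equal k-futures. The original algorithm (Biermann and Feldman 1972) and the implementations cited above may iterate merges and handle non-determinism differently.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Tuple

Transition = Tuple[int, str, int]  # (source state, action, target state)
Future = FrozenSet[Tuple[str, ...]]


def k_futures(states: Set[int], transitions: Set[Transition], k: int) -> Dict[int, Future]:
    """k-future of each state: label sequences of length k (shorter if a state
    without outgoing transitions is reached first)."""
    succ = defaultdict(list)
    for s, a, d in transitions:
        succ[s].append((a, d))

    def future(q: int, depth: int) -> Set[Tuple[str, ...]]:
        if depth == 0 or not succ[q]:
            return {()}
        return {(a,) + w for a, d in succ[q] for w in future(d, depth - 1)}

    return {q: frozenset(future(q, k)) for q in states}


def ktail_merge(states: Set[int], transitions: Set[Transition], initial: int, k: int = 2):
    """Single-pass simplification of kTail: states sharing the same k-future are
    merged into one representative (the smallest state id of the class)."""
    fut = k_futures(states, transitions, k)
    rep_of_future: Dict[Future, int] = {}
    rep: Dict[int, int] = {}
    for q in sorted(states):
        rep[q] = rep_of_future.setdefault(fut[q], q)
    merged = {(rep[s], a, rep[d]) for s, a, d in transitions}
    return set(rep.values()), merged, rep[initial]


# Two identical branches a.b from q0: states 3 and 4 merge with 1 and 2.
states, trans, q0 = ktail_merge({0, 1, 2, 3, 4},
                                {(0, "a", 1), (1, "b", 2), (0, "a", 3), (3, "b", 4)}, 0)
print(sorted(states), sorted(trans))  # [0, 1, 2] [(0, 'a', 1), (1, 'b', 2)]
```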

Implementation
Our approach is implemented in Java and released as open source. The prototype tool consists of two applications. The first step of COnfECt, which performs the trace formatting by means of regular expressions, is implemented in a first tool called TFormat.¹ However, end users may prefer their own trace formatting tool, such as LogParser, which automatically learns event templates from unstructured logs (Zhu et al. 2018).
The second application² performs the last steps of the approach. The user provides as inputs a folder containing formatted trace files, the chosen factors, the related thresholds and an LTS synchronisation strategy. For the weak and strong strategies, we use a clustering approach based on Ward's method, a well-known agglomerative hierarchical clustering method. In short, the LTS clustering is carried out as follows: (1) each LTS is placed into its own initial cluster and the similarity coefficients are computed; (2) the two clusters with the highest similarity (provided it is greater than the given threshold Y) are merged, the similarity coefficients are updated, and so on until no similar clusters remain. This approach avoids generating overly large clusters and does not require the number of clusters to be specified in advance.
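The threshold-driven agglomerative loop described above can be sketched as follows. To keep the sketch short, the similarity between two clusters is updated with average linkage over the pairwise LTS similarity coefficients, which is a swapped-in technique and not Ward's method used by the tool; the similarity values in the usage example are made up and do not come from Fig. 5. Merging is allowed when the best similarity is at least Y, so that Y = 1 still works with factors taking only the values 0 and 1.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def cluster_ltss(names: List[str], sim: Dict[FrozenSet[str], float], y: float) -> List[List[str]]:
    """Agglomerative clustering with a similarity threshold y (average linkage)."""
    clusters: List[List[str]] = [[n] for n in names]

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Average pairwise LTS similarity between two clusters.
        return sum(sim[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best < y:  # no pair of clusters is similar enough: stop
            break
        clusters[i] += clusters[j]  # i < j, so deleting index j afterwards is safe
        del clusters[j]
    return clusters


# Toy similarity values (hypothetical).
sim = {frozenset(("C1", c)): 0.2 for c in ("C2", "C3", "C4")}
sim.update({frozenset(("C2", "C3")): 0.8, frozenset(("C2", "C4")): 0.7,
            frozenset(("C3", "C4")): 0.9})
print(cluster_ltss(["C1", "C2", "C3", "C4"], sim, y=0.5))  # [['C1'], ['C2', 'C3', 'C4']]
```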

Preliminary evaluation
With this implementation of COnfECt, we conducted several experiments to evaluate the following criteria:
- C1 (component detection): is COnfECt able to detect the correct number of components? The key contribution of COnfECt is its ability to detect sub-sequences in traces and to link them to separate components. We studied C1 with a real device that we implemented and whose internal architecture is known. Our knowledge of the system under learning allowed us to study the inferred LTSs and to check that they do not mix the behaviours of several real components.
- C2 (relevance of the models): is COnfECt able to infer concise and readable models, which express system behaviours and reject abnormal behaviours? C2 investigates how well the inferred models accept valid traces, including new traces not used for the model generation, and how well they reject invalid traces. We compare COnfECt to CSight and kTail, as kTail is used as a baseline in several papers, e.g. Lo et al. (2012) and Ohmann et al. (2014).
- C3 (efficiency/scalability): how long does COnfECt take to generate systems of LTSs, and how does it scale with the size of the trace set? We study the efficiency and scalability of COnfECt, compared with CSight and kTail.

Empirical setup
For this evaluation, we chose a real system that we implemented, so as to be able to appraise the accuracy of the generated models. The system under learning is the connected thermostat used as a running example in this paper, whose source code has been made available.³ This IoT device controls heating pumps according to external events and integrates three components: a sensor manager coordinating four physical sensors, a component that updates the internal clock of the device by calling an NTP server, and a Web server allowing the configuration of the device and the reading of data, e.g. the temperature. These components meet the requirements given in Section 4 and can be monitored to collect HTTP traces. The device has been implemented in such a way that each component may be turned on or off without blocking the functioning of the others. This feature is important for this evaluation, as it lets us derive several different models: the effect that events have on the behaviour of the system depends on the set of activated components. For instance, if the physical sensors are turned off, the thermostat will not start heating pumps when the temperature is below a given threshold. We ran this IoT device with several component configurations using 1 to 3 components. The HTTP traces were formatted with our tool TFormat and 10 regular expressions. These traces have the same form as the trace given in Fig. 2. The trace sets are available online.⁴ The LTS generation was performed on a desktop computer with one Intel(R) CPU i5-6500 @ 3.2 GHz and 16 GB RAM.

We adapted the smallest trace set collected previously to compare COnfECt and CSight. However, we were unable to obtain results with CSight within 5 h of computation, which was our limit for each experiment. We observed that the first steps of CSight were completed, but the model checker returned successive time-outs during the model refinement step. We suspect that the model checker is unable to check invariant satisfiability on large trace sets.
Therefore, to compare CSight and COnfECt, we reused two trace sets distributed with the CSight implementation. The first one was extracted from TCP logs, and the second one from logs of the AlternatingBit protocol.

Factor choice and thresholds assessment
The correlation and LTS similarity coefficients have to be defined by setting factors, weights and thresholds. For the experiments, we used the factor combinations f1/f1 and f2/f2. The factors f1/f1 require an expert of the system to provide the parameters allowing the identification of all the components of SUL. The only thresholds we used with f1/f1 are X = Y = 1, which intuitively means that two action sequences are strongly correlated, or that two LTSs are similar, if they share the same component identification. We applied this factor combination to the IoT device and to the logs of the TCP and AlternatingBit protocols, and obtained three configurations (Conf. 10, TCP1, Alt.Bit1) given in Table 1. The factor combination f2/f2, which is based on the labels found in traces, does not require any specific information about the system under learning. However, the choice of the thresholds has a strong influence on the accuracy of the models, and an expert of the system should assess both this accuracy and the thresholds. For the experiments, we applied the following protocol:
1. Generate the first models with the default thresholds X ≥ 0.75 and Y ≥ 1.
2. Analyse the models generated with the strict strategy. If |STraces| is lower than the expected number of components, or if some action sequences in the traces of STraces seem to belong to several components, increase the threshold X. Conversely, if the traces appear over-segmented, decrease X. To find the most appropriate value, start from X = 1 or X = 0.1 and follow a bisection method.
3. When the weak or strong synchronisation strategy is chosen, analyse the generated LTSs. If two LTSs seem to capture the behaviours of the same component, decrease Y. To find the most appropriate value, start from Y = 0.1 and follow a bisection method.

We applied this protocol to the IoT device and the two protocols with the configurations Conf. 1 to 9, TCP2 and Alt.Bit2 given in Table 1. Conf. 7 to 9 show the three steps of the protocol.
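The bisection in steps 2 and 3 can be sketched as a search over an interval, where the expert's judgement after each COnfECt run is modelled by a hypothetical callback `judge(x)` returning +1 (increase the threshold), -1 (decrease it) or 0 (acceptable). This is an illustration of the protocol, not part of the COnfECt tool; it bisects the interval [0.1, 1] rather than starting exactly at one of its ends.

```python
from typing import Callable


def bisect_threshold(judge: Callable[[float], int], low: float = 0.1, high: float = 1.0,
                     max_steps: int = 10) -> float:
    """Bisection over a threshold. judge(x) returns +1 if the threshold should
    increase, -1 if it should decrease, and 0 if the resulting models are acceptable."""
    x = (low + high) / 2
    for _ in range(max_steps):
        verdict = judge(x)
        if verdict == 0:
            return x
        if verdict > 0:
            low = x   # too few components: explore higher thresholds
        else:
            high = x  # over-segmented traces: explore lower thresholds
        x = (low + high) / 2
    return x


# Hypothetical expert who accepts any X between 0.3 and 0.35.
def judge(x: float) -> int:
    return 0 if 0.3 <= x <= 0.35 else (1 if x < 0.3 else -1)


print(round(bisect_threshold(judge), 3))  # 0.325
```

Each call to `judge` stands for one model generation followed by an expert inspection, so the number of bisection steps bounds the number of COnfECt runs.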
Finally, we collected and formatted a set of 20 traces (resp. 50) composed of about 50 actions with Conf. 1-6 (resp. Conf. 9, 10), and used a set of 10 traces (98 actions) for the AlternatingBit and TCP case studies. Table 1 lists the number of LTSs inferred by COnfECt. For comparison purposes, it also recalls the exact number of components for each system configuration.

C1 (Component detection)
The lines Conf. 1-6, 9 and 10 show the results obtained with COnfECt when the thresholds X and Y are correctly set after following the protocol defined in Section 7.1.1. The approach detects the correct number of components in Conf. 1 to 3, whatever the strategy used. With Conf. 4-6, 9 and 10, the strict strategy provides too many LTSs because the second step of COnfECt refines the traces too much. However, the weak and strong strategies provide the correct number of components because they assemble similar LTSs together.
Conf. 7 to 9 illustrate the incremental use of COnfECt to find the appropriate thresholds X and Y. The component detection is incorrect in Conf. 7 and 8, whatever the strategy used. In Conf. 7, we observed that the initial traces were over-segmented. We hence decreased the threshold X to 0.6 for the correlation coefficient and reran COnfECt. With Conf. 8, we observed that no similar LTSs were found and decreased the threshold Y to 0.75. With Conf. 9, COnfECt detects the correct number of components with the weak and strong strategies. Thanks to our knowledge of the SUL implementation, we manually analysed the LTSs built with the configurations and strategies giving a correct number of components. We did not detect any mixture of component behaviours and observed that each LTS expresses the behaviours of a real component.
With the configurations TCP2 and Alt.Bit2, COnfECt cannot detect component behaviours. As the factor f2 computes term frequencies, its accuracy depends on the size of the trace set. With these configurations, we observed that the trace sets are too small for computing relevant frequencies: either the number of detected components is incorrect (TCP2) or the models are incorrect (Alt.Bit2).
With Conf. 10, TCP1 and Alt.Bit1, the trace segmentation and the LTS similarity are based on the component identification (factors f1/f1). With these configurations, the number of components is correctly detected with the weak and strong strategies, without the threshold tuning required by f2/f2.
These experiments show that COnfECt addresses challenge 1, but with the factor combination f2/f2, a large trace set is required and the factor thresholds must be adjusted. The general functioning of COnfECt is illustrated by Conf. 4-6, 9 and 10: the strict strategy refines the traces and often returns too many LTSs, and the weak and strong strategies counterbalance this trace refinement.

C2 (Relevance of the models)
Regarding the results of Table 1, it is worth noting that we infer irrelevant models if the given thresholds do not allow a correct component detection. As stated earlier, the difficulty of choosing the thresholds depends on the factors. For instance, the factors f1/f1 can each take only two values (0 or 1), whereas the thresholds for f2/f2 have to be assessed over several model generation attempts.

Valid and invalid trace acceptance
We first analysed the ability of the generated models to accept valid traces and to reject invalid ones. Valid traces are traces collected from the system under learning that were not used for the model generation. Invalid traces include unexpected actions that should be rejected by the models.
As the model quality depends on the approach, the coefficient thresholds and the strategy, we checked the trace acceptance of the 30 models produced by kTail, COnfECt and CSight with the configurations Conf. 4-6, 9, 10, TCP1 and Alt.Bit1. For COnfECt, we recall that the models generated with these configurations capture the behaviours of two or more components and are built with correct thresholds. Seventy percent of the traces collected in these configurations were used for inferring models, and the remaining 30% were used as valid traces for testing the models.
Then, we automatically generated invalid traces by applying three mutations to the valid ones: random repetitions of actions, inversion of HTTP requests and responses, and modifications of the strongly correlated sequences in traces. This last modification was performed on traces after the second step of COnfECt: for two consecutive sequences σ1 = a1 ... ak and σ2 = a'1 ... a'm, we swapped the two actions ak and a'1. We produced 120 traces of 40 actions for Conf. 4-6, 9-10 and 30 traces of 20 actions for the configurations TCP1 and Alt.Bit1.

Figure 6a shows the rates of valid traces accepted by the models. In comparison with kTail, the strict strategy of COnfECt gives models that accept slightly fewer valid traces. The systems of LTSs obtained with this strategy are indeed less general because of the partitioning of the initial trace set (fewer states are merged in the final models). Whatever the configuration, the models inferred with the weak and strong strategies of COnfECt accept more valid traces than the models of kTail. This increase is a consequence of allowing repetitive component calls in the LTSs. Unsurprisingly, the strong strategy provides the systems of LTSs that accept the highest rates of valid traces (between 90 and 100%) because these LTSs are callable-complete. For the two last configurations, we observe that 100% of the traces are accepted by the models, whatever the approach. We suspect that the initial trace sets are not large enough to reveal differences between the approaches. We tried to enlarge these trace sets, but CSight was then unable to return a model.

Figure 6b depicts the rates of invalid traces accepted by the models given by kTail, COnfECt and CSight. We observe that the weak strategy and kTail provide the same rates with Conf. 1 to 10. The strong strategy builds models that accept between 5.5 and 17% of invalid traces. After inspection, these invalid traces are mostly composed of repetitive HTTP requests, which are accepted because the models are callable-complete. CSight provides more correct models than COnfECt with TCP1, but less correct models with Alt.Bit1.
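The three mutations and the acceptance check can be sketched as follows. The helper names, the `is_request` predicate used to tell HTTP requests from responses, and the choice of treating every state as accepting when no final states are given are all assumptions made for illustration; the exact mutation procedure used in the experiments is not specified beyond the description above.

```python
import random
from typing import Callable, Iterable, List, Optional, Set, Tuple

Transition = Tuple[int, str, int]  # (source state, action, target state)


def repeat_random_action(trace: List[str], rng: random.Random) -> List[str]:
    """Mutation 1: duplicate one randomly chosen action."""
    i = rng.randrange(len(trace))
    return trace[:i + 1] + [trace[i]] + trace[i + 1:]


def swap_request_response(trace: List[str], is_request: Callable[[str], bool]) -> List[str]:
    """Mutation 2: swap the first request with the response that follows it."""
    for i in range(len(trace) - 1):
        if is_request(trace[i]) and not is_request(trace[i + 1]):
            return trace[:i] + [trace[i + 1], trace[i]] + trace[i + 2:]
    return list(trace)


def swap_segment_boundary(segments: List[List[str]], i: int) -> List[str]:
    """Mutation 3: swap the last action of segment i with the first action of segment i + 1."""
    segs = [list(s) for s in segments]
    segs[i][-1], segs[i + 1][0] = segs[i + 1][0], segs[i][-1]
    return [a for s in segs for a in s]


def accepts(transitions: Set[Transition], initial: int, trace: Iterable[str],
            final: Optional[Set[int]] = None) -> bool:
    """Run a (possibly non-deterministic) LTS on a trace; without `final`,
    every state is treated as accepting."""
    current = {initial}
    for action in trace:
        current = {d for s, a, d in transitions if s in current and a == action}
        if not current:
            return False
    return final is None or bool(current & final)


def acceptance_rate(transitions: Set[Transition], initial: int,
                    traces: List[List[str]]) -> float:
    return sum(accepts(transitions, initial, t) for t in traces) / len(traces)


print(swap_segment_boundary([["a", "b"], ["c", "d"]], 0))  # ['a', 'c', 'b', 'd']
```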
From these experiments, we conclude that COnfECt, with the weak and strong strategies, outperforms kTail and performs at least as well as CSight. The weak strategy provides models that accept slightly more valid traces than kTail and reject the same proportion of invalid traces as the other approaches. The strong strategy tends to give models that accept more valid traces but also more invalid ones.

Model readability
We evaluated the readability of the models generated by COnfECt, CSight and kTail by measuring the model sizes. The first six columns of Table 2 give the number of states and transitions with these configurations. As expected, we obtain bigger LTSs with COnfECt than with kTail and CSight (except with Conf. 1-3, since SUL then has only one component). With the two last configurations, we observe that the models inferred by CSight and kTail are close in size and much more concise than the models of COnfECt. This outcome stems from our algorithm, which completes LTSs with transitions labelled by synchronisation actions. With the strict strategy, the number of states is increased by 1520% because many LTSs are built and never joined, and few equivalent states are found in these LTSs. We observed here that this strategy returns too many LTSs with large trace sets and should be restricted to small trace sets only. The number of states is increased by 50% with the weak strategy and remains the same with the strong one.
Fig. 6 a, b Rates of traces accepted by models
The transitions labelled by synchronisation actions help interpret how components are combined and are required to later compose LTSs. But these transitions are not significant if one wants … Table 2 provides, in the last six columns, the number of states and transitions after applying the hide operation, which removes the transitions labelled by synchronisation actions. The models generated by COnfECt then become more concise than those obtained with kTail. More precisely, the number of states is increased by 14% with the weak+hide strategy in comparison with kTail, but this strategy divides the system behaviours into several smaller LTSs, which are much more readable. The number of states is reduced by 40% when using the strong+hide strategy. For instance, the number of states is equal to 41 in Conf. 9, whereas the LTS obtained with kTail has 92 states. The strong+hide strategy builds models whose sizes are close to those of the models inferred by CSight.
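One common way to implement such a hide operation is ε-removal: transitions labelled by hidden actions are treated as silent moves, every state inherits the visible transitions reachable through silent moves, and unreachable states are dropped. The sketch below follows that textbook construction under our own assumptions (integer states, every state accepting, a hypothetical `hidden` predicate); the paper does not detail how its hide operation is implemented.

```python
from typing import Callable, Dict, Set, Tuple

Transition = Tuple[int, str, int]  # (source state, action, target state)


def hide(transitions: Set[Transition], initial: int,
         hidden: Callable[[str], bool]) -> Set[Transition]:
    """Remove the transitions whose label satisfies `hidden` by epsilon-removal,
    then keep only the part reachable from the initial state."""
    states = {initial} | {s for s, _, _ in transitions} | {d for _, _, d in transitions}
    eps: Dict[int, Set[int]] = {q: set() for q in states}
    for s, a, d in transitions:
        if hidden(a):
            eps[s].add(d)

    def closure(q: int) -> Set[int]:
        # States reachable from q through hidden transitions only.
        seen, stack = {q}, [q]
        while stack:
            for n in eps[stack.pop()]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen

    visible = [(s, a, d) for s, a, d in transitions if not hidden(a)]
    lifted = {(q, a, d) for q in states for p in closure(q) for s, a, d in visible if s == p}
    reachable, stack = {initial}, [initial]
    while stack:
        q = stack.pop()
        for s, _, d in lifted:
            if s == q and d not in reachable:
                reachable.add(d)
                stack.append(d)
    return {t for t in lifted if t[0] in reachable}


def is_sync(action: str) -> bool:
    return action.startswith(("call ", "return "))


t = {(0, "a", 1), (1, "call C", 2), (2, "return C", 3), (3, "b", 4)}
print(sorted(hide(t, 0, is_sync)))  # [(0, 'a', 1), (1, 'b', 4)]
```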
We compared the models generated by COnfECt and CSight with the two last configurations to evaluate their differences. The systems of LTSs of COnfECt are usually less readable as they contain additional synchronisation actions. If we apply the strong+hide strategy, however, CSight and COnfECt generate models of similar sizes. The other differences in behaviour mainly come from the functioning of the two approaches, which do not target the same kind of systems. For example, Fig. 7a and b illustrate the models pid0 and pid1 inferred by CSight for the AlternatingBit protocol, and Fig. 7c and d show the models C1 and C2 given by COnfECt. For the first component, the CFSM of CSight is more concise, but it accepts more invalid behaviours, as it allows the successive sending of the same bit instead of incrementing it. For the second component, the models pid1 and C2 have the same size but seem different: the initial state of the model pid1 of CSight only accepts the input m0, whereas C2 accepts both actions m0 and m1. This difference comes from two observations: (1) COnfECt builds more general models with the strong strategy; (2) COnfECt builds models of components that control other components, whereas CSight builds models of autonomous components. Here, the component C1 requests a service from C2 by sending a first action m0; therefore, C2 will always start by executing the action m0.
These experiments show that the models inferred by our approach are relevant on the condition that the correct coefficient thresholds are given. The three strategies help manage the generalisation level, which relates to challenge 2. We showed that the strong+hide strategy tends to provide readable models that accept the highest ratio of valid traces, with a reasonable ratio of invalid ones. If the user wishes to minimise the over-generalisation problem in models but still wants readable ones, he/she can apply the weak+hide strategy instead.

C3 (efficiency/scalability)
We ran COnfECt and kTail with the parameters of Conf. 9 and several trace sets containing from 10 to 1000 traces of about 50 actions each. As stated earlier, we were unable to run CSight with these traces. Therefore, we also measured the execution times of CSight, kTail and COnfECt with the two trace sets of the TCP and AlternatingBit protocols.
Our implementation of kTail required less than 1 s to generate models. The execution times of COnfECt are illustrated in Fig. 8a-c, in seconds. In the figures, the "Total" curves represent the complete execution times, which are broken down by the other curves depicting the execution times of some sub-steps of COnfECt: Trace Analysis & Extraction, LTS clustering and call of kTail. With the strict strategy and trace sets of no more than 100 traces (10, 20, 50, 100), COnfECt builds systems of LTSs in less than 3 s. We observed that the evaluation of the factor f2 takes most of the time, as the action set needs to be scanned with two nested loops. Hence, it is not surprising that the trend curve confirms a quadratic time complexity. The execution times increase substantially with the weak and strong strategies: with 100 traces, the execution time goes up to 28 s. As the "LTS clustering" curves are close to the "Total" curves, we can conclude that the additional time is consumed by the Ward clustering technique, which also has a quadratic complexity. With the traces of the TCP and AlternatingBit protocols, we observed that CSight is significantly slower than COnfECt: the former takes 3552 ms and 14657 ms respectively to build models, whereas COnfECt takes 48 ms and 46 ms. Furthermore, the time-outs we observed with the current CSight implementation on large trace sets also suggest that CSight might hardly scale to large systems producing large log files. In contrast, COnfECt is able to process large trace sets even on a moderate-budget computer. With 50000 actions (1000 traces), the model generation requires around 50 min, which remains a reasonable execution time.
Concerning memory consumption, these experiments required less than 16 GB of memory; if the trace set exceeds 70000 actions, more memory is required. We observed that the space complexity remains linear w.r.t. the number of traces.
These results suggest that COnfECt can handle large trace sets and infer models in reasonable time. Since the execution time of COnfECt follows a quadratic curve, it is nonetheless difficult to claim that it scales well. Note, though, that the current implementation of COnfECt is not optimised at all: the Trace Analysis & Extraction algorithm could be parallelised, and the Ward clustering technique could be replaced by another algorithm with a lower complexity.

Threats to validity
There are many application and system contexts, but this preliminary experimental evaluation is applied only to two protocols and an IoT device, initialised with different configurations. This is a threat to external validity, in the sense that the results about component detection and model accuracy cannot be generalised to all software systems, which is why we deliberately avoid drawing any general conclusion from these experiments. We chose to concentrate our experiments mainly on one system that we implemented, to be able to appraise the capability of COnfECt to return correct models. This threat is somewhat mitigated by the fact that we used HTTP traces as inputs, which can be collected from numerous web applications. In addition, one of the components of the IoT device is a small Web server running a classical website. We hence believe that our tool can easily be generalised to web applications. Clearly, however, more experiments on further kinds of systems are required.
The generalisation of our approach is also restricted by the three hypotheses H1 to H3. In H1, we assume that internal calls among components are absent from the traces. However, if the synchronisation actions are available in traces, our algorithm could be modified to use them instead of adding synchronisation actions. We also assume that components are not executed in parallel (H2) and that there exists a single root component (H3). With some factors, e.g. f 1 /f 1 , COnfECt could be extended to consider systems having several root components that call other components in parallel. This could be done by identifying every component with the factor f 1 and then splitting traces into sub-traces when parallel executions are detected by analysing the action timestamps. At the moment, however, this modification depends on the employed factors and cannot be generalised.
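The timestamp-based splitting mentioned above can be sketched as follows. The sketch assumes that every action already carries a component label (the role the text gives to f 1) and a start/end time interval; the `Action` record, the interval semantics and both function names are hypothetical. Two actions of different components whose intervals overlap are taken as evidence of parallel execution, in which case the trace is split into one sub-trace per component.

```python
from collections import defaultdict
from typing import NamedTuple


class Action(NamedTuple):
    component: str   # hypothetical label, e.g. assigned with a factor such as f1
    name: str
    start: float     # timestamp at which the action begins
    end: float       # timestamp at which the action ends


def has_parallelism(trace):
    """True if two actions of different components have overlapping time intervals."""
    ordered = sorted(trace, key=lambda a: a.start)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break                  # later actions start even later: no overlap with a
            if b.component != a.component:
                return True
    return False


def split_parallel(trace):
    """Split a trace into one sub-trace per component when parallel executions are detected."""
    if not has_parallelism(trace):
        return [trace]
    by_component = defaultdict(list)
    for action in sorted(trace, key=lambda a: a.start):
        by_component[action.component].append(action)
    return list(by_component.values())


trace = [
    Action("web", "GET /", 0.0, 2.0),
    Action("sensor", "read", 1.0, 1.5),   # overlaps "GET /": parallel execution
    Action("web", "200 OK", 2.0, 2.1),
]
for sub in split_parallel(trace):
    print([a.name for a in sub])
# ['GET /', '200 OK']
# ['read']
```

Sorting by start time lets the inner loop stop as soon as an action begins after the current one ends, so sequential traces are typically checked quickly.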
There are also several threats to internal validity. Firstly, as with all other model learning approaches based on traces, the more traces are available, the more complete the models will be. Furthermore, our approach uses similarity factors and thresholds, like many machine learning approaches. This kind of approach requires some expertise to choose the right factors and thresholds. In our case, generating accurate models appears to be laborious without the expertise needed to adjust the component detection. We indeed observed that an expert is necessary either to provide some information about the components (e.g. means to identify them) or to spot wrong behaviours in the models and follow the threshold choice protocol listed in Section 7.1. Conversely, when the model learning is supervised by an expert, COnfECt infers relevant models in reasonable time.

Conclusion
We have presented COnfECt, a model learning method that generates systems of LTSs from execution traces. A system of LTSs captures the behaviours of components and their synchronisations. COnfECt is made up of several algorithms, themselves based on machine learning techniques, that detect components in traces. Additionally, it proposes three LTS synchronisation strategies, which help manage model generalisation. Learned models are a good way to ease bug detection. As systems of LTSs show how components behave and are synchronised, we believe that, for finding and locating bugs, these models offer better readability and comprehensibility than those inferred by classical model learning tools. Here, a bug can be located more precisely on an LTS, and hence on a specific component.
In future work, we first intend to evaluate COnfECt further on several kinds of systems. From the lessons learned through this work, another immediate line of future work is to reduce the requirements of the approach. COnfECt, which uses machine learning techniques, needs to be supervised by an expert of the system in order to infer correct models. We intend to revise the COnfECt algorithm to better integrate this need for supervision. For instance, we could help engineers find the parameter assignments used to identify components. Alternatively, we could ask them for the expected number of components and then search for the most appropriate factors and thresholds. Another challenge is to remove some hypotheses, e.g. the need to collect traces from components having synchronous interactions.
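The idea of asking engineers for the expected number of components and then searching for a suitable threshold can be illustrated with a small search. The sketch assumes a precomputed pairwise similarity matrix between candidate component behaviours and groups items by single linkage (two items are linked when their similarity reaches the threshold), implemented with union-find. This grouping rule is a deliberately simple substitute chosen for illustration, not the Ward clustering used by COnfECt, and `groups_at` and `thresholds_for` are hypothetical names.

```python
def groups_at(n, similarity, t):
    """Number of groups when items i, j are linked whenever similarity[i][j] >= t."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= t:
                parent[find(i)] = find(j)   # union the two groups
    return len({find(i) for i in range(n)})


def thresholds_for(k, similarity, candidates):
    """Candidate thresholds yielding exactly k groups (the expert's expected component count)."""
    n = len(similarity)
    return [t for t in candidates if groups_at(n, similarity, t) == k]


sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(thresholds_for(2, sim, candidates))
# [0.4, 0.5, 0.6, 0.7, 0.8]
```

Returning the whole range of admissible thresholds, rather than a single value, leaves the final choice to the expert, in line with the supervised use of COnfECt described above.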
Several approaches, e.g. Beschastnikh et al. (2011, 2014) and Ohmann et al. (2014), mine temporal invariants from logs to increase the accuracy of the generated models. This technique is appealing but cannot be directly applied to COnfECt, as we split traces and build several LTSs. We need to study whether it is worthwhile to mine invariants after the trace extraction. A system of LTSs also makes it possible to derive models at different levels of abstraction, by hiding some components or not. This notion of abstraction is promising and requires further investigation. For instance, bug or security analyses could focus on some components only, with respect to a given risk criterion, thereby reducing the analysis effort.
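For readers unfamiliar with invariant mining, the sketch below is a naive miner for three temporal invariant types used by Beschastnikh et al. (2011): "a is always followed by b" (AFby), "a is never followed by b" (NFby) and "a always precedes b" (AP). It is a brute-force illustration over ordered pairs of actions, not the implementation of the cited tools. Applying it to the sub-traces of one component after trace extraction is one hypothetical way to explore the open question raised above.

```python
from itertools import permutations


def mine_invariants(traces):
    """Naively mine AFby, NFby and AP invariants over ordered pairs of distinct actions."""
    actions = sorted({x for t in traces for x in t})
    found = set()
    for a, b in permutations(actions, 2):     # ordered pairs of distinct actions
        afby = nfby = ap = True
        for t in traces:
            for i, x in enumerate(t):
                if x == a:
                    if b in t[i + 1:]:
                        nfby = False          # some a is followed by b
                    else:
                        afby = False          # some a is not followed by b
                elif x == b and a not in t[:i]:
                    ap = False                # some b has no earlier a
        for name, holds in (("AFby", afby), ("NFby", nfby), ("AP", ap)):
            if holds:
                found.add((a, name, b))
    return found


traces = [["login", "get", "logout"], ["login", "post", "logout"]]
inv = mine_invariants(traces)
print(sorted(i for i in inv if i[1] == "AFby"))
# [('get', 'AFby', 'logout'), ('login', 'AFby', 'logout'), ('post', 'AFby', 'logout')]
```

Like the factor evaluation discussed in the experiments, this naive miner iterates over all pairs of actions, so its cost is quadratic in the size of the action set.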

Funding information
Research supported by the French Project VASOC (Auvergne-Rhône-Alpes Region) https://vasoc.limos.fr/

Elliott Blot is a Ph.D. student at Clermont Auvergne University (France) and a fellow of the LIMOS laboratory. He received a Master's degree in cryptology and computer security in 2017 from the University of Bordeaux. His research interests concern the security analysis of systems, especially of IoT systems. His research also addresses the modelling of such systems for better detection of security issues.