End-to-End Probabilistic Ego-Vehicle Localization Framework

Locating the vehicle in its road is a critical part of any autonomous vehicle system and has been subject to different research topics. In most works presented in the literature, ego-localization is split into three parts: Road level-localization consisting in the road on which the vehicle travels, Lane level localization which is the lane on which the vehicle travels, and Ego lane level localization being the lateral position of the vehicle in the ego-lane. For each part, several researches have been conducted. However, the relationship between the different parts has not been taken into consideration. Through this work, an end-to-end ego-localization framework is introduced with two main novelties. The first one is the proposition of a complete solution that tackles every part of the ego-localization. The second one lies in the information-driven approach used. Indeed, we use prior about the road structure from a digital map in order to reduce the space complexity for the recognition process. Besides, several fusion framework techniques based on Bayesian Network and Hidden Markov Model are elaborated leading to an ego-localization method that is, to a large extent, robust to erroneous sensor data. The robustness of the proposed method is proven on different datasets in varying scenarios.

Advanced Driver Assistance Systems (ADAS) have spread out. The main mission of these systems is to ensure that driver safety is constantly guaranteed. For this purpose, multiple applications have been deployed, such as lane departure warning, lane keeping assist, pedestrian detection, collision avoidance, or lane change assist system. To achieve this mission, the faultless knowledge of the localization of the ego-vehicle with regards to the surrounding environment is necessary.
The ego-localization of a vehicle is a mandatory component for a safe autonomous driving. Direct applications range from the decision-making system [1] regarding the decisions to make in order to keep lane or to change lane, to navigation: including path planning and vehicle control [2]. Ultimately, the lane/road understanding demands in terms of precision and false alarm rate [3] vary from one application to another. Therefore, the ego-localization solution must suit perfectly the localization requirement for each application. The ego-localization task has been widely tackled over the years. The current literature is teeming with solutions that address this issue in a variety of manners. For the road level localization, digital maps (Google, Open-StreetMap (OSM) or Waze) are used to perform this task, GPS receivers are used to retrieve the geographic (latitude, longitude) coordinates, and map-matching procedure is performed in order to match the position of the ego-vehicle with the correct road ('link'). However, the accuracy of the localization obtained is in the order of meters. Indeed, according to the Federal Aviation Administration (FAA) GPS Performance Analysis Report [4], the accuracy of a standard GPS device is within 3m with a 95% confidence, which can not be sufficient for most ADAS that require a more precise localization.
For some applications like lane-keeping, knowing the road on which the vehicle is traveling is not sufficient. These systems must be informed about the position of the host lane in the road to provide the adequate maneuver instruction and maintain the vehicle safety.
Further, autonomous vehicle applications need a more accurate localization, which can be translated by the knowledge of the lateral and longitudinal position of the vehicle in the 2379-8858 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
ego-lane. For instance, overtaking maneuvers need a faultless knowledge of the lateral position of the ego-vehicle with respect to the ego-lane marking in order to decide whether the vehicle should overtake the obstacle or not. The task of vehicle localization is still challenging for an autonomous vehicle, a complete ego-vehicle localization must perform all the three key components described above. Thus, in this work, an end-to-end solution for ego-localization from Road level localization to Ego-lane level localization is presented.

II. RELATED WORK
Due to the vastity of this topic, this section will discuss the main vehicle localization techniques relevant to our presented work. To do so, the current literature is divided into three distinct topics: Localization on a map, Localization on a road and Localization on a lane.

A. Localization on a Map
For Autonomous Vehicles, map-matching (MM) algorithms are a critical piece on any localization system. In essence, MM algorithms integrate the geographical position from a GPS receiver with the spatial road network to identify the correct road on which the vehicle is traveling. In [5] a comprehensive literature survey of common MM methods is presented. The basic method combines an inertial measurement unit (IMU) with a classic GNSS receiver (e.g., GPS, Galileo) into a Kalman Filter to choose the correct road [6], [7]. In the same manner, authors in [8] propose a probabilistic MM that takes into account both the spatial (geographical coordinates) and temporal information (speed and time) to determine the likelihood road on which the ego-vehicle is traveling. However, in more complicated cases, such as urban scenarios, multi-path GPS signals lead to a lack of accuracy. In this case, the use of topological and geographical information enhances the proposed algorithm. Towards this end, many studies proposed a Hidden Markov Model (HMM) to take into account the measurement noise [9]. The authors in [10] introduce a HMM to robustify the proposed MM algorithm. The HMM incorporates the topological and the geographical road network information in the transition state to ensure the temporal connectivity between links. A more recent approach presented in [11] utilizes the HMM for the MM task. In essence, the two approaches are similar, except for the HMM modeling. Indeed, in [11] the transition from one 'link' or segment to another must obey some rules, these rules include the dynamics of the vehicles and the legal and logical connectivity of the road map. In a more specific way, the latest developments use the OpenStreetMap (OSM) database to perform the MM. In our previous work [12], a multi-criteria map-matching algorithm based on multiple probabilistic criteria has been introduced. Nevertheless, the road map topology has not been properly operated in the MM process, which consists of one of the contributions of this paper.

B. Localization on a Road
Knowing the position of the ego-lane is still a challenging task for any localization system and subject-matter of research. Indeed, lane-level localization has been widely discussed in recent literature. According to [13], ego-lane localization can be performed in two different manners: model-driven and deep leaning approaches.
1) Model-Driven Approaches: The current literature abounds with ego-lane recognition techniques based on model-driven approaches. In these approaches, road level features are extracted from images via feature extraction, mainly two features (edge and color) are used. For the first feature, filters are used to extract lanes edges, then the resulted outputs are fed into Hough Transforms [14] or Ransac methods [15] to detect lanes in the road. For the color features, it exploits the primary color or the direction of lane components (arrow, zebra marking). Once these features extracted, they are fed into a high-level fusion framework. As an example, in [16] ego-lane recognition is achieved from multiple-lanes detection. Adjacent lanes are first hypothesized assuming the same curvature and lane width in the road, then tested using a video-based system. This approach shows interesting results. However the occlusion of lanes marking by obstacle vehicles can not be explained and will jeopardize the output of the ego-lane. More high level features systems fuse visual features from the road scene: surrounding vehicles [17], lanes marking [18], [19], road clues such as arrows marking [20], lane marking colors [21], lane marking and adjacent vehicles [22]. These visual cues are fed into a fusion framework: Bayesian filter (BF) [17], Bayesian Network (BN) [18], [20], [21], Hidden Markov Model (HMM) [19] or combination of BN and HMM [22]. A worth mentioning work is the one presented in [23] in which a lane-level accurate map is used in order to match the correct localization of the ego-vehicle. The map contains center lines of every lane of the road in addition to the lanes marking. Therefore to know the lane on which the vehicle is travailing a map matching procedure is presented. The presented method deals with the ambiguities encountered in choosing the right 'link'. Nevertheless, this method is tributary to the accurate map. In contrast, we present a method for lane level localization that relies only on a camera and a coarse map (OSM).
2) Deep Learning Approaches: Convolutional Neural Networks (CNNs) have also been widely adopted for lane level localization. These techniques do not take any prior about the surrounding environment, which makes these unrelated to the type of road-scenario (e.g., highway, urban). However, a learning phase is needed to set the weights of the used network.
In [24] local context of the scene is used to consider occlusion to determine the end point of the local lane segment, ego-lane is deduced once lane segments are extracted. Another research [25] uses a combination of CNN and RNN to detect lane boundaries. Similar to this approach a CNN is trained in an end-to-end way to estimate the ego-lane [26]. In [13] an end-to-end ego lane estimation is presented using a deep learning network SegNet. For all the mentioned methods before, no road prior is exploited. However, a learning stage is needed in all cases, which may require a large database to perform it.

C. Localization Within a Lane
In order to guarantee the ego-lane level localization, several data fusion methods have been investigated. The first one uses a GPS receiver with a digital map to locate the ego-vehicle in the lane. The lack of accuracy provided by a classic GPS that can be caused by poor satellite signals, high degree dilution of precision, or multi-path in urban scenes, is first compensated with proprioceptive sensors, such as Inertial Measurement Unit (IMU). These methods are well known as Dead Reckoning [27]. Nevertheless, the accuracy provided by this kind of method is not sufficient for autonomous navigation, where centimetric accuracy is needed. Furthermore, dead reckoning methods suffer from integration errors that tend unbounded in time.
Leafing through the literature, it appears that most researchers use lane marking detection from a camera to provide an accurate ego-lane level localization. There are two main advantages of using the camera: currently, this sensor is the cheapest and most versatile modality for automotive applications and it provides dense information of the environment. Common approaches are based on a two-step process [3]. First, a road marking feature detection [28], then a fitting procedure [29] is used. This fitting procedure can be done with a polynomial fitting model based on the road geometry design [30] or non-parametric model [31]. The above-mentioned approaches suffer from the camera's weaknesses: high luminosity variation and occlusion issues.
In an attempt to provide a more robust and more accurate localization, the trend for manufacturers and researchers is to equip the ego-vehicle with multiple sensors such as Velodyne lidars, multiples cameras, IMUs and high defined digital map that contains the precise location of high features such as the lane marking or landmarks [32]. In that way, authors in [33] used two lateral cameras combined with an IMU and a GPS receiver to accurately determine the lateral distance of the ego-vehicle with respect to the lane marking. To do so, a vision-based ego-lane marking detector was used [34] and fused with the IMU sensors through an Extended Kalman Filter (EKF) to locate the egovehicle. Furthermore, a sub-decimeter accuracy was achieved by adding lanes marking from a created map. Yet again, an EKF was introduced for the fusion stage between the lane marking extracted from the map and the marking obtained through the vision process. In order to bound the integration errors coming from dead reckoning, authors in [35] present a localization system based on the fusion of multi-sensor and a high definition map. The integrity of the localization is insured by a Student's distribution that replaces the classical Gaussian distribution used in fusion filter like Kalman filters in an Informational Filter (IF). Close to our work, an atypical approach presented in [36] relies on the use of road priors and contextual information for road detection (roadway detection). OSM map is used to create the road backbone depending on the lanes number and the width of the lane. This road shape is then projected on the image taking into account the uncertainties related to the ego-vehicle pose. The result is used as prior to roadway detection. The road shape is then delimited by estimating the vanishing point and the lane marking. Naturally, the mentioned approaches are just a small part of the ego-lane localization techniques, an interesting survey is presented in [3] for more detailed approaches.
To the best of the authors' knowledge, there is no end-to-end localization solution proposed that tackles every aspect of the ego-localization task. Thus, in this paper, we present a complete solution for the overall localization problem.
Unlike existent work, this contribution distinguishes by the information-driven approach. Indeed, we exploit the available data from the OSM map in order to extract prior about the road geometry. This strategy allows us to focalize the region of interest in the recognition procedure and ultimately reduce the computation complexity. In addition, we propose a probabilistic model that will ensure the geometric coherency between lanes marking. Finally, the main contributions can be summarized as follows: 1) An enhanced map matching method based on OSM datasets and a Hidden Markov Model 2) An ego-lane detection based on a prior OSM map and a vision based recursive recognition ego-lane marking in an information-driven fashion. 3) Once the ego lane marking is detected and the lateral distance between the ego-vehicle and the lane marking estimated, we proceed to the ego-lane determination (the postion of the ego-lane in the road) using a combination of a Bayesian Network and a Hidden Markov Model. We perform the complete localization steps in a sequential fashion. Thereby, the robustness of the proposed approach is highly improved. In addition to that, the split between the different parts of the localization allows the use of any part of the complete solution individually.

III. OVERALL LOCALIZATION ALGORITHM
In this work, we present an end-to-end algorithm for ego localization as highlighted on Fig. 1. The algorithm is intentionally divided into three modules: Road-level localization (RLL), Egolane level localization (ELL) and Lane level localization (LLL). This split allows us to be in phase with the current literature. Moreover, the pipeline of the algorithm allows to have modular algorithms that can be changed in the future without changing the entire algorithm architecture. In the following, we will first introduce our map-matching module based on OSM datasets, then we will use this map as a prior for our ego-lane marking detection in order to locate the vehicle within its lane. Once the ego-lane level localization is performed, we will proceed to the identification of the ego-lane in the road.

A. Road-Level Localization (RLL)
In this section, we present our road-level localization algorithm using OSM datasets. As shown in Fig. 2 the proposed module is an upgrade of our work presented in [12]. Indeed, an HMM is added to robustify the proposed Map-Matching. Therefore, the algorithm is divided into two different stages: r Discrimination stage elimination of incongruent links based on distance, orientation difference between vehicle steering, and links heading. Additionally, a third factor was implemented, it is based on the maximum speed allowed on the road.  r Selection stage in the case where the remaining candidate links are greater than one, we proceed into a selection procedure based on multiple probabilistic criteria to eliminate the ambiguity. 1) Discrimination Stage: Before starting with this stage, we must present the OSM data. OSM data are composed of three keys components: Nodes, Ways, and Relations [37]. Nodes are the geometrical elements that represent GPS points. Ways are an ordered list of nodes that represents roads network, every Way has a number of tags, the latter provide information about the characteristics of the Way (number of lanes, limitation speed...). In addition, each Way is composed of a set of segments. In other words, belonging to a segment is equivalent to belonging to an OSM Way. So the map matching task can be reformulated as matching a GPS point with a segment. In the remaining sections, the words segment and Way are switchable.
Thus, the discrimination stage consists of the exclusion of segments which are not compatible with the current vehicle and the road network topology configuration. To do so, we introduced three discrimination criteria [12]: r The first one is based on the distance between the egovehicle position estimated with the GPS data and the segment.
r The second one is based on the angle difference between the ego-vehicle steering angle and the segment heading (which expresses the traffic flow).
r The third one is based on the speed limit. The assumption made is that the ego-vehicle is respecting the limitation of speed to some extent i.e. the ego speed can not be higher than the limitation plus 40 km h −1 . Once the discrimination stage completed, if the number of candidates segment is higher than one, we proceed to the selection of the correct Way.
2) Selection of the Correct Way: We define the state of ego-vehicle at time step t as x t = (x w , y w , θ w ) T . The selection procedure can be formulated as finding the highest conditional probability of belonging to a Way W i knowing the pose of the ego-vehicle x t at each time t: The main goal of this formulation is to select the correct Way on which the vehicle is traveling. Consequently, a selection criterion must be introduced to determine the correct Way. Based on equation (1) three probabilistic criteria [12] were developed: r C e : Criterion based on Euclidean distance. r C m : Criterion based on Mahalanobis distance. r C p : Criterion based on the probability of belonging to a segment. Depending on the data available, one of these criteria is calculated for each GPS data and for each candidate Way. Furthermore, to take into account the relation between two consecutive frames, a convenient and effective approach is to use a HMM. Indeed, the change in Ways over time is governed by topology constraints that can be embedded in the transition state of the HMM. Thus we enhance the proposed multi-criteria algorithm with an HMM. The HMM in this work is an upgrade of the proposed HMM by [38], [39]. We fitted the HMM to the way selection problem. However, the big difference lies in the probabilistic reasoning. Indeed, we use criteria that represent probability to model the observation probability, which is not the case in [38], [39], where arbitrary functions have been introduced leading to non-intuitive criteria.
By definition a HMM consists of two stochastic processes, the first one is a Markov Chain to model the change of a state vector over time. This change is governed by a probability that describes the transit probability over time, which is called the transition probability. In the HMM, the states of the chain are not visible but observable, for this reason they are called "hidden". The second process is called the observation space, it produces emission of the observation at each time. Although the elements of the state vector are hidden, there is a relation between the hidden elements of the state and the observations, this relation is referred as an emission probability. Fig. 3 describes the proposed HMM for the map-matching task.
So, in order to model the HMM-MM, three components must be defined: State space the state of the system describes the list of the Ways candidates for each observation x t . We will use S t to denote the set of candidates Ways at time t S t = {W 1 , W 2 , ..., W n t } with n t the number of candidates Ways at each time t, S t is a form of categorical distribution, S t ∼ Cat(ξ).
Observation space for each candidate Way composing the state space, an emission probability is made. This emission probability is directly calculated from the multi-criteria algorithm. For each Way candidate the emission probability P e (W i ) is calculated as follows: where C k is the probabilistic criterion used with k ∈ {e, p, m}. Transition probability The transition probability reflects the probability of the transition matrix that a state will move from one state to another. In the map matching procedure, the topology of the network is used in order to determine the transition matrix. Indeed, the vehicle can only move on Ways that are physically connected.
We will refer to P (W t i , W τ j ) as the transition probability from Way W t i to Way W τ j given the state space S t and S τ for time t and τ : Where w i is a criterion calculated for each Way, this criterion obeys some rules defined as follows:    Fig. 4 shows a simplified road network to illustrate the transition matrix probability. These probabilities are shown on Table I. It can be noticed that in this example the number of Ways is the same for two connected states, which is not always the case. Indeed, the number of Ways may vary from two consecutive frames.
The transition probability s tτ ij is used whenever the two Ways are the same. Let P (x w , y w ) be the vector describing the Cartesian coordinates of the ego-vehicle in the Universal Transverse Mercator coordinate system frame (UTM), we note P the projection of the point P on the Way and ψ the Way's heading, the Way W τ i is composed of two nodes n 1 , n 2 as represented on Fig. 5. The criterion s ij is computed as a Gaussian distance to the middle of the segment [n 1 , n 2 ]. Indeed, the probability to change the Way will be higher if P is in the middle and will be lower in the limits, thus it is calculated as follows: With P = t.n 1 n 2 , u 0 = 0 and we take σ 0 = 0.25.
If the Way W i and W j are connected then we introduce a criterion based on [39], this criterion depends on the difference between the vehicle's heading and the Way's heading : with Δθ being the vehicle's heading change over time, Δψ the Way's heading change between the two Ways W i and W j and β a chosen coefficient. Once the Way selected, the number of lanes is extracted from the OSM database.

B. Ego-Lane Level Localization (ELL)
Once the RLL has been performed, the ELL is initiated. We choose to start the ELL before the LLL for several reasons: the first is that the ego-lane marking is, for the most part, the easiest one to detect. The second one is due to the use of the information driven approach. Indeed, when the ELL is completed we have an estimate of the ego-vehicle's lateral position in its own lane. So, knowing the lane's width and the lanes number allows us to interpolate research zones for other lane marking.
To perform the ELL, the road is modeled in a 2D vehicle's frame (x v , y v ) as a cubic polynomial [40], [41]: The parameters c 0 is the curvature's derivative of the road, c 1 the curvature of the road, c 2 the vehicle's heading with respect to the tangent of the road, c 3 the lateral shift of the ego-vehicle with regards to the road model. Once the road has been modeled in the vehicle's frame, the next step is to project this model on the image taking into account the intrinsic and extrinsic parameters of the camera. According to [42] the projection of the road model (left and right lines) in the image frame (u i , v i ) is defined as follows: where e u = f/du, e v = f/dv, f is the focal distance of the camera, du and dv are the width and height of a pixel in the image, Z 0 is the height of the camera, x 0 the lateral distance of the ego-vehicle with respect to the ego-lane marking, the ± sign indicates whether the ego-marking is right (+) or left (−), α is the camera tilt angle and L w is the road width. i = 1, ..., n Roi with n Roi the number of Region Of Interest (ROI) for each lane marking. As mentioned in [12], the OSM map does not provide information about the accuracy of its data. However, it is well known that OSM is a collaborative project in which volunteers provide the geospatial data. If we consider that most of the volunteers have a classic GNSS receiver, the accuracy of the OSM is thus metric. That being said, the polynomial representation of the road will be affected by this unknown accuracy. For the parameters c 0 and c 1 , they represent the shape of the road. Given that we are working mostly with highway roads, their values will not be affected by the inaccurate geospatial accuracy of the OSM. In contrast, the parameter c 2 is more sensitive to inaccurate accuracy of the OSM, and its value will be affected. Hence, to compensate for the inaccuracy of its value, we will define X v = [x 0 , c 2 ] T as the state vector of the road model in the ego-vehicle frame. We associate a covariance matrix to this vector C X v . This matrix expresses the allowed dispersion around the average parameters previously defined by the vector X v . Concerning the parameters c 3 , the value extracted from equation (6) is not used. Indeed, it express the lateral shift of the vehicle to the road. Hence it is not used in equation (7), as we use x 0 , which is the lateral shift regarding the center of the ego-lane. It assumed that the ego-vehicle travel in the center of the ego-lane thus x 0 is around 0.
Using equation ( 7), we can express the probabilistic model (X v , C X v ) into the image space. We will note this probabilistic model as u , C u , with u being the average values for the pixel in the image and C u its corresponding covariance matrix: with: The resulting projection of the equation (7) is shown in Fig. 6. Taking into account the prior about the road geometry allows focalizing the zones of research in a Top-Down process fashion. Indeed, not all the image is used in order to perform the processing task. Consequently, the recognition process is faster, and subject to less noise considering it takes into account only the regions in the image that most likely contains a lane marking.
Once the probabilistic model is defined, we proceed to the recognition phase. This stage is inspired by [42]. For clarity's sake, we will discuss each aspect of this stage as presented in Fig. 7.
i) Initialization in this stage we define the probabilistic model (u , C u ) as already presented in Fig. 6. ii) Selection of the best Region Of Interest (ROI) once the probabilistic model is defined, we proceed to the selection of the most informative ROI in the Image in a Top-Down fashion To do so, we present an informational criterion based on the Shannon entropy [43]. For each ROI, we use an a priori selection based on a entropic criterion H sel : with: For the probabilities, p(D b ), p(D k ), a Bayesian Network (BN) is introduced. The presented nodes are an adaptation of the BN presented in [44]. The scheme of the network is illustrated in Fig. 8. The BN is composed of the following nodes: r X k − the confidence before the detection is attempted. r Z 0 is the chosen landmark (the white strip) is observable in the ROI.
r D k a landmark is detected in the focal zone, r D b the correct landmark has been detected (which manages landmark ambiguity).
r X k + the confidence after the detection is attempted.
This network has two uses, the first one is to calculate the a priori probabilities p(D k ) and p(D b ), the second one is to determine the confidence p(X k + ) obtained after proceeding to a detection. iii) Detection Once the most informative ROI is chosen, we proceed to the detection. The implemented method is based on a row filter as presented in [42]. The idea is to compute the gradient of the image row by row with the aim to find pixels that correspond to the lane marking. Once the patterns have been selected for the complete ROI, a Ransac method is used in order to detect the segment in the ROI. Thus, for each ROI two points p 1 (u u , v u ) and p 2 (u d , v d ) are defined. iv) Update for the state vector x p = (u u , u d ) we associate a covariance matrix C p : With σ 2 u u being the accuracy of the segment detected. In order to update the probabilistic model, we use a Kalman filter: is the model before detection and (u + , C + u ) is the model after detection. v) End after each detection, we compute the corresponding entropic gain H gain : With C u the covariance from the initialization stage. The ego-lane marking has been detected if the following condition is satisfied : (14) with With λ a fixed coefficient (< 1). If the condition is satisfied, it means that enough detections have been attempted successfully, and the obtained probabilistic model (u + , C + u ) is sufficiently precise to consider the ego-lane recognition finished. Otherwise, the ego-lane recognition process carried out, and the second most informative ROI is selected. The end of the marking detection involves the end of the egolane level localization. Indeed, the update of the model u, C u leads to the update of the model X v , C X v . This means that an estimation of the road parameters c 2 , x 0 is performed.

C. Lane Level Localization (LLL)
Once the ego-lane localization is estimated, we have to perform the lane-level localization to correctly choose the right lane on which the vehicle travels. To do so, we proposed in [22] a probabilistic framework that is split into three stages. Thus, the presented LLL algorithm is an extension of our work proposed in [22]. Indeed, the Hidden Markov Model's architecture is the same. The difference lies in the input of the algorithm, as we take the information about the road from OSM. In addition, the parameters of the road, such as the curvature, are also extracted from OSM, which will be useful for interpolating other lanes marking.
In the first stage, we extrapolate adjacent lanes by assuming that lanes in the same road have the same width L w and the variation in the curvature c 1 is very small. The second step consists of a Bayesian Network (BN), that takes as input the results of the hypothesized adjacent lane-marking detection, whether these detections succeed or fail. Furthermore, since the proposed BN is modular, we also use an adjacent vehicle detector based on deep learning. The third step includes a filtering process, using a Hidden Markov Model (HMM).

1) Adjacent Lanes Extrapolation:
The adjacent lanes are extrapolated by taking advantage of the estimated ego-lane localization. Thus, each adjacent lane is described by a probabilistic model {X l , C X l }, where X l contains the parameters (x 0 , c 2 ) T described in Section III-B and C X l refers to the corresponding covariance matrix. The assumption made is that the curvature and lane width stays constant for all the lanes in the same road. Thereby, the value of the vector X l remains the same as x v described in equation (6), only the value of x 0 will be shifted by a (±iL w ), with (−i) indicates that the edge is at the right of the ego-lane and (+i) the left. The number of adjacent lanes extrapolated is equal to the lanes number from OSM. Thus, from a perspective view, the hypothesized lanes are shown in Fig. 9.
As mentioned in the ELL section, the model {X l , C X l } can be transferred to the model image {u l , C u l }. As a consequence, u l represents the horizontal pixel of the edges in the image and C u l its interval confidence. In Fig. 10, the adjacent lane regions of interest resulting from the extrapolation are shown.
For each adjacent lane marking extrapolated, the detection is performed and the results are fed into a Bayesian Network (BN).
2) Bayesian Network for Ego-Lane Determination: The proposed BN is designed to be flexible and modular for other detection results from any type of sensor, i.e. vehicle detector, guardrail detector. In order to show the flexibility of the BN, we will first use only adjacent lanes detection to determine the  The general architecture of the BN used is illustrated in Fig. 11. The described nodes are the following: Depending on the results of the detection, we infer the probability to belong to a lane l i as follows: The lane with the highest probability is chosen to be the lane on which the vehicle travels.
As mentioned before, the proposed BN is aimed to be modular for other detectors. To start, we only use the Adjacent Lanes Detection (ALD) as input into our BN (BN + ALD), where D l i indicates whether the detection of adjacent lane i is successful. After, we add the information about the adjacent vehicles D v i using the Yolo detector, where D v i indicates whether the detection of the adjacent vehicle i is successful. Knowing that, this detection is performed in the regions of the image bounded by the neighboring marking-lanes.
3) HMM for Ego-Lane Determination: The proposed BN in the previous section is applied on a per-frame basis. However, the dynamic relationship between two consecutive frames is not taken into account. Indeed, the BN does not take into consideration the dynamic constraints of the ego-vehicle (i.e. the ego-vehicle can only change lane to an adjacent lane). In order to take into account these constraints, we filter the output of the BN by a Dynamic Bayesian Network which is the Hidden Markov Model (HMM).
We will use L t to denote the set of ego-lane state variables at time t, which depends on the lanes number n lanes and which are assumed to be observable. e t denotes the observable evidence variable. The aim of the filter algorithm is to estimate the probability P (L t+1 |e 1:t+1 ). According to [46], this probability can be formulated as follows: Where η denotes a normalizing constant to make probabilities sum equal to 1. By arranging Equation (16): The probability P (e t+1 |L t+1 ) comes from the observation model. Hence, in this paper from the BN described previously, thus: With regard to the probability P (L t+1 |l t ), it comes from the transition model. It expresses the probability of the ego-lane to change its current state, which is the lane change probability. Finally, the third term P (L t , e 1:t ) expresses the current state distribution. Graphically, we can illustrate the corresponding HMM as in Fig. 12.
With the recursive formulation obtained in Equation (17), we can estimate the current ego-lane state given the observation obtained from the BN, but before we have to compute the lane change probability.  normal distribution with mean μ x 0 and variance σ x 0 : The value μ x 0 and variance σ x 0 are obtained from the estimated probabilistic model (x v , C x v ). Therefore, we predict the lateral position x 0 at time t k+1 as shown on Fig. 13. Accordingly, the lane-change probability is calculated as follows: with P (C r ) the probability to right change lane and P (C l ) the probability to left change lane. Now that we have designed the HMM, we will introduce the real-world experimental results in the following section.

IV. REAL-WORLD EXPERIMENTAL RESULTS
The different parts of the localization algorithm have been introduced in the previous section. In order to prove the effectiveness of the presented algorithms, real-world experiments have been carried out.
To do so, we tested our algorithm on real driving data-sets. The first one was collected in the region of Clermont-Ferrand in France, where we drove our acquisition vehicle on two lanes and three lanes road in a national highway. In order to collect these datasets, the acquisition vehicle was equipped with a front camera, with the addition of an IMU and a classic GPS receiver. The data have been stored on a bag using using ROS. In addition to that, the ego-vehicle provides information about its own odometery. Finally, these datasets will be made available for researchers. 1 In the following, they will be referred to as "2-lanes" for two lanes road and "3-lanes" for three lanes road (1000 images frames).
Furthermore, we wanted to compare the performances of our algorithm to the literature on more challenging scenarios. Naturally, we turned our attention to the KITTI datasets [47]. Although these dataset contain few lane-changing scenarios, we will however use some part of those in order to show the effectiveness of our RLL module. In addition, we tested our algorithm on some datasets referred to in [19]. Unfortunately, all the data were not ready yet. We were able to test on some of them, noted as the A4-Highway Italy, for a total of 9528 frames. The collected datasets were manually annotated in a per-frame basis in order to determine the correct ego-lane classification. In the following, we will refer to these datasets as "4-lanes".
Each part of the ego-localization: RLL, ELL and LLL will be tested on the presented datasets, the results will be discussed in terms of accuracy. In the end, the overall algorithm will be discussed in terms of computation time.
In order to assess the presented RLL algorithm, the correct Way has been manually annotated for each GPS frame. The resulting ground truth was compared with HMM-MM. The results are summarized on Table II. As highlighted, the Map-Matching with the addition of the HMM improves the overall accuracy precision. This can be explained by the inclusion of the topology of the road network in the HMM. Indeed, the transition states are governed by the connectivity between links. Furthermore, we test the reliability of our algorithm in more challenging scenarios. Thus, we used the well-known KITTI [47] database in urban scenarios. The used sequences are the following: "sequence_2001_09_26_0001" and "sequence_2001_09_26_0001". Even if the two sequences 1 shorturl.at/lAR28 were taking in urban areas, the correct map-matching obtained shows the effectiveness and the robustness of our proposed RLL algorithm in varying scenarios. Finally, it can be noticed that for the "4-lanes", the GPS datasets have not been mentioned. Indeed, practically the entirety of the datasets is on the same Highway. Thus, the results on MM would not be relevant.
As discussed in [13], [48], in most of the work presented in the literature, the aim is to find the white strips in the image without seeking for consecutive elements of lane marking. Cnversely, in our work, we are more interested in the curvature of the lane marking than the white strips. However, by doing so, the characterization of the algorithm is considerably more difficult. Some examples are presented in Fig. 14 showing the robustness of the presented ego-lane marking detector on different lighting conditions. Once the ego-lane localization completed, the Lane level localization is performed. So as to point out the increment of each added part, we first, determined the ego-lane using solely the BN with adjacent lanes. Within the second instance, we introduced adjacent vehicle detection in the BN. In all instances, we filtered the outcome of the BN with the HMM. All the results obtained are summarized in Table III. Considering the results, the increment provided by each module is clearly illustrated. Indeed, altogether cases the BN+ALD+AVD provides more accurate classification than the BN+ALD, which suggests that the addition of another information source also will improve the obtained accuracy. After investigation on the inaccurate results, it appears that the false classifications obtained using the BN+ALD are due to two main reasons: either the lanes marking are not detected or the lanes marking are wrongly detected. For the primary case, this can be explained if the lanes marking are missing or hidden by an object. For the second case, it shows the limitation of the used lane marking detector. Moreover, even if the introduction of the adjacent vehicle detection shows excellent results, there are some cases where the vehicles detection are not relevant, for instance, if the detected vehicle is on the nearby road. To overcome this issue, we would have to determine the localization  III  CLASSIFICATION ACCURACY FOR EGO-LANE DETERMINATION. BN+ALD REFERS TO THE BN FEED WITH THE ADJACENT LANES DETECTION, BN+ALD+HMM  INDICATES THE HMM WITH THE CORRESPONDING BN, BN+ALD+AVD REFERS TO THE BN FEED WITH ADJACENT LANES AND VEHICLES DETECTION AND  BN+ALD+AVD+HMM REFERS TO THE HMM WITH THE CORRESPONDING   of the detected vehicle relative to the ego-vehicle, which is not the case since we use solely images as input. Finally, despite that the authors in [19] did not take into account the lane change scenarios and designed empirically the HMM. Nevertheless, we manage to outperform their results on the same datasets. Indeed, they achieved 77% correct classifications, where we were able to reach 85.35% on 9528 frames. Some ELL in different images are presented in Fig. 15. The presented results have been carried using Python 3.7 under a Dell G3 3579 Core i7 8th generation equipped with an NVIDIA GeForce GTX 1050 Ti. The computation time results presented in Fig. 16 shows that even if the entire algorithm was coded in Python, real-time implementation is possible. Indeed, the sum of all parts of the ego-localization algorithm is under 500 ms (446.95 ms on average). In addition, if we glance in-depth at the time consumed in the RLL, we found that on average, it takes 238.16 ms to query the local server containing the OSM data. Furthermore, from the 143.67 ms dedicated to the LLL, 65.60 ms are spent on the YOLO detector. These results lead us to assume that implementation on C/C++ will divide the calculation time by 10.

V. CONCLUSION
In this paper, we presented an end-to-end ego-vehicle localization. Starting from Road-level localization (RLL) using OSM datasets and a classic GPS receiver, to the Ego-lane level localization (ELL) using a recognition lane marking in an information-driven fashion and finally, a Lane-level localization (LLL) using the well-known YOLO detector in a probabilistic framework composed of a Bayesian Network and Hidden Markov Model to correctly determine the ego-lane in the road. The distinction from other work lies in the overall aspect of the ego-localization that has been tackled in stock in the paper. In addition, this work is distinguished by the information-driven approach.
For our future work, we are currently working on adding detectors from different sensors, i.e., lidar, radar to enhance the proposed framework. On the other hand, we are still working on the OSM datasets in order to update this database when it provides false information about the environment.