GB2597266A - Method for joint detection of at least one lane marker and at least one object - Google Patents


Info

Publication number
GB2597266A
GB2597266A GB2011058.1A GB202011058A GB2597266A GB 2597266 A GB2597266 A GB 2597266A GB 202011058 A GB202011058 A GB 202011058A GB 2597266 A GB2597266 A GB 2597266A
Authority
GB
United Kingdom
Prior art keywords
prediction
lane marker
neural network
bounding box
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2011058.1A
Other versions
GB202011058D0 (en)
Inventor
Valentin Gheorghe Ionut
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Continental Automotive GmbH
Continental Automotive Romania SRL
Original Assignee
Continental Automotive GmbH
Continental Automotive Romania SRL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Continental Automotive GmbH, Continental Automotive Romania SRL filed Critical Continental Automotive GmbH
Priority to GB2011058.1A priority Critical patent/GB2597266A/en
Publication of GB202011058D0 publication Critical patent/GB202011058D0/en
Publication of GB2597266A publication Critical patent/GB2597266A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30236Traffic on road, railway or crossing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30256Lane; Road marking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30261Obstacle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/12Bounding box

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

This patent relates to a computer-implemented method for joint detection of at least one lane marker and at least one object k in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle. The method comprises a training step of a fully convolutional neural network FCN, using keypoint and bounding box predictions, to generate a joint network prediction; an inference step for generating a joint heatmap and a list of bounding boxes from a raw image; and sending said joint heatmap and said list of bounding boxes of the lane markers and of the at least one object k, as extracted from the raw image, to a communication module accessible by a decision-making unit of the ego road vehicle. A fully convolutional neural network FCN having a set of parameters for executing the above steps is also described.

Description

Method for joint detection of at least one lane marker and at least one object
Field of the invention
The invention relates to autonomous and assisted driving and to the field of computer vision, in particular to the detection of lane markers and objects in an ego road vehicle environment.
Terms used in this invention
Throughout this invention, the term "ego vehicle" stands for a road, land or agricultural vehicle equipped with advanced driver assistance technology. The ego vehicle may be autonomous or manned. Non-limiting examples include passenger cars, trucks, buses, earth-moving equipment, cranes and agricultural equipment, as long as they are equipped with advanced driver assistance technology.
Throughout this invention, the term "objects" refers to objects that are placed in the ego road vehicle environment, such as other vehicles, pedestrians and animals, all when in motion, alternatively called "running".
Background of the invention
Object detection in real complex environments is a challenging task for autonomous driving in the aforementioned applications. A typical pipeline for object detection can be divided into three stages: the selection of informative regions, the extraction of features, and classification. Deep learning has transformed computer vision and is the core innovation behind the capabilities of a self-driving vehicle.
Modern computer vision techniques using deep learning for detecting objects in an image can generally be split into two categories: many stage detectors techniques and one stage detectors techniques.
The many stage detectors techniques work by firstly generating image region proposals, either external or internal to a convolutional neural network, and subsequently classifying those region proposals into classes. The one stage detectors techniques are more efficient because they typically use regression to anchor boxes with all object bounding boxes being generated in a single feed-forward pass.
More recently, the one stage detectors techniques have morphed into an even more efficient keypoint estimation task whereby object bounding boxes and other properties can be directly inferred from the keypoint locations such as the object centre. This amounts to an end-to-end differentiable approach that does not require the post processing needed for standard one stage detectors techniques.
Lane marker detection is typically solved as a separate component using encoder-decoder CNNs involving semantic segmentation or instance segmentation approaches, either as a standalone task or in a multi-task, i.e. multi-label, framing.
Disadvantages of prior art
The prior art solutions have many disadvantages:
- The detection methods used in the prior art are expensive either at the inference stage, due to computation overheads incurred by separate convolutional networks or separate embedding and segmentation branches, or at the post-processing stage, due to the non-maximum suppression in anchor box approaches, or due to segmentation artifacts of the object detection that amount to a significant number of false positives, which have to be corrected using additional post-processing computations that further increase cost and time.
- Lower performance and possible class imbalances at train time arise from under-sampling due to using separate data sets and data set schedulers for lane marker and object detection, whereas in reality normal traffic conditions contain both.
- The detection methods used in the prior art carry the increased complexity of training a multi-task semantic segmentation network with fundamentally different network heads or tasks. Recent literature reveals that inferior performance is to be expected from multi-task learning, especially if task objectives compete, for example if one task representation or desired output is substantially different from the others, as is the case for lane marker and object detection.
Problem solved by the invention
The problem to be solved by the invention is to provide joint detection of one or more lane markers $l_m$ and of one or more objects $k$, the method having improved accuracy, speed and performance while using minimum resources for the training and post-processing stages.
Summary of the invention
In order to solve the problem, the inventor conceived in a first aspect of the invention a computer-implemented method for joint detection of at least one lane marker $l_m$ and at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle, the method comprising the following:
S.I. Training step I of a fully convolutional neural network FCN of a training module, the FCN having an initial set of parameters $\theta$, using keypoint predictions and bounding box predictions, the bounding box predictions being extracted from attributes of the keypoint predictions of:
- the at least one lane marker $l_m$ being classified in one class $c_{k'}$, comprising a plurality of one bounding box $B_{k'}$ each consisting of one pixel of the respective at least one lane marker $l_m$, the at least one lane marker $l_m$ thus corresponding to a plurality of one-pixel bounding boxes $B_{k'}$, and
- the at least one object $k$ being classified in at least one class $c_k$ of at least one bounding box $B_k$ of at least one object $k$,
such that the fully convolutional neural network FCN is able to generate a joint network prediction $h_\theta$ based on a learned set of parameters $\theta^*$ of the fully convolutional neural network FCN, the training step I comprising the following sub-steps:
S.I.1. Providing the fully convolutional network FCN, having the initial set of parameters $\theta$, with a predetermined number $m$ of training samples $\{(a_i, b_i)\}_{i=1}^{m}$, each training sample comprising:
- an input image $a_i$ having an image width $W_i$ and an image height $H_i$;
- target output data $b_i$ comprising:
* a target keypoint label $b_i^{key\text{-}points}$ corresponding to a keypoint of the at least one lane marker $l_m$ and of the at least one object $k$, where each target keypoint label is defined within the codomain $b_i^{key\text{-}points} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times (c_k + c_{k'})}$ (1),
* target size and offset labels $b_i^{size\ and\ offset}$ corresponding to the size and offset of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and of the at least one bounding box $B_k$ of the at least one object $k$, wherein each target size and offset label is defined within the codomain $b_i^{size\ and\ offset} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 4}$ (2),
where $R$ is the output stride used to down-sample, and 4 stands for the 4 outputs of the fully convolutional neural network FCN at each location in $\frac{W}{R} \times \frac{H}{R}$, namely object or lane marker width, height, offset in horizontal direction and offset in vertical direction.
S.I.2. Determining the joint network prediction $h_\theta$:
S.I.2.1. Determining the joint keypoint prediction $\hat{P}$ by means of the fully convolutional neural network FCN based on the input image $a_i$, including a lane marker keypoint prediction $\hat{p}_{k'}^{lm}$ for the at least one lane marker $l_m$ and an object keypoint prediction $\hat{p}_k^{objects}$ for the at least one object $k$, where the lane marker keypoints are placed in each pixel of the at least one lane marker $l_m$ and the object keypoints are placed in the centre of the at least one object $k$, the respective keypoints being expressed in a coordinate system $XY$, each keypoint having two coordinates, one on the horizontal $x$ axis and the other on the vertical $y$ axis, comprising the following sub-steps:
- determining the lane marker keypoint prediction $\hat{p}_{k'}^{lm} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times c_{k'}}$ (3),
- determining the object keypoint prediction $\hat{p}_k^{objects} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times c_k}$ (4),
- determining the joint keypoint prediction $\hat{P} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times (c_k + c_{k'})}$ (5),
- generating a heatmap of the image $a_i$ having peaks corresponding to each keypoint of the joint keypoint prediction $\hat{P}$.
S.I.2.2. Determining the joint size prediction $\hat{S}$ of the bounding boxes $B$ by means of the fully convolutional neural network FCN, including the constant size prediction $s_{k'}^{lm}$ of the bounding box $B_{k'}$ and the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ on said heatmap of the image $a_i$:
S.I.2.2.1. Determining the constant size prediction $s_{k'}^{lm}$ of the bounding box $B_{k'}$ of the at least one lane marker $l_m$:
- determining all the bounding boxes $B_{k'}$ of the at least one lane marker based on the lane marker keypoint prediction $\hat{p}_{k'}^{lm}$,
- determining the constant size prediction $s_{k'}^{lm}$ of the bounding boxes $B_{k'}$ by using the formula $s_{k'}^{lm} = (1, 1)$ (6), $s_{k'}^{lm}$ corresponding to the bounding box $B_{k'}$.
S.I.2.2.2. Determining the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ for at least one object $k$:
- determining at least one bounding box $B_k(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ as a rectangular box corresponding to each keypoint of the joint keypoint prediction, determined in the coordinate system $XY$ based on two corners, the upper-left corner $M(x_1^{(k)}, y_1^{(k)})$ and the lower-right corner of the rectangle $O(x_2^{(k)}, y_2^{(k)})$, the origin of the coordinates being placed in each keypoint of the joint keypoint prediction,
- determining the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ for at least one object $k$ by using the formula $s_k^{objects} = (x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)})$ for each object $k$ (7).
S.I.2.2.3. Determining the joint size prediction $\hat{S}$, defined within the codomain $\hat{S} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ (8).
S.I.2.3. Establishing an offset $\hat{O}$ for the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and the at least one bounding box $B_k$ of at least one object $k$, in order to recover the discretization error caused by the output stride $R$, defined within the codomain $\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ (9), wherein the fully convolutional neural network FCN prediction $h_\theta$ is defined by the formula $h_\theta(a_i) = [\hat{P}, \hat{S}, \hat{O}]$ (10).
S.I.2.4. Generating a joint heatmap of the input image $a_i$ having peaks corresponding to each keypoint of the joint keypoint prediction $\hat{P}$ and corresponding to the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and the bounding boxes $B_k$ for at least one object $k$.
S.I.3. Assessing the difference between the fully convolutional neural network FCN prediction $h_\theta$ and the target output data $b_i$ for all the predetermined number $m$ of training samples by a cost function $J(\theta)$, wherein the assessment of said difference is carried out by evaluating whether the fully convolutional neural network FCN prediction $h_\theta(a_i)$ deviates from the target output data $b_i$ for each training example $i$ using a cost function $J(\theta)$ defined by the formula $J(\theta) = \frac{1}{m}\sum_{i=1}^{m} J^{total}(h_\theta(a_i), b_i)$ (11), where $J^{total}$ is a summation of terms that quantifies the misalignment between the fully convolutional neural network FCN prediction $h_\theta$ and the target output data $b_i$.
S.I.4. Adjusting the initial set of parameters $\theta^{(l)}$ of the fully convolutional network FCN to a learned set of parameters $\theta^{*(l)}$ of the fully convolutional neural network FCN by determining whether the cost function $J(\theta)$ of the fully convolutional neural network FCN satisfies a pre-determined threshold or until a convergence criterion is met:
- if the pre-determined cost function threshold is met or the convergence criterion is satisfied, the training of the fully convolutional neural network FCN is completed, or
- if the pre-determined cost function threshold is not met, the initial parameters $\theta^{(l)}$ of the fully convolutional neural network FCN are adjusted according to the target output data $b_i$ until the cost function $J(\theta)$ satisfies the pre-determined threshold or until convergence.
S.II. Inference step of the fully convolutional neural network FCN of an inference module:
S.II.1. Providing the first convolutional stage of the fully convolutional neural network FCN with a raw image $a_{image1}$ provided by the advanced driving assistance system (ADAS) processing chain of the ego road vehicle.
S.II.2. Determining the joint network prediction $h_{\theta 1}$ of the raw image $a_{image1}$ according to step S.I.2 as described above.
S.II.3. Decoding the fully convolutional neural network FCN prediction $h_{\theta 1}$ by means of a decoding unit to generate the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and the at least one bounding box $B_k$ for at least one object $k$, comprising the following sub-steps:
S.II.3.1. Generating the plurality of the one bounding box $B_{k'}$ for the at least one lane marker $l_m$:
- determining the lane marker keypoint prediction $\hat{p}_{k'}^{lm}$,
- establishing the offset $\hat{O}_{k'}^{lm}$ of the bounding box $B_{k'}$ of the at least one lane marker $l_m$,
- determining the constant size prediction $s_{k'}^{lm}$ of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ by using the formula $(\hat{w}_c^{(k')}, \hat{h}_c^{(k')}) = (1, 1)$,
- generating the plurality of the one bounding box $B_{k'}$ by using the formula $\hat{B}_{k'} = (\hat{x}_c^{(k')} + \delta\hat{x}_c^{(k')} - \tfrac{\hat{w}_c^{(k')}}{2},\ \hat{y}_c^{(k')} + \delta\hat{y}_c^{(k')} - \tfrac{\hat{h}_c^{(k')}}{2},\ \hat{x}_c^{(k')} + \delta\hat{x}_c^{(k')} + \tfrac{\hat{w}_c^{(k')}}{2},\ \hat{y}_c^{(k')} + \delta\hat{y}_c^{(k')} + \tfrac{\hat{h}_c^{(k')}}{2})$.
S.II.3.2. Generating the at least one bounding box $B_k$ based on the following features:
- determining the object keypoint coordinates $\hat{p}_k^{objects} = (\hat{x}_c^{(k)}, \hat{y}_c^{(k)})$,
- establishing the offset $\hat{O}_k^{objects} = (\delta\hat{x}_c^{(k)}, \delta\hat{y}_c^{(k)})$ for the at least one bounding box $B_k$ for the at least one object $k$,
- determining the size $s_k^{objects} = (\hat{w}_c^{(k)}, \hat{h}_c^{(k)})$ of the at least one bounding box $B_k$ for at least one object $k$,
- determining the upper-left corner $N$ coordinates and the lower-right corner $P$ coordinates of the corresponding at least one bounding box $B_k$ for at least one class $c_k$: $N = (\hat{x}_c^{(k)} + \delta\hat{x}_c^{(k)} - \tfrac{\hat{w}_c^{(k)}}{2},\ \hat{y}_c^{(k)} + \delta\hat{y}_c^{(k)} - \tfrac{\hat{h}_c^{(k)}}{2})$, $P = (\hat{x}_c^{(k)} + \delta\hat{x}_c^{(k)} + \tfrac{\hat{w}_c^{(k)}}{2},\ \hat{y}_c^{(k)} + \delta\hat{y}_c^{(k)} + \tfrac{\hat{h}_c^{(k)}}{2})$,
- generating the at least one bounding box $B_k$ by using the upper-left corner $N$ coordinates and the lower-right corner $P$ coordinates as $\hat{B}_k = (N, P)$.
S.II.3.3. Generating a joint heatmap and a list of bounding boxes of the raw image $a_{image1}$, corresponding to each keypoint of the joint keypoint prediction $\hat{P}$, generating the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and generating the at least one bounding box $B_k$ for at least one object $k$, representing the joint detection of the at least one lane marker $l_m$ and the at least one object $k$ in the ego road vehicle environment.
S.II.4. Sending said joint heatmap and said list of bounding boxes of the lane markers $l_m$ and of the at least one object $k$, as extracted from the raw image $a_{image1}$, to a communication module accessible by a decision-making unit of the ego road vehicle.
In a second aspect of the invention there is provided a fully convolutional neural network FCN having a set of parameters $\theta^{(l)}$ for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle, the fully convolutional neural network FCN comprising:
* a first sequence of downsampling convolutional layers containing convolution, activation, pooling and skip operations,
* followed by a second sequence of upsampling transpose convolutional layers containing transpose convolution, activation, pooling and skip operations,
said first and second sequences being stacked together forming an hourglass architecture, wherein said fully convolutional neural network FCN includes a final layer comprising an additional channel to generate the keypoint predictions and bounding box predictions of the at least one lane marker $l_m$.
In a third aspect of the invention there is provided a machine computing unit for implementing the computer-implemented method for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle, the system comprising:
- at least one processor configured to execute instructions stored in the memory to execute the steps of the method using the fully convolutional neural network FCN,
- at least one non-volatile memory used by the at least one processor,
- a communication module accessible by a decision-making unit of the vehicle,
wherein the at least one processor comprises:
- at least a training module configured to execute the training step S.I,
- at least an inference module configured to execute the inference steps S.II.2.1 and S.II.2.2,
- a decoding unit configured to execute the step S.II.3.
In a fourth aspect of the invention there is provided a computer program comprising code instructions for the execution of a method for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment, when said program is executed on the machine computing unit.
In a fifth aspect of the invention there is provided a computer readable medium having stored thereon instructions which, when executed by the machine computing unit, cause the machine computing unit to carry out the computer-implemented method of detection.
In a sixth aspect of the invention it is provided a data stream which is representative of the computer program.
Advantages of the invention
The main advantages of this invention are the following:
- The invention proposes a simplified structure having a single computer-implemented method and a single computer program running on a single machine computing unit, the method using an end-to-end differentiable and trainable neural network to jointly infer lane markers and traffic participants in a video frame acquired from a forward driving perspective. This amounts to considerable savings in terms of computing resources and, due to the reduced complexity stemming from the above, to fewer errors.
- The method has the advantage of using a single data set and data set scheduler for lane marker and object detection, which means that, from the machine learning perspective, the invention has the advantage of a single task acting as a supervisory signal to train for non-competing optimization objectives, which is highly desirable in terms of minimizing the errors generated in processing multiple data sets.
- The invention makes under-sampling or over-sampling in the detection of both lane markers and objects easier to monitor and to mitigate from a training data scheduling standpoint, which in turn further reduces the computing resources and the errors.
Brief description of the drawings
Fig. 1 Determining the object keypoint prediction $\hat{p}_k^{objects}$ in the prediction step of the method according to the invention,
Fig. 2 Determining the bounding boxes $B_k$ for the objects $k$ in the prediction step of the method according to the invention,
Fig. 3 Generating the bounding boxes $B_k$ for the objects $k$ in the decoding step of the method according to the invention.
Detailed description
In a first aspect of the invention there is provided a computer-implemented method for joint detection of at least one lane marker $l_m$ and at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle.
The method comprises a training step I of a fully convolutional neural network FCN having an initial set of parameters $\theta$, and an inference step II of the fully convolutional neural network FCN.
In the training step I the fully convolutional neural network FCN of a training module is trained, starting from the initial set of parameters $\theta$ and using keypoint predictions and bounding box predictions, the bounding box predictions being extracted from attributes of the keypoint predictions of:
- the at least one lane marker $l_m$ being classified in one class $c_{k'}$, comprising a plurality of one bounding box $B_{k'}$ each consisting of one pixel of the respective at least one lane marker $l_m$, and
- the at least one object $k$ being classified in at least one class $c_k$ comprising at least one bounding box $B_k$ of at least one object $k$.
The training of the fully convolutional neural network FCN is carried out such that the fully convolutional neural network FCN is able to generate a joint network prediction $h_\theta$ based on a learned set of parameters $\theta^*$ of the fully convolutional neural network FCN.
The fully convolutional neural network FCN is a parametrized, nested mathematical function, commonly known as a network model, of the form $h_\theta(a_i)$, where $a_i$ represents the input and $\theta$ represents all the model parameters in vector form. Another way to represent this function mathematically is shown below:
$h_\theta(a_i) = h^{(L)}_{\theta^{(L)}}\big(h^{(L-1)}_{\theta^{(L-1)}}\big(h^{(L-2)}_{\theta^{(L-2)}}(\dots h^{(1)}_{\theta^{(1)}}(a_i)\dots)\big)\big)$
The term $h^{(l)}$ refers to a mathematical operation such as those described in the convolutional neural network literature (e.g. convolution or dot product followed by an activation function, max pooling, etc.) applied at network layer $l \in \{1, 2, \dots, L\}$. The mathematical function applied at network layer $l$, i.e. $h^{(l)}$, is parametrized using parameters $\theta^{(l)}$.
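For illustration only, the nested form above can be sketched as a composition of simple layer functions; the layer type, widths and parameters below are toy assumptions and not part of the claimed network:

```python
import numpy as np

def layer(x, weights, bias):
    # one generic layer h^(l): affine map followed by a ReLU activation
    return np.maximum(0.0, x @ weights + bias)

def h_theta(a, theta):
    # theta is the list of per-layer parameters theta^(l) = (W_l, b_l);
    # h^(1), ..., h^(L) are applied in order, i.e. h_theta(a) = h^(L)(...h^(1)(a)...)
    x = a
    for W_l, b_l in theta:
        x = layer(x, W_l, b_l)
    return x

# hypothetical parameters for a 3-layer toy model
rng = np.random.default_rng(0)
theta = [(rng.standard_normal((8, 16)), np.zeros(16)),
         (rng.standard_normal((16, 16)), np.zeros(16)),
         (rng.standard_normal((16, 4)), np.zeros(4))]
print(h_theta(rng.standard_normal((1, 8)), theta).shape)  # (1, 4)
```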
It is a typical U-net architecture, and in U-net-like approaches one tends to down-sample the input tensor, e.g. an image, which acts as a bottleneck while also removing redundant information, and then up-sampling is carried out so as to perform inference at pixel level. This type of architecture has morphed into a so-called hourglass (U-net after U-net) architecture, which is also what this invention considers for the keypoint approach.
There are also other more efficient network bodies such as those comprised of deep layer aggregation.
The fully convolutional neural network FCN learns a hierarchical feature representation distributed across the hourglass body. Shallow layers become sensitive to edges and motifs (i.e. simple features). Deeper layers become sensitive to combinations of edges, e.g. blobs and shapes. There may be one or more pre-processing steps applied to a raw image prior to CNN computation, e.g. image normalization or scaling.
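The encoder-decoder idea can be illustrated with a minimal hourglass stage, assuming PyTorch; the layer counts and channel widths are illustrative assumptions, not the patent's architecture:

```python
import torch
import torch.nn as nn

class TinyHourglass(nn.Module):
    """Illustrative hourglass body: downsampling convolutions, upsampling
    transpose convolutions, and one skip connection from encoder to decoder."""
    def __init__(self, in_ch=3, mid_ch=32):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 3, stride=2, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(mid_ch, mid_ch * 2, 3, stride=2, padding=1), nn.ReLU())
        self.up1 = nn.Sequential(nn.ConvTranspose2d(mid_ch * 2, mid_ch, 4, stride=2, padding=1), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(mid_ch, mid_ch, 4, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        d1 = self.down1(x)       # W/2 x H/2
        d2 = self.down2(d1)      # W/4 x H/4 (bottleneck)
        u1 = self.up1(d2) + d1   # skip connection from the encoder
        return self.up2(u1)      # feature map used by the prediction heads

feats = TinyHourglass()(torch.randn(1, 3, 256, 256))
print(feats.shape)  # torch.Size([1, 32, 256, 256])
```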
The training step I comprises the following sub-steps. In the sub-step S.I.1. the fully convolutional network FCN having the initial set of parameters $\theta$ is provided with a predetermined number $m$ of training samples.
Each training sample comprises a set of data $\{(a_i, b_i)\}_{i=1}^{m}$ comprising an input image $a_i$ and target output data $b_i$. The input image $a_i$ has an image width $W_i$ and an image height $H_i$.
A non-limiting example of the input image $a_i$ is an image of 250 pixels width x 250 pixels height x 3 (for RGB values), often denoted as 250x250x3.
In general, it is better to provide high resolution input images $a_i$ to the fully convolutional neural network FCN so as to allow more details of the input image $a_i$ to be processed. The images provided by the advanced driving assistance system (ADAS) processing chain satisfy the high resolution condition mentioned above.
A higher frame rate is generally preferred, especially if functionality is to be preserved reliably at higher vehicle speeds. For example, 30 frames per second is generally a good frame rate; however, even 15 frames per second or less may be sufficient for some low speed manoeuvring.
In another embodiment of this invention, temporal consistency of keypoints between consecutive video frames is ensured using classic video tracking, such as e.g. Kalman filtering, or deep learning techniques, e.g. sequence models consisting of long short-term memory modules or other recurrent neural network building blocks. The latter approach has the advantage of being an end-to-end differentiable and trainable neural network.
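For illustration, a minimal constant-velocity Kalman filter for smoothing one detected keypoint across frames is sketched below; the state model, frame interval and noise levels are assumptions chosen for the example, not values specified by the patent:

```python
import numpy as np

dt = 1.0 / 30.0                                  # assumed frame interval (30 fps)
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]])       # state transition for (x, y, vx, vy)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])       # only the keypoint position is observed
Q = np.eye(4) * 1e-2                             # process noise (illustrative)
Rm = np.eye(2) * 1.0                             # measurement noise (illustrative)

def kalman_step(x, P, z):
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the detected keypoint z = (x_meas, y_meas)
    S = H @ P @ H.T + Rm
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
for z in [np.array([100.0, 200.0]), np.array([102.0, 198.5])]:
    x, P = kalman_step(x, P, z)
print(x[:2])   # smoothed keypoint position
```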
Complex driving scenarios can be disambiguated using additional sensing capability or sensor fusion techniques to generate high fidelity point clouds in a 3D Euclidean space. Sensor readings can be acquired from a forward driving perspective or from multiple vantage points. Sensors capable of generating point clouds include but are not limited to LIDAR, RGBD, stereo cameras and time-of-flight cameras. Radar technology can also be used with the aforementioned ranging sensors to improve the robustness of laser ranging in low albedo regions.
The target output data $b_i$ comprises a target keypoint label $b_i^{key\text{-}points}$ corresponding to a keypoint of the at least one lane marker $l_m$ and of the at least one object $k$. Each target keypoint label $b_i^{key\text{-}points}$ is defined within the codomain:
$b_i^{key\text{-}points} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times (c_k + c_{k'})}$ (1)
The target output data $b_i$ further comprises target size and offset labels $b_i^{size\ and\ offset}$ corresponding to the size and offset of the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and of the at least one bounding box $B_k$ of the at least one object $k$, wherein each target size and offset label $b_i^{size\ and\ offset}$ is defined within the codomain:
$b_i^{size\ and\ offset} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 4}$ (2)
where $R$ is the output stride used to down-sample, and 4 stands for the 4 outputs of the fully convolutional neural network FCN at each location in $\frac{W}{R} \times \frac{H}{R}$, namely object or lane marker width, height, offset in horizontal direction and offset in vertical direction.
In the sub-step S.I.2. the joint network prediction $h_\theta$ is determined by four principal sub-steps: determining the joint keypoint prediction $\hat{P}$ in the sub-step S.I.2.1, determining the joint size prediction $\hat{S}$ of the bounding boxes $B$ in the sub-step S.I.2.2, establishing an offset $\hat{O}$ in the sub-step S.I.2.3, and generating a joint heatmap of the input image $a_i$ having peaks corresponding to each keypoint of the joint keypoint prediction $\hat{P}$ and the bounding boxes $B$ in the sub-step S.I.2.4.
In the sub-step S.I.2.1. the joint keypoint prediction $\hat{P}$ is determined by means of the fully convolutional neural network FCN based on the input image $a_i$. The joint keypoint prediction $\hat{P}$ includes a lane marker keypoint prediction $\hat{p}_{k'}^{lm}$ for the at least one lane marker $l_m$ and an object keypoint prediction $\hat{p}_k^{objects}$ for the at least one object $k$, as presented in Fig. 1.
The lane marker keypoints are placed in each pixel of the at least one lane marker $l_m$ and the object keypoints are placed in the centre of the at least one object $k$.
The respective keypoints are expressed in a coordinate system $XY$, each keypoint having two coordinates, one on the horizontal $x$ axis and the other on the vertical $y$ axis.
The lane marker keypoint prediction $\hat{p}_{k'}^{lm}$ is defined within the codomain:
$\hat{p}_{k'}^{lm} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times c_{k'}}$ (3), where $c_{k'} = 1$.
The object keypoint prediction $\hat{p}_k^{objects}$ is defined within the codomain:
$\hat{p}_k^{objects} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times c_k}$ (4)
The joint keypoint prediction $\hat{P}$ is defined within the codomain:
$\hat{P} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times (c_k + c_{k'})}$ (5)
The sub-step S.I.2.1. ends with generating a heatmap of the image $a_i$ having peaks corresponding to each keypoint of the joint keypoint prediction $\hat{P}$.
In the sub-step S.I.2.2. the joint size prediction $\hat{S}$ of the bounding boxes $B$ is determined by means of the fully convolutional neural network FCN. The joint size prediction $\hat{S}$ includes the constant size prediction $s_{k'}^{lm}$ of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ for the at least one object $k$, on said heatmap of the image $a_i$.
The joint size prediction $\hat{S}$ of the bounding boxes $B$ is determined by three principal sub-steps: determining the constant size prediction $s_{k'}^{lm}$ of the one bounding box $B_{k'}$ in the sub-step S.I.2.2.1, determining the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ for at least one object $k$ in the sub-step S.I.2.2.2, and determining the joint size prediction $\hat{S}$ in the sub-step S.I.2.2.3.
In the sub-step S.I.2.2.1 the constant size prediction $s_{k'}^{lm}$ of the one bounding box $B_{k'}$ is determined, firstly by determining the plurality of the one bounding box $B_{k'}$ based on the lane marker keypoint prediction $\hat{p}_{k'}^{lm}$, and further by determining the constant size prediction $s_{k'}^{lm}$ of the one bounding box $B_{k'}$ by using the formula:
$s_{k'}^{lm} = (1, 1)$ (6)
In the sub-step S.I.2.2.2 the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ for at least one object $k$ is determined.
The at least one bounding box $B_k(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ is a rectangular box corresponding to each keypoint of the joint keypoint prediction, determined in the coordinate system $XY$ based on two corners, the upper-left corner $M(x_1^{(k)}, y_1^{(k)})$ and the lower-right corner of the rectangle $O(x_2^{(k)}, y_2^{(k)})$, as presented in Fig. 2.
The origin of the coordinates is placed in each keypoint of the joint keypoint prediction.
Further on, the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ for at least one object $k$ is determined by using the formula:
$s_k^{objects} = (x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)})$ for each object $k$ (7)
In the sub-step S.I.2.2.3. the joint size prediction $\hat{S}$ is determined, defined within the codomain:
$\hat{S} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ (8)
In the sub-step S.I.2.3. an offset $\hat{O}$ is established for the one class $c_{k'}$, comprising the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$, and for all classes $c_k$ of the at least one bounding box $B_k$ for at least one object $k$, in order to recover the discretization error caused by the output stride $R$, defined within the codomain:
$\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ (9)
The fully convolutional neural network FCN prediction $h_\theta$ is defined by the formula:
$h_\theta(a_i) = [\hat{P}, \hat{S}, \hat{O}]$ (10)
In the sub-step S.I.2.4. a joint heatmap of the input image $a_i$ is generated, having peaks corresponding to each keypoint of the joint keypoint prediction $\hat{P}$ and to the bounding boxes $B_{k'}$ of the at least one lane marker $l_m$ and the bounding boxes $B_k$ for at least one object $k$.
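For illustration, the three outputs $[\hat{P}, \hat{S}, \hat{O}]$ can be produced by lightweight heads on top of the hourglass features. The sketch below assumes PyTorch; the channel counts follow codomains (5), (8) and (9), while the feature width and the channel ordering (object classes first, lane marker class last) are assumptions of the example:

```python
import torch
import torch.nn as nn

class JointHeads(nn.Module):
    """Illustrative prediction heads: joint keypoint heatmap P_hat with ck + ck'
    channels, size map S_hat (2 channels) and offset map O_hat (2 channels)."""
    def __init__(self, feat_ch=32, ck=3, ck_prime=1):
        super().__init__()
        self.keypoints = nn.Conv2d(feat_ch, ck + ck_prime, 1)   # eq. (5)
        self.size = nn.Conv2d(feat_ch, 2, 1)                    # eq. (8)
        self.offset = nn.Conv2d(feat_ch, 2, 1)                  # eq. (9)

    def forward(self, feats):
        p_hat = torch.sigmoid(self.keypoints(feats))            # per-location keypoint probabilities in [0, 1]
        return p_hat, self.size(feats), self.offset(feats)      # h_theta(a) = [P_hat, S_hat, O_hat]

feats = torch.randn(1, 32, 64, 128)        # W/R x H/R feature map from the hourglass body
p_hat, s_hat, o_hat = JointHeads()(feats)
# in this sketch the last ck' channel of p_hat is the lane marker heatmap
p_objects, p_lm = p_hat[:, :3], p_hat[:, 3:]
```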
In the sub-step S.I.3. the difference between the fully convolutional neural network FCN prediction $h_\theta$ and the target output data $b_i$ is assessed for all the predetermined number $m$ of training samples by a cost function $J(\theta)$. The assessment of said difference is carried out by evaluating whether the fully convolutional neural network FCN prediction $h_\theta(a_i)$ deviates from the target output data $b_i$ for each training example $i$ using a cost function $J(\theta)$ defined by the formula:
$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} J^{total}(h_\theta(a_i), b_i)$ (11)
where $J^{total}$ is a summation of terms that quantifies the misalignment between the fully convolutional neural network FCN prediction $h_\theta$ and the target output data $b_i$.
The cost function $J^{total}$ is defined by the formula:
$J^{total} = J^{key\ points} + \lambda_{size} J^{size} + \lambda_{offset} J^{offset}$, where $\lambda_{size} = 0.1$ and $\lambda_{offset} = 1$.
The terms $J^{key\ points}$, $J^{size}$ and $J^{offset}$ quantify the misalignment between the fully convolutional neural network FCN prediction $h_\theta$ and the target output data $b_i$ regarding the keypoints, size and offset of the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and of the at least one bounding box $B_k$ for at least one object $k$.
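A minimal sketch of such a composite cost is given below. The patent fixes only the weighted sum and the values of $\lambda_{size}$ and $\lambda_{offset}$; the focal-style keypoint term and the L1 regression terms are common choices for keypoint detectors and are assumptions here, as is the mask marking annotated keypoint locations:

```python
import torch
import torch.nn.functional as F

def j_total(p_hat, s_hat, o_hat, b_keypoints, b_size, b_offset, mask):
    """J_total = J_keypoints + 0.1 * J_size + 1.0 * J_offset.
    mask is a (B, 1, H/R, W/R) tensor that is 1 at annotated keypoint locations."""
    eps = 1e-6
    pos = b_keypoints.eq(1).float()
    neg = 1.0 - pos
    # focal-style keypoint term (assumed form), normalized by the number of keypoints
    j_key = -(pos * (1 - p_hat) ** 2 * torch.log(p_hat + eps)
              + neg * (1 - b_keypoints) ** 4 * p_hat ** 2 * torch.log(1 - p_hat + eps)).sum()
    j_key = j_key / pos.sum().clamp(min=1)
    # size and offset terms are evaluated only where keypoints exist
    j_size = F.l1_loss(s_hat * mask, b_size * mask, reduction='sum') / mask.sum().clamp(min=1)
    j_off = F.l1_loss(o_hat * mask, b_offset * mask, reduction='sum') / mask.sum().clamp(min=1)
    return j_key + 0.1 * j_size + 1.0 * j_off    # lambda_size = 0.1, lambda_offset = 1
```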
In the sub-step S.I.4. the initial set of parameters $\theta^{(l)}$ of the fully convolutional network FCN is adjusted to the learned set of parameters $\theta^{*(l)}$ of the fully convolutional neural network FCN by determining whether the cost function $J(\theta)$ of the fully convolutional neural network FCN satisfies a pre-determined threshold or until a convergence criterion is met:
- if the pre-determined cost function threshold is met or the convergence criterion is satisfied, the training of the fully convolutional neural network FCN is completed, or
- if the pre-determined cost function threshold is not met, the initial parameters $\theta^{(l)}$ of the fully convolutional neural network FCN are adjusted according to the target output data $b_i$ until the cost function $J(\theta)$ satisfies the pre-determined threshold or until convergence.
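A compact training-loop sketch for this sub-step is shown below; the optimizer, learning rate, threshold and patience values are illustrative assumptions, and `fcn`, `loader` and `j_total_fn` are hypothetical names:

```python
import torch

def train(fcn, loader, j_total_fn, threshold=1e-3, max_epochs=100, patience=5):
    """Adjust theta until J(theta) meets the pre-determined threshold or converges."""
    opt = torch.optim.Adam(fcn.parameters(), lr=1e-4)
    best, stale = float('inf'), 0
    for epoch in range(max_epochs):
        cost = 0.0
        for a_i, b_i in loader:                 # m training samples {(a_i, b_i)}
            opt.zero_grad()
            loss = j_total_fn(fcn(a_i), b_i)    # J_total(h_theta(a_i), b_i)
            loss.backward()                     # backpropagation adjusts theta
            opt.step()
            cost += loss.item()
        cost /= len(loader)                     # J(theta) over the training set
        if cost < threshold:
            break                               # pre-determined threshold met
        best, stale = (cost, 0) if cost < best - 1e-6 else (best, stale + 1)
        if stale >= patience:
            break                               # convergence criterion met
    return fcn
```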
The inference step II has four sub-steps: providing a raw image $a_{image1}$, determining the joint network prediction, decoding the joint network prediction to generate the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and the at least one bounding box $B_k$ for at least one object $k$, and generating a joint heatmap and a list of bounding boxes of the raw image $a_{image1}$, representing the joint detection of the at least one lane marker $l_m$ and the at least one object $k$ in the ego road vehicle environment.
In the sub-step S.II.1 the fully convolutional neural network FCN of an inference module is deployed by providing the first convolutional stage of the fully convolutional neural network FCN with a raw image $a_{image1}$ provided by the advanced driving assistance system (ADAS) processing chain of the ego road vehicle.
In the sub-step S.II.2 the joint network prediction $h_{\theta 1}$ of the raw image $a_{image1}$ is determined according to step S.I.2 as presented above.
In the sub-step S.II.3 the fully convolutional neural network FCN prediction $h_{\theta 1}$ is decoded by means of a decoding unit to generate the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and at least one bounding box $B_k$ for at least one object $k$.
In the sub-step S.II.3.1 the plurality of the one bounding box $B_{k'}$ for the at least one lane marker $l_m$ is generated by the following sub-steps:
- determining the lane marker keypoint prediction $\hat{p}_{k'}^{lm}$,
- establishing the offset $\hat{O}_{k'}^{lm}$ of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$,
- determining the constant size prediction $s_{k'}^{lm}$ of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ as $(\hat{w}_c^{(k')}, \hat{h}_c^{(k')}) = (1, 1)$,
- generating the plurality of the one bounding box $B_{k'}$ by using the formula:
$\hat{B}_{k'} = (\hat{x}_c^{(k')} + \delta\hat{x}_c^{(k')} - \tfrac{\hat{w}_c^{(k')}}{2},\ \hat{y}_c^{(k')} + \delta\hat{y}_c^{(k')} - \tfrac{\hat{h}_c^{(k')}}{2},\ \hat{x}_c^{(k')} + \delta\hat{x}_c^{(k')} + \tfrac{\hat{w}_c^{(k')}}{2},\ \hat{y}_c^{(k')} + \delta\hat{y}_c^{(k')} + \tfrac{\hat{h}_c^{(k')}}{2})$
In the sub-step S.II.3.2. the at least one bounding box $B_k$ is generated. Firstly the object keypoint coordinates $\hat{p}_k^{objects} = (\hat{x}_c^{(k)}, \hat{y}_c^{(k)})$ are determined, and the offset $\hat{O}_k^{objects} = (\delta\hat{x}_c^{(k)}, \delta\hat{y}_c^{(k)})$ of the at least one bounding box $B_k$ for the at least one object $k$ is established.
In this sub-step the size $s_k^{objects} = (\hat{w}_c^{(k)}, \hat{h}_c^{(k)})$ of the at least one bounding box $B_k$ for at least one object $k$ is determined, as well as the upper-left corner $N$ coordinates and the lower-right corner $P$ coordinates of the corresponding at least one bounding box $B_k$ for at least one class $c_k$, as presented in Fig. 3:
$N = (\hat{x}_c^{(k)} + \delta\hat{x}_c^{(k)} - \tfrac{\hat{w}_c^{(k)}}{2},\ \hat{y}_c^{(k)} + \delta\hat{y}_c^{(k)} - \tfrac{\hat{h}_c^{(k)}}{2})$
$P = (\hat{x}_c^{(k)} + \delta\hat{x}_c^{(k)} + \tfrac{\hat{w}_c^{(k)}}{2},\ \hat{y}_c^{(k)} + \delta\hat{y}_c^{(k)} + \tfrac{\hat{h}_c^{(k)}}{2})$
The at least one bounding box $B_k$ is generated by using the upper-left corner $N$ and the lower-right corner $P$ as $\hat{B}_k = (N, P)$.
In the sub-step S.II.3.3 a joint heatmap and a list of bounding boxes are generated from the raw image $a_{image1}$, corresponding to each keypoint of the joint keypoint prediction $\hat{P}$, generating the plurality of the bounding boxes $B_{k'}$ of the at least one lane marker $l_m$ and the at least one bounding box $B_k$ for at least one object $k$, representing the joint detection of the at least one lane marker $l_m$ and the at least one object $k$ in the ego road vehicle environment.
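A sketch of this decoding is given below, operating on the $[\hat{P}, \hat{S}, \hat{O}]$ tensors of the earlier head example. The max-pooling peak extraction and the score threshold are common keypoint-decoding choices and are assumptions of the example, not steps recited by the patent; note that, consistent with the keypoint approach, no non-maximum suppression over boxes is needed:

```python
import torch
import torch.nn.functional as F

def decode(p_hat, s_hat, o_hat, ck=3, score_thr=0.3):
    """Turn heatmap peaks into boxes: centre + offset +/- size / 2."""
    # keep only local maxima of the joint heatmap (3x3 max-pooling trick)
    peaks = (p_hat == F.max_pool2d(p_hat, 3, stride=1, padding=1)) & (p_hat > score_thr)
    boxes = []
    for cls, y, x in zip(*torch.nonzero(peaks[0], as_tuple=True)):
        dx, dy = o_hat[0, 0, y, x], o_hat[0, 1, y, x]
        if cls < ck:                              # object: regressed width and height
            w, h = s_hat[0, 0, y, x], s_hat[0, 1, y, x]
        else:                                     # lane marker: constant 1x1 box
            w, h = 1.0, 1.0
        cx, cy = x + dx, y + dy
        boxes.append((int(cls), float(cx - w / 2), float(cy - h / 2),
                      float(cx + w / 2), float(cy + h / 2), float(p_hat[0, cls, y, x])))
    return boxes   # list of (class, x1, y1, x2, y2, score) in output-stride coordinates
```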
In the sub-step S.II.4. the joint heatmap and the list of bounding boxes of the lane markers $l_m$ and of the objects $k$, as extracted from the raw image $a_{image1}$, are sent to a communication module accessible by a decision-making unit of the ego road vehicle.
In a preferred embodiment, the learned set of parameters $\theta^*$ obtained via the training procedure are weights and biases. In this case the adjusting of the initial set of parameters $\theta^{(l)}$ of the fully convolutional network FCN to the learned set of parameters $\theta^{*(l)}$ of the fully convolutional neural network FCN is carried out gradually using the backpropagation algorithm.
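As a brief illustration of that gradual adjustment, in standard gradient-descent notation (not recited verbatim in the patent), each layer's parameters are updated in the direction that reduces the cost function, with learning rate $\alpha$:

```latex
\theta^{(l)} \leftarrow \theta^{(l)} - \alpha \, \frac{\partial J(\theta)}{\partial \theta^{(l)}}, \qquad l = 1, \dots, L
```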
In a second aspect of the invention there is provided a fully convolutional neural network FCN having a set of parameters $\theta^{(l)}$ for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle, the fully convolutional neural network FCN comprising:
- a first sequence of downsampling convolutional layers containing convolution, activation, pooling and skip operations,
- followed by a second sequence of upsampling transpose convolutional layers containing transpose convolution, activation, pooling and skip operations,
the first and second sequences being stacked together forming an hourglass architecture.
The fully convolutional neural network FCN includes a final layer comprising an additional channel to generate the keypoint predictions and bounding box predictions of the at least one lane marker $l_m$.
In a third aspect of the invention there is provided a machine computing unit for implementing the computer-implemented method for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle. The system comprises at least one processor configured to execute instructions stored in the memory to execute the steps of the method of any of the embodiments using the fully convolutional neural network FCN, at least one non-volatile memory used by the at least one processor, and a communication module accessible by a decision-making unit of the vehicle.
The at least one processor comprises at least a training module configured to execute the training step S.I, and at least an inference module configured to execute the inference steps S.II.2.1 and S.II.2.2.
The at least one processor further comprises a decoding unit configured to execute the step S.II.3.
In a fourth aspect of the invention there is provided a computer program comprising code instructions for the execution of a method according to any of the embodiments for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment, when said program is executed on the machine computing unit of the invention.
In a fifth aspect of the invention there is provided a computer readable medium having stored thereon instructions which, when executed by the machine computing unit according to claim 4, cause the machine computing unit to carry out the computer-implemented method of detection of any of the preferred embodiments.
In a sixth aspect of the invention there is provided a data stream which is representative of the computer program of the invention. While the description is made with reference to preferred embodiments and the example of realization, it will be understood that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the claims.

Claims (7)

Patent claims
  1. A computer-implemented method for joint detection of at least one lane marker $l_m$ and at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle, the method comprising the following:
S.I. Training step I of a fully convolutional neural network FCN of a training module, the FCN having an initial set of parameters $\theta$, using keypoint predictions and bounding box predictions, the bounding box predictions being extracted from attributes of the keypoint predictions of:
- the at least one lane marker $l_m$ being classified in one class $c_{k'}$, comprising a plurality of one bounding box $B_{k'}$ each consisting of one pixel of the respective at least one lane marker $l_m$, and
- the at least one object $k$ being classified in at least one class $c_k$ of at least one bounding box $B_k$ of at least one object $k$,
such that the fully convolutional neural network FCN is able to generate a joint network prediction $h_\theta$ based on a learned set of parameters $\theta^*$ of the fully convolutional neural network FCN, the training step I comprising the following sub-steps:
S.I.1. Providing the fully convolutional network FCN having the initial set of parameters $\theta$ with a predetermined number $m$ of training samples, each training sample comprising a set of data $\{(a_i, b_i)\}_{i=1}^{m}$ comprising:
- an input image $a_i$ having an image width $W_i$ and an image height $H_i$;
- target output data $b_i$ comprising:
* a target keypoint label $b_i^{key\text{-}points}$ corresponding to a keypoint of the at least one lane marker $l_m$ and of the at least one object $k$, where each target keypoint label is defined within the codomain $b_i^{key\text{-}points} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times (c_k + c_{k'})}$ (1),
* target size and offset labels $b_i^{size\ and\ offset}$ corresponding to the size and offset of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and of the at least one bounding box $B_k$ of the at least one object $k$, wherein each target size and offset label is defined within the codomain $b_i^{size\ and\ offset} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 4}$ (2),
where $R$ is the output stride used to down-sample, and 4 stands for the 4 outputs of the fully convolutional neural network FCN at each location in $\frac{W}{R} \times \frac{H}{R}$, namely object or lane marker width, height, offset in horizontal direction and offset in vertical direction.
S.I.2. Determining the joint network prediction $h_\theta$:
S.I.2.1. Determining the joint keypoint prediction $\hat{P}$ by means of the fully convolutional neural network FCN based on the input image $a_i$, including a lane marker keypoint prediction $\hat{p}_{k'}^{lm}$ for the at least one lane marker $l_m$ and an object keypoint prediction $\hat{p}_k^{objects}$ for the at least one object $k$, where the lane marker keypoints are placed in each pixel of the at least one lane marker $l_m$ and the object keypoints are placed in the centre of the at least one object $k$, the respective keypoints being expressed in a coordinate system $XY$, each keypoint having two coordinates, one on the horizontal $x$ axis and the other on the vertical $y$ axis, comprising the following sub-steps:
- determining the lane marker keypoint prediction $\hat{p}_{k'}^{lm} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times c_{k'}}$ (3),
- determining the object keypoint prediction $\hat{p}_k^{objects} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times c_k}$ (4),
- determining the joint keypoint prediction $\hat{P} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times (c_k + c_{k'})}$ (5),
- generating a heatmap of the image $a_i$ having peaks corresponding to each keypoint of the joint keypoint prediction $\hat{P}$.
S.I.2.2. Determining the joint size prediction $\hat{S}$ of the bounding boxes $B$ by means of the fully convolutional neural network FCN, including the constant size prediction $s_{k'}^{lm}$ of the bounding box $B_{k'}$ and the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ on said heatmap of the image $a_i$:
S.I.2.2.1. Determining the constant size prediction $s_{k'}^{lm}$ of the bounding box $B_{k'}$ of the at least one lane marker $l_m$:
- determining all the bounding boxes $B_{k'}$ of the at least one lane marker based on the lane marker keypoint prediction $\hat{p}_{k'}^{lm}$,
- determining the constant size prediction $s_{k'}^{lm}$ of the bounding boxes $B_{k'}$ by using the formula $s_{k'}^{lm} = (1, 1)$ (6), $s_{k'}^{lm}$ corresponding to the bounding box $B_{k'}$.
S.I.2.2.2. Determining the size prediction $s_k^{objects}$ of the at least one bounding box $B_k$ for at least one object $k$:
- determining at least one bounding box $B_k(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ as a rectangular box corresponding to each keypoint of the joint keypoint prediction, determined in the coordinate system $XY$ based on two corners, the upper-left corner $M(x_1^{(k)}, y_1^{(k)})$ and the lower-right corner of the rectangle $O(x_2^{(k)}, y_2^{(k)})$, the origin of the coordinates being placed in each keypoint of the joint keypoint prediction,
- determining the size prediction $s_k^{objects}$ of at least one bounding box $B_k$ for at least one object $k$ by using the formula $s_k^{objects} = (x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)})$ for each object $k$ (7).
S.I.2.2.3. Determining the joint size prediction $\hat{S}$, defined within the codomain $\hat{S} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ (8).
S.I.2.3. Establishing an offset $\hat{O}$ for the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and the at least one bounding box $B_k$ of at least one object $k$, in order to recover the discretization error caused by the output stride $R$, defined within the codomain $\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ (9), wherein the fully convolutional neural network FCN prediction is defined by the formula $h_\theta(a_i) = [\hat{P}, \hat{S}, \hat{O}]$ (10).
S.I.2.4. Generating a joint heatmap of the input image $a_i$ having peaks corresponding to each keypoint of the joint keypoint prediction $\hat{P}$ and corresponding to the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and the bounding boxes $B_k$ for at least one object $k$.
S.I.3. Assessing the difference between the fully convolutional neural network FCN prediction $h_\theta$ and the target output data $b_i$ for all the predetermined number $m$ of training samples by a cost function $J(\theta)$, wherein the assessment of said difference is carried out by evaluating whether the fully convolutional neural network FCN prediction $h_\theta(a_i)$ deviates from the target output data $b_i$ for each training example $i$ using a cost function $J(\theta)$ defined by the formula $J(\theta) = \frac{1}{m}\sum_{i=1}^{m} J^{total}(h_\theta(a_i), b_i)$ (11), where $J^{total}$ is a summation of terms that quantifies the misalignment between the fully convolutional neural network FCN prediction $h_\theta$ and the target output data $b_i$.
S.I.4. Adjusting the initial set of parameters $\theta^{(l)}$ of the fully convolutional network FCN to a learned set of parameters $\theta^{*(l)}$ of the fully convolutional neural network FCN by determining whether the cost function $J(\theta)$ of the fully convolutional neural network FCN satisfies a pre-determined threshold or until a convergence criterion is met:
- if the pre-determined cost function threshold is met or the convergence criterion is satisfied, the training of the fully convolutional neural network FCN is completed, or
- if the pre-determined cost function threshold is not met, the initial parameters $\theta^{(l)}$ of the fully convolutional neural network FCN are adjusted according to the target output data $b_i$ until the cost function $J(\theta)$ satisfies the pre-determined threshold or until convergence.
S.II. Inference step of the fully convolutional neural network FCN of an inference module:
S.II.1. Providing the first convolutional stage of the fully convolutional neural network FCN with a raw image $a_{image1}$ provided by the advanced driving assistance system (ADAS) processing chain of the ego road vehicle.
S.II.2. Determining the joint network prediction $h_{\theta 1}$ of the raw image $a_{image1}$ according to step S.I.2 as claimed above.
S.II.3. Decoding the fully convolutional neural network FCN prediction $h_{\theta 1}$ by means of a decoding unit to generate the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and the at least one bounding box $B_k$ for at least one object $k$, comprising the following sub-steps:
S.II.3.1. Generating the plurality of the one bounding box $B_{k'}$ for the at least one lane marker $l_m$:
- determining the lane marker keypoint prediction $\hat{p}_{k'}^{lm}$,
- establishing the offset $\hat{O}_{k'}^{lm}$ of the bounding box $B_{k'}$ of the at least one lane marker $l_m$,
- determining the constant size prediction $s_{k'}^{lm}$ of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ by using the formula $(\hat{w}_c^{(k')}, \hat{h}_c^{(k')}) = (1, 1)$,
- generating the plurality of the one bounding box $B_{k'}$ by using the formula $\hat{B}_{k'} = (\hat{x}_c^{(k')} + \delta\hat{x}_c^{(k')} - \tfrac{\hat{w}_c^{(k')}}{2},\ \hat{y}_c^{(k')} + \delta\hat{y}_c^{(k')} - \tfrac{\hat{h}_c^{(k')}}{2},\ \hat{x}_c^{(k')} + \delta\hat{x}_c^{(k')} + \tfrac{\hat{w}_c^{(k')}}{2},\ \hat{y}_c^{(k')} + \delta\hat{y}_c^{(k')} + \tfrac{\hat{h}_c^{(k')}}{2})$.
S.II.3.2. Generating the at least one bounding box $B_k$ based on the following features:
- determining the object keypoint coordinates $\hat{p}_k^{objects} = (\hat{x}_c^{(k)}, \hat{y}_c^{(k)})$,
- establishing the offset $\hat{O}_k^{objects} = (\delta\hat{x}_c^{(k)}, \delta\hat{y}_c^{(k)})$ for the at least one bounding box $B_k$ for the at least one object $k$,
- determining the size $s_k^{objects} = (\hat{w}_c^{(k)}, \hat{h}_c^{(k)})$ of the at least one bounding box $B_k$ for at least one object $k$,
- determining the upper-left corner $N$ coordinates and the lower-right corner $P$ coordinates of the corresponding at least one bounding box $B_k$ for at least one class $c_k$: $N = (\hat{x}_c^{(k)} + \delta\hat{x}_c^{(k)} - \tfrac{\hat{w}_c^{(k)}}{2},\ \hat{y}_c^{(k)} + \delta\hat{y}_c^{(k)} - \tfrac{\hat{h}_c^{(k)}}{2})$, $P = (\hat{x}_c^{(k)} + \delta\hat{x}_c^{(k)} + \tfrac{\hat{w}_c^{(k)}}{2},\ \hat{y}_c^{(k)} + \delta\hat{y}_c^{(k)} + \tfrac{\hat{h}_c^{(k)}}{2})$,
- generating the at least one bounding box $B_k$ by using the upper-left corner $N$ coordinates and the lower-right corner $P$ coordinates as $\hat{B}_k = (N, P)$.
S.II.3.3. Generating a joint heatmap and a list of bounding boxes of the raw image $a_{image1}$, corresponding to each keypoint of the joint keypoint prediction $\hat{P}$, generating the plurality of the one bounding box $B_{k'}$ of the at least one lane marker $l_m$ and generating the at least one bounding box $B_k$ for at least one object $k$, representing the joint detection of the at least one lane marker $l_m$ and the at least one object $k$ in the ego road vehicle environment.
S.II.4. Sending said joint heatmap and said list of bounding boxes of the lane markers $l_m$ and of the at least one object $k$, as extracted from the raw image $a_{image1}$, to a communication module accessible by a decision-making unit of the ego road vehicle.
  2. Computer-implemented method for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ according to claim 1, wherein the learned set of parameters $\theta^*$ obtained via a training procedure are weights and biases, and the adjusting of the initial set of parameters $\theta^{(l)}$ of the fully convolutional network FCN to the learned set of parameters $\theta^{*(l)}$ of the fully convolutional neural network FCN is carried out gradually using the backpropagation algorithm.
  3. A fully convolutional neural network FCN having a set of parameters $\theta^{(l)}$ for executing the step I and the step II of the method for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle according to claim 1 or 2, the fully convolutional neural network FCN comprising:
* a first sequence of downsampling convolutional layers containing convolution, activation, pooling and skip operations,
* followed by a second sequence of upsampling transpose convolutional layers containing transpose convolution, activation, pooling and skip operations,
said first and second sequences being stacked together forming an hourglass architecture, wherein said fully convolutional neural network FCN includes a final layer comprising an additional channel to generate keypoint predictions and bounding box predictions of the at least one lane marker $l_m$.
  4. A machine computing unit for implementing the computer-implemented method for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment using images provided by an advanced driving assistance system (ADAS) processing chain of the ego road vehicle, the system comprising:
- at least one processor configured to execute instructions stored in the memory to execute the steps of the method of claim 1 or 2 using the fully convolutional neural network FCN of claim 3,
- at least one non-volatile memory used by the at least one processor,
- a communication module accessible by a decision-making unit of the vehicle,
wherein the at least one processor comprises:
- at least a training module configured to execute the training step S.I as claimed in claim 1 or 2,
- at least an inference module configured to execute the inference steps S.II.2.1 and S.II.2.2 as claimed in claim 1 or 2,
- a decoding unit configured to execute the step S.II.3 as claimed in claim 1 or 2.
  5. Computer program comprising code instructions for the execution of a method according to claim 1 or 2 for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment, when said program is executed on the machine computing unit of claim 4.
  6. A computer readable medium having stored thereon instructions which, when executed by the machine computing unit according to claim 4, cause the machine computing unit to carry out the computer-implemented method for joint detection of the at least one lane marker $l_m$ and of the at least one object $k$ in an ego road vehicle environment of any of the claims 1 or 2.
  7. A data stream which is representative of the computer program of claim 5.
GB2011058.1A 2020-07-17 2020-07-17 Method for joint detection of at least one lane marker and at least one object Pending GB2597266A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2011058.1A GB2597266A (en) 2020-07-17 2020-07-17 Method for joint detection of at least one lane marker and at least one object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2011058.1A GB2597266A (en) 2020-07-17 2020-07-17 Method for joint detection of at least one lane marker and at least one object

Publications (2)

Publication Number Publication Date
GB202011058D0 GB202011058D0 (en) 2020-09-02
GB2597266A true GB2597266A (en) 2022-01-26

Family

ID=72338903

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2011058.1A Pending GB2597266A (en) 2020-07-17 2020-07-17 Method for joint detection of at least one lane marker and at least one object

Country Status (1)

Country Link
GB (1) GB2597266A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220301168A1 (en) * 2020-11-24 2022-09-22 Jiangsu University Comprehensive detection device and method for cancerous region

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705515B (en) * 2021-09-03 2024-04-12 北京百度网讯科技有限公司 Training of semantic segmentation model and generation method and device of high-precision map lane line

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147905A1 (en) * 2015-11-25 2017-05-25 Baidu Usa Llc Systems and methods for end-to-end object detection

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147905A1 (en) * 2015-11-25 2017-05-25 Baidu Usa Llc Systems and methods for end-to-end object detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KAIWEN DUAN ET AL: "CenterNet: Object Detection with Keypoint Triplets", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 17 April 2019 (2019-04-17), XP081170532 *
XINGYI ZHOU ET AL: "Objects as Points", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 16 April 2019 (2019-04-16), XP081170201 *
YANG ZE ET AL: "RepPoints: Point Set Representation for Object Detection", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 9656 - 9665, XP033723350, DOI: 10.1109/ICCV.2019.00975 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220301168A1 (en) * 2020-11-24 2022-09-22 Jiangsu University Comprehensive detection device and method for cancerous region
US11587231B2 (en) * 2020-11-24 2023-02-21 Jiangsu University Comprehensive detection device and method for cancerous region

Also Published As

Publication number Publication date
GB202011058D0 (en) 2020-09-02

Similar Documents

Publication Publication Date Title
US10796201B2 (en) Fusing predictions for end-to-end panoptic segmentation
US11422546B2 (en) Multi-modal sensor data fusion for perception systems
US20190065944A1 (en) Shared Processing with Deep Neural Networks
US11734918B2 (en) Object identification apparatus, moving body system, object identification method, object identification model learning method, and object identification model learning apparatus
US20180211119A1 (en) Sign Recognition for Autonomous Vehicles
CN111382768A (en) Multi-sensor data fusion method and device
EP3594853A2 (en) Method for detecting moving objects in a blind spot region of a vehicle and blind spot detection device
WO2021226921A1 (en) Method and system of data processing for autonomous driving
US20210174113A1 (en) Method for limiting object detection area in a mobile system equipped with a rotation sensor or a position sensor with an image sensor, and apparatus for performing the same
GB2597266A (en) Method for joint detection of at least one lane marker and at least one object
CN112509032A (en) Design method of front sensing module based on automobile distributed sensing platform
CN113168558A (en) Method, artificial neural network, device, computer program and machine-readable storage medium for semantic segmentation of image data
Alkhorshid et al. Road detection through supervised classification
US10949681B2 (en) Method and device for ascertaining an optical flow based on an image sequence recorded by a camera of a vehicle
Aditya et al. Collision Detection: An Improved Deep Learning Approach Using SENet and ResNext
US11200438B2 (en) Sequential training method for heterogeneous convolutional neural network
US20220180646A1 (en) Lane detection system and method for a vehicle
US20240127585A1 (en) Method for determining the encoder architecture of a neural network
US20220414385A1 (en) Use of dbscan for lane detection
CN115457486A (en) Two-stage-based truck detection method, electronic equipment and storage medium
WO2024062540A1 (en) Image processing device
JP2019152976A (en) Image recognition control device and image recognition control program
US20240169183A1 (en) Method for performing a perception task of an electronic device or a vehicle using a plurality of neural networks
US12008743B2 (en) Hazard detection ensemble architecture system and method
US20230230389A1 (en) Method for detecting information about at least one object and/or at least a part of free space in a representation of the environment of a system

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20230202 AND 20230208