CN110136058B

CN110136058B - Drawing construction method based on overlook spliced drawing and vehicle-mounted terminal

Info

Publication number: CN110136058B
Application number: CN201811245545.3A
Authority: CN
Inventors: 李天威; 童哲航; 谢国富
Original assignee: Beijing Chusudu Technology Co ltd
Current assignee: Momenta Suzhou Technology Co Ltd
Priority date: 2018-10-25
Filing date: 2018-10-25
Publication date: 2024-01-02
Anticipated expiration: 2038-10-25
Also published as: CN110136058A

Abstract

A mapping method and a vehicle-mounted terminal based on a top view splice graph comprise the following steps: acquiring a plurality of target images shot by a plurality of image acquisition devices at the same moment; splicing the multiple target images to obtain a top-view spliced image; identifying image semantic features in the overlook mosaic to obtain a overlook perception map; positioning based on the top view perception map, thereby determining a key frame; map points are generated according to the key frames, so that a local map is formed. By implementing the embodiment of the invention, dense and accurate ground elements of the ground warehouse can be built, and the ground elements are more robust and accurate.

Description

Drawing construction method based on overlook spliced drawing and vehicle-mounted terminal

Technical Field

The invention relates to the technical field of automatic driving, in particular to a graph building method based on a top view splice graph and a vehicle-mounted terminal.

Background

In the positioning scheme of the underground garage, the positioning result is global, but because the underground garage does not have a global satellite navigation system (Global Positioning System, GPS) signal, global position information cannot be obtained in real time by means of GPS. By means of only odometer schemes, such as visual odometers, encoders or IMU-based odometers, accumulated errors are unavoidable regardless of the accuracy of the operation, and therefore, in indoor positioning, particularly in the positioning requirements of a ground pool, high-accuracy maps occupy a very important position. The vehicle can continuously rely on own observation and high-precision map matching to obtain a global position of the vehicle in the ground library.

Among schemes for constructing images using vision, a front view image constructing scheme is also a very advanced one. However, the forward-looking-based map construction scheme has the defects that high-precision map elements are sparse, the requirements on an odometer and a matching algorithm are high, and the map elements are easily influenced by shielding.

Disclosure of Invention

The embodiment of the invention discloses a mapping method based on a top view splice graph and a vehicle-mounted terminal, which are used for reducing the dimension of a real 3D world, constructing dense and accurate ground base ground elements, and being very suitable for a ground base scene with a flat ground, more robust and accurate, and capable of giving more observation for later positioning.

The first aspect of the embodiment of the invention discloses a graph construction method based on a top view splice graph, which comprises the following steps:

acquiring a plurality of target images shot by a plurality of image acquisition devices at the same moment;

splicing the multiple target images to obtain a top-view spliced image;

identifying image semantic features in the overlook mosaic to obtain a overlook perception map;

positioning based on the top view perception map, thereby determining a key frame;

map points are generated according to the key frames, so that a local map is formed.

As an alternative embodiment, in the first aspect of the embodiment of the present invention, the method further includes the steps of:

loop detection is carried out in the process of local image construction;

and (5) performing global optimization after loop detection is successful.

As an optional embodiment, in the first aspect of the embodiment of the present invention, the obtained top view mosaic is input to a neural network model, and image semantic features in the top view mosaic are identified based on the neural network model.

As an alternative embodiment, in the first aspect of the embodiment of the present invention, the positioning fuses one or more of ring view VO, ring view VIO, and wheel speed.

As an alternative embodiment, in the first aspect of the embodiment of the present invention, the current frame is regarded as a key frame when the distance and angle between the current frame and the nearest key frame are different to a threshold value.

The second aspect of the embodiment of the invention discloses a graph construction method based on a top view splice graph, which comprises the following steps:

splicing the multiple target images to obtain a top-view spliced image;

map points are generated according to the key frames, so that a local map is formed;

the method also comprises the step of judging whether the overlook sensing image at a certain moment is a key frame according to the observed condition and the spatial relationship, wherein the formula according to the spatial relationship is as follows:

||p _k -p _i ||+λ||θ _k -θ _i ||>δ

in the above formula, p _k Represents the position of the center of the vehicle at time k, p _i Indicating the position of the vehicle center at time i; in a dimension-reduced 2D map, the definition of a location isθ _k Represents the heading angle, theta, of the vehicle at the moment k _i And representing the heading angle of the vehicle at the moment i, wherein lambda is the weight of a parameter value control position part and a heading angle part, and delta is a set threshold value.

In a third aspect of the embodiment of the present invention, a vehicle-mounted terminal is disclosed, including:

the acquisition subunit is used for acquiring a plurality of target images shot by a plurality of cameras at the same time;

the splicing subunit is used for splicing the plurality of target images acquired by the acquisition subunit to obtain a top-view splicing image;

the sensing subunit is used for identifying the semantic features of the images in the overlook spliced image obtained by splicing by the splicing subunit so as to obtain the overlook sensing image;

the positioning subunit is used for positioning the top view sensing image obtained by the sensing subunit so as to determine a key frame;

and the construction subunit is used for generating map points according to the key frames of the positioning subunit so as to form a local map.

As an optional implementation manner, in the second aspect of the embodiment of the present invention, the detection subunit is configured to perform loop detection in a local mapping process;

and the optimizing subunit is used for performing global optimization after the loop detection is successful.

In a second aspect of the embodiment of the present invention, the sensing subunit is configured to input the obtained top view mosaic image into a neural network model, and identify, based on the neural network model, semantic features of an image in the top view mosaic image, so as to obtain the top view sensing image.

As an alternative implementation manner, in the second aspect of the embodiment of the present invention, the positioning fuses one or more of ring view VO, ring view VIO, and wheel speed.

As an alternative implementation, in the second aspect of the embodiment of the present invention, the positioning subunit is further configured to treat the current frame as a key frame when the distance and the angle between the current frame and the nearest key frame are different to a threshold value.

the construction subunit is used for generating map points according to the key frames of the positioning subunit so as to form a local map; the vehicle-mounted terminal further comprises a judging subunit, wherein the judging subunit further comprises a step of judging whether the overlook sensing image at a certain moment is a key frame according to the observed condition and the spatial relationship, and the judging formula according to the spatial relationship is as follows:

||p _k -p _i ||+λ||θ _k -θ _i ||>δ

Compared with the prior art, the invention has the following invention points and beneficial effects, but is not limited to the following points:

1) In the prior art, the technical scheme of recognizing the semantic features of the image and then splicing the semantic features of the image to obtain the spliced image is not found. The invention obtains the overlook perception image by identifying the image semantic features in the overlook mosaic image, and particularly selects lane lines, library bit lines and library sites as semantic features, which is one of the invention points;

2) Training a plurality of overlook spliced samples marked with image semantic features in advance through an Encoder-Decoder model neural network (detailed description exists in specific embodiments), extracting the image semantic features from the overlook spliced image through a deep learning method, and improving the recognition accuracy of the image semantic features, which is one of the invention points;

3) Positioning and tracking the overlook sensing image, and judging whether the overlook sensing image at a certain moment is a key frame or not according to the observed condition and a specific spatial relationship;

4) Generating map points according to the key frames, and judging to avoid generating repeated map points in the same place, wherein the map points are one of the invention points;

5) In order to eliminate the influence of accumulated errors to a certain extent, the continuous loop detection and global optimization in the local mapping process are one of the invention points of the invention.

Therefore, by implementing the embodiment of the invention, the dimension of the real 3D world is reduced, dense and accurate ground base ground elements are built, the ground base scene on the ground is very suitable, the ground base scene is more robust and accurate, and more observation can be given to later positioning.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a graph construction method based on a top view splice graph according to an embodiment of the present invention;

FIG. 2 is a partially pictorial illustration of a parking lot constructed by an in-vehicle terminal in accordance with an embodiment of the present invention;

FIG. 3 is a partially pictorial illustration of another parking lot constructed from an in-vehicle terminal in accordance with an embodiment of the present invention;

FIG. 4 is a schematic flow chart of another method for constructing a graph based on a top view stitching graph according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of a vehicle-mounted terminal according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of another vehicle-mounted terminal according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of another vehicle-mounted terminal according to an embodiment of the present invention.

DETAILED DESCRIPTION OF EMBODIMENT (S) OF INVENTION

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present invention and the accompanying drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

The embodiment of the invention discloses a graph building method based on a top view splice graph and a vehicle-mounted terminal. The following will describe in detail.

Example 1

Instant localization and mapping (Simultaneous localization and mapping, SLAM) techniques utilize images of objects captured by cameras to construct a local map that is used to describe the surroundings of the vehicle. Based on the SLAM technique, the in-vehicle terminal can recognize feature points (here, feature points can also be understood as pixel blocks) in the target image, and construct a map using these feature points. That is, when the map is constructed, the in-vehicle terminal may gradually draw a local map of the vehicle pathway environment using the image photographed by the camera in the process that the vehicle is continuously traveling.

Referring to fig. 1, fig. 1 is a flow chart of a graph construction method based on a top view stitching graph according to an embodiment of the present invention. The method is applied to vehicle-mounted terminals such as vehicle-mounted computers, vehicle-mounted industrial control computers (Industrial personal Computer, IPC) and the like, and the embodiment of the invention is not limited. The vehicle-mounted terminal is connected with each sensor of the vehicle and receives and processes data acquired by each sensor. As shown in fig. 1, the mapping method based on the top view mosaic may include the following steps:

101. and acquiring a plurality of target images shot by a plurality of image acquisition devices at the same time.

In the embodiment of the present invention, the image capturing device may be a camera, and for convenience of description, the camera refers to the image capturing device hereinafter unless otherwise specified. The cameras are arranged in the front, rear, left and right directions of the vehicle respectively, and the view finding range of each camera at least comprises the ground below the camera. Optionally, the camera may be a fisheye camera, where a Field OF View (FOV) OF the fisheye camera is larger, so that a target image shot by a single fisheye camera may include as many surrounding environments OF the vehicle as possible, thereby improving the integrity OF observation, further improving the integrity OF a local map, and increasing the information amount contained in the local map. The cameras arranged in the four directions form a looking-around scheme of the cameras, so that the vehicle-mounted terminal can acquire the environmental information of all directions around the vehicle at one time, and a local map constructed by using the target image acquired by single acquisition can contain more information. In addition, the image data collected by the four-way cameras has certain redundancy, and if one way of cameras fails, the image data collected by the other cameras can be used as supplement, so that the influence on the construction of a local map and the positioning of the vehicle-mounted terminal is low.

102. And splicing the plurality of target images to obtain a top-view spliced image.

In the embodiment of the invention, the vehicle-mounted terminal splices target images shot by cameras arranged in the front, rear, left and right directions of the vehicle at the same time, and the obtained overlooking splice image contains 360-degree environmental information centering on the vehicle. In addition, if the camera used for shooting the target image is the camera, the vehicle-mounted terminal needs to perform anti-distortion processing on the target image before performing step 102 to splice a plurality of target images, that is, according to a certain mapping rule, the target image shot by the fisheye camera is projected onto the ground plane, and then the images obtained by the projection are spliced.

103. And identifying the image semantic features in the top view mosaic to obtain a top view perception map.

In the embodiment of the invention, the image semantic features can be the semantic features which have special meanings and are helpful for vehicle positioning after experience screening. In one possible application scenario, the vehicle is located in a parking lot, which may be an above-ground parking lot or an underground garage, and embodiments of the present invention are not limited. In the application scenario of the parking lot, the image semantic features may be lane lines, parking garage bit lines, garage sites (intersection points between the garage bit lines), zebra lines, lane arrows, and the like, which are not limited in the embodiment of the invention. Referring to fig. 2 together, fig. 2 is a schematic diagram of a parking lot constructed by a vehicle-mounted terminal according to an embodiment of the present invention, and as can be seen from fig. 2, the local map is composed of semantic features such as a lane line, a garage bit line, and a garage site, which are passed by the vehicle-mounted terminal when the vehicle-mounted terminal is driving in the parking lot, wherein a dotted line with an arrow indicates a driving track of a vehicle.

In addition, as an optional implementation manner, in the embodiment of the invention, the vehicle-mounted terminal can identify the image semantic features from the top view mosaic through an image identification algorithm such as deep learning or image segmentation. Preferably, the neural network model suitable for deep learning can be used for identifying image semantic features, and the neural network model is trained in advance by adopting a large number of overlook spliced sample images marked with the image semantic features. The neural network model is as follows:

the network structure adopts an Encoder-Decoder model and mainly comprises two parts: an encoding (Encoder) portion and a decoding (Decoder) portion.

In the embodiment of the invention, the spliced images are input into the network, wherein the coding part network extracts the characteristics of the images mainly through the rolling layer and the pooling layer. The network adjusts network parameters through the training of marked large-scale samples so as to code accurate semantic features and non-semantic features of the network. After the coding network extracts features through two convolutions, downsampling is performed through pooling. The architecture of cascading four two-layer convolutions plus one-layer pooling enables the receptive field of neurons at the top layer of the coding network to cover semantic elements of different scales in the examples of the invention.

The decoding network is a symmetrical structure to the encoding network, wherein the pooling layer of the encoding network is changed to an upsampling layer. And after four times of up-sampling in the decoding part, the features extracted by encoding are amplified to the original image size, so that the pixel semantic classification is realized. Upsampling is achieved by deconvolution, which allows most of the information of the input data to be obtained, but still some of the information is lost, so we introduce underlying features to supplement the details lost in the decoding process. These underlying features are mainly used to encode the convolutional layers of different scales in the network, and the features extracted by the convolutional layers of the encoding network on the same scale can be combined with deconvolution to generate a more accurate feature map. The network training mainly adopts cross entropy to measure the difference between the predicted value and the actual value of the network, and the cross entropy formula is as follows:

wherein y is a marking value of the image element, namely whether one pixel of the image is a semantic element or a non-semantic element, and generally 1 is used for representing the semantic element and 0 is used for representing the non-semantic element; n is the total number of pixels of the image, x is the input, a is the output of the neuron a=σ (z), z= Σ _j w _j x _j +b, which can overcome the problem of too slow updating of the network weights. After the training of the network model is completed, when the example of the invention is actually used, the network predicts each pixel of the input image, outputs the attribute value corresponding to each pixel as 0 or 1, and the connected blocks of the image elements marked as 1 are meaningful semantic image structures, so that the semantic segmentation of the image is realized. Inputting the overlook mosaic image obtained by stitching the vehicle-mounted terminal into the trained neural network model, and identifying the image semantic features in the overlook mosaic image based on the identification result of the neural network model. Compared with the traditional image segmentation technology, the image semantic features are extracted from the top view mosaic by a deep learning method, so that the recognition accuracy of the image semantic features can be improved.

Inputting the overlook mosaic image obtained by stitching the vehicle-mounted terminal into the trained neural network model, and identifying the image semantic features in the overlook mosaic image based on the identification result of the neural network model. Compared with the traditional image segmentation technology, the image semantic features are extracted from the top view mosaic by a deep learning method, so that the recognition accuracy of the image semantic features can be improved.

104. The key frames are determined based on locating the top view perspective.

In the embodiment of the invention, when the map is built, the system sets a global coordinate system and performs positioning tracking on the position of the vehicle based on the top view sensing map, and the obtained coordinate corresponding top view at each moment can be used for building the map. As an alternative implementation manner, the system determines whether the top view perceived image at a certain moment is a key frame according to the observed condition and the spatial relationship, and the formula according to the spatial relationship is as follows:

||p _k -p _i ||+λ||θ _k -θ _i ||>δ

in the above formula, p _k Represents the position of the center of the vehicle at time k, p _i Indicating the position of the vehicle center at time i; in a dimension-reduced 2D map, the definition of a location isθ _k Represents the heading angle θ of the vehicle at time k _i And representing the heading angle of the vehicle at the moment i, wherein lambda is the weight of a parameter value control position part and a heading angle part, and delta is a set threshold value.

In case of sufficient observation, the current frame and the nearest key frame will be regarded as key frames when the distance and angle of the frames differ to a threshold.

In addition, as an optional implementation manner, in the above step 104, besides positioning based on the top view mosaic, the positioning aspect may also blend the look-around VO (so-called look-around VO is an original image of four cameras before obtaining the mosaic, visual o measure made based on the original images of several cameras), look-around VIO (Visual Inertial Odometry), look-around VIO blend wheel speed, etc. to improve one accuracy of local positioning.

105. Map points are generated according to the key frames, so that a local map is formed.

In the embodiment of the invention, when a frame is determined to be a key frame, the system determines whether each identified pixel point in the key frame is created. If not, the system generates a new map point for it to occupy the corresponding position in the map. This determination is made each time a map point is generated from a new key frame, avoiding the generation of duplicate map points in the same place. Each map point will determine from the observation of successive frames whether the category is correct and whether the map point ripens to calculate the category and ripen. For example, assume that a map point is determined to be a carport line category in a keyframe, and its corresponding map point is also established. The map points calculate whether other frames nearby are consistent with the observation of the corresponding positions, and the definition of the consistency covers two kinds of consistency, namely whether the categories are consistent; and secondly, whether the relative position of the map point corresponds to the observation of other frames or not is within a certain range. If both items are satisfied, the map point will be determined to be mature and added to the map.

For a better understanding of the local mapping of steps 101-105 by specific features, please refer to fig. 3 together. Fig. 3 is a partially schematic illustration of another parking lot constructed by an in-vehicle terminal according to an embodiment of the present invention. As shown in fig. 3, the local map includes three image semantic features of library bit lines, library sites and lane arrows.

Example two

With the form of the carrier in the ground base, the local map is built based on vision or vision fused with the odometer of other sensors, so that the built map can ensure local accuracy, and the accumulated error is inevitably added in long term, and loop detection and global optimization can be adopted.

Referring to fig. 4, fig. 4 is a flow chart of another graph construction method based on a top view stitching graph according to an embodiment of the present invention. As shown in fig. 4, the mapping method based on the top view splice graph may further include the following steps:

106. loop detection is carried out in the process of local image construction;

in the embodiment of the present invention, loop detection may be: the path that the carrier travels when it comes to the same location twice will be a loop, so called loop detection. When the same position is detected by coming twice, the positioning accumulated error can be quantized, and the influence of the accumulated error is eliminated to a certain extent through the calculation of global optimization.

107. And (5) performing global optimization after loop detection is successful.

Example III

Referring to fig. 5, fig. 5 is a schematic structural diagram of a vehicle-mounted terminal according to an embodiment of the present invention. As shown in fig. 5, the in-vehicle terminal includes:

an acquiring subunit 501 is configured to acquire multiple target images captured by multiple cameras at the same time.

In the embodiment of the invention, the image acquisition device may be a camera, and for convenience of description, the camera refers to the image acquisition device hereinafter unless otherwise specified. The cameras comprise at least four cameras which are respectively arranged in the front, the back, the left and the right directions of the vehicle, and the view finding range of each camera at least comprises the ground below the camera. The cameras arranged in the four directions form a looking-around scheme of the cameras, so that a local map constructed by utilizing a target image acquired by single acquisition contains more features, and the matching success rate of the local map and the global map is improved. In addition, certain redundancy exists among data collected by each camera in the looking-around scheme, so that collected data of other cameras can be used as a supplement under the condition that a certain path of cameras fail, and the influence of partial camera failure on the construction of a local map and positioning of a vehicle-mounted terminal can be reduced.

The stitching subunit 502 is configured to stitch the multiple target images acquired by the acquiring subunit 501 to obtain a top stitching image.

In the embodiment of the present invention, if the camera used for capturing the target image is the above-mentioned camera, before the splicing subunit 502 splices the multiple target images, the target image needs to be subjected to anti-distortion processing, that is, the target image captured by the fisheye camera is projected onto the ground plane according to a certain mapping rule, and then the image obtained by the projection is spliced.

The sensing subunit 503 is configured to identify semantic features of the image in the top view stitching graph obtained by stitching by the stitching subunit 502, so as to obtain a top view sensing graph.

In the embodiment of the invention, the image semantic features can be the semantic features which have special meanings and are helpful for vehicle positioning after experience screening. For example, the image semantic features may be lane lines, parking garage bit lines, library sites, zebra lines, lane arrows, and the like, which are not limited in the embodiments of the present invention.

In addition, the perception subunit 503 may identify image semantic features from the top view mosaic by an image recognition algorithm such as deep learning or image segmentation. Preferably, image semantic features can be identified using neural network models suitable for deep learning: inputting the overlook mosaic image obtained by stitching the vehicle-mounted terminal into the trained neural network model, and identifying the image semantic features in the overlook mosaic image based on the identification result of the neural network model. Compared with the traditional image segmentation technology, the image semantic features are extracted from the top view mosaic by a deep learning method, so that the recognition accuracy of the image semantic features can be improved.

A positioning subunit 504, configured to position the top view obtained by the sensing subunit 503, so as to determine a key frame.

In the embodiment of the invention, when the map is built, the system sets a global coordinate system and tracks the position of the vehicle based on the overlook mosaic, and the obtained coordinate corresponding to the overlook at each moment is used for building the map. The system can judge whether the overlook sensing image at a certain moment is a key frame according to the observed condition and the spatial relationship, and the formula according to the spatial relationship is as follows:

||p _k -p _i ||+λ||θ _k -θ _i ||>δ

As an alternative embodiment, when the distance and angle of the current frame and the nearest key frame differ by a threshold value, the frame is treated as a key frame if the observation is sufficient.

In addition, as an optional implementation manner, in the above step 104, besides positioning based on the top view mosaic, the positioning aspect may be fused with a look-around VO (so-called look-around VO is an original image of four cameras before obtaining the mosaic, and based on Visual input made by the original images of several cameras), look-around VIO (Visual Inertial Odometry), look-around VIO, and wheel speed, etc. to improve one accuracy of local positioning.

A construction subunit 505 is configured to generate map points according to the key frames of the positioning subunit 404, thereby forming a local map.

In the embodiment of the invention, when a frame is determined to be a key frame, the system determines whether each identified pixel is constructed. If not, the system generates a new map point for it to occupy the corresponding position in the map. This determination is made each time a map point is generated from a new key frame, avoiding the generation of duplicate map points in the same place. Each map point will determine from the observation of successive frames whether the category is correct and whether the map point is mature. For example, assume that a map point is determined to be a carport line category in a keyframe, and its corresponding map point is also established. The map points calculate whether other frames nearby are consistent with the observation of the corresponding positions, and the definition of the consistency covers two kinds of consistency, namely whether the categories are consistent; and secondly, whether the relative position of the map point corresponds to the observation of other frames or not within a certain range. If both items are satisfied, the map point will be determined to be mature and added to the map.

Example IV

Referring to fig. 6, fig. 6 is a schematic structural diagram of a vehicle-mounted terminal according to an embodiment of the present invention. As shown in fig. 6, the in-vehicle terminal further includes:

a detection subunit 506, configured to perform loop detection in the local mapping process;

in the embodiment of the present invention, loop detection may be: the path that the carrier travels when it comes to the same location twice will be a loop, so called loop detection. When the same position is detected by coming twice, the positioning accumulated error can be quantized, and the influence of the accumulated error is eliminated to a certain extent by calculating global optimization.

And the optimizing subunit 507 is configured to perform global optimization after the loop detection is successful.

Example five

Referring to fig. 7, fig. 7 is a schematic structural diagram of another vehicle-mounted terminal according to an embodiment of the present invention. As shown in fig. 7, the in-vehicle terminal may include:

at least one processor 701, such as a CPU, at least one network interface 704, a user interface 703, memory 705, at least one communication bus 702, and a display 706. Wherein the communication bus 702 is used to enable connected communications between these components. The user interface 703 may include a Display (Display), among other things, and the optional user interface 703 may also include a standard wired interface, a wireless interface. The network interface 704 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 705 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one disk memory. The memory 705 may also optionally be at least one storage device located remotely from the processor 701. As shown in fig. 7, the memory 705, which is a computer storage medium, stores executable program codes, and may include at least an operating system, a network communication module, a user interface module, and a local mapping module.

In the in-vehicle terminal shown in fig. 7, the network interface 704 is mainly used for connecting to a server, and performing data communication with the server; and processor 701 may be coupled to memory 705 and configured to invoke executable program code corresponding to the local mapping module stored in memory 705 to perform any of the top view splice-based mapping methods shown in fig. 1 or 4.

It should be noted that, the vehicle-mounted terminal shown in fig. 7 may further include components not shown, such as a power supply, an input key, a speaker, a bluetooth module, etc., which are not described in detail in this embodiment.

The embodiment of the invention discloses a computer readable storage medium which stores a computer program, wherein the computer program enables a computer to execute any one of the graph building methods based on a top view splice graph shown in fig. 1 or 4.

Embodiments of the present invention disclose a computer program product comprising a non-transitory computer readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform any of the top view splice graph-based mapping methods shown in fig. 1 or 4.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments and that the acts and modules referred to are not necessarily required for the present invention.

In various embodiments of the present invention, it should be understood that the sequence numbers of the foregoing processes do not imply that the execution sequences of the processes should be determined by the functions and internal logic of the processes, and should not be construed as limiting the implementation of the embodiments of the present invention.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention, or a part contributing to the prior art or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several requests for a computer device (which may be a personal computer, a server or a network device, etc., in particular may be a processor in a computer device) to execute some or all of the steps of the above-mentioned method of the various embodiments of the present invention.

Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be implemented by a program that instructs associated hardware, the program may be stored in a computer readable storage medium including Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), programmable Read-Only Memory (Programmable Read-Only Memory, PROM), erasable programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), one-time programmable Read-Only Memory (OTPROM), electrically erasable programmable Read-Only Memory (EEPROM), compact disc Read-Only Memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disk Memory, magnetic disk Memory, tape Memory, or any other medium that can be used for carrying or storing data that is readable by a computer.

The embodiment of the invention discloses a graph construction method based on a top view splice graph and a vehicle-mounted terminal, and specific examples are applied to the description of the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention. Meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.

Claims

1. A method for constructing a top view mosaic, the method comprising the steps of:

101. acquiring a plurality of target images shot by a plurality of image acquisition devices at the same moment;

102. splicing the target images to obtain a top-down spliced image: the network training mainly adopts cross entropy to measure the difference between the predicted value and the actual value of the network, and the cross entropy formula is as follows:

wherein y is a marking value of the image element, namely whether one pixel of the image is a semantic element or a non-semantic element, wherein 1 is used for representing the semantic element, and 0 is used for representing the non-semantic element; n is the total number of pixels of the image, x is the input, a is the output of the neuron a=σ (z), z=Σ _j w _j x _j +b; after the training of the network model is completed, predicting each pixel of the input image by the network, outputting an attribute value of 0 or 1 corresponding to each pixel, wherein a connected block of the image element marked as 1 is a meaningful semantic image structure, so that the semantic segmentation of the image is realized;

103. identifying image semantic features in the overlook mosaic to obtain a overlook perception map; the image semantic features are any one of lane lines, parking garage bit lines, garage sites, zebra crossings and lane arrows;

104. positioning based on the top view perception map, thereby determining a key frame: whether the overlook sensing image at a certain moment is a key frame or not is judged according to the observed condition and the spatial relationship, and the formula according to the spatial relationship is as follows:

||p _k -p _i ||+λ||θ _k -θ _i ||>δ

in the above formula, p _k Represents the position of the center of the vehicle at time k, p _i Indicating the position of the vehicle center at time i; in a dimension-reduced 2D map, the definition of a location isθ _k Represents the heading angle, theta, of the vehicle at the moment k _i Representing the heading angle of the vehicle at the moment i, wherein lambda is the weight of a parameter value control position part and a heading angle part, and delta is a set threshold value;

105. and generating map points according to the key frames so as to form a local map.

2. The method of mapping based on top view stitching according to claim 1, further comprising the steps of:

106. loop detection is carried out in the process of local image construction;

107. and performing global optimization after the loop detection is successful.

3. The method of mapping based on top view stitching according to any one of claims 1-2, wherein step 103 is specifically: inputting the obtained overlook mosaic image into a neural network model, and identifying image semantic features in the overlook mosaic image based on the neural network model so as to obtain a overlook perception image.

4. The method of claim 1, wherein in step 104, one or more of look-around VO, look-around VIO, and wheel speed are located and fused.

5. The method according to claim 1, wherein in step 104, the current frame is regarded as a key frame when the distance and angle between the current frame and the nearest key frame are different to a threshold value.

6. A method for constructing a top view mosaic, the method comprising the steps of:

wherein y is a marking value of the image element, namely whether one pixel of the image is a semantic element or a non-semantic element, wherein 1 is used for representing the semantic element, and 0 is used for representing the non-semantic element; n is the total number of pixels of the image, x is the input, a is the output of the neuron a=σ (z), z= Σ _j w _j x _j +b; after the training of the network model is completed, predicting each pixel of the input image by the network, outputting an attribute value of 0 or 1 corresponding to each pixel, wherein a connected block of the image element marked as 1 is a meaningful semantic image structure, so that the semantic segmentation of the image is realized;

104. positioning based on the top view perception map, thereby determining a key frame;

105. map points are generated according to the key frames, so that a local map is formed;

the method further comprises the step of judging whether the overlook perceived graph at a certain moment is the key frame according to the observed condition and the spatial relationship, wherein a formula for judging according to the spatial relationship is as follows:

||p _k -p _i ||+λ||θ _k -θ _i ||>δ

7. A vehicle-mounted terminal, characterized by comprising:

the splicing subunit is used for splicing the plurality of target images acquired by the acquisition subunit to obtain a overlook splicing image: the network training mainly adopts cross entropy to measure the difference between the predicted value and the actual value of the network, and the cross entropy formula is as follows:

wherein y is a marking value of the image element, namely whether one pixel of the image is a semantic element or a non-semantic element, wherein 1 is used for representing the semantic element, and 0 is used for representing the non-semantic element; n is the total number of pixels of the image, x is the input, a is the output of the neuron a=σ (z), z= Σ _j w _j x _j +b; after the training of the network model is completed, the network predicts each pixel of the input image, outputs the attribute value of 0 or 1 corresponding to each pixel, and the connected blocks of the image elements marked as 1 are meaningful semantic image structures, thus realizing the semantics of the imageDividing;

the sensing subunit is used for identifying the semantic features of the images in the overlook spliced image obtained by splicing by the splicing subunit so as to obtain the overlook sensing image; the image semantic features are any one of lane lines, parking garage bit lines, garage sites, zebra crossings and lane arrows;

the positioning subunit is used for positioning the top view sensing image obtained by the sensing subunit so as to determine a key frame: the system can judge whether the overlook sensing image at a certain moment is a key frame according to the observed condition and the spatial relationship, and the formula according to the spatial relationship is as follows:

||p _k -p _i ||+λ||θ _k -θ _i ||>δ

8. The vehicle-mounted terminal according to claim 7, wherein,

the detection subunit is used for carrying out loop detection in the process of local mapping;

9. The vehicle-mounted terminal according to any one of claims 7-8, wherein the sensing subunit is configured to input the obtained top view mosaic into a neural network model, and identify image semantic features in the top view mosaic based on the neural network model, so as to obtain the top view sensing map.

10. The vehicle-mounted terminal of claim 7, wherein the localization fuses one or more of ring VO, ring VIO, and wheel speed.

11. The vehicle terminal of claim 7, wherein the positioning subunit is further configured to treat the current frame as a key frame when the distance and angle between the current frame and the nearest key frame differ by a threshold.

12. A vehicle-mounted terminal, characterized by comprising:

the construction subunit is used for generating map points according to the key frames of the positioning subunit so as to form a local map;

the vehicle-mounted terminal further comprises a judging subunit, wherein the judging subunit further judges whether the overlook perception picture at a certain moment is the key frame according to the observed condition and the spatial relationship, and the formula for judging according to the spatial relationship is as follows:

||p _k -p _i ||+λ||θ _k -θ _i ||>δ