CN114693001A - Parking space prediction method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114693001A
CN114693001A (application CN202210459935.0A)
Authority
CN
China
Prior art keywords
parking space
prediction result
prediction
feature map
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210459935.0A
Other languages
Chinese (zh)
Inventor
赵起超
王龙玉
马然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Automotive Innovation Co Ltd
Original Assignee
China Automotive Innovation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Automotive Innovation Co Ltd filed Critical China Automotive Innovation Co Ltd
Priority to CN202210459935.0A
Publication of CN114693001A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a parking space prediction method, a parking space prediction device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a bird's-eye view of a target parking range; encoding the bird's-eye view through a coding network of a parking space prediction model to obtain a coding feature map, and decoding the coding feature map through a decoding network of the parking space prediction model to obtain a decoding feature map; inputting the coding feature map into a local feature branch network of the parking space prediction model to perform parking space corner point prediction and obtain a first prediction result; inputting the decoding feature map into a global feature branch network of the parking space prediction model to perform parking space prediction and obtain a second prediction result; and determining the target position information of the parking space in the bird's-eye view by fusing the first prediction result and the second prediction result. In this way, the accuracy of each parking space corner point can be improved through end-to-end training, and the problem of insufficiently accurate parking space position prediction is at least alleviated.

Description

Parking space prediction method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of automatic driving, and in particular, to a parking space prediction method, apparatus, electronic device, and storage medium.
Background
In an automatic parking scene, a bird's-eye view is generally used to predict parking spaces. In the prior art, the final parking space position is generated through prior geometric constraints of the parking space. The position can be roughly estimated in this way, but it is not accurate enough; that is, the parking space prediction accuracy is poor.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a parking space prediction method and apparatus, an electronic device, and a storage medium. The technical scheme is as follows:
in one aspect, a parking space prediction method is provided, the method comprising:
acquiring a bird's-eye view of a target parking range;
encoding the bird's-eye view through a coding network of the parking space prediction model to obtain a coding feature map, and decoding the coding feature map through a decoding network of the parking space prediction model to obtain a decoding feature map;
inputting the coding feature map into a local feature branch network of a parking space prediction model to perform parking space angular point prediction to obtain a first prediction result; the first prediction result indicates position information of the parking space angular points in the aerial view;
inputting the decoding feature map into a global feature branch network of the parking space prediction model to predict the parking space, and obtaining a second prediction result; the second prediction result indicates candidate position information of the parking space in the aerial view;
and determining the target position information of the parking space in the aerial view according to the fusion processing of the first prediction result and the second prediction result.
In another aspect, a parking space prediction apparatus is provided, the apparatus including:
the aerial view acquisition module is used for acquiring an aerial view of the target parking range;
the characteristic map acquisition module is used for encoding the aerial view map based on the encoding network of the parking space prediction model to obtain an encoding characteristic map, and decoding the encoding characteristic map based on the decoding network of the parking space prediction model to obtain a decoding characteristic map;
the first prediction result acquisition module is used for inputting the coding feature map into a local feature branch network of the parking space prediction model to perform parking space angular point prediction to obtain a first prediction result; the first prediction result indicates position information of the parking space angular points in the aerial view;
the second prediction result acquisition module is used for inputting the decoding characteristic graph into a global characteristic branch network of the parking space prediction model to perform parking space prediction to obtain a second prediction result; the second prediction result indicates candidate position information of the parking space in the aerial view;
and the parking space target position determining module is used for determining the target position information of the parking space in the aerial view according to the fusion processing of the first prediction result and the second prediction result.
In an exemplary embodiment, the first prediction result obtaining module includes:
the local characteristic diagram acquisition module is used for inputting the coding characteristic diagram into a local characteristic branch network of the parking space prediction model and obtaining a first local characteristic diagram and a second local characteristic diagram through convolution kernel processing; the first local feature map and the second local feature map have the same size and the same channel number and are divided into a plurality of pixel grids;
and the first prediction result output module is used for predicting the parking space angular point based on the first local characteristic diagram and the second local characteristic diagram to obtain a first prediction result.
In an exemplary embodiment, the first prediction result output module includes:
the first local prediction result acquisition module is used for obtaining a first local prediction result based on the first local feature map; the first local prediction result indicates a pixel grid where parking space angular points in the aerial view are located;
the second local prediction result acquisition module is used for acquiring a second local prediction result based on the second local feature map; the second local prediction result indicates the deviation amount of the parking space angular points in the aerial view relative to the horizontal and vertical coordinates of the central point of the pixel grid;
and the local prediction result fusion module is used for obtaining a first prediction result according to fusion processing of the first local prediction result and the second local prediction result.
In an exemplary embodiment, the second prediction result obtaining module includes:
the global feature map acquisition module is used for inputting the decoding feature map into a global feature branch network of the parking space prediction model and obtaining a first global feature map and a second global feature map through convolution kernel processing; the first global feature map and the second global feature map have the same size and different channel numbers and are divided into a plurality of pixel grids;
and the second prediction result output module is used for predicting the parking space based on the first global feature map and the second global feature map to obtain a second prediction result.
In an exemplary embodiment, the second prediction result output module includes:
the first global prediction result acquisition module is used for acquiring a first global prediction result based on the first global feature map; the first global prediction result indicates a pixel grid where a parking space center point is located in the aerial view; the center point of the parking space in the aerial view is the diagonal intersection point of the angular points of the parking space in the aerial view;
the second global prediction result acquisition module is used for acquiring a second global prediction result based on the second global feature map; the second global prediction result indicates the deviation amount of the horizontal and vertical coordinates of the parking space angle point in the aerial view relative to the parking space center point in the aerial view;
and the global prediction result fusion module is used for obtaining a second prediction result according to the fusion processing of the first global prediction result and the second global prediction result.
In an exemplary embodiment, the candidate position information of the parking space in the bird's eye view includes position information of a preset number of candidate angular points, and the parking space target position determining module includes:
and the first operation module is used for traversing the parking spaces in the second prediction result and executing a first operation on each traversed current parking space, wherein the first operation module includes:
the distance calculation module is used for calculating the distance between the candidate angular point and each parking space angular point in the first prediction result according to the position information of the candidate angular point and the position information of each parking space angular point in the first prediction result aiming at each candidate angular point corresponding to the current parking space;
the replacement module is used for replacing the candidate angular points with parking space angular points in a first prediction result corresponding to the target distance when the target distance smaller than the preset distance exists;
and the parking space target position output module is used for taking the position information of each parking space in the second prediction result as the target position information of the corresponding parking space when the traversal is finished.
In one exemplary embodiment, the replacement module includes:
the target distance determining module is used for determining the minimum value of the distances between the candidate corner point and each parking space corner point in the first prediction result as a target distance;
and the candidate angular point replacing module is used for replacing the candidate angular point with a parking space angular point in the first prediction result corresponding to the target distance when the target distance is smaller than the preset distance.
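The traversal-and-replacement fusion described by these modules can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions (corner points as (x, y) tuples, Euclidean distance); the function and parameter names are hypothetical and do not come from the patent.

```python
import math

def fuse_corners(candidate_corners, predicted_corners, max_dist):
    """For each candidate corner from the global branch, substitute the
    nearest corner from the local branch's first prediction result when
    that nearest (target) distance is below the preset threshold."""
    fused = []
    for cx, cy in candidate_corners:
        # Distance from this candidate to every locally predicted corner.
        dists = [math.hypot(cx - px, cy - py) for px, py in predicted_corners]
        i = min(range(len(dists)), key=dists.__getitem__)
        # Replace only when the minimum distance is small enough;
        # otherwise keep the candidate corner from the second prediction result.
        fused.append(predicted_corners[i] if dists[i] < max_dist else (cx, cy))
    return fused
```

For example, `fuse_corners([(10, 10), (50, 50)], [(11, 10), (100, 100)], max_dist=5)` replaces the first candidate with the nearby predicted corner (11, 10) but leaves the second candidate unchanged, since its nearest predicted corner is farther than the threshold.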
In another aspect, an electronic device is provided, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the parking space prediction method.
In another aspect, a computer-readable storage medium is provided, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the parking space prediction method as described above.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the electronic device executes the parking space prediction method provided in the above aspects.
According to the embodiment of the invention, a bird's-eye view of the target parking range is acquired; the coding network of the parking space prediction model encodes the bird's-eye view to obtain a coding feature map, and the decoding network of the parking space prediction model decodes the coding feature map to obtain a decoding feature map; the coding feature map is input into a local feature branch network of the parking space prediction model for parking space corner point prediction to obtain a first prediction result; the decoding feature map is input into a global feature branch network of the parking space prediction model for parking space prediction to obtain a second prediction result; and the target position information of the parking space in the bird's-eye view is determined by fusing the first prediction result and the second prediction result. Fusing the prediction results output by the local feature branch network and the global feature branch network improves the precision of each parking space corner point, and at least alleviates the problem of insufficiently accurate parking space position prediction. Meanwhile, the local feature branch network and the global feature branch network in the embodiment of the invention are both convolutional neural networks that can be trained end to end, so the position information of the parking space corner points can be given without additional geometric constraints.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a parking space prediction method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a parking space prediction model according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a working principle of a parking space prediction model according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a principle of predicting angular point position information of a parking space according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating an operation principle of another parking space prediction model according to an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating another principle of predicting angular point position information of a parking space according to an embodiment of the present invention;
fig. 7 is a schematic diagram illustrating another principle of predicting angular point position information of a parking space according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating a principle of predicting parking space position information according to an embodiment of the present invention;
fig. 9 is a block diagram of a parking space prediction apparatus according to an embodiment of the present invention;
fig. 10 is a block diagram of a hardware structure of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, or article that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Please refer to fig. 1, which is a flowchart illustrating a parking space prediction method according to an embodiment of the present invention. It is noted that the present specification provides the method steps as described in the examples or flowcharts, but that more or fewer steps may be included according to routine or non-inventive practice. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. In actual system or product execution, sequential execution or parallel execution (e.g., parallel processor or multi-threaded environment) may be used according to the embodiments or methods shown in the figures. Specifically, as shown in fig. 1, the method may include:
s101: and acquiring a bird's-eye view of the target parking range. The target parking range may be a part of the target parking lot, and therefore the bird's eye view includes a plurality of parking spaces.
S103: the method comprises the steps that a bird's-eye view is encoded by an encoding network based on a parking space prediction model to obtain an encoding characteristic graph, and the encoding characteristic graph is decoded by a decoding network based on the parking space prediction model to obtain a decoding characteristic graph.
As an alternative embodiment, the bird's-eye view is preprocessed and resized before the encoding process. Since the picture has not yet undergone any feature extraction at this stage, the preprocessed bird's-eye view may also be referred to as the "original image". Regarding picture sizes, this specification uses the form of square brackets containing 4 parameters, which represent, from left to right, the number of pictures, the picture height, the picture width, and the number of channels.
Specifically, the bird's eye view size after preprocessing is [ N, H, W,3], and this size may also be referred to as an original size. Wherein, N is the number of pictures, H is the height of the pictures, W is the width of the pictures, and 3 is the number of channels, where the number of channels is set to 3 because one color image has three channels of RGB. The size of the coding feature map is [ N, H/32, W/32,512], namely, the height and the width of the coding feature map are reduced by 32 times compared with the size of the original map; as for the number of channels, 512 is set according to an empirical value. The size of the decoding characteristic graph is [ N, H/4, W/4,64], namely, compared with the size of the original graph, the height and the width of the decoding characteristic graph are reduced by 4 times; the number of channels 64 is also an empirical value.
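The stated down-sampling factors and channel counts can be written as simple shape arithmetic. The pure-Python fragment below only restates the sizes given above; the function names are illustrative and not part of the patent.

```python
def encoder_output_shape(n, h, w):
    # Coding feature map: height and width reduced 32x; 512 channels (empirical value).
    return [n, h // 32, w // 32, 512]

def decoder_output_shape(n, h, w):
    # Decoding feature map: height and width reduced 4x; 64 channels (empirical value).
    return [n, h // 4, w // 4, 64]

# A preprocessed bird's-eye view of size [N, H, W, 3], e.g. N=1, H=W=512:
print(encoder_output_shape(1, 512, 512))  # [1, 16, 16, 512]
print(decoder_output_shape(1, 512, 512))  # [1, 128, 128, 64]
```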
As shown in fig. 2, on one hand, inside the parking space prediction model the decoding feature map is generated from the coding feature map, so there is a sequential relationship between the two. On the other hand, both feature maps are obtained in a single forward pass of the parking space prediction model, so their relationship can also be understood as parallel output.
S105: inputting the coding feature map into a local feature branch network of a parking space prediction model to perform parking space angular point prediction to obtain a first prediction result; the first prediction result indicates position information of the parking space corner point in the aerial view.
S107: inputting the decoding feature map into a global feature branch network of the parking space prediction model to predict the parking space, and obtaining a second prediction result; the second prediction result indicates candidate position information of the parking space in the aerial view.
S109: and determining the target position information of the parking space in the aerial view according to the fusion processing of the first prediction result and the second prediction result.
Compared with a method for outputting the target position information of the parking space only by using the prediction result of the global characteristic branch network, the detection accuracy of the target position information of the parking space is obviously improved.
In an exemplary embodiment, the step S105 of inputting the encoding feature map into a local feature branch network of the parking space prediction model to perform parking space corner point prediction, so as to obtain a first prediction result, may include the following steps:
inputting the coding feature map into a local feature branch network of the parking space prediction model, and obtaining a first local feature map and a second local feature map through convolution kernel processing; the first local feature map and the second local feature map have the same size and the same channel number and are divided into a plurality of pixel grids;
and carrying out parking space angular point prediction based on the first local characteristic diagram and the second local characteristic diagram to obtain a first prediction result.
Specifically, as shown in fig. 3, the local feature branch network comprises two parts: one part processes the coding feature map [N, H/32, W/32,512] through 2 convolution kernels of 3 × 3 × 512 and a Sigmoid function to obtain a first local feature map [N, H/32, W/32,2]; the other part processes the coding feature map [N, H/32, W/32,512] through another 2 convolution kernels of 3 × 3 × 512 and a Sigmoid function to obtain a second local feature map [N, H/32, W/32,2].
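The two-headed structure can be illustrated as follows, assuming the per-grid channel values produced by the two convolution stacks are already available. Only the Sigmoid activation and the 2-channel outputs are taken from the text; everything else (names, data layout) is a hypothetical sketch.

```python
import math

def sigmoid(x):
    # The "double bending" activation named in the text is the Sigmoid function.
    return 1.0 / (1.0 + math.exp(-x))

def local_branch_heads(cls_logits, offset_logits):
    """cls_logits / offset_logits: per-grid lists of 2-channel values, standing
    in for the outputs of the two 3x3x512 convolution stacks. Both heads pass
    through a Sigmoid, yielding the first and second local feature maps."""
    first_local = [[sigmoid(v) for v in cell] for cell in cls_logits]
    second_local = [[sigmoid(v) for v in cell] for cell in offset_logits]
    return first_local, second_local
```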
The local feature branch network of the embodiment of the invention comprises two parts, wherein the two parts are respectively used for carrying out feature extraction on the coding feature graph, and then the extracted features are fused for output, so that the accuracy of the result output is improved.
In an exemplary embodiment, the performing parking space corner point prediction based on the first local feature map and the second local feature map to obtain the first prediction result may include the following steps:
obtaining a first local prediction result based on the first local feature map; the first local prediction result indicates a pixel grid where the parking space angular point in the aerial view is located;
obtaining a second local prediction result based on the second local feature map; the second local prediction result indicates the deviation amount of the parking space angular points in the aerial view relative to the horizontal and vertical coordinates of the central point of the pixel grid;
and obtaining a first prediction result according to the fusion processing of the first local prediction result and the second local prediction result.
Specifically, the core of the local feature branch network is to predict the position information of the parking space corner points based on the center point of the grid containing the real parking space corner point that was manually annotated in the training stage. The grids themselves are already divided; which specific grids contain the manually annotated real parking space corner points is what the parking space prediction model must learn, and this is the function of the first local feature map. With continued reference to fig. 3, the first local feature map is used to learn, through binary classification, which grids contain a real parking space corner point, and the learned grids containing real parking space corner points (highlighted in the figure) constitute the first local prediction result. After the required grids are found, the local feature branch network can immediately obtain their center points, so the center point of the grid containing the real parking space corner point is also part of the first local prediction result.
In addition, the binary classification also explains the meaning of the channel number 2 in the first local feature map size [N, H/32, W/32,2]. For each grid of the first local feature map there are only two cases: it either contains or does not contain a real parking space corner point. The 2 channels therefore correspond to the labels 1 and 0, where 1 indicates that the grid contains a real parking space corner point and 0 indicates that it does not.
Learning which grids contain the real parking space corner points is not enough; the positional relationship (i.e., the deviation) between the real parking space corner point and the center point of its grid must also be learned, and this is the function of the second local feature map. As shown in fig. 3, the second local feature map is used to learn the horizontal and vertical coordinate deviations of the real parking space corner point relative to the center point of its grid, and the learned deviations (highlighted in the figure) constitute the second local prediction result. This also explains the meaning of the channel number 2 in the second local feature map size [N, H/32, W/32,2]: there are 2 deviation amounts, the abscissa deviation and the ordinate deviation.
After the first local prediction result and the second local prediction result are obtained, they are fused, and the first prediction result, namely the predicted position information of the parking space corner points in the bird's-eye view, is output. The first prediction result, the first local prediction result and the second local prediction result are all presented in the same form, i.e., in the picture.
Referring to fig. 4, the process by which the local feature branch network learns the position information of a parking space corner point can be understood from the perspective of a single corner point. As shown in fig. 4, the large rectangular frame represents one parking space in the bird's-eye view, and the small square represents the unit pixel grid containing a real parking space corner point. Since the bird's-eye view size is [N, H, W,3] and the first local feature map size is [N, H/32, W/32,2], the height and width of the first local feature map are 1/32 of the original image, so a unit grid of size 1 × 1 in the first local feature map corresponds to 32 × 32 pixels in the original image; the small square in the schematic is therefore intentionally drawn larger than scale.
Take a certain real parking space corner point P0(x, y) as an example. The output process of the corresponding corner point P0' in the first prediction result is as follows (P0' is not identified in fig. 4): the local feature branch network learns the grid containing P0(x, y) through a convolution kernel and a Sigmoid function, and visualizes that grid; once the grid containing P0(x, y) is learned, the center point P1(x1, y1) of the grid is obtained immediately; meanwhile, through an additional convolution kernel and a Sigmoid function, the local feature branch network learns the deviation P1P0 of P0 relative to P1 in the horizontal and vertical coordinates (△x = x1 - x; △y = y1 - y); then, based on the grid center point P1 and the coordinate deviation P1P0, the position information of P0' is output. The relationship between the above points can be expressed by the following equation:

P0' = P1 - P1P0 (1)
In a neural network, the closer two points are, the smaller the learning error. In the embodiment of the invention, the real parking space corner point and the grid center point belong to the same pixel grid; the two points are clearly close, so the error of the coordinate deviation learned by the local feature branch network is small, and the first prediction result output by the local feature branch network is therefore more accurate. Meanwhile, the local feature branch network of the embodiment of the invention is a fully convolutional neural network that can be trained end to end, so the position information of the four parking space corner points can be given without additional geometric constraints.
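The decoding step implied by equation (1) can be sketched as follows. This assumes the deviation is defined as (△x, △y) = (x1 - x, y1 - y), consistent with the preceding derivation; the function name is hypothetical.

```python
def decode_corner(grid_center, deviation):
    """Recover the predicted corner P0' from the grid center P1(x1, y1)
    and the learned deviation (dx, dy) = (x1 - x, y1 - y)."""
    x1, y1 = grid_center
    dx, dy = deviation
    # P0' = P1 - P1P0: at zero learning error this equals the real corner P0(x, y).
    return (x1 - dx, y1 - dy)

# A real corner at (100, 70) inside a grid whose center is (112, 80)
# has deviation (12, 10); decoding recovers (100, 70).
```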
In an exemplary embodiment, the step S107 of inputting the decoding feature map into the global feature branch network of the parking space prediction model to perform parking space prediction, so as to obtain a second prediction result, may include the following steps:
inputting the decoding feature map into a global feature branch network of the parking space prediction model, and obtaining a first global feature map and a second global feature map through convolution kernel processing; the first global feature map and the second global feature map have the same size and different channel numbers and are divided into a plurality of pixel grids;
and predicting the parking space based on the first global feature map and the second global feature map to obtain a second prediction result.
Specifically, as shown in fig. 5, the global feature branch network includes two parts: one part processes the decoding feature map [N, H/4, W/4, 64] through two 3×3×64 convolution kernels and a normalized exponential function (Softmax function) to obtain the first global feature map [N, H/4, W/4, 2]; the other part processes the decoding feature map [N, H/4, W/4, 64] through eight 3×3×64 convolution kernels and the dual-bending function to obtain the second global feature map [N, H/4, W/4, 8].
The two parts of the global feature branch network extract features from the decoding feature map separately and then fuse the extracted features for output, which improves the accuracy of the output result.
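The two heads can be sketched at the shape level as follows. This is a hypothetical NumPy illustration, not the patent's implementation: a naive 3×3 "same" convolution stands in for the framework convolutions, a sigmoid stands in for the dual-bending function, the batch dimension is omitted, and all kernel weights are random.

```python
import numpy as np

def conv3x3(x, kernels):
    """Naive 3x3 'same' convolution. x: (H, W, Cin); kernels: (K, 3, 3, Cin)."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.empty((h, w, len(kernels)))
    for k, ker in enumerate(kernels):
        for i in range(h):
            for j in range(w):
                out[i, j, k] = np.sum(xp[i:i + 3, j:j + 3, :] * ker)
    return out

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_branch(decoded, rng):
    """decoded: (H/4, W/4, 64) decoding feature map (batch dim omitted).
    Returns the first (2-channel) and second (8-channel) global feature maps."""
    k2 = rng.standard_normal((2, 3, 3, decoded.shape[-1])) * 0.01
    k8 = rng.standard_normal((8, 3, 3, decoded.shape[-1])) * 0.01
    first = softmax(conv3x3(decoded, k2))   # grid classification head
    second = sigmoid(conv3x3(decoded, k8))  # 8 deviation channels
    return first, second
```

The sketch only demonstrates the tensor shapes and activations; a real model would use a deep-learning framework and trained weights.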
In an exemplary embodiment, the predicting the parking space based on the first global feature map and the second global feature map to obtain the second prediction result may include the following steps:
obtaining a first global prediction result based on the first global feature map; the first global prediction result indicates a pixel grid where a parking space center point in the aerial view is located; the center point of the parking space in the aerial view is the diagonal intersection point of the corner points of the parking space in the aerial view;
obtaining a second global prediction result based on the second global feature map; the second global prediction result indicates the deviation amount of the horizontal and vertical coordinates of the parking space angle point in the aerial view relative to the parking space center point in the aerial view;
and obtaining a second prediction result according to the fusion processing of the first global prediction result and the second global prediction result.
Specifically, the core of the global feature branch network is to predict candidate position information of parking spaces based on the real parking space center point. The real parking space center point is generated from the real parking space angular points manually labeled in the training stage, i.e., it is the intersection point of the two diagonals formed by the four real parking space angular points; the candidate position information of a parking space accordingly comprises the position information of the candidate angular points and of the candidate parking space frame. As with the local feature branch network, the grids are already divided, but which specific grids contain a real parking space center point must be learned by the parking space prediction model, which is the function of the first global feature map. With continued reference to fig. 5, through binary classification, the first global feature map is used to learn the grids containing real parking space center points; both the learned grids and the parking space center points themselves (highlighted in the figure) belong to the first global prediction result.
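The center point used for training, the intersection of the two diagonals of the four labeled angular points, can be computed with a short line-intersection routine. A minimal sketch, assuming the angular points are given in order around the parking space frame; the function name is hypothetical.

```python
def diagonal_intersection(corners):
    """corners: four (x, y) angular points [A, B, C, D] in order around
    the parking space frame. Returns the intersection of diagonals AC
    and BD, i.e., the real parking space center point."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = corners
    # Write AC as A + t*(C - A) and BD as B + s*(D - B), then solve for t.
    dx1, dy1 = x3 - x1, y3 - y1   # direction of diagonal AC
    dx2, dy2 = x4 - x2, y4 - y2   # direction of diagonal BD
    denom = dx1 * dy2 - dy1 * dx2
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / denom
    return (x1 + t * dx1, y1 + t * dy1)
```

For a rectangular or parallelogram-shaped space this coincides with the midpoint of either diagonal; the general form also handles slightly skewed annotations.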
Similarly, binary classification explains the meaning of the channel number 2 in the first global feature map size [N, H/4, W/4, 2]. A grid of the first global feature map either contains or does not contain a real parking space center point, which are the only two cases. Hence there are 2 channels, corresponding to the labels 0 and 1: 1 means the grid contains a real parking space center point, and 0 means it does not.
Learning which grids contain a real parking space center point is not enough; the positional relationship (i.e., the deviation) between the real parking space angular points and the real parking space center point must also be learned, which is the function of the second global feature map. As shown in fig. 5, the second global feature map is used to learn the horizontal and vertical coordinate deviations of the real parking space angular points relative to the parking space center point, and the learned deviations (highlighted in the figure) constitute the second global prediction result. Each parking space angular point has 2 deviations relative to the parking space center point, an abscissa deviation and an ordinate deviation, and a parking space has 4 angular points, giving 8 deviations in total. This is the meaning of the channel number 8 in the second global feature map size [N, H/4, W/4, 8].
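Given a predicted center point and the 8 learned deviations, the four candidate angular points fall out directly. A minimal sketch under the sign convention Δx = x2 − x, Δy = y2 − y from the text; the argument layout (four consecutive (Δx, Δy) pairs) is an assumption.

```python
def decode_candidate_corners(center, deviations):
    """center: parking space center point (x2, y2).
    deviations: 8 values, four (dx, dy) pairs with dx = x2 - x and
    dy = y2 - y for each angular point.
    Returns the four candidate angular point positions."""
    x2, y2 = center
    pairs = [(deviations[i], deviations[i + 1]) for i in range(0, 8, 2)]
    # invert the deviation: angular point = center - (dx, dy)
    return [(x2 - dx, y2 - dy) for dx, dy in pairs]
```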
After the first global prediction result and the second global prediction result are obtained, they are fused, and the second prediction result, namely the predicted candidate position information of the parking spaces in the bird's eye view, is output. The second prediction result takes the same presentation form as the first and second global prediction results and is presented on the image.
Referring to fig. 6, the process by which the global feature branch network learns the candidate position information of a parking space can be understood from the perspective of a single candidate angular point. Taking the real parking space angular point P0(x, y) assumed above as an example, the candidate angular point P0″ in the second prediction result is output as follows (P0″ is not marked in fig. 6): suppose P0, together with the other three real parking space angular points, determines a real parking space center point P2(x2, y2); the global feature branch network learns, through convolution kernels and the normalized exponential function, both the grid containing P2(x2, y2) and the point P2 itself; meanwhile, through convolution kernels and the dual-bending function, it learns the deviation P2P0 of P0 relative to P2 in the horizontal and vertical coordinates (Δx = x2 − x; Δy = y2 − y); then, based on the parking space center point P2 and the deviation P2P0, it outputs the position information of P0″. The relationship between these points can be expressed by the following equation:
P0”=P0+P2P0 (2)
The global feature branch network of the embodiment of the invention learns the positions of the four angular points based on the real parking space center point, and the four angular points then form the parking space frame, thereby yielding the candidate position information of the parking space and laying the foundation for the target position information. Moreover, the global feature branch network of the embodiment of the invention is a fully convolutional neural network that can be trained end to end, so the position information of the four candidate parking space angular points can be given without additional geometric constraints.
In an exemplary embodiment, the step S109 of determining the target position information of the parking space in the bird' S eye view according to the fusion process of the first prediction result and the second prediction result may include the following steps:
traversing the parking space in the second prediction result, and executing a first operation on the traversed current parking space, wherein the first operation comprises the following steps:
aiming at each candidate angular point corresponding to the current parking space, calculating the distance between the candidate angular point and each parking space angular point in the first prediction result according to the position information of the candidate angular point and the position information of each parking space angular point in the first prediction result; wherein, the calculation formula can be an Euclidean distance calculation formula;
when a target distance smaller than a preset distance exists, replacing the candidate angular point with a parking space angular point in a first prediction result corresponding to the target distance;
and taking the position information of each parking space in the second prediction result at the end of traversal as the target position information of the corresponding parking space.
As explained above, the closer two points are, the smaller the error in the horizontal and vertical coordinate deviations learned by the parking space prediction model. As shown in fig. 7, the distance between P0 and P1 is smaller than the distance between P0 and P2, so the P0′ predicted by the local feature branch network is more accurate than the P0″ predicted by the global feature branch network. The preset distance may be the side length of a unit pixel grid in the first local feature map; when the distance between P0′ and P0″ is smaller than the preset distance, P0′ and P0″ lie in the same grid. When one grid contains both an angular point predicted by the local feature branch network and one predicted by the global feature branch network, replacing the global prediction P0″ with the local prediction P0′ undoubtedly improves the accuracy of the angular point position information, and therefore of the parking space position information as a whole. Of course, the distance between P0′ and P0″ may also be no smaller than the preset distance, indicating that P0′ and P0″ are not in the same grid; in that case no replacement is performed, and only P0″ is retained.
Since there may be a plurality of position information of the parking space angular points predicted by the local feature branch network, in an exemplary embodiment, when there is a target distance smaller than the preset distance, replacing the candidate angular points with the parking space angular points in the first prediction result corresponding to the target distance may include the following steps:
determining the minimum value of the distances between the candidate corner and each parking space corner in the first prediction result as a target distance;
and when the target distance is smaller than the preset distance, replacing the candidate angular point with a parking space angular point in a first prediction result corresponding to the target distance.
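The traversal-and-replacement steps above can be sketched as follows. A hypothetical Python sketch: the data layouts and names are assumptions, while the Euclidean metric, the minimum-distance target distance, and the preset-distance test follow the text.

```python
import math

def fuse_predictions(candidate_slots, local_corners, preset_distance):
    """Fuse the second prediction result with the first.

    candidate_slots: list of parking spaces, each a list of (x, y)
                     candidate angular points (second prediction result).
    local_corners:   list of (x, y) angular points output by the local
                     feature branch network (first prediction result).
    Returns the target position information of each parking space."""
    fused = []
    for slot in candidate_slots:
        new_slot = []
        for cx, cy in slot:
            target = (cx, cy)
            if local_corners:
                # target distance = minimum Euclidean distance to the
                # local-branch angular points
                dists = [math.hypot(cx - lx, cy - ly)
                         for lx, ly in local_corners]
                i = min(range(len(dists)), key=dists.__getitem__)
                if dists[i] < preset_distance:
                    target = local_corners[i]  # replace the candidate
            new_slot.append(target)
        fused.append(new_slot)
    return fused
```

In the method embodiment, `preset_distance` would be the side length of a unit pixel grid in the first local feature map, i.e., 32 original-image pixels.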
Referring to fig. 8, it can be seen intuitively that fusing the first prediction result with the second prediction result significantly improves the accuracy of the parking space position information. For parking space 3, in the candidate position information one side 3a does not completely coincide with the real parking space line; after fusion with the first prediction result (that is, after the candidate angular points at the two ends of 3a are replaced by the parking space angular points in the first prediction result output by the local feature branch network), the side 3a displayed by the target position information of parking space 3 coincides with the real parking space line to the naked eye. Similarly, before fusion the candidate position information of parking space 5 shows a side edge 5b at some distance from the real parking space line; after fusion, the target position information of parking space 5 shows that distance clearly reduced. For parking space 6 the improvement is even more apparent: the candidate position information before fusion shows the top edge 6c far from the real parking space line, while after the angular point replacement the top edge 6c visually coincides with the real parking space line completely.
By locking the selection range for angular point replacement to the parking space angular point closest to the candidate angular point, the embodiment of the invention largely resolves the problem that the angular point position information predicted by the global feature branch network is not accurate enough.
Since the parking space prediction apparatus provided in the embodiments of the present invention corresponds to the parking space prediction methods provided in the embodiments described above, the method embodiments are also applicable to the apparatus and are not described again in this embodiment.
Please refer to fig. 9, which is a schematic structural diagram illustrating a parking space prediction apparatus 900 according to an embodiment of the present invention. The parking space prediction apparatus 900 has the function of implementing the parking space prediction method in the above method embodiment; the function may be implemented by hardware, or by hardware executing corresponding software. As shown in fig. 9, the parking space prediction apparatus 900 may include:
the aerial view acquisition module 910 is configured to acquire an aerial view of a target parking range;
the feature map obtaining module 920 is configured to perform encoding processing on the aerial view based on an encoding network of the parking space prediction model to obtain an encoding feature map, and perform decoding processing on the encoding feature map based on a decoding network of the parking space prediction model to obtain a decoding feature map;
a first prediction result obtaining module 930, configured to input the coding feature map to a local feature branch network of the parking space prediction model to perform parking space angular point prediction, so as to obtain a first prediction result; the first prediction result indicates position information of a parking space angular point in the aerial view;
a second prediction result obtaining module 940, configured to input the decoding feature map to the global feature branch network of the parking space prediction model to perform parking space prediction, so as to obtain a second prediction result; the second prediction result indicates candidate position information of the parking space in the aerial view;
and a parking space target position determining module 950, configured to determine target position information of a parking space in the bird's eye view according to fusion processing of the first prediction result and the second prediction result.
In an exemplary embodiment, the first prediction result obtaining module includes:
the local characteristic diagram acquisition module is used for inputting the coding characteristic diagram into a local characteristic branch network of the parking space prediction model and obtaining a first local characteristic diagram and a second local characteristic diagram through convolution kernel processing; the first local feature map and the second local feature map have the same size and the same channel number and are divided into a plurality of pixel grids;
and the first prediction result output module is used for predicting the parking space angular point based on the first local characteristic diagram and the second local characteristic diagram to obtain a first prediction result.
In an exemplary embodiment, the first prediction result output module includes:
the first local prediction result acquisition module is used for acquiring a first local prediction result based on the first local feature map; the first local prediction result indicates a pixel grid where the parking space angular point in the aerial view is located;
the second local prediction result acquisition module is used for acquiring a second local prediction result based on the second local feature map; the second local prediction result indicates the deviation amount of the parking space angular points in the aerial view relative to the horizontal and vertical coordinates of the central point of the pixel grid;
and the local prediction result fusion module is used for obtaining a first prediction result according to fusion processing of the first local prediction result and the second local prediction result.
In an exemplary embodiment, the second prediction result obtaining module includes:
the global feature map acquisition module is used for inputting the decoding feature map into a global feature branch network of the parking space prediction model and obtaining a first global feature map and a second global feature map through convolution kernel processing; the first global feature map and the second global feature map have the same size and different channel numbers and are divided into a plurality of pixel grids;
and the second prediction result output module is used for predicting the parking space based on the first global feature map and the second global feature map to obtain a second prediction result.
In an exemplary embodiment, the second prediction result output module includes:
the first global prediction result acquisition module is used for acquiring a first global prediction result based on the first global feature map; the first global prediction result indicates a pixel grid where a parking space center point is located in the aerial view; the center point of the parking space in the aerial view is the diagonal intersection point of the corner points of the parking space in the aerial view;
the second global prediction result acquisition module is used for acquiring a second global prediction result based on the second global feature map; the second global prediction result indicates the horizontal and vertical coordinate deviation of the parking space angular point in the aerial view relative to the central point of the parking space in the aerial view;
and the global prediction result fusion module is used for obtaining a second prediction result according to the fusion processing of the first global prediction result and the second global prediction result.
In an exemplary embodiment, the candidate position information of the parking space in the bird's eye view includes position information of a preset number of candidate angular points, and the parking space target position determining module includes:
and the first operation module is used for traversing the parking spaces in the second prediction result and executing a first operation on the traversed current parking space. The first operation module includes:
the distance calculation module is used for calculating the distance between the candidate angular point and each parking space angular point in the first prediction result according to the position information of the candidate angular point and the position information of each parking space angular point in the first prediction result aiming at each candidate angular point corresponding to the current parking space;
the replacement module is used for replacing the candidate angular points with parking space angular points in a first prediction result corresponding to the target distance when the target distance smaller than the preset distance exists;
and the parking space target position output module is used for taking the position information of each parking space in the second prediction result as the target position information of the corresponding parking space when the traversal is finished.
In one exemplary embodiment, the replacement module includes:
the target distance determining module is used for determining the minimum value of the distances between the candidate corner point and each parking space corner point in the first prediction result as a target distance;
and the candidate angular point replacing module is used for replacing the candidate angular point with a parking space angular point in the first prediction result corresponding to the target distance when the target distance is smaller than the preset distance.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiments, which are not described herein again.
The parking space prediction device of the embodiment of the invention realizes the fusion of the prediction results output by the local characteristic branch network and the global characteristic branch network, and improves the prediction precision of the position information of each parking space angular point, thereby finally improving the prediction precision of the position information of the parking space.
An embodiment of the present invention provides an electronic device, where the electronic device includes a processor and a memory, where the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement any one of the parking space prediction methods provided in the foregoing method embodiments.
The memory may be used to store software programs and modules, and the processor may execute various functional applications and perform parking space prediction by running the software programs and modules stored in the memory. The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system, application programs needed by functions and the like; the storage data area may store data created according to use of the apparatus, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may also include a memory controller to provide the processor access to the memory.
The method provided by the embodiment of the invention can be executed in a computer terminal, a server or a similar operation device. The embodiment of the invention provides a terminal, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the parking space prediction method provided by the embodiment of the method.
Taking operation on a terminal as an example, fig. 10 is a block diagram of the hardware structure of a terminal running a parking space prediction method provided by the embodiment of the present invention. Specifically:
the terminal may include RF (Radio Frequency) circuitry 1010, memory 1020 including one or more computer-readable storage media, input unit 1030, display unit 1040, sensor 1050, audio circuitry 1060, WiFi (wireless fidelity) module 1070, processor 1080 including one or more processing cores, and power source 1090. Those skilled in the art will appreciate that the terminal structure shown in fig. 10 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
RF circuit 1010 may be used for receiving and transmitting signals during a message transmission or communication process, and in particular, for receiving downlink information from a base station and then processing the received downlink information by one or more processors 1080; in addition, data relating to uplink is transmitted to the base station. In general, RF circuitry 1010 includes, but is not limited to, an antenna, at least one Amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 1010 may also communicate with networks and other terminals through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), and the like.
The memory 1020 may be used to store software programs and modules, and the processor 1080 executes various functional applications and data processing by operating the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 1020 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, memory 1020 may also include a memory controller to provide access to memory 1020 by processor 1080 and input unit 1030.
The input unit 1030 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, input unit 1030 may include touch-sensitive surface 1031, as well as other input devices 1032. The touch-sensitive surface 1031, also referred to as a touch display screen or a touch pad, may collect touch operations by a user (such as operations by a user on or near the touch-sensitive surface 1031 using any suitable object or attachment, such as a finger, a stylus, etc.) on or near the touch-sensitive surface 1031 and drive the corresponding connection device according to a preset program. Optionally, the touch sensitive surface 1031 may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 1080, and can receive and execute commands sent by the processor 1080. In addition, the touch-sensitive surface 1031 may be implemented using various types of resistive, capacitive, infrared, and surface acoustic waves. The input unit 1030 may also include other input devices 1032 in addition to the touch-sensitive surface 1031. In particular, other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a track ball, a mouse, a joystick, or the like.
The display unit 1040 may be used to display information input by or provided to a user and various graphical user interfaces of the terminal, which may be made up of graphics, text, icons, video, and any combination thereof. The Display unit 1040 may include a Display panel 1041, and optionally, the Display panel 1041 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 1031 may overlay the display panel 1041, and when a touch operation is detected on or near the touch-sensitive surface 1031, the touch operation is transmitted to the processor 1080 for determining the type of the touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to the type of the touch event. Touch-sensitive surface 1031 and display panel 1041 may be implemented as two separate components for input and output functions, although in some embodiments touch-sensitive surface 1031 may be integrated with display panel 1041 for input and output functions.
The terminal may also include at least one sensor 1050, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 1041 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 1041 and/or a backlight when the terminal is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), detect the magnitude and direction of gravity when the terminal is stationary, and can be used for applications of recognizing terminal gestures (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured in the terminal, detailed description is omitted here.
Audio circuitry 1060, speaker 1061, microphone 1062 may provide an audio interface between a user and the terminal. The audio circuit 1060 can transmit the electrical signal converted from the received audio data to the speaker 1061, and the electrical signal is converted into a sound signal by the speaker 1061 and output; on the other hand, the microphone 1062 converts the collected sound signal into an electrical signal, which is received by the audio circuit 1060 and converted into audio data, which is then processed by the audio data output processor 1080 and then transmitted to, for example, another terminal via the RF circuit 1010, or output to the memory 1020 for further processing. The audio circuit 1060 may also include an earbud jack to provide communication of peripheral headphones with the terminal.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1070, the terminal can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing wireless broadband internet access. Although fig. 10 shows the WiFi module 1070, it is understood that it is not an essential component of the terminal and may be omitted as needed within a scope that does not change the essence of the invention.
The processor 1080 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the terminal's functions and processes data by running or executing software programs and/or modules stored in the memory 1020 and calling data stored in the memory 1020, thereby monitoring the terminal as a whole. Optionally, the processor 1080 may include one or more processing cores; preferably, the processor 1080 may integrate an application processor, which mainly handles the operating system, user interfaces, and applications, and a modem processor, which mainly handles wireless communication. It is to be appreciated that the modem processor need not be integrated into the processor 1080.
The terminal also includes a power supply 1090 (e.g., a battery) for powering the various components. Preferably, the power supply is logically coupled to the processor 1080 via a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 1090 may also include one or more DC or AC power sources, a recharging system, power failure detection circuitry, a power converter or inverter, power status indicators, and any other such components.
Although not shown, the terminal may further include a camera, a Bluetooth module, and the like, which are not described here again. In this embodiment, the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors. The one or more programs include instructions for performing the parking space prediction provided by the method embodiments described above.
An embodiment of the present invention further provides a computer-readable storage medium, which may be disposed in a terminal to store at least one instruction, at least one program, a code set, or an instruction set for implementing a parking space prediction method; the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the parking space prediction method provided in the foregoing method embodiments.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other various media capable of storing program code.
It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. Specific embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiment is described relatively simply because it is substantially similar to the method embodiment; for relevant details, reference may be made to the corresponding description of the method embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent replacements, improvements, and the like made within the spirit and principle of the present invention are intended to be included within its scope.

Claims (10)

1. A parking space prediction method, characterized in that the method comprises:
acquiring an aerial view of a target parking range;
coding the aerial view using a coding network of a parking space prediction model to obtain a coding feature map, and decoding the coding feature map using a decoding network of the parking space prediction model to obtain a decoding feature map;
inputting the coding feature map into a local feature branch network of the parking space prediction model to perform parking space corner point prediction to obtain a first prediction result; the first prediction result indicates position information of parking space corner points in the aerial view;
inputting the decoding feature map into a global feature branch network of the parking space prediction model to perform parking space prediction to obtain a second prediction result; the second prediction result indicates candidate position information of the parking space in the aerial view;
and determining the target position information of the parking space in the aerial view according to the fusion processing of the first prediction result and the second prediction result.
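As a sketch of the architecture in claim 1, the following PyTorch module (an assumed library choice; all layer shapes, channel counts, and head layouts are illustrative, not taken from the patent) wires an encoder-decoder backbone to a local corner branch fed by the coding feature map and a global slot branch fed by the decoding feature map:

```python
import torch
import torch.nn as nn

class ParkingSpacePredictor(nn.Module):
    """Schematic two-branch network following claim 1: an encoder-decoder
    backbone whose coding feature map feeds a local (corner) branch and
    whose decoding feature map feeds a global (slot) branch. All layer
    and channel sizes are illustrative assumptions, not patent values."""

    def __init__(self, in_ch: int = 3, feat_ch: int = 32):
        super().__init__()
        # Coding network: downsamples the bird's-eye view by 4x.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoding network: upsamples the coding feature map by 2x.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        # Local branch (claims 2-3): two maps of equal size and channels.
        self.local_conf = nn.Conv2d(feat_ch, 2, 1)     # corner score per cell
        self.local_offset = nn.Conv2d(feat_ch, 2, 1)   # (dx, dy) per cell
        # Global branch (claims 4-5): equal size, different channel counts.
        self.global_conf = nn.Conv2d(feat_ch, 1, 1)    # slot-center score
        self.global_offset = nn.Conv2d(feat_ch, 8, 1)  # 4 corners x (dx, dy)

    def forward(self, bev: torch.Tensor):
        enc = self.encoder(bev)   # coding feature map
        dec = self.decoder(enc)   # decoding feature map
        first = (self.local_conf(enc), self.local_offset(enc))
        second = (self.global_conf(dec), self.global_offset(dec))
        return first, second
```

Note that, per claims 2 and 4, the two local maps share size and channel count, while the two global maps share size but differ in channel count.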
2. The parking space prediction method according to claim 1, wherein said inputting the coding feature map into a local feature branch network of the parking space prediction model for parking space corner point prediction to obtain a first prediction result comprises:
inputting the coding feature map into a local feature branch network of the parking space prediction model, and obtaining a first local feature map and a second local feature map through convolution kernel processing; the first local feature map and the second local feature map have the same size and the same channel number and are divided into a plurality of pixel grids;
and carrying out parking space corner point prediction based on the first local characteristic diagram and the second local characteristic diagram to obtain the first prediction result.
3. A parking space prediction method according to claim 2, wherein said performing parking space corner point prediction based on the first local feature map and the second local feature map to obtain the first prediction result comprises:
obtaining a first local prediction result based on the first local feature map; the first local prediction result indicates a pixel grid where a parking space corner point in the aerial view is located;
obtaining a second local prediction result based on the second local feature map; the second local prediction result indicates the deviation amount of the horizontal and vertical coordinates of the parking space corner points in the aerial view relative to the central point of the pixel grid;
and obtaining the first prediction result according to the fusion processing of the first local prediction result and the second local prediction result.
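The grid-plus-offset decoding in claim 3 can be sketched as follows: a corner point's image coordinates are recovered from the pixel-grid cell it falls in plus its predicted offset from that cell's center point. The function name, the confidence threshold, and the `cell_size` stride are illustrative assumptions, not values from the patent.

```python
import numpy as np

def decode_corners(conf, offsets, cell_size, threshold=0.5):
    """Recover corner-point image coordinates from a per-cell confidence
    grid and a per-cell (dx, dy) offset grid (sketch of claim 3).
    conf:    (H, W) corner confidence per pixel-grid cell.
    offsets: (2, H, W) offset of the corner from the cell's center point.
    cell_size: stride in image pixels between adjacent grid cells."""
    corners = []
    rows, cols = np.where(conf > threshold)
    for r, c in zip(rows, cols):
        # Cell center in image coordinates, plus the predicted offset.
        cx = (c + 0.5) * cell_size + offsets[0, r, c]
        cy = (r + 0.5) * cell_size + offsets[1, r, c]
        corners.append((cx, cy))
    return corners
```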
4. A parking space prediction method according to claim 1, wherein the step of inputting the decoding feature map into a global feature branch network of the parking space prediction model to perform parking space prediction to obtain a second prediction result comprises:
inputting the decoding feature map into a global feature branch network of the parking space prediction model, and obtaining a first global feature map and a second global feature map through convolution kernel processing; the first global feature map and the second global feature map have the same size and different channel numbers and are divided into a plurality of pixel grids;
and predicting the parking space based on the first global feature map and the second global feature map to obtain a second prediction result.
5. A parking space prediction method according to claim 4, wherein said performing parking space prediction based on said first global feature map and said second global feature map to obtain said second prediction result comprises:
obtaining a first global prediction result based on the first global feature map; the first global prediction result indicates a pixel grid where a parking space center point is located in the aerial view; the center point of the parking space in the aerial view is a diagonal intersection point of the corner points of the parking space in the aerial view;
obtaining a second global prediction result based on the second global feature map; the second global prediction result indicates the deviation amounts of the horizontal and vertical coordinates of the parking space corner points in the aerial view relative to the parking space center point in the aerial view;
and obtaining the second prediction result according to the fusion processing of the first global prediction result and the second global prediction result.
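Claim 5's reconstruction of a slot's candidate corner points from its predicted center point (the intersection of the slot's diagonals) and four per-corner offsets can be sketched as below; the function name and data layout are assumptions for illustration.

```python
def decode_slot(center, corner_offsets):
    """Recover the four candidate corner points of one parking space
    (sketch of claim 5) from its predicted center point and the four
    (dx, dy) offsets of the corners relative to that center."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in corner_offsets]
```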
6. A parking space prediction method according to claim 1, characterized in that the candidate position information of the parking space in the aerial view includes position information of a preset number of candidate corner points; and the determining the target position information of the parking space in the aerial view according to the fusion processing of the first prediction result and the second prediction result comprises:
traversing the parking spaces in the second prediction result and executing a first operation on the traversed current parking space, the first operation comprising:
for each candidate corner point corresponding to the current parking space, calculating the distance between the candidate corner point and each parking space corner point in the first prediction result according to the position information of the candidate corner point and the position information of each parking space corner point in the first prediction result;
when there is a target distance smaller than a preset distance, replacing the candidate corner point with the parking space corner point in the first prediction result corresponding to the target distance;
and taking the position information of each parking space in the second prediction result when the traversal is finished as the target position information of the corresponding parking space.
7. A parking space prediction method according to claim 6, wherein said replacing, when there is a target distance smaller than a preset distance, the candidate corner point with the parking space corner point in the first prediction result corresponding to the target distance comprises:
determining the minimum value of the distances between the candidate corner point and the parking space corner points in the first prediction result as the target distance;
and when the target distance is smaller than the preset distance, replacing the candidate corner point with the parking space corner point in the first prediction result corresponding to the target distance.
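The fusion step of claims 6-7 (refining each slot's candidate corners with the nearest locally predicted corner point when it lies within a preset distance) can be sketched in pure Python; the function name and the `max_dist` value are assumptions for illustration.

```python
import math

def fuse_predictions(slots, local_corners, max_dist=4.0):
    """Sketch of the fusion in claims 6-7: for each candidate corner of
    each slot in the second (global) prediction, find the minimum distance
    to the corner points of the first (local) prediction; if that target
    distance is below the preset distance, replace the candidate corner.
    slots:         list of slots, each a list of (x, y) candidate corners.
    local_corners: list of (x, y) corner points from the first prediction."""
    fused = []
    for slot in slots:
        refined = []
        for cand in slot:
            if local_corners:
                # Target distance: minimum distance to any local corner.
                nearest = min(local_corners, key=lambda p: math.dist(p, cand))
                if math.dist(nearest, cand) < max_dist:
                    cand = nearest  # replace with the more precise corner
            refined.append(cand)
        fused.append(refined)
    return fused
```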
8. A parking space prediction apparatus, characterized in that the apparatus comprises:
the aerial view acquisition module is used for acquiring an aerial view of the target parking range;
the feature map acquisition module is used for coding the aerial view using a coding network of a parking space prediction model to obtain a coding feature map, and decoding the coding feature map using a decoding network of the parking space prediction model to obtain a decoding feature map;
the first prediction result acquisition module is used for inputting the coding feature map into a local feature branch network of the parking space prediction model to perform parking space corner point prediction to obtain a first prediction result; the first prediction result indicates position information of parking space corner points in the aerial view;
the second prediction result acquisition module is used for inputting the decoding feature map into a global feature branch network of the parking space prediction model to perform parking space prediction to obtain a second prediction result; the second prediction result indicates candidate position information of the parking space in the aerial view;
and the parking space target position determining module is used for determining the target position information of the parking space in the aerial view according to the fusion processing of the first prediction result and the second prediction result.
9. An electronic device comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the parking space prediction method according to any one of claims 1-7.
10. A computer-readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the parking space prediction method according to any one of claims 1-7.
CN202210459935.0A 2022-04-24 2022-04-24 Parking space prediction method and device, electronic equipment and storage medium Pending CN114693001A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210459935.0A CN114693001A (en) 2022-04-24 2022-04-24 Parking space prediction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114693001A true CN114693001A (en) 2022-07-01

Family

ID=82145559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210459935.0A Pending CN114693001A (en) 2022-04-24 2022-04-24 Parking space prediction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114693001A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943954A (en) * 2022-07-21 2022-08-26 苏州魔视智能科技有限公司 Parking space detection method, device and system
CN116052123A (en) * 2023-01-28 2023-05-02 广汽埃安新能源汽车股份有限公司 Parking space detection method, device, vehicle and equipment based on camera picture
WO2024013807A1 (en) * 2022-07-11 2024-01-18 日産自動車株式会社 Parking assistance method and parking assistance device



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination