CN112380305B - High-precision map data crowdsourcing method and device based on space-time similarity - Google Patents

High-precision map data crowdsourcing method and device based on space-time similarity

Info

Publication number
CN112380305B
CN112380305B (application CN202011227747.2A)
Authority
CN
China
Prior art keywords
similarity
metadata
space
time
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011227747.2A
Other languages
Chinese (zh)
Other versions
CN112380305A (en)
Inventor
黄睿
唐洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202011227747.2A priority Critical patent/CN112380305B/en
Publication of CN112380305A publication Critical patent/CN112380305A/en
Application granted granted Critical
Publication of CN112380305B publication Critical patent/CN112380305B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 - Geographical information databases
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 - Updating
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/787 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/02 - Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 - Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 - Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 - Indexing scheme relating to G06F9/00
    • G06F2209/50 - Indexing scheme relating to G06F9/50
    • G06F2209/502 - Proximity

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

A high-precision map data crowdsourcing method and device based on space-time similarity are disclosed, wherein the method comprises the following steps: collecting video data by crowdsourcing and extracting video metadata; calculating the time similarity between the metadata; calculating the spatial similarity between the metadata; calculating the space-time similarity between the video metadata according to the time and spatial similarities; and uploading the required videos according to the space-time similarity, the bandwidth environment, and the coverage required at each location. The method and device reduce data redundancy in the high-precision map, thereby reducing the delay of transmission and of complete-map generation.

Description

High-precision map data crowdsourcing method and device based on space-time similarity
Technical Field
The invention relates to the technical field of data processing, in particular to a high-precision map data crowdsourcing method and device based on space-time similarity.
Background
In recent years, with the gradual growth of population and the continuous development of science, technology, and urban traffic, people increasingly feel that ordinary cars can no longer satisfy present needs, so attention has turned toward unmanned driving. Compared with traditional driving, unmanned vehicles can bring huge profits to car manufacturers, significantly reduce labor costs, and greatly enhance road safety.
Compared with a traditional navigation map, a high-precision map serving automatic driving has higher requirements in every respect and, together with sensors and algorithms, can support the decision layer, better meeting the various requirements of unmanned driving. A high-precision map is a finely defined, high-accuracy 2D grid whose main data are point clouds generated by lidar; the point clouds are converted from video images and other data acquired by data-collection vehicles. The high-precision map is one of the key capabilities for realizing automatic driving: it effectively supplements the existing sensors of an autonomous vehicle and provides more reliable perception. It matters greatly for unmanned driving because the map serves a machine; the map must reach a certain accuracy to guarantee the safety of vehicles and people under unsupervised driving, and to ensure that vehicles can avoid obstacles and select the most suitable road according to the actual situation. A traditional electronic map cannot do this at all; only the large amount of driving-assistance information contained in the high-precision map can enable unmanned driving.
Although high-precision maps are a popular research topic and various automobile manufacturers have achieved notable results, many unsolved problems still restrict progress. By its nature, a high-precision map has a huge overall data volume, and a simple one-shot transmission under existing network conditions cannot meet its real-time requirement. For unmanned driving, the information the map provides must be accurate and timely; if pictures or videos come from a single source, transmission may be delayed by bandwidth limits, and accuracy may fall short of the requirements of unmanned driving because of restricted shooting angles and locations. The high-precision map is therefore updated in a crowdsourced manner in this research, guaranteeing both the accuracy and the timeliness of the map.
Disclosure of Invention
The application provides a high-precision map data crowdsourcing method and device based on space-time similarity, aiming to reduce data redundancy in the high-precision map and thereby reduce the delay of transmission and of complete-map generation.
According to a first aspect, an embodiment provides a high-precision map data crowdsourcing method based on spatiotemporal similarity, which includes:
collecting video data by crowdsourcing and extracting video metadata;
calculating the time similarity between the metadata;
calculating the spatial similarity between the metadata;
calculating the space-time similarity between the video metadata according to the time and spatial similarities;
and uploading the required video according to the space-time similarity, the bandwidth environment, and the coverage required at each location.
In some embodiments, the step of calculating spatial similarity between metadata comprises:
simplifying the representation of two different video frame metadata;
decomposing the rigid motion of the two simplified video metadata into translation and rotation and respectively calculating the similarity of the two motions;
calculating a rotated coverage intersection and a translated coverage intersection;
and obtaining the similarity between the two metadata according to the rotated coverage intersection and the translated coverage intersection.
In some embodiments, computing the coverage intersection of translations is computing the translation similarity, comprising: computing along the vertical direction and the horizontal direction to obtain the similarity in the vertical direction and the similarity in the horizontal direction respectively; and weighting the two similarities to obtain the translation similarity.
In some embodiments, two different video frame metadata f1 and f2 are simplified to f1 = (l1, θ1) and f2 = (l2, θ2), wherein θ1 and θ2 represent the angular orientations of the two video frame metadata and l1 and l2 represent their shooting locations.

The rotated coverage intersection is expressed by a formula (published only as an image) in terms of φ, the field-of-view size of the camera lens, and the view difference of the two video frame metadata.

The coverage intersection of the translations is likewise expressed by a formula (image in the original), wherein Ssem⊥ is the similarity in the vertical direction, r denotes the effective range of the camera, δl denotes the distance between l1 and l2, and δθ = min(|θ2 − θ1|, 2π − |θ2 − θ1|) denotes the orientation difference of the two video frame metadata. The similarity Ssem∥ in the horizontal direction is calculated by a further formula (image in the original).

The spatial similarity between the two different video frame metadata f1 and f2 is expressed as:

Ssem(f1,f2) = Ssem_R × Ssem_T
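All four spatial formulas above survive only as equation images in the published text. Purely as a hedged reconstruction under the sector field-of-view model the text describes (an assumption, not the patent's verbatim equations), the overlap of two sectors with common field of view φ suggests forms such as:

\[
Ssem_R(f_1,f_2)=\max\Big(0,\;1-\frac{\delta_\theta}{\varphi}\Big),\qquad
Ssem_{\perp}=\max\Big(0,\;1-\frac{\delta_l}{r}\Big),
\]
\[
Ssem_T = w_{\perp}\,Ssem_{\perp} + w_{\parallel}\,Ssem_{\parallel},\quad
w_{\perp}+w_{\parallel}=1,\qquad
Ssem(f_1,f_2)=Ssem_R \times Ssem_T,
\]

with Ssem∥ taken as an analogous decay in the displacement component parallel to the viewing direction.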
in some embodiments, the calculating the temporal similarity between the metadata comprises:
calculating the time measurement relation correlation degree:
Figure GDA0003638965560000038
calculating the time similarity:
Figure GDA0003638965560000043
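Both temporal formulas are published only as images. One common choice for the temporal similarity of two capture intervals [ts1, te1] and [ts2, te2], offered purely as an illustrative assumption rather than the patent's definition, is the interval intersection over union:

\[
Tsem(f_1,f_2)=\frac{\max\big(0,\;\min(te_1,te_2)-\max(ts_1,ts_2)\big)}{\max(te_1,te_2)-\min(ts_1,ts_2)} .
\]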
the method of claim 1, wherein the calculating the spatiotemporal similarity between the video metadata comprises: calculating the measurement relation of the space-time similarity and calculating the topological relation of the space-time similarity,
the metric relation for calculating the space-time similarity is to calculate the integral of the area S along the lj change curve:
Figure GDA0003638965560000041
the topological relation for calculating the space-time similarity is to obtain the space similarity Ssem between two different video frame metadata by decomposing the operation of the topological relation and the orientation relation on each frame (f1,f2) And then, associating with the time similarity to obtain a topological relation of the space-time similarity:
Stsem TA(f1,f2) =Tsem×Ssem (f1,f2)
the spatio-temporal similarity is expressed as:
Figure GDA0003638965560000042
in the formula: stsem is the total spatiotemporal similarity; w ts ,W tta Are respectively space and time associated weight values and satisfy W ts +W tta =1, large similarity of two video metadataA small range is between 0 and 1.
According to a second aspect, an embodiment provides a high-precision map data crowdsourcing device based on space-time similarity, comprising:
an acquisition module, configured to collect video data by crowdsourcing and extract video metadata;
a time module, configured to calculate the time similarity between the metadata;
a space module, configured to calculate the spatial similarity between the metadata;
a calculation module, configured to calculate the space-time similarity between the video metadata according to the time and spatial similarities;
and an uploading module, configured to upload the required videos according to the space-time similarity, the bandwidth environment, and the coverage required at each location.
According to a third aspect, an embodiment provides a high-precision map data crowdsourcing device based on space-time similarity, including:
a memory for storing a program;
a processor, configured to implement the method described in the first aspect by executing the program stored in the memory.
According to a fourth aspect, there is provided in an embodiment a computer readable storage medium comprising a program executable by a processor to implement the method of the first aspect.
According to the above embodiments, the high-precision map data crowdsourcing method and device based on space-time similarity combine crowdsourced data acquisition with edge computing, while significantly reducing the operating pressure on the cloud system.
Drawings
FIG. 1 is a flow chart of a high-precision map data crowdsourcing method based on spatiotemporal similarity provided by the application;
FIG. 2 is a system operational scenario diagram of an embodiment;
FIG. 3 is a flow chart of a spatio-temporal similarity computation method according to an embodiment.
Detailed Description
The present invention will be described in further detail below with reference to the detailed description and the accompanying drawings, wherein like elements in different embodiments are given associated like reference numbers. In the following description, numerous specific details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may, in different instances, be omitted or replaced by other elements, materials, or methods. In some instances, certain operations related to the present application have not been shown or described in this specification in order not to obscure the core of the present application with unnecessary detail; a detailed description of these operations is unnecessary, as those skilled in the art can fully understand them from the description in the specification and from general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Likewise, the steps or actions in the method descriptions may be exchanged or reordered in a manner apparent to those skilled in the art. The various sequences in the specification and drawings are therefore only for clearly describing certain embodiments and do not imply a required order, unless it is otherwise stated that a certain order must be followed.
The application provides a high-precision map data crowdsourcing method based on space-time similarity, comprising the following steps:

S100: each vehicle collects video data and extracts information to form a six-element metadata tuple for each video (the tuple is given as a formula image in the original).

S200: calculate the spatial similarity Ssem and the temporal similarity Tsem; when calculating the spatial similarity Ssem, perform steps S310-S380; when calculating the temporal similarity Tsem, perform steps S410-S420.
The method for calculating the spatial similarity Ssem comprises the following steps:

S310: the metadata of two different video frames f1 and f2 are simplified to f1 = (l1, θ1) and f2 = (l2, θ2); the effective range of the camera and the field-of-view size of the camera lens are fixed at r and φ respectively. The frames have angular orientations θ1 and θ2; δl denotes the distance between l1 and l2, and δθ = min(|θ2 − θ1|, 2π − |θ2 − θ1|) denotes the orientation difference.

S320: decompose the rigid motion between the two video metadata into a translation and a rotation, and calculate the similarity of the two motions separately.

S330: calculate the rotated coverage intersection, defining the coverage intersection as the similarity between the two metadata (formula published only as an image).

S340: calculate the translational coverage intersection, computed along the vertical and horizontal directions.

S350: the similarity Ssem⊥ in the vertical direction is obtained by a formula (image in the original).

S360: the similarity Ssem∥ in the horizontal direction is obtained by a further formula (image in the original).

S370: the translation similarity Ssem_T is obtained by a weighting equation (image in the original).

S380: the similarity between f1 and f2 is obtained by the equation Ssem(f1,f2) = Ssem_R × Ssem_T.
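A minimal Python sketch of steps S310-S380, using the hedged sector-overlap forms assumed above; the function, its parameters, and the closed-form expressions are illustrative assumptions, since the patent's own equations are published only as images:

    import math

    def spatial_similarity(l1, theta1, l2, theta2, r, fov,
                           w_perp=0.5, w_par=0.5):
        """Hedged sketch of S310-S380: rotation x translation overlap.

        l1, l2         : (x, y) shooting locations
        theta1, theta2 : orientations in radians
        r              : effective camera range; fov: lens field-of-view angle
        The closed forms below are assumptions; only the quantities
        (r, fov, delta_l, delta_theta) come from the text.
        """
        # S310: orientation difference and shooting-location distance
        d_theta = abs(theta2 - theta1) % (2 * math.pi)
        d_theta = min(d_theta, 2 * math.pi - d_theta)
        d_l = math.dist(l1, l2)

        # S330: rotated coverage intersection (assumed sector-overlap form)
        ssem_r = max(0.0, 1.0 - d_theta / fov)

        # S340: decompose the translation into components across and
        # along the first camera's viewing direction (assumed decomposition)
        if d_l > 0:
            disp_ang = math.atan2(l2[1] - l1[1], l2[0] - l1[0])
            alpha = disp_ang - theta1
            d_par = abs(d_l * math.cos(alpha))   # along the view
            d_perp = abs(d_l * math.sin(alpha))  # across the view
        else:
            d_par = d_perp = 0.0
        ssem_perp = max(0.0, 1.0 - d_perp / r)   # S350: vertical direction
        ssem_par = max(0.0, 1.0 - d_par / r)     # S360: horizontal direction

        # S370: weighted translation similarity
        ssem_t = w_perp * ssem_perp + w_par * ssem_par

        # S380: spatial similarity
        return ssem_r * ssem_t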
The calculation of the temporal similarity Tsem comprises the following steps:

S410: calculate the time metric relation correlation (formulas published only as images).

S420: calculate the time similarity Tsem from its functional expression (formula published only as an image).
Calculating the spatio-temporal similarity according to the temporal similarity Tsem and the spatial similarity Ssem comprises the following steps:

S510: calculate the metric relation of the space-time similarity, i.e. the integral of the area S along the lj change curve (formula published only as an image).

S520: calculate the topological relation of the space-time similarity, which decomposes into operations of the topological and orientation relations on each frame to obtain Ssem(f1,f2) directly, then associates it with the time similarity: Stsem_TA(f1,f2) = Tsem × Ssem(f1,f2).

S530: the total spatio-temporal similarity is obtained as

Stsem = W_ts · Stsem_s(f1,f2) + W_tta · Stsem_TA(f1,f2)

where Stsem is the total spatio-temporal similarity; W_ts and W_tta are the spatial and temporal association weight values respectively and satisfy W_ts + W_tta = 1; and the similarity of two video metadata ranges between 0 and 1.
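A short Python sketch of how S510-S530 combine the two components with the association weights; the metric-relation term Stsem_s, whose integral survives only as an image, is taken here as a precomputed input (an assumption for illustration):

    def spatiotemporal_similarity(tsem, ssem, stsem_s, w_ts=0.5, w_tta=0.5):
        """S520/S530 (sketch): total space-time similarity.

        tsem    : time similarity Tsem in [0, 1]
        ssem    : spatial similarity Ssem(f1, f2) in [0, 1]
        stsem_s : metric-relation term Stsem_s(f1, f2), assumed precomputed
        w_ts, w_tta : space/time association weights, must sum to 1
        """
        assert abs(w_ts + w_tta - 1.0) < 1e-9, "weights must satisfy W_ts + W_tta = 1"
        stsem_ta = tsem * ssem                      # S520: Stsem_TA = Tsem x Ssem
        return w_ts * stsem_s + w_tta * stsem_ta    # S530: weighted total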
The method of the application uploads the required videos according to the space-time similarity, the bandwidth environment, and the coverage required at each location. Because of limits on local storage capacity, bandwidth and the like, the video clips collected by a data collection vehicle have a fixed duration limit; the application sets the longest available video length T_md as the length of the acquisition period. Each time the cloud server issues a task, all video metadata within the period participate in the selection for uploading. The cloud server also has a coverage requirement for each location, and for some locations a single round of collection may not satisfy the published requirement; if the coverage selected in the current period is insufficient, its influence on the selection in the next period must be considered.
Maximum diversity is regarded as minimum redundancy: videos are selected on the basis of space-time similarity so that the diversity of the finally uploaded videos is maximized. If one round of uploading cannot meet the requirement, multiple rounds of acquisition are performed, and the coverage requirement is finally met through repeated selection.
The diversity maximization problem based on spatio-temporal similarity is described as follows:

Suppose vehicles numbered d1, d2, ..., di participate in the collection, the bandwidth allocated by the cloud is B, and the coverage requirement is S_cov. Let V denote a group of video segments collected in one cycle (the set notation is given as an image in the original); each segment Vi has a start time ts(Vi), an end time te(Vi), and a size s(Vi). Given an upload time limit t, each device can upload a subset V of its videos within the limit such that Σ s(Vi) ≤ B and the spatio-temporal similarity of ∪ Vi is minimal, i.e. the coverage diversity is maximal.
The diversity maximization problem based on spatio-temporal similarity is NP-hard.
We reduce the diversity maximization problem based on spatio-temporal similarity to a known NP-hard problem, namely the 0-1 knapsack problem. In the 0-1 knapsack problem there is a set of items, each with a value and a weight; given a weight-constrained knapsack, the problem is to find a subset whose total weight does not exceed the weight constraint and whose total value is optimized. For any instance of the diversity maximization problem, an instance of the 0-1 knapsack problem can be constructed as follows. We randomly select a mobile device di, set the network bandwidth limit B as the weight limit of the knapsack, and set the number of videos ni uploaded by the mobile device as the number of items. We set the initial spatio-temporal similarity Stsem to 0 and, for the j-th segment Vj, set the sum of its similarities to the already-selected video metadata, Σ Stsem(Vj), as the value of the j-th item. V is then a subset of items that satisfies the weight constraint and minimizes the total value of the items, so V is also a solution of the 0-1 knapsack problem.
The approximate knapsack problem can be solved with an algorithm based on dynamic programming. Specifically, the algorithm first randomly selects the video metadata of one vehicle as the first object of the selection queue. Once video metadata is selected, it is neither deleted nor selected again later. Videos corresponding to the metadata are then selected one by one according to the similarity between new video metadata and the metadata already in the selected set. The process stops when no more videos can be selected (i.e., all videos have been selected or the bandwidth cap has been reached).
Dynamic programming, as used here: for each decision step, list the possible partial solutions, discard those that according to a judgment condition certainly cannot lead to the optimal solution, and screen for the best at every step, thereby guaranteeing a globally optimal solution.
The 0-1 knapsack problem in this context can be seen as a special integer programming problem (the program and its constraints are published only as formula images).
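The integer program itself is printed only as an image. Matching the reduction described above (video sizes as weights, similarity sums as values, bandwidth B as capacity), a plausible standard 0-1 knapsack form, offered as a reconstruction under those assumptions, is:

\[
\min \sum_{j=1}^{n_i} \mathrm{Stsem}(V_j)\,x_j
\quad\text{s.t.}\quad
\sum_{j=1}^{n_i} s(V_j)\,x_j \le B,\qquad x_j\in\{0,1\}.
\]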
suppose that
Figure GDA0003638965560000093
The backpack has a capacity of B, and the selectable articles are
Figure GDA0003638965560000094
Time 0-1 optimum for knapsack problem. While
Figure GDA0003638965560000095
Can select the article as
Figure GDA0003638965560000096
Time 0-1 backpack problem. At this time:
Figure GDA0003638965560000097
extrapolating back this recursion, deducing from the putting in of the ith video, if no video is put in, the value of the backpack is not changed, still m (V) i B), if put in, the backpack capacity is reduced to (B) i -w i s(V i ) Value increased to m (V) i-1 ,B-w i s(V i ))+Stsem(V i );
Figure GDA0003638965560000098
If the most primitive recursive method is used, the time complexity of the algorithm is exponential in the number of videos ni (the exact expression is published only as an image); under crowdsourced acquisition the number of videos is very large, so the running time is correspondingly large, and many subproblems are recomputed, causing considerable redundancy. The algorithm here therefore simplifies the recursion with a bottom-up solution that compares the values obtained with and without selecting each video against the previous row only, reducing the time complexity to O(ni * B), where ni is the total number of videos and B is the allocated bandwidth.
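A minimal bottom-up Python sketch of this O(ni * B) dynamic program, following the m(Vi, B) recursion above; treating the diversity 1 − Stsem(Vj) as the item value (one reading of the "maximum diversity equals minimum redundancy" framing) and assuming integer sizes are both illustrative assumptions:

    def select_videos(sizes, similarities, bandwidth):
        """Bottom-up 0-1 knapsack over candidate videos (sketch).

        sizes        : integer upload size s(V_j) of each candidate video
        similarities : Stsem(V_j), similarity of V_j to the already-selected
                       metadata, each in [0, 1]
        bandwidth    : integer bandwidth cap B, same units as sizes
        Returns indices of the videos to upload. Complexity O(n * B).
        """
        n = len(sizes)
        # m[i][b]: best total diversity using the first i videos at capacity b
        m = [[0.0] * (bandwidth + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            size, value = sizes[i - 1], 1.0 - similarities[i - 1]
            for b in range(bandwidth + 1):
                m[i][b] = m[i - 1][b]                         # skip video i
                if size <= b:                                 # or take it
                    m[i][b] = max(m[i][b], m[i - 1][b - size] + value)
        # backtrack to recover the selected set
        selected, b = [], bandwidth
        for i in range(n, 0, -1):
            if m[i][b] != m[i - 1][b]:                        # video i was taken
                selected.append(i - 1)
                b -= sizes[i - 1]
        return sorted(selected)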
For the convenience of subsequent operation, the space-time similarity of different metadata is converted into the coverage U_cov of the metadata, calculated as follows:

By the definition of the video metadata, the area covered by one frame of video is determined by the camera's effective range r_j and the field of view φ of the camera lens, according to a formula published only as an image. The area covered by the complete video follows from integrating along l, the trajectory travelled by camera l_j on the collecting vehicle (formula image in the original). The effective area covered by two different metadata is calculated by a further formula (image in the original).
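All three coverage formulas are printed as images. Under the sector model already in use, plausible reconstructions (assumptions, not the published equations) are the circular-sector area for one frame, its sweep along the trajectory l, and an inclusion-exclusion union for two metadata:

\[
S_{\text{frame}} = \frac{\varphi}{2}\,r_j^{2},\qquad
S_{\text{video}} = \int_{l} S_{\text{frame}}\,\mathrm{d}l_j,\qquad
U_{cov}(f_1,f_2) = S(f_1) + S(f_2) - S(f_1\cap f_2).
\]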
If the coverage area of the videos selected this time is less than S_cov, the data are temporarily kept in the edge server to await the next acquisition round; the video coverage is then recalculated, and the process continues until the coverage requirement is met, after which the videos are uploaded to the cloud server.
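A compact Python sketch of this coverage-gated loop at the edge server; collect_round, coverage_of, and upload_to_cloud are hypothetical helpers standing in for the operations the text describes, and max_rounds is an added safety cap:

    def coverage_gated_upload(s_cov, collect_round, coverage_of,
                              upload_to_cloud, max_rounds=10):
        """Keep selected videos at the edge until coverage reaches S_cov.

        s_cov           : required coverage S_cov
        collect_round   : returns the videos selected in one period
        coverage_of     : maps a set of videos to its coverage area
        upload_to_cloud : ships the accumulated set to the cloud server
        All three callables are assumed interfaces, not patent-defined APIs.
        """
        retained = []
        for _ in range(max_rounds):
            retained.extend(collect_round())       # next acquisition period
            if coverage_of(retained) >= s_cov:     # coverage requirement met
                upload_to_cloud(retained)
                return retained
        return retained  # requirement unmet within the round budget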
The above steps and flows only illustrate the technical idea of the present invention and are not intended to limit it; any modification or improvement made to the technical scheme, technical idea, or method proposed by the present invention falls within the protection scope of the present invention.
In summary, the present invention uses a new method to define temporal and spatial similarities, and defines the spatio-temporal similarity between different video metadata based on the definition of the two. The spatiotemporal similarity is computed using only definitions of metadata, without using video content features.
Correspondingly, the application also provides a high-precision map data crowdsourcing device based on space-time similarity, comprising:
an acquisition module, configured to collect video data by crowdsourcing and extract video metadata;
a time module, configured to calculate the time similarity between the metadata;
a space module, configured to calculate the spatial similarity between the metadata;
a calculation module, configured to calculate the space-time similarity between the video metadata according to the time and spatial similarities;
and an uploading module, configured to upload the required videos according to the space-time similarity, the bandwidth environment, and the coverage required at each location.
The application also provides a high-precision map data crowdsourcing device based on space-time similarity, which comprises:
a memory for storing a program;
a processor for implementing the above method by executing the program stored in the memory.
Accordingly, the present application also provides a computer-readable storage medium comprising a program executable by a processor to implement the method described above.
Those skilled in the art will appreciate that all or part of the functions of the methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions are implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include a read-only memory, a random-access memory, a magnetic disk, an optical disk, a hard disk, or the like; the functions are realized when a computer executes the program. For example, the program may be stored in a memory of the device, and all or part of the functions described above are implemented when the processor executes the program in the memory. The program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and downloaded or copied into the memory of a local device, or used to update the version of the local device's system; all or part of the functions of the above embodiments are implemented when the processor executes the program in the memory.
The present invention has been described with reference to specific examples, which are provided only to aid understanding of the invention and are not intended to limit it. A person skilled in the art to which the invention pertains may make several simple deductions, modifications, or substitutions according to the idea of the invention.

Claims (6)

1. A high-precision map data crowdsourcing method based on space-time similarity, characterized by comprising the following steps:
collecting video data by crowdsourcing and extracting video metadata;
calculating the time similarity between the metadata;
calculating the spatial similarity between the metadata;
calculating the space-time similarity between the video metadata according to the time and spatial similarities;
uploading the required video according to the space-time similarity, the bandwidth environment and the coverage required at each location;
the step of calculating the spatial similarity between the metadata comprises the following steps:
simplifying the representation of two different video frame metadata;
decomposing the rigid motion of the two simplified video metadata into translation and rotation and respectively calculating the similarity of the two motions;
calculating a rotated coverage intersection and a translated coverage intersection;
obtaining the similarity between the two metadata according to the rotated coverage intersection and the translated coverage intersection;
calculating from the vertical direction and the horizontal direction to respectively obtain the similarity in the vertical direction and the similarity in the horizontal direction; carrying out weighted calculation on the similarity in the vertical direction and the similarity in the horizontal direction to obtain translation similarity;
the two different video frame metadata f1 and f2 are simplified to f1 = (l1, θ1) and f2 = (l2, θ2),

wherein θ1 and θ2 represent the angular orientations of the two video frame metadata and l1 and l2 represent their shooting locations;

the rotated coverage intersection is expressed by a formula (published only as an image), wherein φ represents the field-of-view size of the camera lens and the view difference of the two video frame metadata also enters the expression;

the coverage intersection of the translations is expressed by a formula (published only as an image), wherein Ssem⊥ is the similarity in the vertical direction, r denotes the effective range of the camera, δl denotes the distance between l1 and l2, Ssem∥ is the similarity in the horizontal direction, and δθ = min(|θ2 − θ1|, 2π − |θ2 − θ1|) represents the orientation difference of the two video frame metadata;

the similarity Ssem∥ in the horizontal direction is calculated by a formula (published only as an image);

when f2 has a translation direction (given as an image) and a translation distance δl, the translation similarity is obtained through a weighting equation (published only as an image);

the spatial similarity between the two different video frame metadata f1 and f2 is expressed as:

Ssem(f1,f2) = Ssem_R(f1,f2) + Ssem_T(f1,f2)
2. The method of claim 1, wherein the step of calculating the time similarity between the metadata comprises:
calculating the time metric relation correlation (formula published only as an image); and
calculating the time similarity (formula published only as an image).
3. The method of claim 1, wherein the calculating the space-time similarity between the video metadata comprises: calculating the metric relation of the space-time similarity and calculating the topological relation of the space-time similarity,

wherein the metric relation of the space-time similarity is calculated as the integral of the area S along the lj change curve (formula published only as an image);

the topological relation of the space-time similarity is obtained by decomposing operations of the topological relation and the orientation relation onto each frame to obtain the spatial similarity Ssem(f1,f2) between two different video frame metadata, which is then associated with the time similarity:

Stsem_TA(f1,f2) = Tsem × Ssem(f1,f2);

the spatio-temporal similarity is expressed as:

Stsem = W_ts · Stsem_s(f1,f2) + W_tta · Stsem_TA(f1,f2)

wherein Stsem is the total spatio-temporal similarity; W_ts and W_tta are the spatial and temporal association weight values respectively and satisfy W_ts + W_tta = 1; and the similarity of two video metadata ranges between 0 and 1.
4. A high-precision map data crowdsourcing device based on space-time similarity, applying the method of any one of claims 1-3 and comprising:
an acquisition module, configured to collect video data by crowdsourcing and extract video metadata;
a time module, configured to calculate the time similarity between the metadata;
a space module, configured to calculate the spatial similarity between the metadata;
a calculation module, configured to calculate the space-time similarity between the video metadata according to the time and spatial similarities;
and an uploading module, configured to upload the required video according to the space-time similarity, the bandwidth environment and the coverage required at each location.
5. A high-precision map data crowdsourcing device based on space-time similarity is characterized by comprising the following components:
a memory for storing a program;
a processor for implementing the method of any one of claims 1-3 by executing a program stored by the memory.
6. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method of any one of claims 1-3.
CN202011227747.2A 2020-11-06 2020-11-06 High-precision map data crowdsourcing method and device based on space-time similarity Active CN112380305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011227747.2A CN112380305B (en) 2020-11-06 2020-11-06 High-precision map data crowdsourcing method and device based on space-time similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011227747.2A CN112380305B (en) 2020-11-06 2020-11-06 High-precision map data crowdsourcing method and device based on space-time similarity

Publications (2)

Publication Number Publication Date
CN112380305A CN112380305A (en) 2021-02-19
CN112380305B (en) 2023-01-17

Family

ID=74579502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011227747.2A Active CN112380305B (en) 2020-11-06 2020-11-06 High-precision map data crowdsourcing method and device based on space-time similarity

Country Status (1)

Country Link
CN (1) CN112380305B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108847121A (en) * 2018-07-04 2018-11-20 深圳地平线机器人科技有限公司 The method and apparatus for constructing high-precision map
CN111488421B (en) * 2020-04-27 2024-04-16 立得空间信息技术股份有限公司 Data fusion method of traditional map and high-precision map

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Semantic relevance computation model for essential characteristics of geospatial data; Zhao Hongwei et al.; Geographical Research (《地理研究》); 2016-01-31; pp. 60-61, 63-65 *
Cross-domain target association analysis method based on spatio-temporal similarity; Cheng Leifeng et al.; Telecommunication Engineering (《电讯技术》); 2018-09-17 (No. 04); pp. 62-66 *
A survey of visual crowdsensing applications; Zhai Shuying et al.; Computer Science (《计算机科学》); 2019-06-30; Vol. 46 (No. 6A); pp. 11-12, 14 *
Edge high-definition map services for unmanned driving; Tang Jie et al.; ZTE Technology Journal (《中兴通讯技术》); 2019-06-05; Vol. 25 (No. 3); pp. 58-59, 62-57 *

Also Published As

Publication number Publication date
CN112380305A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
Hausler et al. Multi-process fusion: Visual place recognition using multiple image processing methods
CN110322446B (en) Domain self-adaptive semantic segmentation method based on similarity space alignment
Sun et al. Fine-grained vehicle type classification using lightweight convolutional neural network with feature optimization and joint learning strategy
US11670071B2 (en) Fine-grained image recognition
Bendali-Braham et al. Recent trends in crowd analysis: A review
CN106909924B (en) Remote sensing image rapid retrieval method based on depth significance
Michieli et al. Adversarial learning and self-teaching techniques for domain adaptation in semantic segmentation
US20160125272A1 (en) Object recognizer and detector for two-dimensional images using bayesian network based classifier
JP2019511020A (en) Method and system for estimating arrival time
JP2010529529A (en) Specific subject detection device, learning device and learning method thereof
Reinders et al. Learning convolutional neural networks for object detection with very little training data
CN110781262A (en) Semantic map construction method based on visual SLAM
dos Santos Rosa et al. Sparse-to-continuous: Enhancing monocular depth estimation using occupancy maps
Araslanov et al. Actor-critic instance segmentation
CN111797970A (en) Method and apparatus for training neural network
CN115071762A (en) Pedestrian trajectory prediction method, model and storage medium oriented to urban scene
Sikirić et al. Traffic scene classification on a representation budget
Wang et al. Small vehicle classification in the wild using generative adversarial network
Radwan Leveraging sparse and dense features for reliable state estimation in urban environments
CN114550091A (en) Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on local features
Wang et al. Cross-domain learning using optimized pseudo labels: toward adaptive car detection in different weather conditions and urban cities
CN112380305B (en) High-precision map data crowdsourcing method and device based on space-time similarity
Jia et al. Novel hybrid neural network for dense depth estimation using on-board monocular images
Tanner et al. Large-scale outdoor scene reconstruction and correction with vision
Zhang et al. Boosting the speed of real-time multi-object trackers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant