CN112380305A - High-precision map data crowdsourcing method and device based on time-space similarity - Google Patents

High-precision map data crowdsourcing method and device based on time-space similarity

Info

Publication number: CN112380305A
Application number: CN202011227747.2A
Authority: CN (China)
Prior art keywords: similarity, metadata, time, calculating, space
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN112380305B
Inventors: 黄睿 (Huang Rui), 唐洁 (Tang Jie)
Current and original assignee: South China University of Technology (SCUT)
Filed 2020-11-06 by South China University of Technology (SCUT); priority to CN202011227747.2A
Published as CN112380305A on 2021-02-19; granted and published as CN112380305B on 2023-01-17

Classifications

    • G06F16/29 Information retrieval of structured data; geographical information databases
    • G06F16/23 Information retrieval of structured data; updating
    • G06F16/7867 Retrieval of video data using manually generated metadata, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G06F16/787 Retrieval of video data using metadata, using geographical or spatial information, e.g. location
    • G06F18/22 Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP], for remote control or remote monitoring of applications
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network, for accessing one among a plurality of replicated servers
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • G06F2209/502 Indexing scheme relating to G06F9/50; proximity


Abstract

A high-precision map data crowdsourcing method and device based on space-time similarity are disclosed. The method comprises the following steps: collecting video data through crowdsourcing and extracting video metadata; calculating the temporal similarity between the metadata; calculating the spatial similarity between the metadata; calculating the space-time similarity between the video metadata from the temporal and spatial similarities; and uploading the required videos according to the space-time similarity, the bandwidth environment and the coverage required at each location. The method and device reduce data redundancy in the high-precision map, thereby reducing transmission delay and speeding up map generation.

Description

High-precision map data crowdsourcing method and device based on time-space similarity
Technical Field
The invention relates to the technical field of data processing, in particular to a high-precision map data crowdsourcing method and device based on space-time similarity.
Background
With the growth of population and the advance of technology in recent years, and with the continuous development of urban traffic, ordinary cars can no longer satisfy present needs, so attention has turned toward unmanned driving. Compared with traditional driving, unmanned vehicles can bring large profits to car manufacturers, significantly reduce labor costs, and greatly improve road safety.
Compared with a traditional navigation map, a high-precision map serving automatic driving has higher requirements in every respect; combined with sensors and algorithms, it can support the decision layer and thus better meet the various requirements of unmanned driving. A high-precision map is organized as a 2D grid whose main data is point-cloud data generated by lidar; the point cloud is derived by conversion from data such as video images collected by data-collection vehicles. The high-precision map is one of the key capabilities for realizing automatic driving: it effectively supplements the existing sensors of an automatic-driving vehicle and provides more reliable sensing capability. The map service is critical for unmanned driving because its consumer is a machine: the map must reach a certain accuracy to guarantee the safety of vehicles and people under unsupervised driving, and to ensure that the vehicle can avoid obstacles and select the most suitable road according to the actual situation. This is impossible with a traditional electronic map; only the large amount of driving-assistance information contained in a high-precision map can enable unmanned driving.
Although high-precision maps are a popular research topic and various automobile manufacturers have achieved notable results, many unsolved problems still restrict progress. Owing to the nature of the high-precision map, the overall map data volume is huge, and a single one-shot transmission under existing network conditions cannot meet the map's real-time requirement. For unmanned driving, the information provided by the map must be accurate and timely; if the source of pictures or video is single, transmission may be delayed by bandwidth limits, and accuracy may fall short of the requirements of unmanned driving because of the limited shooting angle and location. The high-precision map is therefore updated in a crowdsourced manner in this work, guaranteeing both the accuracy and the timeliness of the map.
Disclosure of Invention
The application provides a high-precision map data crowdsourcing method and device based on space-time similarity, which reduce data redundancy in a high-precision map, thereby reducing transmission delay and speeding up map generation.
According to a first aspect, an embodiment provides a high-precision map data crowdsourcing method based on spatio-temporal similarity, which includes:
collecting video data through crowdsourcing and extracting video metadata;
calculating the temporal similarity between the metadata;
calculating the spatial similarity between the metadata;
calculating the space-time similarity between the video metadata from the temporal and spatial similarities;
and uploading the required videos according to the space-time similarity, the bandwidth environment and the coverage required at each location.
In some embodiments, the step of calculating spatial similarity between metadata comprises:
simplifying the representation of two different video frame metadata;
decomposing the rigid motion of the two simplified video metadata into translation and rotation and respectively calculating the similarity of the two motions;
calculating a rotated coverage intersection and a translated coverage intersection;
and obtaining the similarity between the two metadata according to the rotated coverage intersection and the translated coverage intersection.
In some embodiments, computing the coverage intersection of the translations amounts to computing the translation similarity, including: calculating in the vertical direction and the horizontal direction to obtain the vertical-direction similarity and the horizontal-direction similarity respectively; and weighting the two to obtain the translation similarity.
In some embodiments, two different video frame metadata f1 and f2 are simplified to f1 = (l1, θ1) and f2 = (l2, θ2),
where θ1 and θ2 denote the orientations of the two video frame metadata and l1 and l2 their shooting locations;
the rotated coverage intersection SsemR is expressed by a formula (given in the original as an image) in which φ denotes the field of view of the camera lens and δφ the view difference of the two video frame metadata;
the coverage intersection of the translations SsemT is expressed through the vertical-direction similarity and the horizontal-direction similarity (both formulas given in the original as images), where r denotes the effective range of the camera, δl the distance between l1 and l2, and δθ = min(|θ2 − θ1|, 2π − |θ2 − θ1|) the difference in orientation of the two video frame metadata;
the spatial similarity between the two video frame metadata f1 and f2 is then expressed as:
Ssem(f1, f2) = SsemR × SsemT
In some embodiments, calculating the temporal similarity between the metadata comprises: calculating the correlation degree of the time metric relation, and then calculating the temporal similarity Tsem (both formulas are given in the original as images).
the method of claim 1, wherein the step of calculating spatiotemporal similarity between video metadata comprises: calculating the measurement relation of the space-time similarity and calculating the topological relation of the space-time similarity,
the metric relation for calculating the space-time similarity is to calculate the integral of the area S along the lj change curve:
Figure BDA0002764135230000044
the topological relation for calculating the space-time similarity is to obtain the space similarity Ssem between two different video frame metadata by decomposing the operation of the topological relation and the orientation relation on each frame(f1,f2)And then, associating with the time similarity to obtain a topological relation of the space-time similarity:
StsemTA(f1,f2)=Tsem×Ssem(f1,f2)
the spatio-temporal similarity is expressed as:
Figure BDA0002764135230000045
in the formula: stsem is the total spatio-temporal similarity; wts,WttaAre respectively space and time associated weight values and satisfy Wts+WttaThe size of the similarity of two video metadata ranges between 0 and 1, 1.
According to a second aspect, an embodiment provides a high-precision map data crowdsourcing device based on space-time similarity, including:
an acquisition module, configured to collect video data through crowdsourcing and extract video metadata;
a time module, configured to calculate the temporal similarity between metadata;
a space module, configured to calculate the spatial similarity between metadata;
a calculation module, configured to calculate the space-time similarity between the video metadata from the temporal and spatial similarities;
and an uploading module, configured to upload the required videos according to the space-time similarity, the bandwidth environment and the coverage required at each location.
According to a third aspect, an embodiment provides a high-precision map data crowdsourcing device based on space-time similarity, including:
a memory for storing a program;
a processor for implementing the method as described in the first aspect by executing the program stored by the memory.
According to a fourth aspect, an embodiment provides a computer readable storage medium comprising a program executable by a processor to implement the method according to the first aspect.
According to the above embodiments, the high-precision map data crowdsourcing method and device based on space-time similarity combine crowdsourced data acquisition with edge computing, while significantly reducing the operating pressure on the cloud system.
Drawings
FIG. 1 is a flow chart of a high-precision map data crowdsourcing method based on spatiotemporal similarity provided by the application;
FIG. 2 is a diagram of a system operation scenario according to an embodiment;
FIG. 3 is a flow chart of a spatiotemporal similarity calculation method according to an embodiment.
Detailed Description
The present invention will be described in further detail below with reference to the detailed description and the accompanying drawings, wherein like elements in different embodiments bear like reference numerals. In the following description, numerous details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials or methods, in different instances. In some instances, certain operations related to the present application have not been shown or described in detail in order to avoid obscuring the core of the present application with excessive description; a detailed description of such operations is unnecessary, as those skilled in the art can fully understand them from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the various steps or actions in the method descriptions may be swapped or reordered in ways apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and are not intended to imply a required order unless otherwise indicated where such an order must be followed.
The application provides a high-precision map data crowdsourcing method based on space-time similarity, which comprises the following steps:
S100: the vehicle collects video data and extracts information to form a six-element tuple for each video (the tuple is given in the original as an image).
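For concreteness, the sketch below gives one plausible shape for such a metadata record. The six fields are an assumption inferred from the symbols used later in the description (location l, orientation θ, effective range r, field of view φ, and start/end times); the patent's actual tuple is given in the original only as an image.

```python
from dataclasses import dataclass

@dataclass
class VideoMetadata:
    """Hypothetical six-element video metadata tuple -- the field names
    and meanings are assumptions, not the patent's actual definition."""
    l: tuple      # shooting location (x, y) -- assumed
    theta: float  # camera orientation in radians -- assumed
    r: float      # effective camera range -- assumed
    phi: float    # lens field of view in radians -- assumed
    ts: float     # segment start time -- assumed
    te: float     # segment end time -- assumed
```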
S200: calculating the spatial similarity Ssem and the temporal similarity Tsem, and executing the steps S310-S380 when the spatial similarity Ssem is calculated; when the time similarity Tsem is calculated, steps S410-S420 are performed.
The spatial similarity Ssem is calculated by the following steps:
S310: the metadata f1 and f2 of two different video frames are simplified to f1 = (l1, θ1) and f2 = (l2, θ2); the effective range of the camera and the field of view of the camera lens are fixed as r and φ; the orientations are θ1 and θ2; δl denotes the distance between l1 and l2; and the difference in orientation is δθ = min(|θ2 − θ1|, 2π − |θ2 − θ1|).
S320: the rigid motion between the two video metadata is decomposed into a translation and a rotation, and the similarity of the two motions is calculated separately.
S330: the rotated coverage intersection is computed, defining the coverage intersection as the similarity between the two metadata (the formula is given in the original as an image).
S340: the translational coverage intersection is computed, calculating from the vertical direction and the horizontal direction.
S350: the similarity in the vertical direction is obtained by a formula given in the original as an image.
S360: the similarity in the horizontal direction is calculated by an equation likewise given as an image.
S370: the translational similarity SsemT is obtained by a weighting equation over the vertical and horizontal similarities (given in the original as an image).
S380: the similarity between f1 and f2 is obtained by the equation Ssem(f1, f2) = SsemR × SsemT.
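As a minimal sketch of steps S310-S380: the rotation and translation similarities below use assumed functional forms (sector overlap for SsemR and linear distance decay for the directional terms), since the patent's formulas are given only as images; only the decomposition and the final product Ssem = SsemR × SsemT follow the text directly. The VideoMetadata record from the earlier sketch is reused.

```python
import math

def rotation_similarity(theta1: float, theta2: float, phi: float) -> float:
    # delta_theta = min(|theta2 - theta1|, 2*pi - |theta2 - theta1|), per the text.
    delta_theta = min(abs(theta2 - theta1), 2 * math.pi - abs(theta2 - theta1))
    # Assumed form: overlap of two angular sectors of width phi.
    return max(0.0, (phi - delta_theta) / phi)

def translation_similarity(l1, l2, r: float, w_v: float = 0.5, w_h: float = 0.5) -> float:
    # Assumed form: linear decay with horizontal/vertical offset, weighted
    # as in S370; the patent's actual weighting equation is an image.
    dx, dy = abs(l1[0] - l2[0]), abs(l1[1] - l2[1])
    s_h = max(0.0, 1.0 - dx / (2 * r))  # horizontal-direction similarity
    s_v = max(0.0, 1.0 - dy / (2 * r))  # vertical-direction similarity
    return w_v * s_v + w_h * s_h

def spatial_similarity(f1, f2) -> float:
    # S380 as stated in the text: Ssem(f1, f2) = SsemR x SsemT.
    # r and phi are fixed and shared by all cameras, per S310.
    return (rotation_similarity(f1.theta, f2.theta, f1.phi)
            * translation_similarity(f1.l, f2.l, f1.r))
```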
The temporal similarity Tsem is calculated by the following steps:
S410: the correlation degree of the time metric relation is calculated (the formulas are given in the original as images).
S420: the temporal similarity is calculated from a functional expression (given in the original as an image).
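A sketch of S410-S420 under an assumed form: since the patent's temporal formulas are given only as images, interval overlap divided by interval span (a Jaccard-style measure) is used here as a stand-in for the time-metric correlation.

```python
def temporal_similarity(f1, f2) -> float:
    # Assumed form: shared recording time over total spanned time.
    overlap = max(0.0, min(f1.te, f2.te) - max(f1.ts, f2.ts))
    span = max(f1.te, f2.te) - min(f1.ts, f2.ts)
    return overlap / span if span > 0 else 0.0
```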
The space-time similarity is calculated from the temporal similarity Tsem and the spatial similarity Ssem by the following steps:
S510: the metric relation of the space-time similarity is calculated, i.e., the integral of the area S along the lj change curve (given in the original as an image).
S520: the topological relation of the space-time similarity is calculated by decomposing it into operations of the topological and orientation relations on each frame to obtain Ssem(f1, f2), which is then associated with the temporal similarity: StsemTA(f1, f2) = Tsem × Ssem(f1, f2).
S530: the total space-time similarity is calculated (the formula is given in the original as an image), where Stsem is the total space-time similarity, and Wts and Wtta are the spatial and temporal association weights, satisfying Wts + Wtta = 1; the similarity of two video metadata ranges between 0 and 1.
The method uploads the required videos according to the space-time similarity, the bandwidth environment and the coverage required at each location. Owing to limits on local storage capacity, bandwidth and the like, the video clips collected by a data-collection vehicle have a fixed duration limit, and this application sets the longest available video length Tmd as the acquisition period length. Each time the cloud server issues a task, all video metadata within the period participate in the computation for upload selection. Because the cloud server also places a coverage requirement on each location, a single collection may not satisfy the published requirement for some places; if the coverage selected in the current period is insufficient, its influence on the selection in the next period must be considered.
Maximum diversity is treated as minimum redundancy: videos are selected on the basis of space-time similarity so that the diversity of the finally uploaded videos is maximized. If one round of uploading cannot satisfy the requirement, multiple rounds of acquisition are needed, and the coverage requirement is finally met through repeated selection.
The diversity-maximization problem based on spatio-temporal similarity is described as follows:
Suppose cars numbered d1, d2, ..., di participate in collection, the bandwidth allocated by the cloud is B, and the coverage requirement is Scov. A group of video segments is collected in a cycle (the set notation is given in the original as an image). Each segment Vi has a start time ts(Vi), an end time te(Vi) and a size s(Vi). Given an upload time limit t, each device can upload its videos as a subset V within the time limit, such that Σ s(Vi) ≤ B and the union ∪ Vi has minimum space-time similarity, i.e., maximum coverage diversity.
The diversity-maximization problem based on spatio-temporal similarity is NP-hard.
We reduce the diversity-maximization problem based on spatio-temporal similarity to a known NP-hard problem, the 0-1 knapsack problem. In the 0-1 knapsack problem there is a set of items, each with a value and a weight; given a weight-constrained knapsack, the problem is to find a subset whose total weight does not exceed the weight constraint while optimizing the total value. For any instance of the 0-1 knapsack problem we can construct an instance as follows. We randomly select a mobile device di, set the network bandwidth limit B as the weight limit of the knapsack, and set the number of videos ni uploaded by the mobile device as the number of items. We set the initial spatio-temporal similarity Stsem to 0, and for the j-th segment Vj we set its similarity sum with the selected video metadata, Σ Stsem(Vj), as the value of the j-th item; V is then the subset of items that satisfies the weight constraint and minimizes the total value of the items. Therefore, V is also a solution to the 0-1 knapsack problem.
The approximate knapsack problem can be solved with an algorithm based on dynamic programming. Specifically, the algorithm first randomly selects the video metadata of one vehicle as the object of the first selection queue. Once video metadata is selected, it is neither deleted nor selected again later. Videos corresponding to the metadata are then selected one by one according to the similarity of each new video metadata to the already selected metadata set. If no more videos can be selected (i.e., either all videos have been selected or the bandwidth cap has been reached), the process stops.
Dynamic programming is defined as follows: at each decision step, all possible partial solutions are listed; according to a judgment condition, partial solutions that certainly cannot lead to the optimal solution are discarded, and the best candidates are screened at every step so that the global optimal solution is obtained.
The 0-1 knapsack problem here can be viewed as a special integer programming problem (the objective function and constraints are given in the original as images).
Let m(Vi, B) denote the optimal value of the 0-1 knapsack subproblem over the first i videos with capacity B (the formal subproblem definitions are given in the original as images). Unrolling the recursion from the decision on the i-th video: if the video is not put in, the value of the knapsack is unchanged, i.e., m(Vi−1, B); if it is put in, the capacity is reduced to B − wi·s(Vi) and the value increases to m(Vi−1, B − wi·s(Vi)) + Stsem(Vi):
m(Vi, B) = max{ m(Vi−1, B), m(Vi−1, B − wi·s(Vi)) + Stsem(Vi) }
If the most primitive recursion is used, the time complexity of the algorithm is exponential in ni (the expression is given in the original as an image), where ni is the total number of videos. When data is acquired by crowdsourcing, the number of videos is very large, so the time complexity is correspondingly very large, and many sub-problems are computed repeatedly, causing considerable redundancy. The algorithm here simplifies the recursion: it adopts a bottom-up solution that compares the values obtained by selecting and not selecting each video, comparing each result only with the previous one, which reduces the time complexity to O(ni·B), where ni is the total number of videos and B is the allocated bandwidth.
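A minimal bottom-up sketch of this O(ni·B) selection, with values[j] taken as the diversity gain of segment j (e.g., its negated similarity sum to the already selected set) and sizes[j] its upload size; the interface is an illustration, not the patent's exact algorithm.

```python
def select_videos(values: list, sizes: list, bandwidth: int) -> list:
    """0-1 knapsack by bottom-up dynamic programming: maximize total
    diversity gain subject to the bandwidth budget B."""
    n = len(values)
    # dp[j][b]: best total gain using the first j segments within budget b.
    dp = [[0.0] * (bandwidth + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        v, s = values[j - 1], sizes[j - 1]
        for b in range(bandwidth + 1):
            dp[j][b] = dp[j - 1][b]                      # skip segment j
            if s <= b and dp[j - 1][b - s] + v > dp[j][b]:
                dp[j][b] = dp[j - 1][b - s] + v          # upload segment j
    # Trace back which segments were chosen.
    selected, b = [], bandwidth
    for j in range(n, 0, -1):
        if dp[j][b] != dp[j - 1][b]:
            selected.append(j - 1)
            b -= sizes[j - 1]
    return selected[::-1]

# Example: three segments with gains 0.9/0.6/0.5 and sizes 6/5/4, budget 10.
print(select_videos([0.9, 0.6, 0.5], [6, 5, 4], 10))  # -> [0, 2]
```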
For the convenience of subsequent operations, the space-time similarity of different metadata is converted into the coverage Ucov of the metadata, calculated as follows.
By the definition of the video metadata, the area covered by one frame of video is determined by the camera's effective range rj and the lens field of view φj recorded in the metadata (the formula is given in the original as an image).
The area covered by the complete video is the frame coverage accumulated along l, the trajectory travelled by camera lj on the collecting vehicle (the formula is given in the original as an image).
The effective coverage area of two different metadata is calculated by a formula likewise given in the original as an image.
If the coverage area of the selected videos is smaller than Scov, the data is temporarily kept in the edge server; after the next acquisition the video coverage is recalculated, and this process continues until the coverage requirement is met, at which point the videos are uploaded to the cloud server.
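The multi-round control flow sketched below ties the pieces together: select under the bandwidth budget, check coverage against Scov, and either upload or hold at the edge. Here coverage_of is a hypothetical stand-in for the patent's Ucov formulas (given only as images), and select_videos is the knapsack sketch above.

```python
def crowdsource_round(segments, gains, sizes, bandwidth, s_cov, coverage_of):
    """One acquisition period: returns the segments to upload to the cloud,
    or None to keep everything at the edge server until the next round."""
    chosen = [segments[j] for j in select_videos(gains, sizes, bandwidth)]
    if coverage_of(chosen) >= s_cov:
        return chosen  # coverage requirement met: upload to the cloud server
    return None        # coverage short: recalculate after the next acquisition
```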
The above steps and flows are only intended to illustrate the technical idea of the present invention and not to limit it; any modification or improvement made to the technical scheme, technical idea or method proposed by the present invention falls within the protection scope of the present invention.
In summary, the present invention defines temporal and spatial similarity in a new way, and defines the spatio-temporal similarity between different video metadata on the basis of these two definitions. The spatio-temporal similarity is calculated using only the metadata, without using video content features.
Correspondingly, the application also provides a high-precision map data crowdsourcing device based on space-time similarity, comprising:
an acquisition module, configured to collect video data through crowdsourcing and extract video metadata;
a time module, configured to calculate the temporal similarity between metadata;
a space module, configured to calculate the spatial similarity between metadata;
a calculation module, configured to calculate the space-time similarity between the video metadata from the temporal and spatial similarities;
and an uploading module, configured to upload the required videos according to the space-time similarity, the bandwidth environment and the coverage required at each location.
The application also provides a high-precision map data crowdsourcing device based on space-time similarity, which comprises:
a memory for storing a program;
a processor for implementing the above method by executing the program stored in the memory.
Accordingly, the present application provides a computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method as described above.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (9)

1. A high-precision map data crowdsourcing method based on space-time similarity is characterized by comprising the following steps:
collecting video data through crowdsourcing and extracting video metadata;
calculating the time similarity among the metadata;
calculating the spatial similarity among the metadata;
calculating the space-time similarity between the video metadata according to the time and space similarity;
and uploading the required video according to the space-time similarity, the bandwidth environment and the coverage required at each location.
2. The method of claim 1, wherein the step of calculating the spatial similarity between metadata comprises:
simplifying the representation of two different video frame metadata;
decomposing the rigid motion of the two simplified video metadata into translation and rotation and respectively calculating the similarity of the two motions;
calculating a rotated coverage intersection and a translated coverage intersection;
and obtaining the similarity between the two metadata according to the rotated coverage intersection and the translated coverage intersection.
3. The method of claim 2, wherein computing the coverage intersection of the translations amounts to computing the translation similarity, including: calculating in the vertical direction and the horizontal direction to obtain the vertical-direction similarity and the horizontal-direction similarity respectively; and weighting the two to obtain the translation similarity.
4. The method of claim 3, wherein
two different video frame metadata f1 and f2 are simplified to f1 = (l1, θ1) and f2 = (l2, θ2),
where θ1 and θ2 denote the orientations of the two video frame metadata and l1 and l2 their shooting locations;
the rotated coverage intersection SsemR is expressed by a formula (given in the original as an image) in which φ denotes the field of view of the camera lens and δφ the view difference of the two video frame metadata;
the coverage intersection of the translations SsemT is expressed through the vertical-direction similarity and the horizontal-direction similarity (both formulas given in the original as images), where r denotes the effective range of the camera, δl the distance between l1 and l2, and δθ = min(|θ2 − θ1|, 2π − |θ2 − θ1|) the difference in orientation of the two video frame metadata;
the spatial similarity between the two video frame metadata f1 and f2 is expressed as:
Ssem(f1, f2) = SsemR × SsemT
5. The method of claim 1, wherein the step of calculating the temporal similarity between the metadata comprises: calculating the correlation degree of the time metric relation, and then calculating the temporal similarity (both formulas are given in the original as images).
6. The method of claim 1, wherein the step of calculating the space-time similarity between video metadata comprises: calculating the metric relation of the space-time similarity and calculating the topological relation of the space-time similarity,
the metric relation of the space-time similarity being the integral of the area S along the lj change curve (given in the original as an image),
the topological relation of the space-time similarity being obtained by decomposing the operation of the topological and orientation relations on each frame to get the spatial similarity Ssem(f1, f2), which is then associated with the temporal similarity:
StsemTA(f1, f2) = Tsem × Ssem(f1, f2);
the total space-time similarity Stsem combining the two terms (the formula is given in the original as an image), where Wts and Wtta are the spatial and temporal association weights, satisfying Wts + Wtta = 1; the similarity of two video metadata ranges between 0 and 1.
7. A high-precision map data crowdsourcing device based on space-time similarity, characterized by comprising:
an acquisition module, configured to collect video data through crowdsourcing and extract video metadata;
a time module, configured to calculate the temporal similarity between metadata;
a space module, configured to calculate the spatial similarity between metadata;
a calculation module, configured to calculate the space-time similarity between the video metadata from the temporal and spatial similarities;
and an uploading module, configured to upload the required videos according to the space-time similarity, the bandwidth environment and the coverage required at each location.
8. A high-precision map data crowdsourcing device based on space-time similarity is characterized by comprising the following components:
a memory for storing a program;
a processor for implementing the method of any one of claims 1-6 by executing a program stored by the memory.
9. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method of any one of claims 1-6.
CN202011227747.2A 2020-11-06 2020-11-06 High-precision map data crowdsourcing method and device based on space-time similarity Active CN112380305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011227747.2A CN112380305B (en) 2020-11-06 2020-11-06 High-precision map data crowdsourcing method and device based on space-time similarity


Publications (2)

Publication Number Publication Date
CN112380305A true CN112380305A (en) 2021-02-19
CN112380305B CN112380305B (en) 2023-01-17

Family

ID=74579502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011227747.2A Active CN112380305B (en) 2020-11-06 2020-11-06 High-precision map data crowdsourcing method and device based on space-time similarity

Country Status (1)

Country Link
CN (1) CN112380305B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108847121A (en) * 2018-07-04 2018-11-20 深圳地平线机器人科技有限公司 The method and apparatus for constructing high-precision map
CN111488421A (en) * 2020-04-27 2020-08-04 立得空间信息技术股份有限公司 Data fusion method of traditional map and high-precision map

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
HANG ZHAO et al.: "An edge streaming data processing framework for autonomous driving", http://doi.org/10.1080/09540091.2020.1782840 *
JINLIANG XIE et al.: "An energy efficient high definition map data distribution mechanism for autonomous driving", http://arxiv.org/abs/2010.05233v1 *
TANG Jie (唐洁) et al.: "Edge high-precision map services for unmanned driving" (面向无人驾驶的边缘高精地图服务), ZTE Technology Journal (《中兴通讯技术》) *
CHENG Leifeng (成磊峰) et al.: "Cross-domain target association analysis method based on spatio-temporal similarity" (基于时空相似性的跨域目标关联分析方法), Telecommunication Engineering (《电讯技术》) *
ZHA Yufei (查宇飞) et al.: "Video Target Tracking Methods" (《视频目标跟踪方法》), National Defense Industry Press, 31 July 2015 *
ZHAI Shuying (翟书颖) et al.: "A survey of visual crowdsensing applications" (视觉群智感知应用综述), Computer Science (《计算机科学》) *
ZHAO Hongbin (赵洪斌) et al.: "Spatio-temporal similarity measurement for moving object trajectories" (移动对象轨迹时空相似性度量方法), Computer Engineering and Applications (《计算机工程与应用》) *
ZHAO Hongwei (赵红伟) et al.: "Semantic relevance computation model for the essential characteristics of geospatial data" (地理空间数据本质特征语义相关度计算模型), Geographical Research (《地理研究》) *

Also Published As

Publication number Publication date
CN112380305B (en) 2023-01-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant