CN110598597A - Multi-scene intersection information classification and extraction method and equipment - Google Patents

Multi-scene intersection information classification and extraction method and equipment Download PDF

Info

Publication number
CN110598597A
Authority
CN
China
Prior art keywords
image
scene
frame
feature map
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910810713.7A
Other languages
Chinese (zh)
Inventor
周康明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN201910810713.7A priority Critical patent/CN110598597A/en
Publication of CN110598597A publication Critical patent/CN110598597A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/231Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle

Abstract

The method comprises: acquiring at least two scene images of a target intersection, wherein each scene image comprises at least two frames of images; determining a two-dimensional segmentation feature map of each frame of image, and extracting scene features of each frame of image from the two-dimensional segmentation feature map and the frame itself; clustering and dividing the scenes in each frame of image according to the scene features of each frame of image to obtain a segmented scene image corresponding to each frame of image; fusing the segmented scene images corresponding to all frame images under a scene image to obtain a fusion result, and storing the fusion result in structured form to generate a structured file; and determining an evaluation value corresponding to the fusion result, and processing the fusion result and the structured file accordingly according to the evaluation value. Automatic scene division and the extraction of more complete and accurate scene information are realized, and accuracy is effectively improved.

Description

Multi-scene intersection information classification and extraction method and equipment
Technical Field
The application relates to the field of computers, in particular to a multi-scene intersection information classification and extraction method and equipment.
Background
An intelligent traffic violation auditing system cannot do without the extraction of structured scene information, which covers elements such as lane lines, stop lines, zebra crossings, guide lines and traffic lights. Structured scene information is mainly extracted in two ways: intelligent scene information recognition and manual label configuration. The former uses traditional methods or deep learning methods to intelligently recognize and locate scene elements, and extracts scene information from a single-frame image; it is fast and responds dynamically, but is heavily limited by factors such as occlusion, weather and environment, so the scene information extracted from a single frame is incomplete and inaccurate. The latter manual mode has high precision and occluded information can be completed by hand, but it is time-consuming, slow to respond and costly, and can hardly keep pace with scene changes or meet the demand for extracting scene information in large batches.
Disclosure of Invention
An object of the present application is to provide a method and an apparatus for classifying and extracting multi-scene intersection information, which solve the prior-art problem that scene information extraction is incomplete and inaccurate due to environment, weather, occlusion and other factors.
According to one aspect of the application, a multi-scene intersection information classification and extraction method is provided, and the method comprises the following steps:
acquiring at least two scene images of a target intersection, wherein each scene image comprises at least two frames of images;
determining a two-dimensional segmentation feature map of each frame of image, and extracting scene features of each frame of image according to the two-dimensional segmentation feature map of each frame of image and each frame of image;
clustering and dividing the scenes in each frame of image according to the scene features of each frame of image to obtain a segmented scene image corresponding to each frame of image;
fusing the segmentation scene images corresponding to all the frame images under the scene image to obtain a fusion result, and performing structured storage on the fusion result to generate a structured file;
and determining an evaluation value corresponding to the fusion result, and performing corresponding processing on the fusion result and the structured file according to the evaluation value.
Further, determining the two-dimensional segmentation feature map of each frame of image comprises:
and extracting scene information from each frame of image, and segmenting the scene information according to a preset image segmentation algorithm to obtain a two-dimensional segmentation feature map of each frame of image.
Further, fusing the segmented scene images corresponding to all the frame images in the scene image, including:
determining sub-segmentation images participating in fusion according to all segmentation scene images corresponding to all frame images under the scene image;
and voting according to the value of the corresponding position in the sub-segmentation image participating in the fusion to determine the value of the corresponding position in the fused two-dimensional segmentation feature map under the scene image.
Further, voting according to the value of the corresponding position in the sub-segmented image participating in the fusion to determine the value of the corresponding position in the fused two-dimensional segmented feature map under the scene image, including:
selecting the value with the maximum vote number of the value of the corresponding position in the sub-segmentation image participating in the fusion as the value of the corresponding position in the fused two-dimensional segmentation characteristic image under the scene image;
and when the vote numbers are equal, selecting a target value from the equal vote number values as the value of the corresponding position in the fused two-dimensional segmentation feature map under the scene image.
Further, determining an evaluation value corresponding to the fusion result includes:
and determining an evaluation value corresponding to the fusion result according to the number of the sub-divided images participating in the fusion, the width and the height of the fused two-dimensional divided feature map in the scene image and a corresponding voting matrix, wherein the corresponding voting matrix is determined by the voting number of values of corresponding positions in the sub-divided images participating in the fusion.
Further, the evaluation value corresponding to the fusion result is determined so as to satisfy the following condition:
Sc = (Σ Sij) / (w × h × n), where the sum runs over all positions (i, j) of the voting matrix,
wherein Sc represents the evaluation value, w is the width of the fused two-dimensional segmentation feature map under the scene image, h is the height of the fused two-dimensional segmentation feature map under the scene image, n is the number of sub-segmented images participating in the fusion, Sij is an element of the corresponding voting matrix, and the value of Sij is the number of votes at the corresponding position in the fused two-dimensional segmentation feature map under the scene image.
Further, the corresponding processing of the fusion result and the structured file according to the evaluation value includes:
when the evaluation value is larger than or equal to zero and smaller than a first preset threshold value, regenerating the fused two-dimensional segmentation feature map and the structured file under the scene image;
when the evaluation value is greater than or equal to the first preset threshold and smaller than a second preset threshold, correcting the fused two-dimensional segmentation feature map and the structured file under the scene image;
and when the evaluation value is greater than or equal to the second preset threshold and less than or equal to 1, outputting the fused two-dimensional segmentation feature map and the structured file in the scene image.
According to another aspect of the present application, there is also provided a device for classifying and extracting multi-scene intersection information, the device including:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method as previously described.
According to yet another aspect of the present application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the method as described above.
Compared with the prior art, the present application acquires at least two scene images of a target intersection, wherein each scene image comprises at least two frames of images; determines a two-dimensional segmentation feature map of each frame of image and extracts scene features of each frame of image from the two-dimensional segmentation feature map and the frame itself; clusters and divides the scenes in each frame of image according to the scene features to obtain a segmented scene image corresponding to each frame of image; fuses the segmented scene images corresponding to all frame images under a scene image to obtain a fusion result, and stores the fusion result in structured form to generate a structured file; and determines an evaluation value corresponding to the fusion result and processes the fusion result and the structured file accordingly. Automatic scene division and automatic extraction of scene information that is as complete and accurate as possible can thus be realized. A score-based triage strategy applied to the final fusion result ensures the reliability of the output structured file; accuracy is effectively improved, and universality and stability are enhanced.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a schematic flow chart illustrating a multi-scene intersection information classification and extraction method according to an aspect of the present application;
fig. 2 shows a schematic diagram of multiple scenes acquired in an embodiment of the present application;
FIG. 3 is a schematic diagram of an image obtained by segmenting the scene in FIG. 2 according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a segmentation and fusion map obtained by fusing a plurality of segmentation scene images of each scene in an embodiment of the present application;
fig. 5 is a flowchart illustrating a method for adaptive classification of multi-scene intersection information and extraction of traffic signs in an embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change RAM (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Fig. 1 shows a flow chart of a multi-scene intersection information classification and extraction method provided according to an aspect of the present application; the method includes steps S11 to S15.
in step S11, at least two scene images of the target intersection are acquired, wherein each scene image includes at least two frames of images; here, a scene image of the target intersection is acquired through the front-end device, the acquired scene image includes a plurality of different scenes, each scene corresponds to the multi-frame image, the plurality of different scenes are scene images of the intersection in a plurality of time periods, and as shown in fig. 2, the acquired scene images are schematic diagrams of the plurality of scenes; therefore, multi-scene multi-frame images are fused subsequently, and the problem of inaccurate surfaces caused by single-frame images is avoided.
In step S12, a two-dimensional segmentation feature map of each frame of image is determined, and scene features of each frame of image are extracted from the two-dimensional segmentation feature map and the frame itself. Specifically, scene information is extracted from each frame of image, and the scene information is segmented according to a preset image segmentation algorithm to obtain the two-dimensional segmentation feature map of each frame of image. Scene information is extracted from each frame of image acquired in step S11, and scene characteristic information such as marking lines and traffic lights is obtained using an image segmentation algorithm, where the marking lines may be lane lines, guide lines, stop lines, zebra crossings, and the like. The image segmentation algorithm includes, but is not limited to, deep learning-based methods and conventional methods, such as the PSPNet and DeepLabV3+ segmentation algorithms; the application scene diagram shown in fig. 3 is an image obtained by segmenting the scene of fig. 2 with PSPNet. Feature extraction is then performed on the obtained two-dimensional segmentation feature map in combination with the corresponding original image to obtain the scene features of the image, providing a data basis for the subsequent clustering division.
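As an illustration of step S12, the following sketch shows how a per-frame segmentation model (e.g., PSPNet or DeepLabV3+) could produce the two-dimensional segmentation feature map and how simple scene features could be extracted by combining it with the original frame. The function names, the feature design and the seg_model callable are assumptions for illustration, not details specified in the application.

```python
import numpy as np

def segment_frame(frame, seg_model):
    """Run a pre-trained segmentation model (e.g. PSPNet or DeepLabV3+) on one
    frame and return a 2-D label map: 0 = background, 1..C-1 = lane line,
    guide line, stop line, zebra crossing, traffic light, and so on."""
    logits = seg_model(frame)                      # (H, W, C) class scores
    return np.argmax(logits, axis=-1).astype(np.uint8)

def extract_scene_features(frame, label_map, num_classes):
    """Combine the original frame with its 2-D segmentation feature map to build
    a per-frame scene feature vector (here: per-class area ratio plus per-class
    mean colour), used later for the clustering division."""
    feats = []
    for c in range(num_classes):
        mask = label_map == c
        ratio = mask.mean()                        # fraction of pixels of class c
        colour = frame[mask].mean(axis=0) if mask.any() else np.zeros(3)
        feats.extend([ratio, *colour])
    return np.asarray(feats, dtype=np.float32)
```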
In step S13, the scenes in each frame of image are clustered and divided according to the scene features of each frame of image to obtain the segmented scene image corresponding to each frame of image. Cluster analysis is performed on the scene features of the images: the scene features are clustered to obtain the clustering features of the scenes, which are then divided, so that automatic scene division is realized and the segmented scene image corresponding to each frame of image is obtained.
In addition, the following method may be used for the cluster division of images: extracting histogram of oriented gradients (HOG) features and adopting a hierarchical clustering method, with a suitable threshold selected to control the degree of clustering.
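A minimal sketch of this HOG-plus-hierarchical-clustering variant, assuming scikit-image and SciPy are available; the resize dimensions, HOG parameters, linkage method and distance threshold are illustrative choices rather than values given in the application.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_frames_by_scene(label_maps, distance_threshold=0.5):
    """Cluster frames into scenes: compute HOG features of each 2-D segmentation
    feature map, then apply hierarchical (agglomerative) clustering with a
    distance threshold controlling how fine the scene division is."""
    feats = []
    for m in label_maps:
        m_small = resize(m.astype(float), (128, 128), anti_aliasing=True)
        feats.append(hog(m_small, orientations=9, pixels_per_cell=(16, 16),
                         cells_per_block=(2, 2)))
    feats = np.vstack(feats)
    z = linkage(feats, method="average", metric="euclidean")
    # fcluster assigns each frame a scene id; frames sharing an id form one scene
    return fcluster(z, t=distance_threshold, criterion="distance")
```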
In step S14, the segmented scene images corresponding to all frame images under a scene image are fused to obtain a fusion result, and the fusion result is stored in structured form to generate a structured file. The segmentation results of the multiple frames of images in each scene are fused, that is, the corresponding segmented scene images are fused to obtain a fusion result; the fusion result is displayed as a picture and stored in a structured file, for example a JSON file, where the stored content may be the pixel value corresponding to each pixel in the fused image together with the scene information.
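One possible way to store the fusion result as a structured JSON file as described for step S14; the field names and layout are assumptions for illustration, since the application does not fix a concrete schema.

```python
import json

def save_structured_file(fused_map, class_names, path):
    """Store the fused segmentation result as a structured JSON file: the
    per-pixel class ids plus the scene element each class id stands for."""
    payload = {
        "width": int(fused_map.shape[1]),
        "height": int(fused_map.shape[0]),
        "classes": {str(i): name for i, name in enumerate(class_names)},
        "pixels": fused_map.astype(int).tolist(),   # row-major class id per pixel
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)
```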
In step S15, an evaluation value corresponding to the fusion result is determined, and the fusion result and the structured file are processed accordingly according to the evaluation value. Here, the confidence of the fusion result is scored to obtain a corresponding evaluation value; the higher the evaluation value, the more reliable the fusion result. The fusion result and the structured file are then processed according to the obtained evaluation value, where the processing includes regenerating, correcting, or directly outputting the fusion result and the structured file.
In an embodiment of the present application, in step S14, the sub-segmented images participating in the fusion are determined from the segmented scene images corresponding to all frame images under the scene image, and the value of the corresponding position in the fused two-dimensional segmentation feature map under the scene image is determined by voting on the values of the corresponding position in the sub-segmented images participating in the fusion. When the segmented scene images are fused, the sub-segmented images participating in the fusion are determined; the sub-segmented images are the segmented scene images of the multiple frames corresponding to one scene image, and the fused segmentation feature map of the scene image is determined by a voting fusion method. Specifically, each value in the fused two-dimensional segmentation feature map M is determined by voting on the values at the corresponding position in the sub-segmented images I_j (j = 0, 1, 2, ..., n), and the value at that position in the fused map is determined from the number of votes each value receives at each position.
In connection with the above embodiment, the value of the corresponding position in the two-dimensional segmentation feature map after fusion is determined in a voting fusion manner, and the specific process is as follows:
The value with the most votes among the values of the corresponding position in the sub-segmented images participating in the fusion is selected as the value of the corresponding position in the fused two-dimensional segmentation feature map under the scene image; when the vote counts are equal, a target value is selected from the tied values as the value of the corresponding position in the fused two-dimensional segmentation feature map under the scene image. In this case, each value in the fused two-dimensional segmentation feature map M is voted on by the sub-segmented images I_j (j = 0, 1, 2, ..., n) at the corresponding position: the value with the most votes becomes the value of M, and if there is a tie, one of the tied values is randomly selected as the value of M. As shown in fig. 4, the scene segmentation fusion map is obtained by fusing the multiple segmented scene images of each scene. Through multi-frame fusion, the frames complement one another, yielding a relatively complete and accurate segmentation feature map; this solves the problem of classifying scene images from different front-end devices, as well as the problem of scene information fusion errors caused by position changes, scene information changes and the like of the same device.
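A sketch of the voting fusion with random tie-breaking described above, assuming the sub-segmented images are label maps of identical size; the function and variable names are illustrative.

```python
import numpy as np

def fuse_by_voting(sub_maps, num_classes, rng=None):
    """Fuse the segmented scene images I_j (j = 0..n) of one scene by per-pixel
    voting: the class with the most votes wins; ties are broken at random.
    Returns the fused map M and the winning vote-count matrix used for scoring."""
    rng = rng or np.random.default_rng()
    stack = np.stack(sub_maps)                       # (n, h, w)
    n, h, w = stack.shape
    # votes[c, y, x] = number of sub-maps assigning class c to pixel (x, y)
    votes = np.zeros((num_classes, h, w), dtype=np.int32)
    for c in range(num_classes):
        votes[c] = (stack == c).sum(axis=0)
    top = votes.max(axis=0)                          # winning vote count per pixel
    fused = np.empty((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            winners = np.flatnonzero(votes[:, y, x] == top[y, x])
            fused[y, x] = rng.choice(winners)        # random pick on a tie
    return fused, top                                # top serves as the voting matrix S
```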
In one embodiment of the present application, in step S15, the evaluation value corresponding to the fusion result is determined according to the number of sub-segmented images participating in the fusion, the width and height of the fused two-dimensional segmentation feature map under the scene image, and a corresponding voting matrix, where the corresponding voting matrix is determined by the number of votes for the values of the corresponding positions in the sub-segmented images participating in the fusion. The voting matrix corresponding to the fused two-dimensional segmentation feature map M is S, and the value of each position in S represents the number of votes for the corresponding position of M. The evaluation value corresponding to the fusion result is determined from the width and height of M, the voting matrix S and the number of sub-segmented images participating in the fusion when M is obtained, where M contains C classes of segmentation information: 0 represents the background, and 1 to C-1 represent lane lines, guide lines, stop lines, zebra crossings, traffic lights and the like in the scene.
Specifically, the evaluation value corresponding to the fusion result is determined from the width and height of M, the voting matrix S and the number of sub-segmented images participating in the fusion when M is obtained, so as to satisfy the following condition:
Sc = (Σ Sij) / (w × h × n), where the sum runs over all positions (i, j) of the voting matrix,
wherein Sc represents the evaluation value, w is the width of the fused two-dimensional segmentation feature map under the scene image, h is the height of the fused two-dimensional segmentation feature map under the scene image, n is the number of sub-segmented images participating in the fusion, Sij is an element of the corresponding voting matrix, and the value of Sij is the number of votes at the corresponding position in the fused two-dimensional segmentation feature map under the scene image.
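A small helper computing the evaluation value Sc from the voting matrix, the map size and the number of fused sub-segmented images, following the relation Sc = Σ Sij / (w × h × n) given above; the function name is illustrative.

```python
def evaluation_value(vote_matrix, n_sub_maps):
    """Confidence score of the fusion result: Sc = sum(Sij) / (w * h * n).
    Sc equals 1 only when every pixel was voted for unanimously by all n
    sub-segmented images participating in the fusion."""
    h, w = vote_matrix.shape
    return float(vote_matrix.sum()) / (w * h * n_sub_maps)
```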
According to the above embodiment, when the evaluation value is greater than or equal to zero and smaller than a first preset threshold, the fused two-dimensional segmentation feature map and the structured file under the scene image are regenerated; when the evaluation value is greater than or equal to the first preset threshold and smaller than a second preset threshold, the fused two-dimensional segmentation feature map and the structured file under the scene image are corrected; and when the evaluation value is greater than or equal to the second preset threshold and less than or equal to 1, the fused two-dimensional segmentation feature map and the structured file under the scene image are output. Here, thresholds T1 and T2 for the evaluation score are set, and the generated fused two-dimensional segmentation feature map and structured file are handled according to the threshold condition satisfied by the evaluation value Sc. When 0 ≤ Sc < T1, the score is too low, meaning the fusion result is unreliable, so a new two-dimensional segmentation feature map and a new structured file need to be generated manually. When T1 ≤ Sc < T2, the score is low, indicating that the reliability of the fusion result is low, so the original two-dimensional segmentation feature map and structured file need to be corrected manually; for example, if a position region (x, y) in the fusion result is marked as a stop line but is actually a dashed line, manual correction is required. When T2 ≤ Sc ≤ 1, the score is qualified, the fusion result is reliable, no manual intervention is needed, and the fused two-dimensional segmentation feature map and the structured file are output directly.
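The threshold handling of step S15 can be written as a simple dispatch; the string labels returned here are illustrative, the application itself only prescribes the three kinds of handling.

```python
def process_fusion_result(score, t1, t2):
    """Dispatch on the evaluation value Sc with thresholds T1 < T2:
    regenerate, manually correct, or output directly."""
    if 0 <= score < t1:
        return "regenerate"   # fusion unreliable: rebuild feature map and file manually
    if t1 <= score < t2:
        return "correct"      # low reliability: manually correct the existing results
    return "output"           # t2 <= score <= 1: reliable, output without intervention
```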
Fig. 5 is a schematic flow chart of a method for adaptive classification of multi-scene intersection information and extraction of traffic signs in an embodiment of the present application. Multi-scene, multi-frame images are acquired through front-end devices, and a segmentation algorithm is used to obtain the two-dimensional segmentation feature map (lane lines, guide lines, traffic lights and the like) of each frame of image; scene features are extracted by combining the two-dimensional segmentation feature map with the original image; the scene features of the images are clustered to obtain clustering features, and all segmented scene images under each scene obtained by the clustering division are fused to generate a segmentation effect graph of each scene and a structured file of each scene's segmentation information; an evaluation model then produces a fusion score as the evaluation score. If 0 ≤ score < T1, the segmentation effect graph and the structured file are generated manually; if T1 ≤ score < T2, the segmentation effect graph and the structured file are corrected manually; and if T2 ≤ score ≤ 1, the segmentation effect graph and the structured file are output directly. The evaluation model is built from the number of sub-segmented images participating in the fusion, the width and height of the fused two-dimensional segmentation feature map under the scene image and the corresponding voting matrix.
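For orientation, the sketches above could be tied together roughly as follows; the thresholds T1 and T2 and all helper names are assumptions for illustration, not values or interfaces from the application.

```python
def run_pipeline(frames, seg_model, class_names, t1=0.6, t2=0.85):
    """End-to-end sketch combining the illustrative helpers defined above."""
    num_classes = len(class_names)
    label_maps = [segment_frame(f, seg_model) for f in frames]
    scene_ids = cluster_frames_by_scene(label_maps)   # one scene id per frame
    results = {}
    for scene in sorted(set(scene_ids)):
        sub_maps = [m for m, s in zip(label_maps, scene_ids) if s == scene]
        fused, votes = fuse_by_voting(sub_maps, num_classes)
        score = evaluation_value(votes, len(sub_maps))
        action = process_fusion_result(score, t1, t2)
        if action == "output":                        # reliable: write the structured file
            save_structured_file(fused, class_names, f"scene_{scene}.json")
        results[scene] = (score, action)              # other actions need manual work
    return results
```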
With the multi-scene intersection information classification and extraction method, automatic scene division and automatic extraction of scene information that is as complete and accurate as possible can be achieved. The score-based triage strategy applied to the final fusion result ensures the reliability of the output structured file. Compared with the prior art, accuracy is effectively improved and universality and stability are enhanced; the method is also applicable to other fields that involve scene information extraction, including but not limited to violation auditing systems.
In addition, a computer-readable medium is provided, on which computer-readable instructions are stored, where the computer-readable instructions are executable by a processor to implement the foregoing multi-scene intersection information classification and extraction method.
In an embodiment of the present application, a device for classifying and extracting multi-scene intersection information is further provided, where the device includes:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method as previously described.
For example, the computer readable instructions, when executed, cause the one or more processors to:
acquiring at least two scene images of a target intersection, wherein each scene image comprises at least two frames of images;
determining a two-dimensional segmentation feature map of each frame of image, and extracting scene features of each frame of image according to the two-dimensional segmentation feature map of each frame of image and each frame of image;
clustering and dividing the scenes in each frame of image according to the scene features of each frame of image to obtain a segmented scene image corresponding to each frame of image;
fusing the segmentation scene images corresponding to all the frame images under the scene image to obtain a fusion result, and performing structured storage on the fusion result to generate a structured file;
and determining an evaluation value corresponding to the fusion result, and performing corresponding processing on the fusion result and the structured file according to the evaluation value.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (9)

1. A multi-scene intersection information classification and extraction method is characterized by comprising the following steps:
acquiring at least two scene images of a target intersection, wherein each scene image comprises at least two frames of images;
determining a two-dimensional segmentation feature map of each frame of image, and extracting scene features of each frame of image according to the two-dimensional segmentation feature map of each frame of image and each frame of image;
clustering and dividing the scenes in each frame of image according to the scene characteristics of each frame of image to obtain a segmented scene image corresponding to each frame of image;
fusing the segmentation scene images corresponding to all the frame images under the scene image to obtain a fusion result, and performing structured storage on the fusion result to generate a structured file;
and determining an evaluation value corresponding to the fusion result, and performing corresponding processing on the fusion result and the structured file according to the evaluation value.
2. The method of claim 1, wherein determining a two-dimensional segmentation feature map for each frame of image comprises:
and extracting scene information from each frame of image, and segmenting the scene information according to a preset image segmentation algorithm to obtain a two-dimensional segmentation feature map of each frame of image.
3. The method according to claim 1, wherein fusing the segmented scene images corresponding to all the frame images in the scene image comprises:
determining sub-segmentation images participating in fusion according to all segmentation scene images corresponding to all frame images under the scene image;
and voting according to the value of the corresponding position in the sub-segmentation image participating in the fusion to determine the value of the corresponding position in the fused two-dimensional segmentation feature map under the scene image.
4. The method according to claim 3, wherein determining the value of the corresponding position in the fused two-dimensional segmentation feature map under the scene image according to the vote of the value of the corresponding position in the sub-segmentation image participating in the fusion comprises:
selecting the value with the maximum vote number of the value of the corresponding position in the sub-segmentation image participating in the fusion as the value of the corresponding position in the fused two-dimensional segmentation characteristic image under the scene image;
and when the vote numbers are equal, selecting a target value from the equal vote number values as the value of the corresponding position in the fused two-dimensional segmentation feature map under the scene image.
5. The method of claim 3, wherein determining the evaluation value corresponding to the fusion result comprises:
and determining an evaluation value corresponding to the fusion result according to the number of the sub-divided images participating in the fusion, the width and the height of the fused two-dimensional divided feature map in the scene image and a corresponding voting matrix, wherein the corresponding voting matrix is determined by the voting number of values of corresponding positions in the sub-divided images participating in the fusion.
6. The method according to claim 5, wherein the evaluation value corresponding to the fusion result is determined so as to satisfy the following condition:
Sc = (Σ Sij) / (w × h × n), where the sum runs over all positions (i, j) of the voting matrix,
wherein Sc represents the evaluation value, w is the width of the fused two-dimensional segmentation feature map under the scene image, h is the height of the fused two-dimensional segmentation feature map under the scene image, n is the number of sub-segmented images participating in the fusion, Sij is an element of the corresponding voting matrix, and the value of Sij is the number of votes at the corresponding position in the fused two-dimensional segmentation feature map under the scene image.
7. The method according to claim 3, wherein the corresponding processing of the fusion result and the structured file according to the evaluation value comprises:
when the evaluation value is larger than or equal to zero and smaller than a first preset threshold value, regenerating the fused two-dimensional segmentation feature map and the structured file under the scene image;
when the evaluation value is greater than or equal to the first preset threshold and smaller than a second preset threshold, correcting the fused two-dimensional segmentation feature map and the structured file under the scene image;
and when the evaluation value is greater than or equal to the second preset threshold and less than or equal to 1, outputting the fused two-dimensional segmentation feature map and the structured file in the scene image.
8. A device for classifying and extracting multi-scene intersection information is characterized by comprising:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of any of claims 1 to 7.
9. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 7.
CN201910810713.7A 2019-08-29 2019-08-29 Multi-scene intersection information classification and extraction method and equipment Pending CN110598597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910810713.7A CN110598597A (en) 2019-08-29 2019-08-29 Multi-scene intersection information classification and extraction method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910810713.7A CN110598597A (en) 2019-08-29 2019-08-29 Multi-scene intersection information classification and extraction method and equipment

Publications (1)

Publication Number Publication Date
CN110598597A true CN110598597A (en) 2019-12-20

Family

ID=68856424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910810713.7A Pending CN110598597A (en) 2019-08-29 2019-08-29 Multi-scene intersection information classification and extraction method and equipment

Country Status (1)

Country Link
CN (1) CN110598597A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140029831A1 (en) * 2012-07-26 2014-01-30 General Electric Company Systems and methods for performing segmentation and visualization of multvariate medical images
CN109218619A (en) * 2018-10-12 2019-01-15 北京旷视科技有限公司 Image acquiring method, device and system
CN109635701A (en) * 2018-12-05 2019-04-16 宽凳(北京)科技有限公司 Lane attribute acquisition methods, device and computer readable storage medium
CN109784283A (en) * 2019-01-21 2019-05-21 陕西师范大学 Based on the Remote Sensing Target extracting method under scene Recognition task
CN110160502A (en) * 2018-10-12 2019-08-23 腾讯科技(深圳)有限公司 Map elements extracting method, device and server


Similar Documents

Publication Publication Date Title
CN109117848B (en) Text line character recognition method, device, medium and electronic equipment
CN110827247B (en) Label identification method and device
US10803357B2 (en) Computer-readable recording medium, training method, and object detection device
CN112085022B (en) Method, system and equipment for recognizing characters
CN111191649A (en) Method and equipment for identifying bent multi-line text image
CN111738295B (en) Image segmentation method and storage medium
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN114998595B (en) Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium
CN112613387A (en) Traffic sign detection method based on YOLOv3
WO2021088504A1 (en) Road junction detection method and apparatus, neural network training method and apparatus, intelligent driving method and apparatus, and device
CN110599453A (en) Panel defect detection method and device based on image fusion and equipment terminal
CN111507332A (en) Vehicle VIN code detection method and equipment
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN111178282A (en) Road traffic speed limit sign positioning and identifying method and device
WO2023082588A1 (en) Semantic annotation method and apparatus, electronic device, storage medium, and computer program product
CN110909772B (en) High-precision real-time multi-scale dial pointer detection method and system
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN112699711A (en) Lane line detection method, lane line detection device, storage medium, and electronic apparatus
CN109710628B (en) Information processing method, information processing device, information processing system, computer and readable storage medium
CN113744280A (en) Image processing method, apparatus, device and medium
CN112434585A (en) Method, system, electronic device and storage medium for identifying virtual reality of lane line
KR102026280B1 (en) Method and system for scene text detection using deep learning
CN111797832A (en) Automatic generation method and system of image interesting region and image processing method
CN110852353A (en) Intersection classification method and equipment
CN116071557A (en) Long tail target detection method, computer readable storage medium and driving device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
AD01 Patent right deemed abandoned

Effective date of abandoning: 20221206