CN111401359A - Target identification method and device, electronic equipment and storage medium - Google Patents

Target identification method and device, electronic equipment and storage medium

Info

Publication number
CN111401359A
Authority
CN
China
Prior art keywords
response
region
candidate
image
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010116980.7A
Other languages
Chinese (zh)
Inventor
范铭源
罗钧峰
张珂
魏晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202010116980.7A priority Critical patent/CN111401359A/en
Publication of CN111401359A publication Critical patent/CN111401359A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a target identification method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring an image to be identified; performing position identification on the image to be identified to obtain a position identification result, and determining a candidate region set of the image to be identified according to the position identification result; and performing target category identification on each candidate region in the candidate region set to obtain a category identification result. By carrying out target position identification and target category identification in two stages and identifying the target category from a plurality of candidate regions, the method makes full use of the effective information in the image to be identified, improves the accuracy of category identification, and improves the overall target identification effect.

Description

Target identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer vision, and in particular, to a target recognition method, apparatus, electronic device, and storage medium.
Background
Computer vision enables a computer to take the place of human eyes and identify effective information from videos or images, which is of great significance in scenarios such as automatic driving and living-body detection. A common task in the field of computer vision is to recognize various targets, such as people, vehicles and trees, from an image. However, the recognition accuracy of the prior art is not high, and similar targets in particular are easily confused, so a solution is urgently needed.
Disclosure of Invention
In view of the above, the present application is proposed to provide an object recognition method, apparatus, electronic device and storage medium that overcome or at least partially solve the above problems.
According to an aspect of the present application, there is provided a target recognition method including:
acquiring an image to be identified;
performing position identification on the image to be identified to obtain a position identification result, and determining a candidate region set of the image to be identified according to the position identification result;
and performing target category identification on each candidate region in the candidate region set to obtain a category identification result.
Optionally, the performing position recognition on the image to be recognized to obtain a position recognition result includes:
identifying a plurality of response areas of a target from the image to be identified through a neural network to obtain a response area set; selecting an optimal response area from the response area set as a position identification result;
the determining the candidate region set of the image to be recognized according to the position recognition result comprises:
selecting associated response regions from the response region set according to the relevance between the remaining response regions in the response region set and the optimal response region; and taking the optimal response region and the associated response regions as candidate regions to obtain the candidate region set.
Optionally, the selecting an optimal response region from the response region set includes:
and selecting an optimal response area from the response areas according to a non-maximum suppression mode.
Optionally, the selecting, according to the relevance between the remaining response regions in the response region set and the optimal response region, associated response regions from the response region set includes:
respectively calculating the intersection ratio of each remaining response region with the optimal response region;
and screening the associated response regions, according to confidence, from the remaining response regions whose intersection ratio falls into a preset interval.
Optionally, the performing target category identification on each candidate region in the candidate region set to obtain a category identification result includes:
respectively performing target category identification on each candidate region to obtain a category identification sub-result corresponding to each candidate region;
and performing soft voting on each category identification sub-result to obtain a final category identification result.
Optionally, the performing soft voting on each category identification sub-result to obtain a final category identification result includes:
weighting the class probability distribution representing the class identification sub-result according to the soft voting weight of each candidate region to obtain weighted class probability distribution;
summing the weighted class probability distributions along the class dimension to obtain a summarized class probability distribution;
and taking the category with the highest probability as a category identification result.
Optionally, the soft voting weight is determined according to an intersection ratio of each candidate region and the optimal response region.
According to another aspect of the present application, there is provided an object recognition apparatus including:
an acquisition unit, configured to acquire an image to be identified;
a recognition unit, configured to perform position identification on the image to be identified to obtain a position identification result, determine a candidate region set of the image to be identified according to the position identification result, and perform target category identification on each candidate region in the candidate region set to obtain a category identification result.
Optionally, the identification unit is configured to identify a plurality of response regions of a target from the image to be identified through a neural network, so as to obtain a response region set; select an optimal response region from the response region set as a position identification result; select associated response regions from the response region set according to the relevance between the remaining response regions in the response region set and the optimal response region; and take the optimal response region and the associated response regions as candidate regions to obtain the candidate region set.
Optionally, the identification unit is configured to select an optimal response region from the response regions in a non-maximum suppression manner.
Optionally, the identification unit is configured to calculate the intersection ratio of each remaining response region with the optimal response region respectively, and screen the associated response regions, according to confidence, from the remaining response regions whose intersection ratio falls into a preset interval.
Optionally, the identification unit is configured to perform target category identification on each candidate region respectively to obtain a category identification sub-result corresponding to each candidate region, and perform soft voting on each category identification sub-result to obtain a final category identification result.
Optionally, the identification unit is configured to weight the class probability distribution representing each category identification sub-result according to the soft voting weight of each candidate region to obtain weighted class probability distributions; sum the weighted class probability distributions along the class dimension to obtain a summarized class probability distribution; and take the category with the highest probability as the category identification result.
Optionally, the soft voting weight is determined according to an intersection ratio of each candidate region and the optimal response region.
In accordance with yet another aspect of the present application, there is provided an electronic device including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method as any one of the above.
According to a further aspect of the application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement a method as in any above.
According to the technical scheme, after the image to be recognized is obtained, the candidate area set corresponding to the target is recognized from the image to be recognized, then the target category recognition is carried out on each candidate area in the candidate area set to obtain the category recognition result, and the position recognition result is determined according to the category recognition result. The technical scheme has the advantages that the target position identification and the target category identification are carried out in two stages, the target categories are identified by using the candidate areas, effective information in the image to be identified can be fully utilized, the category identification accuracy is improved, and the target identification effect is improved.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of a method of object recognition according to an embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a target recognition device according to an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application;
fig. 5 shows an error cause analysis diagram of an object recognition method.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The method for identifying a target mainly comprises the following two aspects: the first is determining the position of the target, that is, where the target lies in the image, which is also called target detection; the second is determining the category of the target, for example, if the target is a vehicle, whether it is a truck, a car, an ambulance or the like, which is also called target classification. Based on these two aspects, the scheme can be designed along the following two lines of thought.
The first idea: train a single target recognition model that performs target detection and target classification as one task. The disadvantage is that such models perform poorly on targets whose categories are unevenly distributed, such as traffic signs: categories with few samples are easily drowned out by categories with many samples. Moreover, the neural network used for the classification part of such a model is lightweight, and its accuracy is difficult to bring up to requirements.
The second idea: first perform target detection, for example marking the position of a traffic sign in the image with a bounding box; then identify the image inside the bounding box to determine the category of the traffic sign. In this case, however, the effect of category identification is closely tied to the target detection result: if part of the valid information is missing from the target detection result, an identification error occurs.
As shown in fig. 5, the image in the upper part of fig. 5 is a target image cropped from the image to be recognized by the target detection network. Because the image to be recognized is of low definition, the cropped target image is recognized as a traffic sign with a speed limit of 60 km/h. In fact the sign limits the speed to 50 km/h, so the recognition is inaccurate. This is because cropping the target image inevitably discards part of the information contained in the image to be recognized; for this example, if the cropping region were shifted slightly to obtain the target image shown in the lower part of fig. 5, the sign could be accurately recognized as a traffic sign with a speed limit of 50 km/h.
Therefore, the design idea of the present application is to treat target detection and target classification as two stages. Target detection is performed first, but unlike the prior art, instead of outputting a single detection frame, a plurality of associated detection frames are used as candidate regions; target classification is then performed on each candidate region, the classification results of all candidate regions are integrated to determine the final target category, and the specific target position is determined according to the target category, as sketched below.
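Purely as an illustration of this two-stage flow, the following minimal sketch strings the stages together in Python; the `detector` and `classifier` callables, the weight list, and all names are hypothetical placeholders rather than the disclosed networks:

```python
# Hypothetical sketch of the two-stage pipeline: stage 1 keeps several
# associated detection frames as candidate regions; stage 2 classifies
# each region and fuses the class probabilities by weighted soft voting.
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def recognize(image: object,
              detector: Callable[[object], List[Tuple[Box, float]]],
              classifier: Callable[[object, Box], Sequence[float]],
              weights: Sequence[float]) -> Tuple[Box, int]:
    # Stage 1: candidate regions, optimal region first.
    candidates = detector(image)                  # [(box, confidence), ...]
    # Stage 2: per-region class probability vectors, fused by soft voting.
    probs = [classifier(image, box) for box, _ in candidates]
    n_classes = len(probs[0])
    fused = [sum(w * p[c] for w, p in zip(weights, probs))
             for c in range(n_classes)]           # weighted soft vote
    label = max(range(n_classes), key=fused.__getitem__)
    return candidates[0][0], label                # optimal frame + category
```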
Fig. 1 shows a schematic flow chart of a target identification method according to an embodiment of the present application.
As shown in fig. 1, the method includes:
and step S110, acquiring an image to be identified. For example, a road capture image in the automatic driving project may be used as an image to be recognized to perform recognition of a traffic sign.
Step S120: performing position identification on the image to be identified to obtain a position identification result, and determining a candidate region set of the image to be identified according to the position identification result.
Here, any existing target detection method may be selected, detecting a mask or a bounding box of the target. In practice, the target detection algorithm may output a plurality of regions for the target, but generally only the region with the highest probability is output.
Step S130, performing target category identification on each candidate region in the candidate region set to obtain a category identification result.
Here, target category identification may be carried out on each candidate region respectively to obtain a plurality of category identification sub-results, from which the category identification result is then determined. In this way the effective information in the image to be identified is fully utilized, avoiding identification failures caused by effective information being omitted when identification relies on only one candidate region.
It can be seen that the method shown in fig. 1 uses the recognition of the target position and the recognition of the target category as two stages, and uses a plurality of candidate regions to recognize the target category, so that the effective information in the image to be recognized can be more fully used, the accuracy of category recognition can be improved, and the target recognition effect can be improved.
In an embodiment of the present application, in the method, performing position identification on the image to be identified to obtain a position identification result includes: identifying a plurality of response regions of a target from the image to be identified through a neural network to obtain a response region set; and selecting an optimal response region from the response region set as the position identification result. Determining the candidate region set of the image to be identified according to the position identification result includes: selecting associated response regions from the response region set according to the relevance between the remaining response regions in the response region set and the optimal response region; and taking the optimal response region and the associated response regions as candidate regions to obtain the candidate region set.
The neural network here may be chosen from Faster R-CNN (faster region-based convolutional neural network), R-FCN (region-based fully convolutional network), SSD (single shot multibox detector), YOLO, and the like. The set of identified response regions is recorded as P = {p1, p2, p3, p4, …, pn}, and an optimal response region, recorded as p, is then selected from it, corresponding to the position identification result.
In the embodiment of the present application, it is considered that p may omit part of the valid information usable for target classification, and the problem is how to recover this possibly omitted information. The embodiment therefore obtains the candidate regions by screening the remaining response regions in the response region set by their relevance to the optimal response region: the second batch of selected candidate regions has sufficiently high relevance to the first selected optimal response region while containing additional information, which helps the subsequent target classification.
In an embodiment of the present application, the selecting an optimal response region from the set of response regions includes: and selecting an optimal response area from the response areas according to the non-maximum suppression mode.
Non-maximum suppression (NMS) is an algorithm for removing Non-maxima and is commonly used for edge detection and target recognition in computer vision.
The specific algorithm flow can refer to the following example.
In the preparation stage, a picture and many candidate frames produced by object detection are needed (each frame may represent an object), but the frames are likely to overlap, and what we need to do is keep only the optimal frames. Suppose there are N frames; the score of each frame computed by the classifier is S_i, 1 ≤ i ≤ N.
Specifically, non-maximum suppression proceeds as follows. First step: build a set H for storing the candidate frames to be processed, initialized to contain all N frames; build a set M for storing the optimal frames, initialized to the empty set. Second step: sort all the frames in the set H, select the frame m with the highest score, and move it from the set H to the set M. Third step: traverse the frames in the set H, computing the intersection-over-union ratio (IoU) of each with the frame m; if it is higher than a certain threshold (generally 0 to 0.5), the frame is considered to overlap with m and is removed from the set H. Fourth step: return to the second step and iterate until the set H is empty.
The boxes in the final set M are the results we need.
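For concreteness, a minimal runnable rendition of this flow, assuming axis-aligned frames given as (x1, y1, x2, y2) tuples with one score each (a sketch, not the disclosed implementation):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Steps 1-4 above: keep the best-scoring frame, drop overlaps, iterate."""
    H = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    M = []                                      # indices of kept frames
    while H:
        m = H.pop(0)                            # highest-scoring frame in H
        M.append(m)
        H = [i for i in H if iou(boxes[m], boxes[i]) <= threshold]
    return M
```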
In an embodiment of the present application, in the method, selecting associated response regions from the response region set according to the relevance between the remaining response regions in the response region set and the optimal response region includes: respectively calculating the intersection ratio of each remaining response region with the optimal response region; and screening the associated response regions, according to confidence, from the remaining response regions whose intersection ratio falls into a preset interval.
This embodiment borrows the intersection-over-union ratio used in non-maximum suppression: a number of response regions are determined by their intersection ratio, and if there are too many, a preset number of associated response regions are selected in descending order of confidence (i.e. the probability of the candidate region output by the neural network). In a preferred embodiment, the number of associated response regions may be 4.
In a specific embodiment, the intersection ratio of an associated response region with the optimal response region is required to satisfy IOU ∈ [minIOU, maxIOU]. The choice of minIOU ensures that the associated response region is as relevant as possible to the optimal response region; in a preferred embodiment its value is 0.6. The choice of maxIOU ensures that the associated response region differs from the optimal response region to a certain extent, that is, contains additional valid information; in a preferred embodiment its value is 0.9.
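By way of illustration, a sketch of this screening step under the preferred values above, reusing the hypothetical `iou` helper from the NMS sketch (all names and signatures are assumptions):

```python
def select_associated(optimal_box, remaining, min_iou=0.6, max_iou=0.9, k=4):
    """remaining: list of (box, confidence) pairs left after picking the optimum."""
    # Keep regions whose IoU with the optimal region falls in [min_iou, max_iou],
    # then take the k most confident ones.
    in_range = [(box, conf) for box, conf in remaining
                if min_iou <= iou(optimal_box, box) <= max_iou]
    in_range.sort(key=lambda bc: bc[1], reverse=True)
    return in_range[:k]
```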
In an embodiment of the present application, in the method, performing target category identification on each candidate region in the candidate region set to obtain a category identification result includes: performing target category identification on each candidate region respectively to obtain a category identification sub-result corresponding to each candidate region; and performing soft voting on each category identification sub-result to obtain the final category identification result.
Soft voting and hard voting (also called majority voting) are two output strategies of ensemble learning for classification problems. Soft voting, also called weighted average probability voting, classifies using the output class probabilities: a weighted average of the probability of each class is computed with the given weights, and the class with the largest value is selected.
According to the embodiment of the application, one category identification result is determined according to a plurality of category identification sub-results, namely, the results of the plurality of category identifications are integrated and accord with the scene of ensemble learning, so that a soft voting mechanism can be selected.
Specifically, in an embodiment of the present application, in the method, performing soft voting on each category identification sub-result to obtain a final category identification result includes: weighting the class probability distribution representing each category identification sub-result according to the soft voting weight of each candidate region to obtain weighted class probability distributions; summing the weighted class probability distributions along the class dimension to obtain a summarized class probability distribution; and taking the category with the highest probability as the category identification result.
For the classification network, the recognition result of each image is an n-dimensional probability vector, and n is the number of classes of the recognition model, which can be regarded as representing the probability distribution of each class.
As a simplified example, the identification result of candidate region A is: a probability of 0.1 for large-sized vehicles, 0.2 for medium-sized vehicles, and 0.7 for small-sized vehicles, so the probability vector of candidate region A is (0.1, 0.2, 0.7). The identification result of candidate region B is: a probability of 0.2 for large-sized vehicles, 0.2 for medium-sized vehicles, and 0.6 for small-sized vehicles, so the probability vector of candidate region B is (0.2, 0.2, 0.6). Suppose the weight of candidate region A is 0.4 and the weight of candidate region B is 0.6; the final probability vector is then (0.16, 0.2, 0.64), that is, a probability of 0.16 for large-sized vehicles, 0.2 for medium-sized vehicles, and 0.64 for small-sized vehicles, and the finally determined category identification result is a small-sized vehicle.
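The arithmetic of this example can be checked in a few lines (illustrative only):

```python
probs_a = (0.1, 0.2, 0.7)   # candidate region A: large / medium / small
probs_b = (0.2, 0.2, 0.6)   # candidate region B
weights = (0.4, 0.6)        # soft voting weights of A and B

fused = [sum(w * p[c] for w, p in zip(weights, (probs_a, probs_b)))
         for c in range(3)]
print(fused)                # ≈ [0.16, 0.2, 0.64] -> small-sized vehicle
```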
In an embodiment of the present application, in the above method, the soft voting weight is determined according to the intersection ratio of each candidate region with the optimal response region.
The soft voting weight is calculated in proportion to each candidate region's IOU value with the optimal response region. The IOU value of the optimal response region is taken as 1, and the IOU values of the other candidate regions with the optimal response region are {IOU_0, IOU_1, …, IOU_k}; the weight of the i-th region is thus
w_i = IOU_i / (1 + IOU_0 + IOU_1 + … + IOU_k)
In this way the optimal response region carries the highest weight, avoiding interference, while the category identification results of the other candidate regions are still taken into account. The higher a region's IOU value, the more consistent the information it shares with the optimal response region, and the portion of information that differs can be used to correct the identification result.
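Under the normalized reading of the formula above (the exact normalization is an assumption reconstructed from the text; the disclosure only fixes that each weight is proportional to the region's IOU value, with the optimal region counted as 1), the weights could be computed as:

```python
def soft_vote_weights(ious):
    """ious: IoU of each associated region with the optimal region."""
    all_ious = [1.0] + list(ious)     # optimal region first, with IoU = 1
    total = sum(all_ious)
    return [v / total for v in all_ious]

print(soft_vote_weights([0.9, 0.7]))  # optimal region gets the largest weight
```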
The technical scheme of the present application is in principle suitable for any scenario that splits target identification into the two stages of target detection and target classification, and performs particularly well on tasks such as traffic sign recognition that require fine-grained classification over unevenly distributed samples. The following table shows the results of experiments performed on a traffic sign identification data set.
Traffic sign identification data set | Recall rate | Accuracy rate
Control group                        | 92.85%      | 80.26%
Experimental group                   | 93.03%      | 81.82%
The scheme adopted by the experimental group integrates the above embodiments, with the intersection ratio interval set to [0.6, 0.9] and the number of candidate regions set to 5. The control group performs category identification using only the one optimal response region. The recall rate and accuracy of the experimental group are clearly improved over those of the control group. Here the recall rate is the ratio of the number of correctly identified traffic signs to the total number of traffic signs, and the accuracy rate is the ratio of the number of correctly identified traffic signs to the number of traffic signs identified by the model.
Fig. 2 shows a schematic structural diagram of an object recognition apparatus according to an embodiment of the present application. As shown in fig. 2, the object recognition apparatus 200 includes:
an obtaining unit 210, configured to obtain an image to be identified. For example, a road capture image in the automatic driving project may be used as an image to be recognized to perform recognition of a traffic sign.
The identification unit 220 is configured to perform position identification on the image to be identified to obtain a position identification result, and determine a candidate region set of the image to be identified according to the position identification result; and performing target category identification on each candidate region in the candidate region set to obtain a category identification result.
Here, any existing target detection method may be selected, detecting a mask or a bounding box of the target. In practice, the target detection algorithm may output a plurality of regions for the target, but generally only the region with the highest probability is output. Target category identification can be carried out on each candidate region respectively to obtain a plurality of category identification sub-results, from which the category identification result is then determined. In this way the effective information in the image to be identified is fully utilized, avoiding identification failures caused by effective information being omitted when identification relies on only one candidate region.
It can be seen that the apparatus shown in fig. 2 recognizes the target type by using a plurality of candidate regions in two stages, i.e., recognizing the target position and recognizing the target type, and can more fully utilize the effective information in the image to be recognized, thereby improving the accuracy of type recognition and improving the target recognition effect.
In an embodiment of the present application, in the above apparatus, the identification unit 220 is configured to identify a plurality of response regions of a target from the image to be identified through a neural network to obtain a response region set; select an optimal response region from the response region set as the position identification result; select associated response regions from the response region set according to the relevance between the remaining response regions in the response region set and the optimal response region; and take the optimal response region and the associated response regions as candidate regions to obtain the candidate region set.
The neural network here may be chosen from Faster R-CNN (faster region-based convolutional neural network), R-FCN (region-based fully convolutional network), SSD (single shot multibox detector), YOLO, and the like. The set of identified response regions is recorded as P = {p1, p2, p3, p4, …, pn}, and an optimal response region, recorded as p, is then selected from it, corresponding to the position identification result.
In the embodiment of the present application, it is considered that p may omit part of the valid information usable for target classification, and the problem is how to recover this possibly omitted information. The embodiment therefore obtains the candidate regions by screening the remaining response regions in the response region set by their relevance to the optimal response region: the second batch of selected candidate regions has sufficiently high relevance to the first selected optimal response region while containing additional information, which helps the subsequent target classification.
In an embodiment of the present application, in the apparatus, the identification unit 220 is configured to select an optimal response region from the response regions in a non-maximum suppression manner.
Non-maximum suppression (NMS) is an algorithm for removing Non-maxima and is commonly used for edge detection and target recognition in computer vision.
The specific algorithm flow can refer to the following example.
In the preparation stage, a picture and many candidate frames produced by object detection are needed (each frame may represent an object), but the frames are likely to overlap, and what we need to do is keep only the optimal frames. Suppose there are N frames; the score of each frame computed by the classifier is S_i, 1 ≤ i ≤ N.
Specifically, non-maximum suppression proceeds as follows. First step: build a set H for storing the candidate frames to be processed, initialized to contain all N frames; build a set M for storing the optimal frames, initialized to the empty set. Second step: sort all the frames in the set H, select the frame m with the highest score, and move it from the set H to the set M. Third step: traverse the frames in the set H, computing the intersection-over-union ratio (IoU) of each with the frame m; if it is higher than a certain threshold (generally 0 to 0.5), the frame is considered to overlap with m and is removed from the set H. Fourth step: return to the second step and iterate until the set H is empty.
The boxes in the final set M are the results we need.
In an embodiment of the present application, in the apparatus, the identification unit 220 is configured to calculate the intersection ratio of each remaining response region with the optimal response region respectively, and screen the associated response regions, according to confidence, from the remaining response regions whose intersection ratio falls into a preset interval.
This embodiment borrows the intersection-over-union ratio used in non-maximum suppression: a number of response regions are determined by their intersection ratio, and if there are too many, a preset number of associated response regions are selected in descending order of confidence (i.e. the probability of the candidate region output by the neural network). In a preferred embodiment, the number of associated response regions may be 4.
In a specific embodiment, the intersection ratio of an associated response region with the optimal response region is required to satisfy IOU ∈ [minIOU, maxIOU]. The choice of minIOU ensures that the associated response region is as relevant as possible to the optimal response region; in a preferred embodiment its value is 0.6. The choice of maxIOU ensures that the associated response region differs from the optimal response region to a certain extent, that is, contains additional valid information; in a preferred embodiment its value is 0.9.
In an embodiment of the present application, in the above apparatus, the identification unit 220 is configured to perform target category identification on each candidate region respectively to obtain a category identification sub-result corresponding to each candidate region, and perform soft voting on each category identification sub-result to obtain the final category identification result.
Soft voting and hard voting (also called majority voting) are two output strategies of ensemble learning for classification problems. Soft voting, also called weighted average probability voting, classifies using the output class probabilities: a weighted average of the probability of each class is computed with the given weights, and the class with the largest value is selected.
According to the embodiment of the application, one category identification result is determined according to a plurality of category identification sub-results, namely, the results of the plurality of category identifications are integrated and accord with the scene of ensemble learning, so that a soft voting mechanism can be selected.
Specifically, in an embodiment of the present application, in the apparatus, the identification unit 220 is configured to weight the class probability distribution representing each category identification sub-result according to the soft voting weight of each candidate region to obtain weighted class probability distributions; sum the weighted class probability distributions along the class dimension to obtain a summarized class probability distribution; and take the category with the highest probability as the category identification result.
For the classification network, the recognition result of each image is an n-dimensional probability vector, and n is the number of classes of the recognition model, which can be regarded as representing the probability distribution of each class.
As a simplified example, the identification result of candidate region A is: a probability of 0.1 for large-sized vehicles, 0.2 for medium-sized vehicles, and 0.7 for small-sized vehicles, so the probability vector of candidate region A is (0.1, 0.2, 0.7). The identification result of candidate region B is: a probability of 0.2 for large-sized vehicles, 0.2 for medium-sized vehicles, and 0.6 for small-sized vehicles, so the probability vector of candidate region B is (0.2, 0.2, 0.6). Suppose the weight of candidate region A is 0.4 and the weight of candidate region B is 0.6; the final probability vector is then (0.16, 0.2, 0.64), that is, a probability of 0.16 for large-sized vehicles, 0.2 for medium-sized vehicles, and 0.64 for small-sized vehicles, and the finally determined category identification result is a small-sized vehicle.
In an embodiment of the present application, in the apparatus, the soft voting weight is determined according to the intersection ratio of each candidate region with the optimal response region.
The soft voting weight is calculated in proportion to each candidate region's IOU value with the optimal response region. The IOU value of the optimal response region is taken as 1, and the IOU values of the other candidate regions with the optimal response region are {IOU_0, IOU_1, …, IOU_k}; the weight of the i-th region is thus
w_i = IOU_i / (1 + IOU_0 + IOU_1 + … + IOU_k)
In this way the optimal response region carries the highest weight, avoiding interference, while the category identification results of the other candidate regions are still taken into account. The higher a region's IOU value, the more consistent the information it shares with the optimal response region, and the portion of information that differs can be used to correct the identification result.
In summary, according to the technical scheme of the application, after the image to be recognized is obtained, the candidate region set corresponding to the target is recognized from the image to be recognized, then the target category recognition is performed on each candidate region in the candidate region set to obtain the category recognition result, and the position recognition result is determined according to the category recognition result. The technical scheme has the advantages that the target position identification and the target category identification are carried out in two stages, the target categories are identified by using the candidate areas, effective information in the image to be identified can be fully utilized, the category identification accuracy is improved, and the target identification effect is improved.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in an object recognition arrangement according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 300 comprises a processor 310 and a memory 320 arranged to store computer executable instructions (computer readable program code). The memory 320 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer readable program code 331 for performing any of the method steps described above. For example, the storage space 330 may comprise respective pieces of computer readable program code 331 for implementing the various steps of the above method. The computer readable program code 331 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as that described in fig. 4. FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. The computer readable storage medium 400 stores computer readable program code 331 for performing the steps of the method according to the application, readable by the processor 310 of the electronic device 300. When run by the electronic device 300, the computer readable program code 331 causes the electronic device 300 to perform the steps of the method described above; in particular, the computer readable program code 331 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 331 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. An object recognition method, comprising:
acquiring an image to be identified;
performing position identification on the image to be identified to obtain a position identification result, and determining a candidate region set of the image to be identified according to the position identification result;
and performing target category identification on each candidate region in the candidate region set to obtain a category identification result.
2. The method of claim 1, wherein the identifying the position of the image to be identified, and obtaining the position identification result comprises:
identifying a plurality of response areas of a target from the image to be identified through a neural network to obtain a response area set; selecting an optimal response area from the response area set as a position identification result;
the determining the candidate region set of the image to be recognized according to the position recognition result comprises:
selecting associated response regions from the response region set according to the relevance between the remaining response regions in the response region set and the optimal response region; and taking the optimal response region and the associated response regions as candidate regions to obtain the candidate region set.
3. The method of claim 2, wherein said selecting an optimal response region from the set of response regions comprises:
and selecting an optimal response area from the response areas according to a non-maximum suppression mode.
4. The method of claim 2, wherein the selecting associated response regions from the set of response regions according to the association of the remaining response regions in the set of response regions with the optimal response region comprises:
respectively calculating the intersection ratio of each remaining response region with the optimal response region;
and screening the associated response regions, according to confidence, from the remaining response regions whose intersection ratio falls into a preset interval.
5. The method of claim 2, wherein the performing object class recognition on each candidate region in the candidate region set to obtain a class recognition result comprises:
respectively performing target category identification on each candidate region to obtain a category identification sub-result corresponding to each candidate region;
and performing soft voting on each category identification sub-result to obtain a final category identification result.
6. The method of claim 5, wherein the performing soft voting on each category identification sub-result to obtain the final category identification result comprises:
weighting the class probability distribution representing the class identification sub-result according to the soft voting weight of each candidate region to obtain weighted class probability distribution;
summing the weighted class probability distributions along the class dimension to obtain a summarized class probability distribution;
and taking the category with the highest probability as a category identification result.
7. The method of claim 6, wherein the soft voting weight values are determined according to the intersection ratio of each candidate region with the optimal response region.
8. An object recognition apparatus comprising:
an acquisition unit, configured to acquire an image to be identified;
a recognition unit, configured to perform position identification on the image to be identified to obtain a position identification result, determine a candidate region set of the image to be identified according to the position identification result, and perform target category identification on each candidate region in the candidate region set to obtain a category identification result.
9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.
CN202010116980.7A 2020-02-25 2020-02-25 Target identification method and device, electronic equipment and storage medium Pending CN111401359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116980.7A CN111401359A (en) 2020-02-25 2020-02-25 Target identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116980.7A CN111401359A (en) 2020-02-25 2020-02-25 Target identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111401359A true CN111401359A (en) 2020-07-10

Family

ID=71432097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116980.7A Pending CN111401359A (en) 2020-02-25 2020-02-25 Target identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111401359A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229267A (en) * 2016-12-29 2018-06-29 北京市商汤科技开发有限公司 Object properties detection, neural metwork training, method for detecting area and device
CN108629354A (en) * 2017-03-17 2018-10-09 杭州海康威视数字技术股份有限公司 Object detection method and device
US9996890B1 (en) * 2017-07-14 2018-06-12 Synapse Technology Corporation Detection of items
CN107977619A (en) * 2017-11-28 2018-05-01 北京航空航天大学 A kind of EO-1 hyperion object detection method minimized based on integrated study bound energy
WO2019246250A1 (en) * 2018-06-20 2019-12-26 Zoox, Inc. Instance segmentation inferred from machine-learning model output
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN110807139A (en) * 2019-10-23 2020-02-18 腾讯科技(深圳)有限公司 Picture identification method and device, computer readable storage medium and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643420A (en) * 2021-07-02 2021-11-12 北京三快在线科技有限公司 Three-dimensional reconstruction method and device
CN116229280A (en) * 2023-01-09 2023-06-06 广东省科学院广州地理研究所 Method and device for identifying collapse sentry, electronic equipment and storage medium


Legal Events

Date | Code | Title | Description
     | PB01 | Publication |
     | SE01 | Entry into force of request for substantive examination |
     | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200710