WO2020181872A1 - Object detection method and apparatus, and electronic device - Google Patents

Object detection method and apparatus, and electronic device

Info

Publication number
WO2020181872A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
group
box
detection
preselection
Prior art date
Application number
PCT/CN2019/126435
Other languages
French (fr)
Chinese (zh)
Inventor
李作新
俞刚
袁野
Original Assignee
北京旷视科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京旷视科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Publication of WO2020181872A1 publication Critical patent/WO2020181872A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • This application relates to the technical field of image processing, and in particular to an object detection method and apparatus, and an electronic device.
  • Object detection is one of the classic problems in computer vision. Its task is to mark the position of each object in an image with a bounding box and to give the object's category. From the traditional framework of hand-crafted features plus shallow classifiers to end-to-end detection frameworks based on deep learning, object detection has become increasingly mature. At present, when multiple objects, especially similar objects, appear densely and occlude one another, existing object detection algorithms only consider detection at the category level, so the prior art cannot perform accurate object detection under occlusion. When objects occlude each other, prior-art methods often fail to effectively distinguish the occluding object from the occluded object, resulting in missed detection of the occluded object.
  • This application aims to provide an object detection method and apparatus, and an electronic device, to alleviate the technical problem in the prior art that similar objects are prone to missed detection when objects are detected under dense occlusion.
  • An embodiment of the present application provides an object detection method, including: acquiring a to-be-processed image containing one or more detection objects; performing object detection on the to-be-processed image to obtain at least one preselection box, wherein the preselection box includes a visible frame and/or a complete frame, the complete frame is a bounding frame of a detection object as a whole, and the visible frame is a bounding frame of the visible area of each detection object in the image to be processed; determining, through an association modeling model, the group to which each preselection box in the at least one preselection box belongs, to obtain at least one preselection box group, where preselection boxes in the same preselection box group belong to the same detection object; performing deduplication processing on each preselection box group to obtain preselection box groups after the deduplication processing; and determining the target detection frame of each detection object based on the preselection box groups after the deduplication processing.
  • Determining the group to which each preselection box in the at least one preselection box belongs through the association modeling model to obtain at least one preselection box group includes: obtaining the attribute feature vector of each preselection box in the at least one preselection box through the instance attribute feature projection network of the association modeling model; and determining, through the clustering module of the association modeling model, the group to which each preselection box in the at least one preselection box belongs based on the attribute feature vector of each preselection box, to obtain the at least one preselection box group.
  • The instance attribute feature projection network is obtained through training with an Lpull loss function and an Lpush loss function, where the Lpull loss function shortens the distance between the attribute feature vectors of preselection boxes belonging to the same detection object, and the Lpush loss function widens the distance between the attribute feature vectors of preselection boxes belonging to different detection objects.
  • Determining the group to which each preselection box in the at least one preselection box belongs based on the attribute feature vector of each preselection box to obtain the at least one preselection box group includes: calculating the vector distance value between any two of the attribute feature vectors to obtain multiple vector distance values; adding the two preselection boxes whose vector distance value is less than a preset threshold to the same group, and regarding each other preselection box that is not added to any group as a group on its own; and clustering the at least one obtained group with a clustering algorithm to obtain the at least one preselection box group.
  • Each preselection box group includes a visible frame group and a complete frame group. Performing deduplication processing on each preselection box group to obtain the preselection box groups after the deduplication processing includes: performing deduplication processing on the visible frame group to obtain the visible frame group after the deduplication processing. Determining the target detection frame of each detection object based on the preselection box groups after the deduplication processing includes: determining the target detection frame of each detection object based on the visible frame group after the deduplication processing and the complete frame group.
  • Performing deduplication processing on the visible frame group in the at least one preselection box group to obtain the visible frame group after the deduplication processing includes: performing deduplication processing on the visible frame group in the at least one preselection box group by using a non-maximum suppression algorithm to obtain the visible frame group after the deduplication processing.
  • Determining the target detection frame of each detection object based on the visible frame group after the deduplication processing and the complete frame group includes: performing local feature alignment processing on each visible frame in the visible frame group after the deduplication processing, and performing local feature alignment processing on each complete frame in the complete frame group; inputting the visible frames after the feature alignment processing and the complete frames after the feature alignment processing into a target detection model for detection processing, to obtain the position coordinates and classification probability values of the visible frames after the feature alignment processing and the position coordinates and classification probability values of the complete frames after the feature alignment processing; and determining the target detection frame of each detection object based on target position coordinates and target classification probability values,
  • where the target position coordinates include the position coordinates of the visible frames after the feature alignment processing and/or the position coordinates of the complete frames after the feature alignment processing,
  • and the target classification probability values include the classification probability values of the visible frames after the feature alignment processing and/or the classification probability values of the complete frames after the feature alignment processing.
  • Determining the target detection frame of each detection object based on the target position coordinates and the target classification probability values includes: using each target classification probability value as the weight of the corresponding target position coordinates; and calculating a weighted average of the target position coordinates of each detection object to obtain the target detection frame of the detection object, where the target detection frame includes a target visible frame and/or a target complete frame.
  • Performing object detection on the to-be-processed image to obtain at least one preselection box includes: inputting the to-be-processed image into a feature pyramid network for processing to obtain a feature pyramid; and processing the feature pyramid by using a region proposal network (RPN) model to obtain the at least one preselection box, where each preselection box in the at least one preselection box carries an attribute label, the attribute label is configured to determine the type of each preselection box, and the type includes a complete frame and a visible frame.
  • An embodiment of the present application also provides an object detection device, including: an image acquisition unit configured to acquire a to-be-processed image containing one or more detection objects; a preselection box acquisition unit configured to perform object detection on the image to be processed to obtain at least one preselection box, wherein the preselection box includes a visible frame and/or a complete frame, the complete frame is a bounding frame of a detection object as a whole, and the visible frame is a bounding frame of the visible area of each detection object in the image to be processed; a grouping unit configured to determine, through the association modeling model, the group to which each preselection box in the at least one preselection box belongs, to obtain at least one preselection box group, where preselection boxes in the same preselection box group belong to the same detection object; a deduplication unit configured to perform deduplication processing on each preselection box group to obtain preselection box groups after the deduplication processing; and a determining unit configured to determine the target detection frame of each detection object based on the preselection box groups after the deduplication processing.
  • An embodiment of the present application also provides an electronic device, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor implements the steps of the above method when executing the computer program.
  • The embodiments of the present application also provide a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to execute the foregoing method.
  • In the embodiments of the present application, an image to be processed containing one or more detection objects is first acquired; then, object detection is performed on the image to be processed to obtain at least one preselection box; next, the group to which each preselection box in the at least one preselection box belongs is determined, so that at least one preselection box group is obtained. Since the preselection boxes in the same preselection box group belong to the same detection object, preselection boxes belonging to different detection objects are kept apart by the preselection box groups, which prevents the preselection boxes of an occluded object from being removed as redundant preselection boxes of the occluding object during deduplication. This alleviates the technical problem in the prior art that similar objects are prone to missed detection when objects are detected under dense occlusion, realizes the detection of one or more detection objects in the image to be processed, and effectively avoids missed detection of the detection objects.
  • The association modeling model is realized by a neural network. After at least one preselection box is input into the association modeling model, the feature information of the image inside each preselection box and the position information of the preselection box are fully utilized to group the preselection boxes, which can effectively distinguish the preselection boxes of different detection objects. In particular, in dense occlusion scenes, when the complete frames of an occluding object and an occluded object overlap to a high degree, preselection boxes that are adjacent in position and similar in size but belong to different detection objects can be accurately grouped.
  • Fig. 1 is a schematic diagram of an electronic device according to an embodiment of the present application.
  • Fig. 2 is a flowchart of an object detection method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of visible frames and complete frames of densely occluded similar objects according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a correspondence between preselection boxes and detection objects according to an embodiment of the present application.
  • Fig. 5 is a schematic diagram of an object detection device according to an embodiment of the present application.
  • an example electronic device 100 configured to implement the object detection method of an embodiment of the present application is described with reference to FIG. 1.
  • The electronic device 100 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, and a camera 110. These components are interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in FIG. 1 are only exemplary and not restrictive, and the electronic device may also have other components and structures as required.
  • The processor 102 may be implemented in at least one hardware form among a digital signal processor (DSP), a field programmable gate array (FPGA), a programmable logic array (PLA), and an application specific integrated circuit (ASIC). The processor 102 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
  • the storage device 104 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read only memory (ROM), hard disk, flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functions (implemented by the processor) in the embodiments of the present application described below and/or other desired functions.
  • Various application programs and various data, such as data used and/or generated by the application programs, can also be stored in the computer-readable storage medium.
  • the input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, and a touch screen.
  • the output device 108 may output various information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a speaker, and the like.
  • the camera 110 is configured to obtain a to-be-processed image, wherein the to-be-processed image obtained by the camera is processed by the object detection method to obtain a target detection frame of the detected object.
  • The camera can capture images desired by the user (such as photos and videos), and then the images are processed by the object detection method to obtain the target detection frame of the detection object; the camera may also store the captured images in the storage device 104 for use by other components.
  • the example electronic device configured to implement the object detection method according to the embodiment of the present application may be implemented on a mobile terminal such as a smart phone and a tablet computer.
  • An embodiment of an object detection method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system, such as with a set of computer-executable instructions, and although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
  • Fig. 2 is a flowchart of an object detection method according to an embodiment of the present application. As shown in Fig. 2, the method includes the following steps:
  • Step S202 Obtain a to-be-processed image containing one or more detection objects.
  • the image to be processed may include multiple types of detection objects, for example, including humans and non-humans.
  • Non-human objects include dynamic objects and static objects: dynamic objects may be animal objects, and static objects may be stationary objects other than humans and animals.
  • In each image to be processed there can be multiple categories of objects, and there can be one or more objects of each category; for example, an image may contain 2 people and 3 dogs.
  • The various objects in the image to be processed may be displayed independently of each other, or some objects may be occluded by other objects and thus not fully displayed.
  • the detection object may be one or more types of objects in the object detection step to be performed in the image to be processed.
  • the user can determine the category of the detection object according to actual needs, which is not specifically limited in this embodiment.
  • The image to be processed may be an image captured by the camera of the electronic device in Embodiment 1, or may be an image pre-stored in the memory of the electronic device; this embodiment imposes no specific restriction on this.
  • Step S204 Perform object detection on the image to be processed to obtain at least one preselected box, where the preselected box includes a visible frame and/or a complete frame, and the complete frame is an enclosing frame for an entire detection object.
  • the visible frame is a bounding frame of the visible area of each detection object in the image to be processed.
  • object detection can be performed on the image to be processed through the pre-selection box detection network.
  • The process of performing object detection on the image to be processed may be: performing object detection on an unoccluded detection object in the image to be processed and outputting a complete frame; for an occluded detection object in the image to be processed, performing object detection and outputting a complete frame and a visible frame at the same time.
  • Multiple visible frames or multiple complete frames may be generated for the same detection object, and different visible frames or different complete frames may have different scales relative to the image to be processed.
  • Step S206 Determine the group to which each preselection box belongs in the at least one preselection box through the association modeling model to obtain at least one preselection box group; the preselection boxes in the same preselection box group belong to the same detection object.
  • Generally, a plurality of preselection boxes are generated for different detection objects, where the preselection boxes include visible frames and/or complete frames. The preselection boxes contained in the detection result are redundant and need to be deduplicated. Before deduplication, the group to which each preselection box belongs needs to be determined; each group corresponds to one preselection box group, so at least one preselection box group can be obtained, and preselection boxes belonging to different detection objects can thus be distinguished by the preselection box groups.
  • The association modeling model is a model that can capture the relevance of its input data, and it can be realized by a neural network. After at least one preselection box is input into the association modeling model, the model groups the preselection boxes effectively based on the feature information of the image inside each preselection box combined with the position information of the preselection box.
  • The preselection boxes belonging to the same detection object can be grouped into one preselection box group. Since the preselection box group of the same detection object may include both visible frames and complete frames, the preselection box group can also include a visible frame group and a complete frame group at the same time.
  • FIG. 4 shows a schematic diagram of the correspondence between preselection boxes and detection objects.
  • The detection objects include an occluding object P and an occluded object Q that is occluded by the occluding object P. The preselection boxes include the seventh box 7 through the twelfth box 12. The seventh box 7, the eighth box 8 and the ninth box 9 all belong to the occluding object P in the figure, and the tenth box 10, the eleventh box 11 and the twelfth box 12 all belong to the occluded object Q in the figure. The seventh box 7, the eighth box 8 and the ninth box 9 form one preselection box group, and the tenth box 10, the eleventh box 11 and the twelfth box 12 form another preselection box group.
  • The preselection boxes in each preselection box group can then be deduplicated separately, which prevents confusion between the boxes of different objects and prevents a preselection box of the occluded object Q (for example, the tenth box 10) from being removed as a redundant preselection box of the occluding object P during deduplication, greatly reducing the probability of missed detection of occluded objects.
  • Step S208 Perform deduplication processing on each preselection box group to obtain the preselection box group after deduplication processing.
  • The preselection box group of each detection object is deduplicated separately, so the preselection boxes of different detection objects are not confused with each other. Specifically, this avoids removing a preselection box of the occluded object as a redundant preselection box of the occluding object during deduplication, thereby avoiding missed detection of the occluded object.
  • Step S210 Determine the target detection frame of each detection object based on the preselected frame group after the de-duplication processing.
  • The target detection frame of each detection object may be determined based on the preselection box group after the deduplication processing. If the detection object is not occluded in the image to be processed, its target detection frame includes the target complete frame; if the detection object is occluded in the image to be processed, its target detection frame includes the target complete frame and the target visible frame.
  • The target complete frame can be used to obtain the position information of a detection object and the image feature information of detection objects that are not occluded; the target visible frame can be used to obtain the image feature information of occluded objects.
  • The embodiments of the present application can thus obtain two types of target detection frames, and can therefore obtain more comprehensive and more accurate detection object information for subsequent image processing such as recognition and verification.
  • step S202 to step S210 may be executed by the processor in the electronic device in the foregoing embodiment 1.
  • Any processor capable of executing steps S202 to S210 described above can be applied in the embodiments of the present application; there is no specific limitation on this.
  • In the embodiments of the present application, an image to be processed containing one or more detection objects is first acquired; then, object detection is performed on the image to be processed to obtain at least one preselection box; next, the group to which each preselection box in the at least one preselection box belongs is determined, so that at least one preselection box group is obtained. Since the preselection boxes in the same preselection box group belong to the same detection object, preselection boxes belonging to different detection objects are kept apart by the preselection box groups, which prevents the preselection boxes of an occluded object from being removed as redundant preselection boxes of the occluding object during deduplication. This alleviates the technical problem in the prior art that similar objects are prone to missed detection when objects are detected under dense occlusion, realizes the detection of one or more detection objects in the image to be processed, and effectively avoids missed detection of the detection objects.
  • The association modeling model is implemented by a neural network. After at least one preselection box is input into the association modeling model, the feature information of the image in each preselection box and the position information of the preselection box are effectively used to group the preselection boxes, which can effectively distinguish the preselection boxes of different detection objects. In particular, in dense occlusion scenes, when the complete frames of the occluding object and the occluded object have a high degree of overlap, preselection boxes that are similar in location and size but belong to different detection objects are accurately grouped.
  • the image to be processed containing one or more detection objects is first acquired. After that, object detection can be performed on the image to be processed to obtain at least one preselection box.
  • In step S204, performing object detection on the image to be processed to obtain at least one preselection box includes the following steps:
  • Step S2041 input the to-be-processed image into a feature pyramid network for processing to obtain a feature pyramid;
  • Step S2042 Process the feature pyramid by using a regional candidate network RPN (Region Proposal Networks) model to obtain the at least one preselection box, wherein each preselection box in the at least one preselection box carries an attribute label, and The attribute label is configured to determine the type of each pre-selected box, the type includes a complete box and a visible box.
  • the feature pyramid network is configured to generate a feature pyramid.
  • Basic network models such as the VGG (Visual Geometry Group) 16 model, ResNet, or FPN (Feature Pyramid Networks) can be selected as the feature pyramid network.
  • the image to be processed may be input into the feature pyramid network for processing to obtain a feature pyramid.
  • Before the region proposal network RPN (Region Proposal Networks) model is used to process the feature pyramid, the basic network model (for example, FPN) and the RPN model are trained together on a preset training set.
  • the preset training set includes multiple training samples, and each training sample includes: a training image and its corresponding image label.
  • the image label is configured to mark the type of the preselection box in the training image, and the type includes a complete box or a visible box.
  • multiple training samples can be used to train the RPN model, so that the RPN model can recognize and identify the preselection box type in the image.
  • The trained region proposal network RPN model can then be used to process the feature pyramid to obtain at least one preselection box, and each preselection box carries an attribute label configured to characterize whether the preselection box is a visible frame or a complete frame.
  • The attribute label may be expressed as "1" or "2"; for example, "1" indicates that the preselection box is a visible frame, and "2" indicates that the preselection box is a complete frame.
  • other data that can be recognized by the machine can also be selected as the attribute tag, which is not specifically limited in this embodiment.
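  • As an illustration only (not code from the application), the RPN output described above can be represented as an array in which each row holds a preselection box and its attribute label; the array layout and the sample values below are assumptions. A minimal Python sketch:

```python
import numpy as np

# Hypothetical RPN output: one row per preselection box,
# [x1, y1, x2, y2, attribute_label], where per the text the attribute
# label "1" marks a visible frame and "2" marks a complete frame.
preselection_boxes = np.array([
    [10.0, 20.0, 110.0, 220.0, 2],   # complete frame of an occluding object
    [12.0, 18.0, 108.0, 224.0, 2],   # another complete frame, same object
    [60.0, 20.0, 110.0, 220.0, 1],   # visible frame of an occluded object
])

# Split the preselection boxes by attribute label.
visible_frames = preselection_boxes[preselection_boxes[:, 4] == 1][:, :4]
complete_frames = preselection_boxes[preselection_boxes[:, 4] == 2][:, :4]
```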
  • the group to which each preselection box belongs in the at least one preselection box can be determined, and at least one preselection box group can be obtained.
  • In step S206, determining the group to which each preselection box in the at least one preselection box belongs through the association modeling model and obtaining at least one preselection box group includes the following steps:
  • Step S11 Obtain the attribute feature vector of each preselection box in the at least one preselection box through the instance attribute feature projection network of the association modeling model;
  • Step S12 through the clustering module of the association modeling model, determine the group to which each preselection box belongs in the at least one preselection box based on the attribute feature vector of each preselection box to obtain the at least one preselection box group.
  • the association modeling model may be an associate embedding model.
  • the instance attribute feature projection network in the association modeling model can be an embedding encoding (also called embedded coding) network.
  • At least one preselection box is input into the embedding encoding network of the association modeling model, which returns a corresponding attribute feature vector for each preselection box, so that each preselection box corresponds to one attribute feature vector. Then, the preselection boxes of the same detection object are grouped into the same group by the clustering module according to the attribute feature vectors, and different groups correspond to different detection objects.
  • Before the associate embedding model is used to determine the group to which each preselection box belongs, the embedding encoding network in the associate embedding model needs to be trained, in order to determine what kind of attribute feature vector the embedding encoding network outputs. In the training process, the constraint condition on the attribute feature vectors is the distance between them, which can be a Euclidean distance or a cosine distance. A first constraint condition can be used to shorten the distance between the attribute feature vectors of preselection boxes belonging to the same detection object, so that preselection boxes belonging to the same detection object can be added to the same group through their attribute feature vectors; a second constraint condition extends the distance between the attribute feature vectors of preselection boxes belonging to different detection objects, so that preselection boxes belonging to different detection objects are added to different groups through their attribute feature vectors.
  • the first constraint condition may be the Lpull loss function
  • the second constraint condition may be the Lpush loss function.
  • The Lpull loss function can be used first to train the embedding encoding network, and the Lpush loss function can then be used to train it to extend the distances; it is also possible to train the embedding encoding network with the Lpull loss function and the Lpush loss function at the same time.
  • Consistent with the definitions given for its variables, the Lpull loss function can be written as

$$L_{pull} = \frac{1}{M}\sum_{m}\frac{1}{C_m}\sum_{e_k,\,e_j \in m}\left\|e_k - e_j\right\|^2,$$

where M is the number of attribute feature vectors, $e_k$ and $e_j$ both represent arbitrary attribute feature vectors, and $C_m$ represents the number of attribute feature vectors corresponding to the corresponding detection object m; and the Lpush loss function can be written as

$$L_{push} = \frac{1}{M}\sum_{e_k \in m,\; e_j \notin m}\max\left(0,\,\Delta - \left\|e_k - e_j\right\|\right),$$

where M is the number of attribute feature vectors, $e_k$ and $e_j$ both represent arbitrary attribute feature vectors, and $\Delta$ represents a preset distance value.
  • After the embedding encoding network training is completed and the preselection boxes are obtained through the region proposal network RPN model, the embedding encoding network is used to obtain the attribute feature vector of each preselection box, that is, the embedding value.
  • the embedding value can be an N-dimensional vector, and an N-dimensional vector is obtained for each preselection box.
  • The N-dimensional vector can be expressed as: [a_1, a_2, ..., a_N].
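  • The application describes the losses only in terms of M, e_k, e_j, C_m and Δ, so the following PyTorch code is a minimal sketch of a common associative-embedding variant of the two losses; the mean-based formulation, the function name and the tensor shapes are illustrative assumptions, not the application's own implementation.

```python
import torch

def pull_push_losses(embeddings: torch.Tensor,
                     group_ids: torch.Tensor,
                     delta: float = 1.0):
    """embeddings: (M, N) attribute feature vectors (embedding values);
    group_ids: (M,) index of the detection object each preselection box
    belongs to. Returns (l_pull, l_push)."""
    ids = group_ids.unique()
    # Mean embedding per detection object.
    means = torch.stack([embeddings[group_ids == i].mean(dim=0) for i in ids])

    # Lpull: pull each box's vector toward the mean of its own object,
    # shortening distances within the same detection object.
    l_pull = embeddings.new_zeros(())
    for n, i in enumerate(ids):
        members = embeddings[group_ids == i]
        l_pull = l_pull + ((members - means[n]) ** 2).sum(dim=1).mean()
    l_pull = l_pull / len(ids)

    # Lpush: push the means of different objects at least delta apart,
    # widening distances between different detection objects.
    if len(ids) > 1:
        dists = torch.cdist(means, means)        # pairwise mean distances
        off_diag = ~torch.eye(len(ids), dtype=torch.bool)
        l_push = torch.clamp(delta - dists[off_diag], min=0).mean()
    else:
        l_push = embeddings.new_zeros(())
    return l_pull, l_push
```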
  • The purpose of obtaining the attribute feature vector is to distinguish different object instances (that is, detection objects) in the preselection boxes. The feature vector therefore needs instance-level distinguishing ability, able to distinguish each individual detection object, not merely category-level distinguishing ability (distinguishing the types of detection objects). This places certain requirements on the choice of feature extraction network, and the attribute feature vector (embedding value) obtained by the instance attribute feature projection network has good instance-level distinguishing ability.
  • The attribute feature vector (embedding encoding) is generated by directly optimizing the association relationships of the actual preselection boxes using the association modeling model with grouping relationships (associate embedding), and it is optimized directly for the preselection box grouping task, so a direct and good performance improvement can be obtained.
  • the instance attribute feature projection network is implemented by a neural network, which can be integrated with the detection network of the preselection box (such as the feature pyramid network and the regional candidate network RPN), and the two share the basic features of the network, reducing the amount of calculation.
  • the detection network training process of the preselection box can be directly combined with the instance attribute feature projection network to realize the joint training of the two overall networks without adding other external information, and the training process is relatively simple.
  • the Euclidean distance between two N-dimensional vectors can be judged by setting a preset threshold. For example, for the preset threshold x, if the Euclidean distance between the N-dimensional vectors of two different preselection boxes is less than x, the distance between the two preselection boxes is considered to be small, and they are considered to belong to the same group.
  • the detection object to which each preselection box belongs can be accurately determined, thereby further reducing the probability of missed detection of the detection object.
  • Determining, through the clustering module of the association modeling model, the group to which each preselection box in the at least one preselection box belongs based on the attribute feature vector of each preselection box to obtain the at least one preselection box group can be realized by the following implementation:
  • Step S1 Calculate the vector distance value between any two of the attribute feature vectors to obtain multiple vector distance values
  • Step S2 adding two preselected boxes that are less than a preset threshold among the plurality of vector distance values to the same group, and each other preselected box that is not added to the group is separately regarded as a group;
  • Step S3 clustering the at least one obtained group by a clustering algorithm, to obtain the at least one preselected box group.
  • the above-mentioned embedding encoding network is used to regress all preselection boxes to obtain the attribute feature vector, and the vector distance value between any two attribute feature vectors is calculated respectively.
  • The vector distance value can be calculated by a distance calculation method such as the Euclidean distance.
  • The size of the preset threshold can be determined according to actual needs or experience, which is not specifically limited in this embodiment. If a vector distance value is less than the preset threshold, it can be determined to be a target vector distance value, and the two preselection boxes corresponding to the target vector distance value are considered to correspond to the same detection object; therefore, the two preselection boxes corresponding to the target vector distance value are added to the same group. Each preselection box whose attribute feature vector is not less than the preset threshold away from all other attribute feature vectors is regarded as a group on its own. Thus, at least one group can be obtained.
  • the at least one obtained group is clustered and grouped by a clustering algorithm.
  • The clustering algorithm can be a commonly used algorithm, for example, a K-means clustering algorithm or a mean shift clustering algorithm.
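  • Steps S1 to S3 can be sketched in Python as follows; the union-find transitive grouping here stands in for the final clustering pass (which, as noted above, may be K-means or mean shift), and all names are hypothetical.

```python
import numpy as np

def group_preselection_boxes(embeddings: np.ndarray, threshold: float):
    """embeddings: (M, N) attribute feature vectors, one per preselection
    box. Returns a group id for every box."""
    m = len(embeddings)
    parent = list(range(m))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step S1: vector distance value between any two attribute feature vectors.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Step S2: boxes whose distance is below the preset threshold join the
    # same group; boxes joined with nobody remain singleton groups.
    for i in range(m):
        for j in range(i + 1, m):
            if dist[i, j] < threshold:
                parent[find(i)] = find(j)

    # Step S3 (simplified): relabel the resulting groups consecutively.
    roots = sorted({find(i) for i in range(m)})
    relabel = {r: g for g, r in enumerate(roots)}
    return [relabel[find(i)] for i in range(m)]
```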
  • For example, suppose eight preselection boxes f1 to f8 are obtained, and the embedding encoding algorithm is used to regress an attribute feature vector, that is, an embedding value, for each of them. If the vector distances among f1, f2 and f3 are all less than the preset threshold, they are added to one group, and likewise f4, f5 and f8; the preselection boxes f6 and f7, whose distances to all other boxes are not less than the threshold, are each regarded as a group on their own. The grouping result then includes four groups: one group includes preselection boxes f1, f2 and f3; one group includes preselection boxes f4, f5 and f8; one group includes preselection box f6; and one group includes preselection box f7. Cluster grouping the four obtained groups yields four preselection box groups.
  • As another example, suppose the image to be processed contains detection objects A, B and C, and the embedding encoding algorithm regresses the attribute feature vectors of preselection boxes f1 to f4, that is, the embedding values a1 to a4. If the vector distance between a1 and a4 is less than the preset threshold, the boxes corresponding to vector a1 and vector a4 are considered to belong to the same detection object among A, B and C. If the vector distances between a1 and a2, between a1 and a3, and between a2 and a3 are not less than the preset threshold, the vectors a1, a2 and a3 are considered to belong to different detection objects. If, furthermore, the vector distances between a2 and a4 and between a3 and a4 are not less than the preset threshold, then the vector a2 belongs to one of the detection objects A, B or C, and the vector a3 belongs to a different one, both also different from the detection object corresponding to vector a1 and vector a4. That is, the grouping result obtained may be: vector a1 and vector a4 belong to A, vector a2 belongs to B, and vector a3 belongs to C.
  • After the group to which each preselection box in the at least one preselection box belongs is determined and at least one preselection box group is obtained, each preselection box group can be deduplicated to obtain the preselection box groups after the deduplication processing, and the target detection frame of each detection object is determined based on the preselection box groups after the deduplication processing.
  • each preselection box group may include a visible box group and a complete box group.
  • In step S208, performing deduplication processing on each preselection box group to obtain the preselection box groups after the deduplication processing includes: performing deduplication processing on the visible frame group in the at least one preselection box group to obtain the visible frame group after the deduplication processing.
  • The visible frame group after the deduplication processing may include a single visible frame or several visible frames.
  • Step S210 determining the target detection frame of each detection object based on the preselected frame group after the deduplication processing includes: determining the target detection frame of each detection object based on the visible frame group after the deduplication processing and the complete frame group .
  • an image to be processed containing one or more detection objects is acquired; then, object detection is performed on the image to be processed to obtain at least one preselected box; then, each of the at least one preselected box is determined At least one preselection box group is obtained from the group to which each preselection box belongs; next, the visible box group in at least one preselection box group is deduplicated to obtain the visible box group after the deduplication processing; finally, based on the deduplication processing The visible frame group and the complete frame group determine the target detection frame of each detection object.
  • the detection objects identified in the embodiments of the present application may be densely present in the image to be processed, resulting in a higher degree of coincidence of the complete frames of the detection objects.
  • The visible frame group after deduplication and the complete frame group without deduplication can be input into an R-CNN model for object detection, and the target detection frame of each detection object can then be obtained. When object detection is performed again, for an occluded object, only the visible frame group or only the complete frame group may be used as the input of the R-CNN model to improve detection efficiency, or the visible frame group and the complete frame group may be used together as the input of the R-CNN model to improve detection accuracy; this embodiment imposes no specific restriction on this.
  • The step of performing deduplication processing on the visible frame group in the at least one preselection box group to obtain the visible frame group after the deduplication processing includes: performing deduplication processing on the visible frame group in the at least one preselection box group by using a non-maximum suppression algorithm to obtain the visible frame group after the deduplication processing.
  • In this embodiment, a non-maximum suppression (NMS) algorithm is used to remove redundant preselection boxes from the preselection box group; by setting the threshold value in the NMS algorithm, the visible frame group in the preselection box group is deduplicated. After the preselection box group of each detection object is obtained, since the complete frames in a complete frame group have a high degree of overlap with one another, the complete frames may be left without deduplication; the NMS algorithm is therefore used to deduplicate only the visible frame group, obtaining the visible frame group after the deduplication processing. That is, in this embodiment, after the preselection box group of a detection object is obtained, if the preselection box group includes a visible frame group and a complete frame group, only the visible frame group of the detection object is deduplicated.
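  • A standard non-maximum suppression routine of the kind that could implement this step is sketched below in plain NumPy; in this embodiment it would be applied only to the visible frames of one preselection box group, and the IoU threshold value is illustrative.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """boxes: (K, 4) visible frames of one group as [x1, y1, x2, y2];
    scores: (K,) confidence of each frame. Returns indices of kept frames."""
    order = scores.argsort()[::-1]          # highest-scoring frame first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept frame with the remaining frames.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Drop frames that overlap the kept frame too much (redundant frames).
        order = order[1:][iou <= iou_thresh]
    return keep
```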
  • In FIG. 3, the first frame 1 and the third frame 3 are the complete frames of the occluding object P and the occluded object Q, respectively. In the prior art, the nms algorithm deduplicates the preselection boxes of all detection objects of the same category together, and cannot distinguish and recognize instances (different detection objects) well. The dashed second frame 2 is the visible frame of the occluded object Q. It can be seen that the overlap between the second frame 2, which encloses the visible part of the occluded object Q, and the first frame 1 of the occluding object P is significantly smaller than the overlap between the third frame 3 and the first frame 1. Therefore, the occluding object P and the occluded object Q can be distinguished by the second frame 2, and the second frame 2 as a visible frame is bound with the third frame 3 as a complete frame into a preselection box group, which avoids the third frame 3 being removed as a redundant frame of the occluding object P during deduplication.
  • the calculation process can be simplified, the calculation speed and calculation accuracy of the R-CNN model can be improved, and a more accurate target detection frame can be obtained.
  • the step of determining the target detection frame of each detection object based on the visible frame group after the deduplication processing and the complete frame group includes:
  • Step S21 performing local feature alignment processing on each visible frame in the visible frame group after the deduplication processing; and performing local feature alignment processing on each complete frame in the complete frame group;
  • Step S22 Input the visible frames after the feature alignment processing and the complete frames after the feature alignment processing into the target detection model for detection processing, to obtain the position coordinates and classification probability values of the visible frames after the feature alignment processing and the position coordinates and classification probability values of the complete frames after the feature alignment processing;
  • Step S23 Determine the target detection frame of each detection object based on the target position coordinates and the target classification probability value, wherein the target position coordinates include: the visible frame position coordinates after the feature alignment processing and/or the feature alignment processing After the position coordinates of the complete frame, the target classification probability value includes: the classification probability value of the visible frame after the feature alignment processing and/or the classification probability value of the complete frame after the feature alignment processing.
  • first, local feature alignment processing is performed on each visible frame in the visible frame group and each complete frame in the complete frame group.
  • the purpose of the local feature alignment processing is to adjust each visible frame in the visible frame group and each complete frame in the complete frame group to the same size.
  • the above-mentioned target detection model can be an R-CNN model.
  • After the local feature alignment processing, the visible frames and complete frames after the alignment processing can be used to determine the target detection frame of the corresponding detection object. The visible frames after the alignment processing and/or the complete frames after the alignment processing are used as the input of the target detection model (such as the R-CNN model). After detection, the visible frames or complete frames belonging to each detection object can be separately fused according to their target position coordinates and target classification probability values; the fused visible frame or the fused complete frame is the target detection frame of the corresponding detection object.
  • For a detection object that is not occluded, its target detection frame is its final complete frame, where the final complete frame is a detection frame obtained by fusing one or more complete frames. For a detection object that is occluded, its target detection frame consists of its final complete frame and its final visible frame, where the final visible frame is a detection frame obtained by fusing one or more visible frames; that is, the complete frames and the visible frames are fused respectively to obtain the final complete frame and the final visible frame.
  • Only the visible frames after the feature alignment processing may be used as the input of the target detection model, or only the complete frames after the feature alignment processing may be used; alternatively, both the visible frames and the complete frames after the feature alignment processing are used as the input of the target detection model, which is not specifically limited in this embodiment.
  • step S23 determining the target detection frame of each detection object based on the target position coordinates and the target classification probability value includes the following steps:
  • Step S231 Use the target classification probability value as the weight of the corresponding target position coordinate
  • Step S232 Calculate a weighted average of the target position coordinates of each detection object according to the target classification probability value to obtain the target detection frame of the detection object; the target detection frame includes the final visible frame and/or the final complete frame.
  • the target position coordinates of the visible frame indicate the corresponding position information of the visible frame in the image to be processed, and the target classification probability value of the visible frame indicates the evaluation of the detection processing result of the visible frame.
  • the target position coordinates of the complete frame indicate the corresponding position information of the complete frame in the image to be processed, and the target classification probability value of the complete frame indicates the evaluation of the detection processing result of the complete frame.
  • The higher the target classification probability value, the better the detection processing result of the visible frame or complete frame, so it should be given a higher weight. The target classification probability values can therefore be used as the weights to calculate a weighted average of the target position coordinates, from which the target detection frame of the object is obtained. The target detection frame obtained by this weighted averaging combines the detection processing evaluation results of every visible frame or complete frame, and its position is closer to the actual position of the detection object.
  • the target detection frame is an accurate visible frame or an accurate complete frame of the final detected object.
  • the precise visible frame is the smallest bounding frame that can accurately describe the maximum visible area of the occluded detection object.
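  • Steps S231 and S232 amount to a probability-weighted average of box coordinates. A minimal NumPy sketch, with illustrative values:

```python
import numpy as np

def fuse_boxes(boxes: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """boxes: (K, 4) target position coordinates of one detection object's
    visible (or complete) frames; probs: (K,) target classification
    probability values used as the weights."""
    weights = probs / probs.sum()
    return (boxes * weights[:, None]).sum(axis=0)   # weighted-average frame

# e.g. two complete frames of the same object fused into the final frame:
boxes = np.array([[10.0, 20.0, 110.0, 220.0],
                  [12.0, 18.0, 108.0, 224.0]])
probs = np.array([0.9, 0.6])
final_complete_frame = fuse_boxes(boxes, probs)
```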
  • performing local feature alignment processing on each visible frame in the visible frame group after the de-duplication processing includes the following steps:
  • Step S31 selecting a first target feature map in the feature pyramid
  • Step S32 Perform feature cropping on the first target feature map in the feature pyramid based on each visible frame in the visible frame group after the deduplication processing to obtain a first cropping result; Step S33 Perform local feature alignment processing on the first cropping result.
  • The first target feature map refers to the feature map in the feature pyramid corresponding to the visible frames in the visible frame group. The feature pyramid contains feature maps of different scales, obtained by scaling the image to be processed in different proportions through the pyramid network. During cropping, a visible frame can be scaled according to the scaling ratio of the first target feature map relative to the image to be processed, the position of the scaled visible frame in the first target feature map is determined, and the feature and its position information at that position in the first target feature map are obtained as the first cropping result.
  • In a specific implementation, the ROI Align module in Mask R-CNN can be used to crop the features corresponding to the visible frames, and the R-CNN model can then be used to perform further local feature alignment processing on the first cropping result.
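  • As a concrete illustration, torchvision's roi_align operator (the ROI Align of Mask R-CNN) can perform this cropping and alignment; the feature-map size, the 1/8 scaling ratio, and the 7x7 output size below are assumptions for the example.

```python
import torch
from torchvision.ops import roi_align

# One level of the feature pyramid: (batch, channels, H, W); assumed here
# to be 1/8 the resolution of the image to be processed.
feature_map = torch.randn(1, 256, 100, 152)

# Visible frames in image coordinates, each prefixed with its batch index:
# [batch_idx, x1, y1, x2, y2].
visible_frames = torch.tensor([[0.0, 60.0, 20.0, 110.0, 220.0]])

# Crop each frame from the feature map and align it to one fixed size,
# which is the purpose of the local feature alignment step.
aligned = roi_align(feature_map, visible_frames, output_size=(7, 7),
                    spatial_scale=1.0 / 8, aligned=True)
print(aligned.shape)  # torch.Size([1, 256, 7, 7])
```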
  • performing local feature alignment processing on each complete frame in the complete frame group includes the following steps:
  • Step S41 selecting a second target feature map in the feature pyramid
  • Step S42 performing feature cropping on the second target feature map in the feature pyramid based on each complete frame in the complete frame group to obtain a second cropping result
  • Step S43 Perform local feature alignment processing on the second cropping result.
  • The second target feature map refers to the feature map in the feature pyramid corresponding to the complete frames in the complete frame group. The feature pyramid contains feature maps of different scales, obtained by scaling the image to be processed in different proportions. During cropping, a complete frame is scaled according to the scaling ratio of the second target feature map relative to the image to be processed, the position of the scaled complete frame in the second target feature map is determined, and the feature and its position information at that position in the second target feature map are obtained as the second cropping result.
  • In a specific implementation, the ROI Align module in Mask R-CNN can be used to crop the features corresponding to the complete frames, and the R-CNN model can then be used to perform further local feature alignment processing on the second cropping result.
  • In summary, the method provided by the embodiments of this application can distinguish and recognize detection objects well. The visible frame and the complete frame are both used as regression targets in the RPN stage, and a hidden variable (the embedding value) is regressed to distinguish instances, so that not only the preselection boxes of objects of different categories but also the preselection boxes of different detection objects are distinguished. R-CNN is then used to regress the deduplicated results, and the regression results of each detection object are fused to obtain the final detection result, thereby realizing the recognition of occluded objects under dense occlusion and avoiding missed detection of occluded objects.
  • the embodiment of the present application also provides an object detection device, which is mainly configured to implement the object detection method provided in the above-mentioned content of the embodiment of the present application.
  • the object detection device provided by the embodiment of the present application will be specifically introduced below.
  • Fig. 5 is a schematic diagram of an object detection device according to an embodiment of the present application.
  • As shown in FIG. 5, the object detection device mainly includes an image acquisition unit 10, a preselection box acquisition unit 20, a grouping unit 30, a deduplication unit 40, and a determining unit 50, where:
  • the image acquisition unit 10 is configured to acquire a to-be-processed image containing one or more detection objects;
  • the pre-selected frame obtaining unit 20 is configured to perform object detection on the image to be processed to obtain at least one pre-selected frame, wherein the pre-selected frame includes a visible frame and/or a complete frame, and the complete frame is an overall detection object A bounding frame of, where the visible frame is a bounding frame of the visible area of each detection object in the image to be processed;
  • the grouping unit 30 is configured to determine the group to which each preselection box in the at least one preselection box belongs through the association modeling model to obtain at least one preselection box group; the preselection boxes in the same preselection box group belong to the same detection object;
  • the deduplication unit 40 is configured to perform deduplication processing on each preselection box group to obtain the preselection box group after the deduplication processing;
  • the determining unit 50 is configured to determine the target detection frame of each detection object based on the preselected frame group after the deduplication processing.
  • In the embodiment of the present application, the image to be processed containing one or more detection objects is first acquired, and then object detection is performed on the image to be processed to obtain at least one preselection box. Next, the group to which each preselection box in the at least one preselection box belongs is determined to obtain at least one preselection box group; redundant preselection boxes are removed by deduplicating each preselection box group to obtain the preselection box groups after deduplication, so that the target detection frame of each detection object is determined based on the preselection box groups after the deduplication processing, thereby realizing the detection of one or more detection objects in the image to be processed and effectively avoiding missed detection of the detection objects.
  • each of the preselection box groups includes a visible frame group and a complete frame group; the deduplication unit 40 is further configured to perform deduplication processing on the visible frame group in the at least one preselection box group to obtain the visible frame group after the deduplication processing; determining the target detection frame of each detection object based on the preselection box groups after the deduplication processing includes: determining the target detection frame of each detection object based on the visible frame group after the deduplication processing and the complete frame group.
  • the preselection frame acquisition unit 20 is further configured to: input the image to be processed into a feature pyramid network for processing to obtain a feature pyramid; and use the region proposal network RPN model to process the feature pyramid to obtain the at least one preselection box, wherein each preselection box in the at least one preselection box carries an attribute label configured to determine the type of each preselection box, and the type includes a complete frame and a visible frame.
  • the grouping unit 30 determining, through the relevance modeling model, the group to which each preselection box in the at least one preselection box belongs to obtain at least one preselection box group includes: obtaining the attribute feature vector of each preselection box in the at least one preselection box through the instance attribute feature projection network of the relevance modeling model; and determining, through the clustering module of the relevance modeling model, the group to which each preselection box in the at least one preselection box belongs based on the attribute feature vector of each preselection box, to obtain the at least one preselection box group.
  • the instance attribute feature projection network is obtained through training with the Lpull loss function and the Lpush loss function, wherein the Lpull loss function pulls together the attribute feature vectors of preselection boxes belonging to the same detection object, and the Lpush loss function pushes apart the attribute feature vectors of preselection boxes belonging to different detection objects.
  • the grouping unit 30 calculates, through the clustering module of the relevance modeling model, the vector distance value between any two of the attribute feature vectors to obtain multiple vector distance values; any two preselection boxes whose vector distance value is smaller than a preset threshold are added to the same group, and every other preselection box not added to a group is separately regarded as a group; the resulting groups are then clustered by a clustering algorithm to obtain the at least one preselection box group.
  • the deduplication unit 40 is further configured to perform deduplication processing on the visible frame group in the at least one preselection box group by using a non-maximum suppression algorithm, to obtain the visible frame group after the deduplication processing.
  • the determining unit 50 is further configured to: perform local feature alignment processing on each visible frame in the visible frame group after the deduplication processing, and perform local feature alignment processing on each complete frame in the complete frame group; input the visible frames after the feature alignment processing and the complete frames after the feature alignment processing into the target detection model for detection processing, to obtain the position coordinates and classification probability values of the visible frames after the feature alignment processing and the position coordinates and classification probability values of the complete frames after the feature alignment processing; and determine the target detection frame of each detection object based on target position coordinates and target classification probability values, wherein the target position coordinates include the position coordinates of the visible frames after the feature alignment processing and/or the position coordinates of the complete frames after the feature alignment processing, and the target classification probability values include the classification probability values of the visible frames after the feature alignment processing and/or the classification probability values of the complete frames after the feature alignment processing.
  • the determining unit 50 is further configured to: use the target classification probability value as the weight of the corresponding target position coordinates, and calculate a weighted average of the target position coordinates of each detection object according to the target classification probability values to obtain the target detection frame of the detection object (a minimal sketch of this weighted fusion is given after this list); the target detection frame includes the final visible frame and/or the final complete frame.
  • the feature pyramid includes a plurality of feature maps, and the determining unit 50 is further configured to: select a first target feature map in the feature pyramid; perform feature cropping on the first target feature map in the feature pyramid based on each visible frame in the visible frame group after the deduplication processing to obtain a first cropping result; and perform local feature alignment processing on the first cropping result.
  • the feature pyramid includes a plurality of feature maps, and the determining unit 50 being further configured to perform local feature alignment processing on each complete frame in the complete frame group includes: selecting a second target feature map in the feature pyramid; performing feature cropping on the second target feature map in the feature pyramid based on each complete frame in the complete frame group to obtain a second cropping result; and performing local feature alignment processing on the second cropping result.
  • the terms "installed", "connected", and "coupled" should be interpreted broadly; for example, a connection may be fixed, detachable, or integral; it may be a mechanical connection or an electrical connection; and it may be a direct connection, an indirect connection through an intermediate medium, or internal communication between two components.
  • the computer program product of the object detection method includes a computer-readable storage medium storing non-volatile program code executable by a processor, and the instructions included in the program code can be configured to execute the method described in the foregoing method embodiments.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some communication interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a non-volatile computer-readable storage medium executable by a processor; the stored software product can then instruct a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.
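To make the weighted merging performed by the determining unit 50 concrete, the following is a minimal sketch of the fusion step, assuming each box is an (x1, y1, x2, y2) row carrying the classification probability value output by the target detection model; the function and variable names are illustrative, not part of the original filing.

```python
import numpy as np

def fuse_boxes(boxes: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Fuse the regressed boxes of one detection object into its target
    detection frame: each classification probability value is used as the
    weight of the corresponding position coordinates, and the weighted
    average of the coordinates is returned.

    boxes:  (K, 4) array of [x1, y1, x2, y2] rows for one object.
    scores: (K,) array of classification probability values.
    """
    weights = scores / scores.sum()                 # normalize the weights
    return (boxes * weights[:, None]).sum(axis=0)   # weighted average per coordinate

# e.g. the final visible frame of one detection object:
# final_visible = fuse_boxes(visible_boxes, visible_scores)
```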

Abstract

The present application relates to the technical field of image recognition and provided thereby are an object detection method and apparatus, and an electronic device. The method comprises: obtaining an image to be processed that comprises one or more detection objects; performing object detection on the image to be processed to obtain at least one pre-selection box, the pre-selection box comprising a visible box and/or a complete box, the complete box being a bounding box for an entire detection object, and the visible box being a bounding box of each detection object in a visible region in the image to be processed; by means of a correlation modeling model, determining a group to which each pre-selection box among the at least one pre-selection box belongs so as to obtain at least one pre-selection box group, pre-selection boxes in the same pre-selection box group belonging to the same detection object; performing de-duplication processing on each pre-selection box group to obtain a pre-selection box group after de-duplication processing; and determining a target detection box of each detection object on the basis of the pre-selection box group after de-duplication processing. In the present application, the missed detection of a detection object can be effectively avoided.

Description

Object detection method, device and electronic equipment
Cross-reference to related applications
This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on March 12, 2019, with application number CN201910186133.5 and titled "Object detection method, device and electronic equipment", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the technical field of image processing, and in particular to an object detection method, device and electronic equipment.
Background
Object detection is one of the classic problems in computer vision: its task is to mark the position of each object in an image with a bounding box and to give the object category. From the traditional framework of hand-designed features plus a shallow classifier to end-to-end detection frameworks based on deep learning, object detection has become increasingly mature. At present, when multiple objects, especially objects of the same category, appear densely and occlude one another, existing object detection algorithms only consider object detection at the category level, so the prior art cannot perform accurate object detection under occlusion. When objects occlude each other, prior-art methods often fail to effectively distinguish the occluded object from the occluding object, resulting in missed detection of the occluded object.
Summary of the invention
In view of this, the purpose of this application is to provide an object detection method, device, and electronic equipment, so as to alleviate the technical problem in the prior art that objects of the same category are prone to missed detection when object detection is performed under dense occlusion.
In a first aspect, an embodiment of the present application provides an object detection method, including: acquiring an image to be processed containing one or more detection objects; performing object detection on the image to be processed to obtain at least one preselection box, where the preselection box includes a visible frame and/or a complete frame, the complete frame is a bounding frame of a detection object as a whole, and the visible frame is a bounding frame of the visible area of each detection object in the image to be processed; determining, through an association modeling model, the group to which each preselection box in the at least one preselection box belongs, to obtain at least one preselection box group, where preselection boxes in the same preselection box group belong to the same detection object; performing deduplication processing on each preselection box group to obtain the preselection box groups after the deduplication processing; and determining the target detection frame of each detection object based on the preselection box groups after the deduplication processing.
Further, determining, through the association modeling model, the group to which each preselection box in the at least one preselection box belongs to obtain at least one preselection box group includes: obtaining the attribute feature vector of each preselection box in the at least one preselection box through the instance attribute feature projection network of the association modeling model; and determining, through the clustering module of the association modeling model, the group to which each preselection box in the at least one preselection box belongs based on the attribute feature vector of each preselection box, to obtain the at least one preselection box group.
Further, the instance attribute feature projection network is obtained through training with an Lpull loss function and an Lpush loss function, where the Lpull loss function pulls together the attribute feature vectors of preselection boxes belonging to the same detection object, and the Lpush loss function pushes apart the attribute feature vectors of preselection boxes belonging to different detection objects.
Further, determining, through the clustering module of the association modeling model, the group to which each preselection box in the at least one preselection box belongs based on the attribute feature vector of each preselection box to obtain the at least one preselection box group includes: calculating the vector distance value between any two of the attribute feature vectors to obtain multiple vector distance values; adding any two preselection boxes whose vector distance value is smaller than a preset threshold to the same group, with every other preselection box not added to a group separately regarded as a group; and clustering the resulting groups by a clustering algorithm to obtain the at least one preselection box group.
Further, each of the preselection box groups includes a visible frame group and a complete frame group; performing deduplication processing on each preselection box group to obtain the preselection box groups after the deduplication processing includes: performing deduplication processing on the visible frame group in the at least one preselection box group to obtain the visible frame group after the deduplication processing; and determining the target detection frame of each detection object based on the preselection box groups after the deduplication processing includes: determining the target detection frame of each detection object based on the visible frame group after the deduplication processing and the complete frame group.
Further, performing deduplication processing on the visible frame group in the at least one preselection box group to obtain the visible frame group after the deduplication processing includes: performing deduplication processing on the visible frame group in the at least one preselection box group by using a non-maximum suppression algorithm to obtain the visible frame group after the deduplication processing.
Further, determining the target detection frame of each detection object based on the visible frame group after the deduplication processing and the complete frame group includes: performing local feature alignment processing on each visible frame in the visible frame group after the deduplication processing, and performing local feature alignment processing on each complete frame in the complete frame group; inputting the visible frames after the feature alignment processing and the complete frames after the feature alignment processing into a target detection model for detection processing, to obtain the position coordinates and classification probability values of the visible frames after the feature alignment processing and the position coordinates and classification probability values of the complete frames after the feature alignment processing; and determining the target detection frame of each detection object based on target position coordinates and target classification probability values, where the target position coordinates include the position coordinates of the visible frames after the feature alignment processing and/or the position coordinates of the complete frames after the feature alignment processing, and the target classification probability values include the classification probability values of the visible frames after the feature alignment processing and/or the classification probability values of the complete frames after the feature alignment processing.
Further, determining the target detection frame of each detection object based on the target position coordinates and the target classification probability values includes: using the target classification probability value as the weight of the corresponding target position coordinates; and calculating a weighted average of the target position coordinates of each detection object according to the target classification probability values to obtain the target detection frame of the detection object, where the target detection frame includes a target visible frame and/or a target complete frame.
Further, performing object detection on the image to be processed to obtain at least one preselection box includes: inputting the image to be processed into a feature pyramid network for processing to obtain a feature pyramid; and processing the feature pyramid by using a region proposal network (RPN) model to obtain the at least one preselection box, where each preselection box in the at least one preselection box carries an attribute label, the attribute label is configured to determine the type of each preselection box, and the type includes a complete frame and a visible frame.
In a second aspect, an embodiment of the present application also provides an object detection device, including: an image acquisition unit configured to acquire an image to be processed containing one or more detection objects; a preselection box acquisition unit configured to perform object detection on the image to be processed to obtain at least one preselection box, where the preselection box includes a visible frame and/or a complete frame, the complete frame is a bounding frame of a detection object as a whole, and the visible frame is a bounding frame of the visible area of each detection object in the image to be processed; a grouping unit configured to determine, through an association modeling model, the group to which each preselection box in the at least one preselection box belongs, to obtain at least one preselection box group, where preselection boxes in the same preselection box group belong to the same detection object; a deduplication unit configured to perform deduplication processing on each preselection box group to obtain the preselection box groups after the deduplication processing; and a determining unit configured to determine the target detection frame of each detection object based on the preselection box groups after the deduplication processing.
In a third aspect, an embodiment of the present application also provides an electronic device, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor implements the steps of the above method when executing the computer program.
In a fourth aspect, an embodiment of the present application also provides a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to execute the above method.
In the embodiment of the present application, an image to be processed containing one or more detection objects is first acquired; object detection is then performed on the image to be processed to obtain at least one preselection box; next, the group to which each preselection box in the at least one preselection box belongs is determined to obtain at least one preselection box group. Since preselection boxes in the same preselection box group belong to the same detection object, preselection boxes belonging to different detection objects are separated by the preselection box groups, which prevents the preselection box of an occluded object from being removed as a redundant preselection box of the occluding object during deduplication. This alleviates the technical problem in the prior art that objects of the same category are prone to missed detection when object detection is performed under dense occlusion, realizes the detection of one or more detection objects in the image to be processed, and effectively avoids missed detection of the detection objects.
At the same time, the at least one preselection box group is determined through an association modeling model implemented by a neural network. After the at least one preselection box is input into the association modeling model, the feature information of the image inside each preselection box and the position information of the preselection box are fully utilized to group the preselection boxes, so that preselection boxes of different detection objects can be effectively distinguished. In particular, in densely occluded scenes where the complete frames of the occluding object and the occluded object overlap heavily, preselection boxes that are adjacent in position and similar in size but belong to different detection objects can still be grouped accurately.
Other features and advantages of the present disclosure will be described in the following specification; alternatively, some features and advantages can be inferred from the specification or determined without doubt, or can be learned by implementing the above-mentioned technology of the present disclosure.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Description of the drawings
In order to explain the specific embodiments of this application or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the application, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
Fig. 1 is a schematic diagram of an electronic device according to an embodiment of the present application;
Fig. 2 is a flowchart of an object detection method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of visible frames and complete frames of densely occluded objects of the same category according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the correspondence between preselection boxes and detection objects according to an embodiment of the present application;
Fig. 5 is a schematic diagram of an object detection device according to an embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are part of the embodiments of the present application rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
Embodiment 1:
First, an example electronic device 100 configured to implement the object detection method of an embodiment of the present application is described with reference to Fig. 1.
As shown in Fig. 1, the electronic device 100 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, and a camera 110; these components are interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are only exemplary rather than restrictive, and the electronic device may have other components and structures as required.
The processor 102 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), and an application-specific integrated circuit (ASIC). The processor 102 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functions (implemented by the processor) in the embodiments of the present application described below and/or other desired functions. Various application programs and various data, such as data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, and a touch screen.
The output device 108 may output various information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a speaker, and the like.
The camera 110 is configured to acquire an image to be processed; the image acquired by the camera is processed by the object detection method to obtain the target detection frames of the detection objects. For example, the camera can capture an image desired by the user (such as a photo or a video), and the image is then processed by the object detection method to obtain the target detection frames of the detection objects; the camera may also store the captured image in the storage device 104 for use by other components.
Exemplarily, the example electronic device configured to implement the object detection method according to the embodiment of the present application may be implemented as a mobile terminal such as a smart phone or a tablet computer.
Embodiment 2:
According to the embodiments of the present application, an embodiment of an object detection method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one here.
Fig. 2 is a flowchart of an object detection method according to an embodiment of the present application. As shown in Fig. 2, the method includes the following steps:
Step S202: acquire an image to be processed containing one or more detection objects.
In the embodiment of the present application, the image to be processed may include multiple categories of detection objects, for example, humans and non-humans, where non-humans include dynamic objects and static objects: a dynamic object may be an animal, and a static object may be any object at rest other than humans and animals.
Each image to be processed may contain multiple categories of objects, and there may be one or more objects of each category; for example, an image may contain 2 people and 3 dogs. The objects in the image to be processed may be displayed independently of one another, or some of them may be occluded by other objects and therefore not fully displayed.
It should be noted that the detection objects may be objects of one or more categories in the image to be processed on which the object detection steps are to be performed. The user can determine the categories of the detection objects according to actual needs, which is not specifically limited in this embodiment.
It should be further noted that, in this embodiment, the image to be processed may be an image captured by the camera of the electronic device in Embodiment 1, or an image pre-stored in the storage device of the electronic device, which is not specifically limited in this embodiment.
Step S204: perform object detection on the image to be processed to obtain at least one preselection box, where the preselection box includes a visible frame and/or a complete frame, the complete frame is a bounding frame of a detection object as a whole, and the visible frame is a bounding frame of the visible area of each detection object in the image to be processed.
In the embodiment of the present application, after the image to be processed is acquired, object detection can be performed on it through a preselection box detection network. Object detection on the image to be processed may consist of detecting the unoccluded detection objects and outputting complete frames; it may also consist of detecting the occluded objects in the image and outputting both complete frames and visible frames.
Multiple visible frames or multiple complete frames may be generated for the same detection object, and different visible frames or different complete frames may be scaled by different ratios relative to the image to be processed.
Step S206: determine, through the association modeling model, the group to which each preselection box in the at least one preselection box belongs, to obtain at least one preselection box group; preselection boxes in the same preselection box group belong to the same detection object.
In the embodiment of the present application, after object detection, multiple preselection boxes are generated for the different detection objects, where the preselection boxes include visible frames and/or complete frames. Usually, the preselection boxes contained in the detection result are redundant and require deduplication. To prevent the preselection box of an occluded object from being removed as a redundant preselection box of the occluding object during deduplication, the group to which each preselection box belongs needs to be determined; one preselection box group is obtained per group, so at least one preselection box group can be obtained, and preselection boxes belonging to different detection objects are thereby separated. The association modeling model is a model capable of obtaining the association relationships of the input data and can be implemented by a neural network; after the at least one preselection box is input into the association modeling model, the model effectively groups the preselection boxes based on the feature information of the image inside each preselection box combined with the position information of the preselection box.
By grouping the at least one preselection box in the above manner, preselection boxes belonging to the same detection object can be combined into one preselection box group; since the preselection box group of one detection object may include both visible frames and complete frames, the preselection box group of that detection object may also include a visible frame group and a complete frame group at the same time.
It should be noted that Fig. 4 is a schematic diagram of the correspondence between preselection boxes and detection objects. In the figure, the detection objects include an occluding object P and an occluded object Q occluded by P, and the preselection boxes include box 7 to box 12. Box 7, box 8, and box 9 all belong to the occluding object P, while box 10, box 11, and box 12 all belong to the occluded object Q. Box 7, box 8, and box 9 form one preselection box group, and box 10, box 11, and box 12 form another preselection box group.
After the group to which each of boxes 7 to 12 belongs is determined and the preselection box groups are obtained, deduplication can be performed separately on the preselection boxes in each group. This prevents confusion between the frames of different objects when the preselection boxes of different objects overlap heavily, and prevents a preselection box of the occluded object Q (for example, box 10) from being removed as a redundant preselection box of the occluding object P during deduplication, which greatly reduces the probability of missed detection of occluded objects.
Step S208: perform deduplication processing on each preselection box group to obtain the preselection box groups after the deduplication processing.
In the embodiment of the present application, after the object to which each preselection box belongs is determined, deduplication is performed separately on the preselection box group of each detection object. Deduplicating group by group prevents the preselection boxes of different detection objects from being confused with one another; specifically, it prevents the preselection box of an occluded object from being removed as a redundant preselection box of the occluding object during deduplication, thereby avoiding missed detection of occluded objects.
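Since the embodiment names non-maximum suppression as one concrete deduplication algorithm, the per-group deduplication can be sketched as follows; the data layout and names are assumptions for illustration:

```python
import torch
from torchvision.ops import nms  # standard IoU-based non-maximum suppression

def deduplicate_groups(groups, iou_threshold: float = 0.5):
    """Run NMS independently inside each preselection box group, so a box of an
    occluded object can never be suppressed by a box of the occluding object.

    groups: list of (boxes, scores) pairs, one pair per detection object, with
            boxes as float tensors of shape (K, 4) and scores of shape (K,).
    """
    kept = []
    for boxes, scores in groups:
        keep = nms(boxes, scores, iou_threshold)  # indices of retained boxes
        kept.append((boxes[keep], scores[keep]))
    return kept
```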
Step S210: determine the target detection frame of each detection object based on the preselection box groups after the deduplication processing.
In the embodiment of the present application, after the preselection box groups after the deduplication processing are obtained, the target detection frame of each detection object can be determined based on them. If a detection object is not occluded in the image to be processed, its target detection frame includes a target complete frame; if a detection object is occluded, its target detection frame includes both a target complete frame and a target visible frame. The target complete frame can be configured to obtain the position information of the detection object, as well as the image feature information of detection objects other than the occluded object; the target visible frame can be configured to obtain the image feature information of the occluded object. Since the embodiment of the present application can obtain these two types of target detection frames, more comprehensive and accurate information about the detection objects can be obtained for subsequent image processing such as recognition and verification.
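Putting steps S202 to S210 together, the overall flow could be sketched as follows. Everything here is a hypothetical outline: backbone, rpn, embed_net, and rcnn_head stand in for the trained networks of this embodiment, and group_by_embedding, deduplicate_groups, and fuse_boxes refer to the per-step sketches given elsewhere in this description.

```python
def detect(image, backbone, rpn, embed_net, rcnn_head,
           dist_threshold: float = 0.5, iou_threshold: float = 0.5):
    """Hypothetical glue code for steps S202 to S210 of Fig. 2."""
    pyramid = backbone(image)                # S204: build the feature pyramid
    boxes, labels, scores = rpn(pyramid)     # preselection boxes with visible/complete labels
    embeddings = embed_net(pyramid, boxes)   # S206: one embedding vector per box
    groups = group_by_embedding(embeddings, dist_threshold)
    deduped = deduplicate_groups(            # S208: deduplicate inside each group
        [(boxes[g], scores[g]) for g in groups], iou_threshold)
    # S210: regress the kept boxes with the R-CNN head, then fuse each object's
    # boxes into its final target detection frame (see fuse_boxes).
    return [rcnn_head(pyramid, kept_boxes) for kept_boxes, _ in deduped]
```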
In the embodiment of the present application, the above steps S202 to S210 may be executed by the processor of the electronic device in Embodiment 1.
It should be noted that any processor capable of executing the above steps S202 to S210 can be applied in the embodiments of the present application, and this is not specifically limited.
In the embodiment of the present application, an image to be processed containing one or more detection objects is first acquired; object detection is then performed on the image to be processed to obtain at least one preselection box; next, the group to which each preselection box in the at least one preselection box belongs is determined to obtain at least one preselection box group. Since preselection boxes in the same preselection box group belong to the same detection object, preselection boxes belonging to different detection objects are separated by the preselection box groups, which prevents the preselection box of an occluded object from being removed as a redundant preselection box of the occluding object during deduplication. This alleviates the technical problem in the prior art that objects of the same category are prone to missed detection under dense occlusion, realizes the detection of one or more detection objects in the image to be processed, and effectively avoids missed detection of the detection objects.
In addition, in densely occluded scenes, the complete frames of the occluding object and the occluded object overlap heavily, so the complete frames of different detection objects cannot be effectively distinguished by position and size information alone: the grouping effect is poor, and the complete frames cannot be effectively deduplicated. In the embodiment of the present application, the association modeling model is implemented by a neural network; after the at least one preselection box is input into the model, the feature information of the image inside each preselection box and the position information of the preselection box are effectively used to group the preselection boxes, so that preselection boxes of different detection objects can be effectively distinguished. In particular, in densely occluded scenes where the complete frames of the occluding and occluded objects overlap heavily, preselection boxes that are adjacent in position and similar in size but belong to different detection objects can still be grouped accurately.
The embodiments of the present application are described in detail below in conjunction with specific implementations.
As can be seen from the above description, in this embodiment, the image to be processed containing one or more detection objects is first acquired; object detection can then be performed on the image to be processed to obtain at least one preselection box.
In an optional implementation, step S204, performing object detection on the image to be processed to obtain at least one preselection box, includes the following steps:
Step S2041: input the image to be processed into a feature pyramid network for processing to obtain a feature pyramid;
Step S2042: process the feature pyramid by using a region proposal network (RPN, Region Proposal Networks) model to obtain the at least one preselection box, where each preselection box in the at least one preselection box carries an attribute label, the attribute label is configured to determine the type of each preselection box, and the type includes a complete frame and a visible frame.
As can be seen from the above description, in the embodiment of the present application, the feature pyramid network is configured to generate the feature pyramid. A basic network model such as the VGG (Visual Geometry Group) 16 model, ResNet, or FPN (Feature Pyramid Networks) can be selected as the feature pyramid network. In this embodiment, the image to be processed can be input into the feature pyramid network for processing to obtain the feature pyramid.
Before the feature pyramid is processed by the region proposal network RPN model, the RPN model needs to be trained on a preset training set; in this embodiment, the basic network model (for example, FPN) can be trained together with the RPN model. The preset training set includes multiple training samples, and each training sample includes a training image and its corresponding image label, where the image label is configured to mark the type of each preselection box in the training image, the type being a complete frame or a visible frame. In this application, multiple training samples can be used to train the RPN model so that it can recognize and mark the preselection box types in an image.
After the basic network model and the region proposal network RPN model are trained with the above preset training set, the trained RPN model can be used to process the feature pyramid to obtain the at least one preselection box and the attribute label of each preselection box, the attribute label being configured to indicate whether the preselection box is a visible frame or a complete frame.
Specifically, the attribute label may be expressed as "1" or "2"; for example, "1" indicates that the preselection box is a visible frame, and "2" indicates that it is a complete frame. Besides "1" and "2", other machine-recognizable data can also be selected as the attribute label, which is not specifically limited in this embodiment.
In this embodiment, processing the feature pyramid through the region proposal network RPN model yields more accurate preselection box detection results.
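For concreteness, the two ingredients of step S204 could be assembled as below; the channel sizes and the Proposal record are illustrative assumptions, and torchvision's FeaturePyramidNetwork is used here only as one readily available FPN implementation, not necessarily the one in the original filing:

```python
import torch
from collections import OrderedDict
from dataclasses import dataclass
from torchvision.ops import FeaturePyramidNetwork

# Build a feature pyramid from backbone feature maps (channel sizes assumed
# to follow a ResNet-style backbone).
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048], out_channels=256)
backbone_features = OrderedDict([
    ("c2", torch.randn(1, 256, 200, 304)),
    ("c3", torch.randn(1, 512, 100, 152)),
    ("c4", torch.randn(1, 1024, 50, 76)),
    ("c5", torch.randn(1, 2048, 25, 38)),
])
pyramid = fpn(backbone_features)  # OrderedDict: one feature map per pyramid level

VISIBLE, COMPLETE = 1, 2  # the "1"/"2" attribute labels from the example above

@dataclass
class Proposal:
    box: tuple    # (x1, y1, x2, y2) in image coordinates
    label: int    # VISIBLE or COMPLETE
    score: float  # RPN objectness score

# A trained RPN head would turn `pyramid` into a list of such labeled proposals.
```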
After more accurate preselection box detection results are obtained, the group to which each preselection box in the at least one preselection box belongs can be determined to obtain at least one preselection box group.
In an optional implementation, step S206, determining, through the association modeling model, the group to which each preselection box in the at least one preselection box belongs to obtain at least one preselection box group, includes the following steps:
Step S11: obtain the attribute feature vector of each preselection box in the at least one preselection box through the instance attribute feature projection network of the association modeling model;
Step S12: determine, through the clustering module of the association modeling model, the group to which each preselection box in the at least one preselection box belongs based on the attribute feature vector of each preselection box, to obtain the at least one preselection box group.
In the embodiment of the present application, the association modeling model may be an associative embedding (Associate embedding) model, and the instance attribute feature projection network in the association modeling model may be an embedding encoding network. The at least one preselection box is input into the embedding encoding network of the association modeling model, which regresses a corresponding attribute feature vector for each preselection box, one attribute feature vector per preselection box. The clustering module then assigns the preselection boxes of the same detection object to the same group according to the attribute feature vectors, with different groups corresponding to different detection objects.
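As an illustrative sketch of such an instance attribute feature projection network (the layer sizes, pooling resolution, and embedding dimension N are assumptions, not the exact architecture of this embodiment), the head can crop each preselection box from a feature map and project it to an N-dimensional embedding value:

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class EmbeddingHead(nn.Module):
    """Crops each preselection box from a feature map with RoI Align and
    projects the crop to an N-dimensional attribute feature vector."""

    def __init__(self, in_channels: int = 256, embed_dim: int = 8, pool: int = 7):
        super().__init__()
        self.pool = pool
        self.project = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * pool * pool, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, embed_dim),  # one vector [a1, ..., aN] per box
        )

    def forward(self, feature_map, boxes_per_image, spatial_scale):
        # boxes_per_image: list with one (K, 4) tensor of (x1, y1, x2, y2)
        # boxes in image coordinates per image in the batch.
        crops = roi_align(feature_map, boxes_per_image, output_size=self.pool,
                          spatial_scale=spatial_scale, aligned=True)
        return self.project(crops)  # (total_boxes, embed_dim)
```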
Before the associative embedding model is used to determine the group to which each preselection box belongs, the embedding encoding network in the associative embedding model also needs to be trained, so as to determine what kind of attribute feature vector the network outputs. During training, the constraint on the attribute feature vectors is the distance between them, which may be a Euclidean distance, a cosine distance, or the like. A first constraint pulls together the attribute feature vectors of preselection boxes belonging to the same detection object, so that such boxes can be added to the same group via their attribute feature vectors; a second constraint pushes apart the attribute feature vectors of preselection boxes belonging to different detection objects, so that such boxes are added to different groups. Specifically, the first constraint may be the Lpull loss function and the second constraint may be the Lpush loss function. The embedding encoding network may first be trained with the Lpull loss function to shorten distances and then with the Lpush loss function to enlarge distances; alternatively, the Lpull and Lpush loss functions may be used simultaneously to train the embedding encoding network.
It should be noted that the above Lpull loss function takes a form such as the following (the formula images of the original filing are reconstructed here from the variable definitions given alongside them, following the usual associative-embedding formulation):

$$L_{pull}=\frac{1}{M}\sum_{m}\frac{1}{C_{m}}\sum_{e_{k},\,e_{j}\in m}\left\|e_{k}-e_{j}\right\|^{2}$$

where M is the number of attribute feature vectors, e_k and e_j both denote arbitrary attribute feature vectors, and C_m denotes the number of attribute feature vectors corresponding to the corresponding detection object. The above Lpush loss function takes a form such as:

$$L_{push}=\frac{1}{M}\sum_{m\neq m'}\ \sum_{e_{k}\in m,\,e_{j}\in m'}\max\left(0,\ \Delta-\left\|e_{k}-e_{j}\right\|\right)$$

where M is the number of attribute feature vectors, e_k and e_j both denote arbitrary attribute feature vectors, and Δ denotes a preset distance value.
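Under the same hedged reading of the two losses, a pairwise implementation could look like the following sketch (the margin value and grouping of embeddings per object are assumptions):

```python
import torch

def pull_push_losses(groups, delta: float = 1.0):
    """Pairwise sketch of the Lpull/Lpush objectives described above: Lpull
    shortens the distance between embedding vectors of the same detection
    object, and Lpush enlarges the distance between embedding vectors of
    different detection objects up to the margin `delta`.

    groups: list of (C_m, N) tensors, one tensor per detection object.
    """
    # Lpull: mean squared pairwise distance within each object's group.
    l_pull = sum(torch.cdist(e, e).pow(2).mean() for e in groups) / len(groups)

    # Lpush: hinge on pairwise distances across different objects' groups.
    l_push = groups[0].new_zeros(())
    pairs = 0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            d = torch.cdist(groups[i], groups[j])  # cross-object distances
            l_push = l_push + torch.relu(delta - d).mean()
            pairs += 1
    return l_pull, (l_push / pairs if pairs else l_push)
```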
embedding encoding网络训练完成,并在通过区域候选网络RPN模型得到预选框之后,使用embedding encoding网络获得各个预选框的属性特征向量,即得到embedding value(嵌入值)。embedding value可以为N维向量,对每个预选框得到一个N维向量,该N维向量可以表示为:[a 1,a 2,…,a N]。 The embedding encoding network training is completed, and after the preselection box is obtained through the regional candidate network RPN model, the embedding encoding network is used to obtain the attribute feature vector of each preselection box, that is, the embedding value (embedding value) is obtained. The embedding value can be an N-dimensional vector, and an N-dimensional vector is obtained for each preselection box. The N-dimensional vector can be expressed as: [a 1 , a 2 ,..., a N ].
In the embodiments of the present application, the purpose of obtaining the attribute feature vectors is to distinguish different object instances (that is, detection objects) within the preselection boxes. The feature vector therefore needs instance-level discrimination ability, the ability to distinguish each individual detection object, rather than merely category-level discrimination ability (distinguishing the types of detection objects). This places certain requirements on the choice of the feature extraction network, and the attribute feature vector (embedding value) obtained by the instance attribute feature projection network provides good instance-level discrimination.
In addition, the attribute feature vectors are generated by the Associate Embedding model, which models the grouping relationships, and are optimized directly according to the actual association relationships among the preselection boxes, that is, directly for the preselection box grouping task; a more direct and substantial performance improvement can therefore be obtained.
Further, the instance attribute feature projection network is implemented as a neural network and can be fused with the detection networks that produce the preselection boxes (for example, the feature pyramid network and the region proposal network RPN); the two share the basic features of the network, which reduces the amount of computation. Moreover, during training of the preselection box detection network, the instance attribute feature projection network can be combined with it directly, so that the two are trained jointly as one overall network without introducing additional external information, which keeps the training process relatively simple.
Further, after the above N-dimensional vectors are obtained, whether two different preselection boxes belong to the same group, that is, whether they belong to the same detection object, can be judged by comparing the Euclidean distance between their N-dimensional vectors.
The magnitude of the Euclidean distance between two N-dimensional vectors can be judged against a preset threshold. For example, for a preset threshold x, if the Euclidean distance between the N-dimensional vectors of two different preselection boxes is less than x, the distance between the two preselection boxes is considered small and they are considered to belong to the same group.
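For instance, a minimal sketch of this pairwise test is given below; the helper name and the default threshold value are assumptions for illustration only.

```python
import numpy as np

def same_group(embedding_a, embedding_b, x=0.5):
    """Return True if two preselection boxes are judged to belong
    to the same group, i.e. the same detection object."""
    # Euclidean distance between the two N-dimensional embedding values.
    distance = np.linalg.norm(np.asarray(embedding_a) - np.asarray(embedding_b))
    return distance < x
```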
The group to which each of the other preselection boxes belongs is determined in the same manner, which is not repeated here one by one.
Through the above processing, the detection object to which each preselection box belongs can be determined accurately, thereby further reducing the probability of missed detection of detection objects.
Optionally, determining, by the clustering module of the association modeling model and based on the attribute feature vector of each preselection box, the group to which each preselection box in the at least one preselection box belongs, so as to obtain the at least one preselection box group, may be implemented as follows:
Step S1: calculating a vector distance value between every two of the attribute feature vectors to obtain a plurality of vector distance values;
Step S2: adding, to the same group, the two preselection boxes corresponding to any vector distance value that is less than a preset threshold, and treating each remaining preselection box that has not been added to a group as a group of its own;
Step S3: clustering the obtained at least one group by a clustering algorithm to obtain the at least one preselection box group.
In the embodiments of the present application, the above embedding encoding network regresses an attribute feature vector for every preselection box, and the vector distance value between every two attribute feature vectors is calculated, for example by a distance measure such as the Euclidean distance.
Afterwards, each of the obtained vector distance values is compared with the preset threshold, where the preset threshold may be determined according to actual needs or according to experience, which is not specifically limited in this embodiment. If a vector distance value is less than the preset threshold, it is determined to be a target vector distance value, and the two preselection boxes corresponding to it are considered to correspond to the same detection object; the two preselection boxes corresponding to the target vector distance value are therefore added to the same group. A preselection box whose attribute feature vector is at a distance of not less than the preset threshold from every other attribute feature vector is treated as a group of its own. In this way, at least one group can be obtained.
It should be noted that if the two pairs of preselection boxes corresponding to two different target vector distance values share a common preselection box, that is, the two different target vector distance values correspond to three distinct preselection boxes, the three preselection boxes may all be added to the same group.
After the at least one group is obtained, the obtained at least one group is clustered by a clustering algorithm.
It should be noted that the clustering algorithm may be a commonly used algorithm, for example, a K-means clustering algorithm or a mean-shift clustering algorithm.
For example, suppose the image to be processed yields preselection boxes f1 to f8 and contains four detection objects A, B, C, and D. The embedding encoding network regresses an attribute feature vector, that is, an embedding value, for each of f1 to f8. The vector distance value between every two attribute feature vectors is calculated, and the target vector distance values smaller than the preset threshold are screened out as s1 to s4, where s1 is the vector distance value between preselection boxes f1 and f2, s2 is that between f2 and f3, s3 is that between f4 and f5, and s4 is that between f5 and f8. Accordingly, preselection boxes f1 and f2 (corresponding to s1) are added to the same group, and preselection boxes f2 and f3 (corresponding to s2) are added to the same group; since f1 and f2 are already in one group and f2 and f3 are in one group, f1, f2, and f3 end up in the same group, and likewise f4, f5, and f8 end up in the same group. Since the vector distance values between the attribute feature vectors of preselection boxes f6 and f7 and every other attribute feature vector are not less than the preset threshold, f6 and f7 are each treated as a group of their own. The grouping result thus comprises four groups: one containing f1, f2, and f3; one containing f4, f5, and f8; one containing f6; and one containing f7. Clustering the obtained four groups then yields four preselection box groups.
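The transitive merging in this example (f1-f2 plus f2-f3 collapsing into one group) can be realized with a small union-find structure; the sketch below is illustrative and assumes the embeddings and the threshold are already available.

```python
import numpy as np

def group_boxes(embeddings, threshold):
    """Group preselection boxes whose pairwise embedding distance is below threshold.

    embeddings: (M, N) array, one attribute feature vector per preselection box.
    Returns a list of groups, each a list of box indices.
    """
    m = len(embeddings)
    parent = list(range(m))

    def find(i):                      # find the representative of i's group
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose vector distance value is below the threshold;
    # merging is transitive, so f1-f2 plus f2-f3 yields one group {f1, f2, f3}.
    for i in range(m):
        for j in range(i + 1, m):
            if np.linalg.norm(embeddings[i] - embeddings[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(m):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```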
As another example, suppose the image to be processed yields preselection boxes f1 to f4 and contains three detection objects A, B, and C, and the embedding encoding network regresses attribute feature vectors (embedding values) a1, a2, a3, and a4 for f1 to f4 respectively. If the Euclidean distance between vector a1 and vector a4 is less than the preset threshold, a1 and a4 are considered to belong to the same detection object among A, B, and C. If the vector distance values between a1 and a2, between a1 and a3, and between a2 and a3 are all not less than the preset threshold, then a1, a2, and a3 are considered pairwise not to belong to the same detection object; if, in addition, the vector distance values between a2 and a4 and between a3 and a4 are both not less than the preset threshold, it can be determined that a2 belongs to one of the detection objects A, B, and C, and a3 belongs to a detection object different both from that of a2 and from that of a1 and a4. The resulting grouping may then be: a1 and a4 belong to A, a2 belongs to B, and a3 belongs to C.
After the group to which each preselection box in the at least one preselection box belongs has been determined and the at least one preselection box group has been obtained, deduplication processing can be performed on each preselection box group to obtain deduplicated preselection box groups, and the target detection frame of each detection object can then be determined based on the deduplicated preselection box groups.
As can be seen from the above description, each preselection box group may include a visible frame group and a complete frame group. On this basis, step S208 of performing deduplication processing on each preselection box group to obtain the deduplicated preselection boxes includes: performing deduplication processing on the visible frame group in the at least one preselection box group to obtain a deduplicated visible frame group, which may include a single visible frame or a set of visible frames.
Step S210 of determining the target detection frame of each detection object based on the deduplicated preselection box groups includes: determining the target detection frame of each detection object based on the deduplicated visible frame group and the complete frame group.
Specifically, in this embodiment, first, an image to be processed containing one or more detection objects is acquired; then, object detection is performed on the image to be processed to obtain at least one preselection box; next, the group to which each preselection box in the at least one preselection box belongs is determined to obtain at least one preselection box group; after that, deduplication processing is performed on the visible frame group in the at least one preselection box group to obtain a deduplicated visible frame group; finally, the target detection frame of each detection object is determined based on the deduplicated visible frame group and the complete frame group.
As can be seen from the above description, in the embodiments of the present application, the detection objects to be identified may be densely present in the image to be processed, so the complete frames of different detection objects tend to overlap heavily. To reduce the complexity of deduplication, deduplication may be performed only on the visible frame group within each preselection box group. The target detection frame of each detection object can then be determined from the deduplicated visible frame group and the non-deduplicated complete frame group.
Specifically, in this embodiment, the deduplicated visible frame group and the non-deduplicated complete frame group can be input into an R-CNN model for object detection, so as to obtain the target detection frame of each detection object.
It should be noted that, in the embodiments of the present application, when object detection is performed again with the deduplicated visible frame group and the non-deduplicated complete frame group as the input of the R-CNN model, for an occluded object, only the visible frame group or only the complete frame group may be used as the input of the R-CNN model to improve detection efficiency, or the visible frame group and the complete frame group may be used together as the input of the R-CNN model to improve detection accuracy, which is not specifically limited in this embodiment.
Optionally, in this embodiment, the step of performing deduplication processing on the visible frame group in the at least one preselection box group to obtain the deduplicated visible frame group includes: performing deduplication processing on the visible frame group in the at least one preselection box group by using a non-maximum suppression algorithm to obtain the deduplicated visible frame group.
In the embodiments of the present application, a non-maximum suppression (NMS) algorithm is used to remove redundant preselection boxes from the preselection box group, and the visible frame group in the preselection box group is deduplicated by setting the threshold in the NMS algorithm. After the preselection box group of each detection object is obtained, since the complete frames in the complete frame group overlap heavily, the complete frames may be left without deduplication. Therefore, the NMS algorithm is applied only to the visible frame group, yielding the deduplicated visible frame group. That is, in this embodiment, after the preselection box group of a detection object is obtained, if the group contains both a visible frame group and a complete frame group, deduplication may be performed on the visible frame group of the detection object.
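A standard greedy NMS pass applied only to the visible frames might look as follows; the boxes are assumed to be (x1, y1, x2, y2) with one score each, and the default IoU threshold is an illustrative assumption.

```python
import numpy as np

def nms_visible(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression applied only to visible frames.

    boxes:  (K, 4) array of (x1, y1, x2, y2) visible frames.
    scores: (K,) confidence scores.
    Returns the indices of the frames kept after deduplication.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Keep only boxes whose overlap with the kept box is below the threshold.
        order = order[1:][iou < iou_thresh]
    return keep
```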
It should be noted that reference is made to Fig. 3, a schematic diagram of the visible frames and complete frames of densely occluded objects of the same kind. In Fig. 3, frame 1 and frame 3 on the left are the complete frames of the occluding object P and the occluded object Q, respectively. In human body detection in densely occluded crowds, the NMS algorithm typically deduplicates across the preselection boxes of all detection objects of the same category and cannot distinguish instances (different detection objects) well. The intersection-over-union between frame 1 and frame 3 is generally greater than the threshold preset in NMS, which leads to two problems: if the threshold is set too high, redundant preselection boxes cannot be removed effectively; if it is set too low, frame 3 of the occluded object Q behind is easily deleted, resulting in missed detection of the occluded object Q.
The same problem exists between frame 5 and frame 6 on the right. The dashed frame 2 is the visible frame of the occluded object Q. It can be seen that the overlap between frame 2 (the visible part of the occluded object Q) and frame 1 (the complete frame of the occluding object P) is clearly smaller than the overlap between frame 3 and frame 1. The occluding object P and the occluded object Q can therefore be distinguished by frame 2: the visible frame 2 and the complete frame 3 are bound together into one preselection box group, which prevents frame 3 from being discarded during deduplication as a redundant box of the occluding object P.
With the deduplicated visible frame group and the complete frame group, the computation is simplified and the computation speed and accuracy of the R-CNN model are improved, so that a more accurate target detection frame is obtained.
Optionally, in this embodiment, the step of determining the target detection frame of each detection object based on the deduplicated visible frame group and the complete frame group includes:
Step S21: performing local feature alignment processing on each visible frame in the deduplicated visible frame group, and performing local feature alignment processing on each complete frame in the complete frame group;
Step S22: inputting the feature-aligned visible frames and the feature-aligned complete frames into a target object detection model for detection processing, so as to obtain the position coordinates and classification probability values of the feature-aligned visible frames as well as the position coordinates and classification probability values of the feature-aligned complete frames;
Step S23: determining the target detection frame of each detection object based on target position coordinates and target classification probability values, where the target position coordinates include the position coordinates of the feature-aligned visible frames and/or the position coordinates of the feature-aligned complete frames, and the target classification probability values include the classification probability values of the feature-aligned visible frames and/or the classification probability values of the feature-aligned complete frames.
In the embodiments of the present application, local feature alignment processing is first performed on each visible frame in the visible frame group and on each complete frame in the complete frame group. The purpose of the local feature alignment processing is to adjust all visible frames in the visible frame group and all complete frames in the complete frame group to the same size.
Optionally, an R-CNN model may be selected as the target object detection model. After local feature alignment has been performed on the deduplicated visible frame group and on the complete frames in the complete frame group, the aligned visible frames and aligned complete frames can be used to determine the target detection frame of the corresponding detection object.
Optionally, the aligned visible frames and/or the aligned complete frames may be used as the input of the target object detection model (for example, an R-CNN model); after the detection processing of the model, the coordinate position and classification probability value of each visible frame and the coordinate position and classification probability value of each complete frame are obtained.
Since the detection object to which each visible frame or complete frame belongs has already been determined, the visible frames and complete frames of each detection object can be fused separately according to their target position coordinates and target classification probability values, and the fused visible frame or fused complete frame is the target detection frame of the corresponding detection object. For a detection object that is not occluded, its target detection frame is its final complete frame, which is a detection frame obtained by fusing one or more complete frames; for an occluded detection object, its target detection frames are its final complete frame and final visible frame, where the final visible frame is a detection frame obtained by fusing one or more visible frames. For an occluded detection object, its complete frames and visible frames are fused separately to obtain the final complete frame and the final visible frame.
It should be noted that only the feature-aligned visible frames may be used as the input of the target object detection model, only the feature-aligned complete frames may be used, or the feature-aligned visible frames and the feature-aligned complete frames may be used together, which is not specifically limited in this embodiment.
Optionally, in this embodiment, step S23 of determining the target detection frame of each detection object based on the target position coordinates and the target classification probability values includes the following steps:
Step S231: using the target classification probability values as the weights of the corresponding target position coordinates;
Step S232: calculating a weighted average of the target position coordinates of each detection object according to the target classification probability values to obtain the target detection frame of the detection object, where the target detection frame includes a final visible frame and/or a final complete frame.
In the embodiments of the present application, the target position coordinates of a visible frame represent its position in the image to be processed, and its target classification probability value represents an evaluation of the detection result for that visible frame; the target position coordinates and target classification probability value of a complete frame are interpreted likewise. A higher target classification probability value indicates a better detection result for the visible or complete frame, so it is given a higher weight: the target classification probability value is used as the weight, and a weighted average of the target position coordinates is computed to obtain the target detection frame of the object. The target detection frame obtained by the weighted-average method fuses the detection evaluation results of all the visible frames or all the complete frames, so its position is also closer to the actual position of the detection object.
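As an illustration, a minimal sketch of this weighted-average fusion is given below, assuming (x1, y1, x2, y2) coordinates and one classification probability per frame; the helper name is hypothetical.

```python
import numpy as np

def fuse_frames(coords, probs):
    """Fuse the frames of one detection object into its target detection frame.

    coords: (K, 4) array of frame position coordinates (x1, y1, x2, y2).
    probs:  (K,) classification probability values, used as weights.
    Returns the weighted-average frame coordinates.
    """
    probs = np.asarray(probs, dtype=float)
    # Each coordinate is averaged with the classification probability as weight,
    # so frames with better detection results contribute more.
    return (np.asarray(coords) * probs[:, None]).sum(axis=0) / probs.sum()
```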
It should be noted that the target detection frame is the precise visible frame or the precise complete frame of the finally detected object, where the precise visible frame is the smallest bounding box that accurately describes the largest visible area of the occluded detection object.
Optionally, in this embodiment, if the feature pyramid includes a plurality of feature maps, performing local feature alignment processing on each visible frame in the deduplicated visible frame group includes the following steps:
Step S31: selecting a first target feature map from the feature pyramid;
Step S32: performing feature cropping on the first target feature map in the feature pyramid based on each visible frame in the deduplicated visible frame group to obtain a first cropping result, and performing local feature alignment processing on the first cropping result.
In the embodiments of the present application, the first target feature map is the feature map in the feature pyramid that corresponds to the visible frames in the visible frame group. The feature pyramid contains feature maps of different scales, which are obtained by the pyramid network scaling the image to be processed by different ratios.
After the first target feature map corresponding to a visible frame is determined, the visible frame can be scaled by the ratio of the first target feature map relative to the image to be processed, the position of the scaled visible frame is determined in the first target feature map, and the features of the first target feature map at that position, together with their position information, are taken as the first cropping result. Local feature alignment processing is performed on the first cropping result, and the aligned first cropping result is input into the target object detection model for object detection.
It should be noted that the ROI Align module in Mask R-CNN may be used to crop out the features corresponding to a visible frame, and the R-CNN model may then be used to perform further local feature alignment processing on the first cropping result.
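A simplified sketch of this crop-and-align step is given below. It assumes a single pyramid level stored as a (C, H, W) array and a known scale factor relative to the image to be processed, and it substitutes a crude nearest-neighbour resize for ROI Align purely for brevity; a real implementation would use bilinear ROI Align.

```python
import numpy as np

def crop_visible_feature(feature_map, box, scale, out_size=7):
    """Crop the first target feature map with a visible frame.

    feature_map: (C, H, W) array, one level of the feature pyramid.
    box:         (x1, y1, x2, y2) visible frame in image coordinates.
    scale:       ratio of the feature map relative to the image to be processed.
    Returns a (C, out_size, out_size) aligned feature patch.
    """
    # Scale the visible frame into feature-map coordinates.
    x1, y1, x2, y2 = [int(round(v * scale)) for v in box]
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)
    patch = feature_map[:, y1:y2, x1:x2]

    # Crude nearest-neighbour resize to a fixed size (stand-in for ROI Align).
    c, h, w = patch.shape
    ys = np.linspace(0, h - 1, out_size).round().astype(int)
    xs = np.linspace(0, w - 1, out_size).round().astype(int)
    return patch[:, ys][:, :, xs]
```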
Optionally, in this embodiment, if the feature pyramid includes a plurality of feature maps, performing local feature alignment processing on each complete frame in the complete frame group includes the following steps:
Step S41: selecting a second target feature map from the feature pyramid;
Step S42: performing feature cropping on the second target feature map in the feature pyramid based on each complete frame in the complete frame group to obtain a second cropping result;
Step S43: performing local feature alignment processing on the second cropping result.
In the embodiments of the present application, the second target feature map is the feature map in the feature pyramid that corresponds to the complete frames in the complete frame group. Since the feature pyramid contains feature maps of different scales, obtained by scaling the image to be processed by different ratios, after the second target feature map corresponding to a complete frame is determined, the complete frame is scaled by the ratio of the second target feature map relative to the image to be processed, the position of the scaled complete frame is determined in the second target feature map, and the features of the second target feature map at that position, together with their position information, are taken as the second cropping result. Before the second cropping result is input into the target object detection model, local feature alignment processing is performed on it.
It should be noted that the ROI Align module in Mask R-CNN may be used to crop out the features corresponding to a complete frame, and the R-CNN model may then be used to perform further local feature alignment processing on the second cropping result.
In the embodiments of the present application, compared with existing object detection algorithms that consider object detection only at the category level, the method provided by the embodiments of the present application can distinguish and recognize detection objects well. When multiple objects, especially objects of the same kind, appear densely and occlude one another, visible frames and complete frames are used as regression targets in the RPN stage; meanwhile, the generated preselection boxes are distinguished by a latent variable (embedding value) according to the different detection objects they correspond to, so that preselection boxes are distinguished not only across object categories but also across detection objects. R-CNN is then used to regress the deduplicated results again, and the regression results of different detection objects are fused into frames to obtain the final detection results, thereby realizing the recognition of occluded objects under dense occlusion and avoiding missed detection of occluded objects.
Embodiment 3:
The embodiments of the present application further provide an object detection apparatus, which is mainly configured to execute the object detection method provided in the above content of the embodiments of the present application. The object detection apparatus provided by the embodiments of the present application is specifically introduced below.
Fig. 5 is a schematic diagram of an object detection apparatus according to an embodiment of the present application. As shown in Fig. 5, the object detection apparatus mainly includes an image acquisition unit 10, a preselection box acquisition unit 20, a grouping unit 30, a deduplication unit 40, and a determination unit 50, where:
the image acquisition unit 10 is configured to acquire an image to be processed containing one or more detection objects;
the preselection box acquisition unit 20 is configured to perform object detection on the image to be processed to obtain at least one preselection box, where the preselection box includes a visible frame and/or a complete frame, the complete frame is a bounding box enclosing a detection object as a whole, and the visible frame is a bounding box of the visible area of each detection object in the image to be processed;
the grouping unit 30 is configured to determine, through an association modeling model, the group to which each preselection box in the at least one preselection box belongs, so as to obtain at least one preselection box group, where the preselection boxes in the same preselection box group belong to the same detection object;
the deduplication unit 40 is configured to perform deduplication processing on each preselection box group to obtain deduplicated preselection box groups;
the determination unit 50 is configured to determine the target detection frame of each detection object based on the deduplicated preselection box groups.
In the embodiments of the present application, an image to be processed containing one or more detection objects is first acquired; object detection is then performed on the image to be processed to obtain at least one preselection box; next, the group to which each preselection box in the at least one preselection box belongs is determined to obtain at least one preselection box group; deduplication processing is performed on the preselection box groups to remove redundant preselection boxes and obtain deduplicated preselection box groups; and the target detection frame of each detection object is determined based on the deduplicated preselection box groups. Detection of one or more detection objects in the image to be processed is thus realized, effectively avoiding missed detection of detection objects.
Optionally, each preselection box group includes a visible frame group and a complete frame group, and the deduplication unit 40 is further configured to perform deduplication processing on the visible frame group in the at least one preselection box group to obtain a deduplicated visible frame group; determining the target detection frame of each detection object based on the deduplicated preselection box groups includes determining the target detection frame of each detection object based on the deduplicated visible frame group and the complete frame group.
Optionally, the preselection box acquisition unit 20 is further configured to: input the image to be processed into a feature pyramid network for processing to obtain a feature pyramid; and process the feature pyramid by using a region proposal network (RPN) model to obtain the at least one preselection box, where each preselection box in the at least one preselection box carries an attribute label configured to determine the type of the preselection box, the type including complete frame and visible frame.
Optionally, the grouping unit 30 determining, through the association modeling model, the group to which each preselection box in the at least one preselection box belongs to obtain the at least one preselection box group includes: obtaining the attribute feature vector of each preselection box in the at least one preselection box through the instance attribute feature projection network of the association modeling model; and determining, through the clustering module of the association modeling model and based on the attribute feature vector of each preselection box, the group to which each preselection box in the at least one preselection box belongs, so as to obtain the at least one preselection box group.
Optionally, the instance attribute feature projection network is obtained through training with an Lpull loss function and an Lpush loss function, where the Lpull loss function pulls the attribute feature vectors of preselection boxes belonging to the same detection object closer together, and the Lpush loss function pushes the attribute feature vectors of preselection boxes belonging to different detection objects farther apart.
Optionally, the grouping unit 30, through the clustering module of the association modeling model, calculates the vector distance value between every two of the attribute feature vectors to obtain a plurality of vector distance values; adds, to the same group, the two preselection boxes corresponding to any vector distance value that is less than a preset threshold, with each remaining preselection box that has not been added to a group treated as a group of its own; and clusters the obtained at least one group by a clustering algorithm to obtain the at least one preselection box group.
Optionally, the deduplication unit 40 is further configured to perform deduplication processing on the visible frame group in the at least one preselection box group by using a non-maximum suppression algorithm to obtain the deduplicated visible frame group.
Optionally, the determination unit 50 is further configured to: perform local feature alignment processing on each visible frame in the deduplicated visible frame group and on each complete frame in the complete frame group; input the feature-aligned visible frames and the feature-aligned complete frames into a target object detection model for detection processing, so as to obtain the position coordinates and classification probability values of the feature-aligned visible frames and the position coordinates and classification probability values of the feature-aligned complete frames; and determine the target detection frame of each detection object based on target position coordinates and target classification probability values, where the target position coordinates include the position coordinates of the feature-aligned visible frames and/or the position coordinates of the feature-aligned complete frames, and the target classification probability values include the classification probability values of the feature-aligned visible frames and/or the classification probability values of the feature-aligned complete frames.
Optionally, the determination unit 50 is further configured to: use the target classification probability values as the weights of the corresponding target position coordinates; and calculate a weighted average of the target position coordinates of each detection object according to the target classification probability values to obtain the target detection frame of the detection object, the target detection frame including a final visible frame and/or a final complete frame.
Optionally, the feature pyramid includes a plurality of feature maps, and the determination unit 50 is further configured to: select a first target feature map from the feature pyramid; perform feature cropping on the first target feature map in the feature pyramid based on each visible frame in the deduplicated visible frame group to obtain a first cropping result; and perform local feature alignment processing on the first cropping result.
Optionally, the feature pyramid includes a plurality of feature maps, and the determination unit 50 is further configured so that performing local feature alignment processing on each complete frame in the complete frame group includes: selecting a second target feature map from the feature pyramid; performing feature cropping on the second target feature map in the feature pyramid based on each complete frame in the complete frame group to obtain a second cropping result; and performing local feature alignment processing on the second cropping result.
The implementation principles and technical effects of the apparatus provided by the embodiments of the present application are the same as those of the foregoing method embodiments. For brevity, for anything not mentioned in the apparatus embodiments, reference may be made to the corresponding content in the foregoing method embodiments.
In addition, in the description of the embodiments of the present application, unless otherwise expressly specified and limited, the terms "mounted", "connected", and "coupled" should be understood broadly; for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in this application can be understood according to specific circumstances.
In the description of this application, it should be noted that the orientations or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like are based on the orientations or positional relationships shown in the drawings, and are used only for convenience and simplicity of description rather than indicating or implying that the referred devices or elements must have a particular orientation or be constructed and operated in a particular orientation; they therefore cannot be understood as limiting this application. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.
The computer program product of the object detection method provided by the embodiments of the present application includes a computer-readable storage medium storing non-volatile program code executable by a processor, and the instructions included in the program code can be configured to execute the methods described in the foregoing method embodiments; for specific implementations, reference may be made to the method embodiments, which are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, the part contributing to the prior art, or a part of the technical solution may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of this application, used to illustrate rather than limit the technical solutions of this application, and the protection scope of this application is not limited thereto. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that, within the technical scope disclosed in this application, any person familiar with the technical field can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (12)

  1. An object detection method, characterized by comprising:
    acquiring an image to be processed containing one or more detection objects;
    performing object detection on the image to be processed to obtain at least one preselection box, wherein the preselection box comprises a visible frame and/or a complete frame, the complete frame is a bounding box enclosing a detection object as a whole, and the visible frame is a bounding box of the visible area of each detection object in the image to be processed;
    determining, through an association modeling model, a group to which each preselection box in the at least one preselection box belongs, to obtain at least one preselection box group, wherein preselection boxes in the same preselection box group belong to the same detection object;
    performing deduplication processing on each preselection box group to obtain deduplicated preselection box groups;
    determining a target detection frame of each detection object based on the deduplicated preselection box groups.
  2. The method according to claim 1, wherein determining, through the association modeling model, the group to which each preselection box in the at least one preselection box belongs to obtain the at least one preselection box group comprises:
    obtaining an attribute feature vector of each preselection box in the at least one preselection box through an instance attribute feature projection network of the association modeling model;
    determining, through a clustering module of the association modeling model and based on the attribute feature vector of each preselection box, the group to which each preselection box in the at least one preselection box belongs, to obtain the at least one preselection box group.
  3. The method according to claim 2, wherein the instance attribute feature projection network is obtained through training with an Lpull loss function and an Lpush loss function;
    wherein the Lpull loss function pulls the attribute feature vectors of preselection boxes belonging to the same detection object closer together, and the Lpush loss function pushes the attribute feature vectors of preselection boxes belonging to different detection objects farther apart.
  4. The method according to claim 2, wherein determining, through the clustering module of the association modeling model and based on the attribute feature vector of each preselection box, the group to which each preselection box in the at least one preselection box belongs to obtain the at least one preselection box group comprises:
    calculating a vector distance value between every two of the attribute feature vectors to obtain a plurality of vector distance values;
    adding, to the same group, the two preselection boxes corresponding to any of the plurality of vector distance values that is less than a preset threshold, and treating each remaining preselection box that has not been added to a group as a separate group;
    clustering the obtained at least one group by a clustering algorithm to obtain the at least one preselection box group.
  5. The method according to claim 1, wherein each preselection box group comprises a visible frame group and a complete frame group; performing deduplication processing on each preselection box group to obtain the deduplicated preselection boxes comprises: performing deduplication processing on the visible frame group in the at least one preselection box group to obtain a deduplicated visible frame group;
    determining the target detection frame of each detection object based on the deduplicated preselection box groups comprises: determining the target detection frame of each detection object based on the deduplicated visible frame group and the complete frame group.
6. The method according to claim 5, wherein performing deduplication processing on the visible frame group in the at least one preselection box group to obtain the deduplicated visible frame group comprises:
    performing deduplication processing on the visible frame group in the at least one preselection box group by using a non-maximum suppression algorithm, to obtain the deduplicated visible frame group.
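For illustration only, a standard non-maximum suppression pass over one visible-frame group might look as follows; the 0.5 IoU threshold and the x1, y1, x2, y2 box convention are assumptions, since the patent only names the algorithm.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5):
    """Keep the highest-scoring frames in one group, dropping near-duplicates."""
    order = scores.argsort()[::-1]               # indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the kept box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]       # suppress heavy overlaps
    return keep
```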
7. The method according to claim 6, wherein determining the target detection frame of each detection object based on the deduplicated visible frame group and the complete frame group comprises:
    performing local feature alignment processing on each visible frame in the deduplicated visible frame group, and performing local feature alignment processing on each complete frame in the complete frame group;
    inputting the feature-aligned visible frames and the feature-aligned complete frames into a target detection model for detection processing, to obtain position coordinates and classification probability values of the feature-aligned visible frames, and position coordinates and classification probability values of the feature-aligned complete frames; and
    determining the target detection frame of each detection object based on target position coordinates and target classification probability values, wherein the target position coordinates comprise the position coordinates of the feature-aligned visible frames and/or the position coordinates of the feature-aligned complete frames, and the target classification probability values comprise the classification probability values of the feature-aligned visible frames and/or the classification probability values of the feature-aligned complete frames.
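For illustration only: box-local feature alignment is commonly implemented with RoIAlign. The sketch below uses torchvision.ops.roi_align; the operator choice, the feature stride behind spatial_scale, and the 7x7 output size are assumptions, not details from the patent.

```python
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 50, 76)         # e.g. a stride-16 FPN level
frames = torch.tensor([[ 10.,  20.,  90., 120.],  # visible or complete frames,
                       [ 30.,  40., 150., 200.]]) # x1, y1, x2, y2 in image pixels

# One fixed-size 256x7x7 feature patch per frame; boxes are passed as a list
# with one tensor per image in the batch.
aligned = roi_align(feature_map, [frames], output_size=(7, 7),
                    spatial_scale=1.0 / 16.0,     # assumed feature stride of 16
                    sampling_ratio=2)
print(aligned.shape)  # torch.Size([2, 256, 7, 7])
```

The aligned patches would then feed the target detection model's classification and box-regression heads.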
8. The method according to claim 7, wherein determining the target detection frame of each detection object based on the target position coordinates and the target classification probability values comprises:
    using each target classification probability value as the weight of its corresponding target position coordinates;
    calculating a weighted average of the target position coordinates of each detection object according to the target classification probability values, to obtain the target detection frame of that detection object;
    wherein the target detection frame comprises a target visible frame and/or a target complete frame.
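For illustration only, the probability-weighted averaging of claim 8 reduces to a few lines of NumPy; normalizing the weights so they sum to one is an assumption about the intended arithmetic.

```python
import numpy as np

def fuse_frames(coords: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """coords: (N, 4) frame coordinates for one detection object;
    probs: (N,) classification probability of each frame."""
    weights = probs / probs.sum()                 # probabilities act as weights
    return (coords * weights[:, None]).sum(axis=0)

# Example: the higher-confidence frame dominates the fused target frame.
coords = np.array([[10., 20., 90., 120.],
                   [12., 22., 94., 126.]])
probs = np.array([0.9, 0.3])
print(fuse_frames(coords, probs))  # [ 10.5  20.5  91.  121.5]
```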
9. The method according to claim 1, wherein performing object detection on the image to be processed to obtain the at least one preselection box comprises:
    inputting the image to be processed into a feature pyramid network for processing to obtain a feature pyramid; and
    processing the feature pyramid with a region proposal network (RPN) model to obtain the at least one preselection box, wherein each of the at least one preselection box carries an attribute label, and the attribute label is configured to determine the type of the preselection box, the type comprising a complete frame and a visible frame.
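For illustration only, the feature-pyramid step of claim 9 can be sketched with torchvision's FeaturePyramidNetwork; the backbone channel widths, level names, and input resolution below are illustrative assumptions, and the RPN head that would consume the pyramid is omitted.

```python
import torch
from collections import OrderedDict
from torchvision.ops import FeaturePyramidNetwork

# Lateral 1x1 convs map each backbone stage to a common 256-channel width.
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048], out_channels=256)

# Assumed ResNet-style backbone outputs for an 800x1216 input (strides 4 to 32).
features = OrderedDict([
    ("c2", torch.randn(1, 256, 200, 304)),
    ("c3", torch.randn(1, 512, 100, 152)),
    ("c4", torch.randn(1, 1024, 50, 76)),
    ("c5", torch.randn(1, 2048, 25, 38)),
])
pyramid = fpn(features)  # OrderedDict of 256-channel maps, one per pyramid level
print({name: tuple(f.shape) for name, f in pyramid.items()})
```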
10. An object detection apparatus, comprising:
    an image acquisition unit configured to acquire an image to be processed containing one or more detection objects;
    a preselection box acquisition unit configured to perform object detection on the image to be processed to obtain at least one preselection box, wherein the preselection box comprises a visible frame and/or a complete frame, the complete frame is a bounding frame of a detection object as a whole, and the visible frame is a bounding frame of the visible area of each detection object in the image to be processed;
    a grouping unit configured to determine, through an association modeling model, the group to which each of the at least one preselection box belongs, to obtain at least one preselection box group, wherein preselection boxes in the same preselection box group belong to the same detection object;
    a deduplication unit configured to perform deduplication processing on each preselection box group to obtain deduplicated preselection box groups; and
    a determining unit configured to determine a target detection frame for each detection object based on the deduplicated preselection box groups.
11. An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.
12. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to execute the method according to any one of claims 1 to 9.
PCT/CN2019/126435 2019-03-12 2019-12-18 Object detection method and apparatus, and electronic device WO2020181872A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910186133.5 2019-03-12
CN201910186133.5A CN109948497B (en) 2019-03-12 2019-03-12 Object detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2020181872A1 true WO2020181872A1 (en) 2020-09-17

Family

ID=67009787

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126435 WO2020181872A1 (en) 2019-03-12 2019-12-18 Object detection method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN109948497B (en)
WO (1) WO2020181872A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948497B (en) * 2019-03-12 2022-01-28 北京旷视科技有限公司 Object detection method and device and electronic equipment
CN110532897B (en) * 2019-08-07 2022-01-04 北京科技大学 Method and device for recognizing image of part
CN110827261B (en) * 2019-11-05 2022-12-06 泰康保险集团股份有限公司 Image quality detection method and device, storage medium and electronic equipment
CN111178128B (en) * 2019-11-22 2024-03-19 北京迈格威科技有限公司 Image recognition method, device, computer equipment and storage medium
CN111582177A (en) * 2020-05-09 2020-08-25 北京爱笔科技有限公司 Image detection method and related device
CN112348077A (en) * 2020-11-04 2021-02-09 深圳Tcl新技术有限公司 Image recognition method, device, equipment and computer readable storage medium
CN113761245B (en) * 2021-05-11 2023-10-13 腾讯科技(深圳)有限公司 Image recognition method, device, electronic equipment and computer readable storage medium
CN117237697A (en) * 2023-08-01 2023-12-15 北京邮电大学 Small sample image detection method, system, medium and equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012169119A1 (en) * 2011-06-10 2012-12-13 パナソニック株式会社 Object detection frame display device and object detection frame display method
US9697599B2 (en) * 2015-06-17 2017-07-04 Xerox Corporation Determining a respiratory pattern from a video of a subject
CN106529527A (en) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Object detection method and device, data processing deice, and electronic equipment
US10657364B2 (en) * 2016-09-23 2020-05-19 Samsung Electronics Co., Ltd System and method for deep network fusion for fast and robust object detection
CN108399388A (en) * 2018-02-28 2018-08-14 福州大学 A kind of middle-high density crowd quantity statistics method
CN109190458B (en) * 2018-07-20 2022-03-25 华南理工大学 Method for detecting head of small person based on deep learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6633655B1 (en) * 1998-09-05 2003-10-14 Sharp Kabushiki Kaisha Method of and apparatus for detecting a human face and observer tracking display
CN106557778A (en) * 2016-06-17 2017-04-05 北京市商汤科技开发有限公司 Generic object detection method and device, data processing equipment and terminal device
CN108960266A (en) * 2017-05-22 2018-12-07 阿里巴巴集团控股有限公司 Image object detection method and device
CN109948497A (en) * 2019-03-12 2019-06-28 北京旷视科技有限公司 A kind of object detecting method, device and electronic equipment

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699881A (en) * 2020-12-31 2021-04-23 北京一起教育科技有限责任公司 Image identification method and device and electronic equipment
CN113743333A (en) * 2021-09-08 2021-12-03 苏州大学应用技术学院 Strawberry maturity identification method and device
CN113743333B (en) * 2021-09-08 2024-03-01 苏州大学应用技术学院 Strawberry maturity recognition method and device
CN113987667A (en) * 2021-12-29 2022-01-28 深圳小库科技有限公司 Building layout grade determining method and device, electronic equipment and storage medium
CN113987667B (en) * 2021-12-29 2022-05-03 深圳小库科技有限公司 Building layout grade determining method and device, electronic equipment and storage medium
CN115731517A (en) * 2022-11-22 2023-03-03 南京邮电大学 Crowd detection method based on Crowd-RetinaNet network
CN115731517B * 2022-11-22 2024-02-20 南京邮电大学 Crowd detection method based on Crowd-RetinaNet network

Also Published As

Publication number Publication date
CN109948497B (en) 2022-01-28
CN109948497A (en) 2019-06-28

Similar Documents

Publication Publication Date Title
WO2020181872A1 (en) Object detection method and apparatus, and electronic device
JP5554984B2 (en) Pattern recognition method and pattern recognition apparatus
US10096122B1 (en) Segmentation of object image data from background image data
US10402627B2 (en) Method and apparatus for determining identity identifier of face in face image, and terminal
JP6458394B2 (en) Object tracking method and object tracking apparatus
US11113842B2 (en) Method and apparatus with gaze estimation
CN108140032B (en) Apparatus and method for automatic video summarization
WO2019218824A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
CN110135246A (en) A kind of recognition methods and equipment of human action
WO2020238897A1 (en) Panoramic image and video splicing method, computer-readable storage medium, and panoramic camera
WO2023082882A1 (en) Pose estimation-based pedestrian fall action recognition method and device
WO2018099032A1 (en) Target tracking method and device
WO2018082308A1 (en) Image processing method and terminal
KR101912748B1 (en) Scalable Feature Descriptor Extraction and Matching method and system
CN111160291B (en) Human eye detection method based on depth information and CNN
CN111008935B (en) Face image enhancement method, device, system and storage medium
WO2016139964A1 (en) Region-of-interest extraction device and region-of-interest extraction method
CN108764100B (en) Target behavior detection method and server
CN111709296A (en) Scene identification method and device, electronic equipment and readable storage medium
CN109961103B (en) Training method of feature extraction model, and image feature extraction method and device
JP5648452B2 (en) Image processing program and image processing apparatus
WO2020001016A1 (en) Moving image generation method and apparatus, and electronic device and computer-readable storage medium
WO2021027329A1 (en) Image recognition-based information push method and apparatus, and computer device
US20210125013A1 (en) Image recognition system and updating method thereof
CN109858464B (en) Bottom database data processing method, face recognition device and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19919394; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 19919394; Country of ref document: EP; Kind code of ref document: A1