CN115761213A - Target detection method and related device, electronic equipment and storage medium - Google Patents

Target detection method and related device, electronic equipment and storage medium

Info

Publication number
CN115761213A
CN115761213A
Authority
CN
China
Prior art keywords
candidate frame
candidate
target
combination
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211379775.5A
Other languages
Chinese (zh)
Inventor
Yan Shenyi
Yin Baocai
Yin Bing
Hu Jinshui
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202211379775.5A priority Critical patent/CN115761213A/en
Publication of CN115761213A publication Critical patent/CN115761213A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a target detection method and a related device, an electronic device, and a storage medium. The target detection method includes: detecting a main target in an image to be detected to obtain first candidate frames of the main target, and respectively detecting a plurality of auxiliary targets in the image to be detected to obtain second candidate frames of each auxiliary target, wherein the plurality of auxiliary targets belong to the main target; combining any first candidate frame of the main target with any second candidate frame of each auxiliary target to obtain candidate frame combinations; for each candidate frame combination, analyzing whether to reject the combination based on the intersections between the first candidate frame and each second candidate frame; and taking the first candidate frame in each retained candidate frame combination as a target detection frame of the main target. This scheme can improve the accuracy of target detection while reducing the detection cost and improving the detection adaptability.

Description

Target detection method and related device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image recognition technologies, and in particular, to a target detection method, a related apparatus, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence technology, image recognition has become one of its important fields. As a front-end technology of image recognition, object detection is widely applied in industries such as education, entertainment, medical care, and transportation.
Currently, target detection still suffers from a certain false detection rate in practical applications; for example, a target may be falsely detected where none exists in the image. To reduce the false detection rate, an additional sensor is generally added, or an a priori feature library is constructed. However, an additional sensor increases the detection cost, while a feature library built from a large number of datasets cannot be updated at any time and easily produces judgment errors when a newly input image differs greatly from the library. In view of this, how to improve the accuracy of target detection while reducing the detection cost and improving the detection adaptability is an urgent problem to be solved.
Disclosure of Invention
The technical problem mainly solved by the application is to provide a target detection method, a related device, an electronic device and a storage medium, which can improve the accuracy of target detection while reducing the detection cost and improving the detection adaptability.
In order to solve the above technical problem, a first aspect of the present application provides a target detection method, including: detecting a main target in an image to be detected to obtain first candidate frames of the main target, and respectively detecting a plurality of auxiliary targets in the image to be detected to obtain second candidate frames of each auxiliary target, wherein the plurality of auxiliary targets belong to the main target; combining any first candidate frame of the main target with any second candidate frame of each auxiliary target to obtain candidate frame combinations; for each candidate frame combination, analyzing whether to reject the combination based on the intersections between the first candidate frame and each second candidate frame; and taking the first candidate frame in each retained candidate frame combination as a target detection frame of the main target.
In order to solve the above technical problem, a second aspect of the present application provides a target detection device, which includes a detection module, a combination module, an analysis module, and a determination module. The detection module is configured to detect a main target in an image to be detected to obtain first candidate frames of the main target, and to respectively detect a plurality of auxiliary targets in the image to be detected to obtain second candidate frames of each auxiliary target, wherein the plurality of auxiliary targets belong to the main target. The combination module is configured to combine any first candidate frame of the main target with any second candidate frame of each auxiliary target to obtain candidate frame combinations. The analysis module is configured to analyze, for each candidate frame combination, whether to reject the combination based on the intersections between the first candidate frame and each second candidate frame. The determination module is configured to take the first candidate frame in each retained candidate frame combination as a target detection frame of the main target.
In order to solve the above technical problem, a third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the memory stores program instructions, and the processor is configured to execute the program instructions to implement the target detection method of the first aspect.
In order to solve the above technical problem, a fourth aspect of the present application provides a computer-readable storage medium storing program instructions executable by a processor, the program instructions being used to implement the target detection method of the first aspect.
According to the above scheme, a main target is detected in an image to be detected to obtain first candidate frames of the main target, and a plurality of auxiliary targets are respectively detected in the image to be detected to obtain second candidate frames of each auxiliary target, the plurality of auxiliary targets belonging to the main target; any first candidate frame of the main target is combined with any second candidate frame of each auxiliary target to obtain candidate frame combinations; for each candidate frame combination, whether to reject the combination is analyzed based on the intersections between the first candidate frame and each second candidate frame; and the first candidate frame in each retained candidate frame combination is taken as a target detection frame of the main target. On the one hand, the first and second candidate frames are obtained by detecting the main target and the auxiliary targets in the image to be detected and are combined to form the candidate frame combinations; this process requires only the image information of the image to be detected, without additional equipment, which reduces the cost of target detection as much as possible and improves the detection adaptability. On the other hand, for each candidate frame combination, whether to reject it is analyzed based on the intersections between the first candidate frame and each second candidate frame, and the target detection frames of the main target are obtained from the retained combinations, so that target detection is assisted by the constraint relationship between the main target and the auxiliary targets, improving the accuracy of the retained combinations and hence of the target detection frames. Therefore, the accuracy of target detection can be improved while the detection cost is reduced and the detection adaptability is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic flowchart of an embodiment of the target detection method of the present application;
FIG. 2 is a schematic diagram of an embodiment in which the primary target is a person;
FIG. 3 is a schematic diagram of another embodiment in which the primary target is a person;
FIG. 4 is a framework diagram of an embodiment of the target detection method of the present application;
FIG. 5 is a framework diagram of an embodiment of the target detection device of the present application;
FIG. 6 is a framework diagram of an embodiment of the electronic device of the present application;
FIG. 7 is a framework diagram of an embodiment of the computer-readable storage medium of the present application.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the drawings.
In the following description, for purposes of explanation rather than limitation, specific details such as particular system architectures, interfaces, and techniques are set forth in order to provide a thorough understanding of the present application.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of a, B, and C, and may mean including any one or more elements selected from the group consisting of a, B, and C. "several" means at least one. The terms "first," "second," and the like in the description and in the claims, as well as in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a target detection method according to the present application.
Specifically, the method may include the steps of:
step S11: and detecting the main target in the image to be detected to obtain a first candidate frame of the main target, and respectively detecting a plurality of auxiliary targets in the image to be detected to obtain a second candidate frame of each auxiliary target.
In the embodiment of the disclosure, the plurality of auxiliary targets belong to the main target, that is, the main target and the plurality of auxiliary targets have a constraint relationship. Illustratively, in the case where the primary target is a human body, the plurality of secondary targets includes at least a human face; alternatively, where the primary target is a vehicle, the plurality of secondary targets may include a rear view mirror, a license plate, or the like. It is understood that the illustrated manner is only one possible situation in practical application, and does not limit the categories of the primary target and the plurality of secondary targets in practical application, and the primary target and the plurality of secondary targets may be determined according to practical situations, and are not specifically limited herein.
In one implementation scenario, the first candidate frames and the second candidate frames may be detected based on a network model, which may include, but is not limited to, a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), and so on.
In another implementation scenario, in order to improve the accuracy of the first and second candidate frames, detection may first be performed based on a network model, followed by further screening to obtain the corresponding candidate frames. Screening approaches may include, but are not limited to, NMS (Non-Maximum Suppression) screening, threshold screening, and the like. How the first and second candidate frames are determined may be decided according to the actual situation and is not specifically limited herein. It should be noted that, in actual application, at least one first candidate frame may be detected for the primary target, and at least one second candidate frame may be detected for each secondary target. Taking the case where the primary target is a human body and the secondary targets include a human face and human shoulders as an example, at least one first candidate frame may be detected for the human body, at least one second candidate frame for the face, and at least one second candidate frame for the shoulders. Other cases can be deduced by analogy and are not enumerated here.
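The NMS screening mentioned above can be illustrated with a minimal sketch. The following Python code is not part of the original disclosure: it assumes axis-aligned candidate frames given as (x1, y1, x2, y2) tuples with confidence scores, and the function names and the IoU threshold of 0.5 are illustrative choices.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring frames, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```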
Step S12: and combining any first candidate frame of the main target with any second candidate frame of each auxiliary target respectively to obtain a candidate frame combination.
In one implementation scenario, any first candidate frame of the primary target is combined with any second candidate frame of each secondary target to obtain a candidate frame combination. It will be appreciated that each candidate frame combination comprises one first candidate frame of the primary target and one second candidate frame of each secondary target. That is, the number of candidate frame combinations is determined by the numbers of first and second candidate frames. Specifically, if there are N auxiliary targets, detection of the main target yields K first candidate frames, and detection of the i-th auxiliary target yields M_i second candidate frames, then K * M_1 * … * M_i * … * M_N candidate frame combinations can be obtained. For example, referring to FIG. 2, FIG. 2 is a schematic diagram of an embodiment in which the primary target is a person: the face can be set as the secondary target, the first candidate frame can be the circumscribed rectangular frame of the person, and the second candidate frame can be the circumscribed rectangular frame of the face. When the primary target is a vehicle, the rearview mirror may be set as a secondary target, with the first candidate frame being the circumscribed rectangular frame of the vehicle and the second candidate frame being the circumscribed rectangular frame of the rearview mirror; in this case the candidate frame combination may include the circumscribed rectangular frame of the vehicle (the first candidate frame) and the circumscribed rectangular frame of the rearview mirror (the second candidate frame). Alternatively, the license plate may be set as a secondary target, with the second candidate frame being the circumscribed rectangular frame of the license plate; in this case the candidate frame combination may include the circumscribed rectangular frame of the vehicle (the first candidate frame) and the circumscribed rectangular frame of the license plate (the second candidate frame). Of course, both the rearview mirror and the license plate may be set as secondary targets, in which case the candidate frame combination may include the circumscribed rectangular frame of the vehicle (the first candidate frame), the circumscribed rectangular frame of the rearview mirror (a second candidate frame), and the circumscribed rectangular frame of the license plate (a second candidate frame).
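As an illustrative sketch of this combination step (the helper name build_combinations and the frame representation are assumptions, not from this application), the Cartesian-product enumeration over K first candidate frames and M_i second candidate frames per auxiliary target can be written as:

```python
from itertools import product

def build_combinations(first_boxes, second_boxes_per_target):
    """Pair every first candidate frame with one second frame per auxiliary target.

    first_boxes: list of K first candidate frames of the main target.
    second_boxes_per_target: list of N lists; list i holds the M_i second
    candidate frames of auxiliary target i. Yields K * M_1 * ... * M_N combos.
    """
    combos = []
    for first in first_boxes:
        for seconds in product(*second_boxes_per_target):
            combos.append((first, list(seconds)))
    return combos
```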
Step S13: and for each candidate frame combination, analyzing whether the candidate frame combination is eliminated or not based on the intersection between the first candidate frame and each second candidate frame.
In one implementation scenario, as a possible implementation manner, in response to the first candidate frame intersecting each second candidate frame in the candidate frame combination, the candidate frame combination may be retained; in response to the absence of an intersection between the first candidate frame and at least one second candidate frame in the combination, the candidate frame combination is rejected. Referring to FIG. 3, FIG. 3 is a schematic diagram of another embodiment in which the primary target is a person. A constraint relationship exists between the first candidate frame and the second candidate frame; for example, the first candidate frame is the circumscribed rectangular frame of the human body and the second candidate frame is the circumscribed rectangular frame of the human face. When the first candidate frame and a second candidate frame do not intersect, the combination is necessarily an abnormal detection result and is therefore rejected.
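This coarse screening rule can be sketched as follows, assuming axis-aligned frames as (x1, y1, x2, y2) tuples; following the rejection criterion above, a combination is kept only when the first frame intersects every second frame (function names are illustrative):

```python
def intersects(a, b):
    """True if two axis-aligned (x1, y1, x2, y2) frames overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def keep_combination(first, seconds):
    """Retain the combination only if the first frame intersects each second frame."""
    return all(intersects(first, s) for s in seconds)
```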
In another implementation scenario, unlike the foregoing embodiment, in order to improve the accuracy of the analysis of the candidate frame combination, for each intersection generated by the combination, the second candidate frame to which the intersection belongs may be taken as the current candidate frame. Referring to FIG. 2, a first candidate frame and a second candidate frame are combined to obtain a candidate frame combination, the intersection generated in the combination (in FIG. 2, the region of the second candidate frame) is determined, and the second candidate frame to which the intersection belongs is taken as the current candidate frame, i.e., the second candidate frame shown in FIG. 2. After the current candidate frame is determined, the degree of correlation between the intersection and the current candidate frame may be measured in several dimensions, respectively. It should be noted that the several dimensions may include at least one of a first dimension related to geometric space and a second dimension related to color space. It can be understood that, in actual application, the specific dimensions may be determined according to the actual situation of the main target. For example, when the main target is a person, the dimensions may include only first dimensions related to geometric space, so the degree of correlation between the intersection and the current candidate frame is measured by first dimensions such as lengths and areas; when the main target is a vehicle, the dimensions may include both first dimensions related to geometric space and second dimensions related to color space, so the degree of correlation may be measured not only by first dimensions such as lengths and areas but also by second dimensions such as color consistency.
In one implementation scenario, in response to the several dimensions including the x-axis direction, the ratio of the length of the intersection to the length of the current candidate frame in the x-axis direction may be obtained as the degree of correlation in the x-axis direction. Specifically, it can be expressed as:
Lx_rate = Lx_∩ / Lx
wherein Lx_rate denotes the ratio of the length of the intersection to the length of the current candidate frame in the x-axis direction, Lx_∩ denotes the length of the intersection in the x-axis direction, and Lx denotes the length of the current candidate frame in the x-axis direction.
In another implementation scenario, in response to the several dimensions including the y-axis direction, the ratio of the length of the intersection to the length of the current candidate frame in the y-axis direction may be obtained as the degree of correlation in the y-axis direction. Specifically, it can be expressed as:
Ly_rate = Ly_∩ / Ly
wherein Ly_rate denotes the ratio of the length of the intersection to the length of the current candidate frame in the y-axis direction, Ly_∩ denotes the length of the intersection in the y-axis direction, and Ly denotes the length of the current candidate frame in the y-axis direction.
In yet another implementation scenario, in response to the several dimensions including an area dimension, the ratio of the area of the intersection to the area of the current candidate frame may be obtained as the degree of correlation of the area dimension. Specifically, it can be expressed as:
S_rate = S_∩ / S
wherein S_rate denotes the ratio of the area of the intersection to the area of the current candidate frame, S_∩ denotes the area of the intersection, and S denotes the area of the current candidate frame.
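A minimal sketch of the three degrees of correlation defined above (Lx_rate, Ly_rate, S_rate), assuming (x1, y1, x2, y2) frames; the helper name correlation_degrees is an assumption, and the intersection is computed between the first candidate frame and the current candidate frame as described above:

```python
def correlation_degrees(first, current):
    """Return (Lx_rate, Ly_rate, S_rate) of the intersection w.r.t. the current frame."""
    ix1, iy1 = max(first[0], current[0]), max(first[1], current[1])
    ix2, iy2 = min(first[2], current[2]), min(first[3], current[3])
    lx_i = max(0.0, ix2 - ix1)    # Lx_∩: intersection length along x
    ly_i = max(0.0, iy2 - iy1)    # Ly_∩: intersection length along y
    lx = current[2] - current[0]  # Lx: current frame length along x
    ly = current[3] - current[1]  # Ly: current frame length along y
    return lx_i / lx, ly_i / ly, (lx_i * ly_i) / (lx * ly)
```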
Further, after the degrees of correlation corresponding to the intersection in the several dimensions are obtained, it may be determined whether the current candidate frame (i.e., the second candidate frame to which the intersection belongs) is related to the first candidate frame in the candidate frame combination. Specifically, whether the degree of correlation of each dimension meets a preset requirement may be checked based on degree thresholds set for the respective dimensions. The preset requirement may be that the degree of correlation of the dimension is greater than its degree threshold, or that the degree of correlation lies within a preset interval; it may be determined according to the actual situation and is not specifically limited herein. Whether the current candidate frame is related to the first candidate frame in the combination is then determined based on the number of dimensions meeting the preset requirement. As one possible implementation, the current candidate frame may be determined to be related to the first candidate frame when the degree of correlation of every dimension meets the preset requirement; as another possible implementation, the current candidate frame may be determined to be related when the degrees of correlation of only some dimensions meet the preset requirement, which is not limited herein. In general, the current candidate frame may be determined to be related to the first candidate frame when the number of dimensions meeting the preset requirement is not less than a number threshold, where the number threshold may be set to a predetermined proportion of the total number of dimensions, such as 50% or 70%, without limitation. In this manner, the degrees of correlation are checked against the degree thresholds set for the respective dimensions to determine whether the current candidate frame is related to the first candidate frame in the candidate frame combination. Whether the candidate frame combination is rejected is then determined based on the second candidate frames related to the first candidate frame in the combination. Taking the second candidate frame to which an intersection belongs as the current candidate frame and judging its relevance to the first candidate frame based on the degrees of correlation of the several dimensions improves the accuracy of the relevance judgment, and thereby the accuracy of the analysis of the candidate frame combination.
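The per-dimension threshold check and the dimension-count rule described above can be sketched as follows; the 50% quota matches one of the example proportions mentioned, while the function name and parameters are illustrative:

```python
def is_related(degrees, thresholds, quota=0.5):
    """Current frame counts as related when enough dimensions pass their thresholds.

    degrees and thresholds are parallel sequences, one entry per dimension;
    quota is the required proportion of passing dimensions (e.g., 0.5 or 0.7).
    """
    passed = sum(1 for d, t in zip(degrees, thresholds) if d > t)
    return passed >= quota * len(degrees)
```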
In addition, after one intersection generated by the candidate frame combination has been analyzed to determine whether the second candidate frame to which it belongs is related to the first candidate frame in the combination, the next intersection may be analyzed in the same way, until all intersections generated by the combination have been analyzed. At that point it is known, for every intersection generated by the combination, whether the second candidate frame to which it belongs is related to the first candidate frame, so whether to retain the combination can be determined. Further, the same processing may be applied to each candidate frame combination to decide whether to retain or reject it.
In a specific implementation scenario, as one implementation, the candidate frame combination may be determined to be retained when every second candidate frame in the combination is related to the first candidate frame; the combination may be determined to be rejected when at least one second candidate frame in the combination is not related to the first candidate frame.
In another specific implementation scenario, unlike the foregoing implementation, for each candidate frame combination, the total number of second candidate frames related to the first candidate frame in the combination may be counted; the combination may be determined to be retained when the total number is not less than a number threshold, or rejected when the total number is less than the number threshold. Specifically, the product of a preset ratio (e.g., 80%, 90%, etc.) and the total number of secondary targets may be used as the number threshold. Counting the total number of related second candidate frames and deciding on that basis whether to reject the combination helps to improve the applicability of the target detection method.
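A sketch of this counting rule, where the number threshold is the product of a preset ratio and the number of secondary targets; the name keep_by_count and the default ratio of 0.8 are illustrative:

```python
def keep_by_count(related_flags, ratio=0.8):
    """Retain the combination when enough second frames are related to the first.

    related_flags: one boolean per second candidate frame in the combination.
    """
    return sum(related_flags) >= ratio * len(related_flags)
```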
Step S14: and taking the first candidate frame in the reserved candidate frame combination as a target detection frame of the main target.
In one implementation scenario, after the retained candidate frame combinations are obtained, the first candidate frame in each retained combination may be taken as a target detection frame of the main target. That is, the number of target detection frames is determined by the retained candidate frame combinations: when n candidate frame combinations are retained, the main target has n target detection frames.
According to the above scheme, a main target is detected in an image to be detected to obtain first candidate frames of the main target, and a plurality of auxiliary targets are respectively detected in the image to be detected to obtain second candidate frames of each auxiliary target, the plurality of auxiliary targets belonging to the main target; any first candidate frame of the main target is combined with any second candidate frame of each auxiliary target to obtain candidate frame combinations; for each candidate frame combination, whether to reject the combination is analyzed based on the intersections between the first candidate frame and each second candidate frame; and the first candidate frame in each retained candidate frame combination is taken as a target detection frame of the main target. On the one hand, the first and second candidate frames are obtained by detecting the main target and the auxiliary targets in the image to be detected and are combined to form the candidate frame combinations; this process requires only the image information of the image to be detected, without additional equipment, which reduces the cost of target detection as much as possible and improves the detection adaptability. On the other hand, for each candidate frame combination, whether to reject it is analyzed based on the intersections between the first candidate frame and each second candidate frame, and the target detection frames of the main target are obtained from the retained combinations, so that target detection is assisted by the constraint relationship between the main target and the auxiliary targets, improving the accuracy of the retained combinations and hence of the target detection frames. Therefore, the accuracy of target detection can be improved while the detection cost is reduced and the detection adaptability is improved.
Referring to FIG. 4, FIG. 4 is a framework diagram of an embodiment of the target detection method of the present application. Specifically, an image to be detected may first be obtained; a main target is detected in the image to obtain first candidate frames of the main target, and a plurality of auxiliary targets (auxiliary target 1, …, auxiliary target n) are respectively detected in the image to obtain second candidate frames of each auxiliary target. Any first candidate frame of the main target is then combined with any second candidate frame of each auxiliary target to obtain candidate frame combinations. Each candidate frame combination is judged based on the intersections between the first candidate frame and each second candidate frame, and the combinations that do not meet the preset conditions are rejected to obtain the retained combinations; the first candidate frame in each retained combination is taken as a target detection frame of the main target.
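Putting the pieces together, the following usage sketch wires the illustrative helpers above (build_combinations, correlation_degrees, is_related, keep_by_count) into the flow of FIG. 4; the detector outputs are stubbed with fixed frames purely for demonstration:

```python
# Stubbed detector outputs: one first frame for the main target (e.g., a human
# body) and one second frame per auxiliary target (e.g., face and shoulders).
first_boxes = [(10, 10, 100, 200)]
second_boxes_per_target = [
    [(30, 15, 70, 60)],    # auxiliary target 1: face candidates
    [(15, 55, 95, 110)],   # auxiliary target 2: shoulder candidates
]

detections = []
for first, seconds in build_combinations(first_boxes, second_boxes_per_target):
    flags = [
        is_related(correlation_degrees(first, current), thresholds=(0.5, 0.5, 0.3))
        for current in seconds
    ]
    if keep_by_count(flags):
        detections.append(first)  # the first frame becomes a target detection frame

print(detections)  # [(10, 10, 100, 200)]
```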
According to the above scheme, a main target is detected in an image to be detected to obtain first candidate frames of the main target, and a plurality of auxiliary targets are respectively detected in the image to be detected to obtain second candidate frames of each auxiliary target, the plurality of auxiliary targets belonging to the main target; any first candidate frame of the main target is combined with any second candidate frame of each auxiliary target to obtain candidate frame combinations; for each candidate frame combination, whether to reject the combination is analyzed based on the intersections between the first candidate frame and each second candidate frame; and the first candidate frame in each retained candidate frame combination is taken as a target detection frame of the main target. On the one hand, the first and second candidate frames are obtained by detecting the main target and the auxiliary targets in the image to be detected and are combined to form the candidate frame combinations; this process requires only the image information of the image to be detected, without additional equipment, which reduces the cost of target detection as much as possible and improves the detection adaptability. On the other hand, for each candidate frame combination, whether to reject it is analyzed based on the intersections between the first candidate frame and each second candidate frame, and the target detection frames of the main target are obtained from the retained combinations, so that target detection is assisted by the constraint relationship between the main target and the auxiliary targets, improving the accuracy of the retained combinations and hence of the target detection frames. Therefore, the accuracy of target detection can be improved while the detection cost is reduced and the detection adaptability is improved.
It will be understood by those skilled in the art that, in the method of the present application, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
Referring to FIG. 5, FIG. 5 is a framework diagram of an embodiment of the target detection device of the present application. The target detection device 50 comprises a detection module 51, a combination module 52, an analysis module 53, and a determination module 54. The detection module 51 is configured to detect a main target in an image to be detected to obtain first candidate frames of the main target, and to respectively detect a plurality of auxiliary targets in the image to be detected to obtain second candidate frames of each auxiliary target, wherein the plurality of auxiliary targets belong to the main target. The combination module 52 is configured to combine any first candidate frame of the main target with any second candidate frame of each auxiliary target to obtain candidate frame combinations. The analysis module 53 is configured to analyze, for each candidate frame combination, whether to reject the combination based on the intersections between the first candidate frame and each second candidate frame. The determination module 54 is configured to take the first candidate frame in each retained candidate frame combination as a target detection frame of the main target.
According to the above scheme, a main target is detected in an image to be detected to obtain first candidate frames of the main target, and a plurality of auxiliary targets are respectively detected in the image to be detected to obtain second candidate frames of each auxiliary target, the plurality of auxiliary targets belonging to the main target; any first candidate frame of the main target is combined with any second candidate frame of each auxiliary target to obtain candidate frame combinations; for each candidate frame combination, whether to reject the combination is analyzed based on the intersections between the first candidate frame and each second candidate frame; and the first candidate frame in each retained candidate frame combination is taken as a target detection frame of the main target. On the one hand, the first and second candidate frames are obtained by detecting the main target and the auxiliary targets in the image to be detected and are combined to form the candidate frame combinations; this process requires only the image information of the image to be detected, without additional equipment, which reduces the cost of target detection as much as possible and improves the detection adaptability. On the other hand, for each candidate frame combination, whether to reject it is analyzed based on the intersections between the first candidate frame and each second candidate frame, and the target detection frames of the main target are obtained from the retained combinations, so that target detection is assisted by the constraint relationship between the main target and the auxiliary targets, improving the accuracy of the retained combinations and hence of the target detection frames. Therefore, the accuracy of target detection can be improved while the detection cost is reduced and the detection adaptability is improved.
In some disclosed embodiments, the analysis module 53 includes an acquisition submodule and a determination submodule. The acquisition submodule is configured to, for each intersection generated by the candidate frame combination, take the second candidate frame to which the intersection belongs as the current candidate frame, measure the degrees of correlation between the intersection and the current candidate frame in several dimensions, respectively, and determine whether the current candidate frame is related to the first candidate frame in the combination based on the degrees of correlation corresponding to the several dimensions. The determination submodule is configured to determine whether to reject the candidate frame combination based on the second candidate frames related to the first candidate frame in the combination.
Therefore, the second candidate frame to which the intersection belongs is taken as the current candidate frame, and whether it is related to the first candidate frame in the candidate frame combination is determined based on the degrees of correlation corresponding to the several dimensions. Judging relevance from several dimensions improves the accuracy of the judgment of the correlation between the current candidate frame and the first candidate frame, and thereby the subsequent decision on whether to reject the combination, which helps to improve the accuracy of the analysis of the candidate frame combination.
In some disclosed embodiments, the several dimensions include: at least one of a first dimension associated with a geometric space, a second dimension associated with a color space.
In some disclosed embodiments, the acquisition submodule includes a first response unit, a second response unit, and a third response unit. The first response unit is configured to, in response to the several dimensions including the x-axis direction, obtain the ratio of the length of the intersection to the length of the current candidate frame in the x-axis direction as the degree of correlation in the x-axis direction. The second response unit is configured to, in response to the several dimensions including the y-axis direction, obtain the ratio of the length of the intersection to the length of the current candidate frame in the y-axis direction as the degree of correlation in the y-axis direction. The third response unit is configured to, in response to the several dimensions including an area dimension, obtain the ratio of the area of the intersection to the area of the current candidate frame as the degree of correlation of the area dimension.
In some disclosed embodiments, the acquisition submodule further includes a checking unit and a determining unit. The checking unit is configured to check whether the degree of correlation of each dimension meets the preset requirement based on the degree thresholds set for the respective dimensions; the determining unit is configured to determine whether the current candidate frame is related to the first candidate frame in the candidate frame combination based on the number of dimensions meeting the preset requirement.
Therefore, the degree of correlation is checked through the degree threshold values respectively set by the plurality of dimensions, and whether the current candidate frame is correlated with the first candidate frame in the candidate frame combination is further determined.
In some disclosed embodiments, the determination submodule includes a statistics unit and a selection unit. The statistical unit is used for counting the total number of second candidate frames related to the first candidate frame in the candidate frame combination; the selection unit is used for determining whether to reject the candidate frame combination or not based on the total number.
Therefore, the total number of second candidate frames related to the first candidate frame in the candidate frame combination is counted, and whether the candidate frame combination is rejected or not is determined according to the total number, so that the applicability of the target detection method is improved.
In some disclosed embodiments, where the primary target is a human body, the plurality of secondary targets includes at least a human face.
Referring to FIG. 6, FIG. 6 is a framework diagram of an embodiment of the electronic device of the present application. The electronic device 60 comprises a memory 61 and a processor 62 coupled to each other; the memory 61 stores program instructions, and the processor 62 is configured to execute the program instructions to implement the steps of any of the above embodiments of the target detection method. Specifically, the electronic device 60 may include, but is not limited to: conference tablets, touch-screen organizers, desktop computers, laptops, servers, cell phones, tablet computers, and the like, without limitation.
In particular, the processor 62 is configured to control itself and the memory 61 to implement the steps of any of the above embodiments of the target detection method. The processor 62 may also be referred to as a CPU (Central Processing Unit). The processor 62 may be an integrated circuit chip having signal processing capabilities. The processor 62 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or any conventional processor. In addition, the processor 62 may be jointly implemented by a plurality of integrated circuit chips.
According to the above scheme, on the one hand, the first and second candidate frames are obtained by detecting the main target and the plurality of auxiliary targets in the image to be detected, respectively, and are combined to obtain the candidate frame combinations; this process requires only the image information of the image to be detected, without additional equipment, which reduces the cost of target detection as much as possible and improves the detection adaptability. On the other hand, for each candidate frame combination, whether to reject it is analyzed based on the intersections between the first candidate frame and each second candidate frame, and the target detection frames of the main target are then obtained from the retained combinations; target detection is thus assisted by the constraint relationship between the main target and the auxiliary targets, improving the accuracy of the retained combinations and hence of the target detection frames of the main target. Therefore, the accuracy of target detection can be improved while the detection cost is reduced and the detection adaptability is improved.
Referring to FIG. 7, FIG. 7 is a framework diagram of an embodiment of the computer-readable storage medium of the present application. The computer-readable storage medium 70 stores program instructions 71 executable by a processor, the program instructions 71 being used to implement the steps of any of the above embodiments of the target detection method.
According to the above scheme, on the one hand, the first and second candidate frames are obtained by detecting the main target and the plurality of auxiliary targets in the image to be detected, respectively, and are combined to obtain the candidate frame combinations; this process requires only the image information of the image to be detected, without additional equipment, which reduces the cost of target detection as much as possible and improves the detection adaptability. On the other hand, for each candidate frame combination, whether to reject it is analyzed based on the intersections between the first candidate frame and each second candidate frame, and the target detection frames of the main target are then obtained from the retained combinations; target detection is thus assisted by the constraint relationship between the main target and the auxiliary targets, improving the accuracy of the retained combinations and hence of the target detection frames of the main target. Therefore, the accuracy of target detection can be improved while the detection cost is reduced and the detection adaptability is improved.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments. For the specific implementation, reference may be made to the description of the above method embodiments; for brevity, the details are not repeated here.
The foregoing descriptions of the various embodiments tend to emphasize the differences between them; for the parts that are the same or similar, the embodiments may be referred to one another, and the details are not repeated herein for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is only one type of logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A method of target detection, comprising:
detecting a main target in an image to be detected to obtain a first candidate frame of the main target, and respectively detecting a plurality of auxiliary targets in the image to be detected to obtain a second candidate frame of each auxiliary target; wherein the plurality of secondary targets belong to the primary target;
combining any first candidate frame of the main target with any second candidate frame of each auxiliary target respectively to obtain a candidate frame combination;
for each candidate frame combination, analyzing whether the candidate frame combination is rejected or not based on the intersection between the first candidate frame and each second candidate frame;
and taking the first candidate frame in the reserved candidate frame combination as a target detection frame of the main target.
2. The method of claim 1, wherein analyzing whether to cull the candidate box combination based on the intersection between the first candidate box and each of the second candidate boxes comprises:
for each intersection generated by the candidate box combination, taking a second candidate box to which the intersection belongs as a current candidate box, respectively measuring the correlation degrees between the intersection and the current candidate box in a plurality of dimensions, and determining whether the current candidate box is correlated with the first candidate box in the candidate box combination based on the correlation degrees respectively corresponding to the plurality of dimensions;
determining whether to reject the candidate box combination based on a second candidate box in the candidate box combination related to the first candidate box.
3. The method of claim 2, wherein the several dimensions comprise: at least one of a first dimension associated with a geometric space, a second dimension associated with a color space.
4. The method of claim 3, wherein the measuring the degree of correlation between the intersection and the current candidate box in several dimensions, respectively, comprises at least one of:
in response to that the dimensions comprise an x-axis direction, acquiring the ratio of the length of the intersection to the length of the current candidate frame in the x-axis direction as the degree of correlation in the x-axis direction;
in response to that the dimensions comprise a y-axis direction, acquiring the ratio of the length of the intersection to the length of the current candidate frame in the y-axis direction as the degree of correlation in the y-axis direction;
in response to the dimensions including an area dimension, obtaining a ratio of the area of the intersection to the area of the current candidate frame as a degree of correlation of the area dimension.
5. The method of claim 2, wherein determining whether the current candidate box is related to the first candidate box in the candidate box combination based on the degrees of relevance corresponding to the dimensions respectively comprises:
based on the degree threshold values respectively set by the dimensions, checking whether the correlation degree of each dimension meets the preset requirement;
determining whether the current candidate box is related to the first candidate box in the candidate box combination based on the number of dimensions meeting the preset requirement.
6. The method of claim 2, wherein determining whether to eliminate the candidate frame combination based on a second candidate frame of the candidate frame combination related to the first candidate frame comprises:
counting the total number of second candidate boxes related to the first candidate box in the candidate box combination;
based on the total number, determining whether to cull the candidate box combination.
7. The method of claim 1, wherein the plurality of secondary objects comprises at least a human face if the primary object is a human body.
8. A target detection device, comprising:
the detection module is used for detecting a main target in an image to be detected to obtain a first candidate frame of the main target, and respectively detecting a plurality of auxiliary targets in the image to be detected to obtain a second candidate frame of each auxiliary target; wherein the plurality of secondary targets belong to the primary target;
the combining module is used for combining any one first candidate frame of the main target with any one second candidate frame of each auxiliary target respectively to obtain a candidate frame combination;
an analysis module, configured to analyze, for each candidate frame combination, whether to reject the candidate frame combination based on an intersection between the first candidate frame and each second candidate frame;
and the determining module is used for taking the first candidate frame in the reserved candidate frame combination as the target detection frame of the main target.
9. An electronic device comprising a memory and a processor, wherein the memory stores program instructions, and the processor is configured to execute the program instructions to implement the target detection method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program instructions executable by a processor for implementing the target detection method of any one of claims 1 to 7.
CN202211379775.5A 2022-11-04 2022-11-04 Target detection method and related device, electronic equipment and storage medium Pending CN115761213A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211379775.5A CN115761213A (en) 2022-11-04 2022-11-04 Target detection method and related device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211379775.5A CN115761213A (en) 2022-11-04 2022-11-04 Target detection method and related device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115761213A true CN115761213A (en) 2023-03-07

Family

ID=85356541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211379775.5A Pending CN115761213A (en) 2022-11-04 2022-11-04 Target detection method and related device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115761213A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409428A (en) * 2023-12-13 2024-01-16 南昌理工学院 Test paper information processing method, system, computer and storage medium
CN117409428B (en) * 2023-12-13 2024-03-01 南昌理工学院 Test paper information processing method, system, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
JP7235892B2 (en) Image processing method, apparatus, electronic equipment and computer program product
CN109117773B (en) Image feature point detection method, terminal device and storage medium
CN111489290B (en) Face image super-resolution reconstruction method and device and terminal equipment
CN110390229B (en) Face picture screening method and device, electronic equipment and storage medium
CN108564550B (en) Image processing method and device and terminal equipment
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
CN112580668B (en) Background fraud detection method and device and electronic equipment
CN111191601A (en) Method, device, server and storage medium for identifying peer users
CN111178235A (en) Target quantity determination method, device, equipment and storage medium
CN116416190A (en) Flaw detection method and device, electronic equipment and storage medium
CN115761213A (en) Target detection method and related device, electronic equipment and storage medium
CN111400550A (en) Target motion trajectory construction method and device and computer storage medium
CN117576634B (en) Anomaly analysis method, device and storage medium based on density detection
CN112818774B (en) Living body detection method and device
CN117475160A (en) Target object following method, system and related device
CN112488054A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN109614854B (en) Video data processing method and device, computer device and readable storage medium
CN114519520A (en) Model evaluation method, model evaluation device and storage medium
CN111723795B (en) Abnormal license plate recognition method and device, electronic equipment and storage medium
CN111274899A (en) Face matching method and device, electronic equipment and storage medium
CN112819095B (en) Feature point matching method and device, intelligent terminal and computer readable storage medium
CN118279677B (en) Target identification method and related device
CN110147730B (en) Palm print recognition method and device and terminal equipment
CN109101947B (en) Portrait identification method, portrait identification device and terminal equipment
CN114092743B (en) Compliance detection method and device for sensitive picture, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination