CN114169419A - Target object detection method and device, computer equipment and storage medium - Google Patents

Target object detection method and device, computer equipment and storage medium

Info

Publication number
CN114169419A
Authority
CN
China
Prior art keywords
image
detection
network
detected
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111448151.XA
Other languages
Chinese (zh)
Inventor
施晨 (Shi Chen)
林培文 (Lin Peiwen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority to CN202111448151.XA
Publication of CN114169419A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target object detection method, apparatus, computer device and storage medium, wherein the method comprises: acquiring an image to be detected; detecting the image to be detected based on a pre-trained first object detection network, and determining a detection result corresponding to the image to be detected; cutting out area images corresponding to the target objects from the image to be detected based on first position information of the target objects in the image to be detected in the detection result; inputting the area image into a pre-trained classification network, so as to verify the detection result of the first object detection network through the classification network, and determining the detection result based on the verification result.

Description

Target object detection method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of object detection technologies, and in particular, to a method and an apparatus for detecting a target object, a computer device, and a storage medium.
Background
Object detection using detection networks is a very important direction in the field of computer vision and can be applied in many practical tasks, such as autonomous driving, robotics, smart healthcare, etc.
In the related art, a detection network is sensitive to the selection of the detection threshold: if the detection threshold is set too high, target objects may be missed; if it is set too low, target objects may be falsely detected. The accuracy of the obtained detection result is therefore low.
Disclosure of Invention
The embodiment of the disclosure at least provides a target object detection method, a target object detection device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for detecting a target object, including:
acquiring an image to be detected;
detecting the image to be detected based on a pre-trained first object detection network, and determining a detection result corresponding to the image to be detected;
cutting out area images corresponding to the target objects from the image to be detected based on first position information of the target objects in the image to be detected in the detection result;
inputting the area image into a pre-trained classification network, so as to verify the detection result of the first object detection network through the classification network, and determining the detection result based on the verification result.
Therefore, the detection result of the first object detection network is verified by the pre-trained classification network. Even if the first object detection network falsely detects a target object because the detection threshold is set low, the classification network can filter out the false detection result, so the dependence of the first object detection network on the detection threshold can be reduced and the accuracy of the detection result improved.
In a possible implementation manner, the cropping, from the image to be detected, a region image corresponding to each target object based on first position information of each target object in the image to be detected in the detection result, includes:
aiming at any one target object, adjusting the first position information corresponding to the target object based on a preset amplification parameter, and determining second position information during cutting processing;
and cutting the image to be detected based on the second position information to obtain a region image corresponding to the target object.
In this way, by using the magnification parameter to perform the cropping processing, the obtained region image can contain the complete target object, so that the accuracy of the verification result can be ensured.
In a possible implementation manner, the performing, on the basis of the second position information, a cropping process on the image to be detected to obtain a region image corresponding to the target object includes:
under the condition that the position coordinate corresponding to the second position information exceeds the boundary of the image to be detected, filling pixel points in the region exceeding the boundary of the image to be detected according to a preset pixel value to obtain a region image corresponding to the target object; or,
and adjusting the second position information based on the third position information of the boundary of the image to be detected, and cutting the image to be detected based on the adjusted second position information to obtain a region image corresponding to the target object.
In one possible embodiment, the sample images for training the classification network are obtained by:
obtaining a sample image with a labeling frame; the labeling frame is used for selecting at least one target object contained in the sample image, and comprises a first detection frame based on manual labeling and a second detection frame based on second object detection network labeling, or comprises a second detection frame based on second object detection network labeling;
and based on the labeling frame, cutting the sample image to obtain a sample image for training the classification network.
Therefore, the sources of the sample images for training the classification network are richer, and the verification effect of the classification network can be improved.
In one possible embodiment, the sample image with the second detection frame is obtained by:
acquiring an initial sample image;
inputting the initial sample image into the second object detection network to obtain a plurality of second detection frames output by the second object detection network and a confidence corresponding to each second detection frame;
and determining a second detection frame with the corresponding confidence coefficient meeting a preset threshold condition, and acquiring a sample image carrying the determined second detection frame.
In this way, by setting a threshold condition for the confidence degrees, second detection frames with different confidence degrees can be obtained, and the second detection frames are used for training the classification network, so that the verification effect of the classification network can be improved.
In a possible embodiment, the second object detection network is the same network as the first object detection network.
Therefore, the object detection network providing training sample data in the training stage is the same as the detection network in actual verification, so that the matching degree of the classification network and the first object detection network can be improved, and the verification result is more accurate.
In a possible implementation, the classification network is trained by the following steps:
inputting the sample image to a classification network to be trained;
determining a target loss value in the training process based on the classification result of the sample image output by the classification network to be trained, the labeling frames of the sample image and the confidence degrees corresponding to the labeling frames, and adjusting the network parameter value of the classification network based on the target loss value, wherein the confidence degree corresponding to the first detection frame is a preset confidence degree.
In a possible implementation manner, the determining a target loss value in the current training process based on the classification result of the sample image output by the classification network to be trained, the labeling boxes of the sample image, and the confidence degrees corresponding to the labeling boxes includes:
determining a first loss value corresponding to each target object identified in the training process based on the classification result of the sample image output by the classification network to be trained and the marking frame of the sample image; and carrying out weighted summation on the first loss value based on the confidence degrees corresponding to the labeling boxes respectively, and determining a target loss value in the training process.
In this way, the confidence is used as the weight to introduce the loss function, so that the learning effect of the classification network on the difficult samples can be improved, and the verification capability of the difficult samples is improved.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for detecting a target object, including:
the acquisition module is used for acquiring an image to be detected;
the determining module is used for detecting the image to be detected based on a pre-trained first object detection network and determining a detection result corresponding to the image to be detected;
the cutting module is used for cutting out area images corresponding to all the target objects from the image to be detected based on first position information of all the target objects in the detection result in the image to be detected;
and the checking module is used for inputting the area image to a pre-trained classification network so as to check the detection result of the first object detection network through the classification network and determine the detection result based on the check result.
In a possible implementation manner, the cropping module, when cropping out, from the image to be detected, a region image corresponding to each target object based on the first position information of each target object in the image to be detected in the detection result, is configured to:
aiming at any one target object, adjusting the first position information corresponding to the target object based on a preset amplification parameter, and determining second position information during cutting processing;
and cutting the image to be detected based on the second position information to obtain a region image corresponding to the target object.
In a possible implementation manner, when the to-be-detected image is clipped based on the second position information to obtain a region image corresponding to the target object, the clipping module is configured to:
under the condition that the position coordinate corresponding to the second position information exceeds the boundary of the image to be detected, filling pixel points in the region exceeding the boundary of the image to be detected according to a preset pixel value to obtain a region image corresponding to the target object; or,
and adjusting the second position information based on the third position information of the boundary of the image to be detected, and cutting the image to be detected based on the adjusted second position information to obtain a region image corresponding to the target object.
In a possible implementation, the obtaining module is further configured to obtain a sample image for training the classification network according to the following steps:
obtaining a sample image with a labeling frame; the labeling frame is used for selecting at least one target object contained in the sample image, and comprises a first detection frame based on manual labeling and a second detection frame based on second object detection network labeling, or comprises a second detection frame based on second object detection network labeling;
and based on the labeling frame, cutting the sample image to obtain a sample image for training the classification network.
In a possible implementation, the acquiring module is further configured to acquire the sample image with the second detection frame according to the following steps:
acquiring an initial sample image;
inputting the initial sample image into the second object detection network to obtain a plurality of second detection frames output by the second object detection network and a confidence corresponding to each second detection frame;
and determining a second detection frame with the corresponding confidence coefficient meeting a preset threshold condition, and acquiring a sample image carrying the determined second detection frame.
In a possible embodiment, the second object detection network is the same network as the first object detection network.
In a possible embodiment, the apparatus further comprises a training module;
the training module is used for training the classification network according to the following steps:
inputting the sample image to a classification network to be trained;
determining a target loss value in the training process based on the classification result of the sample image output by the classification network to be trained, the labeling frames of the sample image and the confidence degrees corresponding to the labeling frames, and adjusting the network parameter value of the classification network based on the target loss value, wherein the confidence degree corresponding to the first detection frame is a preset confidence degree.
In a possible embodiment, when determining the target loss value in the current training process based on the classification result of the sample image output by the classification network to be trained, the labeling frames of the sample image, and the confidence corresponding to each labeling frame, the training module is configured to:
determining a first loss value corresponding to each target object identified in the training process based on the classification result of the sample image output by the classification network to be trained and the marking frame of the sample image; and carrying out weighted summation on the first loss value based on the confidence degrees corresponding to the labeling boxes respectively, and determining a target loss value in the training process.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.
In a fourth aspect, this disclosed embodiment also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps in the first aspect or any one of the possible implementation manners of the first aspect.
For the description of the effect of the detection apparatus, the computer device and the storage medium of the target object, reference is made to the description of the detection method of the target object, and details are not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art can derive additional related drawings from them without creative effort.
Fig. 1 shows a flowchart of a target object detection method provided by an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a specific method for cutting out an area image from an image to be detected in the method for detecting a target object according to the embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating that second position information during clipping processing is determined in the target object detection method provided by the embodiment of the present disclosure;
fig. 4 is a flowchart illustrating a specific method for acquiring a sample image for training a classification network in a target object detection method provided by an embodiment of the present disclosure;
fig. 5 is a flowchart illustrating a specific method for acquiring a sample image with a second detection frame in the detection method for a target object provided in the embodiment of the present disclosure;
fig. 6 is a flowchart illustrating a specific method for training a classification network in the target object detection method provided by the embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a specific method for determining a target loss value in the current training process in the method for detecting a target object provided in the embodiment of the present disclosure;
fig. 8 is a schematic diagram illustrating an architecture of a target object detection apparatus provided in an embodiment of the present disclosure;
fig. 9 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments will be described clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. The components of the embodiments of the present disclosure, as generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the disclosure. All other embodiments obtained by a person skilled in the art from the embodiments of the disclosure without creative effort shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Research shows that a detection network is sensitive to the selection of the detection threshold: if the detection threshold is set too high, target objects may be missed; if it is set too low, target objects may be falsely detected. The accuracy of the obtained detection result is therefore low.
Based on the research, the present disclosure provides a method, an apparatus, a computer device, and a storage medium for detecting a target object, in which a classification network trained in advance is used to verify a detection result of a first object detection network, so that even if the first object detection network performs false detection on the target object due to a low detection threshold, the classification network can filter the false detection result, thereby reducing the dependence of the first object detection network on the detection threshold and improving the accuracy of the detection result.
To facilitate understanding of the present embodiment, the method for detecting a target object disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the method is generally a computer device with certain computing capability, for example a terminal device, a server or another processing device. In some possible implementations, the method may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a method for detecting a target object according to an embodiment of the present disclosure is shown, where the method includes S101 to S104, where:
s101: and acquiring an image to be detected.
S102: and detecting the image to be detected based on a pre-trained first object detection network, and determining a detection result corresponding to the image to be detected.
S103: and cutting out area images respectively corresponding to all the target objects from the image to be detected based on first position information of all the target objects in the image to be detected in the detection result.
S104: inputting the area image into a pre-trained classification network, so as to verify the detection result of the first object detection network through the classification network, and determining the detection result based on the verification result.
The following is a detailed description of the above steps.
For S101 and S102, the detection result may include at least one target object included in the image to be detected and first position information of the at least one target object in the image to be detected, where the target object may be an object such as an automobile or a truck.
For example, the image to be detected is detected based on the first object detection network, and each automobile included in the image to be detected and first position information of each automobile in the image to be detected can be detected.
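By way of illustration only, S101 and S102 could be sketched in Python as follows. The disclosure does not name a concrete detector, so a torchvision Faster R-CNN stands in for the first object detection network; the image path and the 0.3 detection threshold are assumptions for this sketch.

```python
import torch
import torchvision

# Stand-in for the pre-trained first object detection network (assumption:
# the disclosure does not specify a detector architecture).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# S101: acquire the image to be detected ("road.jpg" is a hypothetical path).
image = torchvision.io.read_image("road.jpg").float() / 255.0

# S102: detect the image and obtain the detection result.
with torch.no_grad():
    detections = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

# A deliberately low detection threshold may be kept here, since the
# classification network described below filters false detections afterwards.
keep = detections["scores"] > 0.3
boxes = detections["boxes"][keep]   # first position information, (x1, y1, x2, y2)
scores = detections["scores"][keep]
```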
S103: and cutting out area images respectively corresponding to all the target objects from the image to be detected based on first position information of all the target objects in the image to be detected in the detection result.
Illustratively, taking the case where the target objects to be detected are automobiles and the number of automobiles is 3, 3 area images can be cropped from the image to be detected based on the first position information corresponding to each automobile, each area image containing 1 automobile.
In one possible embodiment, as shown in fig. 2, the area image corresponding to each target object can be cut out from the image to be detected by the following steps:
s201: and aiming at any one target object, adjusting the first position information corresponding to the target object based on a preset amplification parameter, and determining second position information during cutting processing.
Here, the amplification parameter may be randomly selected from a predetermined numerical range, and the numerical range may be, for example, 0 to 0.3.
Specifically, the first position information may include the position coordinates of a plurality of coordinate points. When the position coordinates are amplified, different amplification parameters may be selected for different coordinate points; for example, one amplification parameter may be selected to adjust both the abscissa and the ordinate of coordinate point A, or two amplification parameters may be selected to adjust the abscissa and the ordinate of coordinate point A respectively.
In practical applications, the area corresponding to the first position information is generally a quadrilateral; that is, the first position information includes the coordinates of 4 coordinate points, and the area can be represented by these 4 coordinates. If separate amplification parameters were selected for all 4 coordinate points, excessive computing resources might be consumed. For this reason, any 2 coordinate points in diagonal positions among the 4 may be selected; the adjustment range of the current amplification processing is obtained by amplifying these 2 diagonal points, and the adjustment of the first position information is completed by applying the same adjustment range to the other 2 points. This reduces the computing resources consumed by randomly selecting amplification parameters, and the region image obtained after cropping is rectangular, which makes it convenient for the subsequent classification network to perform undistorted image standardization on the region image.
For example, a schematic diagram for determining the second position information during the cropping processing may be as shown in fig. 3. The first position information corresponding to the target object includes the position coordinates (10, 20) of coordinate point A, (20, 20) of coordinate point B, (10, 10) of coordinate point C and (20, 10) of coordinate point D; the diagonal points selected are B and C. For coordinate point C, the randomly selected amplification parameters are 0.1 (for the abscissa) and 0.3 (for the ordinate): 10 × (1 − 0.1) = 9 and 10 × (1 − 0.3) = 7, so the position coordinate of point C′ in the second position information is (9, 7). For coordinate point B, the randomly selected amplification parameters are 0.05 (for the abscissa) and 0.1 (for the ordinate): 20 × (1 + 0.05) = 21 and 20 × (1 + 0.1) = 22, so the position coordinate of point B′ is (21, 22). From the abscissa of B′ and the ordinate of C′, the position coordinate of D′ can be determined to be (21, 7); from the ordinate of B′ and the abscissa of C′, the position coordinate of A′ can be determined to be (9, 22). The second position information for the cropping processing is thus determined.
Since the purpose of the amplification parameter is to enlarge the area corresponding to the first position information, for coordinate points close to the origin of the coordinate system (e.g., A and C in fig. 3), the difference between 1 and the amplification parameter is used as the coefficient in the calculation; conversely, for coordinate points far from the origin (e.g., B and D in fig. 3), the sum of 1 and the amplification parameter is used as the coefficient.
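Following the worked example of fig. 3, S201 could be sketched as below. Drawing an independent amplification parameter for each coordinate of the two diagonal points and the 0–0.3 range follow the description above; the function name is only illustrative.

```python
import random

def enlarge_box(box, max_ratio=0.3):
    """Turn first position information (x1, y1, x2, y2) into second position
    information by amplifying the two diagonal corner points, as in fig. 3."""
    x1, y1, x2, y2 = box  # (x1, y1) is the corner nearer the coordinate origin
    # Corner near the origin: multiply by (1 - amplification parameter),
    # so the point moves toward the origin and the area grows.
    x1 *= 1 - random.uniform(0.0, max_ratio)
    y1 *= 1 - random.uniform(0.0, max_ratio)
    # Corner far from the origin: multiply by (1 + amplification parameter).
    x2 *= 1 + random.uniform(0.0, max_ratio)
    y2 *= 1 + random.uniform(0.0, max_ratio)
    # The other two corners of the rectangle follow from these coordinates.
    return x1, y1, x2, y2
```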
S202: and cutting the image to be detected based on the second position information to obtain a region image corresponding to the target object.
In practical application, since the second position information is obtained by adjusting the first position information with the amplification parameter, a position coordinate corresponding to the second position information may exceed the boundary of the image to be detected. In this case, the image to be detected may be cropped in either of the following ways to obtain the area image:
and in the mode 1, filling pixel points in the region exceeding the boundary of the image to be detected according to a preset pixel value to obtain a region image corresponding to the target object.
Here, the preset pixel value may be 0 (corresponding to black color) or 255 (corresponding to white color).
Mode 2: adjusting the second position information based on third position information of the boundary of the image to be detected, and cropping the image to be detected based on the adjusted second position information to obtain the area image corresponding to the target object.
Here, a coordinate point whose position coordinate in the second position information exceeds the boundary of the image to be detected may be determined as a coordinate point to be adjusted, and its position coordinate may be adjusted using the position coordinate in the corresponding third position information; the position coordinate in the third position information corresponding to a coordinate point to be adjusted is determined from the boundary of the image to be detected and the position coordinates of the coordinate points in the second position information that do not exceed the image.
For example, taking the area image ABCD in fig. 3 as an example, coordinate point B and coordinate point D may be determined to be coordinate points to be adjusted. From the intersection of line segment AB with the boundary of the image to be detected, the position coordinate in the third position information corresponding to coordinate point B can be determined to be (15, 20); (15, 20) is then used as the position coordinate of B in the cropping processing (i.e., the position coordinate of point B in the second position information is adjusted). Likewise, from the intersection of line segment CD with the boundary of the image to be detected, the position coordinate in the third position information corresponding to coordinate point D can be determined to be (15, 10); (15, 10) is then used as the position coordinate of D in the cropping processing (i.e., the position coordinate of point D in the second position information is adjusted).
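Both cropping modes could be sketched as follows for an H×W×C image array; rounding the coordinates to integer pixels is an implementation assumption.

```python
import numpy as np

def crop_with_padding(image, box, pad_value=0):
    """Mode 1: fill the part of the region beyond the image boundary
    with a preset pixel value (0 for black, 255 for white)."""
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    h, w = image.shape[:2]
    out = np.full((y2 - y1, x2 - x1, image.shape[2]), pad_value, dtype=image.dtype)
    # Intersection of the enlarged region with the valid image area.
    ix1, iy1, ix2, iy2 = max(x1, 0), max(y1, 0), min(x2, w), min(y2, h)
    out[iy1 - y1:iy2 - y1, ix1 - x1:ix2 - x1] = image[iy1:iy2, ix1:ix2]
    return out

def crop_with_clamping(image, box):
    """Mode 2: adjust the second position information to the image boundary
    (third position information) before cropping."""
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    h, w = image.shape[:2]
    return image[max(y1, 0):min(y2, h), max(x1, 0):min(x2, w)]
```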
S104: inputting the area image into a pre-trained classification network, so as to verify the detection result of the first object detection network through the classification network, and determining the detection result based on the verification result.
Here, the classification network may be composed of a backbone network and a classification head. The backbone network may adopt networks such as ResNet50 and ResNet34, or lightweight networks such as MobileNet; the classification head may be composed of 4 convolutional layers and 2 fully connected layers.
Specifically, the output of the classification network may include two classes (i.e., the classification network is a binary classification network), where 0 (the first class) indicates that the region image is not a target object and 1 (the second class) indicates that it is. Before a region image is input into the classification network, its size may be uniformly scaled to a target size, such as 64px × 64px. In practical applications, for any detection result of the first object detection network, if the output of the classification network is 0, the detection result is erroneous (a non-target object was identified as a target object) and needs to be deleted; if the output is 1, the target object was detected correctly, and the detection result is kept as the correct detection result corresponding to the target object.
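A minimal PyTorch sketch of the classification network and the verification step described above is given below. The ResNet50 backbone, the head of 4 convolutional and 2 fully connected layers, the 64px × 64px input and the binary output follow the description; the channel widths and other layer details are assumptions.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms.functional as TF

class Verifier(nn.Module):
    """Classification network: ResNet50 backbone plus a head of 4
    convolutional layers and 2 fully connected layers (widths assumed)."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # 2048-channel map
        self.head = nn.Sequential(
            nn.Conv2d(2048, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 2),  # class 0: not a target object, class 1: target object
        )

    def forward(self, x):  # x: (N, 3, 64, 64) region images
        return self.head(self.backbone(x))

def verify(verifier, region_images, boxes):
    """S104: keep only the detections whose region image is classified as 1."""
    batch = torch.stack([TF.resize(r, [64, 64]) for r in region_images])
    with torch.no_grad():
        keep = verifier(batch).argmax(dim=1) == 1
    return [b for b, k in zip(boxes, keep) if k]
```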
In one possible implementation, as shown in fig. 4, a sample image for training the classification network may be obtained by:
s401: obtaining a sample image with a labeling frame; the labeling frame is used for selecting at least one target object contained in the sample image, and the labeling frame comprises a first detection frame based on manual labeling and a second detection frame based on second object detection network labeling, or the labeling frame comprises a second detection frame based on second object detection network labeling.
Here, since the confidence of each detection frame is needed when training the classification network, and manually labeled confidences of negative samples would be subjectively affected by the labeling personnel, the manually labeled first detection frames may be limited to easily identifiable positive samples.
For example, taking the target object being an automobile as an example: for the classification network, a first detection frame containing a complete automobile image is simple to identify, so a positive sample may be a first detection frame framing one complete automobile image; a first detection frame containing only part of an automobile image is difficult to identify, so a negative sample may be a first detection frame framing only 20% of the automobile body.
In one possible embodiment, as shown in fig. 5, the sample image with the second detection frame may be obtained by:
s4011: an initial sample image is acquired.
S4012: and inputting the initial sample image into the second object detection network to obtain a plurality of second detection frames output by the second object detection network and the confidence corresponding to each second detection frame.
Here, in order to improve the verification capability of the detection result of the first object detection network in the subsequent step, the second object detection network and the first object detection network used may be the same network when determining the training data for training the classification network.
S4013: and determining a second detection frame with the corresponding confidence coefficient meeting a preset threshold condition, and acquiring a sample image carrying the determined second detection frame.
Here, the second detection frame may also include positive examples that are easy to identify and negative examples that are difficult to identify.
Specifically, a second detection frame whose confidence is lower than a first confidence threshold may be determined as a negative sample, so as to improve the classification capability of the classification network on difficult samples; a second detection frame whose confidence is higher than a second confidence threshold, itself higher than the first confidence threshold, may be determined as a positive sample, so as to enrich the sources of positive samples.
Further, in order to improve the sample accuracy of the negative samples, the negative samples may be manually screened and second detection frames that do not meet the requirements deleted; for example, a second detection frame whose labeled object category is inconsistent with the category of the object actually framed may be deleted.
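A sketch of S4013 under assumed threshold values is given below; the disclosure does not fix the two confidence thresholds, so 0.3 and 0.8 are assumptions.

```python
def select_training_boxes(detections, low_thresh=0.3, high_thresh=0.8):
    """Split second detection frames by confidence: frames below the first
    threshold become candidate negative samples (to be manually screened),
    frames above the second, higher threshold become positive samples."""
    positives, negatives = [], []
    for box, confidence in detections:
        if confidence < low_thresh:
            negatives.append((box, confidence))
        elif confidence > high_thresh:
            positives.append((box, confidence))
    return positives, negatives
```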
S402: and based on the labeling frame, cutting the sample image to obtain a sample image for training the classification network.
Here, the specific procedure of the clipping process may refer to the related description of fig. 3, and is not described herein again.
Further, after the sample image used for training the classification network is obtained through the cropping processing, and before the sample image is input into the classification network, a partial region of the sample image may be randomly selected and the pixel values of the pixel points in that region set to 0, which improves the robustness of the classification network when a sample is occluded.
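The occlusion augmentation described above could be sketched as follows; the maximum fraction of each side length that may be erased is an assumption.

```python
import random

def random_erase(sample_image, max_frac=0.3):
    """Set a randomly selected sub-region of the cropped sample image to
    pixel value 0, so the classification network stays robust to occlusion.
    Assumes an H x W (x C) array of at least a few pixels per side."""
    h, w = sample_image.shape[:2]
    eh = random.randint(1, max(1, int(h * max_frac)))
    ew = random.randint(1, max(1, int(w * max_frac)))
    y, x = random.randint(0, h - eh), random.randint(0, w - ew)
    out = sample_image.copy()
    out[y:y + eh, x:x + ew] = 0
    return out
```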
In one possible implementation, as shown in fig. 6, the classification network may be trained by:
s601: and inputting the sample image to a classification network to be trained.
S602: determining a target loss value in the training process based on the classification result of the sample image output by the classification network to be trained, the labeling frames of the sample image and the confidence degrees corresponding to the labeling frames, and adjusting the network parameter value of the classification network based on the target loss value, wherein the confidence degree corresponding to the first detection frame is a preset confidence degree.
Here, the labeling frames of the sample image are the labeling frames included in the sample image. When a labeling frame is a first detection frame, the confidence corresponding to the positive sample may be a preset value, for example 1; when a labeling frame is a second detection frame, its confidence is the confidence output by the second object detection network.
In one possible implementation, as shown in fig. 7, the target loss value in the training process may be determined by the following steps:
s6021: and determining a first loss value corresponding to each target object identified in the training process based on the classification result of the sample image output by the classification network to be trained and the labeling frame of the sample image.
S6022: and carrying out weighted summation on the first loss value based on the confidence degrees corresponding to the labeling boxes respectively, and determining a target loss value in the training process.
Illustratively, take the case where the first loss value corresponding to target object 1 is L1, the first loss value corresponding to target object 2 is L2, and the first loss value corresponding to target object 3 is L3, and where the confidence corresponding to the labeling frame containing target object 1 is α, that containing target object 2 is β, and that containing target object 3 is χ. The target loss value may then be calculated according to the target loss function L = αL1 + βL2 + χL3.
In this way, since the identification difficulty of a negative sample is positively correlated with its confidence, introducing the confidence into the loss function as a weight can improve the learning effect of the classification network on difficult samples, thereby improving its verification capability on difficult samples.
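The confidence-weighted target loss L = αL1 + βL2 + χL3 generalizes to the following sketch, in which manually labeled first detection frames carry the preset confidence (1.0 is assumed) and cross-entropy is assumed as the underlying per-object loss:

```python
import torch
import torch.nn.functional as F

def weighted_target_loss(logits, labels, confidences):
    """S6021/S6022: per-object first loss values weighted by the confidence
    of each labeling frame and summed into the target loss value."""
    first_losses = F.cross_entropy(logits, labels, reduction="none")  # L1, L2, L3, ...
    return (confidences * first_losses).sum()  # alpha*L1 + beta*L2 + chi*L3 + ...

# Usage during one training step (shapes: logits (N, 2), labels (N,), confidences (N,)):
# loss = weighted_target_loss(classifier(sample_batch), labels, confidences)
# loss.backward()
```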
According to the target object detection method provided by the embodiments of the present disclosure, the detection result of the first object detection network is verified by a pre-trained classification network. Even if the first object detection network falsely detects a target object because the detection threshold is set low, the classification network can filter out the false detection result, so the dependence of the first object detection network on the detection threshold can be reduced and the accuracy of the detection result improved.
It will be understood by those skilled in the art that, in the above method, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inherent logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a device for detecting a target object corresponding to the method for detecting a target object, and since the principle of solving the problem of the device in the embodiment of the present disclosure is similar to the method for detecting a target object in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 8, which is a schematic diagram illustrating an architecture of an apparatus for detecting a target object according to an embodiment of the present disclosure, the apparatus includes: an acquisition module 801, a determination module 802, a cutting module 803 and a verification module 804; wherein,
an obtaining module 801, configured to obtain an image to be detected;
a determining module 802, configured to perform detection on the image to be detected based on a pre-trained first object detection network, and determine a detection result corresponding to the image to be detected;
a cropping module 803, configured to crop, from the image to be detected, area images corresponding to the target objects based on first position information of the target objects in the image to be detected in the detection result;
the checking module 804 is configured to input the area image to a classification network trained in advance, so as to check a detection result of the first object detection network through the classification network, and determine the detection result based on the check result.
In a possible implementation manner, the cropping module 803, when cropping out the area image corresponding to each target object from the image to be detected based on the first position information of each target object in the image to be detected in the detection result, is configured to:
aiming at any one target object, adjusting the first position information corresponding to the target object based on a preset amplification parameter, and determining second position information during cutting processing;
and cutting the image to be detected based on the second position information to obtain a region image corresponding to the target object.
In a possible implementation manner, when the to-be-detected image is clipped based on the second position information to obtain a region image corresponding to the target object, the clipping module 803 is configured to:
under the condition that the position coordinate corresponding to the second position information exceeds the boundary of the image to be detected, filling pixel points in the region exceeding the boundary of the image to be detected according to a preset pixel value to obtain a region image corresponding to the target object; or,
and adjusting the second position information based on the third position information of the boundary of the image to be detected, and cutting the image to be detected based on the adjusted second position information to obtain a region image corresponding to the target object.
In a possible implementation, the obtaining module 801 is further configured to obtain a sample image for training the classification network according to the following steps:
obtaining a sample image with a labeling frame; the labeling frame is used for selecting at least one target object contained in the sample image, and comprises a first detection frame based on manual labeling and a second detection frame based on second object detection network labeling, or comprises a second detection frame based on second object detection network labeling;
and based on the labeling frame, cutting the sample image to obtain a sample image for training the classification network.
In a possible implementation, the obtaining module 801 is further configured to obtain the sample image with the second detection frame according to the following steps:
acquiring an initial sample image;
inputting the initial sample image into the second object detection network to obtain a plurality of second detection frames output by the second object detection network and a confidence corresponding to each second detection frame;
and determining a second detection frame with the corresponding confidence coefficient meeting a preset threshold condition, and acquiring a sample image carrying the determined second detection frame.
In a possible embodiment, the second object detection network is the same network as the first object detection network.
In a possible embodiment, the apparatus further comprises a training module 805;
the training module 805 is configured to train the classification network according to the following steps:
inputting the sample image to a classification network to be trained;
determining a target loss value in the training process based on the classification result of the sample image output by the classification network to be trained, the labeling frames of the sample image and the confidence degrees corresponding to the labeling frames, and adjusting the network parameter value of the classification network based on the target loss value, wherein the confidence degree corresponding to the first detection frame is a preset confidence degree.
In a possible implementation manner, when determining the target loss value in the current training process based on the classification result of the sample image output by the classification network to be trained, the labeling frames of the sample image, and the confidence corresponding to each labeling frame, the training module 805 is configured to:
determining a first loss value corresponding to each target object identified in the training process based on the classification result of the sample image output by the classification network to be trained and the marking frame of the sample image; and carrying out weighted summation on the first loss value based on the confidence degrees corresponding to the labeling boxes respectively, and determining a target loss value in the training process.
According to the detection apparatus for a target object provided by the embodiments of the present disclosure, the detection result of the first object detection network is verified by a pre-trained classification network. Even if the first object detection network falsely detects a target object because the detection threshold is set low, the classification network can filter out the false detection result, so the dependence of the first object detection network on the detection threshold can be reduced and the accuracy of the detection result improved.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Based on the same technical concept, the embodiments of the disclosure also provide a computer device. Referring to fig. 9, a schematic structural diagram of a computer device 900 provided in the embodiments of the present disclosure includes a processor 901, a memory 902 and a bus 903. The memory 902 is used for storing execution instructions and includes a memory 9021 and an external memory 9022; the memory 9021, also referred to as an internal memory, temporarily stores operation data of the processor 901 and data exchanged with an external memory 9022 such as a hard disk, the processor 901 exchanging data with the external memory 9022 through the memory 9021. When the computer device 900 is running, the processor 901 communicates with the memory 902 through the bus 903, causing the processor 901 to execute the following instructions:
acquiring an image to be detected;
detecting the image to be detected based on a pre-trained first object detection network, and determining a detection result corresponding to the image to be detected;
cutting out area images corresponding to the target objects from the image to be detected based on first position information of the target objects in the image to be detected in the detection result;
inputting the area image into a pre-trained classification network, so as to verify the detection result of the first object detection network through the classification network, and determining the detection result based on the verification result.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, executes the steps of the method for detecting a target object in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the method for detecting a target object in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A method of detecting a target object, comprising:
acquiring an image to be detected;
detecting the image to be detected based on a pre-trained first object detection network, and determining a detection result corresponding to the image to be detected;
cropping out, from the image to be detected, a region image corresponding to each target object based on first position information, in the detection result, of each target object in the image to be detected;
inputting the region image into a pre-trained classification network, so as to verify the detection result of the first object detection network through the classification network, and determining a final detection result based on the verification result.
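As an illustration only, and not part of the claims, the pipeline recited in claim 1 can be sketched in a few lines of Python. The callables `detector`, `classifier`, and `crop_region` and the dictionary keys used here are hypothetical stand-ins for the first object detection network, the classification network, and the cropping step:

```python
from typing import Callable, Dict, List

def detect_and_verify(image,
                      detector: Callable[..., List[Dict]],
                      classifier: Callable[..., str],
                      crop_region: Callable) -> List[Dict]:
    """Detect target objects, crop one region image per detection, and
    let the classification network verify each detection (claim 1)."""
    detections = detector(image)  # e.g. [{'box': (x1, y1, x2, y2), 'label': 'car'}, ...]
    verified = []
    for det in detections:
        region = crop_region(image, det['box'])  # claims 2-3 detail this step
        if classifier(region) == det['label']:   # classifier confirms the detector
            verified.append(det)
    return verified  # the final detection result keeps only verified objects
```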
2. The method according to claim 1, wherein the cropping out, from the image to be detected, a region image corresponding to each target object based on first position information, in the detection result, of each target object in the image to be detected comprises:
for any one of the target objects, adjusting the first position information corresponding to the target object based on a preset amplification parameter, to determine second position information used in the cropping;
and cropping the image to be detected based on the second position information to obtain the region image corresponding to the target object.
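A minimal sketch of the box adjustment in claim 2, illustrative only: the function name and the default amplification value of 1.2 are assumptions, with the box given as pixel coordinates (x1, y1, x2, y2):

```python
def enlarge_box(box, amplification=1.2):
    """Adjust first position information (x1, y1, x2, y2) by a preset
    amplification parameter, yielding second position information for
    the crop (claim 2); 1.2 is an assumed value."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * amplification / 2.0
    half_h = (y2 - y1) * amplification / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```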
3. The method according to claim 2, wherein the cropping the image to be detected based on the second position information to obtain the region image corresponding to the target object comprises:
in a case where a position coordinate corresponding to the second position information exceeds the boundary of the image to be detected, filling pixel points in the region beyond the boundary of the image to be detected with a preset pixel value, to obtain the region image corresponding to the target object; or,
adjusting the second position information based on third position information of the boundary of the image to be detected, and cropping the image to be detected based on the adjusted second position information, to obtain the region image corresponding to the target object.
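The two alternative boundary-handling branches of claim 3 could look like the following NumPy sketch (illustrative, not part of the claims); `fill_value` stands in for the preset pixel value, and an enlarged box that at least partly overlaps the image is assumed:

```python
import numpy as np

def crop_with_padding(image: np.ndarray, box, fill_value=0):
    """First branch of claim 3: pixels outside the image boundary are
    filled with a preset pixel value."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    h, w = image.shape[:2]
    out = np.full((y2 - y1, x2 - x1) + image.shape[2:], fill_value,
                  dtype=image.dtype)
    ix1, iy1 = max(x1, 0), max(y1, 0)   # intersection of the box
    ix2, iy2 = min(x2, w), min(y2, h)   # with the image bounds
    out[iy1 - y1:iy2 - y1, ix1 - x1:ix2 - x1] = image[iy1:iy2, ix1:ix2]
    return out

def crop_with_clamping(image: np.ndarray, box):
    """Second branch of claim 3: the second position information is
    adjusted to the image boundary before cropping."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    h, w = image.shape[:2]
    return image[max(y1, 0):min(y2, h), max(x1, 0):min(x2, w)]
```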
4. The method according to any one of claims 1 to 3, wherein the sample images for training the classification network are obtained by:
obtaining a sample image with labeling frames, wherein the labeling frames are used for selecting at least one target object contained in the sample image, and comprise a first detection frame obtained by manual labeling and a second detection frame labeled by a second object detection network, or comprise only a second detection frame labeled by the second object detection network;
and cropping the sample image based on the labeling frames to obtain sample images for training the classification network.
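For claim 4, assembling the classifier's training crops might look like this sketch (illustrative; `crop_fn` would reuse a cropping routine like those above, and the frame-dictionary layout is an assumption):

```python
def build_training_crops(sample_image, labeling_frames, crop_fn):
    """Claim 4: one training crop per labeling frame; the frames may mix
    manually labeled first detection frames with second detection frames
    produced by the second object detection network."""
    return [(crop_fn(sample_image, frame['box']), frame['label'])
            for frame in labeling_frames]
```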
5. The method of claim 4, wherein the sample image with the second detection frame is obtained by:
acquiring an initial sample image;
inputting the initial sample image into the second object detection network to obtain a plurality of second detection frames output by the second object detection network and a confidence corresponding to each second detection frame;
and determining second detection frames whose corresponding confidences meet a preset threshold condition, and acquiring a sample image carrying the determined second detection frames.
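Claim 5's threshold filtering of the second detection frames could be sketched as follows (illustrative; the 0.5 threshold and the dictionary key are assumptions):

```python
def select_confident_frames(second_detections, threshold=0.5):
    """Claim 5: keep only second detection frames whose confidence meets
    the preset threshold condition; 0.5 is an assumed value."""
    return [d for d in second_detections if d['confidence'] >= threshold]
```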
6. The method of claim 5, wherein the second object detection network is the same network as the first object detection network.
7. The method according to any one of claims 4 to 6, wherein the classification network is trained by the following steps:
inputting the sample image to a classification network to be trained;
determining a target loss value in the training process based on the classification result of the sample image output by the classification network to be trained, the labeling frames of the sample image, and the confidences corresponding to the labeling frames, and adjusting network parameter values of the classification network based on the target loss value, wherein the confidence corresponding to the first detection frame is a preset confidence.
8. The method according to claim 7, wherein the determining a target loss value in the training process based on the classification result of the sample image output by the classification network to be trained, the labeling frames of the sample image, and the confidences corresponding to the labeling frames comprises:
determining a first loss value corresponding to each target object identified in the training process based on the classification result of the sample image output by the classification network to be trained and the labeling frames of the sample image;
and performing weighted summation on the first loss values based on the confidences respectively corresponding to the labeling frames, to determine the target loss value in the training process.
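The confidence-weighted training loss of claims 7 and 8 might be computed as in this PyTorch sketch (illustrative, not part of the claims); the tensor shapes and the preset confidence of 1.0 for manually labeled frames are assumptions:

```python
import torch
import torch.nn.functional as F

def target_loss(logits: torch.Tensor, labels: torch.Tensor,
                confidences: torch.Tensor) -> torch.Tensor:
    """Claims 7-8: per-frame first loss values from the classification
    result, weighted by each labeling frame's confidence and summed.

    logits:      (N, C) classifier outputs for N region crops
    labels:      (N,) class indices taken from the labeling frames
    confidences: (N,) per-frame confidences; manually labeled first
                 detection frames carry a preset confidence, e.g. 1.0
    """
    first_loss = F.cross_entropy(logits, labels, reduction='none')  # (N,)
    return (first_loss * confidences).sum()

# Toy usage: four crops, three classes, mixed manual/network labels.
logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 1])
confidences = torch.tensor([1.0, 0.8, 0.9, 1.0])
loss = target_loss(logits, labels, confidences)
```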
9. An apparatus for detecting a target object, comprising:
the acquisition module is used for acquiring an image to be detected;
the determining module is used for detecting the image to be detected based on a pre-trained first object detection network and determining a detection result corresponding to the image to be detected;
the cropping module is used for cropping out, from the image to be detected, a region image corresponding to each target object based on first position information, in the detection result, of each target object in the image to be detected;
and the verification module is used for inputting the region image into a pre-trained classification network, so as to verify the detection result of the first object detection network through the classification network and determine a final detection result based on the verification result.
10. A computer device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the computer device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the method of detecting a target object according to any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of detecting a target object according to any one of claims 1 to 8.
CN202111448151.XA 2021-11-29 2021-11-29 Target object detection method and device, computer equipment and storage medium Pending CN114169419A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111448151.XA CN114169419A (en) 2021-11-29 2021-11-29 Target object detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111448151.XA CN114169419A (en) 2021-11-29 2021-11-29 Target object detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114169419A true CN114169419A (en) 2022-03-11

Family

ID=80482304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111448151.XA Pending CN114169419A (en) 2021-11-29 2021-11-29 Target object detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114169419A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115825A (en) * 2022-05-27 2022-09-27 腾讯科技(深圳)有限公司 Method and device for detecting object in image, computer equipment and storage medium
CN115115825B (en) * 2022-05-27 2024-05-03 腾讯科技(深圳)有限公司 Method, device, computer equipment and storage medium for detecting object in image
CN115965872A (en) * 2022-07-22 2023-04-14 中科三清科技有限公司 Tea leaf picking method and device, electronic equipment and storage medium
CN115965872B (en) * 2022-07-22 2023-08-15 中科三清科技有限公司 Tea picking method and device, electronic equipment and storage medium
CN117152485A (en) * 2023-07-21 2023-12-01 华能(广东)能源开发有限公司汕头电厂 Coal leakage detection method, device and equipment for coal-fired unit

Similar Documents

Publication Publication Date Title
CN112115893B (en) Instrument panel pointer reading identification method and device, computer equipment and storage medium
CN114169419A (en) Target object detection method and device, computer equipment and storage medium
CN110378235B (en) Fuzzy face image recognition method and device and terminal equipment
CN112215201B (en) Method and device for evaluating face recognition model and classification model aiming at image
CN109117773B (en) Image feature point detection method, terminal device and storage medium
CN108491866B (en) Pornographic picture identification method, electronic device and readable storage medium
CN111860489A (en) Certificate image correction method, device, equipment and storage medium
CN111951290A (en) Edge detection method and device for object in image
US20180232605A1 (en) Determining image forensics using gradient statistics at edges
CN112348778B (en) Object identification method, device, terminal equipment and storage medium
CN112561080A (en) Sample screening method, sample screening device and terminal equipment
CN114419029B (en) Training method of surface defect detection model, surface defect detection method and device
CN112381092B (en) Tracking method, tracking device and computer readable storage medium
CN112396047B (en) Training sample generation method and device, computer equipment and storage medium
CN111209847B (en) Violent sorting identification method and device
CN110135288B (en) Method and device for quickly checking electronic certificate
CN111986103A (en) Image processing method, image processing device, electronic equipment and computer storage medium
CN113052019B (en) Target tracking method and device, intelligent equipment and computer storage medium
CN112488054B (en) Face recognition method, device, terminal equipment and storage medium
CN117115149A (en) Image quality evaluation method, device, equipment and storage medium
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN108629219B (en) Method and device for identifying one-dimensional code
CN115658525A (en) User interface checking method and device, storage medium and computer equipment
CN114708230A (en) Vehicle frame quality detection method, device, equipment and medium based on image analysis
CN111275693B (en) Counting method and counting device for objects in image and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination