CN113837173A - Target object detection method and device, computer equipment and storage medium

Info

Publication number
CN113837173A
Authority
CN
China
Prior art keywords
detection
target
detection frame
model
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010586550.1A
Other languages
Chinese (zh)
Inventor
刘文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
SF Tech Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN202010586550.1A
Publication of CN113837173A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a target object detection method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring a target image; performing target object detection on the target image through a trained target detection model to obtain, for each target object in the target image, an initial detection frame and a corresponding confidence, the target detection model being obtained by pre-training a neural network model whose network structure has been dynamically adjusted; selecting candidate detection frames from the initial detection frames according to the confidences; and screening, from the candidate detection frames, the target detection frames that meet the requirements of the detection scene. By adopting the method, the detection accuracy and robustness of target object detection can be improved.

Description

Target object detection method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of target detection technologies, and in particular to a target object detection method and apparatus, a computer device, and a storage medium.
Background
Digitization and automation are development trends across many industries. In scenes such as tobacco, postal services, medicine, logistics, chain supermarkets, department stores, and manufacturing, an automated sorting line is usually deployed to detect target objects and to carry out subsequent digital business based on the detected objects. How to detect target objects on a sorting line is therefore a matter of practical concern.
Conventionally, a target object is detected in a target image using image-processing operations such as image blurring, image binarization, opening and closing operations, and edge detection. However, this approach struggles to adapt to the complicated and varied production environments and the target objects of diverse materials encountered in real detection scenes, and suffers from low detection accuracy and poor robustness.
Disclosure of Invention
In view of the above, it is necessary to provide a target object detection method, apparatus, computer device, and storage medium capable of improving the detection accuracy and robustness of target object detection.
A target object detection method, the method comprising:
acquiring a target image;
performing target object detection on the target image through a trained target detection model to obtain, for each target object in the target image, an initial detection frame and a corresponding confidence; the target detection model is obtained by pre-training a neural network model whose network structure has been dynamically adjusted;
selecting candidate detection frames from the initial detection frames according to the confidences;
and screening, from the candidate detection frames, the target detection frames that meet the requirements of the detection scene.
In one embodiment, the screening, from the candidate detection frames, of the target detection frames that meet the requirements of the detection scene includes:
acquiring preset limiting conditions for incomplete object detection frames;
and removing, from the candidate detection frames, the candidate detection frames that match the limiting conditions, to obtain the target detection frames.
In one embodiment, the screening, from the candidate detection frames, of the target detection frames that meet the requirements of the detection scene includes:
sorting the candidate detection frames in descending order of detection frame size to obtain a candidate detection frame sequence;
taking the candidate detection frame with the largest detection frame size in the candidate detection frame sequence as a reference detection frame;
removing, from the candidate detection frame sequence, the candidate detection frames whose occupation ratio with the reference detection frame is greater than a preset occupation ratio threshold;
taking the candidate detection frames that remain after the removal, excluding the reference detection frame, as a new candidate detection frame sequence; and
returning to the step of taking the candidate detection frame with the largest detection frame size in the candidate detection frame sequence as the reference detection frame, and continuing until fewer than two candidate detection frames, excluding the reference detection frame, remain after the removal, at which point the iteration stops and the target detection frames are obtained.
In one embodiment, the removing, from the candidate detection frame sequence, of the candidate detection frames whose occupation ratio with the reference detection frame is greater than a preset occupation ratio threshold includes:
calculating the occupation ratio between the reference detection frame and each of the other remaining candidate detection frames;
determining, among the combinations of the reference detection frame with each of the other remaining candidate detection frames, the target combinations whose occupation ratio is greater than the preset occupation ratio threshold;
and removing, from the candidate detection frame sequence, the candidate detection frame with the smallest detection frame size in each target combination.
In one embodiment, the training step of the target detection model includes:
acquiring a training sample set;
constructing a neural network model after dynamically adjusting the network structure;
and training the constructed neural network model based on the training sample set to obtain a trained target detection model.
In one embodiment, the training the constructed neural network model based on the training sample set to obtain a trained target detection model includes:
training the constructed neural network model based on the training sample set to obtain a trained initial detection model;
and quantizing the model parameters of the initial detection model, and fusing the convolution layers, batch normalization layers, and activation layers in the initial detection model, to obtain the trained target detection model.
In one embodiment, the acquiring the target image includes:
acquiring an initial image to be detected;
and carrying out image preprocessing on the initial image to obtain a target image.
A target object detection apparatus, the apparatus comprising:
the acquisition module is used for acquiring a target image;
the detection module is used for performing target object detection on the target image through the trained target detection model to obtain, for each target object in the target image, an initial detection frame and a corresponding confidence; the target detection model is obtained by pre-training a neural network model whose network structure has been dynamically adjusted;
a selecting module, configured to select candidate detection frames from the initial detection frames according to the confidences;
and the screening module is used for screening the target detection frames meeting the detection scene requirements from the candidate detection frames.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the above-described method embodiments when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the steps of the above-described method embodiments.
According to the target object detection method and apparatus, the computer device, and the storage medium, the trained target detection model performs target object detection on the acquired target image, so that the initial detection frame and confidence corresponding to each target object can be obtained quickly and accurately; candidate detection frames are selected from the initial detection frames based on the relatively accurate confidences, and the target detection frames are further screened from the candidate detection frames based on the detection scene requirements, so that the target objects can be located in the target image quickly and accurately based on the target detection frames that meet those requirements, improving both detection speed and detection accuracy. Furthermore, the target detection model is obtained by training a neural network model whose network structure has been dynamically adjusted, and using this model for target object detection further improves detection speed and accuracy. By combining the target detection model with the detection scene requirements to determine the target detection frame for each target object in the target image, the method is applicable to detecting target objects of various materials in different production environments under the corresponding detection scenes, offers strong robustness, and improves detection accuracy.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a target object detection method;
FIG. 2 is a schematic flow chart diagram illustrating a method for detecting a target object in one embodiment;
FIG. 3 is a schematic diagram of preconfigured incomplete object detection frames in one embodiment;
FIG. 4 is a diagram illustrating a frame-in-frame among the candidate detection frames corresponding to a target image in one embodiment;
FIG. 5 is a schematic diagram illustrating model optimization performed on a trained initial detection model to obtain a target detection model in one embodiment;
FIG. 6 is a schematic flow chart diagram illustrating a target object detection method according to another embodiment;
FIG. 7 is a schematic diagram of target object detection in one embodiment;
FIG. 8 is a block diagram showing the structure of a target object detection apparatus according to an embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The target object detection method provided by the application can be applied in the application environment shown in fig. 1, in which the detection device 102 communicates with the management server 104 via a network. The detection device 102 acquires a target image, performs target object detection on the target image through a trained target detection model to obtain, for each target object in the target image, an initial detection frame and a corresponding confidence, selects candidate detection frames from the initial detection frames according to the confidences, and screens, from the candidate detection frames, the target detection frames that meet the detection scene requirements; the target detection model is obtained by pre-training a neural network model whose network structure has been dynamically adjusted. The detection device 102 may send the target image and the corresponding target detection frame to the management server 104. The detection device 102 may be a terminal or a detection server. The terminal may be, but is not limited to, a personal computer, laptop, tablet computer, portable wearable device, or other integrated device capable of implementing target object detection; in particular, a GPU (Graphics Processing Unit) may be integrated on such a device. The management server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a method for detecting a target object is provided, which is described by taking the method as an example applied to the detection apparatus in fig. 1, and includes the following steps:
step 202, a target image is acquired.
In one embodiment, the detection device receives a target image captured and sent by an image acquisition device. Image acquisition devices include, but are not limited to, industrial cameras and network cameras. The image acquisition device can be built into the detection device as a component, or deployed as an independent device. When deployed as an independent device, the image acquisition device may communicate with the detection device over a network, or be electrically connected to it. When the two communicate over a network, the detection device and the image acquisition device can both be deployed in the production environment, which reduces the network delay in sending target images to the detection device and thus improves detection efficiency.
In one embodiment, target objects are conveyed and sorted by a sorting line deployed in a production environment, and the image acquisition device captures corresponding target images of the target objects conveyed on the sorting line, so that the detection device can detect the target objects from the target images. Taking a logistics scene as an example, the production environment may be a logistics distribution center or a parcel transfer station, such as an express transfer station.
In one embodiment, the image acquisition device periodically captures target images according to a preset acquisition period and sends each captured target image to the detection device, so that the detection device performs the corresponding target object detection operations on the periodically acquired target images. The preset acquisition period is the time interval between two adjacent captures of a target image, such as 2 seconds. It can be understood that the preset acquisition period may be set to match the time interval during which a target object carried on the sorting line passes the image acquisition device, so that, as far as possible, the image acquisition device captures one target image for each target object on the sorting line.
Step 204, performing target object detection on the target image through the trained target detection model to obtain, for each target object in the target image, an initial detection frame and a corresponding confidence; the target detection model is obtained by pre-training a neural network model whose network structure has been dynamically adjusted.
The target object is the object to be detected in the target image. Target objects usually differ across detection scenes: for example, the target objects in a logistics scene are parcels, while those in a medical scene are medical supplies. The initial detection frame can be used to roughly locate the corresponding target object in the target image, and each target object may correspond to one or more initial detection frames. The confidence refers to the credibility with which the target object to be detected can be located in the target image based on the corresponding initial detection frame.
Specifically, in the training stage of the target detection model, a neural network model with a dynamically adjusted network structure is constructed, the constructed neural network model is trained on a pre-acquired training sample set to obtain the trained target detection model, and the trained target detection model is deployed locally on the detection device. After obtaining the target image to be detected, the detection device performs target object detection on it through the trained target detection model, and obtains the initial detection frame corresponding to each target object in the target image and the confidence corresponding to each initial detection frame.
In one embodiment, for each target object detectable in the target image, the detection device can obtain one or more corresponding initial detection frames through the trained target detection model. It can be understood that if the detection accuracy of the target detection model is not high enough, the model may erroneously detect other objects in the target image as target objects and output corresponding initial detection frames and confidences. Through the post-processing operations provided by the present application, based on the confidence of each initial detection frame and the detection scene requirements, the initial detection frames corresponding to such false detections can be filtered out, so that the target objects to be detected in the target image under the corresponding detection scene are detected accurately.
In an embodiment, dynamically adjusting the network structure of the neural network model may involve replacing a pooling layer in the network structure of an existing neural network model with a convolutional layer; specifically, a max pooling layer in the network structure may be replaced with a convolutional layer, and the kernel size of the max pooling layer may further be updated, with the updated kernel size used as the kernel size of the replacing convolutional layer. The network structure of the neural network model is thus adjusted, and the target detection model is trained on the neural network model with the adjusted structure; compared with a target detection model trained on the unadjusted network structure, detection accuracy can be improved while an efficient detection speed is maintained.
Step 206, selecting candidate detection frames from the initial detection frames according to the confidences.
The candidate detection frames are detection frames that may be selected as the target detection frames corresponding to the respective target objects.
Specifically, according to the confidences of the initial detection frames corresponding to the target image, the detection device selects, through a non-maximum suppression post-processing operation, candidate detection frames whose confidence is greater than or equal to a confidence threshold from the initial detection frames.
In one embodiment, the step of screening candidate detection frames from the initial detection frames through the non-maximum suppression post-processing operation may specifically include the following. If the target image corresponds to a single initial detection frame and the confidence of that frame is greater than or equal to the confidence threshold, the detection device determines it as a candidate detection frame. If the target image corresponds to a plurality of initial detection frames, the detection device selects the initial detection frame with the highest confidence among them as the currently selected candidate detection frame, removes from the remaining initial detection frames those whose coincidence degree with the currently selected candidate detection frame is greater than or equal to a coincidence threshold, then selects the initial detection frame with the highest confidence among the remaining, not-yet-selected initial detection frames as the new currently selected candidate detection frame, and repeats the removal step for it, until all candidate detection frames have been selected.
The confidence threshold is compared against the confidence of each initial detection frame in order to keep, as candidates, only the initial detection frames whose confidence reaches it; when the confidence of an initial detection frame is greater than or equal to the confidence threshold, the target object located in the target image by that frame is relatively reliable. The coincidence threshold is compared against the coincidence degree between an initial detection frame and the currently selected candidate detection frame in order to remove initial detection frames that overlap the current candidate too much. It can be understood that when one target object in the target image corresponds to several initial detection frames, the coincidence among those frames is relatively high; in the manner above, the initial detection frame with the highest confidence among them is taken as the candidate detection frame for that target object, and the other initial detection frames that overlap it strongly are removed. It can also be understood that, after candidate detection frames are selected from the initial detection frames in this way, those whose confidence is greater than or equal to the confidence threshold are further selected as the candidate detection frames finally obtained from the initial detection frames. The confidence threshold and the coincidence threshold may be customized according to the actual situation, for example a confidence threshold of 80% and a coincidence threshold of 70%, and are not specifically limited here.
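The post-processing just described can be sketched in a few lines of Python. The box format (x_min, y_min, x_max, y_max), the helper names, and the default thresholds (taken from the example values above) are illustrative assumptions, not part of the original disclosure:

```python
# Minimal sketch of the non-maximum suppression post-processing described above.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_candidates(boxes, scores, conf_thresh=0.8, overlap_thresh=0.7):
    """Repeatedly keep the highest-confidence box and drop boxes that overlap
    it too strongly; finally keep only boxes whose confidence reaches the
    confidence threshold, mirroring the two-stage selection in the text."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)  # highest-confidence remaining initial detection frame
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < overlap_thresh]
    return [boxes[i] for i in kept if scores[i] >= conf_thresh]
```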
Step 208, screening, from the candidate detection frames, the target detection frames that meet the requirements of the detection scene.
The detection scene is the scene in which the target object to be detected is currently located, such as medicine, logistics, or manufacturing. A detection scene requirement is a requirement or condition that a target object needs to meet in the corresponding detection scene; specifically, it may be a requirement or condition that the target detection frame detected for a target object in the target image needs to meet within that image. One such requirement is, for example, that a target detection frame must not be an incomplete object detection frame in the corresponding target image, that is, it must not match the limiting condition of any preconfigured incomplete object detection frame. Another is, for example, that each target object corresponds to only one target detection frame in the corresponding target image: if two candidate detection frames are detected for one target object and their coincidence is relatively high, a situation called a "frame-in-frame", the overlapping areas of the two candidate detection frames cover the same features of the target object, so a single target detection frame must be selected from the two candidates.
Specifically, after the detection device selects candidate detection frames from the initial detection frames corresponding to the target image based on the confidences, it acquires the detection scene requirements preconfigured for the current detection scene and compares each candidate detection frame against them, thereby screening, according to the comparison results, the target detection frames corresponding to the respective target objects from the selected candidate detection frames, so that each target object can be located in the target image based on its target detection frame.
In one embodiment, the target detection frames meeting the requirements of the detection scene are screened out from the candidate detection frames, so that corresponding target objects are extracted from the target image based on the target detection frames, and subsequent digital services are performed based on the extracted target objects.
In one embodiment, if none of the candidate detection frames corresponding to the target image meets the detection scene requirements, no target detection frame is screened for the target image, and it can be determined that no target object meeting the detection scene requirements exists in the target image.
In one embodiment, to facilitate subsequent digital services based on the detected target objects, each target image is generally required to contain one and only one target object to be detected, so that one target detection frame can be determined for each target image through the target object detection approach above, and the target object can be located in the target image based on that frame. If two or more target detection frames are determined for a target image, this may indicate that the image contains two target objects to be detected; the target image may then be discarded and corresponding alarm information triggered for it. If no target detection frame meeting the detection scene requirements is determined for a target image, this may indicate that the target object to be detected was not placed on the sorting line as required; the target image may likewise be discarded and corresponding alarm information triggered.
According to the above target object detection method, the trained target detection model performs target object detection on the acquired target image, so that the initial detection frame and confidence corresponding to each target object are obtained quickly and accurately; candidate detection frames are selected from the initial detection frames based on the relatively accurate confidences, and the target detection frames are further screened from the candidate detection frames based on the detection scene requirements, so that the target objects can be located in the target image quickly and accurately based on the target detection frames that meet those requirements, that is, target object detection is realized with improved detection speed and accuracy. Furthermore, the target detection model is obtained by training a neural network model whose network structure has been dynamically adjusted, and using this model for target object detection further improves detection speed and accuracy. By combining the target detection model with the detection scene requirements to determine the target detection frame for each target object in the target image, the method is applicable to detecting target objects of various materials in different production environments under the corresponding detection scenes, offers strong robustness, and improves detection accuracy.
In one embodiment, step 208 includes: acquiring preset limiting conditions for incomplete object detection frames; and removing, from the candidate detection frames, the candidate detection frames that match the limiting conditions, to obtain the target detection frames.
An incomplete object detection frame is a detection frame corresponding to an incompletely displayed target object; specifically, it may be a detection frame detected for a target object that appears truncated in the target image. An incomplete target object is one that is displayed incompletely in the target image; this does not mean the object itself is damaged, but that the target image captured only part of the object's regions or features. The limiting condition corresponding to an incomplete object detection frame is the condition or basis for judging whether a candidate detection frame matches that incomplete object detection frame, that is, whether the candidate detection frame is an incomplete object detection frame.
Specifically, the detection device is preconfigured with incomplete object detection frames for the target image, and with a corresponding limiting condition for each of them. During target object detection, the detection device acquires the limiting condition preconfigured for each incomplete object detection frame and compares each candidate detection frame of the target image against each limiting condition. When a candidate detection frame matches the limiting condition of any incomplete object detection frame, it is judged to be an incomplete object detection frame; the candidate detection frames so judged are removed from the candidate detection frames of the target image, and the remaining candidate detection frames are determined as the target detection frames of the target image. In other words, based on the comparison of each candidate detection frame against each limiting condition, the detection device determines as target detection frames the candidate detection frames that match none of the limiting conditions. It can be understood that, since the image sizes of the target images are consistent, the same incomplete object detection frames are preconfigured for every target image.
In one embodiment, there are a plurality of incomplete object detection frames, and a corresponding limiting condition is preconfigured for each of them. The incomplete object detection frames may specifically include an upper incomplete object detection frame, a lower incomplete object detection frame, a right incomplete object detection frame, a left incomplete object detection frame, and the like. The limiting condition of the upper incomplete object detection frame is, for example, that the maximum ordinate of the candidate detection frame is smaller than a first preset ratio of the height of the target image, i.e. y_max < h·a; the limiting condition of the lower incomplete object detection frame is, for example, that the minimum ordinate of the candidate detection frame is greater than a second preset ratio of the height of the target image, i.e. y_min > h·(1-a); the limiting condition of the left incomplete object detection frame is, for example, that the maximum abscissa of the candidate detection frame is smaller than the first preset ratio of the width of the target image, i.e. x_max < w·a; and the limiting condition of the right incomplete object detection frame is, for example, that the minimum abscissa of the candidate detection frame is greater than the second preset ratio of the width of the target image, i.e. x_min > w·(1-a). Here y_max, y_min, x_max and x_min respectively denote the maximum ordinate, minimum ordinate, maximum abscissa and minimum abscissa of the candidate detection frame, and h, w, a and (1-a) respectively denote the height of the target image, the width of the target image, the first preset ratio and the second preset ratio. The first preset ratio is a preset proportion that can be customized according to the actual situation, such as 20%; the sum of the first preset ratio and the second preset ratio is 1.
Fig. 3 is a schematic diagram of the preconfigured incomplete object detection frames in an embodiment. As shown in fig. 3, four incomplete object detection frames are preconfigured for the target image: reference numeral 301 denotes the upper incomplete object detection frame, 302 the lower incomplete object detection frame, 303 the right incomplete object detection frame, and 304 the left incomplete object detection frame. During target object detection, based on the limiting conditions above, the detection frame data of each candidate detection frame is compared against the limiting condition of each incomplete object detection frame, and the target detection frames are screened according to the comparison results. For example, if the maximum ordinate of a candidate detection frame is smaller than the first preset ratio of the height of the target image, that is, if the lower border of the candidate detection frame lies inside the upper incomplete object detection frame shown by reference numeral 301, the candidate detection frame is judged to match the upper incomplete object detection frame. Thus, if the lower border of a candidate detection frame lies inside the upper incomplete object detection frame, or its upper border inside the lower incomplete object detection frame, or its left border inside the right incomplete object detection frame, or its right border inside the left incomplete object detection frame, the candidate detection frame is judged to be an incomplete object detection frame.
The incomplete object detection frames shown in fig. 3 serve only as an example and are not a specific limitation. It can be understood that, since the upper incomplete object detection frame only constrains the lower border of a candidate detection frame, the upper incomplete object detection frame in fig. 3 only needs a height equal to the first preset ratio of the height of the target image; its width is not specifically required and may, for example, equal the width of the target image. Reference numeral 305 in fig. 3 denotes a reference image of the same size as the target image, used to illustrate the relationship between each incomplete object detection frame and the target image. Reference numeral 306 denotes the position area where a target object normally lies in the target image, that is, the area where the target detection frame corresponding to the target object lies.
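The four limiting conditions can be checked as follows. This is a minimal sketch assuming boxes are given as (x_min, y_min, x_max, y_max) in image coordinates with the origin at the top-left, and a single first preset ratio a applied to all four borders; the function names are illustrative:

```python
def is_incomplete_object_box(box, img_w, img_h, a=0.2):
    """Return True if the candidate detection frame matches the limiting
    condition of any of the four preconfigured incomplete object frames."""
    x_min, y_min, x_max, y_max = box
    return (y_max < a * img_h            # lower border inside the upper frame (301)
            or y_min > (1 - a) * img_h   # upper border inside the lower frame (302)
            or x_min > (1 - a) * img_w   # left border inside the right frame (303)
            or x_max < a * img_w)        # right border inside the left frame (304)

def remove_incomplete(candidates, img_w, img_h, a=0.2):
    """Keep only the candidate frames that match none of the limiting conditions."""
    return [b for b in candidates
            if not is_incomplete_object_box(b, img_w, img_h, a)]
```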
In the above embodiment, based on the limiting condition corresponding to the incomplete object detection frame, the candidate detection frames belonging to the incomplete object detection frame are removed from the candidate detection frames, so as to obtain the target detection frame corresponding to the target object to be detected in the target image.
In one embodiment, step 208 includes: sorting the candidate detection frames in descending order of detection frame size to obtain a candidate detection frame sequence; taking the candidate detection frame with the largest detection frame size in the candidate detection frame sequence as a reference detection frame; removing, from the candidate detection frame sequence, the candidate detection frames whose occupation ratio with the reference detection frame is greater than a preset occupation ratio threshold; and taking the candidate detection frames remaining after the removal, excluding the reference detection frame, as a new candidate detection frame sequence and returning to the step of taking the candidate detection frame with the largest detection frame size in the candidate detection frame sequence as the reference detection frame, continuing until fewer than two candidate detection frames, excluding the reference detection frame, remain after the removal, at which point the iteration stops and the target detection frames are obtained.
The candidate detection frame sequence is obtained by sorting the candidate detection frames in descending order of detection frame size. The detection frame size is the size of a candidate detection frame, specifically the product of its height and width. The reference detection frame is the candidate detection frame that, in the next iteration of the loop, is compared against the other candidate detection frames to decide whether they should be removed. The occupation ratio is the ratio of the intersection area between the reference detection frame and a candidate detection frame to the smaller of their two areas, that is:
mIoU = |A ∩ B| / min(area(A), area(B))
where mIoU denotes the occupation ratio between the reference detection frame A and the candidate detection frame B, |A ∩ B| denotes their intersection area, and min(area(A), area(B)) denotes the smaller of the two areas; for example, if the area of the reference detection frame A is larger than the area of the candidate detection frame B, then min(area(A), area(B)) is the area of the candidate detection frame B. The preset occupation ratio threshold can be customized according to the actual situation, for example 80%.
Specifically, if only one candidate detection frame was selected from the initial detection frames corresponding to the target image, the detection device determines it as the target detection frame. If a plurality of candidate detection frames were selected, the detection device sorts them in descending order of detection frame size to obtain the corresponding candidate detection frame sequence. The detection device takes the candidate detection frame with the largest detection frame size in the sequence as the current reference detection frame, calculates the occupation ratio between the reference detection frame and each other candidate detection frame in the sequence, and removes from the sequence the candidate detection frames whose occupation ratio exceeds the preset occupation ratio threshold. After this removal, if at least two candidate detection frames other than the reference detection frame remain in the sequence, a new candidate detection frame sequence is formed from them. For the new sequence, the detection device again takes the candidate detection frame with the largest detection frame size as the current reference detection frame and returns to the step of calculating the occupation ratios, continuing until, during some iteration, fewer than two candidate detection frames other than the current reference detection frame remain for the next iteration; the iterative process then stops, yielding the target detection frames corresponding to the target image.
In one embodiment, if two candidate detection frames corresponding to the target image have an occupation ratio greater than the preset occupation ratio threshold, the pair can be understood as a "frame-in-frame"; through the iterative removal operation, the candidate detection frame with the smaller detection frame size is removed from the frame-in-frame, and the remaining candidate detection frame with the larger size is determined as the target detection frame.
FIG. 4 is a diagram illustrating a frame-in-frame among the candidate detection frames corresponding to a target image in one embodiment. As shown in fig. 4, reference numeral 401 denotes an image of the same size as the target image, and reference numerals 402 and 403 each denote a candidate detection frame corresponding to the target image. Since the occupation ratio between the two candidate detection frames is greater than the preset occupation ratio threshold, the candidate detection frame denoted by 403 is judged to be a frame-in-frame of the candidate detection frame denoted by 402; it therefore needs to be removed, and the candidate detection frame denoted by 402 is determined as the target detection frame corresponding to the target image.
In an embodiment, after the detection device selects candidate detection frames from the initial detection frames corresponding to the target image, it first removes the candidate detection frames matching the limiting conditions of the preconfigured incomplete object detection frames, and then removes frame-in-frames from the remaining candidate detection frames in the iterative manner above, obtaining the target detection frames corresponding to the target image. It can be understood that the detection device may equally remove the frame-in-frames from the initially obtained candidate detection frames first, and then remove the incomplete object detection frames from what remains.
In the above embodiment, based on the size of each candidate detection frame and the occupation ratios between candidate detection frames, frame-in-frames are removed iteratively from the candidate detection frames corresponding to the target image, so as to obtain the target detection frames corresponding to the target objects to be detected in the target image, which improves the detection accuracy of the target detection frames.
In one embodiment, the step of removing, from the candidate detection frame sequence, the candidate detection frames whose occupation ratio with the reference detection frame is greater than the preset occupation ratio threshold includes: calculating the occupation ratio between the reference detection frame and each of the other remaining candidate detection frames; determining, among the combinations of the reference detection frame with each of the other remaining candidate detection frames, the target combinations whose occupation ratio is greater than the preset occupation ratio threshold; and removing, from the candidate detection frame sequence, the candidate detection frame with the smallest detection frame size in each target combination.
Specifically, in each iteration, after selecting the reference detection frame from the candidate detection frame sequence targeted by the current iteration, the detection device calculates the occupation ratio between the reference detection frame and each other candidate detection frame in the sequence, and pairs the reference detection frame with each other candidate detection frame to obtain one or more combinations, each comprising the reference detection frame and one candidate detection frame; the occupation ratio between the two frames of a combination is the occupation ratio of that combination. According to the calculated occupation ratios, the detection device determines the combinations whose occupation ratio is greater than the preset occupation ratio threshold as target combinations, determines the candidate detection frame with the smallest detection frame size within each target combination, and removes those candidate detection frames from the candidate detection frame sequence.
In one embodiment, the detection device calculates the occupation ratio between the reference detection frame and each other candidate detection frame in turn and, whenever the currently calculated occupation ratio is greater than the preset occupation ratio threshold, immediately removes the candidate detection frame with the smaller detection frame size in the corresponding combination from the sequence; alternatively, after calculating the occupation ratio of every combination, it determines the target combinations based on the calculated ratios, determines the candidate detection frames to delete based on the target combinations, and deletes them from the sequence.
In the above embodiment, based on the occupation ratio and the preset occupation ratio threshold, of two candidate detection frames whose occupation ratio exceeds the threshold, the one with the smaller detection frame size is removed and the one with the larger size retained; the target detection frame obtained through this iterative removal is the candidate detection frame with the larger size among the multiple candidate detection frames corresponding to the same target object, so the target object can be located accurately in the target image based on it.
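The iterative frame-in-frame removal can be sketched as follows, with the occupation ratio mIoU defined as above; the box format and function names are illustrative assumptions. Because the sequence is sorted in descending order of size, the reference frame is always the larger frame in each target combination, so dropping the non-reference frame implements "remove the smallest frame in each target combination":

```python
def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def m_iou(a, b):
    """Occupation ratio: intersection area over the smaller of the two areas."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return ix * iy / (min(box_area(a), box_area(b)) + 1e-9)

def remove_frame_in_frame(candidates, ratio_thresh=0.8):
    """Repeatedly take the largest remaining frame as the reference and drop the
    smaller frames whose occupation ratio with it exceeds the threshold; stop
    once fewer than two non-reference frames remain, keeping the survivors."""
    seq = sorted(candidates, key=box_area, reverse=True)
    kept = []
    while seq:
        ref = seq.pop(0)                          # largest remaining frame
        kept.append(ref)
        seq = [b for b in seq if m_iou(ref, b) <= ratio_thresh]
        if len(seq) < 2:                          # iteration stops here
            kept.extend(seq)
            break
    return kept
```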
In one embodiment, the training step of the target detection model includes: acquiring a training sample set; constructing a neural network model after dynamically adjusting the network structure; and training the constructed neural network model based on the training sample set to obtain a trained target detection model.
Specifically, the training step of the target detection model is performed by a model training device and includes the following. The model training device acquires a plurality of sample images, manually labels each target object in each sample image to obtain the sample detection frames corresponding to each sample image, and forms the training sample set from the sample images and their corresponding sample detection frames. The model training device then dynamically adjusts the network structure of the neural network and constructs an initial neural network model based on the adjusted network; this initial model has not undergone model training and cannot yet be used to detect target objects in a target image. Further, the model training device trains the constructed neural network model on the acquired training sample set to obtain the trained target detection model, which can then be used to quickly and accurately extract the corresponding target objects from a target image. During training, the model training device uses the sample images in the training sample set as input features and the corresponding sample detection frames as expected output features.
In an embodiment, the model training device may be a detection device for performing target object detection, a management server, or a model training server dedicated to training a model, and is not limited in particular herein.
In one embodiment, the model training device acquires a test sample set comprising test images and the test detection frames obtained by manually labeling each test image. After a target detection model is trained on the training sample set, it is tested on the test sample set; if the test results show that the model meets the preset training requirements, it is judged to be the trained target detection model, and otherwise training continues on a new training sample set until the trained target detection model is obtained.
In one embodiment, the sample images in the training sample set and the test images in the test sample set are acquired from a sorting line by an image acquisition device. The image acquisition device can send the acquired test images and sample images to a data storage device through an industrial personal computer, so that the model training device can conveniently obtain the corresponding sample and test images from the data storage device. It can be understood that the sample images and test images may be collected from different production environments in different detection scenes, so that the target detection model trained on the training sample set is applicable to different production environments in different detection scenes; they may also be collected from different production environments within the same detection scene, so that the trained model achieves higher detection accuracy and speed in each production environment of that scene.
In one embodiment, the neural network model may be constructed based on YOLOv3 with a dynamically adjusted network structure, and specifically based on YOLOv3-tiny with a dynamically adjusted network structure. Taking YOLOv3-tiny as the initial neural network as an example, the model training device dynamically adjusts each maxpool (max pooling) layer in the initial network that has kernel_size 2 and stride 2 into a pooling layer with kernel_size 3 and stride 2, and dynamically adjusts each maxpool layer that has kernel_size 2 and stride 1 into a pooling layer with kernel_size 3 and stride 1.
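A minimal PyTorch-style sketch of this adjustment is shown below. The recursive traversal helper is an assumption, and padding=1 is chosen so that the enlarged 3x3 window keeps output sizes compatible with the original 2x2 window; whether that exactly matches the original network depends on its padding, which the text does not specify:

```python
import torch.nn as nn

def enlarge_maxpool_kernels(module: nn.Module) -> None:
    """Recursively replace every max pooling layer with kernel_size 2
    (whether stride 2 or stride 1) by one with kernel_size 3 and the same
    stride, as described above; padding=1 preserves spatial sizes (assumption)."""
    for name, child in module.named_children():
        if isinstance(child, nn.MaxPool2d) and child.kernel_size == 2:
            setattr(module, name,
                    nn.MaxPool2d(kernel_size=3, stride=child.stride, padding=1))
        else:
            enlarge_maxpool_kernels(child)
```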
In one embodiment, the model training device performs image preprocessing on the sample image and the test image, constructs a training sample set based on the preprocessed sample image, and constructs a test sample set based on the preprocessed test image. Image pre-processing includes, but is not limited to, image scaling, image filling, and image normalization.
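The preprocessing mentioned here (scaling, filling, normalization) might look like the following letterbox-style sketch; the 416x416 input size and gray fill value are assumptions typical of YOLOv3-style models, not values given in the text, and a 3-channel image is assumed:

```python
import cv2
import numpy as np

def preprocess(image, target_size=416, fill=128):
    """Scale the image to fit target_size while keeping its aspect ratio,
    pad the remainder with a constant value, and normalize to [0, 1]."""
    h, w = image.shape[:2]
    scale = target_size / max(h, w)
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.full((target_size, target_size, 3), fill, dtype=np.uint8)
    nh, nw = resized.shape[:2]
    top, left = (target_size - nh) // 2, (target_size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized    # center the scaled image
    return canvas.astype(np.float32) / 255.0          # normalization
```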
In one embodiment, during model training, the model training device calculates a difference value between the prediction detection frame output by the neural network model for a sample image and the sample detection frame corresponding to that sample image, and, based on this difference value, dynamically adjusts the model parameters of the neural network model through backward gradient propagation in the direction that decreases the loss function. The difference value between a prediction detection frame and the corresponding sample detection frame is determined from the areas of the two frames according to a preset mapping relation:
IoU = |A ∩ B| / |A ∪ B|
GIoU = IoU - |Ac \ (A ∪ B)| / |Ac|
where GIoU denotes the difference value between the prediction detection frame A and the sample detection frame B, IoU denotes the intersection-over-union between A and B, A ∩ B denotes their intersection, A ∪ B denotes their union, Ac denotes the minimum enclosing region of the prediction frame A and the sample frame B, that is, the smallest rectangular box containing both A and B, and |·| denotes area.
It can be understood that existing neural network models usually evaluate the difference between the prediction detection frame and the sample detection frame through the intersection-over-union ratio alone. A difference value expressed through the intersection-over-union ratio can characterize the overlap between the two frames, but it cannot truly reflect the stacking condition of target objects in the sample image: for any pair of disjoint frames it is zero, regardless of how far apart they are. The difference value calculated for the prediction detection frame and the sample detection frame based on the preset mapping relation can truly reflect the stacking condition of target objects in the sample image, so dynamically adjusting the model parameters based on this difference value improves the accuracy of the model parameters and thus the prediction precision of the trained target detection model.
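As a concrete reading of the preset mapping relation, the sketch below computes GIoU for two axis-aligned boxes in (x1, y1, x2, y2) form; it illustrates the formulas above and is not code from the patent.

```python
def giou(box_a, box_b):
    """GIoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection A ∩ B.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union A ∪ B and plain IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # A_c: area of the smallest rectangle enclosing both boxes.
    enclose = ((max(ax2, bx2) - min(ax1, bx1)) *
               (max(ay2, by2) - min(ay1, by1)))

    return iou - (enclose - union) / enclose

# IoU is 0 for both disjoint pairs below, but GIoU still separates them.
print(giou((0, 0, 1, 1), (1, 0, 2, 1)))   # adjacent boxes: 0.0
print(giou((0, 0, 1, 1), (3, 0, 4, 1)))   # distant boxes: -0.5
```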
In one embodiment, in the model training process, after the sample detection frames corresponding to each sample image are obtained through manual labeling, the sample detection frames are clustered into a preset number of size categories according to their heights and widths, and each manually labeled sample detection frame is adjusted to the sample detection frame of its corresponding category according to the clustering result. In this way, when target object detection is performed on a target image by the target detection model trained on the adjusted sample detection frames and the corresponding sample images, each output initial detection frame belongs to, or closely approximates, one of the preset number of categories. The preset number may be determined according to the number of size categories that target objects may take in the detection scene, such as 6.
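The patent does not fix a particular clustering algorithm for this step. A common realization, sketched below under that assumption, is k-means over the labeled boxes' (width, height) pairs with 1 - IoU as the distance, as used for YOLO anchor generation; the function name is illustrative.

```python
import numpy as np

def cluster_box_sizes(wh, k=6, iters=100, seed=0):
    """Cluster labeled boxes into k size categories by width and height.

    wh: array-like of shape (n, 2) holding (width, height) per box.
    Assignment maximizes IoU (equivalently, minimizes 1 - IoU) against
    the cluster centers, with boxes anchored at a common corner.
    """
    rng = np.random.default_rng(seed)
    boxes = np.asarray(wh, dtype=np.float64)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
                 np.minimum(boxes[:, None, 1], centers[None, :, 1]))
        union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
                (centers[:, 0] * centers[:, 1])[None, :] - inter
        assign = np.argmax(inter / union, axis=1)   # highest IoU wins
        for j in range(k):
            if np.any(assign == j):
                centers[j] = boxes[assign == j].mean(axis=0)
    return centers   # the k representative detection-frame sizes
```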
In the above embodiment, the neural network model is constructed based on a neural network whose network structure has been dynamically adjusted, and the constructed neural network model is trained on the training sample set to obtain the trained target detection model, so that performing target object detection on target images based on the trained target detection model improves both detection accuracy and detection speed.
In one embodiment, training the constructed neural network model based on the training sample set to obtain a trained target detection model includes: training the constructed neural network model based on the training sample set to obtain a trained initial detection model; and quantizing the model parameters of the initial detection model and integrating the convolution layers, batch normalization layers and activation layers in the initial detection model to obtain the trained target detection model.
The initial detection model is obtained by performing model training on the constructed neural network model based on a training sample set, and the target detection model is obtained by performing model optimization on the trained initial detection model.
Specifically, the model training device trains the constructed neural network model based on the training sample set to obtain a trained initial detection model. The model training device then quantizes the model parameters of the trained initial detection model to obtain quantized model parameters, integrates the convolution layers, batch normalization layers and activation layers of the trained initial detection model into integration layers, and obtains the corresponding trained target detection model from the quantized model parameters and the integration layers.
In one embodiment, the model training device quantizes the model parameters of the initial detection model; specifically, this may refer to quantizing the precision of the model parameters, such as quantizing them from FP32 (32-bit floating point) to FP16 (16-bit floating point) or INT8 (8-bit integer). Thus, by quantizing the precision of the model parameters, the data volume of the target detection model can be reduced.
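As an illustration of the layer-integration step, the generic PyTorch sketch below (not the patent's implementation) folds a trained batch normalization layer into the preceding convolution. At inference time batch normalization is a per-channel affine transform, so the conv/BN pair collapses into a single convolution, and the activation can then be applied directly to its output. Half-precision quantization can be approximated in PyTorch with model.half(); full INT8 quantization with calibration is typically delegated to the model optimization platform described below.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a trained BatchNorm2d into the preceding Conv2d."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, dilation=conv.dilation,
                      groups=conv.groups, bias=True)
    # BN at inference: y = scale * x + shift, per output channel.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = (conv.bias.data if conv.bias is not None
                 else torch.zeros_like(bn.running_mean))
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused
```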
In one embodiment, the model training device obtains the initial detection model through training on a model training platform, and then, through a model optimization platform, quantizes the model parameters of the initial detection model and integrates its convolution layers, batch normalization layers and activation layers, thereby optimizing the initial detection model into the corresponding target detection model. Specifically, after the model training device obtains the trained initial detection model on the model training platform based on the training sample set, the model training platform converts the initial detection model into a model format that the model optimization platform can recognize; the model optimization platform then performs model optimization on the format-converted initial detection model, and the optimized model is used as the trained target detection model, whose model format is one supported by the model optimization platform. Model training platforms include, but are not limited to, PyTorch (an open-source, Torch-based Python machine learning library used for applications such as natural language processing) and TensorFlow (a symbolic mathematical system based on dataflow programming), and model optimization platforms include, but are not limited to, TensorRT (a high-performance deep learning inference optimizer).
FIG. 5 is a schematic diagram illustrating the principle of performing model optimization on the trained initial detection model to obtain the target detection model in one embodiment. The model training platforms are PyTorch and TensorFlow, and the model optimization platform is TensorRT; the initial detection model trained with PyTorch is denoted the PyTorch model, and the initial detection model trained with TensorFlow is denoted the TensorFlow model. As shown in FIG. 5, the PyTorch model optimization process is: convert the model format of the PyTorch model to obtain an ONNX model, then optimize the ONNX model through TensorRT to obtain a trt model, where the trt model is a target detection model whose model format is trt. The TensorFlow model optimization process is: convert the model format of the TensorFlow model to obtain a uff model, then optimize the uff model through TensorRT to obtain a trt model, where the trt model is a target detection model whose model format is trt.
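For the PyTorch branch of FIG. 5, the format-conversion step might look like the sketch below; the function and file names are illustrative. torch.onnx.export traces the model with a dummy input, and the resulting ONNX file can then be built into a trt engine, for example with TensorRT's trtexec tool (trtexec --onnx=detector.onnx --saveEngine=detector.trt).

```python
import torch

def export_to_onnx(detector: torch.nn.Module,
                   path: str = "detector.onnx") -> None:
    """Convert a trained PyTorch detection model to ONNX for TensorRT.

    The 1x3x416x416 dummy input assumes a YOLOv3-style network input
    size; use the resolution the model was actually trained with.
    """
    detector.eval()
    dummy = torch.randn(1, 3, 416, 416)
    torch.onnx.export(detector, dummy, path,
                      input_names=["images"],
                      output_names=["predictions"],
                      opset_version=11)
```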
In one embodiment, the target detection model obtained through optimization on the model optimization platform may be stored by saving the model parameters of the target detection model and its network layers, where the network layers are the integration layers obtained by integrating the convolution layers, batch normalization layers and activation layers of the initial detection model. In this way, the trained target detection model can be re-created by obtaining the trained model parameters and the corresponding network layers and assembling the model from them.
In the above embodiment, the trained initial detection model can already detect target objects from target images quickly and accurately and output the corresponding initial detection frames, but it occupies a relatively large amount of memory. Optimizing the initial detection model yields a target detection model with a small memory footprint; that is, a lightweight target detection model is obtained while detection speed and accuracy are preserved, so that the target detection model is applicable to various types of detection devices, such as terminals and management servers deployed in a production environment. It can be understood that the target detection model obtained through model optimization reduces the memory and performance requirements placed on the detection device that runs it, thereby reducing equipment cost and improving equipment utilization.
In one embodiment, step 202 comprises: acquiring an initial image to be detected; and carrying out image preprocessing on the initial image to obtain a target image.
Specifically, the detection device acquires an initial image to be detected through the image acquisition device according to the image acquisition manner provided in one or more of the above embodiments, and performs image preprocessing on the initial image according to a preset preprocessing flow to obtain the corresponding target image, so that the detection accuracy for target objects is further improved when target object detection is performed on the target image through the trained target detection model.
In one embodiment, image preprocessing includes, but is not limited to, image scaling, image filling and image normalization. Image scaling means scaling the initial image proportionally so that its longest edge reaches a fixed length. Image filling means padding pixels with a preset pixel value on both sides of the shortest edge of the scaled initial image until the shortest edge also reaches the fixed length; that is, after filling, the lengths of the longest and shortest edges of the initial image are equal. Image normalization means normalizing each RGB channel of the initial image, or of the filled initial image, to the range 0 to 1, subtracting the channel mean, and dividing by the channel standard deviation to obtain the normalized image; the detection device can perform image normalization according to the following image normalization formula.
$$y_i = \left( x_i / 255.0 - \mathrm{mean}_i \right) / \mathrm{std}_i$$

wherein i denotes the i-th of the three RGB channels, mean_i denotes the image mean of the i-th channel, and std_i denotes the image standard deviation of the i-th channel; both mean_i and std_i are calculated from the sample images in the training sample set during the model training stage.
It can be understood that the detection device may take the initial image that has undergone image scaling and image filling in sequence as the target image, or take the initial image that has undergone image scaling, image filling and image normalization in sequence as the target image. In this way, image preprocessing adjusts the initial image into a target image that meets the input requirements of the target detection model, so that target object detection can be performed on the target image quickly and accurately.
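Putting the three preprocessing operations together, a minimal sketch follows; the ImageNet channel statistics stand in for the mean and standard deviation that the patent computes from the training sample set, and 128 is an example preset pixel value.

```python
import numpy as np
import cv2

def preprocess(image, size=416,
               mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale, pad and normalize an RGB image (H x W x 3, uint8)."""
    h, w = image.shape[:2]
    scale = size / max(h, w)          # scale the longest edge to `size`
    resized = cv2.resize(image, (int(round(w * scale)),
                                 int(round(h * scale))))

    # Pad both sides of the shorter edge with a preset pixel value (128)
    # so the result is a square size x size image.
    canvas = np.full((size, size, 3), 128, dtype=np.uint8)
    top = (size - resized.shape[0]) // 2
    left = (size - resized.shape[1]) // 2
    canvas[top:top + resized.shape[0], left:left + resized.shape[1]] = resized

    # Normalize each channel: scale to [0, 1], subtract the channel mean,
    # divide by the channel standard deviation.
    x = canvas.astype(np.float32) / 255.0
    x = (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
    return x.transpose(2, 0, 1)       # HWC -> CHW for the network
```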
In the above embodiment, the target image is obtained by performing image preprocessing on the initial image to be detected, which improves the accuracy of the target object detection subsequently performed on the target image.
As shown in fig. 6, in an embodiment, a target object detection method is provided, which specifically includes the following steps:
Step 602, obtaining an initial image to be detected.

Step 604, performing image preprocessing on the initial image to obtain a target image.

Step 606, performing target object detection on the target image through the trained target detection model to obtain the initial detection frame and corresponding confidence for each target object in the target image; the target detection model is obtained by pre-training a neural network model after dynamically adjusting the network structure.

Step 608, selecting candidate detection frames from the initial detection frames according to the confidences.
Step 610, acquiring a limitation condition pre-configured for the incomplete object detection box.
Step 612, removing the candidate detection frames matched with the limiting conditions from the candidate detection frames to obtain the target detection frame.
Step 614, sorting the candidate detection frames in descending order of detection frame size to obtain a candidate detection frame sequence.

Step 616, taking the candidate detection frame with the largest detection frame size in the candidate detection frame sequence as the reference detection frame.

Step 618, calculating the occupation ratio between the reference detection frame and each of the other remaining candidate detection frames.

Step 620, determining, among the combinations of the reference detection frame with each of the other remaining candidate detection frames, the target combinations whose occupation ratio is greater than a preset occupation ratio threshold.

Step 622, removing from the candidate detection frame sequence the candidate detection frame with the smallest detection frame size in each target combination.

Step 624, when the number of candidate detection frames remaining after the culling process, excluding the reference detection frame, is greater than or equal to two, taking those remaining candidate detection frames as a new candidate detection frame sequence and returning to step 616 to continue execution.

Step 626, when the number of candidate detection frames remaining after the culling process, excluding the reference detection frame, is less than two, stopping the iteration to obtain the target detection frames corresponding to the target image.
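Steps 614 to 626 amount to a greedy, size-ordered screening pass. The sketch below is one reading of them, with the occupation ratio of a pair of frames interpreted as the intersection area divided by the smaller frame's area; that interpretation is an assumption, chosen because it directly targets the frame-in-frame redundancy discussed next.

```python
def screen_detections(boxes, ratio_thresh=0.8):
    """Remove box-in-box redundancy following steps 614-626 above.

    boxes: list of (x1, y1, x2, y2) candidate detection frames.
    """
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    def occupation(big, small):
        # Intersection area over the smaller frame's area (assumption).
        iw = max(0.0, min(big[2], small[2]) - max(big[0], small[0]))
        ih = max(0.0, min(big[3], small[3]) - max(big[1], small[1]))
        return iw * ih / max(area(small), 1e-9)

    remaining = sorted(boxes, key=area, reverse=True)    # step 614
    kept = []
    while len(remaining) >= 2:
        reference = remaining.pop(0)                     # step 616
        kept.append(reference)
        # Steps 618-622: drop the smaller frame of every pair whose
        # occupation ratio with the reference exceeds the threshold.
        remaining = [b for b in remaining
                     if occupation(reference, b) <= ratio_thresh]
    kept.extend(remaining)               # step 626: fewer than two left
    return kept
```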
In the above embodiment, the trained target detection model is obtained by training a neural network model whose network structure has been dynamically adjusted, which improves the detection speed and accuracy of the target detection model; detecting target objects in target images through this model therefore improves detection speed and accuracy and, in turn, overall digital operation efficiency. After the initial detection frames and corresponding confidences for the target image are obtained through target detection model prediction, and candidate detection frames with high confidence are selected from the initial detection frames based on the confidences, a redundancy screening mechanism removes incomplete-object detection frames and frame-in-frame redundancy from the candidate detection frames, yielding high-precision target detection frames. This reduces manual intervention and correction while improving detection precision, thereby reducing labor cost. Moreover, this target object detection approach is applicable to different production environments and shortens equipment installation and commissioning time, which further reduces labor cost.
It can be understood that the target object detection method provided in one or more of the above embodiments can be applied to complex and diverse production environments in a detection scene. Even for target images distorted or misaligned in transmission due to background ambient light interference, image transmission noise and the like, and for target objects stacked and placed randomly on a sorting line, it still ensures detection accuracy while maintaining a high detection speed, thereby achieving a balance between detection speed and detection accuracy.
FIG. 7 is a schematic diagram of target object detection in one embodiment. As shown in FIG. 7, the complete target object detection process includes three stages: model training, model optimization and target object detection. The model training in stage one mainly includes three steps: data set making, network model building and network training. Data set making mainly refers to obtaining the training sample set; network model building refers to constructing the neural network model after dynamically adjusting the network structure; network training refers to training the constructed neural network model based on the training sample set to obtain the trained initial detection model. Stage two performs model optimization on the initial detection model obtained in stage one to obtain the trained target detection model. It can be understood that stage one and stage two are executed before stage three; once the trained target detection model has been obtained, stages one and two do not need to be executed again during the application stage of the target detection model, although the detection precision of the target detection model can be further optimized during the application stage according to the actual situation.
Further, the target object detection corresponding to stage three is the application stage of the target detection model, and its process mainly includes: acquiring an image, preprocessing the acquired image, performing model inference on the preprocessed image through the trained target detection model to obtain the corresponding initial detection frames, and screening the initial detection frames through the detection frame screening mechanism to obtain the final detection result.
In an embodiment, the model training and model optimization operations provided in one or more of the above embodiments may be specifically implemented by a model training device through a GPU (Graphics Processing Unit). Accordingly, in the target object detection process, the target detection model may also be run by the detection device through the GPU.
It should be understood that, although the steps in the flowcharts of FIG. 2 and FIG. 6 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the sequence indicated by the arrows. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 6 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, there is provided a target object detection apparatus 800 comprising: an obtaining module 801, a detecting module 802, a selecting module 803 and a screening module 804, wherein:
an obtaining module 801, configured to obtain a target image;
the detection module 802 is configured to perform target object detection on the target image through the trained target detection model to obtain an initial detection frame and a corresponding confidence level corresponding to each target object in the target image; the target detection model is obtained by pre-training a neural network model after dynamically adjusting the network structure;
a selecting module 803, configured to select a candidate detection frame from the initial detection frames according to the confidence;
and the screening module 804 is configured to screen the target detection frames meeting the requirements of the detection scene from the candidate detection frames.
In one embodiment, the screening module 804 is further configured to obtain a constraint condition preconfigured for the incomplete object detection box; and removing the candidate detection frames matched with the limiting conditions from the candidate detection frames to obtain the target detection frame.
In an embodiment, the screening module 804 is further configured to sort the candidate detection frames according to a sequence of sizes of the detection frames from large to small to obtain a candidate detection frame sequence; taking a candidate detection frame with the largest detection frame size in the candidate detection frame sequence as a reference detection frame; removing candidate detection frames with the occupation ratio greater than a preset occupation ratio threshold value with the reference detection frame from the candidate detection frame sequence; and taking the candidate detection frames which are remained after the elimination processing and do not comprise the reference detection frame as a new candidate detection frame sequence, returning to the step that the candidate detection frame with the largest detection frame size in the candidate detection frame sequence is taken as the reference detection frame for continuous execution, and stopping iteration until the candidate detection frames which are remained after the elimination processing and do not comprise the reference detection frame are less than two to obtain the target detection frame.
In one embodiment, the filtering module 804 is further configured to calculate an occupation ratio between the reference frame and the remaining candidate frames; determining a target combination with the corresponding occupation ratio larger than a preset occupation ratio threshold value in each combination of the reference detection frame and other remaining candidate detection frames; and removing the candidate detection frame with the smallest detection frame size in each target combination from the candidate detection frame sequence.
In an embodiment, the target object detection apparatus 800 further includes: a model training module;
the model training module is used for acquiring a training sample set; constructing a neural network model after dynamically adjusting the network structure; and training the constructed neural network model based on the training sample set to obtain a trained target detection model.
In one embodiment, the model training module is further configured to train the constructed neural network model based on a training sample set to obtain a trained initial detection model; and quantifying the model parameters of the initial detection model, and integrating the convolution layer, the batch normalization layer and the activation layer in the initial detection model to obtain the trained target detection model.
In one embodiment, the obtaining module 801 is further configured to obtain an initial image to be detected; and carrying out image preprocessing on the initial image to obtain a target image.
For specific limitations of the target object detection apparatus, reference may be made to the above limitations of the target object detection method, which are not described herein again. The modules in the target object detection device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal as a detection device, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a target object detection method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the structure shown in FIG. 9 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory in which a computer program is stored and a processor, which when executing the computer program performs the steps of the above-described method embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples express only several embodiments of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A target object detection method, the method comprising:
acquiring a target image;
performing target object detection on the target image through a trained target detection model to obtain an initial detection frame and corresponding confidence coefficient corresponding to each target object in the target image; the target detection model is obtained by pre-training a neural network model after dynamically adjusting a network structure;
selecting a candidate detection frame from the initial detection frames according to the confidence;
and screening target detection frames meeting the requirements of the detection scene from the candidate detection frames.
2. The method of claim 1, wherein the screening the candidate detection boxes for target detection boxes meeting detection scenario requirements comprises:
acquiring a preset limiting condition aiming at an incomplete object detection box;
and removing the candidate detection frame matched with the limiting condition from the candidate detection frames to obtain a target detection frame.
3. The method of claim 1, wherein the screening the candidate detection boxes for target detection boxes meeting detection scenario requirements comprises:
sequencing the candidate detection frames according to the sequence of the sizes of the detection frames from large to small to obtain a candidate detection frame sequence;
taking the candidate detection frame with the largest detection frame size in the candidate detection frame sequence as a reference detection frame;
removing, from the candidate detection frame sequence, the candidate detection frames whose occupation ratio with the reference detection frame is larger than a preset occupation ratio threshold value;
using the candidate detection frames which remain after the elimination processing and do not comprise the reference detection frame as a new candidate detection frame sequence, and returning to the step of taking the candidate detection frame with the largest detection frame size in the candidate detection frame sequence as the reference detection frame to continue execution, until the number of candidate detection frames remaining after the elimination processing, excluding the reference detection frame, is less than two, at which point the iteration stops and the target detection frames are obtained.
4. The method according to claim 3, wherein the removing, from the candidate detection frame sequence, the candidate detection frames whose occupation ratio with the reference detection frame is greater than a preset occupation ratio threshold value comprises:
calculating the occupation ratio between the reference detection frame and other remaining candidate detection frames;
determining a target combination with a corresponding duty ratio larger than a preset duty ratio threshold value in each combination of the reference detection frame and other remaining candidate detection frames;
and removing the candidate detection frame with the minimum detection frame size in each target combination from the candidate detection frame sequence.
5. The method of claim 1, wherein the step of training the object detection model comprises:
acquiring a training sample set;
constructing a neural network model after dynamically adjusting the network structure;
and training the constructed neural network model based on the training sample set to obtain a trained target detection model.
6. The method of claim 5, wherein training the constructed neural network model based on the training sample set to obtain a trained target detection model comprises:
training the constructed neural network model based on the training sample set to obtain a trained initial detection model;
and quantifying the model parameters of the initial detection model, and integrating the convolution layer, the batch normalization layer and the activation layer in the initial detection model to obtain the trained target detection model.
7. The method of any one of claims 1 to 6, wherein the acquiring the target image comprises:
acquiring an initial image to be detected;
and carrying out image preprocessing on the initial image to obtain a target image.
8. A target object detection apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a target image;
the detection module is used for detecting the target object of the target image through the trained target detection model to obtain an initial detection frame and corresponding confidence coefficient corresponding to each target object in the target image; the target detection model is obtained by pre-training a neural network model after dynamically adjusting a network structure;
a selecting module, configured to select a candidate detection frame from the initial detection frames according to the confidence;
and the screening module is used for screening the target detection frames meeting the detection scene requirements from the candidate detection frames.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010586550.1A 2020-06-24 2020-06-24 Target object detection method and device, computer equipment and storage medium Pending CN113837173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010586550.1A CN113837173A (en) 2020-06-24 2020-06-24 Target object detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113837173A true CN113837173A (en) 2021-12-24

Family

ID=78964446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010586550.1A Pending CN113837173A (en) 2020-06-24 2020-06-24 Target object detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113837173A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020015470A1 (en) * 2018-07-16 2020-01-23 Oppo广东移动通信有限公司 Image processing method and apparatus, mobile terminal, and computer-readable storage medium
US20200081095A1 (en) * 2018-09-07 2020-03-12 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating object detection box, device, storage medium, and vehicle
CN109978142A (en) * 2019-03-29 2019-07-05 腾讯科技(深圳)有限公司 The compression method and device of neural network model
CN110781768A (en) * 2019-09-30 2020-02-11 奇点汽车研发中心有限公司 Target object detection method and device, electronic device and medium
CN111126278A (en) * 2019-12-24 2020-05-08 北京邮电大学 Target detection model optimization and acceleration method for few-category scene
CN111291785A (en) * 2020-01-16 2020-06-16 中国平安人寿保险股份有限公司 Target detection method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756211A (en) * 2022-05-13 2022-07-15 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
CN114756211B (en) * 2022-05-13 2022-12-16 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination