CN112766375A - Target object detection method and device and computer-readable storage medium - Google Patents

Target object detection method and device and computer-readable storage medium

Info

Publication number: CN112766375A
Authority: CN (China)
Prior art keywords: target, target object, image, neural network, network model
Legal status: Pending (an assumption, not a legal conclusion)
Application number: CN202110073394.3A
Other languages: Chinese (zh)
Inventor: 葛锦洲
Current Assignee: Nanjing Huichuan Image Vision Technology Co ltd
Original Assignee: Nanjing Huichuan Image Vision Technology Co ltd
Application filed by Nanjing Huichuan Image Vision Technology Co ltd
Priority to CN202110073394.3A
Publication of CN112766375A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target object detection method and device and a computer-readable storage medium. The method comprises the following steps: acquiring a target image; acquiring the area of a target object in the target image and determining a training weight for the target object according to the area; training a preset neural network model according to the training weight and the target image to obtain a target neural network model; and inputting the target image into the target neural network model to determine the position and category of the target object. The method and device address the problem that small targets are easily lost when detecting scenes in which large and small targets coexist.

Description

Target object detection method and device and computer-readable storage medium
Technical Field
The present invention relates to the field of machine vision technologies, and in particular, to a method and an apparatus for detecting a target object, and a computer-readable storage medium.
Background
Currently, the development of machine vision benefits from the continuous improvement of supporting infrastructure, the growing overall scale of the manufacturing industry, rising levels of intelligent automation, and the expanding scale of trial production in the electronics and semiconductor fields. Machine vision is widely applied to electronic products and semiconductors. Components in the consumer electronics industry are small and their inspection requirements are high, making them well suited to detection with machine vision systems.
With the arrival of the new era of intelligent industrial manufacturing, the performance of target positioning and of visual detection of materials and parts in industrial scenes is increasingly important for raising the overall level of intelligent manufacturing. Existing target detection methods perform inference with a lightweight convolutional neural network (CNN), absorbing and improving the regression-box idea of current mainstream target detection algorithms to locate and regress the position, score, and category of each target. However, this end-to-end training mode cannot properly screen potential targets, and a lightweight network structure pursues real-time detection efficiency at the cost of generalization capability, so small targets are easily lost when detecting scenes in which large and small targets coexist.
Disclosure of Invention
The invention mainly aims to provide a target object detection method and device and a computer-readable storage medium that solve the problem that small targets are easily lost when detecting scenes in which large and small targets coexist.
In order to achieve the above object, the present invention provides a method, an apparatus and a computer readable storage medium for detecting a target object, wherein the method comprises:
acquiring a target image;
acquiring the area of a target object in the target image, and determining the training weight of the target object according to the area;
training a preset neural network model according to the training weight and the target image to obtain a target neural network model;
inputting the target image into the target neural network model to determine a location and a category of the target object.
Optionally, the step of inputting the target image into the target neural network model to determine the position and the category of the target object comprises:
inputting the target image into the target neural network model to generate first prediction box coordinates and scores for the target object;
and determining the position and the category of the target object according to the first prediction box coordinates and the score.
Optionally, the step of determining the position and the category of the target object according to the first prediction box coordinate and the score includes:
processing the first prediction box coordinates by adopting a recursive regression strategy algorithm to generate second prediction box coordinates of the target object;
determining the position of the target object according to the second prediction box coordinates;
and determining the category of the target object according to the score.
Optionally, the step of training a preset neural network model according to the training weights and the target image to obtain a target neural network model includes:
training a preset neural network model according to the training weight and the target image, and calculating a loss function of the preset neural network model;
updating training parameters of the preset neural network model according to the loss function, wherein the training parameters comprise learning rate;
and when the loss function is converged, determining the converged preset neural network model as a target neural network model.
Optionally, the step of acquiring the target image includes:
acquiring an acquired original image;
and determining the target image according to the original image.
Optionally, the step of determining the target image according to the original image includes:
acquiring the ratio of the area of a target object in an original image to the area of the original image;
when the ratio is larger than a preset value, determining the original image as the target image;
and when the ratio is smaller than or equal to a preset value, selecting a target area in the original image, and determining an image corresponding to the target area as a target image.
Optionally, the step of determining the target image according to the original image includes:
acquiring the resolution of an original image;
when the resolution is smaller than a preset resolution, determining the original image as the target image;
and when the resolution is greater than or equal to a preset resolution, selecting a target area in the original image, and determining an image corresponding to the target area as a target image.
Optionally, the obtaining an area of a target object in the target image, and the determining a training weight of the target object according to the area includes:
performing data enhancement processing on the target image;
and acquiring the area of a target object in the target image after data enhancement processing, and determining the training weight of the target object according to the area.
In addition, in order to achieve the above object, the present invention further provides a detection apparatus, which includes a memory, a processor, and a target object detection program stored in the memory and executable on the processor, wherein the target object detection program implements any one of the steps of the target object detection method when executed by the processor.
Further, to achieve the above object, the present invention provides a computer-readable storage medium having stored thereon a target object detection program, which when executed by a processor, implements the steps of the target object detection method according to any one of the above.
The invention provides a target object detection method and device and a computer-readable storage medium. A target image is acquired; the area of a target object in the target image is acquired and a training weight for the target object is determined according to the area; a preset neural network model is trained according to the training weight and the target image to obtain a target neural network model; and the target image is input into the target neural network model to determine the position and category of the target object. In this scheme, the training weight of each target object is updated adaptively according to the distribution of target areas, the interaction of multi-scale feature layers is improved, and the precision with which the trained target neural network model extracts target features, especially small-target features, is improved. Small targets are thus prevented from being lost during detection, the detection capability for scenes in which large and small targets coexist is improved, and the problem that small targets are easily lost in such scenes is solved.
Drawings
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a hardware architecture of a detection apparatus according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating a first embodiment of the method for detecting a target object according to the present invention;
Fig. 3 is a flowchart illustrating a second embodiment of the method for detecting a target object according to the present invention;
Fig. 4 is a flowchart illustrating a third embodiment of the method for detecting a target object according to the present invention;
Fig. 5 is a flowchart illustrating a fourth embodiment of the method for detecting a target object according to the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: acquire the area of a target object in a target image and determine a training weight for the target object according to the area; train a preset neural network model according to the training weight and the target image to obtain a target neural network model; and input the target image into the target neural network model to determine the position and category of the target object. In this scheme, the training weight of each target object is updated adaptively according to the distribution of target areas, the interaction of multi-scale feature layers is improved, and the precision with which the trained target neural network model extracts target features, especially small-target features, is improved, so that small targets are prevented from being lost during detection and the problem that small targets are easily lost when detecting scenes in which large and small targets coexist is solved.
For a better understanding of the above technical solutions, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
For better understanding of the above technical solutions, the following detailed descriptions will be provided in conjunction with the drawings and the detailed description of the embodiments.
As shown in fig. 1, fig. 1 is a schematic diagram of a hardware architecture of a detection apparatus according to an embodiment of the present invention.
As shown in fig. 1, the detection apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002, where the communication bus 1002 is used to enable connective communication between these components. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (such as a disk memory), and may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration of the detection device shown in FIG. 1 does not constitute a limitation of the detection device, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system and a detection program of a target object.
In the detection apparatus shown in fig. 1, the processor 1001 may be configured to call a detection program of a target object stored in the memory 1005, and perform the following operations:
acquiring a target image;
acquiring the area of a target object in the target image, and determining the training weight of the target object according to the area;
training a preset neural network model according to the training weight and the target image to obtain a target neural network model;
inputting the target image into the target neural network model to determine a location and a category of the target object.
Further, the processor 1001 may call the detection program of the target object stored in the memory 1005, and also perform the following operations:
inputting the target image into the target neural network model to generate first prediction box coordinates and scores for the target object;
and determining the position and the category of the target object according to the first prediction box coordinates and the score.
Further, the processor 1001 may call the detection program of the target object stored in the memory 1005, and also perform the following operations:
processing the first prediction box coordinates by adopting a recursive regression strategy algorithm to generate second prediction box coordinates of the target object;
determining the position of the target object according to the second prediction box coordinates;
and determining the category of the target object according to the score.
Further, the processor 1001 may call the detection program of the target object stored in the memory 1005, and also perform the following operations:
training a preset neural network model according to the training weight and the target image, and calculating a loss function of the preset neural network model;
updating training parameters of the preset neural network model according to the loss function, wherein the training parameters comprise learning rate;
and when the loss function is converged, determining the converged preset neural network model as a target neural network model.
Further, the processor 1001 may call the detection program of the target object stored in the memory 1005, and also perform the following operations:
acquiring an acquired original image;
and determining the target image according to the original image.
Further, the processor 1001 may call the detection program of the target object stored in the memory 1005, and also perform the following operations:
acquiring the ratio of the area of a target object in an original image to the area of the original image;
when the ratio is larger than a preset value, determining the original image as the target image;
and when the ratio is smaller than or equal to a preset value, selecting a target area in the original image, and determining an image corresponding to the target area as a target image.
Further, the processor 1001 may call the detection program of the target object stored in the memory 1005, and also perform the following operations:
performing data enhancement processing on the target image;
and acquiring the area of a target object in the target image after data enhancement processing, and determining the training weight of the target object according to the area.
Referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of a method for detecting a target object according to the present invention, where the method for detecting a target object includes the following steps:
step S10, acquiring a target image;
in this embodiment, the executing body is a detecting device, the detecting device is used for positioning and identifying a target object in an image, the detecting device may be a terminal device, for example, a computer, a mobile phone, a portable computer, or the like, and the detecting device may also be a server or a control device that controls these terminal devices.
In this embodiment, a neural network model for detecting a target object is stored in the detection apparatus. The detection apparatus preprocesses the acquired image, inputs the processed image data into the neural network model, and trains the neural network model with the input image data to obtain a trained neural network model. An acquired image is then input into the trained neural network model to predict the position and category of the target object in the image.
In this embodiment, the detection apparatus obtains a target image, which is an image obtained by preprocessing a captured image. The target image contains one or more target objects, a target object being an object to be detected and identified by the neural network model; a target object may be large or small. It should be noted that "large" and "small" here refer to the area occupied by the target object relative to the area of the target image: if the area occupied by the target object is large relative to the area of the target image, the object may be regarded as a large target object; if that area is small relative to the area of the target image, the object may be regarded as a small target object.
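The relative-size notion above can be sketched as a simple check. The 1/8 cutoff below is only an assumed example value (the later embodiment uses 1/8 as an example preset value), not a threshold specified here:

```python
# Hypothetical sketch: classify a target as "small" by its area relative
# to the whole image. The 1/8 cutoff is an assumed example value.
def is_small_target(object_area: float, image_area: float, cutoff: float = 1 / 8) -> bool:
    return object_area / image_area <= cutoff

is_small_target(5000.0, 1024 * 1024)   # returns True: the object covers under 1% of the image
```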
Specifically, an image captured by the detection apparatus is preprocessed, and the preprocessed image is determined as the target image.
Step S20, acquiring the area of a target object in the target image, and determining the training weight of the target object according to the area;
In this embodiment, after acquiring the target image, the detection apparatus acquires the area of each target object in the target image and determines the training weight of the target object according to that area. The area of a target object is the area of the region it occupies in the target image and is used to measure its size: generally, the larger the area, the larger the target object, and the smaller the area, the smaller the target object. The training weight is the weight given to a target object during model training and measures how thoroughly the model is trained on it: generally, the larger the training weight, the more the neural network model trains on the target object and the higher the trained model's recognition accuracy and efficiency for it; the smaller the training weight, the less the model trains on the target object and the lower the trained model's recognition accuracy and efficiency for it. In this embodiment, different training weights are given to target objects of different areas so as to improve the trained model's ability to detect objects of different sizes, especially small target objects, thereby improving its detection capability in scenes where large and small targets coexist.
Specifically, after the target image is acquired, the area of each target object in the target image is calculated (the detection apparatus can compute the area of the occupied region automatically), and the training weight of each target object is determined according to its area. Generally, a large target object, that is, one with a larger area, receives a smaller training weight, while a small target object, one with a smaller area, receives a larger training weight. It should be noted that the specific value of a training weight can be determined according to actual needs, for example according to the required detection accuracy, and is not limited herein.
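As a minimal sketch of the area-dependent weighting described above: the text gives no formula, so the inverse-area rule and the mean-1.0 normalization here are assumptions, chosen only to illustrate "smaller area, larger weight".

```python
# Hypothetical sketch: smaller objects receive larger training weights.
# Weights are inversely proportional to area and normalized to mean 1.0;
# this exact rule is an assumption, not taken from the patent.
def training_weights(areas, eps=1e-6):
    raw = [1.0 / max(a, eps) for a in areas]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

training_weights([4000.0, 250.0])
# the 250-pixel object gets a weight 16x that of the 4000-pixel object
```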
Step S30, training a preset neural network model according to the training weight and the target image to obtain a target neural network model;
In this embodiment, after the training weight of the target object in the target image is determined, the preset neural network model is trained according to the training weight and the target image to obtain the target neural network model. The preset neural network model is a neural network model pre-stored in the detection apparatus that has not yet been trained and therefore cannot be used for detection and identification of the target object; the target neural network model is the trained preset neural network model and can be used for detection and identification of the target object.
Specifically, after the training weight of a target object in a target image is determined, the target image is input into the preset neural network model, and the model is trained based on the training weight of the target object. During training, the loss function of the model is calculated in real time or periodically according to the distribution characteristics of different periods, and the training parameters, which include at least the learning rate, are changed adaptively according to the convergence of the loss function. When the loss function converges, the corresponding preset neural network model is determined as the target neural network model.
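The convergence-driven loop just described might be sketched as follows. The decay factor, tolerance, and the `step_fn` stand-in are illustrative assumptions; `step_fn` represents one weighted training pass of the preset neural network model returning its loss.

```python
# Hypothetical sketch of the training loop: compute the (weighted) loss,
# shrink the learning rate when the loss worsens, and stop when the change
# in loss falls below a tolerance (the "convergence" condition).
def train_until_converged(step_fn, lr=0.1, decay=0.5, tol=1e-4, max_epochs=200):
    prev = float("inf")
    for epoch in range(max_epochs):
        loss = step_fn(lr)
        if abs(prev - loss) < tol:       # loss has converged
            return epoch, loss
        if loss > prev:                  # loss increased: adapt the learning rate
            lr *= decay
        prev = loss
    return max_epochs, prev
```

In the described method, the converged model at this point would be kept as the target neural network model.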
Step S40, inputting the target image into the target neural network model to determine the position and the category of the target object.
In this embodiment, after the target neural network model is obtained through training, the target image is input into it, the target image is detected and identified using the model, and the position and category of the target object in the target image are determined. The position of the target object is its position in the target image, represented by a prediction box with corresponding prediction box coordinates; the category is the class to which the target object belongs.
In the technical scheme provided by this embodiment, the area of the target object in the target image is obtained, the training weight of the target object is determined according to the area, the preset neural network model is trained according to the training weight and the target image to obtain the target neural network model, and the target image is input into the target neural network model to determine the position and category of the target object. In this scheme, the training weight of each target object is updated adaptively according to the distribution of target areas, the interaction of multi-scale feature layers is improved, and the precision with which the trained model extracts target features, especially small-target features, is improved, so that small targets are prevented from being lost during detection and the problem that small targets are easily lost when detecting scenes in which large and small targets coexist is solved.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of the method for detecting a target object according to the present invention, where based on the first embodiment, the step of S40 includes:
step S41, inputting the target image into the target neural network model to generate a first prediction box coordinate and a score of the target object;
In this embodiment, after the target neural network model is obtained, the target image is input into it to generate first prediction box coordinates and a score for the target object. The first prediction box coordinates correspond to a prediction box showing the approximate position of the target object in the target image. The score is the basis for judging the category of the target object, which is determined according to how high the score is. It should be noted that the scoring mechanism of the target neural network model is designed from the position information and category information of the target object taken together.
Specifically, the target image is input into the target neural network model for detection and identification, prediction box information for the target object is generated, the score of the target object is predicted based on the scoring mechanism, and the output detection data are decoded to obtain the first prediction box coordinates and the score of the target object.
And step S42, determining the position and the category of the target object according to the first prediction frame coordinate and the score.
In this embodiment, after the first prediction frame coordinate and the score of the target object are obtained, the position and the category of the target object are determined according to the first prediction frame coordinate and the score.
In this embodiment, a recursive regression strategy algorithm is used to process the first prediction box coordinates and generate second prediction box coordinates for the target object, and the position of the target object is determined according to the second prediction box coordinates. The second prediction box coordinates correspond to a prediction box showing the accurate position of the target object. The recursive regression strategy algorithm significantly improves the regression accuracy of the prediction box, making the determination of the target object's position more accurate; it can be understood that the second prediction box coordinates predict the position of the target object more accurately than the first.
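A minimal sketch of the recursive idea follows; the patent does not detail its algorithm, so everything here is an assumption. `refine` stands in for the network's regression head, which predicts coordinate offsets for the current box.

```python
# Hypothetical sketch: feed the prediction box back through a refinement
# function and apply the predicted offsets until the adjustment is small.
def recursive_regress(box, refine, max_iters=5, tol=0.5):
    for _ in range(max_iters):
        dx0, dy0, dx1, dy1 = refine(box)
        box = (box[0] + dx0, box[1] + dy0, box[2] + dx1, box[3] + dy1)
        if max(abs(dx0), abs(dy0), abs(dx1), abs(dy1)) < tol:
            break                # offsets negligible: box has converged
    return box
```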
In the present embodiment, the category of the target object is determined from the scores, and in general the category corresponding to the highest score is chosen. It should be noted that, for a given target object, the target neural network model predicts a score for each category, reflecting the similarity between the object and that category: generally, the higher the object's score on a category, the more likely it belongs to that category, and the lower the score, the less likely. The category corresponding to the highest score is therefore determined as the category of the target object in this embodiment.
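The highest-score rule above amounts to an argmax over the per-category scores. The category names and score values below are illustrative only:

```python
# Minimal sketch: pick the category whose predicted score is highest.
def predict_category(scores: dict) -> str:
    return max(scores, key=scores.get)

predict_category({"screw": 0.12, "capacitor": 0.81, "resistor": 0.07})
# returns "capacitor"
```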
In the technical solution provided by this embodiment, a target image is input into the target neural network model to generate the first prediction box coordinates and the scores of the target object; the first prediction box coordinates are processed with a recursive regression strategy algorithm to generate the second prediction box coordinates; the position of the target object is determined from the second prediction box coordinates, and its category from the scores. Applying the recursive regression strategy to the first prediction box coordinates markedly improves the regression precision of the target object's prediction box and thus the precision with which the position of the target object is detected in the target image.
Referring to fig. 4, fig. 4 is a flowchart of a third embodiment of the target object detection method of the present invention. Based on the first embodiment, step S10 includes:
Step S11, acquiring a captured original image;
In this embodiment, a captured original image is acquired, where the original image is an image that has not yet been preprocessed.
Optionally, an original image containing the target object is captured by an image acquisition device and then input to the detection device, so that the detection device obtains the captured original image.
Optionally, an original image containing the target object is acquired by an image acquisition function of the detection device, so as to obtain the acquired original image.
Step S12, determining the target image according to the original image.
In the present embodiment, a target image is determined from an acquired original image.
Optionally, the ratio of the area of the target object in the original image to the area of the original image is obtained and compared with a preset value. When the ratio is greater than the preset value, the original image is determined as the target image; when the ratio is less than or equal to the preset value, a target area is framed in the original image and the image corresponding to that area is determined as the target image. The ratio reflects the size of the target object relative to the original image: the larger the ratio, the larger the target object relative to the original image, and the smaller the ratio, the smaller the target object. The preset value is the threshold for deciding whether to frame-select a sample. When the ratio is greater than the preset value, the target object is of a suitable size relative to the original image, no cropping is needed, and the whole original image can be used directly as the target image; when the ratio is less than or equal to the preset value, the target object is too small relative to the original image, so a target area is framed in the original image and the image corresponding to that area is used as the target image. The preset value may be chosen as 1/8; of course, in other embodiments it can be determined according to actual needs and is not limited here.
In this embodiment, a target area is framed in the original image. The target area is the region of interest to the user, that is, the region the user wants to detect and recognize; it contains the target object and can be specified by the user. When the ratio is less than or equal to the preset value, the user frames the target area manually, or the detection device frames it automatically from the coordinate position information of the target area, and the image corresponding to the target area is determined as the target image. Framing the target area when the ratio is less than or equal to the preset value enlarges the relative size of the target object, which improves training efficiency and training accuracy and, in turn, the detection accuracy of the model.
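The ratio rule of the two paragraphs above can be sketched as a small selection function; the 1/8 default follows the value suggested in the text, and the image arguments are placeholders for the full original image and the framed target-area crop.

```python
def choose_by_ratio(image_area, object_area, region_image, full_image,
                    preset=1 / 8):
    """Ratio rule: use the whole original image when the target object is
    large enough relative to it, otherwise fall back to the framed
    target-area crop."""
    ratio = object_area / image_area
    return full_image if ratio > preset else region_image
```

For instance, an object covering half the image keeps the full image, while one covering a tenth of it triggers frame selection.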
Optionally, the resolution of the original image is obtained. When the resolution is lower than a preset resolution, the original image is determined as the target image; when the resolution is greater than or equal to the preset resolution, a target area is framed in the original image and the image corresponding to that area is determined as the target image. During training, an image whose resolution is too high often exhausts video memory, which harms model training efficiency and accuracy. The preset resolution is the threshold for judging whether video memory will run out during training: when the image resolution is greater than or equal to the preset resolution, training on the image risks running out of video memory; when it is lower, it does not. Therefore, when the resolution is lower than the preset resolution the original image is used as the target image, and when it is greater than or equal to the preset resolution a target area is framed and the corresponding image used as the target image. This lowers the effective resolution, avoids running out of video memory during training, and improves model training efficiency and precision. For the specific process of framing the target area, reference can be made to the description above, which is not repeated here.
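The resolution rule can be sketched the same way; the patent does not state a concrete preset resolution, so the 1920×1080 default below is purely an illustrative assumption.

```python
def choose_by_resolution(width, height, region_image, full_image,
                         preset=(1920, 1080)):
    """Resolution rule: if either dimension reaches the (hypothetical)
    preset resolution, crop to the framed target region to keep video
    memory in bounds; otherwise train on the full original image."""
    if width >= preset[0] or height >= preset[1]:
        return region_image
    return full_image
```

A 640×480 image would thus be used whole, while a 4K capture would be reduced to its target-area crop before training.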
In the technical solution provided by this embodiment, the captured original image is acquired and the target image is determined from it. By analyzing the size distribution and resolution of the picture, the target image is made to meet the requirements of model training, which effectively guarantees the precision and efficiency of model training.
Referring to fig. 5, fig. 5 is a flowchart of a fourth embodiment of the target object detection method of the present invention. Based on the first embodiment, step S20 includes:
Step S21, performing data enhancement processing on the target image;
In this embodiment, after the target image is acquired, data enhancement processing is performed on it to prevent the neural network model from learning too many irrelevant features and overfitting, thereby improving the generalization ability of the trained neural network model. The data enhancement mainly includes rotating the target image, brightness and contrast transformation, random cropping, mirroring, distortion, and the like.
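Two of the augmentations listed above (mirroring and rotation) can be sketched on an image stored as a list of pixel rows; brightness/contrast jitter, random cropping, and distortion would be added to `augment` in the same pattern. This is an illustrative sketch, not the patent's implementation.

```python
import random

def mirror(img):
    """Horizontal mirror: reverse each pixel row."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img, rng=random):
    """Apply a random subset of the augmentations, each with
    probability 0.5, as a typical data-enhancement pipeline would."""
    if rng.random() < 0.5:
        img = mirror(img)
    if rng.random() < 0.5:
        img = rotate90(img)
    return img
```

In practice a library such as torchvision or OpenCV would supply these transforms, but the composition logic is the same.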
Step S22, acquiring the area of the target object in the target image after the data enhancement processing, and determining the training weight of the target object according to the area.
In this embodiment, the area of the target object in the target image after the data enhancement processing is acquired, and the training weight of the target object is determined according to the area.
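The patent states only that the training weight is derived from the target object's area; one plausible scheme, weighting small objects more heavily so they are not dominated by large ones during training, is sketched below. The inverse relation and the clamp bounds are illustrative assumptions.

```python
def training_weight(object_area, image_area, min_w=1.0, max_w=5.0):
    """Hypothetical area-based weighting: the smaller the object relative
    to the image, the larger its loss weight, clamped to [min_w, max_w].
    The inverse-proportional form is an assumption, not the patent's
    stated formula."""
    ratio = object_area / image_area
    return max(min_w, min(max_w, min_w / ratio))
```

An object filling the image keeps the base weight, while a tiny object is pushed up to the maximum weight.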
In the technical solution provided by this embodiment, performing data enhancement processing on the target image effectively improves the generalization ability of the trained neural network model and its ability to detect the target object.
Based on the foregoing embodiments, the present invention further provides a detection apparatus, where the detection apparatus may include a memory, a processor, and a target object detection program stored in the memory and executable on the processor, and when the processor executes the target object detection program, the steps of the target object detection method according to any one of the foregoing embodiments are implemented.
Based on the above embodiments, the present invention further provides a computer-readable storage medium, on which a target object detection program is stored, and the target object detection program realizes the steps of the target object detection method according to any one of the above embodiments when executed by a processor.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for detecting a target object, the method comprising:
acquiring a target image;
acquiring the area of a target object in the target image, and determining the training weight of the target object according to the area;
training a preset neural network model according to the training weight and the target image to obtain a target neural network model;
inputting the target image into the target neural network model to determine a location and a category of the target object.
2. The method for detecting a target object according to claim 1, wherein the step of inputting the target image into the target neural network model to determine the position and the class of the target object comprises:
inputting the target image into the target neural network model to generate first prediction box coordinates and scores for the target object;
and determining the position and the category of the target object according to the first prediction frame coordinate and the score.
3. The method for detecting a target object according to claim 2, wherein the step of determining the position and the category of the target object based on the first prediction box coordinates and the score comprises:
processing the first prediction box coordinates by adopting a recursive regression strategy algorithm to generate second prediction box coordinates of the target object;
determining the position of the target object according to the second prediction frame coordinate;
and determining the category of the target object according to the score.
4. The method for detecting a target object according to claim 1, wherein the step of training a preset neural network model according to the training weights and the target image to obtain the target neural network model comprises:
training a preset neural network model according to the training weight and the target image, and calculating a loss function of the preset neural network model;
updating training parameters of the preset neural network model according to the loss function, wherein the training parameters comprise learning rate;
and when the loss function is converged, determining the converged preset neural network model as a target neural network model.
5. The method for detecting a target object according to claim 1, wherein the step of acquiring a target image includes:
acquiring an acquired original image;
and determining the target image according to the original image.
6. The method for detecting a target object according to claim 5, wherein the step of determining the target image from the original image comprises:
acquiring the ratio of the area of a target object in an original image to the area of the original image;
when the ratio is larger than a preset value, determining the original image as the target image;
and when the ratio is smaller than or equal to a preset value, selecting a target area in the original image, and determining an image corresponding to the target area as a target image.
7. The method for detecting a target object according to claim 5, wherein the step of determining the target image from the original image comprises:
acquiring the resolution of an original image;
when the resolution is smaller than a preset resolution, determining the original image as the target image;
and when the resolution is greater than or equal to a preset resolution, selecting a target area in the original image, and determining an image corresponding to the target area as a target image.
8. The method for detecting a target object according to claim 1, wherein the step of obtaining an area of a target object in the target image and determining the training weight of the target object according to the area comprises:
performing data enhancement processing on the target image;
and acquiring the area of a target object in the target image after data enhancement processing, and determining the training weight of the target object according to the area.
9. A detection apparatus, characterized in that the detection apparatus comprises a memory, a processor and a detection program of a target object stored on the memory and operable on the processor, and the detection program of the target object realizes the steps of the detection method of the target object according to any one of claims 1 to 8 when executed by the processor.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a target object detection program, which when executed by a processor implements the steps of the target object detection method according to any one of claims 1 to 8.
CN202110073394.3A 2021-01-19 2021-01-19 Target object detection method and device and computer-readable storage medium Pending CN112766375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110073394.3A CN112766375A (en) 2021-01-19 2021-01-19 Target object detection method and device and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN112766375A true CN112766375A (en) 2021-05-07

Family

ID=75703414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110073394.3A Pending CN112766375A (en) 2021-01-19 2021-01-19 Target object detection method and device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN112766375A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination