Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The target detection method and the target detection apparatus provided by the embodiments of the invention aim to detect a target to be detected and can be applied to various target detection scenarios, such as defect detection in industrial production. Because a target detected by a deep learning target detection algorithm may be a false target, that is, not necessarily a true target to be detected, the embodiments of the present invention distinguish true targets from false targets through the target detection method and apparatus.
As shown in fig. 1, the target detection method according to the embodiment of the present invention includes the following steps:
S1, detecting the sample image through a deep learning target detection algorithm to obtain at least one first detection frame and at least one second detection frame, wherein the first image in the first detection frame is a true target, and the second image in the second detection frame is a false target.
In an embodiment of the present invention, the deep learning target detection algorithm may be an RCNN algorithm, a YOLO algorithm, or the like, which can detect a set target based on deep learning principles and generate corresponding detection frames.
In general, an image may contain similar objects that the deep learning object detection algorithm classifies into one class, that is, objects that the algorithm cannot successfully distinguish. For example, in the defect detection result shown in fig. 2, the left detection frame is a product bump defect, i.e., a true target, and the right detection frame is a broken filament attached to the surface of the product, i.e., a false target.
The embodiment of the invention performs target detection on a sample image containing at least one true target and at least one false target to obtain a detection frame of the at least one true target (namely a first detection frame) and a detection frame of the at least one false target (namely a second detection frame), and then extracts the images in the first detection frame and the second detection frame separately.
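As a minimal sketch of this extraction step, assuming the detector (YOLO, an RCNN variant, or similar) has already returned axis-aligned boxes in (x, y, width, height) pixel coordinates, the images inside the first and second detection frames can be cropped out as follows. The box format and the `crop_boxes` helper are illustrative assumptions, not a fixed interface:

```python
import numpy as np

def crop_boxes(image, boxes):
    """Extract the sub-image inside each detection frame separately.

    Each box is a hypothetical (x, y, w, h) tuple in pixel coordinates,
    as a detector such as YOLO or an RCNN variant might return; the
    detector call itself (step S1) is omitted here.
    """
    crops = []
    for (x, y, w, h) in boxes:
        crops.append(image[y:y + h, x:x + w].copy())
    return crops

# Example: one frame labeled as a true target and one as a false target.
image = np.arange(100, dtype=np.uint8).reshape(10, 10)
first_frame = (0, 0, 4, 4)    # true-target frame in the sample image
second_frame = (5, 5, 4, 4)   # false-target frame
true_img, false_img = crop_boxes(image, [first_frame, second_frame])
```

Each cropped image is then processed independently in the feature-extraction steps that follow.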
S2, respectively obtaining N feature variables of each first image and each second image, and selecting n selected feature variables from the N feature variables, wherein the difference between a selected feature variable of the first image and the corresponding selected feature variable of the second image is greater than a corresponding preset difference value, n is greater than or equal to 2 and less than or equal to N, and n and N are positive integers.
It should be understood that, besides the features utilized by the deep learning object detection algorithm, an image also has color features, shape features, and other texture features, such as the gray level co-occurrence matrix. The embodiments of the present invention can therefore further distinguish the objects obtained by the deep learning object detection algorithm using these color, shape, and texture features.
Specifically, the N feature variables obtained in the embodiment of the present invention include at least one histogram distribution parameter, at least one texture characteristic parameter, at least one global threshold parameter, and at least one profile information parameter. Among the N feature variables, some feature variables of the false target are the same as or similar to those of the true target, so feature variables with larger differences need to be selected to distinguish the true target from the false target.
For example, for the two targets in fig. 2, the histogram distribution, the global threshold, the texture characteristic (gray level co-occurrence matrix), the contour area and length, the moment features, and the like can be obtained respectively.
The histograms of the two objects in fig. 2 are shown in fig. 3, and it can be seen that the color distributions of the two objects are very similar and difficult to distinguish from each other.
The global thresholds of the two targets in fig. 2 are shown in table 1. In the embodiment of the present invention, each selected feature variable is provided with a corresponding preset difference value to quantify what counts as a "large difference". For example, for the one-dimensional maximum entropy, if the difference between the one-dimensional maximum entropies of the two images is greater than 20 (the difference between the one-dimensional maximum entropies of the bump and the broken filament in table 1 is 30), the difference may be considered large, and the one-dimensional maximum entropy is taken as a selected feature variable to serve as an input variable of the subsequent multiple linear regression.
TABLE 1
It should be noted that fig. 2, fig. 3, and table 1 take one first detection frame and one second detection frame as an example. For the case of multiple first detection frames and multiple second detection frames, a feature variable may be taken as a selected feature variable if, between the image in any first detection frame and the image in any second detection frame, its difference value is greater than the corresponding preset difference value.
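The one-dimensional maximum entropy used above as a selected feature variable can be computed, for example, with Kapur-style maximum-entropy thresholding, which picks the gray level that maximizes the summed entropies of the background and foreground histogram distributions. This is an illustrative sketch; the embodiment does not prescribe a particular implementation:

```python
import numpy as np

def max_entropy_threshold(gray):
    """One-dimensional maximum-entropy threshold (Kapur-style):
    choose the gray level t that maximizes the sum of the entropies of
    the background (levels < t) and foreground (levels >= t) histogram
    distributions."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_h = 0, -np.inf
    for t in range(1, 255):
        w0 = p[:t].sum()          # background probability mass
        w1 = 1.0 - w0             # foreground probability mass
        if w0 <= 0 or w1 <= 0:
            continue
        p0 = p[:t][p[:t] > 0] / w0
        p1 = p[t:][p[t:] > 0] / w1
        h = -(p0 * np.log(p0)).sum() - (p1 * np.log(p1)).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

# Example: a synthetic two-level image; the threshold separates the levels.
gray = np.zeros((10, 10), dtype=np.uint8)
gray[:, :5] = 50
gray[:, 5:] = 200
t = max_entropy_threshold(gray)
```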
The selected feature variables are different for different kinds of targets and different image acquisition scenes, for example, in one embodiment of the present invention, the selected feature variables are a pixel maximum value, a pixel minimum value, a one-dimensional maximum entropy, and a contour area.
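Gathering the four selected feature variables named above might look like the following sketch. The entropy and contour-area computations are simplified stand-ins (a histogram entropy and a foreground-pixel count); a production implementation would typically use maximum-entropy thresholding and a proper contour extractor such as OpenCV's `findContours`:

```python
import numpy as np

def selected_features(gray, threshold=128):
    """Return [pixel max, pixel min, entropy proxy, contour-area proxy].

    The entropy here is the Shannon entropy of the gray-level histogram,
    and the contour area is approximated by the count of pixels above a
    hypothetical global threshold; both are illustrative simplifications.
    """
    flat = gray.ravel()
    p = np.bincount(flat, minlength=256) / flat.size
    p = p[p > 0]
    entropy = float(-(p * np.log2(p)).sum())
    contour_area = int((gray > threshold).sum())
    return [int(flat.max()), int(flat.min()), entropy, contour_area]

# Example: a uniform patch has zero histogram entropy.
features = selected_features(np.full((4, 4), 200, dtype=np.uint8))
```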
S3, performing multiple linear regression on the n selected feature variables to obtain a linear regression function.
The multiple linear regression adopts the formula h_θ(x^(i)) = θ_0 + θ_1·x_1^(i) + θ_2·x_2^(i) + … + θ_n·x_n^(i), where x_1^(i), x_2^(i), …, x_n^(i) are the n selected feature variables; θ_1, θ_2, …, θ_n are the weights of the n selected feature variables, and θ_0 is a bias term; h_θ(x^(i)) is the fitted probability that the image is a true target. In the function fitting process, the true probability corresponding to the first image, namely the true target, is 1, and the true probability corresponding to the second image, namely the false target, is 0. The index i represents the ordinal number of the first and second images, and the total number of first and second images, namely the total number of detected true and false targets in the sample image, is m, where m is a positive integer greater than or equal to 2.
The loss function defined in the multiple linear regression of the embodiment of the invention adopts the mean square error loss:

J(θ) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

where h_θ(x^(i)) is the fitting probability value and y^(i) is the true probability value.
Further, the embodiment of the invention makes the fitting probability value approach the true probability value through a gradient descent method. The algorithm process of the gradient descent method is as follows:

repeat {
    θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)   (simultaneously for all j)
}

where j represents the ordinal number of the selected feature variables and α is the learning rate.
Calculating the weights θ_1, θ_2, …, θ_n corresponding to the selected feature variables and the bias term θ_0 through gradient descent yields a linear regression function whose inputs are the n selected feature variables and whose output is the probability that the image is a true target.
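The fitting procedure of step S3, i.e. batch gradient descent on the mean-squared-error loss of the multiple linear regression, can be sketched as follows. The toy feature values and the hyperparameters (learning rate, epoch count) are illustrative assumptions:

```python
import numpy as np

def fit_linear_regression(X, y, alpha=0.3, epochs=50000):
    """Batch gradient descent for h(x) = θ0 + θ1·x1 + … + θn·xn
    with mean-squared-error loss J(θ) = (1/2m) Σ (h(x_i) - y_i)^2."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend bias column for θ0
    theta = np.zeros(n + 1)
    for _ in range(epochs):
        grad = Xb.T @ (Xb @ theta - y) / m  # ∂J/∂θ, all j at once
        theta -= alpha * grad
    return theta

def predict(theta, x):
    """Evaluate the fitted linear regression function on one sample."""
    return theta[0] + theta[1:] @ np.asarray(x)

# Toy training set: two true targets (label 1) and two false targets
# (label 0), each described by two normalized selected feature variables.
X = np.array([[1.0, 0.2], [0.9, 0.3], [0.1, 0.8], [0.2, 0.9]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta = fit_linear_regression(X, y)
```

After fitting, `predict` outputs a value near 1 for feature vectors resembling the true targets and near 0 for those resembling the false targets.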
In addition, for convenience of calculation, normalization processing may be performed on the n selected feature variables before performing the multiple linear regression; for example, the pixel maximum value and the pixel minimum value may be divided by 255, so as to normalize each selected feature variable to the range 0 to 1. It should be understood that if the n selected feature variables are normalized here, then in the subsequent step S6 they must also be normalized before being input into the linear regression function.
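The normalization step can be as simple as a per-feature scale, for example dividing pixel-valued features by 255; the scale factors for the entropy and contour-area features below are hypothetical and would be chosen per scene:

```python
def normalize(features, scales=(255.0, 255.0, 8.0, 1024.0)):
    """Scale each selected feature variable into the 0-1 range.

    Default scales assume [pixel max, pixel min, entropy, contour area];
    the last two divisors are illustrative per-scene maxima, not fixed
    constants from the embodiment.
    """
    return [f / s for f, s in zip(features, scales)]

scaled = normalize([255, 0, 4.0, 512.0])
```

The same `normalize` call must be applied both before fitting and before inference, so that training and test feature vectors share one scale.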
S4, detecting the image to be detected through a deep learning target detection algorithm to obtain a plurality of third detection frames.
This step uses the same deep learning target detection algorithm as step S1, and the image to be detected and the sample image are obtained in the same scene, for example, images of the same kind of product captured by the same camera with the same shooting parameters. Target detection is performed on the image to be detected to obtain at least one detection frame to be detected, namely a third detection frame, and the image in each third detection frame is then extracted separately.
S5, acquiring the n selected feature variables of the image in each third detection frame.
The image in the third detection frame may be a true target or a false target, and therefore needs to be distinguished. The first step of the distinguishing process is to obtain the n selected feature variables, i.e. the feature variables obtained here are the same as those selected in step S2 above. For example, if the feature variables selected in step S2 are the pixel maximum value, the pixel minimum value, the one-dimensional maximum entropy, and the contour area, then the pixel maximum value, the pixel minimum value, the one-dimensional maximum entropy, and the contour area of the image in each third detection frame are acquired here.
S6, inputting the n selected feature variables of the image in each third detection frame into the linear regression function, and judging whether the image in each third detection frame is a true target according to the output result.
The second step of the distinguishing process is to input the n selected feature variables obtained in step S5 into the linear regression function obtained in step S3 and output a probability value. In the embodiment of the present invention, a probability value comparison threshold may be set; if the output probability value is greater than the comparison threshold, the image in the third detection frame is determined to be a true target, and otherwise a false target. In one embodiment of the present invention, the probability value comparison threshold may be 0.5.
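Steps S5 and S6 then reduce to evaluating the fitted linear function on the selected feature variables of each third detection frame and comparing the output against the probability comparison threshold; a sketch, with an illustrative weight vector:

```python
def is_true_target(theta, features, threshold=0.5):
    """Decide whether the image in a third detection frame is a true
    target: compute h(x) = θ0 + Σ θj·xj and compare it with the
    probability comparison threshold (0.5 by default)."""
    prob = theta[0] + sum(w * f for w, f in zip(theta[1:], features))
    return prob > threshold

# Hypothetical fitted parameters [θ0, θ1, θ2] for two feature variables.
theta = [0.05, 1.0, 0.0]
verdict_true = is_true_target(theta, [0.8, 0.3])   # high first feature
verdict_false = is_true_target(theta, [0.2, 0.9])  # low first feature
```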
According to the target detection method provided by the embodiment of the invention, firstly, the true and false targets are detected through the target detection algorithm, and then the true and false targets are distinguished through the multiple linear regression function based on multiple characteristic variables of the image, so that the effective distinguishing of the similar targets can be realized, and the accuracy of target detection is greatly improved.
Corresponding to the target detection method of the above embodiment, the present invention further provides a target detection apparatus.
As shown in fig. 4, the object detection apparatus according to the embodiment of the present invention includes: the system comprises a first detection module 10, a first acquisition module 20, a regression module 30, a second detection module 40, a second acquisition module 50 and a judgment module 60. The first detection module 10 is configured to detect a sample image through a deep learning target detection algorithm to obtain at least one first detection frame and at least one second detection frame, where a first image in the first detection frame is a true target and a second image in the second detection frame is a false target; the first obtaining module 20 is configured to obtain N feature variables of each first image and each second image, respectively, and select N selected feature variables from the N feature variables, where a difference between the selected feature variables of the first image and the selected feature variables of the second image is greater than a corresponding preset difference value, N is greater than or equal to 2 and less than or equal to N, and N are positive integers; the regression module 30 is configured to perform multiple linear regression on the n selected feature variables to obtain a linear regression function; the second detection module 40 is configured to detect an image to be detected through a deep learning target detection algorithm to obtain a plurality of third detection frames; the second obtaining module 50 is configured to obtain n selected feature variables of the image in each third detection frame; the judging module 60 is configured to input the n selected feature variables of the images in the third detection frames into a linear regression function, and determine whether the image in each third detection frame is a true target according to the output result.
In an embodiment of the present invention, the deep learning target detection algorithm may be an RCNN algorithm, a YOLO algorithm, and the like, and may detect a set target based on a deep learning principle and generate a corresponding detection frame.
In general, an image may contain similar objects that the deep learning object detection algorithm classifies into one class, that is, objects that the algorithm cannot successfully distinguish. For example, in the defect detection result shown in fig. 2, the left detection frame is a product bump defect, i.e., a true target, and the right detection frame is a hair attached to the surface of the product, i.e., a false target.
In the embodiment of the present invention, a first detection module 10 performs target detection on a sample image including at least one true target and at least one false target to obtain a detection frame (i.e., a first detection frame) of the at least one true target and a detection frame (i.e., a second detection frame) of the at least one false target, and then extracts images in the first detection frame and the second detection frame separately.
It should be understood that, besides the features utilized by the deep learning object detection algorithm, an image also has color features, shape features, and other texture features, such as the gray level co-occurrence matrix. The embodiments of the present invention can therefore further distinguish the objects obtained by the deep learning object detection algorithm using these color, shape, and texture features.
Specifically, the N feature variables obtained by the first obtaining module 20 include at least one histogram distribution parameter, at least one texture characteristic parameter, at least one global threshold parameter, and at least one profile information parameter. Among the N feature variables, some feature variables of the false target are the same as or similar to those of the true target, so feature variables with larger differences need to be selected to distinguish the true target from the false target.
For example, for the two targets in fig. 2, the histogram distribution, the global threshold, the texture characteristic (gray level co-occurrence matrix), the contour area and length, the moment features, and the like can be obtained respectively.
The histograms of the two objects in fig. 2 are shown in fig. 3, and it can be seen that the color distributions of the two objects are very similar and difficult to distinguish from each other.
As shown in table 1, in the embodiment of the present invention, each selected feature variable is provided with a corresponding preset difference value to quantify what counts as a "large difference". For example, for the one-dimensional maximum entropy, if the difference between the one-dimensional maximum entropies of the two images is greater than 20 (the difference between the one-dimensional maximum entropies of the bump and the hairline in table 1 is 30), the difference may be considered large, and the one-dimensional maximum entropy is taken as a selected feature variable to serve as an input variable of the subsequent multiple linear regression.
It should be noted that fig. 2, fig. 3, and table 1 take one first detection frame and one second detection frame as an example. For the case of multiple first detection frames and multiple second detection frames, a feature variable may be taken as a selected feature variable if, between the image in any first detection frame and the image in any second detection frame, its difference value is greater than the corresponding preset difference value.
The selected feature variables are different for different kinds of targets and different image acquisition scenes, for example, in one embodiment of the present invention, the selected feature variables are a pixel maximum value, a pixel minimum value, a one-dimensional maximum entropy, and a contour area.
The multiple linear regression adopts the formula h_θ(x^(i)) = θ_0 + θ_1·x_1^(i) + θ_2·x_2^(i) + … + θ_n·x_n^(i), where x_1^(i), x_2^(i), …, x_n^(i) are the n selected feature variables; θ_1, θ_2, …, θ_n are the weights of the n selected feature variables, and θ_0 is a bias term; h_θ(x^(i)) is the fitted probability that the image is a true target. In the function fitting process, the true probability corresponding to the first image, namely the true target, is 1, and the true probability corresponding to the second image, namely the false target, is 0. The index i represents the ordinal number of the first and second images, and the total number of first and second images, namely the total number of detected true and false targets in the sample image, is m, where m is a positive integer greater than or equal to 2.
The loss function defined in the multiple linear regression of the embodiment of the invention adopts the mean square error loss:

J(θ) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

where h_θ(x^(i)) is the fitting probability value and y^(i) is the true probability value.
Further, the embodiment of the invention makes the fitting probability value approach the true probability value through a gradient descent method. The algorithm process of the gradient descent method is as follows:

repeat {
    θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)   (simultaneously for all j)
}

where j represents the ordinal number of the selected feature variables and α is the learning rate.
Calculating the weights θ_1, θ_2, …, θ_n corresponding to the selected feature variables and the bias term θ_0 through gradient descent yields a linear regression function whose inputs are the n selected feature variables and whose output is the probability that the image is a true target.
In addition, for convenience of calculation, the regression module 30 may perform normalization processing on the n selected feature variables before performing the multiple linear regression; for example, the pixel maximum value and the pixel minimum value may be divided by 255, so as to normalize each selected feature variable to the range 0 to 1. It should be understood that if the regression module 30 normalizes the n selected feature variables, the judging module 60 must also normalize the n selected feature variables before inputting them into the linear regression function.
The second detection module 40 uses the same deep learning target detection algorithm as the first detection module 10, and the image to be detected and the sample image are obtained in the same scene, for example, images of the same kind of product captured by the same camera with the same shooting parameters. The second detection module 40 performs target detection on the image to be detected to obtain at least one detection frame to be detected, namely a third detection frame, and then extracts the image in each third detection frame separately.
The image in the third detection frame may be a true target or a false target, and is therefore distinguished by the second obtaining module 50 and the judging module 60. In the first step of the distinguishing process, the second obtaining module 50 obtains the n selected feature variables, i.e. the same feature variables as those selected by the first obtaining module 20. For example, if the feature variables selected by the first obtaining module 20 are the pixel maximum value, the pixel minimum value, the one-dimensional maximum entropy, and the contour area, the second obtaining module 50 also obtains these four variables for the image in each third detection frame. In the second step, the judging module 60 inputs the n selected feature variables obtained by the second obtaining module 50 into the linear regression function obtained by the regression module 30 and outputs a probability value. In the embodiment of the present invention, a probability value comparison threshold may be set; if the output probability value is greater than the comparison threshold, the image in the third detection frame is determined to be a true target, and otherwise a false target. In one embodiment of the present invention, the probability value comparison threshold may be 0.5.
According to the target detection device provided by the embodiment of the invention, firstly, the true and false targets are detected through the target detection algorithm, and then the true and false targets are distinguished through the multiple linear regression function based on multiple characteristic variables of the image, so that the effective distinguishing of similar targets can be realized, and the target detection accuracy is greatly improved.
The invention further provides a computer device corresponding to the embodiment.
The computer device according to the embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the object detection method according to the above-described embodiment of the present invention can be implemented.
According to the computer equipment provided by the embodiment of the invention, when the processor executes the computer program stored on the memory, the true and false targets are detected through the target detection algorithm, and then the true and false targets are distinguished through the multiple linear regression function based on multiple characteristic variables of the image, so that the effective distinguishing of similar targets can be realized, and the accuracy of target detection is greatly improved.
The invention also provides a non-transitory computer readable storage medium corresponding to the above embodiment.
A non-transitory computer-readable storage medium of an embodiment of the present invention has stored thereon a computer program that, when executed by a processor, can implement the object detection method according to the above-described embodiment of the present invention.
According to the non-transitory computer-readable storage medium of the embodiment of the invention, when the processor executes the computer program stored thereon, the true and false targets are detected through the target detection algorithm, and then the true and false targets are distinguished through the multiple linear regression function based on multiple characteristic variables of the image, so that the effective distinguishing of similar targets can be realized, and the accuracy of target detection is greatly improved.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The meaning of "plurality" is two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood according to specific situations by those of ordinary skill in the art.
In the present invention, unless otherwise expressly stated or limited, a first feature "on" or "under" a second feature may mean that the first and second features are in direct contact, or in indirect contact through an intermediate medium. Also, a first feature "on," "above," or "over" a second feature may be directly on or obliquely above the second feature, or may simply mean that the first feature is at a higher level than the second feature. A first feature "under," "below," or "beneath" a second feature may be directly under or obliquely under the second feature, or may simply mean that the first feature is at a lower level than the second feature.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention; variations, modifications, substitutions, and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.