[ summary of the invention ]
In order to overcome the defect of low detection and identification accuracy in conventional detection and identification methods when the result is sensitive to the target size, the present invention provides an image detection and identification method and system, an electronic device, and an image classification network optimization method and system. To solve the above technical problem, the image detection and identification method comprises the following steps: S1, providing an image to be detected containing at least one target to be recognized; S2, detecting the target to be recognized in the image to be detected by using a detection classification model to obtain a mask image corresponding to the target to be recognized; S3, merging the image to be detected in step S1 and the mask image in step S2 to obtain a multi-channel image; and S4, inputting the multi-channel image into a trained classification network for detection so as to classify the target to be recognized.
Preferably, step S2 specifically includes the following steps: S21, locating a rectangular frame corresponding to each target to be recognized; and S22, obtaining the mask image corresponding to the target to be recognized according to the rectangular frame.
Preferably, step S2 further includes the following steps performed between step S21 and step S22: S21A, obtaining a confidence corresponding to each rectangular frame; and S21B, judging whether the rectangular frame is qualified according to the relation between the confidence and a preset threshold; if yes, proceeding to step S22; if not, returning to step S21.
Preferably, if the rectangular frame is determined to be qualified, a step S21C is further included between step S21B and step S22, in which the rectangular frame is scaled according to preset scaling ratios to obtain a plurality of rectangular frames of different sizes corresponding to each target to be recognized.
Preferably, in step S22, the pixel values inside the rectangular frame are set to 255 and the pixel values in the region outside the rectangular frame are set to 0 to obtain a binary image, which is the mask image.
Preferably, in step S3, the multi-channel image is obtained by combining the mask image and the image to be detected according to their numbers of channels, widths, and heights, where the number of channels of the image to be detected is n and the number of channels of the combined multi-channel image is n + 1.
In order to solve the above technical problem, the present invention further provides an image classification network optimization method, including the following steps: T1, providing an image to be detected with at least one target to be recognized; T2, detecting the target to be recognized in the image to be detected by using a detection classification model to obtain a mask image corresponding to the target to be recognized; T3, merging the image to be detected in step T1 and the mask image in step T2 to obtain a multi-channel image; and T4, inputting the multi-channel images as a training set into the trained classification network for training to obtain an optimized classification network.
In order to solve the above technical problem, the present invention further provides an image detection and identification system, including: an image acquisition unit for acquiring an image to be detected having at least one target to be recognized; a detection unit for detecting the targets to be recognized in the image to be detected so as to obtain a mask image corresponding to each target to be recognized; a merging unit for merging the image to be detected and the mask image to obtain a multi-channel image; and a classification unit for inputting the multi-channel image into a trained classification network for detection so as to classify the target to be recognized.
In order to solve the above technical problem, the present invention further provides an image classification network optimization system, including: an image acquisition unit for acquiring an image to be detected having at least one target to be recognized; a detection unit for detecting the targets to be recognized in the image to be detected so as to obtain a mask image corresponding to each target to be recognized; a merging unit for merging the image to be detected and the mask image to obtain a multi-channel image; and a training unit for inputting the multi-channel images into the trained classification network for training so as to obtain an optimized classification network.
The present invention also provides an electronic device including a memory and a processor, wherein the memory stores a computer program configured to execute the above image detection and identification method when running, and the processor is arranged to execute the image detection and identification method by means of the computer program.
Compared with the prior art, when detection of the image to be detected starts, the mask image is predicted by the detection classification model, and the mask image distinguishes the target to be recognized from the background, which improves the efficiency with which the detection classification model recognizes the target. The mask image and the image to be detected are then merged into a multi-channel image, which carries more comprehensive information about the target to be recognized than the image to be detected alone. Inputting this multi-channel image into a trained classification network for detection so as to classify the target to be recognized can substantially improve detection accuracy and yield a more detailed classification result.
Whether the rectangular frame is qualified is judged according to the relation between the confidence and a preset threshold, so that control is exercised at the initial stage of obtaining the multi-channel image, further improving the accuracy of detecting and classifying the images to be detected.
When the positioning frame is qualified, the rectangular frame is scaled according to preset scaling ratios to obtain a plurality of rectangular frames of different sizes corresponding to each target to be recognized, so that the obtained multi-channel images improve classification accuracy; at the same time, the data set for training the classification network is enriched, further improving the performance of the trained classification network.
The trained classification network is trained and optimized by using the multi-channel images as a training set to obtain an optimized classification network, which improves the model performance of the classification network and yields more accurate detection and classification results in the subsequent image detection and identification process.
The image classification network optimization method and the electronic device provided by the invention have the same beneficial effects as the image detection and identification method.
[ Detailed Description of the Embodiments ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, a first embodiment of the present invention provides an image detection and identification method, including the following steps:
S1, acquiring an image to be detected having at least one target to be recognized;
S2, detecting the target to be recognized in the image to be detected by using a detection classification model to obtain a mask image corresponding to the target to be recognized;
S3, merging the image to be detected in step S1 and the mask image in step S2 to obtain a multi-channel image; and S4, inputting the multi-channel image into a trained classification network for detection so as to classify the target to be recognized.
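As a hedged illustration only, steps S1 to S4 can be sketched end to end as follows; `detect_mask` and `classify` are toy stand-ins for the detection classification model and the trained classification network, which the text does not specify concretely:

```python
import numpy as np

def detect_mask(image):
    """S2 (toy stand-in): mark a fixed rectangle as the detected target."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    mask[2:6, 3:7] = 255          # pixels inside the located rectangular frame
    return mask

def merge_channels(image, mask):
    """S3: append the mask as an extra channel (n channels -> n + 1)."""
    return np.dstack([image, mask])

def classify(multi_channel):
    """S4 (toy stand-in): a real system would run a trained network here."""
    return "target" if multi_channel[..., -1].any() else "background"

image = np.zeros((8, 8, 3), dtype=np.uint8)   # S1: 3-channel image to be detected
mask = detect_mask(image)
merged = merge_channels(image, mask)
print(merged.shape)        # (8, 8, 4)
print(classify(merged))    # target
```

The 3-channel image plus the 1-channel mask yields the 4-channel input described in the claims.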
In step S1, the image to be detected is a picture or a video captured by a camera. When a video is captured, the image to be detected is each frame extracted from the video. It is understood that, depending on the analysis task, each image to be detected may contain one target to be recognized or several. For example, if a picture contains 1 bottle of mineral water and 3 bottles of cola, then when the mineral water is to be detected and identified there is 1 target to be recognized, and when the cola is to be detected and identified there are 3 targets to be recognized.
Referring to fig. 2, in the step S2, the target to be recognized in the image to be detected is detected by using a detection classification model to obtain a mask image corresponding to the target to be recognized. The method specifically comprises the following steps:
S21, locating a rectangular frame corresponding to each target to be recognized; and
S22, obtaining a mask image corresponding to the target to be recognized according to the rectangular frame.
In step S21, a rectangular frame corresponding to each target to be recognized is located by the detection classification model, the rectangular frame enclosing the rectangular area bounded by the maximum boundary of the target to be recognized. As shown in fig. 2a, taking bottled beverages as an example, the image 20 to be detected contains a first beverage 200 and a second beverage 300, and rectangular frames 400 precisely locate the rectangular areas bounded by the maximum boundaries of the first beverage 200 and the second beverage 300, respectively. It is understood that the detection algorithm of a conventional detection classification model is generally any one of Faster R-CNN, Cascade R-CNN and Mask R-CNN, which are not described in detail herein.
In step S22, the mask image corresponding to the target to be recognized is obtained from the rectangular frame as follows: a threshold may be set based on empirical values or on characteristics of the preprocessed image, and binarization is performed with this threshold. All pixels inside the rectangular frame are assigned the gray value 255 (i.e. white), i.e. the pixel values inside the rectangular frame are set to 255; the remaining pixels are excluded from the target region and assigned the gray value 0 (i.e. black), i.e. the pixel values in the region outside the rectangular frame are set to 0. The binarization of a preprocessed image can be performed with either of the following two functions of the OpenCV library:
(1) cvThreshold(dst, dst, 230, 255, CV_THRESH_BINARY_INV);
(2) cvAdaptiveThreshold(dst, dst, 255, CV_ADAPTIVE_THRESH_MEAN_C, CV_THRESH_BINARY, 9, -10);
The binary image thus obtained is the mask image, as shown in fig. 2a.
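The functions above use the legacy OpenCV C API; as a hedged sketch under the assumption of a rectangle given as (x, y, w, h), the rectangle-to-mask operation of step S22 can also be expressed with NumPy alone:

```python
import numpy as np

def rect_to_mask(height, width, rect):
    """Build the binary mask of step S22: 255 inside the rectangular
    frame (x, y, w, h), 0 everywhere outside it."""
    x, y, w, h = rect
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[y:y + h, x:x + w] = 255
    return mask

mask = rect_to_mask(10, 10, (2, 3, 4, 5))   # frame at x=2, y=3, 4 wide, 5 tall
print(mask[3, 2], mask[0, 0])       # 255 0
print(int(mask.sum()) // 255)       # 20 pixels inside the frame
```

No thresholding is needed when the rectangle coordinates are already known; the threshold-based functions apply when the mask is derived from image intensities.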
Referring to fig. 3, the step S2 further includes the following steps performed between the step S21 and the step S22:
step S21A, obtaining a confidence corresponding to each rectangular frame;
step S21B, judging whether the rectangular frame is qualified according to the relation between the confidence and a preset threshold;
if yes, go to step S22;
if not, the process returns to step S21 again.
In step S21A, the confidence corresponding to each rectangular frame is also obtained by the detection classification model used in step S21, that is, by any one of the Faster R-CNN, Cascade R-CNN and Mask R-CNN algorithms.
In step S21B, whether the rectangular frame is qualified is judged according to the confidence: a threshold is set, and if the confidence is greater than or equal to the set threshold, the rectangular frame is considered qualified; otherwise it is considered unqualified. It can be understood that a qualified rectangular frame frames the target to be recognized well, for example 100% of the target to be recognized is framed within the rectangular frame, or 80%-95% of it is; otherwise the rectangular frame is considered unqualified. When the rectangular frame is unqualified, the process returns to step S21.
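As a hedged sketch of steps S21A and S21B, the qualification test can be written as a simple filter; the box representation (x, y, w, h, confidence) and the threshold value are assumptions, since the text only requires comparison against a preset threshold:

```python
# Preset threshold; an assumed empirical value, not fixed by the text.
CONF_THRESHOLD = 0.5

def qualified_boxes(boxes, threshold=CONF_THRESHOLD):
    """Keep the rectangular frames judged qualified (confidence >= threshold);
    the rest would trigger a return to detection step S21."""
    return [b for b in boxes if b[4] >= threshold]

detections = [
    (10, 10, 40, 80, 0.92),   # qualified
    (55, 12, 38, 78, 0.31),   # unqualified
    (90, 15, 36, 75, 0.50),   # boundary case: equal to threshold counts as qualified
]
kept = qualified_boxes(detections)
print(len(kept))   # 2
```

Note that "greater than or equal to" makes the boundary case qualified, matching the judgment rule stated above.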
Referring to fig. 4, step S2 further includes a step S21C. If the judgment in step S21B is yes, that is, the rectangular frame is qualified, step S21C is executed: the rectangular frame is scaled according to preset scaling ratios to obtain a plurality of rectangular frames of different sizes corresponding to each target to be recognized. Step S21C lies between step S21B and step S22. The rectangular frame located in step S21 corresponds to one target to be recognized; after each rectangular frame is scaled according to the preset scaling ratios, a plurality of rectangular frames of different sizes corresponding to each target to be recognized are obtained, forming a plurality of images to be detected at the sizes of these different rectangular frames. Optionally, the scaling ratios are set according to empirical values, for example 0.8, 0.85, 0.9, 1.05, 1.1 or 1.2 times the rectangular frame of step S21, or other values. Scaling the rectangular frame according to preset scaling ratios yields a richer set of images representing each target to be recognized, so that a classification network trained on this set achieves a better classification and detection effect.
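A hedged sketch of step S21C follows, using the example ratios quoted above; scaling about the frame's center and the (x, y, w, h) box format are assumptions, since the text does not fix either:

```python
# Example scaling ratios taken from the text's empirical values.
SCALES = (0.8, 0.85, 0.9, 1.05, 1.1, 1.2)

def scale_box(box, ratio):
    """Scale width and height by `ratio`, keeping the center fixed."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * ratio, h * ratio
    return (cx - nw / 2, cy - nh / 2, nw, nh)

box = (10.0, 20.0, 40.0, 60.0)                  # one qualified frame from S21B
variants = [scale_box(box, s) for s in SCALES]  # frames of different sizes
print(len(variants))      # 6
print(variants[0][2])     # 32.0 (width scaled by 0.8)
```

Each qualified frame thus yields several crops of the same target, enriching the data fed into the later merging and training steps.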
Referring to fig. 1, in step S3, the image to be detected of step S1 and the mask image of step S2 are merged into a multi-channel image according to the numbers of channels, widths and heights of the mask image and of the image to be detected. If the number of channels of the image to be detected is n, the number of channels of the merged multi-channel image is n + 1. For example, let the width, height and number of channels of the image to be detected be W1, H1 and n, and those of the mask image be W2, H2 and 1, with W1 = W2 and H1 = H2; the two are merged by stacking along the channel dimension, so that the merged multi-channel image has width W1, height H1 and n + 1 channels. It can be understood that the image to be detected is usually a color image, i.e. an RGB three-channel image with n = 3. It is also understood that, since the rectangular frame is scaled at preset ratios in step S21C to obtain a plurality of rectangular frames of different sizes corresponding to the targets to be recognized, this step yields a plurality of multi-channel images corresponding to each image to be detected.
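The channel merge of step S3 can be sketched as follows (a hedged illustration; NumPy stands in for whatever tensor library the actual system uses):

```python
import numpy as np

def merge(image, mask):
    """Step S3: stack an (H, W, n) image and an (H, W) mask into an
    (H, W, n + 1) multi-channel image; width and height must match."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask and image must share width and height")
    return np.concatenate([image, mask[..., None]], axis=-1)

image = np.zeros((480, 640, 3), dtype=np.uint8)   # RGB image to be detected, n = 3
mask = np.full((480, 640), 255, dtype=np.uint8)   # binary mask from step S22
merged = merge(image, mask)
print(merged.shape)   # (480, 640, 4)
```

The shape check makes the W1 = W2, H1 = H2 requirement explicit: channel stacking is only defined when the spatial sizes agree.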
Referring again to fig. 1, in step S4 the multi-channel image is input into a trained classification network for detection so as to classify the target to be recognized. In this step, the trained classification network is any one of the existing, commonly used networks such as SSD, YOLO, Faster R-CNN and Mask R-CNN, or another classification network.
Referring to fig. 5, a second embodiment of the present invention provides an image classification network optimization method, which includes steps T1-T3 corresponding to steps S1-S3 of the first embodiment, and a step T4: inputting the multi-channel images as a training set into the trained classification network for training to obtain an optimized classification network.
In step T4, the trained classification network is any one of the existing, commonly used networks such as SSD, YOLO, Faster R-CNN and Mask R-CNN, or another classification network.
Referring to fig. 6, a third embodiment of the invention provides an image detection and recognition system 100, which includes: an image acquisition unit 101, a detection unit 102, a merging unit 103, and a classification unit 104.
An image acquisition unit 101 for acquiring an image to be detected having at least one object to be recognized;
the detection unit 102 is configured to detect targets to be identified in the image to be detected to obtain a mask image corresponding to each target to be identified;
a merging unit 103, configured to merge the image to be detected and the mask image to obtain a multi-channel image;
and the classification unit 104 is configured to input the multi-channel image into a trained classification network for detection so as to classify the target to be recognized.
Referring to fig. 7, the detecting unit 102 includes: a positioning frame generation unit 1021, and a mask image generation unit 1022.
The positioning frame generation unit 1021 is used for generating a rectangular frame corresponding to each target to be identified;
a mask image generating unit 1022, configured to obtain a mask image corresponding to each target to be identified according to the rectangular frame.
Referring to fig. 8, an image classification network optimization system 200 according to a fourth embodiment of the present invention includes the image acquisition unit 101, the detection unit 102 and the merging unit 103 of the third embodiment, together with a training unit 205. The training unit 205 is configured to input the multi-channel images into the trained classification network for training to obtain an optimized classification network.
Referring to fig. 9, a fifth embodiment of the present invention provides an electronic device 700, including a memory 701 and a processor 702, where the memory 701 stores a computer program, and the computer program is configured to execute the image detection and recognition method according to the first embodiment when running;
the processor 702 is arranged to execute the image detection recognition method according to the first embodiment by means of the computer program.
Referring now to fig. 10, a block diagram of a computer system 800 suitable for implementing a terminal device/server of an embodiment of the present application is shown. The terminal device/server shown in fig. 10 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in fig. 10, the computer system 800 includes a central processing unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the system 800. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 810 as necessary, so that a computer program read therefrom can be installed into the storage section 808 as needed.
According to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 801. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the management-side computer, partly on the management-side computer, as a stand-alone software package, partly on the management-side computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the management-side computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Compared with the prior art, when detection starts, the mask image is predicted by the detection classification model, and the mask image distinguishes the target to be recognized from the background, which improves the efficiency with which the detection classification model recognizes the target. The mask image and the image to be detected are then merged into a multi-channel image, which carries more comprehensive information about the target to be recognized than the image to be detected alone. Inputting this multi-channel image into a trained classification network for detection so as to classify the target to be recognized can substantially improve detection accuracy and yield a more detailed classification result.
Whether the rectangular frame is qualified is judged according to the relation between the confidence and a preset threshold, so that control is exercised at the initial stage of obtaining the multi-channel image, further improving the accuracy of detecting and classifying the images to be detected.
When the positioning frame is qualified, the rectangular frame is scaled according to preset scaling ratios to obtain a plurality of rectangular frames of different sizes corresponding to each target to be recognized, so that the obtained multi-channel images improve classification accuracy; at the same time, the data set for training the classification network is enriched, further improving the performance of the trained classification network.
The trained classification network is trained and optimized by using the multi-channel images as a training set to obtain an optimized classification network, which improves the model performance of the classification network and yields more accurate detection and classification results in the subsequent image detection and identification process.
The invention provides an image detection and identification system and an electronic device, which have the same beneficial effects as the image detection and identification method.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit of the present invention are intended to be included within the scope of the present invention.