WO2020151329A1 - Target detection based identification box determining method and device and terminal equipment - Google Patents

Target detection based identification box determining method and device and terminal equipment

Info

Publication number
WO2020151329A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
frame
recognition frame
detected
recognition
Prior art date
Application number
PCT/CN2019/118131
Other languages
French (fr)
Chinese (zh)
Inventor
徐锐杰
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020151329A1 publication Critical patent/WO2020151329A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present invention belongs to the field of data processing technology, and in particular relates to a method, a device, a terminal device, and a computer-readable storage medium for determining a recognition frame based on target detection.
  • research on using computer technology to detect and track targets has become increasingly active.
  • the target can be a human face, a vehicle or a building, etc. How to accurately locate the target in an image is a pressing problem in target detection.
  • images are usually detected by deep convolutional neural network algorithms. Based on the characteristics of the algorithm, multiple recognition frames are usually obtained after detection. Therefore, it is necessary to determine the optimal recognition frame from multiple recognition frames.
  • when determining the optimal recognition frame, usually only the recognition frames themselves are operated on: an intersection-over-union (IoU) algorithm is applied to the multiple recognition frames, and the recognition frame with the largest IoU result is regarded as the optimal recognition frame. Because the actual image is not taken into account during the operation, the result is easily unreliable.
  • since the result of the IoU operation in the prior art is not reliable, the determined optimal recognition frame cannot fit the target well, and the accuracy of target detection is low.
  • the embodiments of the present application provide a method, a device, a terminal device, and a computer non-volatile readable storage medium for determining a recognition frame based on target detection, to solve the problem in the prior art that the low accuracy of the determined recognition frame leads to low accuracy of target detection.
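  • For concreteness, the following is a minimal sketch of the prior-art intersection-over-union (IoU) computation discussed above; the (x1, y1, x2, y2) box format and the function name are illustrative assumptions, not taken from the patent. As the passage notes, this computation uses only box geometry and never consults the underlying image.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```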
  • the first aspect of the embodiments of the present application provides a method for determining a recognition frame based on target detection, including:
  • the image to be detected and all the intercepted images are sequentially input into the pre-trained recognition frame optimization model, the sub-recognition frame output by the recognition frame optimization model for each input image is obtained, each sub-recognition frame is corrected according to the preceding sub-recognition frame, and the last sub-recognition frame after correction is determined as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and the corresponding manually labeled frames, and the target recognition frame is used to indicate the area where the target is located in the image to be detected.
  • a second aspect of the embodiments of the present application provides an apparatus for determining a recognition frame based on target detection, including:
  • An analysis unit configured to obtain an image to be detected containing a target, and perform identification frame analysis on the image to be detected to obtain at least one identification frame to be detected;
  • An interception unit configured to intercept the image to be detected according to each identification frame to be detected to obtain an intercepted image;
  • An input unit configured to sequentially input the image to be detected and all the intercepted images into a pre-trained recognition frame optimization model, obtain the sub-recognition frame output by the recognition frame optimization model for each input image, correct each sub-recognition frame according to the preceding sub-recognition frame, and determine the last sub-recognition frame after correction as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and the corresponding manually labeled frames, and the target recognition frame is used to indicate the area where the target is located in the image to be detected.
  • a third aspect of the embodiments of the present application provides a terminal device.
  • the terminal device includes a memory, a processor, and computer-readable instructions that are stored in the memory and run on the processor.
  • when the processor executes the computer-readable instructions, the following steps are implemented:
  • the image to be detected and all the intercepted images are sequentially input into the pre-trained recognition frame optimization model, the sub-recognition frame output by the recognition frame optimization model for each input image is obtained, each sub-recognition frame is corrected according to the preceding sub-recognition frame, and the last sub-recognition frame after correction is determined as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and the corresponding manually labeled frames, and the target recognition frame is used to indicate the area where the target is located in the image to be detected.
  • the fourth aspect of the embodiments of the present application provides a computer non-volatile readable storage medium, which stores computer-readable instructions, and the computer-readable instructions, when executed by a processor, implement the following steps:
  • the image to be detected and all the intercepted images are sequentially input into the pre-trained recognition frame optimization model, the sub-recognition frame output by the recognition frame optimization model for each input image is obtained, each sub-recognition frame is corrected according to the preceding sub-recognition frame, and the last sub-recognition frame after correction is determined as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and the corresponding manually labeled frames, and the target recognition frame is used to indicate the area where the target is located in the image to be detected.
  • At least one recognition frame to be detected is obtained by analyzing the image to be detected, the image to be detected is intercepted according to each recognition frame to be detected to obtain an intercepted image, and then the image to be detected and all the intercepted images are input into the pre-trained recognition frame optimization model to obtain the target recognition frame.
  • This application analyzes the image to be detected and all intercepted images through the recognition frame optimization model and generates the target recognition frame in combination with the image characteristics, so that the generated target recognition frame fits the target in the image to be detected better, which improves the accuracy of the determined target recognition frame and the accuracy of target detection.
  • FIG. 1 is a flowchart of an implementation of a method for determining a recognition frame based on target detection according to Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of an implementation of a method for determining a recognition frame based on target detection according to Embodiment 2 of the present application;
  • FIG. 3 is a flowchart of an implementation of a method for determining a recognition frame based on target detection according to Embodiment 3 of the present application;
  • FIG. 4 is a flowchart of an implementation of a method for determining a recognition frame based on target detection according to Embodiment 4 of the present application;
  • FIG. 5 is a flowchart of an implementation of a method for determining a recognition frame based on target detection according to Embodiment 5 of the present application;
  • FIG. 6 is a network structure diagram of the inception framework provided by Embodiment 6 of the present application.
  • FIG. 7 is a structural diagram of the first inception structure provided by Embodiment 7 of the present application.
  • FIG. 8 is a structural diagram of a second inception structure provided by Embodiment 8 of the present application.
  • FIG. 9 is a structural diagram of a third inception structure provided by Embodiment 9 of the present application.
  • FIG. 10 is a structural block diagram of an apparatus for determining a recognition frame based on target detection according to Embodiment 10 of the present application;
  • FIG. 11 is a schematic diagram of a terminal device provided in Embodiment 11 of the present application.
  • Fig. 1 shows an implementation process of a method for determining a recognition frame based on target detection provided by an embodiment of the present application, which is detailed as follows:
  • an image to be detected containing a target is acquired, and an identification frame analysis is performed on the image to be detected to obtain at least one identification frame to be detected.
  • Target detection is one of the core technologies in the field of computer vision. The purpose is to detect all targets in the image and determine the location of each target. In view of the situation that the recognition frame determined in the target detection process cannot fit the target well, in the embodiment of the present application, the image to be detected containing the target is first obtained, and recognition frame analysis is performed on the image to be detected to obtain at least one recognition frame to be detected.
  • Recognition frame analysis can be implemented based on an open-source target detection model, such as a Region-based Convolutional Neural Network (R-CNN) model or a Single Shot MultiBox Detector (SSD) model. When the image to be detected is analyzed by such a target detection model, the image to be detected is first divided into at least two recognition frames by a sliding-window algorithm, selective search, or a similar method, and the target detection model then operates on each of the separated recognition frames to obtain the confidence level of each recognition frame.
  • the confidence level indicates the probability that the target is located in the recognition frame: the higher the confidence, the higher the probability that the target is located in the recognition frame.
  • the calculated confidence value depends on the architecture and weights of the target detection model in the actual application scenario, which is not repeated in the embodiments of this application. After the confidence of each recognition frame is obtained, the recognition frames whose confidence is higher than the confidence threshold are determined as the recognition frames to be detected. On the one hand, this reduces the amount of subsequent calculation; on the other hand, the recognition frames to be detected are more strongly correlated with the target, so the accuracy of the subsequently determined target recognition frame can be improved.
  • the confidence threshold can be determined according to the accuracy requirements for determining the target recognition frame: the higher the accuracy requirement, the greater the confidence threshold. For example, it can be set to 60%. It is worth mentioning that the specific type of target to be detected is not limited in the embodiments of this application.
  • the target can be a human face, a vehicle, a tree, etc.; once the target type is selected, all targets in the embodiments of this application refer to that selected type. The recognition frame to be detected and the other recognition frames in the embodiments of the present application are preferably rectangular frames.
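  • As a hedged illustration of the confidence-threshold filtering described above (the list-of-pairs data structure is an assumption; the 60% value is the example given in the text):

```python
CONFIDENCE_THRESHOLD = 0.60  # the 60% example mentioned above

def select_frames_to_detect(detections):
    """Keep recognition frames whose confidence exceeds the threshold.

    `detections` is assumed to be a list of (box, confidence) pairs produced
    by the detection model (e.g. an R-CNN or SSD model).
    """
    return [box for box, confidence in detections
            if confidence > CONFIDENCE_THRESHOLD]
```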
  • the image to be detected is normalized to a preset size, and the normalized image to be detected is zero-averaged.
  • the size of the image to be detected is normalized to a preset size in advance.
  • the preset size can be set freely, for example 299 (pixels long) × 299 (pixels wide).
  • the normalized image to be detected is zero-averaged, which helps improve the effect of the recognition frame analysis.
  • Specifically, the average of the original values of all pixels in the image to be detected is calculated, and the value of each pixel in the image is then updated to the difference between the pixel's original value and that average (that is, the result of subtracting the average from the original value). Recognition frame analysis can then be performed on the image to be detected.
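  • A minimal sketch of the normalization and zero-averaging steps, assuming OpenCV for resizing and a NumPy image; the helper name is an assumption:

```python
import cv2
import numpy as np

PRESET_SIZE = (299, 299)  # the preset size mentioned above (width, height)

def preprocess(image):
    """Normalize the image to the preset size, then zero-average its pixels."""
    resized = cv2.resize(image, PRESET_SIZE).astype(np.float32)
    # subtract the average of all original pixel values from every pixel
    return resized - resized.mean()
```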
  • the image to be detected is intercepted according to each identification frame to be detected to obtain an intercepted image.
  • Each obtained recognition frame to be detected corresponds to a coordinate set, where each coordinate in the set is the coordinate of one corner of the recognition frame in the image to be detected. Therefore, after the recognition frames to be detected are obtained, the image to be detected is intercepted according to each recognition frame to obtain the intercepted images, which facilitates the subsequent determination of the best recognition frame.
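  • A hedged sketch of the interception step, assuming a NumPy image indexed as image[row, column] and a rectangular coordinate set of (x, y) corner points:

```python
def intercept(image, corner_coords):
    """Crop the region enclosed by a rectangular recognition frame."""
    xs = [x for x, _ in corner_coords]
    ys = [y for _, y in corner_coords]
    # the frame is rectangular, so min/max of the corners bound the region
    return image[min(ys):max(ys), min(xs):max(xs)]
```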
  • the image to be detected and all the intercepted images are sequentially input into a pre-trained recognition frame optimization model, the sub-recognition frame output by the recognition frame optimization model for each input image is obtained, each sub-recognition frame is corrected according to the preceding sub-recognition frame, and the last sub-recognition frame after correction is determined as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and the corresponding manually labeled frames, and the target recognition frame is used to indicate the area where the target is located in the image to be detected.
  • the target recognition frame is corrected and determined in combination with the image characteristics of the image to be detected.
  • the image to be detected and all the intercepted images are sequentially input into the pre-trained recognition frame optimization model.
  • the sub-recognition frame corresponding to each image output by the recognition frame optimization model is obtained.
  • the accuracy of the sub-recognition frame may be low and it cannot fit the target well.
  • the sub-recognition frame of the next input image is corrected according to the sub-recognition frame of the previous input image (ordered from front to back by image input time) until the image to be detected and all intercepted images have been input to the recognition frame optimization model, and the last sub-recognition frame after correction is determined as the target recognition frame, which indicates the area where the target is located.
  • the recognition frame optimization model is trained through preset sample images and corresponding manually labeled frames, so that sub-recognition frames can be obtained according to the specific characteristics of the image. The specific training process will be explained later.
  • the embodiment of the present application does not limit the modification method of the sub-recognition frame.
  • the coordinate set of the previous sub-recognition frame and the coordinate set of the current sub-recognition frame can be averaged (calculated separately for each corner), and the calculated coordinate set is updated as the coordinate set of the current sub-recognition frame, thereby realizing the correction of the current sub-recognition frame.
  • the coordinate set of each sub-recognition frame is obtained by placing the sub-recognition frame in the image to be detected, that is, the coordinate set of the sub-recognition frame is relative to the image to be detected.
  • For example, suppose the image set to be detected consists of the image to be detected PictureA and the intercepted images PictureB and PictureC. The coordinate set obtained after the neural network operation on PictureA is [upper left point (100, 100), lower left point (100, 50), lower right point (200, 50), upper right point (200, 100)]; the coordinate set of the sub-recognition frame obtained after the neural network operation on PictureB is [upper left point (90, 90), ...]; and the coordinate set of the sub-recognition frame obtained after the neural network operation on PictureC is [upper left point (85, 90), lower left point (85, 50), lower right point (180, 50), upper right point (180, 90)]. The sub-recognition frame of PictureB is first corrected based on the sub-recognition frame of PictureA, and the sub-recognition frame of PictureC is then corrected based on the corrected result.
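  • A minimal sketch of the corner-averaging correction applied sequentially, using the example coordinates above; PictureB's last three corner points are hypothetical placeholders, since the text only gives its upper left point:

```python
def correct(previous_frame, current_frame):
    """Average the previous and current coordinate sets, corner by corner."""
    return [((px + cx) / 2, (py + cy) / 2)
            for (px, py), (cx, cy) in zip(previous_frame, current_frame)]

# Sub-recognition frames in input order: PictureA, PictureB, PictureC.
frames = [
    [(100, 100), (100, 50), (200, 50), (200, 100)],  # PictureA (from the example)
    [(90, 90), (90, 50), (190, 50), (190, 90)],      # PictureB (last three corners assumed)
    [(85, 90), (85, 50), (180, 50), (180, 90)],      # PictureC (from the example)
]

corrected = frames[0]
for frame in frames[1:]:
    corrected = correct(corrected, frame)  # the previous frame corrects the next
print(corrected)  # the last corrected frame is the target recognition frame
```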
  • At least one recognition frame to be detected is obtained by performing recognition frame analysis on the image to be detected, the image to be detected is intercepted according to each recognition frame to obtain the intercepted images, and the image to be detected and all intercepted images are then sequentially input into the preset recognition frame optimization model to obtain the target recognition frame. The embodiment of the present application determines the target recognition frame based on the image characteristics of the image to be detected, so that the target recognition frame better fits the target in the image to be detected, which improves the accuracy of the determined target recognition frame.
  • FIG. 2 shows a method based on the first embodiment of the present application, obtained by expanding the process performed before the image to be detected and all the intercepted images are sequentially input into the pre-trained recognition frame optimization model.
  • the embodiment of the present application provides an implementation flowchart of a method for determining a recognition frame based on target detection. As shown in FIG. 2, the method for determining a recognition frame may include the following steps:
  • S201: acquire at least two sample images and the corresponding manual annotation frames, wherein each sample image is an image containing the target, and each manual annotation frame is the manually annotated area in the sample image where the target is located.
  • In the training process of the recognition frame optimization model, at least two sample images and the manual annotation frame corresponding to each sample image are first obtained.
  • the sample image contains the target to be detected in the embodiment of this application.
  • the manual labeling box is the area where the target is located in the manually labelled sample image.
  • the sample images can be freely selected by the user or can be directly retrieved from the open source image library.
  • the number of sample images should be large; for example, 1000 sample images can be obtained in the embodiment of this application.
  • the identification frame analysis is performed on each of the sample images to obtain at least one sample identification frame, and the sample image is intercepted according to each sample identification frame to obtain a sample intercepted image.
  • each sample image corresponds to at least one sample identification frame.
  • the corresponding sample image is intercepted according to each sample identification frame obtained to obtain the sample intercepted image.
  • a sample image set is constructed based on each sample image and all the corresponding sample intercepted images, the at least two constructed sample image sets are sequentially input into a preset basic model, and the weight of the basic model is adjusted according to the manual annotation frame corresponding to the input sample image set until the identification frame output by the adjusted basic model matches the manual annotation frame, and the adjusted basic model is determined as the recognition frame optimization model; wherein each constructed sample image set corresponds to one sample image.
  • a sample image set is constructed based on each sample image and all sample intercepted images corresponding to the sample image.
  • the number of sample image sets is equal to the number of sample images, that is, each sample image set corresponds to a sample image.
  • the sample image set merely groups the sample image and all corresponding sample intercepted images into one set, and does not refer to any specific storage form.
  • the constructed at least two sample image sets are successively used as input parameters and input to the preset basic model.
  • the basic model can be implemented based on the inception framework, and the network structure of the inception framework is shown in FIG. 6.
  • "conv" refers to the convolution kernel, a filter matrix used to perform convolution operations on different windows of the image;
  • "patch size" refers to the receptive field (equivalently, the size) of the convolution kernel; if the receptive field of a convolution kernel is 3 × 3, the convolution kernel is 3 elements long and 3 elements wide;
  • "stride" refers to the step length: during the convolution operation, the convolution kernel slides over the images on the three channels (red, green, and blue), and the weighted sum of the original pixel values on the three channels with the convolution kernel is used as the convolution result; the step length is the number of pixels the convolution kernel moves per slide;
  • "input size" refers to the size of the image used as the input parameter of the layer, with the last number denoting the depth of the image; for example, an "input size" of 299 × 299 × 3 means the input image is 299 pixels long and 299 pixels wide with a depth of 3;
  • "conv padded" refers to a convolution kernel with boundary padding;
  • "pool" refers to the pooling layer, which reduces the data volume of the layer's input parameters, prevents overfitting, and keeps the depth of the image unchanged;
  • "linear" refers to the linear layer, whose input parameters are the calculated unnormalized probabilities of each coordinate set;
  • "softmax" refers to the classification output layer, which applies the softmax function to normalize the probabilities and complete the classification (in actual training, the coordinate set with the highest probability output by the softmax layer can be determined as the coordinate set output by the basic model for this input).
  • one of the structures shown in FIG. 6 is the first inception structure.
  • the first inception structure does not limit the receptive field of the convolution kernel, and in the entire basic model the number of first inception structures is limited to three.
  • Fig. 7 is a structural diagram of the first inception structure, where "Base” is the input layer of the first inception structure, and "Filter Concat” is the output layer of the first inception structure.
  • Figure 6 in FIG. 6 refers to the second inception structure, and the specific structure diagram of the second inception structure is shown in FIG. 8.
  • the second inception structure splits the 5 × 5 convolution kernel of the first inception structure into two 3 × 3 convolution kernels to reduce the amount of calculation and improve training efficiency.
  • the number of second inception structures is limited to 5.
  • Figure 7 in FIG. 6 refers to the third inception structure, and the specific structure diagram of the third inception structure is shown in FIG. 9.
  • the third inception structure splits the n × n convolution kernel into a 1 × n convolution kernel followed by an n × 1 convolution kernel, further reducing the amount of calculation, where n is an integer greater than 1.
  • the number of third inception structures is limited to two.
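  • To illustrate the kernel splitting just described, here is a hedged PyTorch sketch; the channel count and the value of n are arbitrary assumptions, and the patent's actual inception blocks contain additional branches, as shown in FIGS. 7 to 9:

```python
import torch.nn as nn

channels = 64  # arbitrary channel count for illustration

# Second inception structure idea: a 5 x 5 kernel replaced by two stacked 3 x 3
# kernels, which cover the same receptive field with fewer multiplications.
split_5x5 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)

# Third inception structure idea: an n x n kernel replaced by a 1 x n kernel
# followed by an n x 1 kernel, reducing the calculation further.
n = 7  # n is any integer greater than 1
split_nxn = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(1, n), padding=(0, n // 2)),
    nn.Conv2d(channels, channels, kernel_size=(n, 1), padding=(n // 2, 0)),
)
```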
  • the weights in the basic model can be initialized (values can be set randomly within a preset range), and the weights are the specific values of each level (including each convolution kernel) in the basic model.
  • the basic model will perform calculations and output a recognition frame, which is the area where the predicted target is located in the sample image.
  • the weight of the basic model is adjusted according to the manual annotation frame corresponding to the sample image set until the recognition frame output by the adjusted basic model matches the manual annotation frame; the specific weight adjustment method is described in detail later.
  • After the basic model training is completed, that is, after the weight adjustment is completed, the basic model is used as the recognition frame optimization model, and the image to be detected and all intercepted images are input into the recognition frame optimization model as input parameters. Because the recognition frame optimization model already has a good recognition effect on the target to be detected, the recognition frame output by the recognition frame optimization model is directly determined as the target recognition frame.
  • a preset basic model is trained based on manually annotated sample images, and the trained basic model is used as the recognition frame optimization model. Training the model under manual supervision improves the fit between the recognition frame optimization model and the recognition frame analysis method, so that the recognition frame optimization model has a better recognition effect on the image to be detected and the intercepted images, which further improves the accuracy of the determined target recognition frame.
  • FIG. 3 shows a method based on the second embodiment of the present application, obtained by refining the process of sequentially inputting the constructed at least two sample image sets into the preset basic model and adjusting the weight of the basic model according to the manual annotation frame corresponding to the input sample image set until the recognition frame output by the adjusted basic model matches the manual annotation frame.
  • the embodiment of the present application provides an implementation flowchart of a method for determining a recognition frame based on target detection. As shown in FIG. 3, the method for determining a recognition frame may include the following steps:
  • the recognition frame output by the basic model for each input sample image set is determined as a basic recognition frame, and the difference parameter between the basic recognition frame and the corresponding manual annotation frame is calculated.
  • specifically, the recognition frame output by the basic model for the sample image set is obtained and determined as the basic recognition frame, and the difference parameter between the basic recognition frame and the manual annotation frame corresponding to the sample image set is then calculated; the embodiment of this application does not limit the calculation method of the difference parameter.
  • the difference between the coordinates of the basic identification frame and the manually marked frame at the four corner points can be calculated separately, and the average value of the four differences can be used as the difference parameter.
  • Alternatively, the difference parameter can be calculated by a preset loss function, such as a quadratic cost function.
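  • A hedged sketch of the corner-difference calculation described above; Euclidean distance between corresponding corners is an assumption, since the text leaves the exact metric open:

```python
import math

def difference_parameter(basic_frame, labeled_frame):
    """Average the distances between corresponding corners of the two frames."""
    distances = [math.hypot(bx - lx, by - ly)
                 for (bx, by), (lx, ly) in zip(basic_frame, labeled_frame)]
    return sum(distances) / len(distances)
```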
  • the expected parameter is preset in the embodiments of this application. If the obtained difference parameter is greater than or equal to the expected parameter, the weight of the basic model is adjusted based on the difference parameter.
  • the weight adjustment operation can be implemented based on open-source neural network algorithms such as the gradient descent algorithm or the back-propagation algorithm.
  • for example, when the difference parameter is greater than or equal to the expected parameter, the partial derivatives of the difference parameter with respect to the weights of the convolution kernels can be calculated to form the gradient vector of the difference parameter with respect to the weights, and the values of the convolution kernel weights are then adjusted along the gradient vector so that the difference parameter becomes as small as possible.
  • After the weight adjustment based on one difference parameter is completed, the next sample image set is input into the basic model, the basic recognition frame output by the basic model is obtained, and the value of the difference parameter is updated. If the updated difference parameter is greater than or equal to the expected parameter, the weight is adjusted again, and the above operation is repeated until the difference parameter is less than the expected parameter.
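  • A hedged sketch of this weight-adjustment loop using PyTorch autograd; the optimizer choice, learning rate, and the assumption that the model maps a sample image set tensor directly to a frame tensor are all illustrative, not from the patent:

```python
import torch

def adjust_weights(basic_model, sample_sets, labeled_frames, expected_parameter, lr=1e-4):
    """Adjust weights per sample image set until the difference parameter is small enough."""
    optimizer = torch.optim.SGD(basic_model.parameters(), lr=lr)
    for sample_set, labeled in zip(sample_sets, labeled_frames):
        predicted = basic_model(sample_set)    # basic recognition frame (as a tensor)
        diff = torch.dist(predicted, labeled)  # difference parameter
        if diff.item() < expected_parameter:
            break                              # output matches the manual labeling frame
        optimizer.zero_grad()
        diff.backward()                        # gradient of the difference w.r.t. weights
        optimizer.step()                       # move weights along the gradient vector
    return basic_model
```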
  • the difference parameter is less than the expected parameter, it can be determined that the basic recognition frame output by the basic model matches the manual labeling frame.
  • the basic model can continue to be trained until all the sample image sets are input.
  • the difference parameter is calculated, and when the difference parameter is greater than or equal to the preset expected parameter, the weight of the basic model is adjusted based on the difference parameter, and the above operation is repeated until the difference parameter is less than the expected parameter, which improves the analysis accuracy of the recognition frame optimization model.
  • FIG. 4 shows a method based on the first embodiment of the present application, obtained by refining the process of sequentially inputting the image to be detected and all intercepted images into the pre-trained recognition frame optimization model, obtaining the sub-recognition frame output by the recognition frame optimization model for each input image, correcting each sub-recognition frame according to the preceding sub-recognition frame, and determining the last corrected sub-recognition frame as the target recognition frame.
  • the embodiment of the present application provides an implementation flowchart of a method for determining a recognition frame based on target detection. As shown in FIG. 4, the method for determining a recognition frame may include the following steps:
  • the image to be detected is cut according to all the to-be-detected recognition frames corresponding to each target to obtain cut images, and the size of each cut image is scaled, wherein each cut image corresponds to one of the targets.
  • the image to be detected may contain at least two targets.
  • the target is a vehicle and the image to be detected is a captured image of an intersection, the image to be detected may contain at least two vehicles.
  • the target recognition frame is determined separately for each target. Specifically, since each identified target corresponds to at least one recognition frame to be detected, the maximum coverage area of all the to-be-detected recognition frames corresponding to each target is calculated, and the image to be detected is cut according to the maximum coverage area to obtain a cut image, with each cut image corresponding to one identified target.
  • when calculating the maximum coverage area, the union area of all the to-be-detected recognition frames corresponding to the target is first obtained; the horizontal coordinate range covered by the union area is taken as the horizontal coordinate range of the maximum coverage area, and the vertical coordinate range covered by the union area is taken as the vertical coordinate range of the maximum coverage area, thereby constructing the maximum coverage area.
  • the maximum coverage area is rectangular, so constructing it from the union area in effect completes the possibly irregularly shaped union area into a rectangle.
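  • A minimal sketch of the maximum coverage area construction described above, assuming each frame is a list of (x, y) corner points:

```python
def maximum_coverage_area(frames):
    """Smallest axis-aligned rectangle covering the union of the given frames.

    Returns (x_min, y_min, x_max, y_max): the possibly irregular union area
    completed to a rectangle, as described above.
    """
    xs = [x for frame in frames for x, _ in frame]
    ys = [y for frame in frames for _, y in frame]
    return min(xs), min(ys), max(xs), max(ys)
```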
  • the size of the cut image may not meet the standard size of the input parameters of the recognition frame optimization model (for example, the standard size is the preset size, specifically 299 × 299, while the size of the cut image is 199 × 199); therefore, the size of each cut image obtained is scaled until the cut image reaches the standard size. It is worth mentioning that, unlike the cut images, if the above-mentioned intercepted images and sample intercepted images do not meet the standard size of the recognition frame optimization model, the areas of the intercepted images and sample intercepted images that exceed the size can be blanked or gray-filled to prevent size confusion when the recognition frame is subsequently determined.
  • the scaled cut image and all the corresponding intercepted images are sequentially input into the recognition frame optimization model, the size of the recognition frame output by the recognition frame optimization model is restored, and the size-restored recognition frame is determined as the target recognition frame corresponding to the cut image.
  • after the cut image is scaled, the scaled cut image and all the corresponding intercepted images (here the intercepted images are obtained by intercepting the cut image according to the to-be-detected recognition frames corresponding to the cut image) are input into the weight-adjusted recognition frame optimization model, and the size of the recognition frame output by the recognition frame optimization model (here, the last corrected recognition frame output by the model) is restored.
  • the scale and steps of the size restoration are the reverse of the scaling performed on the cut image in S401; for example, if the size of the cut image was reduced to one-half, in this step the recognition frame is expanded about its center to twice its original size.
  • the recognition frame after the size restoration is determined as the target recognition frame corresponding to the cut image, and the target recognition frame is re-placed in the initial image to be detected, so as to facilitate subsequent target detection operations.
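  • A hedged sketch of the scale-then-restore step; the 299 standard size is taken from the text, while square images and the helper names are assumptions:

```python
STANDARD_SIZE = 299  # standard input size of the recognition frame optimization model

def scale_factor(cut_size):
    """Factor by which a square cut image is scaled to reach the standard size."""
    return STANDARD_SIZE / cut_size

def restore_frame(frame, factor):
    """Reverse the scaling by expanding or shrinking the frame about its center.

    If the cut image was reduced to one-half (factor 0.5), each corner offset is
    divided by 0.5, i.e. the frame is expanded to twice its size, as in the text.
    """
    xs = [x for x, _ in frame]
    ys = [y for _, y in frame]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)  # center of the recognition frame
    return [(cx + (x - cx) / factor, cy + (y - cy) / factor) for x, y in frame]
```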
  • the number of target recognition frames finally obtained in the embodiment of the present application is equal to the number of recognized targets, that is, one target recognition frame corresponds to one target.
  • the image to be detected is cut according to each identified target to obtain cut images, each cut image is scaled in size, the scaled cut image and all the corresponding intercepted images are sequentially input into the recognition frame optimization model, and the size of the recognition frame output by the recognition frame optimization model is restored to obtain the target recognition frame.
  • the embodiment of the present application determines the target recognition frame separately for each recognized target, which improves the pertinence and accuracy of target recognition frame determination and prevents poor calculation results when an image to be detected containing at least two targets is input into the recognition frame optimization model.
  • the embodiment of the present application provides an implementation flowchart of a method for determining a recognition frame based on target detection. As shown in FIG. 5, the method for determining a recognition frame may include the following steps:
  • the attribute feature corresponding to the image to be detected is determined as a target feature.
  • At least two attribute features can be set in advance, and the basic model is trained separately for each attribute feature to obtain the recognition frame optimization models.
  • the number of attribute features and the number of final recognition frame optimization models are equal.
  • For example, at least two sample image sets corresponding to the attribute feature "female" are input into another basic model for weight adjustment, and the basic model after the final weight adjustment is used as another recognition frame optimization model.
  • the image to be detected and all the captured images are sequentially input to the recognition frame optimization model corresponding to the target feature.
  • After the target feature is determined, the image to be detected and all the intercepted images are input into the weight-adjusted recognition frame optimization model corresponding to the target feature. Because that recognition frame optimization model has a good analysis effect on images with the target feature, the accuracy of the subsequently determined target recognition frame is improved.
  • the attribute feature corresponding to the image to be detected is determined as the target feature, and the image set to be detected is input into the weight-adjusted recognition frame optimization model corresponding to the target feature. Through targeted training of the recognition frame optimization models and targeted input of the image set to be detected into the corresponding model, the accuracy of target detection is improved.
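  • A minimal sketch of selecting the recognition frame optimization model by attribute feature; the dictionary layout and feature names are illustrative assumptions:

```python
def choose_optimization_model(models, target_feature):
    """Return the optimization model trained for the image's attribute feature.

    `models` is assumed to map each attribute feature (e.g. "male", "female")
    to the model whose weights were adjusted only on sample image sets
    sharing that feature.
    """
    if target_feature not in models:
        raise KeyError(f"no optimization model trained for feature: {target_feature!r}")
    return models[target_feature]
```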
  • FIG. 10 shows a structural block diagram of a device for determining a recognition frame based on target detection provided by an embodiment of the present application.
  • the device for determining a recognition frame includes:
  • the analyzing unit 101 is configured to obtain a to-be-detected image containing a target, and perform an identification frame analysis on the to-be-detected image to obtain at least one to-be-detected identification frame;
  • the interception unit 102 is configured to intercept the image to be detected according to each identification frame to be detected to obtain an intercepted image;
  • the input unit 103 is configured to sequentially input the image to be detected and all the intercepted images into a pre-trained recognition frame optimization model, obtain the sub-recognition frame output by the recognition frame optimization model for each input image, correct each sub-recognition frame according to the preceding sub-recognition frame, and determine the last sub-recognition frame after correction as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and the corresponding manually labeled frames, and the target recognition frame is used to indicate the area where the target is located in the image to be detected.
  • the input unit 103 further includes:
  • the acquiring unit is configured to acquire at least two sample images and the corresponding manual labeling frames, wherein each sample image is an image containing the target, and each manual labeling frame is the manually labeled area in the sample image where the target is located;
  • a sample interception unit configured to perform identification frame analysis on each sample image to obtain at least one sample identification frame, and intercept the sample image according to each sample identification frame to obtain a sample intercepted image
  • the construction unit is configured to construct a sample image set based on each sample image and all the corresponding sample intercepted images, sequentially input the at least two constructed sample image sets into a preset basic model, adjust the weight of the basic model according to the manual labeling frame corresponding to the input sample image set until the identification frame output by the adjusted basic model matches the manual labeling frame, and determine the adjusted basic model as the recognition frame optimization model; wherein each constructed sample image set corresponds to one sample image.
  • the building unit includes:
  • the calculation unit is configured to determine the recognition frame output by the basic model for each input sample image set as a basic recognition frame, and calculate the difference parameter between the basic recognition frame and the corresponding manual annotation frame;
  • the weight adjustment unit is configured to, if the difference parameter is greater than or equal to a preset expected parameter, adjust the weight of the basic model based on the difference parameter, and repeatedly obtain the next basic recognition frame output by the basic model and adjust the weight of the basic model based on the updated difference parameter until the difference parameter is smaller than the expected parameter.
  • the input unit includes:
  • the cutting unit is used to cut the image to be detected according to all the to-be-detected recognition frames corresponding to each target to obtain cut images and perform size scaling on the cut images, wherein each cut image corresponds to one of the targets;
  • the determining unit is configured to sequentially input the scaled cut image and all the corresponding intercepted images into the recognition frame optimization model, restore the size of the recognition frame output by the recognition frame optimization model, and determine the size-restored recognition frame as the target recognition frame corresponding to the cut image.
  • each recognition frame optimization model is obtained by training a sample image set corresponding to the same attribute feature
  • the input unit includes:
  • a feature determining unit configured to determine the attribute feature corresponding to the image to be detected as a target feature
  • the input unit is used to sequentially input the to-be-detected image and all the intercepted images into the recognition frame optimization model corresponding to the target feature.
  • the analysis unit 101 further includes:
  • the normalization unit is used to normalize the to-be-detected image to a preset size, and perform zero-averaging on the normalized to-be-detected image.
  • the device for determining a recognition frame based on target detection inputs the image to be detected and all intercepted images into a pre-trained recognition frame optimization model to obtain the target recognition frame, which improves the accuracy of the determined target recognition frame and the accuracy of target detection.
  • Fig. 11 is a schematic diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device 11 of this embodiment includes: a processor 110, a memory 111, and computer-readable instructions 112 stored in the memory 111 and executable on the processor 110, for example, a program for determining a recognition frame based on target detection.
  • when the processor 110 executes the computer-readable instructions 112, the steps in the above embodiments of the method for determining a recognition frame based on target detection are implemented, such as steps S101 to S103 shown in FIG. 1.
  • alternatively, when the processor 110 executes the computer-readable instructions 112, the functions of the units in the foregoing embodiments of the device for determining a recognition frame based on target detection are implemented, for example, the functions of the units 101 to 103 shown in FIG. 10.
  • the computer-readable instructions 112 may be divided into one or more units, and the one or more units are stored in the memory 111 and executed by the processor 110 to complete this application.
  • the one or more units may be an instruction segment of a series of computer-readable instructions capable of completing specific functions, and the instruction segment is used to describe the execution process of the computer-readable instruction 112 in the terminal device 11.
  • the computer-readable instruction 112 may be divided into an analysis unit, an interception unit, and an input unit, and the specific functions of each unit are as follows:
  • An analysis unit configured to obtain an image to be detected, and analyze the recognition frame of the image to be detected to obtain at least one recognition frame to be detected;
  • An interception unit configured to intercept the image to be detected according to each identification frame to be detected to obtain an intercepted image;
  • An input unit configured to sequentially input the image to be detected and all the intercepted images into a pre-trained recognition frame optimization model, obtain the sub-recognition frame output by the recognition frame optimization model for each input image, correct each sub-recognition frame according to the preceding sub-recognition frame, and determine the last sub-recognition frame after correction as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and the corresponding manually labeled frames, and the target recognition frame is used to indicate the area where the target is located in the image to be detected.
  • the terminal device 11 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device may include, but is not limited to, a processor 110 and a memory 111.
  • FIG. 11 is only an example of the terminal device 11 and does not constitute a limitation on the terminal device 11; it may include more or fewer components than shown in the figure, a combination of certain components, or different components.
  • the terminal device may also include input and output devices, network access devices, buses, etc.
  • the so-called processor 110 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 111 may be an internal storage unit of the terminal device 11, such as a hard disk or a memory of the terminal device 11.
  • the memory 111 may also be an external storage device of the terminal device 11, such as a plug-in hard disk, a smart media card (SMC), or a secure digital (SD) card equipped on the terminal device 11.
  • the memory 111 may also include both an internal storage unit of the terminal device 11 and an external storage device.
  • the memory 111 is used to store the computer-readable instructions and other programs and data required by the terminal device.
  • the memory 111 can also be used to temporarily store data that has been output or will be output.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A target detection based identification box determining method and device, a terminal equipment, and a computer nonvolatile readable storage medium. The method comprises: acquiring an image to be detected comprising a target and performing identification box analysis on the image to be detected, to obtain at least one identification box to be detected; carrying out, according to each of the identification boxes to be detected, image capturing on the image to be detected to obtain captured images; and sequentially inputting the image to be detected and all the captured images into a pretrained identification box optimizing model to obtain a target identification box, wherein the target identification box is used for indicating a region where the target is located. The method generates the target identification box in combination with the image features of the image to be detected, thus improving the accuracy of determining the target identification box and thereby the accuracy of target detection.

Description

基于目标检测的识别框确定方法、装置及终端设备Method, device and terminal equipment for determining recognition frame based on target detection
本申请要求于2019年01月23日提交中国专利局、申请号为201910064290.9、发明名称为“基于目标检测的识别框确定方法、装置及终端设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on January 23, 2019, the application number is 201910064290.9, and the invention title is "Method, Apparatus, and Terminal Equipment for Identifying Frame Based on Target Detection". The reference is incorporated in this application.
技术领域Technical field
本发明属于数据处理技术领域,尤其涉及基于目标检测的识别框确定方法、装置、终端设备以及计算机可读存储介质。The present invention belongs to the field of data processing technology, and in particular relates to a method, a device, a terminal device, and a computer-readable storage medium for determining a recognition frame based on target detection.
背景技术Background technique
随着计算机技术的发展和计算机视觉原理的广泛应用,利用计算机技术对目标进行检测及跟踪的相关研究呈现出愈加热门的发展趋势,根据场景的不同,目标可为人脸、车辆或建筑等,而如何在图像中精准地定位出目标是目标检测中亟待解决的问题。With the development of computer technology and the widespread application of computer vision principles, the use of computer technology to detect and track targets has shown a trend of heating up. Depending on the scene, the target can be a human face, a vehicle or a building, etc. How to accurately locate the target in the image is an urgent problem in target detection.
目前,通常是通过深度卷积神经网络算法对图像进行检测,基于算法特性,在进行检测后通常会得到多个识别框,故还需要从多个识别框中确定出最优的识别框,而在现有技术中,在确定最优识别框时,通常仅是对识别框进行运算,具体对多个识别框应用交并比算法,将其中交并比结果最大的识别框视为最优识别框,由于在运算过程中并未结合实际的图像,容易导致结果不可靠。综上,由于现有技术中进行交并比运算的结果并不可靠,导致确定出的最优识别框无法较好地贴合目标,目标检测的准确性低。At present, images are usually detected by deep convolutional neural network algorithms. Based on the characteristics of the algorithm, multiple recognition frames are usually obtained after detection. Therefore, it is necessary to determine the optimal recognition frame from multiple recognition frames. In the prior art, when determining the optimal recognition frame, usually only the recognition frame is calculated, and the cross-to-combination algorithm is specifically applied to multiple recognition frames, and the recognition frame with the largest cross-to-combination ratio is regarded as the optimal recognition Frame, because the actual image is not combined in the calculation process, it is easy to lead to unreliable results. In summary, since the result of the cross-union operation in the prior art is not reliable, the determined optimal recognition frame cannot fit the target well, and the accuracy of target detection is low.
技术问题technical problem
本申请实施例提供了基于目标检测的识别框确定方法、装置、终端设备以及计算机非易失性可读存储介质,以解决现有技术中确定出的识别框的准确性低,导致目标检测的准确性低的问题。The embodiments of the present application provide a method, a device, a terminal device, and a computer non-volatile readable storage medium for determining a recognition frame based on target detection to solve the problem of the low accuracy of the recognition frame determined in the prior art, which leads to target detection. The problem of low accuracy.
技术解决方案Technical solutions
本申请实施例的第一方面提供了一种基于目标检测的识别框确定方法,包括:The first aspect of the embodiments of the present application provides a method for determining a recognition frame based on target detection, including:
获取包含目标的待检测图像,并对所述待检测图像进行识别框分析,得到至少一个待检测识别框;Acquiring a to-be-detected image containing a target, and performing identification frame analysis on the to-be-detected image to obtain at least one identification frame to be detected;
根据每个所述待检测识别框对所述待检测图像进行截取得到截取图像;Intercepting the image to be detected according to each identification frame to be detected to obtain a intercepted image;
将所述待检测图像和所有所述截取图像依次输入预先训练好的识别框优化模型,获取所述识别框优化模型输出的与每个输入的图像对应的子识别框,并根据前一个所述子识别框对后一个所述子识别框进行修正,将修正完成后的最后一个所述子识别框确定为目标识别框,其中,所述识别框优化模型是通过预设的样本图像及对应的人工标注框训练得到的,所述目标识别框用于指示所述待检测图像中所述目标所在的区域。The image to be detected and all the intercepted images are sequentially input into the pre-trained recognition frame optimization model, and the sub-recognition frame output by the recognition frame optimization model corresponding to each input image is obtained, and the sub-recognition frame corresponding to each input image is obtained according to the previous one. The sub-recognition frame corrects the latter sub-recognition frame, and determines the last sub-recognition frame after the correction is completed as the target recognition frame, wherein the recognition frame optimization model is based on a preset sample image and corresponding The target recognition frame is obtained by manual labeling frame training, and the target recognition frame is used to indicate the area where the target is located in the image to be detected.
A second aspect of the embodiments of the present application provides an apparatus for determining a recognition frame based on target detection, including:
an analysis unit, configured to acquire an image to be detected that contains a target, and to perform recognition-frame analysis on the image to be detected to obtain at least one recognition frame to be detected;
a cropping unit, configured to crop the image to be detected according to each recognition frame to be detected to obtain cropped images;
an input unit, configured to input the image to be detected and all the cropped images in sequence into a pre-trained recognition-frame optimization model, to obtain a sub-recognition frame output by the recognition-frame optimization model for each input image, to correct each subsequent sub-recognition frame according to the preceding sub-recognition frame, and to determine the last sub-recognition frame after correction as the target recognition frame, where the recognition-frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
A third aspect of the embodiments of the present application provides a terminal device. The terminal device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps:
acquiring an image to be detected that contains a target, and performing recognition-frame analysis on the image to be detected to obtain at least one recognition frame to be detected;
cropping the image to be detected according to each recognition frame to be detected to obtain cropped images;
inputting the image to be detected and all the cropped images in sequence into a pre-trained recognition-frame optimization model, obtaining a sub-recognition frame output by the recognition-frame optimization model for each input image, correcting each subsequent sub-recognition frame according to the preceding sub-recognition frame, and determining the last sub-recognition frame after correction as the target recognition frame, where the recognition-frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
A fourth aspect of the embodiments of the present application provides a non-volatile computer-readable storage medium storing computer-readable instructions. When executed by a processor, the computer-readable instructions implement the following steps:
acquiring an image to be detected that contains a target, and performing recognition-frame analysis on the image to be detected to obtain at least one recognition frame to be detected;
cropping the image to be detected according to each recognition frame to be detected to obtain cropped images;
inputting the image to be detected and all the cropped images in sequence into a pre-trained recognition-frame optimization model, obtaining a sub-recognition frame output by the recognition-frame optimization model for each input image, correcting each subsequent sub-recognition frame according to the preceding sub-recognition frame, and determining the last sub-recognition frame after correction as the target recognition frame, where the recognition-frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
Beneficial Effects
In the embodiments of the present application, at least one recognition frame to be detected is obtained by analyzing the image to be detected, cropped images are obtained by cropping the image to be detected according to each recognition frame to be detected, and the image to be detected and all the cropped images are then input into a pre-trained recognition-frame optimization model to obtain the target recognition frame. Because the recognition-frame optimization model analyzes the image to be detected and all the cropped images and generates the target recognition frame from image features, the generated target recognition frame fits the target in the image to be detected more closely, which improves the accuracy of the determined target recognition frame and the accuracy of target detection.
Description of the Drawings
FIG. 1 is a flowchart of the implementation of the method for determining a recognition frame based on target detection according to Embodiment 1 of the present application;
FIG. 2 is a flowchart of the implementation of the method for determining a recognition frame based on target detection according to Embodiment 2 of the present application;
FIG. 3 is a flowchart of the implementation of the method for determining a recognition frame based on target detection according to Embodiment 3 of the present application;
FIG. 4 is a flowchart of the implementation of the method for determining a recognition frame based on target detection according to Embodiment 4 of the present application;
FIG. 5 is a flowchart of the implementation of the method for determining a recognition frame based on target detection according to Embodiment 5 of the present application;
FIG. 6 is a network structure diagram of the inception framework according to Embodiment 6 of the present application;
FIG. 7 is a structural diagram of the first inception structure according to Embodiment 7 of the present application;
FIG. 8 is a structural diagram of the second inception structure according to Embodiment 8 of the present application;
FIG. 9 is a structural diagram of the third inception structure according to Embodiment 9 of the present application;
FIG. 10 is a structural block diagram of the apparatus for determining a recognition frame based on target detection according to Embodiment 10 of the present application;
FIG. 11 is a schematic diagram of the terminal device according to Embodiment 11 of the present application.
Embodiments of the Invention
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
To illustrate the technical solutions described in the present application, specific embodiments are described below.
FIG. 1 shows the implementation flow of the method for determining a recognition frame based on target detection provided by an embodiment of the present application, detailed as follows:
In S101, an image to be detected that contains a target is acquired, and recognition-frame analysis is performed on the image to be detected to obtain at least one recognition frame to be detected.
Target detection is one of the core technologies in the field of computer vision; its purpose is to detect all targets in an image and determine the position of each target. To address the situation in which the recognition frame determined during target detection does not fit the target well, in the embodiments of the present application an image to be detected that contains a target is first acquired, and recognition-frame analysis is performed on it to obtain at least one recognition frame to be detected. The recognition-frame analysis can be implemented with an open-source target detection model, such as a Region-based Convolutional Neural Network (R-CNN) model or a Single Shot MultiBox Detector (SSD) model. When such a model analyzes the image to be detected, the image is first partitioned into at least two recognition frames by a sliding-window algorithm, selective search, or a similar method; the model then computes a confidence for each recognition frame, indicating the probability that the target lies within that frame: the higher the confidence, the higher that probability. The computed confidence values depend on the architecture and weights of the target detection model in the actual application scenario and are not elaborated here. After the confidence of each recognition frame is obtained, the frames whose confidence exceeds a confidence threshold are determined to be the recognition frames to be detected, which on the one hand reduces the amount of subsequent computation and on the other hand, because these frames are strongly associated with the target, improves the accuracy of the subsequently determined target recognition frame. The confidence threshold is set according to the required accuracy of the target recognition frame: the higher the accuracy requirement, the larger the threshold, for example 60%. It is worth mentioning that the embodiments of the present application do not limit the specific kind of target to be detected; the target may be, for example, a human face, a vehicle, or a tree. Once a kind of target is selected, however, all targets in the embodiments of the present application refer to targets of the selected kind, and the recognition frames to be detected and the other recognition frames in the embodiments of the present application are preferably rectangular.
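As a minimal illustrative sketch, assuming each candidate is represented as a (box, confidence) pair with confidence in [0, 1] (a representational assumption, not mandated by the embodiment), the threshold filtering might look like this:

```python
def filter_candidates(candidates, threshold=0.6):
    """Keep only the recognition frames whose confidence exceeds the
    threshold; the survivors are the frames to be detected."""
    return [box for box, confidence in candidates if confidence > threshold]
```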
Optionally, the image to be detected is normalized to a preset size, and zero-mean normalization is applied to the normalized image. Before the recognition-frame analysis, in order to improve the analysis effect, the image to be detected is first resized to a preset size, which can be set freely, for example to 299 pixels (height) × 299 pixels (width). On this basis, the resized image is zero-meaned, which further improves the recognition-frame analysis: the average of the original values of all pixels in the image is computed, and the value of each pixel is then updated to the difference between its original value and this average (that is, the original value minus the average). Once the values of all pixels have been updated, the recognition-frame analysis can be performed. This improves the uniformity of the images to be detected and the subsequent analysis effect.
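A minimal sketch of this preprocessing, assuming the image is a NumPy array and OpenCV is available for resizing (library choices are assumptions, not part of the disclosure):

```python
import cv2
import numpy as np

def preprocess(image, size=(299, 299)):
    """Resize the image to the preset size, then subtract the mean of
    all pixel values so that the result is zero-mean."""
    resized = cv2.resize(image, size).astype(np.float32)
    return resized - resized.mean()
```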
In S102, the image to be detected is cropped according to each recognition frame to be detected to obtain cropped images.
Each recognition frame to be detected corresponds to a coordinate set, and each coordinate in the set gives the position of one corner of the frame within the image to be detected. After the recognition frames to be detected are obtained, the image to be detected is therefore cropped according to each of them to obtain the cropped images, which facilitates the subsequent determination of the best recognition frame.
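A minimal sketch of the cropping step, assuming each frame's corner coordinate set is summarized as a (left, top, right, bottom) tuple in pixel coordinates (a representational assumption):

```python
def crop(image, box):
    """Crop an H x W (x C) image array to an axis-aligned box given as
    (left, top, right, bottom) pixel coordinates."""
    left, top, right, bottom = box
    return image[top:bottom, left:right]
```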
In S103, the image to be detected and all the cropped images are input in sequence into the pre-trained recognition-frame optimization model; the sub-recognition frame output by the model for each input image is obtained; each subsequent sub-recognition frame is corrected according to the preceding one; and the last sub-recognition frame after correction is determined to be the target recognition frame, where the recognition-frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
Compared with the traditional approach of determining the target recognition frame with an intersection-over-union (IoU) algorithm, in the embodiments of the present application the target recognition frame is corrected and determined using the image features of the image to be detected. Specifically, the image to be detected and all the cropped images are input in sequence into the pre-trained recognition-frame optimization model, and for each input image the sub-recognition frame output by the model is obtained. Because a single sub-recognition frame may be inaccurate and fail to fit the target well, the sub-recognition frame of each input image is corrected according to the sub-recognition frame of the previous input image (in the order in which the images were input), until the image to be detected and all the cropped images have been input to the model. The last sub-recognition frame after correction is determined to be the target recognition frame, which is the region where the target is located. It is worth mentioning that the recognition-frame optimization model is trained on preset sample images and corresponding manually annotated frames, so that the sub-recognition frames can be derived from the specific features of the images; the training process is described below.
In addition, the embodiments of the present application do not limit how a sub-recognition frame is corrected. For example, the coordinate set of the previous sub-recognition frame and that of the current sub-recognition frame can be averaged (corner by corner), and the resulting coordinate set taken as the updated coordinate set of the current sub-recognition frame, thereby correcting it. It is worth mentioning that the coordinate set of every sub-recognition frame is obtained by placing the sub-recognition frame in the image to be detected; that is, all coordinate sets are expressed relative to the image to be detected. For example, suppose the set of images contains exactly the image to be detected PictureA and the cropped images PictureB and PictureC. The coordinate set obtained by the neural network computation on PictureA is [top-left (100, 100), bottom-left (100, 50), bottom-right (200, 50), top-right (200, 100)]; on PictureB it is [top-left (90, 90), bottom-left (90, 40), bottom-right (190, 40), top-right (190, 90)]; and on PictureC it is [top-left (85, 90), bottom-left (85, 50), bottom-right (180, 50), top-right (180, 90)]. The sub-recognition frame of PictureB is first corrected based on that of PictureA, giving the corrected coordinate set [top-left (95, 95), bottom-left (95, 45), bottom-right (195, 45), top-right (195, 95)]; the sub-recognition frame of PictureC is then corrected based on the corrected frame of PictureB, giving [top-left (90, 92.5), bottom-left (90, 47.5), bottom-right (187.5, 47.5), top-right (187.5, 92.5)], and this corrected sub-recognition frame of PictureC is the target recognition frame. Of course, the above is only one example of correcting sub-recognition frames and does not limit the embodiments of the present application.
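A minimal sketch of this corner-by-corner averaging, reproducing the PictureA/PictureB/PictureC example above; representing each coordinate set as a 4 × 2 array is an assumption:

```python
import numpy as np

def correct(previous, current):
    """Average two sub-recognition frames corner by corner; both are
    4 x 2 arrays of (x, y) corners relative to the image to be detected."""
    return (np.asarray(previous, dtype=float) + np.asarray(current, dtype=float)) / 2.0

a = [(100, 100), (100, 50), (200, 50), (200, 100)]   # PictureA
b = [(90, 90), (90, 40), (190, 40), (190, 90)]        # PictureB
c = [(85, 90), (85, 50), (180, 50), (180, 90)]        # PictureC
# -> [[90, 92.5], [90, 47.5], [187.5, 47.5], [187.5, 92.5]], the target frame
target = correct(correct(a, b), c)
```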
As the embodiment shown in FIG. 1 illustrates, in the embodiments of the present application at least one recognition frame to be detected is obtained by recognition-frame analysis of the image to be detected, cropped images are obtained by cropping the image to be detected according to each recognition frame to be detected, and the image to be detected and all the cropped images are then input in sequence into the preset recognition-frame optimization model to obtain the target recognition frame. Because the target recognition frame is determined from the image features of the image to be detected, it fits the target in the image well, which improves the accuracy of the determined target recognition frame.
FIG. 2 shows a method obtained, on the basis of Embodiment 1 of the present application, by expanding the process that precedes inputting the image to be detected and all the cropped images into the pre-trained recognition-frame optimization model. An embodiment of the present application provides an implementation flowchart of the method for determining a recognition frame based on target detection; as shown in FIG. 2, the method may include the following steps:
In S201, at least two sample images and the corresponding manually annotated frames are acquired, where each sample image is an image containing the target and each manually annotated frame is the manually annotated region of the sample image in which the target is located.
In the embodiments of the present application, for the training of the recognition-frame optimization model, at least two sample images are first acquired together with the manually annotated frame corresponding to each sample image; each sample image contains the target to be detected in the embodiments of the present application, and the manually annotated frame is the manually annotated region of the sample image in which the target is located. The sample images may be chosen freely by the user or retrieved directly from an open-source image library. To improve the training effect, the number of sample images should be large; for example, 1000 sample images may be acquired in the embodiments of the present application.
In S202, recognition-frame analysis is performed on each sample image to obtain at least one sample recognition frame, and each sample image is cropped according to each of its sample recognition frames to obtain sample cropped images.
Recognition-frame analysis is performed on each acquired sample image in the same way as in step S101. After the analysis, each sample image corresponds to at least one sample recognition frame, and in order to train the model, each sample image is cropped according to each sample recognition frame obtained from it, yielding the sample cropped images.
In S203, a sample image set is constructed from each sample image and all its corresponding sample cropped images; the at least two constructed sample image sets are input in sequence into a preset basic model; the weights of the basic model are adjusted according to the manually annotated frame corresponding to each input sample image set until the recognition frame output by the adjusted basic model matches the manually annotated frame; and the adjusted basic model is determined to be the recognition-frame optimization model, where each constructed sample image set corresponds to one sample image.
A sample image set is constructed from each sample image and all the sample cropped images corresponding to it; after construction, the number of sample image sets equals the number of sample images, that is, each sample image set corresponds to one sample image. A sample image set merely means that a sample image and all its sample cropped images are grouped into a particular collection; it does not refer to any specific storage form. The constructed sample image sets (at least two) are then input in sequence, as input parameters, into the preset basic model. In the embodiments of the present application, the basic model can be implemented on the inception framework, whose network structure is shown in FIG. 6. In FIG. 6, "conv" denotes a convolution kernel, a filter matrix used to convolve different windows of the image; "patch size" denotes the receptive field (effectively the size) of a convolution kernel: if the receptive field of a kernel is 3×3, the kernel is 3 elements high and 3 elements wide; "stride" denotes the step length: during a convolution operation the kernel slides over the images on the three channels (red, green, and blue), the weighted sum of the original pixel values on the three channels with the kernel is taken as the kernel's output, and the stride is the number of pixels the kernel moves in each step; "input size" denotes the size of the image taken as the layer's input, the last number in "input size" being the image depth: for example, an "input size" of 299×299×3 restricts the depth of the layer's input image to 3. In addition, "conv padded" denotes a convolution kernel with boundary padding; "pool" denotes a pooling layer, which reduces the data volume of the layer's input to prevent overfitting while keeping the image depth unchanged; "linear" denotes a linear layer whose inputs are the computed, unnormalized probabilities of the candidate coordinate sets; and "softmax" denotes the classification output layer, which applies the softmax function to normalize the computed probabilities of the coordinate sets and complete the classification (during actual training, the coordinate set with the highest probability output by the "softmax" layer can be determined to be the coordinate set output by the basic model).
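An illustrative sketch of the normalization performed by the "softmax" layer; the example scores and the stability shift are illustrative assumptions, not part of the disclosed model:

```python
import numpy as np

def softmax(scores):
    """Normalize unnormalized scores (one per candidate coordinate set)
    into probabilities, as the classification output layer does."""
    shifted = np.asarray(scores, dtype=float) - np.max(scores)  # numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# The coordinate set with the highest probability is taken as the output:
best = int(np.argmax(softmax([2.0, 0.5, 1.0])))
```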
Also in FIG. 6, "figure 5" denotes the first inception structure, which places no restriction on the receptive fields of its convolution kernels; throughout the basic model, the number of first inception structures is limited to 3. FIG. 7 is a structural diagram of the first inception structure, in which "Base" is its input layer and "Filter Concat" is its output layer.
The "figure 6" label in FIG. 6 denotes the second inception structure, whose specific structure is shown in FIG. 8. Building on the first inception structure, the second inception structure splits each 5×5 convolution kernel into two 3×3 kernels to reduce the amount of computation and improve training efficiency. In the basic model, the number of second inception structures is limited to 5.
The "figure 7" label in FIG. 6 denotes the third inception structure, whose specific structure is shown in FIG. 9. Building on the second inception structure, the third inception structure splits each n×n convolution kernel into one 1×n kernel and one n×1 kernel, further reducing the amount of computation, where n is an integer greater than 1. In the basic model, the number of third inception structures is limited to 2.
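A minimal sketch of both kernel factorizations, assuming a PyTorch implementation with an illustrative channel count of 64 (framework and channel count are assumptions, not part of the disclosure):

```python
import torch.nn as nn

# Second inception structure: a 5x5 kernel replaced by two stacked 3x3 kernels.
five_by_five_equivalent = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)

# Third inception structure: an n x n kernel replaced by a 1 x n kernel
# followed by an n x 1 kernel, here with n = 7.
n = 7
n_by_n_equivalent = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, n), padding=(0, n // 2)),
    nn.Conv2d(64, 64, kernel_size=(n, 1), padding=(n // 2, 0)),
)
```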
When the basic model is constructed in advance, the weights within it can be initialized (with values set randomly within a preset range); the weights are the specific values of the individual layers (including the individual convolution kernels) of the basic model. After a sample image set is input to the basic model as input parameters, the basic model performs its computation and outputs a recognition frame, which is the predicted region of the sample image in which the target is located. To train the basic model and make its computation more accurate, after each sample image set is input, the weights of the basic model are adjusted according to the manually annotated frame corresponding to that sample image set, until the recognition frame output by the adjusted basic model matches the manually annotated frame; the specific weight-adjustment method is described in detail below.
After the training of the basic model, that is, the weight adjustment, is complete, the basic model is taken as the recognition-frame optimization model, and the image to be detected and all the cropped images are input into it as input parameters. Because the recognition-frame optimization model recognizes the target to be detected well, the recognition frame it outputs is directly determined to be the target recognition frame.
As the embodiment shown in FIG. 2 illustrates, in the embodiments of the present application the preset basic model is trained on manually annotated sample images, and the trained basic model serves as the recognition-frame optimization model. Training the model under manual supervision improves the fit between the recognition-frame optimization model and the recognition-frame analysis, so that the model recognizes the images to be detected and the cropped images well, further improving the accuracy of the determined target recognition frame.
FIG. 3 shows a method obtained, on the basis of Embodiment 2 of the present application, by refining the process of inputting the at least two constructed sample image sets in sequence into the preset basic model and adjusting the weights of the basic model according to the manually annotated frame corresponding to each input sample image set, until the recognition frame output by the adjusted basic model matches the manually annotated frame. An embodiment of the present application provides an implementation flowchart of the method for determining a recognition frame based on target detection; as shown in FIG. 3, the method may include the following steps:
In S301, the recognition frame output by the basic model for each input sample image set is determined to be a basic recognition frame, and a difference parameter between the basic recognition frame and the corresponding manually annotated frame is computed.
After each sample image set is input to the basic model, the recognition frame output by the basic model for that sample image set is obtained and determined to be the basic recognition frame; a difference parameter between the basic recognition frame and the manually annotated frame corresponding to that sample image set is then computed. The embodiments of the present application do not limit how the difference parameter is computed: for example, the differences between the coordinates of the basic recognition frame and of the manually annotated frame at the four corners can be computed separately and their average taken as the difference parameter, or those four corner differences can be fed into a preset loss function (such as a quadratic cost function) to obtain the difference parameter. For ease of explanation, the description below uses the loss-function variant.
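A minimal sketch of the loss-function variant, assuming a quadratic (mean-squared-error) cost over 4 × 2 corner coordinate arrays (the cost choice follows the example named above; the array representation is an assumption):

```python
import numpy as np

def difference_parameter(predicted, annotated):
    """Quadratic cost over the four corner coordinates of the basic
    recognition frame (predicted) and the manually annotated frame."""
    predicted = np.asarray(predicted, dtype=float)
    annotated = np.asarray(annotated, dtype=float)
    return float(np.mean((predicted - annotated) ** 2))
```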
In S302, if the difference parameter is greater than or equal to a preset expected parameter, the weights of the basic model are adjusted based on the difference parameter; the next basic recognition frame output by the basic model is then obtained, and the weights of the basic model are adjusted based on the updated difference parameter, repeating until the difference parameter is smaller than the expected parameter.
To measure how well the basic model is trained, an expected parameter is preset in the embodiments of the present application. If the obtained difference parameter is greater than or equal to the expected parameter, the weights of the basic model are adjusted based on the difference parameter; the weight adjustment can be implemented with open-source neural network algorithms such as gradient descent or backpropagation. For example, because the basic recognition frame is computed by the multiple convolution kernels of the basic model, each of which contains weights, when the difference parameter is greater than or equal to the expected parameter, the partial derivatives of the difference parameter with respect to the kernel weights can be computed to form the gradient vector of the difference parameter with respect to the weights, and the kernel weights then adjusted along that gradient vector so that the difference parameter becomes as small as possible.
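A minimal sketch of one such weight adjustment, assuming a PyTorch model that outputs corner coordinates and a gradient-descent optimizer such as torch.optim.SGD (the model interface and all names are illustrative assumptions):

```python
import torch

def training_step(model, optimizer, images, annotated_corners, expected_parameter):
    """One weight adjustment by gradient descent, using the quadratic
    cost over the four corner coordinates as the difference parameter."""
    predicted_corners = model(images)
    loss = torch.mean((predicted_corners - annotated_corners) ** 2)
    if loss.item() >= expected_parameter:
        optimizer.zero_grad()
        loss.backward()   # gradient of the difference parameter w.r.t. the weights
        optimizer.step()  # adjust the kernel weights along the gradient
    return loss.item()
```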
After the weight adjustment based on one difference parameter is complete, the next sample image set is input to the basic model, the basic recognition frame output by the model is obtained, and the value of the difference parameter is updated. If the updated difference parameter is still greater than or equal to the expected parameter, the weights are adjusted again, and the above operations are repeated until the difference parameter is smaller than the expected parameter. When the difference parameter is smaller than the expected parameter, the basic recognition frame output by the basic model can be considered to match the manually annotated frame; training of the basic model may of course continue until all the sample image sets have been input.
As the embodiment shown in FIG. 3 illustrates, in the embodiments of the present application, after each sample image set is input to the basic model, the difference parameter is computed, and when the difference parameter is greater than or equal to the preset expected parameter, the weights of the basic model are adjusted based on it, the operations being repeated until the difference parameter is smaller than the expected parameter, which improves the analysis precision of the recognition-frame optimization model.
FIG. 4 shows a method obtained, on the basis of Embodiment 2 of the present application and in the case where the recognition-frame analysis of the image to be detected yields at least two targets, each corresponding to at least one recognition frame to be detected, by refining the process of inputting the image to be detected and all the cropped images in sequence into the pre-trained recognition-frame optimization model, obtaining the sub-recognition frame output by the model for each input image, correcting each subsequent sub-recognition frame according to the preceding one, and determining the last sub-recognition frame after correction as the target recognition frame. An embodiment of the present application provides an implementation flowchart of the method for determining a recognition frame based on target detection; as shown in FIG. 4, the method may include the following steps:
In S401, the image to be detected is cut according to all the recognition frames to be detected corresponding to each target to obtain cut images, and the cut images are scaled in size, where each cut image corresponds to one target.
In practical application scenarios, the image to be detected may contain at least two targets; for example, if the target is a vehicle and the image to be detected is a snapshot of an intersection, the image may contain at least two vehicles. For this case, in the embodiments of the present application a target recognition frame is determined separately for each target. Specifically, because each recognized target corresponds to at least one recognition frame to be detected, the maximum coverage region of all the recognition frames to be detected corresponding to each target is computed, and the image to be detected is cut according to that maximum coverage region to obtain the cut images, each of which corresponds to one recognized target. When the maximum coverage region is computed, the union region of all the recognition frames to be detected corresponding to the target is obtained first; the horizontal coordinate range covered by that union region is then taken as the horizontal coordinate range of the maximum coverage region, and the vertical coordinate range covered by the union region as its vertical coordinate range, thereby constructing the maximum coverage region. It is worth mentioning that the maximum coverage region is a rectangle, so deriving it from the union region in fact completes the possibly irregular union region into the rectangular maximum coverage region.
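A minimal sketch of constructing the maximum coverage region as the bounding rectangle of the union, assuming (left, top, right, bottom) box tuples (a representational assumption):

```python
def maximum_coverage_region(boxes):
    """Bounding rectangle of the union of several axis-aligned boxes,
    one box per recognition frame to be detected for the same target."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))
```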
After the cut images are obtained, because their size may not satisfy the standard size of the input parameters of the recognition-frame optimization model (for example, if the standard size is the preset size of 299×299 while a cut image measures 199×199), each cut image is scaled until it reaches the standard size. It is worth mentioning that, unlike the cut images, if the aforementioned cropped images or sample cropped images do not satisfy the standard size of the recognition-frame optimization model, the regions of the cropped images or sample cropped images where the excess size lies can be blanked or grayed out to prevent size confusion when the recognition frame is subsequently determined.
In S402, the scaled cut image and all the corresponding cropped images are input in sequence into the recognition-frame optimization model, the size of the recognition frame output by the model is restored, and the size-restored recognition frame is determined to be the target recognition frame corresponding to the cut image.
After a cut image is scaled, the scaled cut image and all the corresponding cropped images (the cropped images here being obtained by cropping the cut image according to the recognition frames to be detected that correspond to it) are input in sequence into the weight-adjusted recognition-frame optimization model, and the size of the recognition frame output by the model (the recognition frame here being the last corrected recognition frame the model outputs) is restored. The scale of the size restoration is the inverse of the scale applied to the cut image in step S401: for example, if the cut image was reduced to one half, then in this step the recognition frame is enlarged to twice its size about its center. The size-restored recognition frame is determined to be the target recognition frame corresponding to the cut image and is placed back into the original image to be detected, which facilitates subsequent target-detection operations. The number of target recognition frames finally obtained in the embodiments of the present application equals the number of recognized targets, that is, one target recognition frame corresponds to one target.
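A minimal sketch of the size restoration about the frame center, assuming a (left, top, right, bottom) box and a scalar inverse scale factor (both assumptions):

```python
def restore_size(box, factor):
    """Scale an axis-aligned (left, top, right, bottom) box about its
    center; e.g. factor=2.0 undoes a prior reduction to one half."""
    left, top, right, bottom = box
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    half_w = (right - left) / 2.0 * factor
    half_h = (bottom - top) / 2.0 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```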
As the embodiment shown in FIG. 4 illustrates, in the embodiments of the present application the image to be detected is cut according to each recognized target to obtain cut images, the cut images are scaled, the scaled cut images and all the corresponding cropped images are input in sequence into the recognition-frame optimization model, and the size of the recognition frame output by the model is restored to obtain the target recognition frame. Determining a target recognition frame separately for each recognized target improves the specificity and precision of the determination and prevents poor computation results when an image containing at least two targets is input into the recognition-frame optimization model.
FIG. 5 shows a method obtained, on the basis of Embodiment 2 of the present application and in the case where at least two recognition-frame optimization models and at least two preset attribute features are provided, each recognition-frame optimization model having been trained on sample image sets corresponding to the same attribute feature, by refining the process of inputting the image to be detected and all the cropped images in sequence into the pre-trained recognition-frame optimization model. An embodiment of the present application provides an implementation flowchart of the method for determining a recognition frame based on target detection; as shown in FIG. 5, the method may include the following steps:
In S501, the attribute feature corresponding to the image to be detected is determined to be the target feature.
In the embodiments of the present application, at least two attribute features can be preset, and the basic model trained separately for each attribute feature to obtain a recognition-frame optimization model; preferably, the number of kinds of attribute features equals the number of recognition-frame optimization models finally obtained. When each basic model is trained, only the sample image sets (at least two) corresponding to the same attribute feature are input to it. For ease of explanation, suppose the attribute features are male and female: two basic models are preset, the sample image sets corresponding to males are input into one of them for weight adjustment, and that basic model after final weight adjustment serves as one recognition-frame optimization model; the sample image sets corresponding to females are input into the other basic model for weight adjustment, and that basic model after final weight adjustment serves as the other recognition-frame optimization model. After the trained recognition-frame optimization models are obtained, the attribute feature corresponding to the image to be detected is acquired and determined to be the target feature; the target feature may be user-defined in advance, or an open-source analysis component may be introduced to derive it.
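A minimal sketch of selecting the attribute-matched model, assuming the trained models are kept in a dictionary keyed by attribute feature (the registry and the example keys are illustrative assumptions):

```python
def select_model(models, target_feature):
    """Choose the recognition-frame optimization model trained on the
    attribute feature that matches the image to be detected.

    models: dict mapping each preset attribute feature (e.g. "male",
    "female") to its trained model.
    """
    return models[target_feature]
```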
In S502, the image to be detected and all the cropped images are input in sequence into the recognition-frame optimization model corresponding to the target feature.
After the target feature is determined, the image to be detected and all the cropped images are input into the weight-adjusted recognition-frame optimization model corresponding to the target feature. Because this model analyzes images of the corresponding attribute feature well, the accuracy of the subsequently determined target recognition frame is improved.
As the embodiment shown in FIG. 5 illustrates, in the embodiments of the present application the attribute feature corresponding to the image to be detected is determined to be the target feature, and the set of images to be detected is input into the weight-adjusted recognition-frame optimization model corresponding to the target feature. By training the recognition-frame optimization models in a targeted way and feeding the set of images to be detected into the corresponding model, the accuracy of target detection is improved.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and does not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the method for determining a recognition frame based on target detection described in the above embodiments, FIG. 10 shows a structural block diagram of the apparatus for determining a recognition frame based on target detection provided by an embodiment of the present application. Referring to FIG. 10, the apparatus includes:
an analysis unit 101, configured to acquire an image to be detected that contains a target, and to perform recognition-frame analysis on the image to be detected to obtain at least one recognition frame to be detected;
a cropping unit 102, configured to crop the image to be detected according to each recognition frame to be detected to obtain cropped images;
an input unit 103, configured to input the image to be detected and all the cropped images in sequence into the pre-trained recognition-frame optimization model, to obtain a sub-recognition frame output by the recognition-frame optimization model for each input image, to correct each subsequent sub-recognition frame according to the preceding sub-recognition frame, and to determine the last sub-recognition frame after correction as the target recognition frame, where the recognition-frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
Optionally, the input unit 103 further includes:
an acquisition unit, configured to acquire the at least two sample images and the corresponding manually annotated frames, where each sample image is an image containing the target and each manually annotated frame is the manually annotated region of the sample image in which the target is located;
a sample cropping unit, configured to perform recognition-frame analysis on each sample image to obtain at least one sample recognition frame, and to crop the sample image according to each sample recognition frame to obtain sample cropped images;
a construction unit, configured to construct a sample image set from each sample image and all its corresponding sample cropped images, to input the at least two constructed sample image sets in sequence into the preset basic model, to adjust the weights of the basic model according to the manually annotated frame corresponding to each input sample image set until the recognition frame output by the adjusted basic model matches the manually annotated frame, and to determine the adjusted basic model to be the recognition-frame optimization model, where each constructed sample image set corresponds to one sample image.
Optionally, the construction unit includes:
a computation unit, configured to determine the recognition frame output by the basic model for each input sample image set to be a basic recognition frame, and to compute a difference parameter between the basic recognition frame and the corresponding manually annotated frame;
a weight adjustment unit, configured to, if the difference parameter is greater than or equal to the preset expected parameter, adjust the weights of the basic model based on the difference parameter, repeatedly obtain the next basic recognition frame output by the basic model, and adjust the weights of the basic model based on the updated difference parameter until the difference parameter is smaller than the expected parameter.
Optionally, if the recognition-frame analysis of the image to be detected yields at least two targets, each corresponding to at least one recognition frame to be detected, the input unit includes:
切割单元,用于将所述待检测图像按照每个所述目标对应的所有待检测识别框进行切割得到切割图像,并对所述切割图像进行尺寸缩放,其中,每个所述切割图像对应一个所述目标;The cutting unit is used to cut the to-be-detected image according to all the to-be-detected recognition frames corresponding to each of the targets to obtain a cut image, and perform size scaling on the cut image, wherein each cut image corresponds to one The target
确定单元,用于将尺寸缩放后的所述切割图像和对应的所有所述截取图像依次输入至所述识别框优化模型,将所述识别框优化模型输出的识别框进行尺寸复原,将尺寸复原后的所述识别框确定为所述切割图像对应的所述目标识别框。The determining unit is configured to input the scaled cut image and all the corresponding intercepted images into the recognition frame optimization model in sequence, and restore the size of the recognition frame output by the recognition frame optimization model to restore the size The subsequent recognition frame is determined as the target recognition frame corresponding to the cut image.
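The cut-scale-restore round trip can be made concrete with a short sketch. The union-of-frames cut region, the (x1, y1, x2, y2) box format, and the 224x224 input size are assumptions for illustration; only `cv2.resize` is a real library call.

```python
# Hypothetical per-target cutting, scaling, and size restoration.
import cv2

def union_box(frames):
    xs1, ys1, xs2, ys2 = zip(*frames)
    return min(xs1), min(ys1), max(xs2), max(ys2)

def detect_one_target(image, frames, model, input_size=(224, 224)):
    x1, y1, x2, y2 = union_box(frames)
    cut = image[y1:y2, x1:x2]               # cut image for this target
    sx = (x2 - x1) / input_size[0]          # scale factors used later to
    sy = (y2 - y1) / input_size[1]          # restore the output frame's size
    scaled = cv2.resize(cut, input_size)
    bx1, by1, bx2, by2 = model(scaled)      # recognition frame in scaled coords
    # size restoration: undo the scaling, then shift back into the full image
    return (x1 + bx1 * sx, y1 + by1 * sy, x1 + bx2 * sx, y1 + by2 * sy)
```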
Optionally, if there are at least two recognition frame optimization models and at least two preset attribute features, each recognition frame optimization model being trained on the sample image sets corresponding to the same attribute feature, the input unit includes:
A feature determining unit, configured to determine the attribute feature corresponding to the image to be detected as the target feature;
A targeted input unit, configured to input the image to be detected and all the intercepted images in sequence into the recognition frame optimization model corresponding to the target feature.
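Routing an image to the model trained on its attribute feature amounts to a lookup, as in the sketch below; the registry dictionary and the `get_attribute` helper are hypothetical names introduced only for illustration.

```python
# Illustrative per-attribute model routing (assumed helpers).
def select_and_run(image, intercepted_images, models_by_attribute, get_attribute):
    target_feature = get_attribute(image)        # e.g. "face" or "vehicle"
    model = models_by_attribute[target_feature]  # model trained on that feature
    return [model(img) for img in [image] + list(intercepted_images)]
```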
Optionally, the analysis unit 101 further includes:
A normalization unit, configured to normalize the image to be detected to a preset size, and to zero-mean the normalized image to be detected.
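A minimal preprocessing sketch matching this unit is given below; the 416x416 preset size is an assumed example value, and per-channel mean subtraction is one common reading of zero-meaning.

```python
# Illustrative normalization to a preset size followed by zero-meaning.
import cv2
import numpy as np

def normalize(image, preset_size=(416, 416)):
    resized = cv2.resize(image, preset_size).astype(np.float32)
    return resized - resized.mean(axis=(0, 1), keepdims=True)  # zero-mean per channel
```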
Accordingly, the target-detection-based recognition frame determining apparatus provided by this embodiment of the present application obtains the target recognition frame by inputting the image to be detected and all the intercepted images into a pre-trained recognition frame optimization model, which improves the accuracy of the determined target recognition frame and of target detection as a whole.
Fig. 11 is a schematic diagram of a terminal device provided by an embodiment of the present application. As shown in Fig. 11, the terminal device 11 of this embodiment includes a processor 110, a memory 111, and computer-readable instructions 112 that are stored in the memory 111 and executable on the processor 110, for example a target-detection-based recognition frame determining program. When the processor 110 executes the computer-readable instructions 112, the steps of the above method embodiments are implemented, such as steps S101 to S103 shown in Fig. 1; alternatively, the functions of the units in the above apparatus embodiments are implemented, such as the functions of units 101 to 103 shown in Fig. 10.
Exemplarily, the computer-readable instructions 112 may be divided into one or more units, which are stored in the memory 111 and executed by the processor 110 to carry out this application. Each unit may be a segment of a series of computer-readable instructions capable of completing a specific function, the segment describing the execution of the computer-readable instructions 112 in the terminal device 11. For example, the computer-readable instructions 112 may be divided into an analysis unit, an interception unit, and an input unit, whose functions are as follows:
The analysis unit is configured to acquire an image to be detected and to perform recognition frame analysis on it to obtain at least one recognition frame to be detected;
The interception unit is configured to intercept the image to be detected according to each recognition frame to be detected to obtain intercepted images;
The input unit is configured to input the image to be detected and all the intercepted images in sequence into a pre-trained recognition frame optimization model, to obtain the sub-recognition frame output by the model for each input image, to correct each subsequent sub-recognition frame according to the preceding one, and to determine the last corrected sub-recognition frame as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
The terminal device 11 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 110 and the memory 111. Those skilled in the art will understand that Fig. 11 is merely an example of the terminal device 11 and does not limit it: the device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, and buses.
The processor 110 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 111 may be an internal storage unit of the terminal device 11, such as its hard disk or main memory. It may also be an external storage device of the terminal device 11, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card (FC) fitted to the terminal device 11. Further, the memory 111 may include both an internal storage unit and an external storage device. The memory 111 stores the computer-readable instructions and the other programs and data required by the terminal device, and may also temporarily store data that has been or will be output.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and when executed may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media in the embodiments provided by this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments are intended only to illustrate, not to limit, the technical solutions of this application. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features equivalently replaced, and that such modifications or replacements do not depart in essence from the spirit and scope of the technical solutions of the embodiments of this application; they all fall within the scope of protection of this application.

Claims (20)

  1. A target-detection-based recognition frame determining method, comprising:
    acquiring an image to be detected that contains a target, and performing recognition frame analysis on the image to be detected to obtain at least one recognition frame to be detected;
    intercepting the image to be detected according to each recognition frame to be detected to obtain intercepted images;
    inputting the image to be detected and all the intercepted images in sequence into a pre-trained recognition frame optimization model, obtaining the sub-recognition frame output by the model for each input image, correcting each subsequent sub-recognition frame according to the preceding one, and determining the last corrected sub-recognition frame as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
  2. The recognition frame determining method according to claim 1, wherein before the image to be detected and all the intercepted images are input in sequence into the pre-trained recognition frame optimization model, the method further comprises:
    acquiring at least two of the sample images and the corresponding manually annotated frames, wherein each sample image is an image containing the target and the manually annotated frame marks the manually annotated region of the sample image in which the target is located;
    performing recognition frame analysis on each sample image to obtain at least one sample recognition frame, and intercepting the sample image according to each sample recognition frame to obtain sample intercepted images;
    constructing a sample image set from each sample image and all of its corresponding sample intercepted images, inputting at least two of the constructed sample image sets in sequence into a preset basic model, and adjusting the weights of the basic model according to the manually annotated frame corresponding to each input sample image set until the recognition frame output by the adjusted basic model matches the manually annotated frame, whereupon the adjusted basic model is determined to be the recognition frame optimization model; wherein each constructed sample image set corresponds to one sample image.
  3. The recognition frame determining method according to claim 2, wherein inputting the at least two constructed sample image sets in sequence into the preset basic model and adjusting the weights of the basic model according to the manually annotated frame corresponding to each input sample image set, until the recognition frame output by the adjusted basic model matches the manually annotated frame, comprises:
    determining the recognition frame output by the basic model for each input sample image set as a basic recognition frame, and calculating a difference parameter between the basic recognition frame and the corresponding manually annotated frame;
    if the difference parameter is greater than or equal to a preset expected parameter, adjusting the weights of the basic model based on the difference parameter, repeatedly obtaining the next basic recognition frame output by the basic model, and adjusting the weights again based on the updated difference parameter, until the difference parameter is smaller than the expected parameter.
  4. The recognition frame determining method according to claim 2, wherein, if the recognition frame analysis of the image to be detected yields at least two targets, each corresponding to at least one recognition frame to be detected, then inputting the image to be detected and all the intercepted images in sequence into the pre-trained recognition frame optimization model, obtaining the sub-recognition frame output by the model for each input image, correcting each subsequent sub-recognition frame according to the preceding one, and determining the last corrected sub-recognition frame as the target recognition frame comprises:
    cutting the image to be detected according to all the recognition frames to be detected corresponding to each target to obtain cut images, and scaling each cut image, wherein each cut image corresponds to one target;
    inputting each scaled cut image and all of its corresponding intercepted images in sequence into the recognition frame optimization model, restoring the size of the recognition frame output by the model, and determining the size-restored recognition frame as the target recognition frame corresponding to that cut image.
  5. The recognition frame determining method according to claim 2, wherein, if there are at least two recognition frame optimization models and at least two preset attribute features, each recognition frame optimization model being trained on the sample image sets corresponding to the same attribute feature, then inputting the image to be detected and all the intercepted images in sequence into the pre-trained recognition frame optimization model comprises:
    determining the attribute feature corresponding to the image to be detected as the target feature;
    inputting the image to be detected and all the intercepted images in sequence into the recognition frame optimization model corresponding to the target feature.
  6. The recognition frame determining method according to claim 1, further comprising, before the recognition frame analysis of the image to be detected:
    normalizing the image to be detected to a preset size, and zero-meaning the normalized image to be detected.
  7. A target-detection-based recognition frame determining apparatus, comprising:
    an analysis unit, configured to acquire an image to be detected that contains a target, and to perform recognition frame analysis on the image to be detected to obtain at least one recognition frame to be detected;
    an interception unit, configured to intercept the image to be detected according to each recognition frame to be detected to obtain intercepted images;
    an input unit, configured to input the image to be detected and all the intercepted images in sequence into a pre-trained recognition frame optimization model, to obtain the sub-recognition frame output by the model for each input image, to correct each subsequent sub-recognition frame according to the preceding one, and to determine the last corrected sub-recognition frame as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
  8. The target-detection-based recognition frame determining apparatus according to claim 7, wherein the input unit comprises:
    an acquiring unit, configured to acquire at least two of the sample images and the corresponding manually annotated frames, wherein each sample image is an image containing the target and the manually annotated frame marks the manually annotated region of the sample image in which the target is located;
    a sample interception unit, configured to perform recognition frame analysis on each sample image to obtain at least one sample recognition frame, and to intercept the sample image according to each sample recognition frame to obtain sample intercepted images;
    a construction unit, configured to construct a sample image set from each sample image and all of its corresponding sample intercepted images, to input at least two of the constructed sample image sets in sequence into a preset basic model, and to adjust the weights of the basic model according to the manually annotated frame corresponding to each input sample image set until the recognition frame output by the adjusted basic model matches the manually annotated frame, whereupon the adjusted basic model is determined to be the recognition frame optimization model; wherein each constructed sample image set corresponds to one sample image.
  9. The target-detection-based recognition frame determining apparatus according to claim 8, wherein the construction unit comprises:
    a calculation unit, configured to determine the recognition frame output by the basic model for each input sample image set as a basic recognition frame, and to calculate a difference parameter between the basic recognition frame and the corresponding manually annotated frame;
    a weight adjustment unit, configured to, if the difference parameter is greater than or equal to a preset expected parameter, adjust the weights of the basic model based on the difference parameter, repeatedly obtain the next basic recognition frame output by the basic model, and adjust the weights again based on the updated difference parameter, until the difference parameter is smaller than the expected parameter.
  10. The target-detection-based recognition frame determining apparatus according to claim 8, wherein, when the recognition frame analysis of the image to be detected yields at least two targets, each corresponding to at least one recognition frame to be detected, the input unit comprises:
    a cutting unit, configured to cut the image to be detected according to all the recognition frames to be detected corresponding to each target to obtain cut images, and to scale each cut image, wherein each cut image corresponds to one target;
    a determining unit, configured to input each scaled cut image and all of its corresponding intercepted images in sequence into the recognition frame optimization model, to restore the size of the recognition frame output by the model, and to determine the size-restored recognition frame as the target recognition frame corresponding to that cut image.
  11. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring an image to be detected that contains a target, and performing recognition frame analysis on the image to be detected to obtain at least one recognition frame to be detected;
    intercepting the image to be detected according to each recognition frame to be detected to obtain intercepted images;
    inputting the image to be detected and all the intercepted images in sequence into a pre-trained recognition frame optimization model, obtaining the sub-recognition frame output by the model for each input image, correcting each subsequent sub-recognition frame according to the preceding one, and determining the last corrected sub-recognition frame as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
  12. The terminal device according to claim 11, wherein before the image to be detected and all the intercepted images are input in sequence into the pre-trained recognition frame optimization model, the steps further comprise:
    acquiring at least two of the sample images and the corresponding manually annotated frames, wherein each sample image is an image containing the target and the manually annotated frame marks the manually annotated region of the sample image in which the target is located;
    performing recognition frame analysis on each sample image to obtain at least one sample recognition frame, and intercepting the sample image according to each sample recognition frame to obtain sample intercepted images;
    constructing a sample image set from each sample image and all of its corresponding sample intercepted images, inputting at least two of the constructed sample image sets in sequence into a preset basic model, and adjusting the weights of the basic model according to the manually annotated frame corresponding to each input sample image set until the recognition frame output by the adjusted basic model matches the manually annotated frame, whereupon the adjusted basic model is determined to be the recognition frame optimization model; wherein each constructed sample image set corresponds to one sample image.
  13. The terminal device according to claim 12, wherein inputting the at least two constructed sample image sets in sequence into the preset basic model and adjusting the weights of the basic model according to the manually annotated frame corresponding to each input sample image set, until the recognition frame output by the adjusted basic model matches the manually annotated frame, comprises:
    determining the recognition frame output by the basic model for each input sample image set as a basic recognition frame, and calculating a difference parameter between the basic recognition frame and the corresponding manually annotated frame;
    if the difference parameter is greater than or equal to a preset expected parameter, adjusting the weights of the basic model based on the difference parameter, repeatedly obtaining the next basic recognition frame output by the basic model, and adjusting the weights again based on the updated difference parameter, until the difference parameter is smaller than the expected parameter.
  14. The terminal device according to claim 12, wherein, if the recognition frame analysis of the image to be detected yields at least two targets, each corresponding to at least one recognition frame to be detected, then inputting the image to be detected and all the intercepted images in sequence into the pre-trained recognition frame optimization model, obtaining the sub-recognition frame output by the model for each input image, correcting each subsequent sub-recognition frame according to the preceding one, and determining the last corrected sub-recognition frame as the target recognition frame comprises:
    cutting the image to be detected according to all the recognition frames to be detected corresponding to each target to obtain cut images, and scaling each cut image, wherein each cut image corresponds to one target;
    inputting each scaled cut image and all of its corresponding intercepted images in sequence into the recognition frame optimization model, restoring the size of the recognition frame output by the model, and determining the size-restored recognition frame as the target recognition frame corresponding to that cut image.
  15. The terminal device according to claim 12, wherein, if there are at least two recognition frame optimization models and at least two preset attribute features, each recognition frame optimization model being trained on the sample image sets corresponding to the same attribute feature, then inputting the image to be detected and all the intercepted images in sequence into the pre-trained recognition frame optimization model comprises:
    determining the attribute feature corresponding to the image to be detected as the target feature;
    inputting the image to be detected and all the intercepted images in sequence into the recognition frame optimization model corresponding to the target feature.
  16. A non-volatile computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    acquiring an image to be detected that contains a target, and performing recognition frame analysis on the image to be detected to obtain at least one recognition frame to be detected;
    intercepting the image to be detected according to each recognition frame to be detected to obtain intercepted images;
    inputting the image to be detected and all the intercepted images in sequence into a pre-trained recognition frame optimization model, obtaining the sub-recognition frame output by the model for each input image, correcting each subsequent sub-recognition frame according to the preceding one, and determining the last corrected sub-recognition frame as the target recognition frame, wherein the recognition frame optimization model is trained on preset sample images and corresponding manually annotated frames, and the target recognition frame indicates the region of the image to be detected in which the target is located.
  17. The non-volatile computer-readable storage medium according to claim 16, wherein before the image to be detected and all the intercepted images are input in sequence into the pre-trained recognition frame optimization model, the steps further comprise:
    acquiring at least two of the sample images and the corresponding manually annotated frames, wherein each sample image is an image containing the target and the manually annotated frame marks the manually annotated region of the sample image in which the target is located;
    performing recognition frame analysis on each sample image to obtain at least one sample recognition frame, and intercepting the sample image according to each sample recognition frame to obtain sample intercepted images;
    constructing a sample image set from each sample image and all of its corresponding sample intercepted images, inputting at least two of the constructed sample image sets in sequence into a preset basic model, and adjusting the weights of the basic model according to the manually annotated frame corresponding to each input sample image set until the recognition frame output by the adjusted basic model matches the manually annotated frame, whereupon the adjusted basic model is determined to be the recognition frame optimization model; wherein each constructed sample image set corresponds to one sample image.
  18. The non-volatile computer-readable storage medium according to claim 17, wherein inputting the at least two constructed sample image sets in sequence into the preset basic model and adjusting the weights of the basic model according to the manually annotated frame corresponding to each input sample image set, until the recognition frame output by the adjusted basic model matches the manually annotated frame, comprises:
    determining the recognition frame output by the basic model for each input sample image set as a basic recognition frame, and calculating a difference parameter between the basic recognition frame and the corresponding manually annotated frame;
    if the difference parameter is greater than or equal to a preset expected parameter, adjusting the weights of the basic model based on the difference parameter, repeatedly obtaining the next basic recognition frame output by the basic model, and adjusting the weights again based on the updated difference parameter, until the difference parameter is smaller than the expected parameter.
  19. The non-volatile computer-readable storage medium according to claim 17, wherein, if the recognition frame analysis of the image to be detected yields at least two targets, each corresponding to at least one recognition frame to be detected, then inputting the image to be detected and all the intercepted images in sequence into the pre-trained recognition frame optimization model, obtaining the sub-recognition frame output by the model for each input image, correcting each subsequent sub-recognition frame according to the preceding one, and determining the last corrected sub-recognition frame as the target recognition frame comprises:
    cutting the image to be detected according to all the recognition frames to be detected corresponding to each target to obtain cut images, and scaling each cut image, wherein each cut image corresponds to one target;
    inputting each scaled cut image and all of its corresponding intercepted images in sequence into the recognition frame optimization model, restoring the size of the recognition frame output by the model, and determining the size-restored recognition frame as the target recognition frame corresponding to that cut image.
  20. The non-volatile computer-readable storage medium according to claim 17, wherein, if there are at least two recognition frame optimization models and at least two preset attribute features, each recognition frame optimization model being trained on the sample image sets corresponding to the same attribute feature, then inputting the image to be detected and all the intercepted images in sequence into the pre-trained recognition frame optimization model comprises:
    determining the attribute feature corresponding to the image to be detected as the target feature;
    inputting the image to be detected and all the intercepted images in sequence into the recognition frame optimization model corresponding to the target feature.
PCT/CN2019/118131 2019-01-23 2019-11-13 Target detection based identification box determining method and device and terminal equipment WO2020151329A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910064290.9A CN109886997B (en) 2019-01-23 2019-01-23 Identification frame determining method and device based on target detection and terminal equipment
CN201910064290.9 2019-01-23

Publications (1)

Publication Number Publication Date
WO2020151329A1 true WO2020151329A1 (en) 2020-07-30

Family

ID=66926531

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118131 WO2020151329A1 (en) 2019-01-23 2019-11-13 Target detection based identification box determining method and device and terminal equipment

Country Status (2)

Country Link
CN (1) CN109886997B (en)
WO (1) WO2020151329A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886997B (en) * 2019-01-23 2023-07-11 平安科技(深圳)有限公司 Identification frame determining method and device based on target detection and terminal equipment
CN110443259B (en) * 2019-07-29 2023-04-07 中科光启空间信息技术有限公司 Method for extracting sugarcane from medium-resolution remote sensing image
CN112850436A (en) * 2019-11-28 2021-05-28 宁波微科光电股份有限公司 Pedestrian trend detection method and system of elevator intelligent light curtain
CN111160302B (en) * 2019-12-31 2024-02-23 深圳一清创新科技有限公司 Obstacle information identification method and device based on automatic driving environment
WO2021189889A1 (en) * 2020-03-26 2021-09-30 平安科技(深圳)有限公司 Text detection method and apparatus in scene image, computer device, and storage medium
CN111652012B (en) * 2020-05-11 2021-10-29 中山大学 Curved surface QR code positioning method based on SSD network model
CN111368116B (en) * 2020-05-26 2020-09-18 腾讯科技(深圳)有限公司 Image classification method and device, computer equipment and storage medium
CN112633118A (en) * 2020-12-18 2021-04-09 上海眼控科技股份有限公司 Text information extraction method, equipment and storage medium
CN112966683A (en) * 2021-03-04 2021-06-15 咪咕文化科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113111852B (en) * 2021-04-30 2022-07-01 苏州科达科技股份有限公司 Target detection method, training method, electronic equipment and gun and ball linkage system
CN113791078B (en) * 2021-09-02 2023-06-13 中国农业机械化科学研究院 Batch detection method and device for internal cracks of corn seeds
CN116029970A (en) * 2022-09-22 2023-04-28 北京城市网邻信息技术有限公司 Image recognition method, device, electronic equipment and storage medium
CN116313164B (en) * 2023-05-22 2023-08-22 亿慧云智能科技(深圳)股份有限公司 Anti-interference sleep monitoring method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150055829A1 (en) * 2013-08-23 2015-02-26 Ricoh Company, Ltd. Method and apparatus for tracking object
CN106971178A (en) * 2017-05-11 2017-07-21 北京旷视科技有限公司 Pedestrian detection and the method and device recognized again
CN108121986A (en) * 2017-12-29 2018-06-05 深圳云天励飞技术有限公司 Object detection method and device, computer installation and computer readable storage medium
CN108229380A (en) * 2017-12-29 2018-06-29 深圳市神州云海智能科技有限公司 A kind of detection method of target image, device and storage medium, robot
CN109886997A (en) * 2019-01-23 2019-06-14 平安科技(深圳)有限公司 Method, apparatus and terminal device are determined based on the identification frame of target detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470172B (en) * 2017-02-23 2021-06-11 阿里巴巴集团控股有限公司 Text information identification method and device
US10553091B2 (en) * 2017-03-31 2020-02-04 Qualcomm Incorporated Methods and systems for shape adaptation for merged objects in video analytics
CN109191444A (en) * 2018-08-29 2019-01-11 广东工业大学 Video area based on depth residual error network removes altering detecting method and device

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12020476B2 (en) 2017-03-23 2024-06-25 Tesla, Inc. Data synthesis for autonomous control systems
US11487288B2 (en) 2017-03-23 2022-11-01 Tesla, Inc. Data synthesis for autonomous control systems
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US11403069B2 (en) 2017-07-24 2022-08-02 Tesla, Inc. Accelerated mathematical engine
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11681649B2 (en) 2017-07-24 2023-06-20 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11797304B2 (en) 2018-02-01 2023-10-24 Tesla, Inc. Instruction set architecture for a vector computational unit
US11734562B2 (en) 2018-06-20 2023-08-22 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11841434B2 (en) 2018-07-20 2023-12-12 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11983630B2 (en) 2018-09-03 2024-05-14 Tesla, Inc. Neural networks for embedded devices
US11893774B2 (en) 2018-10-11 2024-02-06 Tesla, Inc. Systems and methods for training machine models with augmented data
US11665108B2 (en) 2018-10-25 2023-05-30 Tesla, Inc. QoS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11908171B2 (en) 2018-12-04 2024-02-20 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US12014553B2 (en) 2019-02-01 2024-06-18 Tesla, Inc. Predicting three-dimensional features for autonomous driving
US11748620B2 (en) 2019-02-01 2023-09-05 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US11790664B2 (en) 2019-02-19 2023-10-17 Tesla, Inc. Estimating object properties using visual image data
CN112633295A (en) * 2020-12-22 2021-04-09 深圳集智数字科技有限公司 Prediction method and device for loop task, electronic equipment and storage medium
CN113420597A (en) * 2021-05-24 2021-09-21 北京三快在线科技有限公司 Method and device for identifying roundabout, electronic equipment and storage medium
CN114119455B (en) * 2021-09-03 2024-04-09 乐普(北京)医疗器械股份有限公司 Method and device for positioning vascular stenosis part based on target detection network
CN114119455A (en) * 2021-09-03 2022-03-01 乐普(北京)医疗器械股份有限公司 Method and device for positioning blood vessel stenosis part based on target detection network
CN116309696B (en) * 2022-12-23 2023-12-01 苏州驾驶宝智能科技有限公司 Multi-category multi-target tracking method and device based on improved generalized cross-over ratio
CN116309696A (en) * 2022-12-23 2023-06-23 苏州驾驶宝智能科技有限公司 Multi-category multi-target tracking method and device based on improved generalized cross-over ratio
CN115690405A (en) * 2022-12-29 2023-02-03 中科航迈数控软件(深圳)有限公司 Machine vision-based machining track optimization method and related equipment
CN117194992A (en) * 2023-11-01 2023-12-08 支付宝(杭州)信息技术有限公司 Model training and task execution method and device, storage medium and equipment
CN117194992B (en) * 2023-11-01 2024-04-19 支付宝(杭州)信息技术有限公司 Model training and task execution method and device, storage medium and equipment

Also Published As

Publication number Publication date
CN109886997A (en) 2019-06-14
CN109886997B (en) 2023-07-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19911872; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19911872; Country of ref document: EP; Kind code of ref document: A1)