WO2020038205A1 - Target detection method and apparatus, computer-readable storage medium, and computer device

Info

Publication number
WO2020038205A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, feature, network, residual, target
Prior art date
Application number
PCT/CN2019/098742
Other languages
English (en)
French (fr)
Inventor
苗捷
冉辰
许典平
贾晓义
姜媚
林榆耿
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP19851718.7A (patent EP3843003B1)
Publication of WO2020038205A1
Priority to US17/020,636 (patent US11710293B2)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00 Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404 Methods for optical code recognition
    • G06K7/1408 Methods for optical code recognition the method being specifically adapted for the type of code
    • G06K7/1413 1D bar codes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00 Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404 Methods for optical code recognition
    • G06K7/1408 Methods for optical code recognition the method being specifically adapted for the type of code
    • G06K7/1417 2D bar codes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method, a device, a computer-readable storage medium, and a computer device for detecting an object.
  • Object detection is an important branch in image processing, and its purpose is to determine the position of the target object in the image.
  • The traditional target detection method finds the position of the target object in an image by searching for anchor points on the target object in the image.
  • For example, the positioning marks set at the three vertices of a two-dimensional code are searched for in the image to determine the position of the two-dimensional code in the image.
  • traditional target detection methods are not robust and take a long time.
  • In one aspect, a target detection method is provided, which is applied to a computer device, and the method includes:
  • acquiring an image to be tested;
  • extracting a first image feature and a second image feature corresponding to the image to be tested;
  • performing hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested;
  • classifying and regressing according to the first image feature and the third image feature to determine a candidate position parameter corresponding to a target object in the image to be measured, and a confidence level corresponding to the candidate position parameter; and
  • a valid position parameter is selected from each of the candidate position parameters according to the confidence level, and a position of a target object in the image to be measured is determined according to the valid position parameter.
  • In another aspect, an object detection device is provided, and the device includes:
  • a test image acquisition module for acquiring a test image
  • An image feature acquisition module configured to extract a first image feature and a second image feature corresponding to the image to be tested
  • a hole convolution processing module configured to perform hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested;
  • a candidate parameter acquisition module configured to perform classification and regression based on the first image feature and the third image feature, and determine a candidate position parameter corresponding to a target object in the image to be measured and a confidence level corresponding to the candidate position parameter; and
  • a target position determining module is configured to filter valid position parameters from each of the candidate position parameters according to the confidence level, and determine a position of a target object in the image to be measured according to the valid position parameters.
  • a computer-readable storage medium storing a computer program that, when executed by a processor, implements steps in the above-described target detection method.
  • a computer device including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the above-mentioned target detection method when the processor executes the computer program.
  • The above target detection method, device, computer-readable storage medium, and computer equipment extract the first image feature and the second image feature corresponding to the image to be tested, perform hole convolution according to the second image feature to obtain the third image feature corresponding to the image to be tested, then perform classification and regression based on the first image feature and the third image feature, and determine the position of the target object in the image to be tested according to the results of the classification and regression.
  • automatically extracting the image features corresponding to the image to be tested, and performing classification and regression based on the extracted image features can effectively improve the robustness of the detection and reduce the detection time.
  • the receptive field is effectively enlarged through the hole convolution processing, which can better adapt to the detection of target objects of different sizes.
  • FIG. 1 is an application environment diagram of a target detection method in an embodiment
  • FIG. 2 is a schematic flowchart of a target detection method according to an embodiment
  • FIG. 3 is a structural block diagram of a predetermined neural network in an embodiment
  • FIG. 4 is a structural block diagram of a downsampling module in an embodiment
  • FIG. 5 is a structural block diagram of a conventional residual block in an embodiment
  • FIG. 6 is a structural block diagram of a bottleneck residual block in an embodiment
  • FIG. 7 is a structural block diagram of a second residual block in an embodiment
  • FIG. 8 is a structural block diagram of a predetermined neural network in an embodiment
  • FIG. 9 is a schematic flowchart of a target detection method according to an embodiment
  • FIG. 10 is a structural block diagram of an object detection device in an embodiment
  • FIG. 11 is a schematic diagram of index comparison in identification code detection in an embodiment
  • FIG. 12 is a structural block diagram of a computer device in an embodiment
  • FIG. 13 is a structural block diagram of a computer device in one embodiment.
  • The terms “first”, “second”, and the like used in this application are used to distinguish similar objects, but the objects themselves are not limited by these terms. It should be understood that these terms are interchangeable under appropriate circumstances without departing from the scope of this application. For example, a “first image feature” may be described as a “second image feature”, and similarly, a “second image feature” may be described as a “first image feature”.
  • the target detection methods provided by the embodiments of the present application can be applied to the application environment shown in FIG. 1.
  • the application environment may involve the terminal 110 and the server 120, and the terminal 110 and the server 120 may be connected through a network.
  • The model training can be completed on the server 120 to obtain a predetermined neural network with target detection capability, and the predetermined neural network is then deployed on the terminal 110. After obtaining the image to be tested, the terminal 110 inputs it into the predetermined neural network, extracts the first image feature and the second image feature corresponding to the image to be tested through the predetermined neural network, performs hole convolution according to the second image feature to obtain the third image feature corresponding to the image to be tested, performs classification and regression based on the first image feature and the third image feature to determine the candidate position parameters corresponding to the target object in the image to be measured and the confidence levels corresponding to the candidate position parameters, then selects the effective position parameters from the candidate position parameters according to the confidence levels, and determines the position of the target object in the image to be measured according to the effective position parameters.
  • the predetermined neural network may not be deployed on the terminal 110, but may be deployed on the server 120.
  • In this case, the terminal 110 may send the image to be tested to the server 120, and the server 120 completes the tasks from inputting the image to be measured into the predetermined neural network to determining the position of the target object in the image to be measured.
  • the model training can also be completed on the terminal 110.
  • In this case, the terminal 110 can independently complete both the model training and the tasks from inputting the image to be tested into the predetermined neural network to determining the location of the target object in the image to be tested, without the need for the server 120 to participate.
  • The terminal 110 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, or a wearable device, but is not limited thereto.
  • the server 120 may be implemented by an independent physical server or a server cluster composed of multiple physical servers. It can be understood that, in FIG. 1, the terminal 110 is represented by a mobile phone and the server 120 is represented by an independent physical server, which are only exemplary descriptions and are not used to limit the terminal 110 and the server 120.
  • a target detection method is provided.
  • the method is applied to a computer device (such as the terminal 110 or the server 120 in FIG. 1 described above) as an example for description.
  • the method may include the following steps S202 to S210.
  • The image to be measured is an image on which target detection needs to be performed.
  • the purpose of target detection is to determine the position of the target object in the image.
  • the target object is essentially image content, which can be set in advance according to actual needs.
  • the target object can be an identification code, a vehicle, a pedestrian, a face, etc.
  • The identification code can be a two-dimensional code, a one-dimensional code (also known as a barcode), an applet code, a PDF417 (Portable Data File 417) code, etc.; however, neither the target object nor the identification code is limited thereto.
  • the image to be tested may be an original image without adjustment, that is, after the terminal obtains the original image, the original image is not adjusted, but the original image itself is directly used as the image to be measured.
  • The image to be tested may also be an image obtained after adjusting the original image; that is, after the terminal obtains the original image, the original image is adjusted for better target detection, and the adjusted image is used as the image to be tested.
  • the adjustment method for adjusting the image may include adjusting the resolution of the image, that is, the resolution of the original image may be adjusted to a reference resolution preset according to actual requirements.
  • the preset reference resolution may be one. In this case, the resolutions of all the images to be measured are unified as the reference resolution.
  • the resolution of the image to be tested may be set according to the computing capability of the terminal, that is, when the target detection is performed on terminals with different computing capabilities, the resolution of the image to be tested may be different.
  • there may be more than one preset reference resolution and a matching relationship between each reference resolution and each terminal description information may be established in advance, and the terminal description information is used to characterize the computing capability of the terminal.
  • step S202 may include the steps of: acquiring an original image, acquiring terminal description information used to characterize the computing capability of the terminal, and adjusting the original image according to a reference resolution matching the terminal description information to obtain an image to be tested.
  • the terminal description information may be classified.
  • The terminal computing capability represented by different categories of terminal description information is different, and each category of terminal description information is matched with a reference resolution.
  • the terminal description information is divided into high-end terminal description information and low-end terminal description information.
  • the terminal computing capability represented by the high-end terminal description information is higher than the terminal computing capability represented by the low-end terminal description information.
  • The high-end terminal description information matches the first reference resolution, and the low-end terminal description information matches the second reference resolution; the first reference resolution can be higher than the second reference resolution.
  • For example, the first reference resolution is 512×512 and the second reference resolution is 300×300. It can be understood that distinguishing the reference resolutions matched by different categories of terminal description information can improve the accuracy of target detection on high-end terminals and the real-time performance of target detection on low-end terminals.
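  • As an illustration of this matching, the following sketch (assuming Python with Pillow; the tier labels, the mapping structure, and the file name are illustrative, not from this application) resizes an original image to the reference resolution matched to the terminal's capability:

```python
# Illustrative sketch (not from the patent): choosing a reference resolution
# by terminal capability and resizing the original image accordingly.
from PIL import Image

REFERENCE_RESOLUTIONS = {
    "high_end": (512, 512),  # first reference resolution
    "low_end": (300, 300),   # second reference resolution
}

def prepare_test_image(original: Image.Image, terminal_tier: str) -> Image.Image:
    """Resize the original image to the reference resolution matched to the
    terminal description information (modeled here as a simple tier label)."""
    width_height = REFERENCE_RESOLUTIONS[terminal_tier]
    return original.resize(width_height, Image.BILINEAR)

# usage (hypothetical file name): prepare_test_image(Image.open("photo.jpg"), "low_end")
```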
  • the adjustment method for adjusting the image may be determined according to actual needs, and is not limited to adjusting the resolution of the image.
  • it may also include adjusting image attributes such as contrast, exposure, and color of the image.
  • the first image feature and the second image feature both correspond to the image to be tested, and both can be used to reflect the image characteristics of the image to be tested.
  • the first image feature is an image feature that requires classification and regression
  • the second image feature is an image feature that requires hole convolution processing.
  • the number of first image features may be an integer equal to or greater than one.
  • Each first image feature may have a different spatial scale; for example, two first image features are extracted, one with a spatial scale of 19×19 and the other with a spatial scale of 10×10.
  • the number of second image features may be an integer equal to or greater than one. When there are more than one second image feature, each second image feature may also have a different spatial scale.
  • the first image feature and the second image feature corresponding to the image to be tested may be extracted through a predetermined neural network.
  • both the first image feature and the second image feature may be a feature map, and the data form may be a vector.
  • The predetermined neural network is a neural network that is pre-trained on sample images in which the position of the target object has been calibrated, and it has the ability to complete target detection.
  • a large number of identification code sample images can be obtained.
  • The identification code sample image includes the identification code as the target object, and the location of the identification code in the identification code sample image is calibrated; model training can then be performed based on a large number of identification code sample images to obtain the predetermined neural network.
  • The predetermined neural network can realize end-to-end learning; that is, the image to be tested can be directly input into the predetermined neural network, and the predetermined neural network directly outputs the prediction parameters for predicting the position of the target object in the image to be tested, namely the candidate position parameters corresponding to the target object in the image to be measured and the confidence levels corresponding to the candidate position parameters.
  • S206 Perform hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested.
  • Hole convolution, also known as dilated convolution or atrous convolution, is a type of convolution that injects holes between the elements of the convolution kernel.
  • Hole convolution introduces a hyperparameter called the “dilation rate”, which defines the spacing between the kernel values when the convolution kernel processes data.
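  • For concreteness, the following minimal sketch (assuming PyTorch; the channel count and the 10×10 spatial scale are illustrative, not values from this application) shows a hole convolution layer with a dilation rate of 2, which keeps the spatial scale unchanged while enlarging the receptive field:

```python
# Minimal sketch of a hole (dilated) convolution in PyTorch.
import torch
import torch.nn as nn

x = torch.randn(1, 256, 10, 10)  # e.g. a 10x10 second image feature

# dilation=2 inserts one hole between kernel elements, so a 3x3 kernel
# covers a 5x5 area; padding=2 keeps the 10x10 spatial scale unchanged.
hole_conv = nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2)

y = hole_conv(x)
print(y.shape)  # torch.Size([1, 256, 10, 10])
```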
  • the third image feature is an image feature obtained by performing hole convolution processing according to the second image feature. Similar to the first image feature and the second image feature, the third image feature can also be used to reflect the image characteristics of the image to be measured, which can also be a feature map.
  • The spatial scale of the third image feature may be the same as that of the second image feature. In addition, the number of third image features may also be an integer equal to or greater than one. When there is more than one third image feature, each third image feature may have the same spatial scale; for example, the spatial scale of the second image feature is 10×10, and after hole convolution processing is performed according to the second image feature, three third image features are obtained, whose spatial scales are all 10×10.
  • The receptive field is the size of the region on the original image to which a pixel on the feature map output by a hidden layer is mapped.
  • The larger the receptive field of a pixel, the larger the range of the original image it maps to, and the more global, higher-level semantic features it may contain.
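  • The growth of the receptive field can be made concrete with the standard recurrence r_l = r_{l-1} + (k_l − 1) · d_l · j_{l-1}, j_l = j_{l-1} · s_l (kernel k, stride s, dilation d, cumulative stride j); the layer lists in the sketch below are illustrative, not the network of this application:

```python
# Sketch of receptive-field growth through stacked convolution layers,
# using the standard recurrence:
#   r_l = r_{l-1} + (k_l - 1) * d_l * j_{l-1},   j_l = j_{l-1} * s_l

def receptive_field(layers):
    r, j = 1, 1
    for k, s, d in layers:  # (kernel, stride, dilation) per layer
        r += (k - 1) * d * j
        j *= s
    return r

# Three stacked 3x3 convolutions; dilating the last two (d=2, d=4)
# grows the receptive field from 7 to 15 without any extra parameters.
print(receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)]))  # 7
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # 15
```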
  • S208: Classification and regression are performed according to the first image feature and the third image feature, and a candidate position parameter corresponding to the target object in the image to be measured and a confidence level corresponding to the candidate position parameter are determined.
  • the candidate position parameter can be used to determine the candidate position where the target object is located in the image to be measured. Confidence is used to characterize the probability that the candidate position corresponding to the corresponding candidate position parameter is the position of the target object in the image to be measured. Among them, there are usually more than one candidate position parameter, and each candidate position parameter has its corresponding confidence.
  • classification and regression are performed according to the first image feature and the third image feature to determine the candidate position parameter corresponding to the target object in the image to be measured and the confidence level corresponding to the candidate position parameter.
  • For example, the classification and regression may be implemented by the multi-box prediction method of the SSD (Single Shot MultiBox Detector) target detection approach.
  • Each of the first image feature and the third image feature corresponds to several preselection boxes (i.e., bounding boxes).
  • the pre-selection frame is a rectangular frame used to predict the position of the target object in the image to be measured.
  • After classification and regression, each offset parameter (obtained by regression) corresponding to each preselection box and each confidence (obtained by classification) corresponding to each preselection box can be obtained.
  • The offset parameter corresponding to a preselection box is used to determine the location of the area mapped by the preselection box on the image to be tested, and the confidence corresponding to the preselection box is used to characterize the probability that the area mapped by the preselection box on the image to be tested covers the target object. Each offset parameter corresponding to each preselection box is a candidate position parameter corresponding to the target object in the image to be measured, and each confidence corresponding to each preselection box is the confidence corresponding to that candidate position parameter.
  • A four-dimensional parameter group can be used to describe the position of the preselection box on its corresponding image feature, and regression is then performed according to the four-dimensional parameter group corresponding to the preselection box to obtain the offset parameter corresponding to the preselection box.
  • The four-dimensional parameter group may include the abscissa (x) and the ordinate (y) of a position point, a width (w), and a height (h).
  • the position point is a position point of the preselection box, which may be a vertex of the preselection box, or a center point of the preselection box, and so on.
  • the width is the width of the preselection box
  • the height is the height of the preselection box.
  • For example, a four-dimensional parameter group for describing the position of the preselection box DB1 on the first image feature may include the abscissa of the upper-left vertex of the preselection box DB1 in the first image feature, the ordinate of that vertex, the width of the preselection box DB1, and the height of the preselection box DB1.
  • the position of the area mapped by the preselection box on the image to be measured can also be described by a four-dimensional parameter group.
  • the four-dimensional parameter group describing the location of the area mapped by the preselection box on the image to be measured may include the abscissa of a position point of the mapped area, the ordinate of the position point, the width of the mapped area, And the height of the mapped area.
  • a position point of the mapped area may be a vertex of the mapped area, or a center point of the mapped area, and so on.
  • each pixel on the first image feature may correspond to a predetermined number of pre-selection boxes, and the predetermined number may be set according to actual requirements.
  • a predetermined number of pre-selection boxes corresponding to the same pixel on the first image feature may have various aspect ratios and scales.
  • For example, the first image feature F11 includes 361 (19×19) pixels, each of which corresponds to 6 preselection boxes; the 6 preselection boxes can have various aspect ratios and sizes, so there are 2166 (361×6) preselection boxes on the first image feature F11.
  • each pixel on the third image feature may correspond to a predetermined number of pre-selection boxes, and the predetermined number may be set according to actual needs.
  • The predetermined number of preselection boxes corresponding to the same pixel on the third image feature may have various aspect ratios and sizes.
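  • The following sketch (SSD-style default box generation in Python; the scales and aspect ratios are assumptions, not values from this application) reproduces the 361 × 6 = 2166 preselection boxes mentioned above for a 19×19 feature map:

```python
# Illustrative sketch of SSD-style preselection (default) box generation.
import itertools

def default_boxes(fmap_size, scale, scale_next,
                  aspect_ratios=(1.0, 2.0, 0.5, 3.0, 1 / 3)):
    """Return (cx, cy, w, h) boxes, normalized to [0, 1], for every pixel
    of an fmap_size x fmap_size feature map."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * ar ** 0.5, scale / ar ** 0.5))
        # a sixth box at aspect ratio 1 with an intermediate scale, as in SSD
        s = (scale * scale_next) ** 0.5
        boxes.append((cx, cy, s, s))
    return boxes

print(len(default_boxes(19, scale=0.2, scale_next=0.35)))  # 2166
```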
  • S210 Filter out valid position parameters from each candidate position parameter according to the confidence degree, and determine the position of the target object in the image to be measured according to the valid position parameters.
  • the effective position parameter is a candidate position parameter satisfying a predetermined screening condition.
  • the predetermined screening condition may be set in advance according to actual needs.
  • For example, the predetermined screening condition may include that the confidence level corresponding to the candidate position parameter is greater than a predetermined confidence threshold; that is, a candidate position parameter whose corresponding confidence level is greater than the predetermined confidence threshold is used as an effective position parameter. The predetermined screening condition may also include that the confidence level corresponding to the candidate position parameter is the maximum; that is, the candidate position parameter whose corresponding confidence level is the largest among all confidence levels is used as the effective position parameter.
  • The effective position parameter has a corresponding preselection box (hereinafter referred to as the effective preselection box), and the position of the area to which the effective preselection box is mapped on the image to be tested is the position of the target object in the image to be measured.
  • Decoding and conversion according to the effective position parameters can obtain a four-dimensional parameter group, which is used to describe the position of the area mapped by the effective preselection box on the image to be measured, that is, the position where the target object is located in the image to be measured.
  • the four-dimensional parameter group may include an abscissa of a position point of the target object, an ordinate of the position point, a width of the target object, and a height of the target object.
  • a position point of the target object may be a vertex of the target object, or a center point of the target object, and so on.
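  • A minimal sketch of this screening and decoding step follows (Python; the SSD-style decoding formula with variance terms is common practice and is an assumption here, not a formula given by this application):

```python
# Minimal sketch of screening by confidence and decoding the effective
# position parameter into a four-dimensional parameter group (cx, cy, w, h).
import math

def decode(box, offsets, variances=(0.1, 0.2)):
    d_cx, d_cy, d_w, d_h = box      # effective preselection box
    t_cx, t_cy, t_w, t_h = offsets  # regressed offset parameters
    cx = d_cx + t_cx * variances[0] * d_w
    cy = d_cy + t_cy * variances[0] * d_h
    w = d_w * math.exp(t_w * variances[1])
    h = d_h * math.exp(t_h * variances[1])
    return cx, cy, w, h

def detect(boxes, offsets, confidences, threshold=0.5):
    """Keep the candidate with the maximum confidence above the threshold."""
    best = max(range(len(boxes)), key=lambda i: confidences[i])
    if confidences[best] < threshold:
        return None  # no target object found
    return decode(boxes[best], offsets[best])
```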
  • The above object detection method extracts the first image feature and the second image feature corresponding to the image to be tested, performs hole convolution according to the second image feature to obtain the third image feature corresponding to the image to be tested, then performs classification and regression according to the first image feature and the third image feature, and determines the position of the target object in the image to be tested according to the classification and regression results.
  • automatically extracting the image features corresponding to the image to be tested, and performing classification and regression based on the extracted image features can effectively improve the robustness of the detection and reduce the detection time.
  • the receptive field is effectively enlarged through the hole convolution processing, which can better adapt to the detection of target objects of different sizes.
  • The recall rate for smaller target objects is also improved.
  • In an embodiment, the first image feature and the second image feature corresponding to the image to be tested are extracted and output through the basic network in the predetermined neural network; hole convolution is performed according to the second image feature through the hole convolution network in the predetermined neural network to obtain and output the third image feature corresponding to the image to be tested; and classification and regression are performed according to the first image feature and the third image feature through the output network in the predetermined neural network to determine the candidate position parameters corresponding to the target object in the image to be tested and the confidences corresponding to the candidate position parameters.
  • the predetermined neural network may include a basic network, a hole convolutional network, and an output network.
  • The image to be tested is input into the predetermined neural network at the input end of the basic network; the output of the basic network is connected to the input of the output network on the one hand and the input of the hole convolution network on the other hand; the output of the hole convolution network is connected to the input end of the output network; and the output end of the output network is used to output each candidate position parameter corresponding to the target object in the image to be measured and each confidence level corresponding to each candidate position parameter.
  • the basic network may have multiple output terminals.
  • The output terminals of the basic network connected to the output network and those connected to the hole convolution network may be completely the same, completely different, or partially the same.
  • the basic network is a network that can be used for feature extraction.
  • The network framework of the basic network can be directly selected from existing network frameworks with feature extraction functions, such as VGG-16 (VGG-Very-Deep-16 CNN), or an existing network framework can be reconstructed.
  • the first image feature is an image feature obtained by the basic network according to the image to be tested and used for output to the output network.
  • the second image feature is an image feature obtained by the basic network according to the image to be measured and used for output to the hole convolutional network.
  • the first image feature output from the basic network to the output network and the second image feature output from the basic network to the hole convolution network may be completely the same, may be completely different, or may be partially the same.
  • the basic network outputs two different first image features and one second image feature, and the second image feature is the same as one of the first image features.
  • the third image feature is an image feature obtained by the hole convolutional network according to the second image feature and used for output to the output network.
  • the hole convolution network is a network for feature extraction through hole convolution.
  • the hole convolutional network may be formed by stacking hole convolutional layers.
  • the output network can be used to perform regression processing according to the first image feature and the third image feature, thereby determining each candidate position parameter corresponding to the target object in the image to be measured.
  • the output network may also be used to perform classification processing according to the first image feature and the third image feature, so as to determine each confidence level corresponding to each candidate position parameter.
  • The output network can be used to perform regression processing and classification processing on the preselection boxes on the first image feature and the third image feature, so as to obtain the offset parameters and confidences corresponding to each preselection box; that is, for any preselection box, the output network will output the offset parameter corresponding to the preselection box and its corresponding confidence.
  • the network framework of the output network can be implemented using any adapted network framework, as long as classification and regression functions can be implemented, which is not limited in this application.
  • In the predetermined neural network, the preselection boxes on image features output at earlier positions are smaller, and the preselection boxes on image features output at later positions are larger. That is to say, in the predetermined neural network, the image features output at earlier positions are used to detect small-sized target objects, and the image features output at later positions are used to detect large-sized target objects.
  • the basic network outputs the first image feature F11 and the first image feature F12 to the output network
  • The hole convolution network outputs the third image feature F31, the third image feature F32, and the third image feature F33 to the output network.
  • Sorting the image features by output position from front to back gives: first image feature F11, first image feature F12, third image feature F31, third image feature F32, and third image feature F33. In this order, the size of the preselection boxes on the image features gradually increases, and the size of the target objects they are responsible for detecting gradually increases; for example, the size of the preselection boxes on the first image feature F12 is smaller than the size of the preselection boxes on the third image feature F31, and the size of the target objects detected by the first image feature F12 is smaller than the size of the target objects detected by the third image feature F31.
  • In an embodiment, the step of extracting and outputting the first image feature and the second image feature corresponding to the image to be tested through the basic network in the predetermined neural network may include the following steps: through the primary feature extraction network in the basic network, the image to be tested is subjected to convolution processing and pooling processing in order, and the first intermediate feature corresponding to the image to be tested is output; through the residual network in the basic network, feature extraction is performed according to the first intermediate feature, and the extracted first image feature and second image feature corresponding to the image to be tested are output.
  • the basic network in the predetermined neural network includes a primary feature extraction network and a residual network.
  • the primary feature extraction network is a network for performing feature extraction on an image to be measured.
  • The residual network (Residual Network, ResNet) is a network that adds directly connected edges to the non-linear convolution layers; it can be used for further feature extraction on the output of the primary feature extraction network. The feature extraction method of the residual network corresponds to its internal structure, and different internal structures extract features differently.
  • the primary feature extraction network may include a convolutional layer and a pooling layer.
  • the convolution layer can be used to perform convolution processing to obtain image features.
  • the pooling layer (Pooling) can be used to perform dimensionality reduction processing on image features.
  • the pooling layer usually includes two forms of mean pooling (Mean Pooling) and max pooling (Max Pooling).
  • The convolutional layer in the primary feature extraction network may be a 3×3 ordinary convolution layer, and the pooling layer may be a 3×3 maximum pooling layer, where 3×3 represents the size of the convolution kernel.
  • The first intermediate feature is the image feature obtained after the image to be tested sequentially undergoes convolution processing by the convolutional layer and dimensionality reduction processing by the pooling layer in the primary feature extraction network.
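  • A minimal sketch of such a primary feature extraction network follows (assuming PyTorch; the channel count, strides, and the BN/ReLU placement are assumptions, not specified by this application):

```python
# Minimal sketch of the primary feature extraction network: a 3x3 ordinary
# convolution layer followed by a 3x3 maximum pooling layer.
import torch.nn as nn

primary = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),  # 3x3 convolution
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),      # 3x3 max pooling
)
# its output is the first intermediate feature
```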
  • The residual network performs feature extraction according to its input information, obtains the first image feature corresponding to the image to be tested and outputs it to the output network of the predetermined neural network, and obtains the second image feature corresponding to the image to be tested and outputs it to the hole convolutional network of the predetermined neural network.
  • For the residual network at the forefront of the basic network, the input information is the output of the primary feature extraction network (i.e., the first intermediate feature); for a residual network not at the forefront, the input information is the output of the previous residual network.
  • using the residual network to build a basic network can effectively reduce the amount of parameters and calculations on the one hand, and facilitate the rapid convergence of the network on the other hand, and can effectively solve the problem of difficult training of deep networks.
  • In an embodiment, the step of performing feature extraction according to the first intermediate feature through the residual networks in the basic network and outputting the extracted first image feature and second image feature corresponding to the image to be measured may include the following steps: the first intermediate feature sequentially passes through each residual network in the basic network for feature extraction; the first target residual network outputs the first image feature corresponding to the image to be tested, and the second target residual network outputs the second image feature corresponding to the image to be tested.
  • the first target residual network can be used to output the first image feature to an output network of a predetermined neural network.
  • the first target residual network is selected from each residual network included in the basic network.
  • the first target residual network may include one or more residual networks specified in advance among the residual networks of the basic network.
  • The second target residual network can be used to output the second image feature to the hole convolutional network of the predetermined neural network. Similarly, the second target residual network is also selected from the residual networks included in the basic network.
  • the second target residual network may include one or more residual networks specified in each residual network of the base network in advance.
  • the number of residual networks included in the first target residual network can be as large as possible to cover the first image features of different spatial scales, thereby improving the performance of target detection.
  • The second target residual network generally includes the residual network located at the very end of the basic network.
  • the residual network included in the first target residual network and the residual network included in the second target residual network may be completely the same, may be completely different, or may be partially the same.
  • the basic network of the predetermined neural network includes a primary feature extraction network, a residual network RN1, a residual network RN2, and a residual network RN3, and the four are sequentially connected.
  • the first target residual network may include a residual network RN2 and a residual network RN3, and the second target residual network may include a residual network RN3.
  • After the image to be measured is input into the basic network, the primary feature extraction network first performs convolution processing and pooling processing on it; then the residual network RN1 performs feature extraction processing on the output of the primary feature extraction network, the residual network RN2 performs feature extraction processing on the output of the residual network RN1, and the residual network RN3 performs feature extraction processing on the output of the residual network RN2.
  • the output result of the residual network RN2 and the output result of the residual network RN3 will be output as a first image feature to an output network of a predetermined neural network, and the output result of the residual network RN3 will be output as a second image feature to a predetermined Hole convolutional network for neural networks.
  • In an embodiment, the step of performing feature extraction according to the first intermediate feature through the residual network in the basic network and outputting the extracted first image feature and second image feature corresponding to the image to be measured may include the following steps: downsampling the first intermediate feature through the downsampling module in the residual network to obtain and output the second intermediate feature; and mapping, through the first residual block in the residual network, the second intermediate feature to the first image feature and second image feature corresponding to the image to be tested.
  • the residual network includes a downsampling module and a first residual block.
  • the downsampling module is used to implement similar functions to the pooling layer, that is, to perform dimension reduction processing on image features.
  • The downsampling module may include a 1×1 ordinary convolution layer, a normalization layer (Batch Normalization, BN), an activation layer (Rectified Linear Unit, ReLU), a 3×3 ordinary convolution layer, a normalization layer, a 1×1 ordinary convolution layer, and a normalization layer, connected in sequence.
  • the second intermediate feature is an image feature obtained by the downsampling module in the residual network after downsampling the input information of the residual network.
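  • The following sketch (assuming PyTorch) assembles the downsampling module as reconstructed above; the channel widths and placing the stride-2 on the 3×3 convolution are assumptions:

```python
# Sketch of the downsampling module: 1x1 conv, BN, ReLU, 3x3 conv, BN,
# 1x1 conv, BN, connected in sequence.
import torch.nn as nn

def downsampling_module(in_ch, mid_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=1),
        nn.BatchNorm2d(mid_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=2, padding=1),  # halves H and W
        nn.BatchNorm2d(mid_ch),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),
        nn.BatchNorm2d(out_ch),
    )
```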
  • the residual block is the basic block of the residual network.
  • the residual block usually includes a residual branch and a short-circuit branch.
  • The residual branch is used to non-linearly transform the input information of the residual block, and the short-circuit branch is used to perform identity transformation or linear transformation on the input information of the residual block.
  • the first residual block is a residual block in the base network.
  • The first residual block can be an existing residual block, such as the conventional residual block shown in FIG. 5 or the bottleneck residual block shown in FIG. 6, or it can be obtained by transforming an existing residual block.
  • The way the first residual block maps the second intermediate feature to the first image feature and the second image feature corresponding to the image to be tested corresponds to the internal structure of the first residual block; the mapping methods under different internal structures can differ.
  • For example, taking the conventional residual block shown in FIG. 5: on the residual branch, the second intermediate feature is sequentially subjected to convolution processing by a 3×3 ordinary convolution layer, normalization processing by a normalization layer, non-linear transformation processing by an activation layer, convolution processing by another 3×3 ordinary convolution layer, and normalization processing by a normalization layer; on the short-circuit branch, the second intermediate feature is identity mapped; then, the operation results of the residual branch and the short-circuit branch are synthesized, and the synthesis result is non-linearly transformed through the activation layer to obtain the output result of the first residual block.
  • When the first residual block is the first target residual block, the output result of the first residual block is the first image feature corresponding to the image to be measured; when the first residual block is the second target residual block, the output result of the first residual block is the second image feature corresponding to the image to be tested.
  • each first residual block is connected sequentially.
  • In an embodiment, the step in which the first intermediate feature sequentially passes through each residual network in the basic network for feature extraction, the first target residual network outputs the first image feature corresponding to the image to be tested, and the second target residual network outputs the second image feature corresponding to the image to be tested may include the following steps: the first intermediate feature sequentially passes through the first residual blocks in each residual network for feature extraction; the first target residual block in the first target residual network outputs the first image feature corresponding to the image to be tested, and the second target residual block in the second target residual network outputs the second image feature corresponding to the image to be tested.
  • the first target residual block may be used to output a first image feature corresponding to the image to be tested to an output network of a predetermined neural network.
  • the first target residual block is selected from each first residual block in the first target residual network.
  • the first target residual block may include one or more first residual blocks specified in advance in each first residual block included in the first target residual network.
  • the second target residual block may be used to output a second image feature corresponding to the image to be tested to an output network of a predetermined neural network. Similarly, the second target residual block is selected from each first residual block in the second target residual network.
  • the second target residual block may include one or more first residual blocks specified in the first residual blocks included in the second target residual network in advance.
  • The first target residual block may include the first residual block located at the end of the first target residual network, because the output result of that block has passed through the most convolution layers in the first target residual network; outputting it as the first image feature to the output network of the predetermined neural network can therefore improve the performance of target detection.
  • Similarly, the second target residual block may include the first residual block located at the very end of the second target residual network.
  • For example, the residual network RN3 includes four sequentially connected first residual blocks: RB1, RB2, RB3, and RB4. It is assumed that the first target residual block includes the first residual block RB4 and that the second target residual block also includes the first residual block RB4.
  • After the second intermediate feature is input into the residual network RN3, feature extraction is first performed on it by the first residual block RB1; then the first residual block RB2 performs feature extraction on the output result of the first residual block RB1, the first residual block RB3 performs feature extraction on the output result of the first residual block RB2, and the first residual block RB4 performs feature extraction on the output result of the first residual block RB3.
  • the output result of the first residual block RB4 will be output to the output network of the predetermined neural network as the first image feature on the one hand, and will be output to the hole convolutional network of the predetermined neural network as the second image feature on the other hand.
  • In an embodiment, the step of mapping the second intermediate feature, through the first residual block in the residual network, to the first image feature and the second image feature corresponding to the image to be tested and outputting them may include the following steps: performing depthwise separable convolution according to the second intermediate feature through the first residual block in the residual network to obtain a first feature component; identity-mapping the second intermediate feature to a second feature component; combining the first feature component and the second feature component to obtain a first target feature; and mapping the first target feature to the first image feature and the second image feature corresponding to the image to be measured and outputting them.
  • In this case, the first residual block is obtained by reconstructing an existing residual block, and the following reconstruction method may be adopted: in the existing residual block (such as the residual blocks shown in FIG. 5 and FIG. 6), the 3×3 ordinary convolution layer used for feature extraction is replaced with a depthwise separable convolution layer.
  • Depthwise separable convolution is a convolution method in which each channel is convolved with its own convolution kernel to obtain the output result of the corresponding channel, after which the channel information is fused.
  • Feature extraction using depthwise separable convolution can reduce the size of the basic network and increase the speed of the network.
  • Specifically, on the residual branch, feature extraction is performed on the second intermediate feature through the depthwise separable convolution layer to obtain the first feature component; on the short-circuit branch, identity mapping is performed on the second intermediate feature to obtain the second feature component; further, the first feature component and the second feature component are synthesized to obtain the first target feature.
  • combining the two feature components may be adding the two feature components.
  • In an embodiment, the step of performing depthwise separable convolution according to the second intermediate feature to obtain the first feature component may include the following steps: subjecting the second intermediate feature to dimension reduction, depthwise separable convolution, and dimension raising in sequence to obtain the first feature component.
  • In this case, the residual branch in the first residual block may include a dimension reduction layer, a depthwise separable convolution layer, and a dimension-raising layer, connected in sequence.
  • The dimension reduction layer is used to perform dimension reduction processing on the input information of the residual branch (that is, the second intermediate feature), thereby reducing the amount of parameters of the depthwise separable convolution layer.
  • The dimension-raising layer is used to perform dimension-raising processing on the output result of the depthwise separable convolution layer, thereby ensuring that the input and output of the residual branch have the same dimension.
  • For example, the dimension reduction layer may include a 1×1 ordinary convolution layer, a normalization layer, and an activation layer connected in sequence, and the dimension-raising layer may include a 1×1 ordinary convolution layer and a normalization layer connected in sequence.
  • the dimensionality reduction layer and the dimensionality improvement layer may also adopt other adapted network structures, which are not limited in this application.
  • In this case, after the second intermediate feature is input into the residual branch of the first residual block, it is first dimension-reduced by the dimension reduction layer, then the output result of the dimension reduction layer is convolved by the depthwise separable convolution layer, and the output result of the depthwise separable convolution layer is dimension-raised by the dimension-raising layer, thereby obtaining the first feature component.
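  • Putting the pieces together, a sketch of such a first residual block follows (assuming PyTorch; channel widths are assumptions, and the pointwise part of the depthwise separable convolution is merged into the dimension-raising layer here):

```python
# Sketch of a first residual block: dimension reduction (1x1 conv),
# depthwise separable convolution, and dimension raising (1x1 conv) on the
# residual branch, with an identity short-circuit branch.
import torch.nn as nn

class FirstResidualBlock(nn.Module):
    def __init__(self, channels, mid_channels):
        super().__init__()
        self.branch = nn.Sequential(
            # dimension reduction layer: 1x1 conv + BN + ReLU
            nn.Conv2d(channels, mid_channels, kernel_size=1),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            # depthwise 3x3 convolution (one kernel per channel)
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                      padding=1, groups=mid_channels),
            # pointwise fusion + dimension-raising layer: 1x1 conv + BN
            nn.Conv2d(mid_channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # short-circuit branch is an identity mapping; synthesis is addition
        return self.act(self.branch(x) + x)
```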
  • In an embodiment, the step of performing hole convolution processing according to the second image feature through the hole convolution network in the predetermined neural network to obtain the third image feature corresponding to the image to be tested may include the following steps: performing hole convolution according to the second image feature through the second residual block in the hole convolution network to obtain a third feature component; linearly mapping the second image feature to a fourth feature component; synthesizing the third feature component and the fourth feature component to obtain a second target feature; and mapping the second target feature to the third image feature corresponding to the image to be measured.
  • the second residual block is the residual block in the hole convolutional network.
  • The second residual block may be obtained by reconstructing an existing residual block, and the following reconstruction method may be adopted: in the existing residual block (such as the residual blocks shown in FIG. 5 and FIG. 6), the 3×3 ordinary convolution layer used for feature extraction is replaced with a hole convolution layer.
  • For the second image feature input into the second residual block: on the residual branch, feature extraction is performed according to the second image feature through the hole convolution layer to obtain the third feature component; on the short-circuit branch, the second image feature is linearly mapped to the fourth feature component; further, the third feature component and the fourth feature component are synthesized to obtain the second target feature; then, the second target feature is non-linearly transformed through the activation layer to obtain the output result of the second residual block (i.e., the third image feature), and the third image feature is output to the output network of the predetermined neural network.
  • an additional convolution layer for feature extraction is provided on the short-circuit branch of the second residual block.
  • the additional convolution layer may include a 1×1 ordinary convolution layer and a normalization layer, connected in sequence. Accordingly, on the short-circuit branch, the input information of the second residual block is convolved by the 1×1 ordinary convolution layer, and the output of the 1×1 ordinary convolution layer is then normalized by the normalization layer to obtain the fourth feature component.
  • there may be more than one second residual block in the hole convolution network, and the second residual blocks are connected in sequence.
  • accordingly, the second image feature output by the basic network passes sequentially through each second residual block in the hole convolution network for feature extraction, and the output of each second residual block is used as a third image feature and output to the output network of the predetermined neural network.
  • the step of performing hole convolution according to the second image feature to obtain the third feature component may include the following step: sequentially performing dimensionality reduction, hole convolution, and dimensionality increase on the second image feature to obtain the third feature component.
  • the residual branch in the second residual block includes a dimensionality reduction layer, a hole convolution layer, and a dimensionality-increasing layer, connected in sequence.
  • on the residual branch of the second residual block, the input information of the second residual block is first reduced in dimensionality by the dimensionality reduction layer, feature extraction is then performed on the output of the dimensionality reduction layer by the hole convolution layer, and the output of the hole convolution layer is finally raised in dimensionality by the dimensionality-increasing layer, thereby obtaining the third feature component.
  • the input information of the frontmost second residual block in the hole convolution network is the second image feature; for a second residual block that is not the frontmost, the input information is the output of the preceding residual block.
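Below is a hedged PyTorch sketch of the second residual block described above: a residual branch of 1×1 reduction, 3×3 hole (dilated) convolution, and 1×1 increase, with a 1×1 convolution plus normalization on the short-circuit branch. The dilation rate and channel counts are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SecondResidualBlock(nn.Module):
    """Sketch of the second residual block: the residual branch is
    1x1 reduce -> 3x3 hole (dilated) conv -> 1x1 increase, and the
    short-circuit branch is a 1x1 conv + BN linear mapping."""

    def __init__(self, in_ch, out_ch, reduced, dilation=2):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, reduced, 1, bias=False),        # reduce
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            # hole convolution: padding=dilation keeps the spatial scale
            nn.Conv2d(reduced, reduced, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, out_ch, 1, bias=False),       # increase
            nn.BatchNorm2d(out_ch),
        )
        # additional 1x1 conv + BN on the short-circuit branch
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # synthesize the two branches, then nonlinear transform
        return self.relu(self.branch(x) + self.shortcut(x))
```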
  • a predetermined neural network includes a basic network, a hole convolutional network, and an output network.
  • the basic network includes a primary feature extraction network, a first residual network, a second residual network, and a third residual network, and the four are sequentially connected.
  • the primary feature extraction network includes a 3 ⁇ 3 ordinary convolution layer and a 3 ⁇ 3 maximum pooling layer, and the two are connected in sequence.
  • the first residual network includes a downsampling module and three first residual blocks, the four connected in sequence;
  • the second residual network includes a downsampling module and seven first residual blocks, the eight connected in sequence;
  • the third residual network includes a downsampling module and three first residual blocks, the four connected in sequence.
  • the hole convolutional network includes three sequentially connected second residual blocks.
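As a rough sketch of how the FIG. 8 architecture could be assembled, the snippet below reuses the FirstResidualBlock and SecondResidualBlock sketches given earlier. All channel widths and stride placements are assumptions — the patent does not disclose them; with a 300×300 input, the chosen strides reproduce the 19×19 and 10×10 feature-map scales mentioned in the description.

```python
import torch.nn as nn

def downsample(in_ch, out_ch):
    # Sketch of the FIG. 4 downsampling module; placing stride 2 on the
    # 3x3 conv is an assumption.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.Conv2d(out_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

def residual_network(in_ch, out_ch, n_blocks):
    layers = [downsample(in_ch, out_ch)]
    layers += [FirstResidualBlock(out_ch, out_ch // 2)
               for _ in range(n_blocks)]
    return nn.Sequential(*layers)

# primary feature extraction: 3x3 conv then 3x3 max pooling
primary = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
    nn.MaxPool2d(3, stride=2, padding=1),
)
rn1 = residual_network(32, 64, 3)    # three first residual blocks
rn2 = residual_network(64, 128, 7)   # seven first residual blocks -> 19x19
rn3 = residual_network(128, 256, 3)  # three first residual blocks -> 10x10
dilated = nn.Sequential(             # three second residual blocks
    SecondResidualBlock(256, 256, 64),
    SecondResidualBlock(256, 256, 64),
    SecondResidualBlock(256, 256, 64),
)
```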
  • a target detection method implemented according to a predetermined neural network shown in FIG. 8 is provided.
  • the method may include the following steps S902 to S922.
  • S904: the image to be tested is input into the primary feature extraction network, where it is first convolved by the 3×3 ordinary convolution layer and then reduced in dimension by the 3×3 maximum pooling layer.
  • S920: classification and regression are performed through the output network according to the first image feature and the third image feature to determine the candidate position parameters corresponding to the target object in the image to be tested and the confidence levels corresponding to the candidate position parameters.
  • S922: valid position parameters are filtered from the candidate position parameters according to the confidence levels, and the position of the target object in the image to be tested is determined according to the valid position parameters.
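A small sketch of the filtering in S922: candidate position parameters are kept when their confidence exceeds a threshold. The threshold value is an assumption; as noted elsewhere in the application, taking only the maximum-confidence candidate is an alternative filtering condition.

```python
import torch

def select_targets(candidate_params, confidences, conf_thresh=0.5):
    """Keep candidate position parameters whose confidence exceeds
    a predetermined threshold (threshold value is illustrative)."""
    keep = confidences > conf_thresh
    return candidate_params[keep], confidences[keep]

# candidate_params: (N, 4) offsets (x, y, w, h), one per prior box;
# confidences: (N,) score per prior box
candidate_params = torch.rand(2166, 4)
confidences = torch.rand(2166)
valid_params, valid_conf = select_targets(candidate_params, confidences)
```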
  • the number of channels of each layer can be scaled uniformly as needed, that is, the width of the network can be adjusted dynamically, so that network accuracy and speed can be traded off flexibly. In actual experiments, a smaller network width coefficient was selected; the basic network of the predetermined neural network pre-trained on ImageNet (an image sample data set) was 3M in size, and its Top-1 accuracy reached 56%.
  • any run of three consecutive layers consisting of a CONV layer (convolution layer), a BN layer (normalization layer), and a Scale layer (linear transformation layer) in the predetermined neural network can be merged into a single CONV layer, thereby reducing the network volume and increasing its speed. In experiments, this fusion reduced the network volume by about 5% and increased speed by 5%-10%.
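The CONV+BN(+Scale) fusion can be illustrated as below. In a caffe model the BN and Scale layers together apply the affine normalization that PyTorch's BatchNorm2d applies in one layer, so folding that affine transform into the convolution weights is the equivalent operation; this sketch is illustrative, not the exact tooling used.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold the BN affine transform into the convolution so inference
    needs only one CONV layer:
        w' = w * gamma / sqrt(var + eps)
        b' = (b - mean) * gamma / sqrt(var + eps) + beta"""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride, conv.padding,
                      conv.dilation, conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```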
  • when the predetermined neural network is trained with PyTorch on a server, it may be converted into a caffe model in order to deploy it to a mobile terminal.
  • the conversion tool that comes with the NCNN framework (Tencent's open-source deep-learning forward framework) can then be used to convert the caffe model into an NCNN model, converting the format of the model parameters in the process.
  • in experiments, the model parameters could be quantized to 16 bits, and through the above simplification and compression operations the model size was reduced from 2.1M to 960K.
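A toy illustration of the 16-bit parameter quantization: storing weights in fp16 roughly halves model size. The real pipeline uses the NCNN conversion tooling; this snippet only shows the storage effect.

```python
import torch

# fp32 parameters take 4 bytes each; fp16 parameters take 2 bytes each.
model = torch.nn.Conv2d(3, 32, 3)
fp32_bytes = sum(p.numel() * 4 for p in model.parameters())
model_fp16 = model.half()
fp16_bytes = sum(p.numel() * 2 for p in model_fp16.parameters())
print(fp32_bytes, "->", fp16_bytes)  # roughly 2x smaller
```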
  • the target detection method provided by the embodiments of the present application can be applied to an identification-code detection scenario, that is, the target object is an identification code.
  • when the terminal obtains the image to be tested, it first determines the position of the identification code in the image using the target detection method provided by any embodiment of the present application, and then recognizes the identification code according to its position in the image.
  • the target detection method also supports the application scenario of multiple codes in one image.
  • when the image to be tested contains more than one identification code, the offset parameters are filtered according to their corresponding confidence levels, the target objects in the image are determined from the valid offset parameters obtained by filtering, and the number of determined target-object positions matches the number of identification codes in the image.
  • when the target detection method in this application is used in the identification-code detection process, a comparison of the average single-frame time and the decoding success rate against other existing target detection schemes is shown in FIG. 10. As can be seen from the figure, the target detection method in this application can detect multiple identification codes of different sizes and angles effectively and in real time; it achieves good precision and recall while keeping mobile-side runtime low, giving strong overall performance.
  • a target detection device 1100 is provided, which may include the following modules 1102 to 1110.
  • the image-to-be-tested acquisition module 1102 is configured to acquire the image to be tested.
  • An image feature acquisition module 1104 is configured to extract a first image feature and a second image feature corresponding to an image to be measured.
  • a hole convolution processing module 1106 is configured to perform hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested.
  • the candidate parameter acquisition module 1108 is configured to perform classification and regression according to the first image feature and the third image feature, and determine the candidate position parameters corresponding to the target object in the image to be tested and the confidence levels corresponding to the candidate position parameters.
  • the target position determination module 1110 is configured to filter valid position parameters from the candidate position parameters according to the confidence levels, and determine the position of the target object in the image to be tested according to the valid position parameters.
  • the above target detection device extracts a first image feature and a second image feature corresponding to the image to be tested, performs hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested, then performs classification and regression according to the first image feature and the third image feature, and determines the position of the target object in the image to be tested according to the classification and regression results.
  • automatically extracting the image features corresponding to the image to be tested, and performing classification and regression based on the extracted image features can effectively improve the robustness of the detection and reduce the detection time.
  • the receptive field is effectively enlarged through the hole convolution processing, which can better adapt to the detection of target objects of different sizes.
  • the image feature acquisition module 1104 is configured to extract and output the first image feature and the second image feature corresponding to the image to be tested through the basic network in the predetermined neural network;
  • the hole convolution processing module 1106 is configured to perform hole convolution according to the second image feature through the hole convolution network in the predetermined neural network, to obtain and output the third image feature corresponding to the image to be tested;
  • the candidate parameter acquisition module 1108 is configured to perform classification and regression according to the first image feature and the third image feature through the output network in the predetermined neural network, to determine the candidate position parameters corresponding to the target object in the image to be tested and the confidence levels corresponding to the candidate position parameters.
  • the image feature acquisition module 1104 may include the following units: a first intermediate feature output unit, configured to sequentially perform convolution processing and pooling processing on the image to be tested through the primary feature extraction network in the basic network, and output the first intermediate feature corresponding to the image to be tested; and an image feature acquisition unit, configured to perform feature extraction according to the first intermediate feature through the residual network in the basic network, and output the extracted first image feature and second image feature.
  • the image feature acquisition unit may include the following subunits: a downsampling subunit, configured to downsample the first intermediate feature through the downsampling module in the residual network to obtain and output the second intermediate feature; and a residual processing subunit, configured to map the second intermediate feature to the first image feature and the second image feature corresponding to the image to be tested through the first residual block in the residual network, and output the first image feature and the second image feature.
  • the residual processing subunit may be further configured to: perform depthwise separable convolution according to the second intermediate feature through the first residual block in the residual network to obtain the first feature component; identity-map the second intermediate feature to the second feature component; synthesize the first feature component and the second feature component to obtain the first target feature; and map the first target feature to the first image feature and the second image feature corresponding to the image to be tested and output them.
  • the residual processing subunit may be further configured to sequentially perform dimensionality reduction, depthwise separable convolution, and dimensionality increase on the second intermediate feature to obtain the first feature component.
  • the image feature acquisition unit may be further configured to: pass the first intermediate feature sequentially through each residual network in the basic network for feature extraction, output the first image feature corresponding to the image to be tested through the first target residual network, and output the second image feature corresponding to the image to be tested through the second target residual network; wherein the first target residual network and the second target residual network are both selected from the residual networks in the basic network.
  • the image feature acquisition unit may be configured to: pass the first intermediate feature sequentially through the first residual blocks in each residual network for feature extraction, output the first image feature corresponding to the image to be tested through the first target residual block in the first target residual network, and output the second image feature corresponding to the image to be tested through the second target residual block in the second target residual network; wherein the first target residual block is selected from the first residual blocks in the first target residual network, and the second target residual block is selected from the first residual blocks in the second target residual network.
  • the hole convolution processing module 1106 may include the following units: a hole convolution processing unit, configured to perform hole convolution according to the second image feature through the second residual block in the hole convolution network to obtain the third feature component; a linear mapping unit, configured to linearly map the second image feature to the fourth feature component; a feature synthesis unit, configured to synthesize the third feature component and the fourth feature component to obtain the second target feature; and a feature mapping unit, configured to map the second target feature to the third image feature corresponding to the image to be tested.
  • the hole convolution processing unit is further configured to sequentially perform dimensionality reduction, hole convolution, and dimensionality increase on the second image feature to obtain the third feature component.
  • the image-to-be-tested acquisition module 1102 may include the following units: an original image acquisition unit, configured to acquire an original image; a description information acquisition unit, configured to acquire terminal description information used to characterize the computing capability of the terminal; and a resolution adjustment unit, configured to adjust the original image according to the reference resolution matching the terminal description information to obtain the image to be tested.
  • the target object includes an identification code, and the identification code includes at least one of a two-dimensional code, a one-dimensional code, and an applet code.
  • each module in the target detection device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above modules may be embedded in hardware within, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and perform the operations corresponding to the above modules.
  • a computer device including a memory and a processor.
  • the memory stores a computer program
  • the processor executes the computer program to implement the steps in the target detection method provided by any embodiment of the present application.
  • the computer device may be the terminal 110 shown in FIG. 1, and the internal structure diagram may be shown in FIG. 12.
  • the computer equipment includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus.
  • the processor is used to provide computing and control capabilities.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium.
  • the computer program is executed by a processor to implement a target detection method.
  • the network interface is used to communicate with external terminals through a network connection.
  • the display may be a liquid crystal display or an electronic ink display.
  • the input device of the computer equipment may be a touch layer covered on a display screen, or a button, a trackball or a touchpad provided on the computer equipment housing, or an external keyboard, a touchpad or a mouse.
  • the computer device may be the server 120 shown in FIG. 1, and its internal structure diagram may be shown in FIG. 13.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor is used to provide computing and control capabilities.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium.
  • the database is used to store sample data for training the model.
  • the network interface is used to communicate with external terminals through a network connection.
  • the computer program is executed by a processor to implement a target detection method.
  • FIG. 12 and FIG. 13 are only block diagrams of some structures related to the solution of the application, and do not constitute a limitation on the computer equipment to which the solution of the application is applied.
  • Computer equipment may include more or fewer components than shown in the figures, or some components may be combined, or have a different arrangement of components.
  • the target detection device provided in each embodiment of the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in FIG. 12 or 13.
  • the memory of the computer device may store various program modules constituting the target detection device, for example, the image acquisition module 1102 shown in FIG. 11, the image feature acquisition module 1104, the hole convolution processing module 1106, the candidate parameter acquisition module 1108, And a target position determination module 1110.
  • the computer program constituted by each program module causes the processor to execute the steps in the target detection method of each embodiment of the present application described in this specification.
  • the computer device shown in FIG. 12 or FIG. 13 may perform step S202 through the image-to-be-tested acquisition module 1102 in the target detection device shown in FIG. 11, step S204 through the image feature acquisition module 1104, step S206 through the hole convolution processing module 1106, step S208 through the candidate parameter acquisition module 1108, step S210 through the target position determination module 1110, and so on.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the target detection method provided by any embodiment of the present application is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to a target detection method, apparatus, storage medium, and computer device. The method includes: acquiring an image to be tested; extracting a first image feature and a second image feature corresponding to the image to be tested; performing hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested; performing classification and regression according to the first image feature and the third image feature to determine candidate position parameters corresponding to a target object in the image to be tested and confidence levels corresponding to the candidate position parameters; and filtering valid position parameters from the candidate position parameters according to the confidence levels, and determining the position of the target object in the image to be tested according to the valid position parameters. The solution in this application improves the robustness of target detection and reduces time consumption.

Description

Target detection method and apparatus, computer-readable storage medium, and computer device
This application claims priority to Chinese Patent Application No. 201810974541.2, entitled "Target detection method and apparatus, computer-readable storage medium, and computer device", filed on August 24, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a target detection method and apparatus, a computer-readable storage medium, and a computer device.
Background
With the development of computer technology, people increasingly convey information through images. Target detection is an important branch of image processing; its purpose is to determine the position of a target object in an image.
The traditional way of target detection is to determine the position of the target object in the image by searching the image for anchor points on the target object. Taking a two-dimensional code as the target object as an example, the positioning marks placed at three vertices of the two-dimensional code are searched for in the image to determine the position of the two-dimensional code. However, traditional target detection is not robust and is time-consuming.
Summary
Accordingly, it is necessary to provide a target detection method and apparatus, a computer-readable storage medium, and a computer device, to solve the technical problems of poor robustness and long time consumption in the conventional art.
In one aspect, a target detection method is provided, applied to a computer device, the method including:
acquiring an image to be tested;
extracting a first image feature and a second image feature corresponding to the image to be tested;
performing hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested;
performing classification and regression according to the first image feature and the third image feature to determine candidate position parameters corresponding to a target object in the image to be tested and confidence levels corresponding to the candidate position parameters; and
filtering valid position parameters from the candidate position parameters according to the confidence levels, and determining the position of the target object in the image to be tested according to the valid position parameters.
In another aspect, a target detection apparatus is provided, the apparatus including:
an image-to-be-tested acquisition module, configured to acquire an image to be tested;
an image feature acquisition module, configured to extract a first image feature and a second image feature corresponding to the image to be tested;
a hole convolution processing module, configured to perform hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested;
a candidate parameter acquisition module, configured to perform classification and regression according to the first image feature and the third image feature to determine candidate position parameters corresponding to a target object in the image to be tested and confidence levels corresponding to the candidate position parameters; and
a target position determination module, configured to filter valid position parameters from the candidate position parameters according to the confidence levels, and determine the position of the target object in the image to be tested according to the valid position parameters.
In yet another aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the above target detection method.
In still another aspect, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of the above target detection method.
The above target detection method, apparatus, computer-readable storage medium, and computer device extract a first image feature and a second image feature corresponding to the image to be tested, perform hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested, then perform classification and regression according to the first image feature and the third image feature, and determine the position of the target object in the image to be tested according to the classification and regression results. Automatically extracting the image features of the image to be tested and performing classification and regression on them effectively improves detection robustness and reduces detection time. Moreover, the hole convolution processing effectively enlarges the receptive field, adapting better to the detection of target objects of different sizes.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of the application environment of a target detection method in one embodiment;
FIG. 2 is a schematic flowchart of a target detection method in one embodiment;
FIG. 3 is a structural block diagram of a predetermined neural network in one embodiment;
FIG. 4 is a structural block diagram of a downsampling module in one embodiment;
FIG. 5 is a structural block diagram of a residual block in one embodiment;
FIG. 6 is a structural block diagram of a residual block in one embodiment;
FIG. 7 is a structural block diagram of a second residual block in one embodiment;
FIG. 8 is a structural block diagram of a predetermined neural network in one embodiment;
FIG. 9 is a schematic flowchart of a target detection method in one embodiment;
FIG. 10 is a schematic comparison of metrics in identification-code detection in one embodiment;
FIG. 11 is a structural block diagram of a target detection apparatus in one embodiment;
FIG. 12 is a structural block diagram of a computer device in one embodiment;
FIG. 13 is a structural block diagram of a computer device in one embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
It should be noted that the terms "first", "second", and the like used in this application are used to distinguish similar objects by name, but the objects themselves are not limited by these terms. It should be understood that these terms are interchangeable where appropriate without departing from the scope of this application. For example, a "first image feature" may be described as a "second image feature", and similarly a "second image feature" may be described as a "first image feature".
In addition, the terms "include", "comprise", "have", and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product, or device.
The target detection method provided by the embodiments of this application can be applied in the application environment shown in FIG. 1. The application environment may involve a terminal 110 and a server 120, which may be connected through a network.
Model training may be completed on the server 120 to obtain a predetermined neural network with target detection capability, which is then deployed on the terminal 110. After obtaining an image to be tested, the terminal 110 inputs it into the predetermined neural network, extracts a first image feature and a second image feature corresponding to the image through the predetermined neural network, performs hole convolution according to the second image feature to obtain a third image feature, performs classification and regression according to the first and third image features to determine candidate position parameters corresponding to the target object in the image and the confidence levels corresponding to the candidate position parameters, then filters valid position parameters from the candidate position parameters according to the confidence levels, and determines the position of the target object according to the valid position parameters.
In other embodiments, the predetermined neural network may instead be deployed on the server 120; in that case, after obtaining the image to be tested, the terminal 110 may send it to the server 120, which completes the task from inputting the image into the predetermined neural network to determining the position of the target object. Model training may also be completed on the terminal 110; for example, the terminal 110 may independently perform both model training and the entire detection task without involving the server 120.
The terminal 110 may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, personal digital assistant, or wearable device. The server 120 may be implemented as an independent physical server or a server cluster composed of multiple physical servers. It can be understood that representing the terminal 110 as a mobile phone and the server 120 as an independent physical server in FIG. 1 is merely illustrative and does not limit the terminal 110 and the server 120.
In one embodiment, as shown in FIG. 2, a target detection method is provided. The method is described using its application to a computer device (such as the terminal 110 or server 120 in FIG. 1 above) as an example, and may include the following steps S202 to S210.
S202: acquire an image to be tested.
The image to be tested is the image on which target detection needs to be performed. The purpose of target detection is to determine where the target object is located in the image. The target object is essentially image content and can be preset according to actual needs. For example, the target object may be an identification code, a vehicle, a pedestrian, a face, or the like; the identification code may be a two-dimensional code, a one-dimensional code (also called a barcode), an applet code, a PDF (Portable Data File) 417 code, or the like, but neither the target object nor the identification code is limited to these.
In one embodiment, the image to be tested may be an unadjusted original image; that is, after obtaining the original image, the terminal does not adjust it but uses it directly as the image to be tested. In another embodiment, the image to be tested may also be an image obtained by adjusting the original image; that is, after obtaining the original image, the terminal adjusts it for better target detection and uses the adjusted image as the image to be tested.
The adjustment may include adjusting the resolution of the image, that is, adjusting the resolution of the original image to a reference resolution preset according to actual needs. In one embodiment, there may be a single preset reference resolution, in which case the resolution of all images to be tested is unified to this reference resolution.
In another embodiment, the resolution of the image to be tested may be set according to the computing capability of the terminal; that is, images of different resolutions may be used when performing target detection on terminals with different computing capabilities. Optionally, there may be more than one preset reference resolution, and a matching relationship between the reference resolutions and terminal description information may be established in advance, the terminal description information being used to characterize the computing capability of the terminal. In this case, step S202 may include the following steps: acquiring an original image, acquiring the terminal description information used to characterize the computing capability of the terminal, and adjusting the original image according to the reference resolution matching the terminal description information to obtain the image to be tested.
In one embodiment, the terminal description information may be classified, with different categories characterizing different terminal computing capabilities and each category matching a reference resolution. For example, the terminal description information is divided into high-end and low-end terminal description information; the computing capability characterized by the high-end information is higher than that characterized by the low-end information; the high-end information matches a first reference resolution and the low-end information matches a second reference resolution, the first being higher than the second, for example 512×512 versus 300×300. It can be understood that distinguishing the reference resolutions matched by different categories of terminal description information improves detection accuracy on high-end terminals and real-time performance on low-end terminals.
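A small sketch of this resolution selection, assuming the two categories and the 512×512 / 300×300 reference resolutions from the example above; the dictionary and function names are hypothetical.

```python
from PIL import Image

# Hypothetical mapping from terminal category to reference resolution,
# following the high-end / low-end example in the description.
REFERENCE_RESOLUTIONS = {"high_end": (512, 512), "low_end": (300, 300)}

def prepare_image(original: Image.Image, terminal_class: str) -> Image.Image:
    """Adjust the original image to the reference resolution matching
    the terminal description information."""
    target = REFERENCE_RESOLUTIONS[terminal_class]
    return original.resize(target)

image_to_test = prepare_image(Image.new("RGB", (1920, 1080)), "low_end")
```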
In addition, the way the image is adjusted can be determined according to actual needs and is not limited to adjusting the resolution; for example, it may also include adjusting image attributes such as contrast, exposure, and color.
S204: extract a first image feature and a second image feature corresponding to the image to be tested.
The first image feature and the second image feature both correspond to the image to be tested and can both reflect its characteristics. The first image feature is the image feature on which classification and regression need to be performed; the second image feature is the image feature on which hole convolution processing needs to be performed.
The number of first image features may be an integer equal to or greater than one. When there is more than one first image feature, the first image features may have different spatial scales; for example, of two extracted first image features, one may have a spatial scale of 19×19 and the other 10×10. Similarly, the number of second image features may be an integer equal to or greater than one, and when there is more than one, they may also have different spatial scales.
In one embodiment, the first image feature and the second image feature may be extracted through a predetermined neural network. In this case, both may be feature maps, whose data form may be vectors.
The predetermined neural network is a neural network trained in advance on sample images in which the positions of the target object have been labeled, and is capable of performing target detection. Taking an identification code as the target object as an example, a large number of identification-code sample images can be obtained, each containing an identification code with its position labeled; model training on these samples yields the predetermined neural network. The predetermined neural network can implement end-to-end learning: the image to be tested can be input directly into the network, which directly outputs the prediction parameters for the position of the target object, namely the candidate position parameters corresponding to the target object in the image and the confidence levels corresponding to the candidate position parameters.
S206: perform hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested.
Hole convolution (dilated convolution), also called expanded convolution, is a convolution method that injects holes between the elements of the convolution kernel. Compared with ordinary convolution, hole convolution introduces a hyperparameter called the "dilation rate", which defines the spacing between values when the kernel processes data.
The third image feature is the image feature obtained by performing hole convolution processing according to the second image feature. Like the first and second image features, the third image feature can reflect the characteristics of the image to be tested and may also be a feature map. Its spatial scale may be the same as that of the second image feature. The number of third image features may also be an integer equal to or greater than one; when there is more than one, they may have the same spatial scale. For example, if the second image feature has a spatial scale of 10×10 and three third image features are obtained by hole convolution, all three have a spatial scale of 10×10.
It should be noted that hole convolution processing, on the one hand, keeps the spatial scale of the image feature unchanged, avoiding the information loss caused by reducing the pixels of the feature; on the other hand, it enlarges the receptive field, enabling more accurate target detection. The receptive field is the size of the region on the original image mapped by a pixel on the feature map output by a hidden layer of the neural network; the larger a pixel's receptive field on the original image, the larger the range of the original image it maps, and the more global, semantically higher-level features it may contain.
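The receptive-field point can be illustrated with a short PyTorch snippet: with matching padding, a dilated 3×3 convolution keeps the 10×10 spatial scale of the example above while covering a 5×5 input neighborhood instead of 3×3. The channel count is arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 10, 10)
ordinary = nn.Conv2d(8, 8, kernel_size=3, padding=1)             # 3x3 field
dilated = nn.Conv2d(8, 8, kernel_size=3, padding=2, dilation=2)  # 5x5 field
print(ordinary(x).shape, dilated(x).shape)  # both stay (1, 8, 10, 10)
```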
S208: perform classification and regression according to the first image feature and the third image feature to determine candidate position parameters corresponding to the target object in the image to be tested and confidence levels corresponding to the candidate position parameters.
The candidate position parameters can be used to determine candidate positions of the target object in the image to be tested. The confidence level characterizes the probability that the candidate position corresponding to a candidate position parameter is the position of the target object. There is usually more than one candidate position parameter, and each has its corresponding confidence level.
In one embodiment, performing classification and regression according to the first and third image features to determine the candidate position parameters and their confidence levels can be implemented through the multi-box prediction method of the SSD (Single Shot MultiBox Detector) target detection approach.
Roughly speaking, the first image feature and the third image feature each correspond to several prior boxes (bounding boxes). A prior box is a rectangular box used to predict the position of the target object in the image to be tested. After classifying and regressing each prior box, offset parameters corresponding to each prior box (obtained by regression) and confidence levels corresponding to each prior box (obtained by classification) are obtained. For any prior box, its offset parameter is used to determine the position of the region it maps onto the image to be tested, and its confidence level characterizes the probability that that region covers the target object. The offset parameters corresponding to the prior boxes are the candidate position parameters corresponding to the target object in the image to be tested, and the confidence levels corresponding to the prior boxes are the confidence levels corresponding to the candidate position parameters.
In addition, for any prior box, a four-dimensional parameter group can be used to describe the position of the prior box on its corresponding image feature, and regression is performed on this group to obtain the offset parameter of the prior box. The four-dimensional parameter group may include the horizontal coordinate (x) of a position point, the vertical coordinate (y) of that point, a width (w), and a height (h). The position point is a point of the prior box, which may be one of its vertices, its center point, or the like; the width and height are those of the prior box.
For example, for a prior box DB1 on the first image feature, the four-dimensional parameter group describing the position of DB1 on the first image feature includes the horizontal coordinate of the upper-left vertex of DB1 in the first image feature, the vertical coordinate of that vertex, and the width and height of DB1.
The position of the region a prior box maps onto the image to be tested can also be described by a four-dimensional parameter group, which may include the horizontal and vertical coordinates of a position point of the mapped region, and the width and height of the mapped region; the position point may be a vertex, the center point of the mapped region, or the like.
In one embodiment, each pixel of the first image feature may correspond to a predetermined number of prior boxes, the number being set according to actual needs, and the prior boxes corresponding to the same pixel may have multiple aspect ratios and scales. Taking a first image feature F11 with a spatial scale of 19×19 and a predetermined number of 6 as an example, F11 includes 361 (19×19) pixels, each corresponding to 6 prior boxes of various aspect ratios and scales, so F11 carries 2166 prior boxes (361×6). Similarly, each pixel of the third image feature may correspond to a predetermined number of prior boxes, which may have multiple aspect ratios and scales.
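A quick sketch of the prior-box bookkeeping in the F11 example: a 19×19 feature map with 6 prior boxes per pixel yields 2166 boxes, each regressed to a four-dimensional parameter group plus a confidence score.

```python
# Plain-Python arithmetic for the F11 example above.
feature_h, feature_w, boxes_per_pixel = 19, 19, 6
num_priors = feature_h * feature_w * boxes_per_pixel
print(num_priors)                     # 2166
regression_outputs = num_priors * 4   # one (x, y, w, h) group per box
confidence_outputs = num_priors       # one confidence level per box
```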
S210: filter valid position parameters from the candidate position parameters according to the confidence levels, and determine the position of the target object in the image to be tested according to the valid position parameters.
A valid position parameter is a candidate position parameter that satisfies a predetermined filtering condition. The filtering condition can be preset according to actual needs; for example, it may require that the confidence level corresponding to a candidate position parameter be greater than a predetermined confidence threshold, in which case candidate position parameters whose confidence exceeds the threshold are taken as valid. As another example, it may require the maximum confidence, in which case the candidate position parameter with the largest confidence among all confidence levels is taken as valid.
As described above, a valid position parameter has a corresponding prior box (hereafter, a valid prior box); the position of the region the valid prior box maps onto the image to be tested is the position of the target object in the image. After the valid position parameter is filtered out, decoding and conversion according to it yield a four-dimensional parameter group describing the position of the region the valid prior box maps onto the image, i.e., the position of the target object. This group may include the horizontal and vertical coordinates of a position point of the target object and the target object's width and height; similarly, the position point may be a vertex, the center point of the target object, or the like.
The above target detection method extracts a first image feature and a second image feature corresponding to the image to be tested, performs hole convolution according to the second image feature to obtain a third image feature, then performs classification and regression according to the first and third image features, and determines the position of the target object according to the classification and regression results. Automatically extracting the image features of the image to be tested and performing classification and regression on them effectively improves detection robustness and reduces detection time; hole convolution processing effectively enlarges the receptive field, adapting better to target objects of different sizes; and the recall rate for small target objects is improved.
In one embodiment, the first image feature and the second image feature corresponding to the image to be tested are extracted and output through the basic network in the predetermined neural network; hole convolution is performed according to the second image feature through the hole convolution network in the predetermined neural network to obtain and output the third image feature; and classification and regression are performed according to the first and third image features through the output network in the predetermined neural network to determine the candidate position parameters corresponding to the target object and the confidence levels corresponding to the candidate position parameters.
In this embodiment, as shown in FIG. 3, the predetermined neural network may include a basic network, a hole convolution network, and an output network. The image to be tested enters the predetermined neural network through the input of the basic network; the output of the basic network connects both to the input of the output network and to the input of the hole convolution network; the output of the hole convolution network connects to the input of the output network; and the output of the output network provides the candidate position parameters corresponding to the target object in the image to be tested and the confidence levels corresponding to each. It should be noted that the basic network may have multiple outputs, and the outputs connected to the output network and those connected to the hole convolution network may be completely identical, completely different, or partially identical.
The basic network is a network that can be used for feature extraction. Its framework may directly adopt an existing framework with feature extraction capability, such as VGG-16, or may be obtained by modifying an existing framework.
The first image feature is the image feature the basic network obtains from the image to be tested and outputs to the output network. The second image feature is the image feature the basic network obtains from the image to be tested and outputs to the hole convolution network. The first image feature(s) output to the output network and the second image feature(s) output to the hole convolution network may be completely identical, completely different, or partially identical; for example, the basic network may output two different first image features and one second image feature identical to one of them.
The third image feature is the image feature the hole convolution network obtains from the second image feature and outputs to the output network. The hole convolution network performs feature extraction through hole convolution; in one embodiment, it may be formed by stacking hole convolution layers.
The output network can perform regression according to the first and third image features to determine the candidate position parameters corresponding to the target object, and classification to determine the confidence levels corresponding to each candidate position parameter. The output network may perform regression and classification on the prior boxes of the first and third image features to obtain each prior box's offset parameter and confidence level; that is, for any prior box, the output network outputs its offset parameter and corresponding confidence level. The framework of the output network may adopt any suitable network framework capable of classification and regression, which is not limited in this application.
In one embodiment, the earlier an image feature is output in the predetermined neural network, the smaller the prior boxes on it; the later it is output, the larger the prior boxes. That is, features output at earlier positions are responsible for detecting small target objects, and features output at later positions for large target objects. For example, suppose the basic network outputs first image features F11 and F12 to the output network and the hole convolution network outputs third image features F31, F32, and F33 to the output network, ordered from front to back as F11, F12, F31, F32, F33; then from F11 to F33 the size of the prior boxes gradually increases, as does the size of the target objects they are responsible for detecting. For instance, the prior boxes on F12 are smaller than those on F31, and F12 is responsible for smaller target objects than F31.
In one embodiment, the step of extracting and outputting the first and second image features through the basic network in the predetermined neural network may include the following steps: sequentially performing convolution processing and pooling processing on the image to be tested through the primary feature extraction network in the basic network, and outputting a first intermediate feature corresponding to the image; and performing feature extraction according to the first intermediate feature through the residual network in the basic network, and outputting the extracted first and second image features corresponding to the image to be tested.
In this embodiment, the basic network in the predetermined neural network includes a primary feature extraction network and a residual network. The primary feature extraction network performs feature extraction on the image to be tested. A residual network (ResNet) adds shortcut connections to nonlinear convolution layers and can further extract features from the output of the primary feature extraction network; how a residual network extracts features corresponds to its internal structure, and different internal structures may extract features differently.
In one embodiment, the primary feature extraction network may include a convolution layer and a pooling layer. The convolution layer performs convolution to obtain image features. The pooling layer reduces the dimensionality of image features and usually takes one of two forms, mean pooling or max pooling. In one embodiment, the convolution layer in the primary feature extraction network may be a 3×3 ordinary convolution layer and the pooling layer a 3×3 max pooling layer, where 3×3 denotes the size of the convolution kernel.
The first intermediate feature is the image feature obtained after the image to be tested sequentially passes through the convolution layer of the primary feature extraction network for convolution and its pooling layer for dimensionality reduction.
In this embodiment, the residual network performs feature extraction on its input to obtain the first image feature, which it outputs to the output network of the predetermined neural network, and extracts the second image feature, which it outputs to the hole convolution network. For the frontmost residual network in the basic network, the input is the output of the primary feature extraction network (the first intermediate feature); for a residual network that is not the frontmost, the input is the output of the preceding residual network.
Building the basic network with residual networks, on the one hand, effectively reduces the number of parameters and the amount of computation; on the other hand, it facilitates fast convergence and thereby effectively addresses the difficulty of training deep networks.
In one embodiment, there is more than one residual network in the basic network, and the residual networks are connected in sequence. Accordingly, the feature-extraction step may include: passing the first intermediate feature sequentially through the residual networks in the basic network for feature extraction, outputting the first image feature corresponding to the image to be tested through a first target residual network, and outputting the second image feature through a second target residual network.
The first target residual network can output the first image feature to the output network of the predetermined neural network. It is selected from the residual networks included in the basic network and may include one or more residual networks designated in advance.
The second target residual network can output the second image feature to the hole convolution network. Similarly, it is selected from the residual networks included in the basic network and may include one or more residual networks designated in advance.
It should be noted that the first target residual network may include as many residual networks as possible, so as to cover first image features of different spatial scales and improve detection performance. The second target residual network generally includes the last residual network in the basic network. The residual networks included in the first and second target residual networks may be completely identical, completely different, or partially identical.
For example, suppose the basic network includes a primary feature extraction network and residual networks RN1, RN2, and RN3, connected in sequence, and the first target residual network is preset to include RN2 and RN3 while the second target residual network includes RN3. After the image to be tested enters the basic network, the primary feature extraction network first performs convolution and pooling, then RN1 extracts features from its output, RN2 from RN1's output, and RN3 from RN2's output. The outputs of RN2 and RN3 are output as first image features to the output network of the predetermined neural network, and the output of RN3 is output as the second image feature to the hole convolution network.
In one embodiment, the step of performing feature extraction through the residual network may include the following steps: downsampling the first intermediate feature through the downsampling module in the residual network to obtain and output a second intermediate feature; and mapping the second intermediate feature to the first and second image features corresponding to the image to be tested through the first residual block in the residual network.
In this embodiment, a residual network includes a downsampling module and a first residual block. The downsampling module implements a function similar to a pooling layer, i.e., dimensionality reduction of image features. In one embodiment, as shown in FIG. 4, the downsampling module may include a 1×1 ordinary convolution layer, a normalization layer (Batch Normalization, BN), an activation layer (Rectified Linear Units, RELU), a 3×3 ordinary convolution layer, a normalization layer, a 1×1 ordinary convolution layer, a normalization layer, and an activation layer, connected in sequence.
The second intermediate feature is the image feature obtained after the downsampling module in a residual network downsamples the input of that residual network.
A residual block is the basic block of a residual network and usually includes a residual branch and a short-circuit branch; the residual branch performs a nonlinear transformation on the block's input, and the short-circuit branch performs an identity or linear transformation on it. Accordingly, the first residual block is the residual block in the basic network. It may directly adopt an existing residual block, such as the conventional residual block shown in FIG. 5 or the bottleneck residual block shown in FIG. 6, or be obtained by modifying an existing residual block.
How the first residual block maps the second intermediate feature to the first and second image features corresponds to its internal structure, and different internal structures may map differently. For example, when the first residual block is the conventional residual block of FIG. 5, on the residual branch the second intermediate feature sequentially passes through a 3×3 ordinary convolution layer for convolution, a normalization layer for normalization, an activation layer for nonlinear transformation, another 3×3 ordinary convolution layer for convolution, and a normalization layer for normalization; on the short-circuit branch, it is identity-mapped; then the results of the two branches are synthesized and nonlinearly transformed through an activation layer to obtain the block's output. In addition, if the first residual block is a first target residual block, its output is the first image feature corresponding to the image to be tested; if it is a second target residual block, its output is the second image feature corresponding to the image to be tested.
In one embodiment, there is more than one first residual block in a residual network, and the first residual blocks are connected in sequence. Accordingly, the step may include: passing the first intermediate feature sequentially through the first residual blocks in each residual network for feature extraction, outputting the first image feature through the first target residual block in the first target residual network, and outputting the second image feature through the second target residual block in the second target residual network.
The first target residual block can output the first image feature corresponding to the image to be tested to the output network of the predetermined neural network. It is selected from the first residual blocks in the first target residual network and may include one or more first residual blocks designated in advance.
The second target residual block can output the second image feature corresponding to the image to be tested. Similarly, it is selected from the first residual blocks in the second target residual network and may include one or more first residual blocks designated in advance.
In one embodiment, the first target residual block may include the last first residual block of the first target residual network; since the output of this block has passed through the most convolution layers in that residual network, outputting it as the first image feature to the output network improves detection performance. Similarly, the second target residual block may also include the last first residual block of the second target residual network.
For example, consider residual network RN3, which is both a first target residual network and a second target residual network and includes first residual blocks RB1, RB2, RB3, and RB4 connected in sequence, four in total. Suppose the first target residual block is preset to include RB4, as is the second target residual block. After the second intermediate feature enters RN3, RB1 first extracts features from it, then RB2 from RB1's output, RB3 from RB2's output, and RB4 from RB3's output. RB4's output is output both as the first image feature to the output network of the predetermined neural network and as the second image feature to the hole convolution network.
In one embodiment, the step of mapping the second intermediate feature to the first and second image features through the first residual block in the residual network may include the following steps: performing depthwise separable convolution according to the second intermediate feature through the first residual block to obtain a first feature component; identity-mapping the second intermediate feature to a second feature component; synthesizing the first and second feature components to obtain a first target feature; and mapping the first target feature to the first and second image features corresponding to the image to be tested and outputting them.
In this embodiment, the first residual block is obtained by modifying an existing residual block, for example as follows: the 3×3 ordinary convolution layer used for feature extraction in the existing residual block (such as the residual blocks shown in FIG. 5 and FIG. 6) is replaced with a depthwise separable convolution layer.
Depthwise separable convolution is a convolution method in which each channel is convolved with its own kernel to produce a per-channel output, after which the information is fused. Extracting features with depthwise separable convolution reduces the size of the basic network and increases the network's computation speed.
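The parameter saving can be checked with a short comparison: an ordinary 3×3 convolution versus a depthwise 3×3 convolution followed by a 1×1 pointwise fusion, at illustrative channel counts.

```python
import torch.nn as nn

c_in, c_out = 64, 64
ordinary = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False),  # depthwise
    nn.Conv2d(c_in, c_out, 1, bias=False),                         # pointwise fusion
)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(ordinary), count(separable))  # 36864 vs 4672
```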
In this embodiment, for the second intermediate feature input into the first residual block: on the residual branch, the depthwise separable convolution layer performs feature extraction on the second intermediate feature to obtain the first feature component corresponding to the second intermediate feature; on the short-circuit branch, the second intermediate feature is identity-mapped to obtain the second feature component; the first and second feature components are then synthesized to obtain the first target feature; and finally, an activation layer nonlinearly transforms the first target feature to obtain the output of the first residual block. Synthesizing two feature components may mean adding them.
In one embodiment, the step of performing depthwise separable convolution according to the second intermediate feature to obtain the first feature component may include the following step: sequentially performing dimensionality reduction, depthwise separable convolution, and dimensionality increase on the second intermediate feature to obtain the first feature component.
In this embodiment, the residual branch in the first residual block may include a dimensionality reduction layer, a depthwise separable convolution layer, and a dimensionality-increasing layer, connected in sequence. The dimensionality reduction layer reduces the dimensionality of the residual branch's input (that is, the second intermediate feature), reducing the number of parameters in the depthwise separable convolution layer. The dimensionality-increasing layer raises the dimensionality of the depthwise separable convolution layer's output, ensuring that the input and output of the residual branch have the same dimension.
In one embodiment, the dimensionality reduction layer may include a 1×1 ordinary convolution layer, a normalization layer, and an activation layer connected in sequence, and the dimensionality-increasing layer may include a 1×1 ordinary convolution layer and a normalization layer connected in sequence. In other embodiments, the two layers may also adopt other suitable network structures, which are not limited in this application.
In this embodiment, the second intermediate feature is input into the residual branch of the first residual block, where the dimensionality reduction layer first reduces its dimensionality, the depthwise separable convolution layer then convolves the dimensionality reduction layer's output, and the dimensionality-increasing layer finally raises the dimensionality of the depthwise separable convolution layer's output, yielding the first feature component.
In one embodiment, the step of obtaining the third image feature corresponding to the image to be tested by performing hole convolution processing through the hole convolution network in the predetermined neural network may include the following steps: performing hole convolution according to the second image feature through a second residual block in the hole convolution network to obtain a third feature component; linearly mapping the second image feature to a fourth feature component; synthesizing the third and fourth feature components to obtain a second target feature; and mapping the second target feature to the third image feature corresponding to the image to be tested.
The second residual block is the residual block in the hole convolution network. Similarly, it may be obtained by modifying an existing residual block, for example as follows: the 3×3 ordinary convolution layer used for feature extraction in the existing residual block (such as the residual blocks shown in FIG. 5 and FIG. 6) is replaced with a hole convolution layer.
In this embodiment, for the second image feature input into the second residual block: on the residual branch, the hole convolution layer performs feature extraction according to the second image feature to obtain the third feature component; on the short-circuit branch, the second image feature is linearly mapped to the fourth feature component; the third and fourth feature components are then synthesized to obtain the second target feature; finally, an activation layer nonlinearly transforms the second target feature to obtain the output of the second residual block (that is, the third image feature), which is output to the output network of the predetermined neural network.
In one embodiment, an additional convolution layer for feature extraction is provided on the short-circuit branch of the second residual block. The additional convolution layer may include a 1×1 ordinary convolution layer and a normalization layer, connected in sequence. Accordingly, on the short-circuit branch, the 1×1 ordinary convolution layer first convolves the input of the second residual block, and the normalization layer then normalizes its output to obtain the fourth feature component.
In one embodiment, there is more than one second residual block in the hole convolution network, and the second residual blocks are connected in sequence. Accordingly, the second image feature output by the basic network passes sequentially through each second residual block in the hole convolution network for feature extraction, and the output of each second residual block is output as a third image feature to the output network of the predetermined neural network.
In one embodiment, the step of obtaining the third feature component by performing hole convolution processing according to the second image feature may include the following step: sequentially performing dimensionality reduction, hole convolution, and dimensionality increase on the second image feature to obtain the third feature component.
In this embodiment, as shown in FIG. 7, the residual branch in the second residual block includes a dimensionality reduction layer, a hole convolution layer, and a dimensionality-increasing layer, connected in sequence.
In this embodiment, on the residual branch of the second residual block, the dimensionality reduction layer first reduces the dimensionality of the block's input, the hole convolution layer then performs feature extraction on the dimensionality reduction layer's output, and the dimensionality-increasing layer finally raises the dimensionality of the hole convolution layer's output, yielding the third feature component. The input of the frontmost second residual block in the hole convolution network is the second image feature; for a second residual block that is not the frontmost, the input is the output of the preceding residual block.
In one embodiment, as shown in FIG. 8, a predetermined neural network is provided. It includes a basic network, a hole convolution network, and an output network. The basic network includes a primary feature extraction network, a first residual network, a second residual network, and a third residual network, connected in sequence. The primary feature extraction network includes a 3×3 ordinary convolution layer and a 3×3 max pooling layer, connected in sequence. The first residual network includes a downsampling module and three first residual blocks, connected in sequence; the second residual network includes a downsampling module and seven first residual blocks, connected in sequence; the third residual network includes a downsampling module and three first residual blocks, connected in sequence. The hole convolution network includes three sequentially connected second residual blocks.
As shown in FIG. 9, a target detection method implemented according to the predetermined neural network of FIG. 8 is provided. The method may include the following steps S902 to S922.
S902: acquire the image to be tested, a 300×300×3 image, i.e., of size 300×300 with 3 channels.
S904: input the image to be tested into the primary feature extraction network, where it is convolved by the 3×3 ordinary convolution layer and dimension-reduced by the 3×3 max pooling layer in sequence.
S906: input the output of the 3×3 max pooling layer into the first residual network, where it is downsampled by the downsampling module and feature-extracted by the three first residual blocks in sequence.
S908: output the result of the last first residual block in the first residual network to the second residual network, where it is downsampled by the downsampling module and feature-extracted by the seven first residual blocks in sequence.
S910: input the output of the last first residual block in the second residual network (this output being one of the first image features) to both the output network and the third residual network, where it is downsampled by the downsampling module and feature-extracted by the three first residual blocks of the third residual network in sequence.
S912: input the output of the last first residual block in the third residual network (this output being another first image feature) to both the output network and the hole convolution network, where the frontmost second residual block performs feature extraction on it.
S914: input the output of the frontmost second residual block in the hole convolution network (this output being one of the third image features) to both the output network and the middle second residual block, which performs feature extraction on it.
S916: input the output of the middle second residual block (this output being another third image feature) to both the output network and the last second residual block, which performs feature extraction on it.
S918: input the output of the last second residual block in the hole convolution network (this output being another third image feature) to the output network.
S920: through the output network, perform classification and regression according to the first image features and the third image features to determine the candidate position parameters corresponding to the target object in the image to be tested and the confidence levels corresponding to the candidate position parameters.
S922: filter valid position parameters from the candidate position parameters according to the confidence levels, and determine the position of the target object in the image to be tested according to the valid position parameters.
It should be noted that the definitions of the technical features in this embodiment may be the same as the definitions of the corresponding technical features given above and are not repeated here.
It should be understood that, although the steps in the flowcharts of the foregoing embodiments are displayed in the order indicated by the arrows, they are not necessarily executed in that order. Unless expressly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
It should be noted that in the above predetermined neural network, the number of channels of each layer can be scaled uniformly as needed, i.e., the network width can be adjusted dynamically, allowing a flexible trade-off between network accuracy and speed. In actual experiments, a smaller network width coefficient was selected; the basic network of the predetermined neural network pre-trained on ImageNet (an image sample data set) was 3M in size, with a Top-1 accuracy of 56%.
Second, any structure in the predetermined neural network where a CONV layer (convolution layer), a BN layer (normalization layer), and a Scale layer (linear transformation layer) appear consecutively can be fused into a single CONV layer, reducing network volume and increasing network speed. Experiments show the fusion reduces network volume by about 5% and increases speed by 5%-10%.
Furthermore, when the predetermined neural network is trained on a server with PyTorch, it can be converted into a caffe model in order to deploy the trained network to mobile terminals. For mobile deployment, the conversion tool that comes with the NCNN framework (Tencent's open-source deep-learning forward framework) can convert the caffe model into an NCNN model, converting the model parameter format in the process. Experiments show the model parameters can be quantized to 16 bits, and the above simplification and compression operations reduce the model size from 2.1M to 960K.
It should be noted that the target detection method provided by the embodiments of this application can be applied to identification-code detection scenarios, i.e., the target object is an identification code. When the terminal obtains the image to be tested, it first determines the position of the identification code in the image using the target detection method provided by any embodiment of this application, and then recognizes the identification code according to its position in the image. Thus, for the application scenario of a small code in a large image, code-free interference information need not be scanned, effectively improving recognition performance. The target detection method also supports the application scenario of multiple codes in one image: when the image contains more than one identification code, the offset parameters are filtered according to their corresponding confidence levels, the target objects in the image are determined from the valid offset parameters obtained by filtering, and the number of determined target-object positions matches the number of identification codes in the image. In addition, in actual tests on mobile terminals, a comparison of the average single-frame time and decoding success rate of the target detection method of this application against other existing target detection schemes in the identification-code detection process is shown in FIG. 10. As the figure shows, the method detects multiple identification codes of different sizes and angles effectively and in real time, achieving good precision and recall while keeping mobile-side runtime low, with strong overall performance.
In one embodiment, as shown in FIG. 11, a target detection apparatus 1100 is provided, which may include the following modules 1102 to 1110.
An image-to-be-tested acquisition module 1102, configured to acquire an image to be tested.
An image feature acquisition module 1104, configured to extract a first image feature and a second image feature corresponding to the image to be tested.
A hole convolution processing module 1106, configured to perform hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested.
A candidate parameter acquisition module 1108, configured to perform classification and regression according to the first and third image features to determine the candidate position parameters corresponding to the target object in the image to be tested and the confidence levels corresponding to the candidate position parameters.
A target position determination module 1110, configured to filter valid position parameters from the candidate position parameters according to the confidence levels, and determine the position of the target object in the image to be tested according to the valid position parameters.
The above target detection apparatus extracts the first and second image features corresponding to the image to be tested, performs hole convolution according to the second image feature to obtain the third image feature, performs classification and regression according to the first and third image features, and determines the position of the target object according to the classification and regression results. Automatically extracting image features and performing classification and regression on them effectively improves detection robustness and reduces detection time, and hole convolution processing effectively enlarges the receptive field, adapting better to target objects of different sizes.
In one embodiment, the image feature acquisition module 1104 is configured to extract and output the first and second image features through the basic network in the predetermined neural network; the hole convolution processing module 1106 is configured to perform hole convolution according to the second image feature through the hole convolution network to obtain and output the third image feature; and the candidate parameter acquisition module 1108 is configured to perform classification and regression according to the first and third image features through the output network to determine the candidate position parameters corresponding to the target object and the confidence levels corresponding to them.
In one embodiment, the image feature acquisition module 1104 may include the following units: a first intermediate feature output unit, configured to sequentially perform convolution and pooling on the image to be tested through the primary feature extraction network in the basic network and output the first intermediate feature; and an image feature acquisition unit, configured to perform feature extraction according to the first intermediate feature through the residual network in the basic network and output the extracted first and second image features.
In one embodiment, the image feature acquisition unit may include the following subunits: a downsampling subunit, configured to downsample the first intermediate feature through the downsampling module in the residual network to obtain and output the second intermediate feature; and a residual processing subunit, configured to map the second intermediate feature to the first and second image features through the first residual block in the residual network and output them.
In one embodiment, the residual processing subunit may be further configured to: perform depthwise separable convolution according to the second intermediate feature through the first residual block to obtain the first feature component; identity-map the second intermediate feature to the second feature component; synthesize the first and second feature components to obtain the first target feature; and map the first target feature to the first and second image features corresponding to the image to be tested and output them.
In one embodiment, the residual processing subunit may be further configured to sequentially perform dimensionality reduction, depthwise separable convolution, and dimensionality increase on the second intermediate feature to obtain the first feature component.
In one embodiment, there is more than one residual network in the basic network, connected in sequence. Accordingly, the image feature acquisition unit may be further configured to pass the first intermediate feature sequentially through the residual networks in the basic network for feature extraction, output the first image feature through the first target residual network, and output the second image feature through the second target residual network; both target residual networks are selected from the residual networks in the basic network.
In one embodiment, there is more than one first residual block in a residual network, connected in sequence. Accordingly, the image feature acquisition unit may be configured to pass the first intermediate feature sequentially through the first residual blocks in each residual network for feature extraction, output the first image feature through the first target residual block in the first target residual network, and output the second image feature through the second target residual block in the second target residual network; the first target residual block is selected from the first residual blocks in the first target residual network, and the second from those in the second target residual network.
In one embodiment, the hole convolution processing module 1106 may include the following units: a hole convolution processing unit, configured to perform hole convolution according to the second image feature through the second residual block in the hole convolution network to obtain the third feature component; a linear mapping unit, configured to linearly map the second image feature to the fourth feature component; a feature synthesis unit, configured to synthesize the third and fourth feature components to obtain the second target feature; and a feature mapping unit, configured to map the second target feature to the third image feature corresponding to the image to be tested.
In one embodiment, the hole convolution processing unit is further configured to sequentially perform dimensionality reduction, hole convolution, and dimensionality increase on the second image feature to obtain the third feature component.
In one embodiment, the image-to-be-tested acquisition module 1102 may include the following units: an original image acquisition unit, configured to acquire an original image; a description information acquisition unit, configured to acquire terminal description information used to characterize the computing capability of the terminal; and a resolution adjustment unit, configured to adjust the original image according to the reference resolution matching the terminal description information to obtain the image to be tested.
In one embodiment, the target object includes an identification code, and the identification code includes at least one of a two-dimensional code, a one-dimensional code, and an applet code.
For the definition of the target detection apparatus, reference may be made to the definition of the target detection method above, which is not repeated here. Each module in the target detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware within, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of the target detection method provided by any embodiment of this application.
In one embodiment, the computer device may be the terminal 110 shown in FIG. 1, whose internal structure may be as shown in FIG. 12. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them; the computer program, when executed by the processor, implements the target detection method. The network interface communicates with external terminals through a network connection. The display screen may be a liquid crystal display or an electronic-ink display. The input device may be a touch layer covering the display screen, a button, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, or mouse.
In one embodiment, the computer device may be the server 120 shown in FIG. 1, whose internal structure may be as shown in FIG. 13. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program. The database stores sample data for training the model. The network interface communicates with external terminals through a network connection. The computer program, when executed by the processor, implements the target detection method.
A person skilled in the art will understand that the structures shown in FIG. 12 and FIG. 13 are only block diagrams of part of the structures related to the solution of this application and do not limit the computer device to which the solution is applied; an actual computer device may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
In one embodiment, the target detection apparatus provided by the embodiments of this application may be implemented in the form of a computer program, which can run on a computer device as shown in FIG. 12 or FIG. 13. The memory of the computer device may store the program modules constituting the target detection apparatus, such as the image-to-be-tested acquisition module 1102, the image feature acquisition module 1104, the hole convolution processing module 1106, the candidate parameter acquisition module 1108, and the target position determination module 1110 shown in FIG. 11. The computer program constituted by these program modules causes the processor to execute the steps of the target detection methods of the embodiments of this application described in this specification. For example, the computer device shown in FIG. 12 or FIG. 13 may perform step S202 through the image-to-be-tested acquisition module 1102 in the target detection apparatus shown in FIG. 11, step S204 through the image feature acquisition module 1104, step S206 through the hole convolution processing module 1106, step S208 through the candidate parameter acquisition module 1108, step S210 through the target position determination module 1110, and so on.
A person of ordinary skill in the art will understand that all or part of the procedures of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the procedures of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Accordingly, in one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the target detection method provided by any embodiment of this application.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively detailed, but they should not be construed as limiting the scope of this patent. It should be pointed out that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (26)

  1. A target detection method, applied to a computer device, the method comprising:
    acquiring an image to be tested;
    extracting a first image feature and a second image feature corresponding to the image to be tested;
    performing hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested;
    performing classification and regression according to the first image feature and the third image feature to determine candidate position parameters corresponding to a target object in the image to be tested and confidence levels corresponding to the candidate position parameters; and
    filtering valid position parameters from the candidate position parameters according to the confidence levels, and determining a position of the target object in the image to be tested according to the valid position parameters.
  2. The method according to claim 1, wherein:
    the extracting a first image feature and a second image feature corresponding to the image to be tested comprises: extracting and outputting, through a basic network in a predetermined neural network, the first image feature and the second image feature corresponding to the image to be tested;
    the performing hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested comprises: performing, through a hole convolution network in the predetermined neural network, hole convolution according to the second image feature to obtain and output the third image feature corresponding to the image to be tested; and
    the performing classification and regression according to the first image feature and the third image feature to determine candidate position parameters corresponding to a target object in the image to be tested and confidence levels corresponding to the candidate position parameters comprises: performing, through an output network in the predetermined neural network, classification and regression according to the first image feature and the third image feature to determine the candidate position parameters corresponding to the target object in the image to be tested and the confidence levels corresponding to the candidate position parameters.
  3. The method according to claim 2, wherein the extracting and outputting, through the basic network in the predetermined neural network, the first image feature and the second image feature corresponding to the image to be tested comprises:
    sequentially performing convolution processing and pooling processing on the image to be tested through a primary feature extraction network in the basic network, and outputting a first intermediate feature corresponding to the image to be tested; and
    performing feature extraction according to the first intermediate feature through a residual network in the basic network, and outputting the extracted first image feature and second image feature corresponding to the image to be tested.
  4. The method according to claim 3, wherein the performing feature extraction according to the first intermediate feature through the residual network in the basic network, and outputting the extracted first image feature and second image feature corresponding to the image to be tested, comprises:
    downsampling the first intermediate feature through a downsampling module in the residual network to obtain and output a second intermediate feature; and
    mapping, through a first residual block in the residual network, the second intermediate feature to the first image feature and the second image feature corresponding to the image to be tested, and outputting the first image feature and the second image feature.
  5. The method according to claim 4, wherein the mapping, through the first residual block in the residual network, the second intermediate feature to the first image feature and the second image feature corresponding to the image to be tested, and outputting the first image feature and the second image feature, comprises:
    performing depthwise separable convolution according to the second intermediate feature through the first residual block in the residual network to obtain a first feature component;
    identity-mapping the second intermediate feature to a second feature component;
    synthesizing the first feature component and the second feature component to obtain a first target feature; and
    mapping the first target feature to the first image feature and the second image feature corresponding to the image to be tested, and outputting the first image feature and the second image feature.
  6. The method according to claim 5, wherein the performing depthwise separable convolution according to the second intermediate feature to obtain the first feature component comprises:
    sequentially performing dimensionality reduction, depthwise separable convolution, and dimensionality increase on the second intermediate feature to obtain the first feature component.
  7. The method according to claim 4, wherein there is more than one residual network in the basic network, and the residual networks are connected in sequence;
    the performing feature extraction according to the first intermediate feature through the residual network in the basic network, and outputting the extracted first image feature and second image feature corresponding to the image to be tested, comprises:
    passing the first intermediate feature sequentially through the residual networks in the basic network for feature extraction, outputting the first image feature corresponding to the image to be tested through a first target residual network, and outputting the second image feature corresponding to the image to be tested through a second target residual network;
    wherein the first target residual network and the second target residual network are both selected from the residual networks in the basic network.
  8. The method according to claim 7, wherein there is more than one first residual block in the residual network, and the first residual blocks are connected in sequence;
    the passing the first intermediate feature sequentially through the residual networks in the basic network for feature extraction, outputting the first image feature corresponding to the image to be tested through the first target residual network, and outputting the second image feature corresponding to the image to be tested through the second target residual network, comprises:
    passing the first intermediate feature sequentially through the first residual blocks in the residual networks for feature extraction, outputting the first image feature corresponding to the image to be tested through a first target residual block in the first target residual network, and outputting the second image feature corresponding to the image to be tested through a second target residual block in the second target residual network;
    wherein the first target residual block is selected from the first residual blocks in the first target residual network, and the second target residual block is selected from the first residual blocks in the second target residual network.
  9. The method according to claim 2, wherein the performing, through the hole convolution network in the predetermined neural network, hole convolution according to the second image feature to obtain the third image feature corresponding to the image to be tested comprises:
    performing hole convolution according to the second image feature through a second residual block in the hole convolution network to obtain a third feature component;
    linearly mapping the second image feature to a fourth feature component;
    synthesizing the third feature component and the fourth feature component to obtain a second target feature; and
    mapping the second target feature to the third image feature corresponding to the image to be tested.
  10. The method according to claim 9, wherein the performing hole convolution according to the second image feature to obtain the third feature component comprises:
    sequentially performing dimensionality reduction, hole convolution, and dimensionality increase on the second image feature to obtain the third feature component.
  11. The method according to claim 1, wherein the acquiring an image to be tested comprises:
    acquiring an original image;
    acquiring terminal description information used to characterize the computing capability of a terminal; and
    adjusting the original image according to a reference resolution matching the terminal description information to obtain the image to be tested.
  12. The method according to any one of claims 1 to 11, wherein:
    the target object includes an identification code, and the identification code includes at least one of a two-dimensional code, a one-dimensional code, and an applet code.
  13. A target detection apparatus, the apparatus comprising:
    an image-to-be-tested acquisition module, configured to acquire an image to be tested;
    an image feature acquisition module, configured to extract a first image feature and a second image feature corresponding to the image to be tested;
    a hole convolution processing module, configured to perform hole convolution according to the second image feature to obtain a third image feature corresponding to the image to be tested;
    a candidate parameter acquisition module, configured to perform classification and regression according to the first image feature and the third image feature to determine candidate position parameters corresponding to a target object in the image to be tested and confidence levels corresponding to the candidate position parameters; and
    a target position determination module, configured to filter valid position parameters from the candidate position parameters according to the confidence levels, and determine a position of the target object in the image to be tested according to the valid position parameters.
  14. The apparatus according to claim 13, wherein:
    the image feature acquisition module is further configured to extract and output, through a basic network in a predetermined neural network, the first image feature and the second image feature corresponding to the image to be tested;
    the hole convolution processing module is further configured to perform, through a hole convolution network in the predetermined neural network, hole convolution according to the second image feature to obtain and output the third image feature corresponding to the image to be tested; and
    the candidate parameter acquisition module is further configured to perform, through an output network in the predetermined neural network, classification and regression according to the first image feature and the third image feature to determine the candidate position parameters corresponding to the target object in the image to be tested and the confidence levels corresponding to the candidate position parameters.
  15. The apparatus according to claim 14, wherein the image feature acquisition module comprises:
    a first intermediate feature output unit, configured to sequentially perform convolution processing and pooling processing on the image to be tested through a primary feature extraction network in the basic network, and output a first intermediate feature corresponding to the image to be tested; and
    an image feature acquisition unit, configured to perform feature extraction according to the first intermediate feature through a residual network in the basic network, and output the extracted first image feature and second image feature corresponding to the image to be tested.
  16. The apparatus according to claim 15, wherein the image feature acquisition unit comprises:
    a downsampling subunit, configured to downsample the first intermediate feature through a downsampling module in the residual network to obtain and output a second intermediate feature; and
    a residual processing subunit, configured to map, through a first residual block in the residual network, the second intermediate feature to the first image feature and the second image feature corresponding to the image to be tested, and output the first image feature and the second image feature.
  17. The apparatus according to claim 16, wherein the residual processing subunit is further configured to:
    perform depthwise separable convolution according to the second intermediate feature through the first residual block in the residual network to obtain a first feature component;
    identity-map the second intermediate feature to a second feature component;
    synthesize the first feature component and the second feature component to obtain a first target feature; and
    map the first target feature to the first image feature and the second image feature corresponding to the image to be tested, and output the first image feature and the second image feature.
  18. The apparatus according to claim 17, wherein the residual processing subunit is further configured to:
    sequentially perform dimensionality reduction, depthwise separable convolution, and dimensionality increase on the second intermediate feature to obtain the first feature component.
  19. The apparatus according to claim 16, wherein there is more than one residual network in the basic network, and the residual networks are connected in sequence;
    the image feature acquisition unit is further configured to pass the first intermediate feature sequentially through the residual networks in the basic network for feature extraction, output the first image feature corresponding to the image to be tested through a first target residual network, and output the second image feature corresponding to the image to be tested through a second target residual network;
    wherein the first target residual network and the second target residual network are both selected from the residual networks in the basic network.
  20. The apparatus according to claim 19, wherein there is more than one first residual block in the residual network, and the first residual blocks are connected in sequence;
    the image feature acquisition unit is further configured to pass the first intermediate feature sequentially through the first residual blocks in the residual networks for feature extraction, output the first image feature corresponding to the image to be tested through a first target residual block in the first target residual network, and output the second image feature corresponding to the image to be tested through a second target residual block in the second target residual network;
    wherein the first target residual block is selected from the first residual blocks in the first target residual network, and the second target residual block is selected from the first residual blocks in the second target residual network.
  21. The apparatus according to claim 14, wherein the hole convolution processing module comprises:
    a hole convolution processing unit, configured to perform hole convolution according to the second image feature through a second residual block in the hole convolution network to obtain a third feature component;
    a linear mapping unit, configured to linearly map the second image feature to a fourth feature component;
    a feature synthesis unit, configured to synthesize the third feature component and the fourth feature component to obtain a second target feature; and
    a feature mapping unit, configured to map the second target feature to the third image feature corresponding to the image to be tested.
  22. The apparatus according to claim 21, wherein the hole convolution processing unit is further configured to sequentially perform dimensionality reduction, hole convolution, and dimensionality increase on the second image feature to obtain the third feature component.
  23. The apparatus according to claim 13, wherein the image-to-be-tested acquisition module comprises:
    an original image acquisition unit, configured to acquire an original image;
    a description information acquisition unit, configured to acquire terminal description information used to characterize the computing capability of a terminal; and
    a resolution adjustment unit, configured to adjust the original image according to a reference resolution matching the terminal description information to obtain the image to be tested.
  24. The apparatus according to any one of claims 13 to 23, wherein:
    the target object includes an identification code, and the identification code includes at least one of a two-dimensional code, a one-dimensional code, and an applet code.
  25. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the target detection method according to any one of claims 1 to 12.
  26. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the target detection method according to any one of claims 1 to 12.
PCT/CN2019/098742 2018-08-24 2019-07-31 目标检测方法、装置、计算机可读存储介质及计算机设备 WO2020038205A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19851718.7A EP3843003B1 (en) 2018-08-24 2019-07-31 Target detection method and apparatus, computer-readable storage medium, and computer device
US17/020,636 US11710293B2 (en) 2018-08-24 2020-09-14 Target detection method and apparatus, computer-readable storage medium, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810974541.2 2018-08-24
CN201810974541.2A CN110163197B (zh) 2018-08-24 2018-08-24 目标检测方法、装置、计算机可读存储介质及计算机设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/020,636 Continuation US11710293B2 (en) 2018-08-24 2020-09-14 Target detection method and apparatus, computer-readable storage medium, and computer device

Publications (1)

Publication Number Publication Date
WO2020038205A1 true WO2020038205A1 (zh) 2020-02-27

Family

ID=67645058

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098742 WO2020038205A1 (zh) 2018-08-24 2019-07-31 目标检测方法、装置、计算机可读存储介质及计算机设备

Country Status (4)

Country Link
US (1) US11710293B2 (zh)
EP (1) EP3843003B1 (zh)
CN (1) CN110163197B (zh)
WO (1) WO2020038205A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783685A (zh) * 2020-05-08 2020-10-16 西安建筑科技大学 一种基于单阶段网络模型的目标检测改进算法
CN111898497A (zh) * 2020-07-16 2020-11-06 济南博观智能科技有限公司 一种车牌检测的方法、系统、设备及可读存储介质
CN113762166A (zh) * 2021-09-09 2021-12-07 中国矿业大学 一种基于可穿戴式装备的小目标检测改善方法及系统

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582291A (zh) * 2019-02-19 2020-08-25 富士通株式会社 物体识别方法、装置和单步物体识别神经网络
CN111666960B (zh) * 2019-03-06 2024-01-19 南京地平线机器人技术有限公司 图像识别方法、装置、电子设备及可读存储介质
CN110705520A (zh) * 2019-10-22 2020-01-17 上海眼控科技股份有限公司 目标检测方法、装置、计算机设备和计算机可读存储介质
CN110751004B (zh) * 2019-10-25 2024-04-30 北京达佳互联信息技术有限公司 二维码检测方法、装置、设备及存储介质
CN111048071B (zh) * 2019-11-11 2023-05-30 京东科技信息技术有限公司 语音数据处理方法、装置、计算机设备和存储介质
KR20210061839A (ko) * 2019-11-20 2021-05-28 삼성전자주식회사 전자 장치 및 그 제어 방법
CN110942139A (zh) * 2019-11-22 2020-03-31 深圳市魔数智擎人工智能有限公司 深度学习神经网络部署系统及其方法
CN111079671B (zh) * 2019-12-20 2020-11-03 深圳集智数字科技有限公司 一种场景中异常物品的检测方法及装置
CN111027512B (zh) * 2019-12-24 2023-04-18 北方工业大学 一种遥感图像近岸船检测与定位方法及装置
JP7490359B2 (ja) * 2019-12-24 2024-05-27 キヤノン株式会社 情報処理装置、情報処理方法及びプログラム
CN111651762A (zh) * 2020-04-21 2020-09-11 浙江大学 一种基于卷积神经网络的pe恶意软件检测方法
CN111950354A (zh) * 2020-06-30 2020-11-17 深圳市雄帝科技股份有限公司 印章归属国识别方法、装置及电子设备
CN112529897B (zh) * 2020-12-24 2024-08-13 上海商汤智能科技有限公司 一种图像检测方法、装置、计算机设备及存储介质
CN112613570B (zh) * 2020-12-29 2024-06-11 深圳云天励飞技术股份有限公司 一种图像检测方法、图像检测装置、设备及存储介质
CN112801161B (zh) * 2021-01-22 2024-06-14 桂林市国创朝阳信息科技有限公司 小样本图像分类方法、装置、电子设备及计算机存储介质
CN112926692B (zh) * 2021-04-09 2023-05-09 四川翼飞视科技有限公司 基于非均匀混合卷积的目标检测装置、方法和存储介质
US11436438B1 (en) * 2021-05-07 2022-09-06 Sas Institute Inc. Tabular data generation for machine learning model training system
US11531907B2 (en) * 2021-05-07 2022-12-20 Sas Institute Inc. Automated control of a manufacturing process
CN113591840B (zh) * 2021-06-30 2024-07-02 北京旷视科技有限公司 一种目标检测方法、装置、设备和存储介质
CN113569769A (zh) * 2021-07-30 2021-10-29 仲恺农业工程学院 基于深度神经网络的红火蚁蚁巢远程识别与定位方法
WO2023015409A1 (zh) * 2021-08-09 2023-02-16 百果园技术(新加坡)有限公司 物体姿态的检测方法、装置、计算机设备和存储介质
CN114005017A (zh) * 2021-09-18 2022-02-01 北京旷视科技有限公司 目标检测方法、装置、电子设备及存储介质
CN114332752B (zh) * 2021-12-09 2024-06-21 国能宁夏灵武发电有限公司 一种作业人员安全装备异常佩戴状态检测方法与装置
CN114494116B (zh) * 2021-12-20 2024-07-09 苏州镁伽科技有限公司 器件边缘的检测方法、装置、存储介质及电子设备
CN114399628B (zh) * 2021-12-21 2024-03-08 四川大学 复杂空间环境下的绝缘子高效检测系统
CN114022558B (zh) * 2022-01-05 2022-08-26 深圳思谋信息科技有限公司 图像定位方法、装置、计算机设备和存储介质
CN114789440B (zh) * 2022-04-22 2024-02-20 深圳市正浩创新科技股份有限公司 基于图像识别的目标对接方法、装置、设备及其介质
CN115019146B (zh) * 2022-06-17 2024-10-15 集美大学 目标检测方法、系统及存内计算芯片
CN115100419B (zh) * 2022-07-20 2023-02-21 中国科学院自动化研究所 目标检测方法、装置、电子设备及存储介质
CN115620081B (zh) * 2022-09-27 2023-07-07 北京百度网讯科技有限公司 一种目标检测模型的训练方法及目标检测方法、装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504223A (zh) * 2016-09-12 2017-03-15 北京小米移动软件有限公司 图片的参考角度判定方法及装置
CN107423760A (zh) * 2017-07-21 2017-12-01 西安电子科技大学 基于预分割和回归的深度学习目标检测方法
US20180060719A1 (en) * 2016-08-29 2018-03-01 International Business Machines Corporation Scale-space label fusion using two-stage deep neural net
CN108154196A (zh) * 2018-01-19 2018-06-12 百度在线网络技术(北京)有限公司 用于输出图像的方法和装置
CN108229455A (zh) * 2017-02-23 2018-06-29 北京市商汤科技开发有限公司 物体检测方法、神经网络的训练方法、装置和电子设备

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101968884A (zh) * 2009-07-28 2011-02-09 索尼株式会社 检测视频图像中的目标的方法和装置
CN106408591B (zh) * 2016-09-09 2019-04-05 南京航空航天大学 一种抗遮挡的目标跟踪方法
CN108171103B (zh) 2016-12-07 2024-10-01 北京三星通信技术研究有限公司 目标检测方法及装置
CN108416250B (zh) * 2017-02-10 2021-06-22 浙江宇视科技有限公司 人数统计方法及装置
CN108229497B (zh) * 2017-07-28 2021-01-05 北京市商汤科技开发有限公司 图像处理方法、装置、存储介质、计算机程序和电子设备
CN107563290A (zh) * 2017-08-01 2018-01-09 中国农业大学 一种基于图像的行人检测方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180060719A1 (en) * 2016-08-29 2018-03-01 International Business Machines Corporation Scale-space label fusion using two-stage deep neural net
CN106504223A (zh) * 2016-09-12 2017-03-15 北京小米移动软件有限公司 图片的参考角度判定方法及装置
CN108229455A (zh) * 2017-02-23 2018-06-29 北京市商汤科技开发有限公司 物体检测方法、神经网络的训练方法、装置和电子设备
CN107423760A (zh) * 2017-07-21 2017-12-01 西安电子科技大学 基于预分割和回归的深度学习目标检测方法
CN108154196A (zh) * 2018-01-19 2018-06-12 百度在线网络技术(北京)有限公司 用于输出图像的方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3843003A4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783685A (zh) * 2020-05-08 2020-10-16 西安建筑科技大学 一种基于单阶段网络模型的目标检测改进算法
CN111898497A (zh) * 2020-07-16 2020-11-06 济南博观智能科技有限公司 一种车牌检测的方法、系统、设备及可读存储介质
CN111898497B (zh) * 2020-07-16 2024-05-10 济南博观智能科技有限公司 一种车牌检测的方法、系统、设备及可读存储介质
CN113762166A (zh) * 2021-09-09 2021-12-07 中国矿业大学 一种基于可穿戴式装备的小目标检测改善方法及系统

Also Published As

Publication number Publication date
EP3843003B1 (en) 2023-07-19
EP3843003A1 (en) 2021-06-30
CN110163197B (zh) 2023-03-10
EP3843003A4 (en) 2021-10-13
CN110163197A (zh) 2019-08-23
US20200410273A1 (en) 2020-12-31
US11710293B2 (en) 2023-07-25

Similar Documents

Publication Publication Date Title
WO2020038205A1 (zh) 目标检测方法、装置、计算机可读存储介质及计算机设备
US12094193B2 (en) Image processing method and image processing device
CN109583445B (zh) 文字图像校正处理方法、装置、设备及存储介质
WO2021129691A1 (zh) 一种对目标检测方法以及相应装置
WO2020147445A1 (zh) 翻拍图像识别方法、装置、计算机设备和计算机可读存储介质
CN110059741B (zh) 基于语义胶囊融合网络的图像识别方法
CN110516541B (zh) 文本定位方法、装置、计算机可读存储介质和计算机设备
WO2019144855A1 (zh) 图像处理方法、存储介质和计算机设备
WO2019218136A1 (zh) 图像分割方法、计算机设备和存储介质
CN110263819A (zh) 一种用于贝类图像的目标检测方法及装置
CN112614125B (zh) 手机玻璃缺陷检测方法、装置、计算机设备及存储介质
CN107784288A (zh) 一种基于深度神经网络的迭代定位式人脸检测方法
CN114969417B (zh) 图像重排序方法、相关设备及计算机可读存储介质
CN110796000A (zh) 基于双向lstm的唇形样本生成方法、装置和存储介质
US20220301106A1 (en) Training method and apparatus for image processing model, and image processing method and apparatus
CN114444565B (zh) 一种图像篡改检测方法、终端设备及存储介质
CN114862845A (zh) 手机触摸屏的缺陷检测方法、装置、设备及存储介质
US20230401691A1 (en) Image defect detection method, electronic device and readable storage medium
CN112651333A (zh) 静默活体检测方法、装置、终端设备和存储介质
CN110348025A (zh) 一种基于字形的翻译方法、装置、存储介质及电子设备
CN110414516B (zh) 一种基于深度学习的单个汉字识别方法
CN112990107B (zh) 高光谱遥感图像水下目标检测方法、装置及计算机设备
CN113822871A (zh) 基于动态检测头的目标检测方法、装置、存储介质及设备
CN111709338B (zh) 一种用于表格检测的方法、装置及检测模型的训练方法
CN117710295A (zh) 图像处理方法、装置、设备、介质及程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19851718

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019851718

Country of ref document: EP

Effective date: 20210324