WO2019218410A1 - Image classification method, computer device and storage medium - Google Patents

Image classification method, computer device and storage medium

Info

Publication number
WO2019218410A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
positioning
image
segmentation
sub
Prior art date
Application number
PCT/CN2018/090370
Other languages
English (en)
French (fr)
Inventor
林迪 (Lin Di)
黄惠 (Huang Hui)
Original Assignee
深圳大学 (Shenzhen University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学 (Shenzhen University)
Priority to US16/606,166 priority Critical patent/US11238311B2/en
Publication of WO2019218410A1 publication Critical patent/WO2019218410A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/255Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/7515Shifting the patterns to accommodate for positional errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present application relates to the field of computer technology, and in particular, to an image classification method, a computer device, and a storage medium.
  • Fine-grained recognition emphasizes the subtle differences among object sub-categories that may share similar shapes and poses.
  • The purpose of fine-grained object recognition is to identify the sub-category of an object, for example to distinguish subtle differences among animal species, product brands, or architectural styles.
  • An image classification method comprising:
  • the computer device obtains an image to be classified and inputs it into a trained image classification model, where the trained image classification model includes a positioning and segmentation sub-network, a calibration sub-network, and a classification sub-network, and the calibration sub-network is formulated as a valve linkage function;
  • the image classification model is obtained by adjusting the parameters of the positioning and segmentation sub-network and the classification sub-network through the valve linkage function;
  • in the forward pass of training, the output of the valve linkage function is the calibrated image; in the back-propagation pass of training, the output of the valve linkage function is a function of the positioning area and the segmentation area output by the positioning and segmentation sub-network;
  • the computer device performs positioning and segmentation of the image to be classified through the positioning and segmentation sub-network to obtain a segmented image including a positioning area and a segmentation area;
  • the computer device passes the segmented image through the calibration subnetwork, and the calibration subnetwork calibrates the target object to obtain a calibrated image;
  • the computer device performs fine-grained classification on the calibrated image through the classification sub-network to obtain a category corresponding to the image to be classified; a schematic sketch of this pipeline follows.
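  • The steps above describe a three-stage forward pass. The following is a minimal, illustrative sketch of that pipeline; the function names (locate_and_segment, calibrate, classify_fine_grained), shapes, and placeholder values are assumptions for illustration only, not APIs defined by the patent.

```python
import numpy as np

def locate_and_segment(image):
    """Stand-in for the positioning and segmentation sub-network: returns a
    bounding box (x1, y1, x2, y2) and a per-pixel foreground-probability map."""
    h, w = image.shape[:2]
    box = np.array([w * 0.25, h * 0.25, w * 0.75, h * 0.75])
    seg = np.full((h, w), 0.5)
    return box, seg

def calibrate(image, box, seg, template):
    """Stand-in for the calibration sub-network: align the located part to the
    template (the real model rotates and scales about a template center point)."""
    x1, y1, x2, y2 = box.astype(int)
    return image[y1:y2, x1:x2]

def classify_fine_grained(calibrated):
    """Stand-in for the classification sub-network: returns class scores."""
    return np.ones(200) / 200.0   # e.g. 200 bird categories in CUB-200-2011

image = np.zeros((224, 224, 3), dtype=np.float32)     # image to be classified
template = np.zeros((112, 112, 3), dtype=np.float32)  # template for one category
box, seg = locate_and_segment(image)                  # positioning + segmentation
aligned = calibrate(image, box, seg, template)        # calibration
scores = classify_fine_grained(aligned)               # fine-grained classification
predicted_category = int(np.argmax(scores))
```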
  • the positioning and segmentation sub-network comprises a positioning sub-network and a segmentation sub-network, and the positioning sub-network shares the parameters of a convolutional neural network with the segmentation sub-network.
  • the training steps of the image classification model include:
  • the computer device acquires a training image set, where each training image in the training image set includes a standard positioning label box, a standard segmentation label box, and a standard category label;
  • the computer device obtains a template corresponding to each category from the training image set;
  • the computer device inputs each training image in the training image set into the positioning and segmentation sub-network to obtain a segmented training image including a current positioning area and a current segmentation area;
  • the computer device calibrates the segmented training image according to the template to obtain a calibrated training image;
  • the computer device inputs the calibrated training image into the classification sub-network to obtain a corresponding current output category;
  • the computer device acquires a total objective function corresponding to the image classification model, where the total objective function includes a positioning and segmentation sub-network objective function and a classification sub-network objective function, and the positioning and segmentation sub-network objective function is a function of the valve linkage function;
  • the computer device calculates the value of the total objective function according to the current output category, the standard positioning label box, the standard segmentation label box, and the standard category label;
  • the computer device adjusts the positioning segmentation subnetwork parameter and the classification subnetwork parameter according to the valve linkage function until the value of the total objective function satisfies a convergence condition;
  • the computer device obtains the trained image classification model.
  • the obtaining a template corresponding to each category from the training image set includes:
  • the computer device calculates a similarity between any two training images in the training image set to form a similarity matrix
  • the computer device passes the similarity matrix through a spectral clustering algorithm, and divides each training image into corresponding multiple clusters;
  • the computer device obtains each cluster center and determines, according to the similarity between each training image in a cluster and the corresponding cluster center, the target training image corresponding to that cluster, so as to obtain a template corresponding to each category; the template is used to calibrate images.
  • the calibrating the segmented training image according to the template to obtain a calibrated training image includes:
  • the computer device acquires a calibration objective function, the calibration objective function including a similarity function, a distance function, and a foreground confidence function;
  • the computer device adjusts a template center point, a rotation angle, a zoom ratio, and a current template until the calibration objective function satisfies a convergence condition, and obtains a corresponding target template center point, target rotation angle, target zoom ratio, and target template;
  • the computer device calibrates the segmented training image according to the target template center point, the target rotation angle, the target zoom ratio, and the target template to obtain a calibrated training image.
  • the total objective function is defined by the following formula, in which:
  • J is the total objective function;
  • E_ls is the positioning and segmentation sub-network objective function;
  • E_c is the classification sub-network objective function;
  • W_ls is the parameter of the positioning and segmentation sub-network that needs to be determined;
  • W_c is the parameter of the classification sub-network that needs to be determined;
  • V denotes the valve linkage function;
  • L is the positioning area output by the positioning and segmentation sub-network;
  • O is the segmentation area output by the positioning and segmentation sub-network;
  • I is the input original image;
  • L_f is the positioning area output by the positioning and segmentation sub-network in the forward pass;
  • O_f is the segmentation area output by the positioning and segmentation sub-network in the forward pass;
  • y_gt is the standard category label;
  • L_gt is the standard positioning label box;
  • O_gt is the standard segmentation label box.
  • the valve linkage function is defined by the following formula, in which:
  • V is the valve linkage function;
  • L is the positioning area output by the positioning and segmentation sub-network;
  • O is the segmentation area output by the positioning and segmentation sub-network;
  • in the forward pass, L and O take the values L_f and O_f;
  • I is the input original image;
  • L_f is the positioning area output by the positioning and segmentation sub-network in the forward pass;
  • O_f is the segmentation area output by the positioning and segmentation sub-network in the forward pass;
  • c* is the template center point used in the calibration;
  • θ* is the rotation angle used in the calibration;
  • α* is the target zoom ratio used in the calibration;
  • I(c*, θ*, α*) represents the image obtained after calibrating the original image;
  • E_a is the calibration energy function;
  • the calibration energy function is defined by the following formula:
  • E_a(c, θ, α, t; I, L, O) = S(I(c, θ, α), t) + λ_d · D(c, L) + λ_s · F(O, t_m)
  • c is the template center point;
  • θ is the rotation angle;
  • α is the zoom (scaling) ratio;
  • t is the template;
  • S is the similarity function;
  • λ_d and λ_s are custom constants;
  • D is the distance function;
  • F is the foreground confidence function;
  • t_m is the binary mask of the template.
  • a computer device comprising a memory and a processor, the memory storing computer readable instructions, the computer readable instructions being executed by the processor such that the processor performs the following steps:
  • the trained image classification model includes a positioning and segmentation sub-network, a calibration sub-network, and a classification sub-network;
  • the calibration sub-network is formulated as a valve linkage function, and the image classification model is obtained by adjusting the parameters of the positioning and segmentation sub-network and the classification sub-network through the valve linkage function;
  • in the forward pass of training, the output of the valve linkage function is the calibrated image; in the back-propagation pass of training, the output of the valve linkage function is a function of the positioning area and the segmentation area output by the positioning and segmentation sub-network;
  • the image to be classified is subjected to target object positioning and segmentation through the positioning and segmentation sub-network to obtain a segmented image including a positioning area and a segmentation area;
  • the segmented image passes through the calibration sub-network, and the calibration sub-network calibrates the target object to obtain a calibrated image;
  • the calibrated image is subjected to fine-grained classification through the classification sub-network to obtain a category corresponding to the image to be classified.
  • One or more non-volatile storage media storing computer readable instructions, when executed by one or more processors, cause one or more processors to perform the following steps:
  • the trained image classification model includes a positioning and segmentation sub-network, a calibration sub-network, and a classification sub-network;
  • the calibration sub-network is formulated as a valve linkage function, and the image classification model is obtained by adjusting the parameters of the positioning and segmentation sub-network and the classification sub-network through the valve linkage function;
  • in the forward pass of training, the output of the valve linkage function is the calibrated image; in the back-propagation pass of training, the output of the valve linkage function is a function of the positioning area and the segmentation area output by the positioning and segmentation sub-network;
  • the image to be classified is subjected to target object positioning and segmentation through the positioning and segmentation sub-network to obtain a segmented image including a positioning area and a segmentation area;
  • the segmented image passes through the calibration sub-network, and the calibration sub-network calibrates the target object to obtain a calibrated image;
  • the calibrated image is subjected to fine-grained classification through the classification sub-network to obtain a category corresponding to the image to be classified.
  • FIG. 1 is an application environment diagram of an image classification method in an embodiment
  • FIG. 2 is a schematic flow chart of an image classification method in an embodiment
  • FIG. 3 is a schematic flow chart of obtaining a trained image classification model in one embodiment
  • FIG. 4 is a schematic flowchart of determining a template corresponding to a category in an embodiment
  • Figure 5 is a schematic diagram of a training image of a bird's head and a bird's torso in one embodiment, wherein the image selected as a template is shown in the first column of Figures 5(a) and 5(b);
  • FIG. 6 is a schematic flow chart of obtaining a calibrated training image according to a template in an embodiment
  • FIG. 7 is a schematic diagram of a foreground confidence map and a binary mask of a calibration portion in one embodiment
  • Figure 8 is a schematic diagram of image comparison before and after calibration in one embodiment
  • FIG. 9 is a schematic diagram of a processing procedure of a depth system image classification system in an embodiment
  • FIG. 10 is a schematic diagram showing the comparison of the parameters of the non-shared convolutional neural network and the parameter sharing in an embodiment
  • FIG. 11 is a schematic diagram showing the comparison of the division and precision of the unshared convolutional neural network parameters and parameter sharing in an embodiment
  • Figure 12 is a schematic diagram showing an input image and a segmentation result in each case in one embodiment
  • Figure 13 is a schematic diagram showing the comparison of object segmentation accuracy with and without the valve linkage function on the CUB-200-2011 data set;
  • Figure 14 is a comparison of the positioning accuracy of the method of the present application and other methods on the head and the torso;
  • Figure 15 is a schematic illustration of the positioning of a predicted bounding box including a head and a torso in one embodiment;
  • Figure 16 is a schematic diagram showing the comparison of the method of the present application with other segmentation methods on the CUB-200-2011 data set;
  • Figure 17 is a schematic diagram of different segmentation results corresponding to different algorithms;
  • Figure 18 is a schematic diagram showing the classification accuracy of the bird head and torso semantic parts on the CUB-200-2011 data set;
  • Figure 19 is a schematic diagram showing the comparison of the classification accuracy of the present application with other state-of-the-art methods on the CUB-200-2011 data set;
  • Figure 20 is a schematic diagram showing the classification accuracy of the bird head and torso semantic parts of different methods on the CUB-200-2010 data set;
  • Figure 21 is a schematic diagram of the comparison of classification accuracy corresponding to the method of the present application and other methods;
  • Figure 22 is a schematic illustration of the masking on the Stanford Cars-96 data set;
  • Figure 23 is a schematic diagram showing the classification accuracy of the depth system of the present application and other methods on the Stanford Cars-96 data set;
  • Figure 24 is a block diagram showing the structure of an image classification device in an embodiment
  • Figure 25 is a block diagram showing the structure of an image classification apparatus in another embodiment.
  • Figure 26 is a diagram showing the internal structure of a computer device in an embodiment.
  • the image classification method provided by the present application can be applied to an application environment as shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through the network.
  • the terminal may acquire an image to be classified input by the user and send it to the server 104 for classification, or classify it directly at the terminal 102.
  • the terminal 102 can be, but is not limited to, various personal computers, notebook computers, smart phones, tablets, and portable wearable devices.
  • the server 104 can be implemented by a separate server or a server cluster composed of multiple servers.
  • an image classification method is provided, which is applied to the terminal or server in FIG. 1 as an example, and includes the following steps:
  • Step S210 Acquire an image to be classified, and input the image to be classified into a trained image classification model, where the trained image classification model includes a positioning segmentation subnetwork, a calibration subnetwork, and a classification subnetwork, and the calibration subnetwork is Formulated as the valve linkage function, the image classification model is obtained by adjusting the parameters of the positioning segmentation subnetwork and the classification subnetwork through the valve linkage function.
  • in the forward pass of training, the output of the valve linkage function is the calibrated image;
  • in the back-propagation pass of training, the output of the valve linkage function is a function of the positioning area and the segmentation area output by the positioning and segmentation sub-network.
  • the image to be classified refers to an image whose fine-grained category needs to be determined.
  • the image to be classified can be captured in real time, or it can be an image obtained from a stored file.
  • the image classification model is used to classify the input image into fine-grained categories and output the corresponding classification result. Pre-processing, such as unifying the resolution, can be performed on the image to be classified.
  • the positioning and segmentation sub-network is used to obtain a positioning area and a segmentation area; it may be composed of an inter-related positioning sub-network and segmentation sub-network, or of mutually independent ones. Here, "inter-related" means the two sub-networks are trained jointly, for example by sharing parameters.
  • the positioning sub-network outputs the basic position of the target object, which can be displayed through the positioning frame.
  • the segmentation sub-network produces a pixel-level segmentation of the target object and the background by performing a two-class (foreground/background) regression.
  • the calibration sub-network is formulated as a valve linkage function.
  • the output of the valve linkage function is a function of the positioning area and the segmentation area of the output of the segmentation sub-network, forming a calibration module based on the positioning result and the segmentation result.
  • the image classification model is trained by adjusting the parameters of the positioning segmentation sub-network and the classification sub-network through the valve linkage function.
  • the valve linkage function makes the positioning segmentation sub-network, the calibration sub-network, and the classification sub-network as a whole in the training phase.
  • the output of the valve linkage function is a calibrated image that combines calibration with other sub-networks based on deep convolutional neural network components.
  • the objective function of the image classification model may be defined in terms of the valve linkage function and the classification sub-network parameters;
  • the valve linkage function is a function of the positioning area and the segmentation area output by the positioning and segmentation sub-network, and these outputs in turn depend on the positioning and segmentation sub-network parameters, so that the parameters of the positioning and segmentation sub-network and the classification sub-network are adjusted through the valve linkage function during training.
  • the specific definition of the valve linkage function can be customized as needed.
  • a valve linkage function is added to the image classification model as a bridge between the positioning and segmentation sub-network and the classification module. During training, this function adaptively controls the propagation of update signals from the classification module to the positioning and segmentation sub-network.
  • Step S220 The image to be classified is subjected to target object localization and segmentation through the positioning and segmentation sub-network to obtain a segmented image including the positioning area and the segmentation area.
  • the image to be classified passes through the positioning sub-network in the positioning and segmentation sub-network to output a positioning area, which may be a bounding box given by (x_1, y_1), (x_2, y_2), where x_1 and x_2 are the horizontal start and end coordinates of the bounding box, and y_1 and y_2 are the vertical start and end coordinates of the bounding box.
  • the image including the positioning area is further processed by the segmentation sub-network in the positioning and segmentation sub-network to generate a segmented image.
  • step S230 the segmented image passes through the calibration sub-network, and the calibration sub-network calibrates the target object to obtain a calibrated image.
  • the calibration sub-network obtains the positioning result L of the object part and the segmentation result O from the positioning and segmentation network, then performs template alignment, and provides the coordinate-aligned image to the classification sub-network.
  • Template alignment is a process of calibration.
  • the number of templates can be one or more, and the posture change can be grasped through multiple template selection.
  • Parameters need to be solved during calibration, including target template center point, target rotation angle, target zoom ratio, and target template.
  • the calibration objective function is adjusted until the convergence condition is satisfied, thereby obtaining a target template center point, a target rotation angle, a target zoom ratio, and a target template.
  • the segmented image is aligned to the target template: it is centered at the target template center point, rotated by the target rotation angle, and scaled by the target zoom ratio to obtain a calibrated image, as sketched below.
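  • A minimal sketch of this warp, assuming OpenCV is used; the helper name and the example parameter values are illustrative and not taken from the patent.

```python
import cv2
import numpy as np

def apply_calibration(image, center, angle_deg, scale, out_size):
    """Rotate and scale `image` about `center`, then place the result on an
    `out_size` canvas. Illustrates the calibration warp: center at the target
    template center point, rotate by the target rotation angle, scale by the
    target zoom ratio."""
    # 2x3 affine matrix encoding rotation about `center` and isotropic scaling.
    M = cv2.getRotationMatrix2D(center, angle_deg, scale)
    # Shift so that `center` maps to the middle of the output canvas.
    M[0, 2] += out_size[0] / 2.0 - center[0]
    M[1, 2] += out_size[1] / 2.0 - center[1]
    return cv2.warpAffine(image, M, out_size)

# Example call with illustrative calibration parameters.
segmented = np.zeros((256, 256, 3), dtype=np.uint8)
calibrated = apply_calibration(segmented, center=(120.0, 96.0),
                               angle_deg=12.0, scale=1.3, out_size=(224, 224))
```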
  • step S240: the calibrated image is subjected to fine-grained classification through the classification sub-network to obtain a category corresponding to the image to be classified.
  • the calibrated image passes through the classification sub-network, and outputs a corresponding category label, thereby obtaining a fine-grained category corresponding to the image to be classified.
  • the image to be classified is acquired and input into the trained image classification model, where the trained image classification model includes a positioning and segmentation sub-network, a calibration sub-network, and a classification sub-network, and the calibration sub-network is formulated as a valve linkage function;
  • the image classification model is obtained by adjusting the parameters of the positioning and segmentation sub-network and the classification sub-network through the valve linkage function;
  • in the forward pass of training, the output of the valve linkage function is the calibrated image; in the back-propagation pass of training, the output of the valve linkage function is a function of the positioning area and the segmentation area output by the positioning and segmentation sub-network;
  • the image to be classified is subjected to target object positioning and segmentation to obtain a segmented image including the positioning area and the segmentation area;
  • the segmented image passes through the calibration sub-network, and the calibration sub-network calibrates the target object to obtain a calibrated image; the calibrated image is then subjected to fine-grained classification by the classification sub-network to obtain a category corresponding to the image to be classified, which can improve the precision of image classification.
  • the location segmentation subnetwork comprises a location subnetwork and a segmentation subnetwork, the location network sharing the parameters of the convolutional neural network with the segmentation subnetwork.
  • the positioning sub-network and the segmentation sub-network share the parameters of the convolutional neural network and are trained jointly for positioning and segmentation; sharing the convolutional neural network parameters produces a more accurate model than training the positioning sub-network and the segmentation sub-network independently.
  • a large probability means that the pixel is located inside an object area, and the reverse mapping reduces the likelihood that the calibration operation will be applied to the background.
  • E_ls is the objective function of the positioning and segmentation sub-network;
  • f_l represents the positioning sub-network function;
  • I represents the input image;
  • L_gt represents the standard positioning label box;
  • c_i represents a pixel point;
  • o_i represents the value (label) of pixel point c_i;
  • P represents the probability function;
  • N represents the total number of pixel points of the input image.
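  • Based on the symbol definitions above, one plausible reconstruction of the objective (an assumption, not the verbatim patent formula) combines a bounding-box regression term with a per-pixel segmentation log-likelihood:

    E_ls(W_ls) = || f_l(I) − L_gt ||² − (1/N) · Σ_{i=1..N} log P(o_i | c_i, I; W_ls)

  • The first term penalizes the deviation of the predicted positioning box from the standard positioning label box, and the second term rewards assigning high probability to the correct foreground/background label of every pixel.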
  • the training steps of the image classification model include:
  • Step S310 acquiring a training image set, where each training image in the training image set includes a standard positioning label box, a standard segmentation label box and a standard category label.
  • the training image set includes a plurality of training images, each training image includes a standard positioning label box, a standard segmentation label box, and a standard category label, wherein the standard positioning label box is used to mark the real positioning result, and the standard segmentation label box Used to label true pixel-level segmentation results, standard category tags are used to label real classification results.
  • Step S320 obtaining templates corresponding to the respective categories from the training image set.
  • the plurality of training images in the training image set may be clustered into different categories, and different categories use corresponding different templates, and the template is used to calibrate the training images.
  • Corresponding templates can be selected for different categories from the respective training images according to the similarity between the respective training images corresponding to different categories.
  • the method of selecting a template can be customized as needed.
  • the number of templates corresponding to each category is not limited and may be one or more.
  • Step S330 inputting each training image in the training image set into the positioning segmentation sub-network, and obtaining a segmented training image including the current positioning area and the current segmentation area.
  • the image classification model may be initialized with random parameters; each training image in the training image set is input into the positioning and segmentation sub-network, and a segmented training image including the current positioning region and the current segmentation region corresponding to the current parameters is obtained.
  • Step S340 calibrating the segmented training image according to the template to obtain a calibrated training image.
  • the calibration process first adjusts the template center point, then adjusts the rotation angle and the zoom ratio with respect to that center point; when there are multiple templates, the target template also needs to be selected.
  • the target template center point, target rotation angle, target zoom ratio, and target template can be determined by a custom calibration target function.
  • Step S350 the calibrated training image is input into the classification sub-network to obtain a corresponding current output category.
  • the classification subnetwork is the last module of the image classification model.
  • the calibrated training image is taken as the input and is expressed as I* ∈ R^(h×w×3);
  • f_c denotes the classification sub-network function, and its output is a category label y.
  • the standard category label is the desired label and the predicted category label y should be consistent with the standard category label.
  • the calibrated training image is input into the classification sub-network to obtain the current output prediction category corresponding to the current parameter.
  • Step S360: acquire a total objective function corresponding to the image classification model, where the total objective function comprises a positioning and segmentation sub-network objective function and a classification sub-network objective function, and the positioning and segmentation sub-network objective function is a function of the valve linkage function; the value of the total objective function is calculated according to the current output category, the standard positioning label box, the standard segmentation label box, and the standard category label.
  • the total objective function is a function of the positioning and segmentation sub-network objective function and the classification sub-network objective function;
  • the calibration sub-network is formulated into a valve linkage function, and in the forward process, the valve linkage function is used to obtain the calibrated image
  • the output of the valve linkage function is a function of the positioning area and the segmentation area for positioning the segmentation sub-network output
  • the valve linkage function is used to adjust the parameters of the positioning segmentation sub-network.
  • the positioning segmentation sub-network objective function and the classification sub-network are trained as a whole in the training phase.
  • the valve linkage function is a function of the calibration energy function and the calibrated image.
  • the calibration energy function is a function of the calibration objective function and the forward propagation energy.
  • the valve linkage function preserves the calibration energy and treats the positioning area and the segmentation area of this part as variables;
  • this mapping enables the updated classification signal to be passed back to the positioning and segmentation sub-network via the chain rule.
  • Step S370 adjusting the positioning sub-network parameter and the classification sub-network parameter according to the valve linkage function until the value of the total objective function satisfies the convergence condition, and obtaining the trained image classification model.
  • the positioning and segmentation sub-network parameters and the classification sub-network parameters are the parameters that need to be determined; during training the positioning and segmentation sub-network and the classification sub-network are balanced by two weighting factors (both set to 1), and both sub-networks are updated by minimizing the total objective function.
  • when the image classification model is trained, the valve linkage function can adaptively trade off the classification and calibration errors, and it can also update the parameters of the positioning and segmentation sub-network and the classification sub-network to determine more accurate model parameters.
  • step S320 includes:
  • step S321 the similarity between any two training images in the training image set is calculated to form a similarity matrix.
  • a similarity algorithm is used to calculate the similarity between any two training images, and the specific calculation method can be customized; for example, if the training image set includes N training images, the similarity between every pair of training images is calculated to form an N×N similarity matrix.
  • the pixel values of each image are normalized and the pixel range is quantized into 256 gray-scale values; the distributions P_i and P_j of the gray-scale values belonging to R_i and R_j are then computed respectively;
  • the normalization of the gray-scale values and the calculation of the distributions follow the construction of a normalized color histogram;
  • R_i and R_j have the same size, and every pair of pixels at the same position in R_i and R_j forms a 2D tuple of gray-scale values;
  • P_ij is the joint distribution of the gray-scale values of R_i and R_j.
  • S denotes the similarity function used to measure whether the poses of two images are similar;
  • R_i, R_j represent two images of the same size;
  • P_i, P_j respectively represent the gray-scale value distributions (histograms) of R_i and R_j;
  • P_ij represents the joint distribution of the gray-scale values of R_i and R_j;
  • m, n represent pixel coordinate values;
  • M and N represent the length and width of the image, respectively.
  • Step S322 the similarity matrix is subjected to a spectral clustering algorithm, and each training image is divided into corresponding multiple clusters.
  • spectral clustering is a clustering algorithm that adapts well to the data distribution, clusters effectively, and requires relatively little computation.
  • the training images are divided into corresponding clusters by applying a spectral clustering algorithm to the similarity matrix.
  • Step S323: acquire each cluster center, and determine, according to the similarity between each training image in a cluster and the corresponding cluster center, the target training image corresponding to that cluster, so as to obtain a template corresponding to each category; the template is used to calibrate images.
  • each cluster has a center; the training image in a cluster that is most similar to the cluster center is selected as the template of that cluster;
  • a cluster corresponds to a category, so the template corresponding to each category is obtained.
  • a template corresponding to one category may be one or more.
  • As shown in Fig. 5, which is a schematic diagram of training images of the bird's head and the bird's torso, the images selected as templates are shown in the first column of Figs. 5(a) and 5(b).
  • the template corresponding to each category is obtained by calculating the similarity between images and applying the adaptive clustering algorithm, so the templates are selected dynamically, which improves the accuracy of template selection; a sketch of this procedure is given below.
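  • A compact sketch of steps S321–S323 under stated assumptions: mutual information of the joint gray-level histogram stands in for the patent's pose-similarity formula, scikit-learn provides the spectral clustering, and the per-cluster medoid stands in for "the image most similar to the cluster center"; all function names are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def pose_similarity(img_a, img_b, bins=256):
    """Similarity of two equal-size gray-scale images from their joint histogram.

    The patent defines S via the marginal distributions P_i, P_j and the joint
    distribution P_ij of quantized gray-scale values; mutual information is used
    here as one plausible instantiation of that idea, not the patent's exact formula."""
    a = img_a.ravel().astype(np.float64)
    b = img_b.ravel().astype(np.float64)
    p_ij, _, _ = np.histogram2d(a, b, bins=bins, range=[[0, 256], [0, 256]])
    p_ij /= p_ij.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    mask = p_ij > 0
    return float(np.sum(p_ij[mask] * np.log(p_ij[mask] / (p_i @ p_j)[mask])))

def select_templates(images, n_clusters):
    """Cluster training parts by pose and pick one template per cluster (Step S320)."""
    n = len(images)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = pose_similarity(images[i], images[j])
    # The similarity matrix is symmetric and non-negative, so it can serve directly
    # as a precomputed affinity for spectral clustering.
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(sim)
    templates = []
    for k in range(n_clusters):
        members = np.where(labels == k)[0]
        # The member with the highest average within-cluster similarity is kept
        # as the template of this cluster (a medoid stand-in for the cluster center).
        center = members[np.argmax(sim[np.ix_(members, members)].mean(axis=1))]
        templates.append(center)
    return labels, templates
```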
  • step S340 includes:
  • Step S341 acquiring a calibration objective function, the calibration objective function including a similarity function, a distance function, and a foreground confidence function.
  • the calibration objective function is used to determine the target template center point, the target rotation angle, the target zoom ratio, and the target template
  • the similarity function is used to describe the similarity between the image to be calibrated and the template; the distance function is related to the distance between the template center point and the center point of the positioning frame output by the positioning and segmentation sub-network; and the foreground confidence function is used to describe the foreground confidence of the region covered by the template.
  • the distance function is defined by a formula in which D(c, L) represents the distance function, c represents the template center point, L is the positioning frame output by the positioning and segmentation sub-network, and c_r(L) represents the center point of the positioning frame L; a parameter of the distance function can be set empirically, and it is set to 15 in one embodiment.
  • the measure of similarity is defined according to the distribution of pixel values, but lacks the shape information of the critical object, and the shape information of the object is described by the foreground confidence function.
  • a binary mask t_m is given such that t_m(c_i) ∈ {0, 1}, indicating whether pixel point c_i belongs to the background or the foreground, where 0 and 1 represent the background and the foreground, respectively.
  • for pixel point c_i, O_f(c_i) and O_b(c_i) are used as the foreground and background scores, respectively;
  • O_f represents the foreground confidence of a pixel;
  • O_b represents the background confidence of a pixel;
  • a higher O_f score means the pixel is more likely to be foreground, and a higher O_b score means the pixel is more likely to be background.
  • F represents the foreground confidence of the area covered by the template;
  • t_m represents the binary mask of the template;
  • N_f represents the number of foreground pixels contained in the binary mask of the template;
  • N_b represents the number of background pixels contained in the binary mask of the template. Regions that are likely to be foreground are encouraged to fall inside the foreground region of the template, while placements whose template foreground overlaps the background region are suppressed; guided by the foreground confidence, the foreground region can be calibrated more accurately.
  • the calibration objective function is defined as follows:
  • E a represents the calibration objective function
  • c, θ, α, and t represent the parameters to be solved in calibration: the template center point, the rotation angle, the zoom ratio, and the current template, respectively;
  • λ_d and λ_s are customizable constants; in one embodiment they are set to 0.001 and 0.003, respectively.
  • Step S342: adjust the template center point, the rotation angle, the zoom ratio, and the current template until the calibration objective function satisfies the convergence condition, and obtain the corresponding target template center point, target rotation angle, target zoom ratio, and target template.
  • the target template center point, the target rotation angle, the target zoom ratio, and the target template are obtained by maximizing the calibration objective function.
  • Step S343 calibrating the segmented training image according to the target template center point, the target rotation angle, the target zoom ratio, and the target template to obtain a calibrated training image.
  • the segmented training image can be calibrated to obtain a calibrated training image.
  • FIG. 8 is a schematic diagram of image comparison before and after calibration in one embodiment.
  • the left column shows an uncalibrated image of the bird's head
  • the right column shows the image of the bird's head calibrated through the template.
  • the left column shows the uncalibrated image of the bird's torso
  • the right column shows the image of the bird's torso calibrated through the template.
  • the target template center point, the target rotation angle, the target zoom ratio, and the target template are computed dynamically, and the computation comprehensively considers the similarity, the center distance, and the foreground confidence, so that the calibration result is more trustworthy; a sketch of this search is given below.
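  • A minimal grid-search sketch of the calibration parameter search (Step S342). The callables standing in for S, D, F and the warp I(c, θ, α) are placeholders whose exact forms follow the patent's definitions and are not reproduced here; the λ weights are the values given in one embodiment.

```python
import itertools
import numpy as np

def calibration_energy(sim, dist_to_box, fg_conf, lam_d=0.001, lam_s=0.003):
    """E_a = S + lambda_d * D + lambda_s * F  (weights as in one embodiment)."""
    return sim + lam_d * dist_to_box + lam_s * fg_conf

def search_calibration(image, box, seg, templates, centers, angles, scales,
                       similarity, distance, foreground_confidence, warp):
    """Grid-search the pose parameters that maximize the calibration energy.

    `similarity`, `distance`, `foreground_confidence` and `warp` are callables
    standing in for S, D, F and the warp I(c, theta, alpha)."""
    best, best_e = None, -np.inf
    for c, theta, alpha, t in itertools.product(centers, angles, scales, templates):
        warped = warp(image, c, theta, alpha)                  # I(c, theta, alpha)
        e = calibration_energy(similarity(warped, t),          # S(I(c,theta,alpha), t)
                               distance(c, box),               # D(c, L)
                               foreground_confidence(seg, t))  # F(O, t_m)
        if e > best_e:
            best, best_e = (c, theta, alpha, t), e
    return best, best_e   # (c*, theta*, alpha*, t*) and the forward energy E_a
```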
  • the total objective function is defined by the following formula, in which:
  • J is the total objective function;
  • E_ls is the positioning and segmentation sub-network objective function;
  • E_c is the classification sub-network objective function;
  • W_ls is the parameter of the positioning and segmentation sub-network that needs to be determined;
  • W_c is the parameter of the classification sub-network that needs to be determined;
  • V denotes the valve linkage function;
  • L is the positioning area output by the positioning and segmentation sub-network;
  • O is the segmentation area output by the positioning and segmentation sub-network;
  • I is the input original image;
  • L_f is the positioning area output by the positioning and segmentation sub-network in the forward pass;
  • O_f is the segmentation area output by the positioning and segmentation sub-network in the forward pass;
  • y_gt is the standard category label;
  • L_gt is the standard positioning label box;
  • O_gt is the standard segmentation label box.
  • the valve linkage function is defined by the following formula, in which:
  • V represents the valve linkage function;
  • L is the positioning area output by the positioning and segmentation sub-network;
  • O is the segmentation area output by the positioning and segmentation sub-network;
  • I is the input original image;
  • L_f is the positioning area output by the positioning and segmentation sub-network in the forward pass;
  • O_f is the segmentation area output by the positioning and segmentation sub-network in the forward pass;
  • c* is the template center point used in the calibration;
  • θ* is the rotation angle used in the calibration;
  • α* is the target zoom ratio used in the calibration;
  • I(c*, θ*, α*) is the image obtained after calibrating the original image;
  • E_a is the calibration energy function;
  • the calibration energy function is defined by the following formula:
  • E_a(c, θ, α, t; I, L, O) = S(I(c, θ, α), t) + λ_d · D(c, L) + λ_s · F(O, t_m)
  • c is the template center point;
  • θ is the rotation angle;
  • α is the zoom (scaling) ratio;
  • t is the template;
  • S is the similarity function;
  • λ_d and λ_s are custom constants;
  • D is the distance function;
  • F is the foreground confidence function;
  • t_m is the binary mask of the template.
  • the image obtained after posture calibration of the original image is I(c*, θ*, α*); L and O are constants in the forward propagation phase of training (that is, in the forward process), while in the back-propagation phase of training L and O are variables.
  • {c*, θ*, α*, t*} = argmax_{c, θ, α, t} E_a(c, θ, α, t; I, L_f, O_f) denotes that c*, θ*, α*, t* maximize the calibration energy function.
  • the valve linkage function compromises three key conditions: 1) the calibration energy function, 2) the forward propagation energy for L f and O f , and 3) the image after pose calibration.
  • the valve linkage function retains the calibration energy function, so that the updated classification signal can be transmitted to the positioning and segmentation subnetwork through the chain rule.
  • the output of the calibration sub-network, V(L, O; I, L_f, O_f), becomes a function of L and O; therefore, the total objective function of the image classification model is formulated over both sub-networks.
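  • Based on the symbol definitions above, the following additive form is a plausible reconstruction of that total objective (an assumption, not the verbatim patent formula):

    J(W_ls, W_c) = E_ls(L, O; I, L_gt, O_gt, W_ls) + E_c(V(L, O; I, L_f, O_f); y_gt, W_c)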
  • E_ls and E_c respectively represent the training objectives of the positioning and segmentation sub-network and the classification sub-network, and the remaining term represents the back-propagation within the positioning and segmentation part.
  • the valve linkage function V connects the classification and positioning sub-networks in the back-propagation phase, specifically through the connection terms (the derivatives of V with respect to L and O); since this connection is available, the update of the positioning and segmentation sub-network is sensitive to the back-propagated classification signal.
  • valve linkage function V can be written as:
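  • Consistent with the three conditions listed above (the calibration energy, the forward-propagation energy for L_f and O_f, and the pose-calibrated image), one plausible reconstruction of V (an assumption, not the verbatim patent formula) is:

    V(L, O; I, L_f, O_f) = [ E_a(c*, θ*, α*, t*; I, L, O) / E_a(c*, θ*, α*, t*; I, L_f, O_f) ] · I(c*, θ*, α*)

  • In the forward pass L = L_f and O = O_f, so the ratio equals 1 and V reduces to the pose-calibrated image I(c*, θ*, α*); in back-propagation the denominator is the constant forward energy (the score e discussed below), so the signals reaching L and O are scaled by 1/e.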
  • E_a(c*, θ*, α*, t*; I, L_f, O_f) is the calibration energy calculated in the forward propagation; this forward-propagation calibration energy is applied to adaptively update the positioning and segmentation part.
  • the valve linkage function extracts information from the classification subnet and can adaptively update the positioning segmentation part.
  • for the connection part, the calibration energy computed in the forward propagation phase is treated as a constant in the back-propagation phase; according to this energy, the connection part can be expressed as a gradient of the calibration energy scaled by the forward-propagation energy.
  • a large calibration score e is equivalent to a better calibration in the forward propagation phase.
  • in the back-propagation phase, this term is used to re-weight the update signals coming from the classification sub-network.
  • the valve linkage function is equivalent to a trade-off between classification and calibration errors.
  • a large e means the calibration was already good, so the amount of information propagated from the classification sub-network is reduced accordingly; conversely, if e is small the calibration accuracy is low, and more classification information is introduced into the update of the positioning and segmentation sub-network. This scaling can therefore be understood as a dynamic learning rate in the back-propagation phase that adapts to the calibration quality.
  • the other connection part (with respect to the segmentation output) can be written analogously; a numeric sketch of this gradient scaling follows.
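  • A toy numeric illustration of the "dynamic learning rate" reading described above, using the assumed 1/e scaling; all names and values are illustrative.

```python
import numpy as np

def valve_scaled_update(grad_from_classifier, grad_calibration_energy, e_forward):
    """Scale the classification back-propagation signal by the forward calibration energy.

    A large forward energy e (good calibration) shrinks the update reaching the
    positioning and segmentation sub-network, while a small e lets more
    classification information through."""
    return (grad_calibration_energy / e_forward) * grad_from_classifier

g_cls = np.array([0.4, -0.2, 0.1])   # gradient arriving from the classifier
g_cal = np.array([1.0, 1.0, 1.0])    # gradient of the calibration energy w.r.t. L
print(valve_scaled_update(g_cls, g_cal, e_forward=8.0))   # well calibrated -> small update
print(valve_scaled_update(g_cls, g_cal, e_forward=0.5))   # poorly calibrated -> large update
```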
  • the network is supervised not only by the object region, which reduces the global segmentation error, but also by the template shape information that corrects the object boundary. As shown in Figure 10, including the additional shape information does improve the accuracy of the segmentation results. Since the valve linkage function provides a self-adjusting mechanism between classification and calibration, the positioning and segmentation sub-network is also enhanced in the back-propagation phase.
  • FIG. 9 is a schematic diagram of a processing procedure of a depth system image classification system in an embodiment, the system consisting of three sub-networks: positioning segmentation, calibration, and classification.
  • under the adjustment of the valve linkage function, the calibration sub-network outputs the pose-calibrated part image to the classification sub-network in the forward propagation phase, and the classification and calibration errors can also be transmitted back to the positioning and segmentation sub-network in the back-propagation phase.
  • the bird's head and torso are treated as semantic parts.
  • All convolutional neural network models are based on the VGG-16 network.
  • for the positioning and segmentation sub-network, all input images are resized to 224 × 224.
  • the original fully connected layer was removed. It outputs a structure that locates the bounding box and the pixel probability map for the foreground and background labels.
  • the positioning and segmentation sub-network is initialized first; the input to the classification sub-network is a 224 × 224 image.
  • the first fully connected layer is extended to a 4096-dimensional feature.
  • a support vector machine classifier is trained by the features extracted by the convolutional neural network.
  • hyper-parameters are tuned on a validation set; this validation set contains 1000 images randomly selected from the training set. A performance improvement was observed by extending the search space, so the extended search space is kept in all experiments; the feature-plus-classifier setup is sketched below.
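  • A minimal sketch of the classifier setup described above (4096-dimensional fully connected features feeding a support vector machine), assuming torchvision's VGG-16 and scikit-learn; the exact training protocol of the patent is not reproduced.

```python
import torch
from torchvision import models, transforms
from sklearn.svm import LinearSVC

# VGG-16 backbone; features are taken from the first fully connected layer (4096-d).
vgg = models.vgg16(weights="IMAGENET1K_V1").eval()
fc1 = torch.nn.Sequential(vgg.features, vgg.avgpool, torch.nn.Flatten(),
                          *list(vgg.classifier.children())[:2])   # fc6 + ReLU

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(pil_images):
    """Return an (n, 4096) array of fc6 features for a list of PIL images."""
    with torch.no_grad():
        batch = torch.stack([preprocess(im) for im in pil_images])
        return fc1(batch).numpy()

# Train a linear SVM on the extracted features (labels are integer class ids).
# X_train = extract_features(calibrated_training_images)
# clf = LinearSVC().fit(X_train, y_train)
# y_pred = clf.predict(extract_features(calibrated_test_images))
```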
  • the results of the pose similarity function can be pre-computed and stored. With GPU acceleration, the entire space of pose positions, templates, zoom ratios, and rotation angles is traversed to compute the pose similarity, and each picture takes only 5 seconds. The pose similarity can therefore be looked up quickly in the forward propagation, so that the training time per picture is 15 ms and the test time is 8 ms.
  • the positioning and segmentation share the parameters of the convolution.
  • non-shared and shared convolutional neural network parameter settings are configured and compared, and the comparison of the part positioning results is shown in FIG. 10.
  • the percentage of correctly positioned parts is calculated based on the top-ranked part hypothesis; a part whose overlap with the ground truth is sufficiently large is considered correctly positioned.
  • without parameter sharing, the accuracy of the positioning results on the head and the torso is 93.2 and 94.3, respectively;
  • with parameter sharing, the corresponding accuracies are 95.0 and 97.0, respectively.
  • the non-shared convolutional neural network parameters and parameter sharing are set in the convolutional neural network, and the segmentation precision is compared.
  • the abbreviations "bg" and "fg" represent the background and the foreground, respectively.
  • a test evaluation score is used to evaluate the segmentation performance, and the average test evaluation score is calculated to evaluate the overall segmentation accuracy.
  • the parameter sharing improves the segmentation precision on the foreground and background regions, as shown in FIG. 12, showing the input image and the segmentation result in various cases, wherein FIG. 12(a) shows the input image.
  • Fig. 12(b) shows the ground-truth segmentation result;
  • Fig. 12(c) shows the segmentation result without parameter sharing
  • Fig. 12(d) shows the segmentation result without the valve linkage function
  • Fig. 12(e) shows the segmentation result based on the depth system framework of the present application.
  • the results of the segmentation show that the visual differences between Fig. 12(c) and Fig. 12(e) are obvious, and the segmentation results including the valve linkage function are much more accurate.
  • to verify the contribution of each component, a sub-network is removed from the joint depth system module and the result is then compared to the full depth system.
  • a comparison of the positioning accuracy is shown in Figure 8, and the performance is tested as the percentage of correctly positioned parts where the overlap is greater than 0.5, 0.6, or 0.7; in all configurations, the standalone positioning branch performs worse than the depth system model.
  • when the valve linkage function is removed, the segmentation sub-network also suffers performance degradation; the object segmentation accuracy with and without the valve linkage function on the CUB-200-2011 data set is shown in Figure 13.
  • Figure 12(d) shows the corresponding segmentation results. The reason for the performance degradation is that, in the absence of the valve linkage function, the positioning and segmentation sub-network receives no feedback from the calibration and classification operations, whereas the depth system with the valve linkage function updates calibration and classification in each iteration, making the results more accurate.
  • the results of the algorithm of the present application were 95.0 and 97.0 compared to the earlier best results 93.4 and 96.2.
  • Figure 15 shows some examples of the predicted bounding box containing the head and torso.
  • our deep system model improves performance in the overall positioning operation.
  • the positioning accuracy of the head, which is challenging due to its small area, increased significantly from 90.0 to 95.0.
  • the performance gap indicates the importance of the positioning and segmentation sub-network of the present application capturing the part-level relationships of the object, which benefits the bounding-box regression.
  • since the depth model of the present application includes segmentation, a baseline fully convolutional neural network is trained for object segmentation as a comparison.
  • the interactive object segmentation tool GrabCut and the collaborative segmentation (co-segmentation) method can also be used for comparison.
  • the segmentation accuracy of these methods is given in Figure 16, which shows a comparison of the method of the present application with other segmentation methods on the CUB-200-2011 dataset.
  • the benchmark full-convergence neural network produces an average detection evaluation function score of only 78.8 compared to the score 84.5 of the depth image classification model of the present application.
  • This performance degradation stems from the fact that the full convolutional neural network of the benchmark has not been improved from parameter sharing.
  • GrabCut and collaborative segmentation methods exhibit lower precision because they rely on low-level image representations that lose semantic object information.
  • Figure 17 illustrates this.
  • Figure 17 shows a different breakdown of the results, where Figure 17 (a) shows the input image, Figure 17 (b) shows the segmentation of the real results, Figure 17 (c) shows the results of GrabCut, Figure 17 ( d) shows the result of the cooperative division, Fig. 17(e) shows the result of the division of the reference full convolutional neural network, and Fig. 17(f) shows the result of the division of the divisional branch of the depth system of the present application.
  • the calibration subnetwork is blocked in the deep system framework, so that neither forward propagation nor backpropagation passes through it.
  • the positioning-segmentation subnetwork is used to propose part hypotheses for classification, and the remaining positioning-segmentation and classification modules are trained independently in the backpropagation phase.
  • the verification result in the second row of FIG. 18 indicates that the lack of information propagation through the calibration step is undesirable.
  • the valve linkage function is used in the calibration subnetwork to output the pose-calibrated parts for classification in the forward propagation phase, but it is disabled during the backpropagation phase, preventing classification and calibration errors from propagating back to positioning and segmentation.
  • in this setting, an accuracy of only 78.2 is reached on the bird head. Therefore, the calibration subnetwork is necessary in both the forward propagation and backpropagation phases.
  • the valve linkage function is then enabled in both the forward and backward passes.
  • with the segmentation branch removed, however, the framework degrades to our earlier positioning-calibration-classification model, which consists of positioning, calibration, and classification.
  • with only the positioning results and no segmentation branch, the performance is not as adequate as that shown in the fourth row of FIG. 14.
  • unsurprisingly, this model structure results in performance degradation in the classification of the head and torso parts.
  • Table 8 shows that accuracies of 79.5 and 63.3 are obtained using the head and torso features, and concatenating the two feature vectors into a joint representation yields an accuracy of 83.7.
  • finally, a deep convolutional neural network model is fine-tuned on the full image, starting from a pre-trained model.
  • the sixth layer provides features for an SVM classifier, achieving an accuracy of 76.3. After concatenating the features of the head, torso, and full image, the accuracy rises to 88.5.
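  • As a sketch of the feature concatenation and SVM classification described above (the classifier choice, feature dimension, and random stand-in features are assumptions rather than the original implementation):

      import numpy as np
      from sklearn.svm import LinearSVC

      rng = np.random.default_rng(0)
      n, dim, n_classes = 200, 512, 10                       # assumed sizes for illustration
      head, torso, full = (rng.normal(size=(n, dim)) for _ in range(3))   # stand-ins for CNN features
      labels = rng.integers(0, n_classes, size=n)

      joint = np.concatenate([head, torso, full], axis=1)    # joint representation
      clf = LinearSVC(C=1.0, max_iter=5000).fit(joint, labels)
      print("training accuracy:", clf.score(joint, labels))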
  • for comparison, the methods of [35] and [62] also consider the head and torso while combining the convolutional neural network features of the full image.
  • the accuracy improvement of the method of the present application is mainly due to the reliable positioning, segmentation, and calibration operations carried out in the deep system framework with the valve linkage function.
  • 4) Application of the Caltech-UCSD Bird-200-2010 dataset
  • the Caltech-UCSD Bird-200-2010 dataset provides a total of 6033 images of 200 bird species.
  • the dataset does not provide part annotations and contains only small training and test sets. It can therefore be used to verify how the deep system framework trained on the Caltech-UCSD Bird-200-2010 dataset performs on this dataset.
  • the classification accuracy of the different methods on the bird head and torso semantic parts of the CUB-200-2010 dataset is shown in FIG. 20.
  • the positioning-segmentation subnetwork and the calibration subnetwork are obtained by training on the training set of the Caltech-UCSD Bird-200-2010 dataset. After the pose-calibrated part images are obtained, the classification subnetwork is updated on this dataset.
  • the full-image classification accuracy of the method of the present application is 63.7.
  • through the positioning-segmentation subnetwork, the classification accuracy on the bird head is 67.3.
  • a performance improvement of 3.6 is thus obtained.
  • after the calibration operation is incorporated, the improvement becomes 6.5, and the best trunk recognition accuracy of 49.1 is achieved by adding the positioning, segmentation, and calibration operations.
  • in addition to bird species classification, the deep system image classification model of the present application can be applied to fine-grained recognition of other object types.
  • the Standford Cars-96 dataset is used in this section as the evaluation benchmark.
  • this car dataset contains 16185 images from 196 categories and is also prepared for fine-grained recognition tasks, with a total of 8144 training images and 8041 test images.
  • unlike the Caltech-UCSD Bird-200-2011 dataset, the Standford Cars-96 dataset does not provide object masks; to support the deep system image classification model on this dataset, binary masks of all cars in the 16185 images are additionally provided.
  • FIG. 22 shows an example of mask labeling on the Standford Cars-96 dataset.
  • a computer device is also provided; the internal structure of the computer device can be as shown in FIG. 26. The computer device includes an image classification apparatus, the image classification apparatus includes various modules, and each module can be implemented in whole or in part by software, hardware, or a combination thereof.
  • an image classification device including:
  • the input module 510 is configured to obtain an image to be classified, and input the image to be classified into the trained image classification model.
  • the trained image classification model includes a positioning-segmentation subnetwork, a calibration subnetwork, and a classification subnetwork, and the calibration subnetwork is formulated as a valve linkage function.
  • the image classification model is obtained by adjusting the parameters of the positioning-segmentation subnetwork and the classification subnetwork through the valve linkage function during training.
  • in the forward propagation phase of the training, the output of the valve linkage function is the calibrated image.
  • in the backpropagation phase of the training, the output of the valve linkage function is a function of the positioning region and the segmentation region output by the positioning-segmentation subnetwork.
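  • A scalar toy of this forward/backward behavior is sketched below; the linear dummy energy is an assumption used only to show how a ratio-form valve linkage function returns exactly the calibrated image in the forward pass, while the energy stored in the forward pass rescales the signal that flows back towards positioning and segmentation.

      # scalar toy: L, O and the calibrated image are plain numbers here
      def E_a(L, O):                      # dummy stand-in for the calibration energy
          return 2.0 * L + 3.0 * O

      L_f, O_f = 0.5, 0.8                 # forward outputs of positioning/segmentation
      I_cal = 1.0                         # stands for the pose-calibrated image
      e = E_a(L_f, O_f)                   # calibration energy stored in the forward pass

      def V(L, O):                        # ratio form of the valve linkage function
          return E_a(L, O) / e * I_cal

      print(V(L_f, O_f))                  # forward pass: equals the calibrated image
      dV_dL = I_cal / e * 2.0             # backward pass: gradients rescaled by 1/e
      dV_dO = I_cal / e * 3.0
      print(dV_dL, dV_dO)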
  • the segmentation module 520 is configured to perform positioning and segmentation of the image to be classified through the positioning and segmentation sub-network to obtain a segmented image including the positioning region and the segmentation region.
  • the calibration module 530 is configured to pass the segmented image through the calibration subnetwork, and the calibration subnetwork calibrates the target object to obtain a calibrated image.
  • the category determining module 540 is configured to perform fine-grained classification on the calibrated image through the classification sub-network to obtain a category corresponding to the image to be classified.
  • the location segmentation subnetwork comprises a location subnetwork and a segmentation subnetwork, the location network sharing the parameters of the convolutional neural network with the segmentation subnetwork.
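  • A minimal PyTorch-style sketch of this parameter sharing is given below; the layer sizes are illustrative assumptions, and the point is only that a single convolutional trunk feeds both the box-regression head and the per-pixel foreground head.

      import torch
      import torch.nn as nn

      class LocSegNet(nn.Module):
          def __init__(self):
              super().__init__()
              # shared convolutional trunk (parameters used by both branches)
              self.trunk = nn.Sequential(
                  nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
              # positioning head: regresses the box (x1, y1, x2, y2)
              self.loc_head = nn.Sequential(
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 4))
              # segmentation head: per-pixel foreground probability
              self.seg_head = nn.Sequential(nn.Conv2d(32, 1, 1), nn.Sigmoid())

          def forward(self, x):
              feat = self.trunk(x)
              return self.loc_head(feat), self.seg_head(feat)

      boxes, fg_prob = LocSegNet()(torch.randn(2, 3, 224, 224))
      print(boxes.shape, fg_prob.shape)   # (2, 4) and (2, 1, 224, 224)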
  • the apparatus further includes:
  • the training module 550 is configured to: acquire a training image set, where each training image in the training image set includes a standard positioning label box, a standard segmentation label box, and a standard category label; obtain a template corresponding to each category from the training image set; input each training image into the positioning-segmentation subnetwork to obtain a segmented training image including the current positioning region and the current segmentation region; calibrate the segmented training image according to the template to obtain a calibrated training image; and input the calibrated training image into the classification subnetwork to obtain the corresponding current output category.
  • the training module 550 is further configured to acquire the total objective function corresponding to the image classification model, the total objective function including the positioning-segmentation subnetwork objective function and the classification subnetwork objective function, where the positioning-segmentation subnetwork objective function is a function of the valve linkage function; calculate the value of the total objective function according to the current output category, the standard positioning label box, the standard segmentation label box, and the standard category label; adjust the positioning-segmentation subnetwork parameters and the classification subnetwork parameters according to the valve linkage function until the value of the total objective function satisfies the convergence condition; and obtain the trained image classification model, as illustrated by the sketch below.
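  • The convergence loop in the preceding paragraph can be illustrated with the toy sketch below; the quadratic stand-ins for the two objective terms, the learning rate, and the tolerance are assumptions, and only the overall pattern (update both parameter sets until the total objective stops changing) reflects the description.

      import numpy as np

      def e_locseg(w):                    # toy positioning-segmentation objective
          return np.sum((w - 1.0) ** 2)

      def e_cls(w):                       # toy classification objective
          return np.sum((w + 2.0) ** 2)

      w_ls, w_c = np.zeros(4), np.zeros(4)
      lr, tol, prev_j = 0.1, 1e-9, np.inf
      for step in range(10000):
          j = e_locseg(w_ls) + e_cls(w_c)           # total objective
          if abs(prev_j - j) < tol:                 # convergence condition
              break
          prev_j = j
          w_ls -= lr * 2.0 * (w_ls - 1.0)           # gradient step on each subnetwork
          w_c  -= lr * 2.0 * (w_c + 2.0)
      print(step, j)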
  • the training module 550 is further configured to calculate the similarity between any two training images in the training image set to form a similarity matrix; pass the similarity matrix through a spectral clustering algorithm to divide the training images into corresponding clusters; and obtain each cluster center and, according to the similarity between each training image in a cluster and the corresponding cluster center, determine the target training image of each cluster, thereby obtaining the template corresponding to each category; the template is used to calibrate images.
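  • A sketch of this template selection follows; spectral clustering with a precomputed affinity is applied to the similarity matrix, and the exemplar of each cluster is approximated here as the member with the highest mean similarity to its cluster, which stands in for the cluster-center step described above. The descriptors and similarity measure below are placeholders.

      import numpy as np
      from sklearn.cluster import SpectralClustering

      def pick_templates(sim, n_clusters):
          # sim: symmetric (N, N) similarity matrix between training images
          labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                      random_state=0).fit_predict(sim)
          templates = []
          for k in range(n_clusters):
              idx = np.flatnonzero(labels == k)
              if idx.size:
                  mean_sim = sim[np.ix_(idx, idx)].mean(axis=1)
                  templates.append(int(idx[np.argmax(mean_sim)]))   # cluster exemplar
          return templates

      rng = np.random.default_rng(0)
      feats = rng.normal(size=(120, 32))                            # placeholder descriptors
      sim = np.exp(-np.linalg.norm(feats[:, None] - feats[None, :], axis=-1))
      print(pick_templates(sim, n_clusters=5))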
  • the training module 550 is further configured to acquire a calibration objective function including a similarity function, a distance function, and a foreground confidence function; adjust the template center point, rotation angle, zoom ratio, and current template until the calibration objective function satisfies a convergence condition, obtaining the corresponding target template center point, target rotation angle, target zoom ratio, and target template; and calibrate the segmented training image according to the target template center point, target rotation angle, target zoom ratio, and target template to obtain the calibrated training image.
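  • The adjustment of center point, rotation angle, zoom ratio, and template can be sketched as an exhaustive search that keeps the best-scoring configuration; the rotation range, scale set, and template count follow the implementation details given elsewhere in this description, while the candidate centers and the dummy energy are placeholders for the calibration objective defined below.

      import itertools
      import numpy as np

      def calib_energy(center, angle, scale, template):
          # dummy stand-in for the calibration objective E_a
          return -np.hypot(center[0] - 100, center[1] - 80) - abs(angle) / 10.0 + scale

      centers   = [(x, y) for x in range(80, 121, 20) for y in range(60, 101, 20)]
      angles    = range(-60, 61, 10)              # degrees, 10 degree steps
      scales    = [1.2, 1.4, 2.0, 2.5, 3.5]       # zoom ratios (torso setting)
      templates = range(60)                       # template indices

      best = max(itertools.product(centers, angles, scales, templates),
                 key=lambda cfg: calib_energy(*cfg))
      print("best (center, angle, scale, template):", best)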
  • the total objective function is defined by the following formula:
      J(W_c, W_ls; I, L_gt, y_gt, o_gt) = E_c(W_c; V(L, O; I, L_f, O_f), y_gt) + E_ls(W_ls; I, L_gt, o_gt)
  • where J is the total objective function, E_c is the positioning-segmentation subnetwork objective function, E_ls is the classification subnetwork objective function, W_c is the parameter to be determined for the positioning-segmentation subnetwork, W_ls is the parameter to be determined for the classification subnetwork, V denotes the valve linkage function, L is the positioning region output by the positioning-segmentation subnetwork, O is the segmentation region output by the positioning-segmentation subnetwork, I is the input original image, L_f is the positioning region output by the positioning-segmentation subnetwork in the forward pass, O_f is the segmentation region output by the positioning-segmentation subnetwork in the forward pass, y_gt is the standard category label, L_gt is the standard positioning label box, and o_gt is the standard segmentation label box.
  • the valve linkage function is defined by the following formula:
      V(L, O; I, L_f, O_f) = [ E_a(c*, θ*, α*, t*; I, L, O) / E_a(c*, θ*, α*, t*; I, L_f, O_f) ] · I(c*, θ*, α*),
      with {c*, θ*, α*, t*} = argmax_{c, θ, α, t} E_a(c, θ, α, t; I, L_f, O_f)
  • where V denotes the valve linkage function, L is the positioning region output by the positioning-segmentation subnetwork, O is the segmentation region output by the positioning-segmentation subnetwork (in the forward pass L = L_f and O = O_f, while in the backward pass L and O are variables), I is the input original image, L_f is the positioning region output by the positioning-segmentation subnetwork in the forward pass, O_f is the segmentation region output by the positioning-segmentation subnetwork in the forward pass, c* is the template center point used in the calibration, θ* is the rotation angle used in the calibration, α* is the target zoom ratio used in the calibration, I(c*, θ*, α*) is the image obtained by calibrating the original image, and E_a is the calibration energy function.
  • the calibration energy function is defined by the following formula:
      E_a(c, θ, α, t; I, L, O) = S(I(c, θ, α), t) + λ_d · D(c, L) + λ_s · F(O, t_m)
  • where c is the template center point, θ is the rotation angle, α is the target zoom ratio, t is the template, S is the similarity function, λ_d and λ_s are custom constants, D is the distance function, F is the foreground confidence function, and t_m is the binary mask of the template.
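  • A rough Python sketch of this energy is given below; the constants λ_d = 0.001 and λ_s = 0.003 and the Gaussian distance with σ = 15 follow values stated in the detailed description, while the similarity term and the exact foreground-confidence normalization are simplified placeholders rather than the original implementation.

      import numpy as np

      LAMBDA_D, LAMBDA_S = 0.001, 0.003

      def similarity(patch, template):             # placeholder S: negative mean squared error
          return -np.mean((patch - template) ** 2)

      def distance(center, box, sigma=15.0):       # D: falloff from the box centre
          bx = ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)
          d2 = (center[0] - bx[0]) ** 2 + (center[1] - bx[1]) ** 2
          return np.exp(-d2 / (2.0 * sigma ** 2))

      def fg_confidence(obj_map, mask):            # F: agreement of obj_map with the 0/1 template mask
          o_f = -np.log(1.0 - obj_map + 1e-6)      # high where foreground probability is high
          o_b = -np.log(obj_map + 1e-6)            # high where foreground probability is low
          n_f, n_b = max(mask.sum(), 1), max((1 - mask).sum(), 1)
          return (o_f * mask).sum() / n_f + (o_b * (1 - mask)).sum() / n_b

      def calibration_energy(patch, template, center, box, obj_map, mask):
          return (similarity(patch, template)
                  + LAMBDA_D * distance(center, box)
                  + LAMBDA_S * fg_confidence(obj_map, mask))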
  • each module in the above image classification apparatus may be implemented in whole or in part by software, hardware, or a combination thereof.
  • each of the above modules may be embedded in hardware form in, or independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can invoke and perform the operations corresponding to each module.
  • a computer device is provided, which may be a server; its internal structure may be as shown in FIG. 26.
  • the computer device includes a processor, memory, network interface, and database connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store data.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection.
  • the computer readable instructions are executed by the processor to implement the image classification method described in the above embodiments.
  • FIG. 26 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer devices to which the solution of the present application is applied.
  • a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • a computer apparatus including a memory and a processor, the memory storing computer readable instructions, and when the processor executes the computer readable instructions, the following steps are performed: acquiring an image to be classified, and inputting the image to be classified into The trained image classification model, the trained image classification model includes the positioning segmentation sub-network, the calibration sub-network and the classification sub-network, the calibration sub-network is formulated as a valve linkage function, and the image classification model is to adjust the positioning segmentation sub-network through the valve linkage function and The parameters of the classification subnetwork are trained. In the forward propagation phase of the training, the output of the valve linkage function is the calibrated image.
  • in the backpropagation phase of the training, the output of the valve linkage function is a function of the positioning region and the segmentation region output by the positioning-segmentation subnetwork;
  • the image to be classified is positioned and segmented by the positioning-segmentation subnetwork to obtain a segmented image including the positioning region and the segmentation region; the segmented image passes through the calibration subnetwork, and the calibration subnetwork calibrates the target object to obtain a calibrated image; and the calibrated image is subjected to fine-grained classification by the classification subnetwork to obtain the category corresponding to the image to be classified.
  • the positioning and dividing sub-network comprises a positioning sub-network and a segmentation sub-network, and the positioning sub-network and the segmentation sub-network share parameters of the convolutional neural network.
  • the training of the image classification model comprises: acquiring a training image set, wherein each training image in the training image set comprises a standard positioning label box, a standard segmentation label box and a standard category label; and obtaining corresponding categories of each category from the training image set a template; inputting each training image in the training image set into the positioning sub-network to obtain a segmented training image including a current positioning area and a current segmentation area; and calibrating the segmented training image according to the template to obtain a calibrated training image;
  • the calibrated training image is input into the classification subnetwork to obtain the corresponding current output category;
  • the total objective function corresponding to the image classification model is acquired, the total objective function including the positioning-segmentation subnetwork objective function and the classification subnetwork objective function, where the positioning-segmentation subnetwork objective function is a function of the valve linkage function; the value of the total objective function is calculated according to the current output category, the standard positioning label box, the standard segmentation label box, and the standard category label;
  • the positioning-segmentation subnetwork parameters and the classification subnetwork parameters are adjusted according to the valve linkage function until the value of the total objective function satisfies the convergence condition, and the trained image classification model is obtained.
  • obtaining a template corresponding to each category from the training image set includes: calculating a similarity between any two training images in the training image set to form a similarity matrix; and passing the similarity matrix through a spectral clustering algorithm, Each training image is divided into a plurality of corresponding clusters, and each cluster center is obtained. According to the similarity between each training image and the corresponding cluster center in each cluster, the target training image corresponding to each cluster is determined to obtain a template corresponding to each category, and the template is used for Calibrate the image.
  • calibrating the segmented training image according to a template to obtain a calibrated training image includes: obtaining a calibration objective function including a similarity function, a distance function, and a foreground confidence function; adjusting a template center point, and rotating Angle, zoom ratio and current template until the calibration target function satisfies the convergence condition, and obtain the corresponding target template center point, target rotation angle, target zoom ratio and target template; according to the target template center point, target rotation angle, target zoom ratio and target The template calibrates the segmented training image to obtain a calibrated training image.
  • the total objective function is defined by the following formula:
      J(W_c, W_ls; I, L_gt, y_gt, o_gt) = E_c(W_c; V(L, O; I, L_f, O_f), y_gt) + E_ls(W_ls; I, L_gt, o_gt)
  • where J is the total objective function, E_c is the positioning-segmentation subnetwork objective function, E_ls is the classification subnetwork objective function, W_c is the parameter to be determined for the positioning-segmentation subnetwork, W_ls is the parameter to be determined for the classification subnetwork, V denotes the valve linkage function, L is the positioning region output by the positioning-segmentation subnetwork, O is the segmentation region output by the positioning-segmentation subnetwork, I is the input original image, L_f is the positioning region output by the positioning-segmentation subnetwork in the forward pass, O_f is the segmentation region output by the positioning-segmentation subnetwork in the forward pass, y_gt is the standard category label, L_gt is the standard positioning label box, and o_gt is the standard segmentation label box.
  • the valve linkage function is defined by the following formula:
      V(L, O; I, L_f, O_f) = [ E_a(c*, θ*, α*, t*; I, L, O) / E_a(c*, θ*, α*, t*; I, L_f, O_f) ] · I(c*, θ*, α*),
      with {c*, θ*, α*, t*} = argmax_{c, θ, α, t} E_a(c, θ, α, t; I, L_f, O_f)
  • where V denotes the valve linkage function, L is the positioning region output by the positioning-segmentation subnetwork, O is the segmentation region output by the positioning-segmentation subnetwork (in the forward pass L = L_f and O = O_f, while in the backward pass L and O are variables), I is the input original image, L_f is the positioning region output by the positioning-segmentation subnetwork in the forward pass, O_f is the segmentation region output by the positioning-segmentation subnetwork in the forward pass, c* is the template center point used in the calibration, θ* is the rotation angle used in the calibration, α* is the target zoom ratio used in the calibration, I(c*, θ*, α*) is the image obtained by calibrating the original image, and E_a is the calibration energy function.
  • the calibration energy function is defined by the following formula:
      E_a(c, θ, α, t; I, L, O) = S(I(c, θ, α), t) + λ_d · D(c, L) + λ_s · F(O, t_m)
  • where c is the template center point, θ is the rotation angle, α is the target zoom ratio, t is the template, S is the similarity function, λ_d and λ_s are custom constants, D is the distance function, F is the foreground confidence function, and t_m is the binary mask of the template.
  • one or more non-volatile storage media storing computer readable instructions are provided; when the computer readable instructions are executed by one or more processors, the one or more processors perform the following steps: acquire the image to be classified, and input the image to be classified into the trained image classification model.
  • the trained image classification model includes the positioning-segmentation subnetwork, the calibration subnetwork, and the classification subnetwork, and the calibration subnetwork is formulated as a valve linkage function.
  • the image classification model is obtained by adjusting the parameters of the positioning-segmentation subnetwork and the classification subnetwork through the valve linkage function.
  • in the forward propagation phase of the training, the output of the valve linkage function is the calibrated image;
  • in the backpropagation phase of the training, the output of the valve linkage function is a function of the positioning region and the segmentation region output by the positioning-segmentation subnetwork; the image to be classified is positioned and segmented by the positioning-segmentation subnetwork to obtain a segmented image including the positioning region and the segmentation region; the segmented image passes through the calibration subnetwork, which calibrates the target object to obtain a calibrated image; and the calibrated image is subjected to fine-grained classification by the classification subnetwork to obtain the category corresponding to the image to be classified.
  • the positioning and dividing sub-network comprises a positioning sub-network and a segmentation sub-network, and the positioning sub-network and the segmentation sub-network share parameters of the convolutional neural network.
  • the training of the image classification model comprises: acquiring a training image set, wherein each training image in the training image set comprises a standard positioning label box, a standard segmentation label box and a standard category label; and obtaining corresponding categories of each category from the training image set a template; inputting each training image in the training image set into the positioning sub-network to obtain a segmented training image including a current positioning area and a current segmentation area; and calibrating the segmented training image according to the template to obtain a calibrated training image;
  • the calibrated training image is input into the classification subnetwork to obtain the corresponding current output category;
  • the total objective function corresponding to the image classification model is acquired, the total objective function including the positioning-segmentation subnetwork objective function and the classification subnetwork objective function, where the positioning-segmentation subnetwork objective function is a function of the valve linkage function; the value of the total objective function is calculated according to the current output category, the standard positioning label box, the standard segmentation label box, and the standard category label;
  • the positioning-segmentation subnetwork parameters and the classification subnetwork parameters are adjusted according to the valve linkage function until the value of the total objective function satisfies the convergence condition, and the trained image classification model is obtained.
  • obtaining a template corresponding to each category from the training image set includes: calculating a similarity between any two training images in the training image set to form a similarity matrix; and passing the similarity matrix through a spectral clustering algorithm, Each training image is divided into a plurality of corresponding clusters, and each cluster center is obtained. According to the similarity between each training image and the corresponding cluster center in each cluster, the target training image corresponding to each cluster is determined to obtain a template corresponding to each category, and the template is used for Calibrate the image.
  • calibrating the segmented training image according to a template to obtain a calibrated training image includes: obtaining a calibration objective function including a similarity function, a distance function, and a foreground confidence function; adjusting a template center point, and rotating Angle, zoom ratio and current template until the calibration target function satisfies the convergence condition, and obtain the corresponding target template center point, target rotation angle, target zoom ratio and target template; according to the target template center point, target rotation angle, target zoom ratio and target The template calibrates the segmented training image to obtain a calibrated training image.
  • the total objective function is defined by the following formula:
      J(W_c, W_ls; I, L_gt, y_gt, o_gt) = E_c(W_c; V(L, O; I, L_f, O_f), y_gt) + E_ls(W_ls; I, L_gt, o_gt)
  • where J is the total objective function, E_c is the positioning-segmentation subnetwork objective function, E_ls is the classification subnetwork objective function, W_c is the parameter to be determined for the positioning-segmentation subnetwork, W_ls is the parameter to be determined for the classification subnetwork, V denotes the valve linkage function, L is the positioning region output by the positioning-segmentation subnetwork, O is the segmentation region output by the positioning-segmentation subnetwork, I is the input original image, L_f is the positioning region output by the positioning-segmentation subnetwork in the forward pass, O_f is the segmentation region output by the positioning-segmentation subnetwork in the forward pass, y_gt is the standard category label, L_gt is the standard positioning label box, and o_gt is the standard segmentation label box.
  • the valve linkage function is defined by the following formula:
      V(L, O; I, L_f, O_f) = [ E_a(c*, θ*, α*, t*; I, L, O) / E_a(c*, θ*, α*, t*; I, L_f, O_f) ] · I(c*, θ*, α*),
      with {c*, θ*, α*, t*} = argmax_{c, θ, α, t} E_a(c, θ, α, t; I, L_f, O_f)
  • where V denotes the valve linkage function, L is the positioning region output by the positioning-segmentation subnetwork, O is the segmentation region output by the positioning-segmentation subnetwork (in the forward pass L = L_f and O = O_f, while in the backward pass L and O are variables), I is the input original image, L_f is the positioning region output by the positioning-segmentation subnetwork in the forward pass, O_f is the segmentation region output by the positioning-segmentation subnetwork in the forward pass, c* is the template center point used in the calibration, θ* is the rotation angle used in the calibration, α* is the target zoom ratio used in the calibration, I(c*, θ*, α*) is the image obtained by calibrating the original image, and E_a is the calibration energy function.
  • the calibration energy function is defined by the following formula:
      E_a(c, θ, α, t; I, L, O) = S(I(c, θ, α), t) + λ_d · D(c, L) + λ_s · F(O, t_m)
  • where c is the template center point, θ is the rotation angle, α is the target zoom ratio, t is the template, S is the similarity function, λ_d and λ_s are custom constants, D is the distance function, F is the foreground confidence function, and t_m is the binary mask of the template.
  • Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

一种图像分类方法包括:计算机设备获取待分类图像,将待分类图像输入已训练的图像分类模型,已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,校准子网络被公式化为阀门联动函数,图像分类模型是通过阀门联动函数调整定位分割子网络和分类子网络的参数训练得到的,待分类图像经过定位分割子网络进行目标对象定位和分割得到包含定位区域和分割区域的已分割图像;已分割图像经过所述校准子网络,校准子网络对目标对象进行校准得到已校准图像;已校准图像经过所述分类子网络进行细粒度分类,得到待分类图像对应的类别。

Description

图像分类方法、计算机设备和存储介质
本申请要求于2018年05月15日提交中国专利局,申请号为201810462613.5,申请名称为“图像分类方法、装置、计算机设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种图像分类方法、计算机设备和存储介质。
背景技术
细粒度的识别强调识别不同形状和姿势的物体类别之间的细微差别。细粒度物体识别的目的是识别子对象的对象类,它用来寻找动物、产品品牌以及建筑风格之间的细微差异。
传统的分类方法用定位和校准来减少姿势变化,因为所有的步骤都被独立的处理,因此每一个在定位中出现的误差都能够影响校准和分类,细粒度分类的准确度受到影响。
发明内容
根据本申请提供的各种实施例提供一种图像分类方法、计算机设备和存储介质。
一种图像分类方法,包括:
计算机设备获取待分类图像,将所述待分类图像输入已训练的图像分类模型,所述已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,所述校准子网络被公式化为阀门联动函数,图像分类模型是通过阀门联动函数调整定位分割子网络和分类子网络的参数训练得到的,在训练的 正向传播阶段,阀门联动函数的输出为已校准图像,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数;
所述计算机设备将所述待分类图像经过定位分割子网络进行目标对象定位和分割得到包含定位区域和分割区域的已分割图像;
所述计算机设备将所述已分割图像经过所述校准子网络,所述校准子网络对目标对象进行校准得到已校准图像;及
所述计算机设备将所述已校准图像经过所述分类子网络进行细粒度分类,得到所述待分类图像对应的类别。
在其中一个实施例中,所述定位分割子网络包括定位子网络和分割子网络,所述定位子网络与分割子网络共享卷积神经网络的参数。
在其中一个实施例中,图像分类模型的训练步骤包括:
所述计算机设备获取训练图像集合,所述训练图像集合中的各个训练图像包括标准定位标注框,标准分割标注框和标准类别标签;
所述计算机设备从所述训练图像集合获取各个类别对应的模板;
所述计算机设备将所述训练图像集合中的各个训练图像输入定位分割子网络,得到包含当前定位区域和当前分割区域的已分割训练图像;
所述计算机设备根据所述模板对所述已分割训练图像进行校准得到已校准训练图像;
所述计算机设备将所述已校准训练图像输入分类子网络得到对应的当前输出类别;
所述计算机设备获取图像分类模型对应的总目标函数,所述总目标函数包括定位分割子网络目标函数和分类子网络目标函数,其中所述定位分割子网络目标函数是关于所述阀门联动函数的函数,根据所述当前输出类别、标准定位标注框,标准分割标注框和标准类别标签计算得到总目标函数的取值;
所述计算机设备根据所述阀门联动函数调整定位分割子网络参数和分类子网络参数,直到所述总目标函数的取值满足收敛条件;
所述计算机设备得到所述已训练的图像分类模型。
在其中一个实施例中,所述从所述训练图像集合获取各个类别对应的模板,包括:
所述计算机设备计算所述训练图像集合中任意两个训练图像之间的相似性,组成相似性矩阵;
所述计算机设备将所述相似性矩阵经过谱聚类算法,将各个训练图像分成对应的多个集群;
所述计算机设备获取各个集群中心,根据各个集群中各个训练图像与对应的集群中心的相似度,确定各个集群对应的目标训练图像得到所述各个类别对应的模板,所述模板用于对图像进行校准。
在其中一个实施例中,所述根据所述模板对所述已分割训练图像进行校准得到已校准训练图像,包括:
所述计算机设备获取校准目标函数,所述校准目标函数包括相似度函数、距离函数和前景置信度函数;
所述计算机设备调整模板中心点、旋转角度、缩放比和当前模板,直到所述校准目标函数满足收敛条件,得到对应的目标模板中心点、目标旋转角度、目标缩放比和目标模板;
所述计算机设备根据所述目标模板中心点、目标旋转角度、目标缩放比和目标模板对所述已分割训练图像进行校准,得到已校准训练图像。
在其中一个实施例中,所述总目标函数通过以下公式定义:
J(W c,W ls;I,L gt,y gt,o gt)=E c(W c;V(L,O;I,L f,O f),y gt)+E ls(W ls;I,L gt,o gt)
其中J为总目标函数,E c表示定位分割子网络目标函数,E ls表示分类子网络目标函数,W c表示定位分割子网络需要确定的参数,W ls表示分类子网络需要确定的参数,V表示阀门联动函数,L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割 区域,I是输入的原始图像,y gt是标准类别标签,L gt是标准定位标注框,o gt是标准分割标注框。
在其中一个实施例中,所述阀门联动函数通过以下公式定义:
Figure PCTCN2018090370-appb-000001
其中:V表示阀门联动函数,L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,在前向过程中,L=L f,O=O f,在反向过程中L和O是变量,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割区域,c *是校准时采用的模板中心点,θ *是校准时采用的旋转角度,α *是校准时采用的目标缩放比,I表示对所述原始图像校准后的图像,E a为校准能量函数,所述校准能量函数通过以下公式定义:
E a(c,θ,α,t;I,L,O)=S(I(c,θ,α),t)+λ dD(c,L)+λ sF(O,t m),其中c表示模板中心点,θ表示旋转角度,α表示目标缩放比,t表示模板,S为相似度函数,其中λ d和λ s是自定义的常量,D为距离函数,F为前景置信度函数,t m为模板的二元掩膜。
一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行如下步骤:
获取待分类图像,将所述待分类图像输入已训练的图像分类模型,所述已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,所述校准子网络被公式化为阀门联动函数,图像分类模型是通过阀门联动函数调整定位分割子网络和分类子网络的参数训练得到的,在训练的正向传播阶段,阀门联动函数的输出为已校准图像,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数;
所述待分类图像经过定位分割子网络进行目标对象定位和分割得到包含 定位区域和分割区域的已分割图像;
所述已分割图像经过所述校准子网络,所述校准子网络对目标对象进行校准得到已校准图像;及
所述已校准图像经过所述分类子网络进行细粒度分类,得到所述待分类图像对应的类别。
一个或多个存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行如下步骤:
获取待分类图像,将所述待分类图像输入已训练的图像分类模型,所述已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,所述校准子网络被公式化为阀门联动函数,图像分类模型是通过阀门联动函数调整定位分割子网络和分类子网络的参数训练得到的,在训练的正向传播阶段,阀门联动函数的输出为已校准图像,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数;
所述待分类图像经过定位分割子网络进行目标对象定位和分割得到包含定位区域和分割区域的已分割图像;
所述已分割图像经过所述校准子网络,所述校准子网络对目标对象进行校准得到已校准图像;及
所述已校准图像经过所述分类子网络进行细粒度分类,得到所述待分类图像对应的类别。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征、目的和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本 申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为一个实施例中图像分类方法的应用环境图;
图2为一个实施例中图像分类方法的流程示意图;
图3为一个实施例中得到已训练的图像分类模型的流程示意图;
图4为一个实施例中确定类别对应的模板的流程示意图;
图5为一个实施例中鸟头和鸟躯干的训练图像示意图,其中被选中为模板的图像显示在图5(a)和图5(b)的第一列;
图6为一个实施例中根据模板得到已校准训练图像的流程示意图;
图7为一个实施例中校准部分的前景置信度图和二元掩膜的示意图;
图8为一个实施例中校准前和校准后的图像对比示意图;
图9为一个实施例中深度系统图像分类系统的处理过程示意图;
图10为一个实施例中分别设置不共享卷积神经网络参数和参数共享,部分定位结果比较示意图;
图11为一个实施例中分别设置不共享卷积神经网络参数和参数共享,分割精度比较示意图;
图12为一个实施例中输入图像和各种情况下的分割结果示意图;
图13为在CUB-200-2011数据集上有和没有使用阀门联动功能的物体分割精度对比示意图;
图14中为本申请方法与其他方法在头和躯干的定位准确率比较结果;
图15为一个实施例中包含头和躯干的预测边界框的定位示意图;
图16为在CUB-200-2011数据集上就物体分割本申请方法与其他分割方法的比较示意图;
图17为不同算法对应的不同的分割结果示意图;
图18为在CUB-200-2011数据集中鸟头和躯干语义部分的分类精度示意图;
图19为在CUB-200-2011数据集上本申请最后的分类精确度与其他的前沿的方法的比较结果示意图;
图20为不同方法在CUB-200-2010数据集中鸟头和躯干语义部分的分类精度示意图;
图21为本申请方法与其他方法对应的分类精确度比较示意图;
图22为在StandfordCars-96数据集上标注掩膜的示意图;
图23为本申请的深度系统和其他的方法在StandfordCars-96数据集上的分类精确度示意图;
图24为一个实施例中图像分类装置的结构框图;
图25为另一个实施例中图像分类装置的结构框图;及
图26为一个实施例中计算机设备的内部结构图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请提供的图像分类方法,可以应用于如图1所示的应用环境中。其中,终端102通过网络与服务器104通过网络进行通信。终端可获取用户输入的待分类图像,将待分类图像发送至服务器104进行分类或直接在终端102进行分类。其中,终端102可以但不限于是各种个人计算机、笔记本电脑、智能手机、平板电脑和便携式可穿戴设备,服务器104可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
在一个实施例中,如图2所示,提供了一种图像分类方法,以该方法应用于图1中的终端或服务器为例进行说明,包括以下步骤:
步骤S210,获取待分类图像,将所述待分类图像输入已训练的图像分类模型,所述已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,所述校准子网络被公式化为阀门联动函数,图像分类模型是通过阀 门联动函数调整定位分割子网络和分类子网络的参数训练得到的,在训练的正向传播阶段,阀门联动函数的输出为已校准图像,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数。
其中,待分类图像是指需要指定细粒度类别的图像。待分类图像可以实时采集的图像,也可以是从存储的文件中获取的图像。图像分类模型用于对输入图像进行细粒度类别的分类,输出对应的分类结果。可对待分类图像进行前处理,如将待分类图像的分辨率进行统一。
定位分割子网络用于得到定位区域和分割区域,其中定位分割子网络可以由相互关联的定位子网络和分割子网络组成,也可以由相互独立的定位子网络和分割子网络组成。相互关联是指两个子网络的训练过程统一,是关联训练得到的,如存在共享参数等。定位子网络输出目标物体的基本位置,可通过定位框进行展示。分割子网络通过二类回归的预成型为目标物体和背景产生了像素级的分割。
校准子网络被公式化为阀门联动函数,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数,形成一个基于定位结果和分割结果的校准模块,通过阀门联动函数调整定位分割子网络和分类子网络的参数对图像分类模型进行训练,阀门联动函数使得定位分割子网络、校准子网络、分类子网络在训练阶段作为一个整体,在训练的正向传播阶段,阀门联动函数的输出为已校准图像,使得校准和其他的基于深度卷积神经网络成分的子网络结合起来。
具体地,可以将图像分类模型的目标函数定义为与阀门联动函数、分类子网络参数相关联,而阀门联动函数又是关于定位分割子网络输出的定位区域和分割区域的函数,定位分割子网络输出的定位区域和分割区域是与定位分割子网络参数相关的,从而在训练过程中,通过阀门联动函数调整定位分割子网络和分类子网络的参数。在满足上述约束条件的基础上,阀门联动函数的具体定义可根据需要自定义。在图像分类模型添加了阀门联动函数作为 定位分割子网络和分类模块的桥梁。在训练的时候,这个函数适应性的控制从分类模块到定位分割子网络的更新信号传播。
步骤S220,待分类图像经过定位分割子网络进行目标对象定位和分割得到包含定位区域和分割区域的已分割图像。
具体地,待分类图像经过定位分割子网络中的定位子网络输出定位区域,可以是包含(x 1,y 1)、(x 2,y 2)的边界框,其中x 1、x 2为边界框的横向起始坐标和横向终止坐标,y 1、y 2为边界框的纵向起始坐标和纵向终止坐标。将包含定位区域的图像进一步通过定位分割子网络中的分割子网络产生了像素级的物体区域,得到已分割图像。
步骤S230,已分割图像经过校准子网络,校准子网络对目标对象进行校准得到已校准图像。
具体地,校准子网络从定位和分割网络得到物体部位的定位结果L和分割结果O,然后执行模板对齐,并将坐标对齐的图像提供给分类子网络。模板对齐是校准的过程,模板的个数可以为一个或多个,可通过多模板选择来掌握姿势变化。
校准时需要求解参数,包括目标模板中心点、目标旋转角度、目标缩放比和目标模板。求解参数时,先获取相似度函数、距离函数和前景置信度函数,再将对应的已知量代入上述函数,通过调整模板中心点、旋转角度、缩放比和模板,使得上述函数组成的目标函数满足收敛条件,从而得到目标模板中心点、目标旋转角度、目标缩放比和目标模板。从而通过目标模板对已分割图像以目标模板中心点为中心,进行目标旋转角度的旋转和目标缩放比的图像缩放得到已校准图像。
步骤S240,已校准图像经过分类子网络进行细粒度分类,得到待分类图像对应的类别。
具体地,已校准图像经过分类子网络,输出对应的类别标签,从而得到对应的待分类图像对应的细粒度类别。
本实施例中,通过获取待分类图像,将待分类图像输入已训练的图像分 类模型,已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,所述校准子网络被公式化为阀门联动函数,图像分类模型是通过阀门联动函数调整定位分割子网络和分类子网络的参数训练得到的,在训练的正向传播阶段,阀门联动函数的输出为已校准图像,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数,待分类图像经过定位分割子网络进行目标对象定位和分割得到包含定位区域和分割区域的已分割图像;已分割图像经过所述校准子网络,校准子网络对目标对象进行校准得到已校准图像;已校准图像经过所述分类子网络进行细粒度分类,得到待分类图像对应的类别,可提高图像分类的精准性。
在一个实施例中,定位分割子网络包括定位子网络和分割子网络,所述定位子网络与分割子网络共享卷积神经网络的参数。
具体地,定位子网络和分割子网络共享卷积神经网络的参数,将定位和分割联合训练,共享卷积神经网络的参数能生成更加精准的模型,比对定位子网络和分割子网络分别独立得到的模型的准确度高。
其中,定位子网络包括了一组参数W 1和一个为回归边界框(x 1,y 1)(x 2,y 2)输出的回归值L,给定一个输入图像I∈R h×w×3,边界框回归量L=(x 1,y 1,x 2,y 2),我们将定位子网表述为:L=f l(W l;I),其中f l表示定位子网络函数,W 1表示定位子网络参数,L=(x 1,y 1,x 2,y 2)表示定位框。
使用分割子网络生成一个反向映射,O(c i)=P(o i=1|c i,W s),其中0代表背景,1代表前景,O表示像素属于前景的概率值。大的概率意味着像素点位于一个物体区域的内部,反向映射减少了校准操作被应用在背景上的可能性。
将定位子网络和分割子网络共享的一组参数表示为W ls。在一个实施例中,我们制订了定位和分割的目标函数为:
Figure PCTCN2018090370-appb-000002
其中E ls为定位分割子网络的目标函数,f l表示定位子网络函数,I表示输 入图像,L gt表示标准定位标注框,c i表示像素点,
Figure PCTCN2018090370-appb-000003
表示像素点c i的真实取值,o i表示像素点c i的取值,P表示概率函数,N表示输入图像的像素点的总个数。通过定位子网络和分割子网络共享参数的定位分割子网络的目标函数,可以平衡定位和分割间的损失值。我们将定位子网络和分割子网络的输出部分建立在一组基础卷积层上面。这组基础卷积层生成的特征被定位子网络和分割子网络共享,用于生成定位坐标以及像素级别的分割结果。
在一个实施例中,如图3所示,图像分类模型的训练步骤包括:
步骤S310,获取训练图像集合,训练图像集合中的各个训练图像包括标准定位标注框,标准分割标注框和标准类别标签。
具体地,训练图像集合中包括了多个训练图像,每个训练图像包括标准定位标注框,标准分割标注框和标准类别标签,其中标准定位标注框用于标注真实的定位结果,标准分割标注框用于标注真实的像素级分割结果,标准类别标签用于标注真实的分类结果。
步骤S320,从训练图像集合获取各个类别对应的模板。
具体地,训练图像集合中的多个训练图像可聚类为不同的类别,不同的类别使用对应不同的模板,模板用于对训练图像进行校准。可根据不同类别对应的各个训练图像间的相似性从各个训练图像中为不同的类别选取对应的模板。其中选取模板的方法可根据需要自定义。各个类别对应的模板的个数不限定,可以为一个或多个。
步骤S330,将训练图像集合中的各个训练图像输入定位分割子网络,得到包含当前定位区域和当前分割区域的已分割训练图像。
具体地,训练时,可通过随机的参数对图像分类模型进行初始化,将训练图像集合中的各个训练图像输入定位分割子网络,得到与当前参数对应的包含当前定位区域和当前分割区域的已分割训练图像。
步骤S340,根据模板对已分割训练图像进行校准得到已校准训练图像。
具体地,校准的过程需要先调整模板中心点,再根据调整模板中心点后 的模板对待校准图像调整旋转角度、缩放比,当存在多个模板时,还需要选取目标模板。可通过自定义的校准目标函数确定目标模板中心点、目标旋转角度、目标缩放比和目标模板。
步骤S350,将已校准训练图像输入分类子网络得到对应的当前输出类别。
具体地,分类子网络是图像分类模型的最后一个模块。已校准训练图像作为输入,表示为I *∈R h×w×3。分类卷积神经网络被表达为:y=f c(W c;I *),其中W c为分类子网络的参数,I *是经过姿态校准的部分,是已校准训练图像。f c是分类子网络的函数名,输出是一个类别标签y。在整个训练过程中,标准类别标签是期望标签,预测的种类标签y应该与标准类别标签一致。将已校准训练图像输入分类子网络得到当前参数对应的当前输出预测类别。
步骤S360,获取图像分类模型对应的总目标函数,总目标函数包括定位分割子网络目标函数和分类子网络目标函数,其中定位分割子网络目标函数是关于所述阀门联动函数的函数,根据当前输出类别、标准定位标注框,标准分割标注框和标准类别标签计算得到总目标函数的取值。
具体地,总目标函数是关于定位分割子网络目标函数和分类子网络目标函数的函数,且校准子网络被公式化成阀门联动函数,在前向过程,阀门联动函数用于得到校准后的图像,而在后向过程,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数,阀门联动函数被用于调整定位分割子网络的参数。使得定位分割子网络目标函数和分类子网络在训练阶段作为一个整体进进行训练。阀门联动函数是关于校准能量函数和已校准图像的函数,校准能量函数是关于校准目标函数和前向传播能量的函数,阀门联动函数保留了校准能量的函数,为此部分的变量位置和对象性映射可以被视为输入,它使得更新分类信号能够通过链式规则传递到定位分割子网络。
步骤S370,根据阀门联动函数调整定位分割子网络参数和分类子网络参数,直到总目标函数的取值满足收敛条件,得到已训练的图像分类模型。
具体地,定位分割子网络参数和分类子网络参数是需要被确定的参数, 定位分割子网络和分类子网络在训练过程中被1的两个因子所平衡,通过最小化总目标函数来更新定位分割子网络和分类子网络。
本实施例中,当训练图像分类模型的时候,阀门联动功能能够适应性的折衷分类和校准的误差,同时,也能更新定位分割子网络和分类子网络的参数,确定更准确的模型参数。
在一个实施例中,如图4所示,步骤S320包括:
步骤S321,计算训练图像集合中任意两个训练图像之间的相似性,组成相似性矩阵。
具体地,通过相似性算法来计算任意两个训练图像之间的相似性,具体的计算方法可自定义,如训练图像集合中包括N个训练图像,则计算任意两个训练图像之间相似性,组成相似性矩阵R N×N。在一个实施例中,在计算图像R i,R j的相似性时,为了减少光照变化带来的影响,正则化了每个图像的像素值,将像素的范围量化成256个数值,然后分别计算,比如,P i,P j,是两个属于R i和R j的灰阶值。灰阶值的正则化和分布值的计算遵循归一化的颜色直方图的构造。R i和R j有相同的尺寸,每两个像素点在R i和R j中有着相同的位置,这形成了一个灰阶值的2D元组。通过使用这个元组,我们计算了R i和R j的灰阶值的联合分布,表示为P ij。根据这个P i,P j,P ij,定义相似函数:
Figure PCTCN2018090370-appb-000004
其中S表示相似函数,用于衡量两张图像的姿态是否相似,R i,R j表示两个尺寸相同的图像,P i,P j分别表示R i和R j的灰阶值分布,类似频率直方图,P ij表示R i和R j的灰阶值的联合分布,m,n表示像素坐标值,M和N分别表示图像的长和宽。
步骤S322,将相似性矩阵经过谱聚类算法,将各个训练图像分成对应的多个集群。
具体地,谱聚类是一种聚类算法,对数据分布的适应性更强,聚类效果也很优秀,同时聚类的计算量也小很多。相似性矩阵经过谱聚类算法,将各 个训练图像分成对应的多个集群。
步骤S323,获取各个集群中心,根据各个集群中各个训练图像与对应的集群中心的相似度,确定各个集群对应的目标训练图像得到所述各个类别对应的模板,所述模板用于对图像进行校准。
具体地,每个集群都有一个中心,我们通过相似函数来计算每个集群中的训练图像与集群中心的相似度,从而得出和集群中心最为相似的训练图像,这个训练图像就是这个集群对应的模板,一个集群对应一个类别,从而得到各个类别对应的模板。一个类别对应的模板可以为一个或多个。如图5所示,为鸟头和鸟躯干的训练图像示意图,其中被选中为模板的图像显示在图5(a)和图5(b)的第一列。
本实施例中,通过计算图像间的相似性和聚类算法自适应的计算得到各个类别对应的模板,动态选取模板,提高了模板选取的准确性。
在一个实施例中,如图6所示,步骤S340包括:
步骤S341,获取校准目标函数,所述校准目标函数包括相似度函数、距离函数和前景置信度函数。
具体地,校准目标函数用于确定目标模板中心点、目标旋转角度、目标缩放比和目标模板,相似度函数用于描述待校准图像与模板之间的相似性,距离函数与模板中心点与定位分割子网络的输出的定位框的中心点之间的距离相关,前景置信度函数用于描述模板所覆盖的区域的前景置信度。通过明晓物体的前景形状,当我们通过模板来校准部分区域的时候,背景的影响能被降低,所以需要测量被模板覆盖的校准部分的前景置信度。如图7所示,显示了校准部分的前景置信度图和二元掩膜。
在一个实施例中,距离函数通过以下公式定义:
Figure PCTCN2018090370-appb-000005
其中D(c,L)表示距离函数,c表示模板中心点,L是定位分割子网络的输出的定位框,c r(L)表示定位分割子网络的输出的定位框的中心点。其中σ按经验可自定义,在一个实施例中设置为15。
Figure PCTCN2018090370-appb-000006
表示边界框L的中心。
相似性的测量根据像素值的分布而定义,但是缺乏关键性的物体的形状信息,通过前景置信度函数描述物体的形状信息。对于模板t,给出了二元掩膜t m,使t m(c i)∈{0,1},这意味着像素点c i属于背景或是前景,0或1分别表示背景和前景。对于c i,分别用O f(c i)和O b(c i)来作为前景或背景的分值,通过下面来计算:
O f(c i)=-log(1-O(c i)),O b(c i)=-logO(c i)
其中O f表示像素的前景置信度,O b表示像素的背景置信度,O f越高意味着像素点在前景的可能性越大,O b越高意味着像素点在背景的可能性越大。假设t m总共有N个像素点,其中包括N f个前景点,N b个背景点,定义前景置信度如下:
Figure PCTCN2018090370-appb-000007
其中F表示模板所覆盖的区域的前景置信度,t m表示模板的二元掩膜,N f表示模板的二元掩膜所包含的前景像素数目,N b表示模板的二元掩膜所包含的背景像素数目。促使前景可能性高的部分地区定位在模板的前景区域,同时抑制背景区域与前景区域重叠的模板,经过前景置信度的引导,前景区域能被更好的校准。
在一个实施例中,校准目标函数定义如下:
E a(c,θ,α,t;I,L,O)=S(I(c,θ,α),t)+λ dD(c,L)+λ sF(O,t m)
其中E a表示校准目标函数,c,θ,α,t分别表示需要校准的参数,分别为模板中心点、旋转角度、缩放比和当前模板。其中λ d和λ s是常量,可自定义,在一个实施例中,它们被分别设为0.001和0.003。
步骤S342,调整模板中心点、旋转角度、缩放比和当前模板,直到所述 校准目标函数满足收敛条件,得到对应的目标模板中心点、目标旋转角度、目标缩放比和目标模板。
具体地,通过最大化校准目标函数来得到目标模板中心点、目标旋转角度、目标缩放比和目标模板。校准目标函数的输出越大代表越值得信赖的校准。
步骤S343,根据目标模板中心点、目标旋转角度、目标缩放比和目标模板对已分割训练图像进行校准,得到已校准训练图像。
具体地,得到目标模板中心点、目标旋转角度、目标缩放比和目标模板后,就可对已分割训练图像进行校准,得到已校准训练图像。如图8所示,为一个实施例中校准前和校准后的图像对比示意图。在图8(a)中,左边的列展示了鸟头部未经过校准的图像,右边的列展示了鸟头部通过模板校准后的图像。在图8(b)中,左边的列展示了鸟躯干未经过校准的图像,右边的列展示了鸟躯干通过模板校准后的图像。
本实施例中,动态计算得到目标模板中心点、目标旋转角度、目标缩放比和目标模板,计算算法综合考虑了相似度、中心距离和前景置信度,使得校准结果更值得信赖。
在一个实施例中,总目标函数通过以下公式定义:
J(W c,W ls;I,L gt,y gt,o gt)=E c(W c;V(L,O;I,L f,O f),y gt)+E ls(W ls;I,L gt,o gt)
其中J为总目标函数,E c表示定位分割子网络目标函数,E ls表示分类子网络目标函数,W c表示定位分割子网络需要确定的参数,W ls表示分类子网络需要确定的参数,V表示阀门联动函数,L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割区域,I是输入的原始图像,y gt是标准类别标签,L gt是标准定位标注框,o gt是标准分割标注框。
在一个实施例中,阀门联动函数通过以下公式定义:
Figure PCTCN2018090370-appb-000008
其中:V表示阀门联动函数,
L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,在前向过程中,L=L f,O=O f,在反向过程中L和O是变量,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割区域,c *是校准时采用的模板中心点,θ *是校准时采用的旋转角度,α *是校准时采用的目标缩放比,I表示对所述原始图像校准后的图像,E a为校准能量函数,所述校准能量函数通过以下公式定义:
E a(c,θ,α,t;I,L,O)=S(I(c,θ,α),t)+λ dD(c,L)+λ sF(O,t m),其中c表示模板中心点,θ表示旋转角度,α表示目标缩放比,t表示模板,S为相似度函数,其中λ d和λ s是自定义的常量,D为距离函数,F为前景置信度函数,t m为模板的二元掩膜。
具体地,原始图像进行姿态校准后的图像为I(c *,θ *,α *),在训练的正向传播阶段,即前向过程中L和O是常量,在训练的反向传播阶段,L和O是变量。其中{c ***,t *}=argmax c,θ,α,tE a(c,θ,α,t;I,L f,O f)表示c ***,t *满足使校准能量函数最大。其中阀门联动函数折衷了三个关键的条件:1)校准能量函数,2)关于L f和O f的前向传播能量,3)姿态校准后的图像。
在前向传播阶段,校准子网络接收的输入为L f和O f,前向过程中L和O是常量,校准能量函数以及前向传播能量处于一个比率形式,在前向传播阶段此比率为1,使得阀门联动函数的输出为V(L f,O f;L,L f,O f)=I(c **,α *),即阀门联动函数的输出为已校准图像。
其中,阀门联动函数保留了校准能量函数,使得更新分类信号能够通过链式规则传递到定位分割子网络。在反向传播阶段,校准子网络V(L f,O f;L, f,O f)的输出成为了L和O的一个函数。因此,图像分类模型的总目标函数被制定 为:
J(W c,W ls;I,L gt,y gt,o gt)=E c(W c;V(L,O;I,L f,O f),y gt)+E ls(W ls;I,L gt,o gt)
通过最小化这个客观函数来更新定位分割子网络和分类子网络,为了更新分类子网络,我们计算了J关于W c的梯度。为了更新定位分割子网络,关于W ls的梯度这样计算:
Figure PCTCN2018090370-appb-000009
其中E ls和E c分别表示定位分割子网络和分类子网的训练参数,
Figure PCTCN2018090370-appb-000010
表示在定位分割内的反向传播阶段。
其中
Figure PCTCN2018090370-appb-000011
可以展开为:
Figure PCTCN2018090370-appb-000012
其中
Figure PCTCN2018090370-appb-000013
在分类的反向传播阶段传递有效信息,梯度
Figure PCTCN2018090370-appb-000014
Figure PCTCN2018090370-appb-000015
被用来更新定位分割子网。根据链式法则,阀门联动函数V在反向传播阶段连接了分类和定位分割子网,具体通过
Figure PCTCN2018090370-appb-000016
Figure PCTCN2018090370-appb-000017
连接。由于连接可用,定位分割子网的更新对分类的反向传播信号敏感。
此外,分类子网络和定位分割子网络之间的信号交流能被阀门联动函数适应性的调整。在反向传播阶段,阀门联动函数V可以被写成:
Figure PCTCN2018090370-appb-000018
其中,e=E a(c ***,t *;I,L,O)是在前向传播中计算的校准能量。这个前向传播校准能量被应用,从而适应性的对定位分割部分进行更新。阀门联动函数从分类子网中提取信息,而且能适应性的对定位分割部分进行更新。
在前向传播阶段,校准能量被当作一个在BP阶段的常量。根据这个能量,连接部分
Figure PCTCN2018090370-appb-000019
可表示为:
Figure PCTCN2018090370-appb-000020
Figure PCTCN2018090370-appb-000021
被扩展为:
Figure PCTCN2018090370-appb-000022
其中c=(c x,c y),同时:
Figure PCTCN2018090370-appb-000023
其中,
Figure PCTCN2018090370-appb-000024
可被视为一个控制分类影响的阀,一个大的校准分值e相当于前向传播阶段中更好的校准。在反向传播阶段,
Figure PCTCN2018090370-appb-000025
被用来给分类子网络中的更新信号
Figure PCTCN2018090370-appb-000026
重设权重。阀门联动函数相当于在分类和校准误差之间进行折衷。
在这种情况下,一个大的e意味着在反向传播阶段较好的校准,来自分类子网络的信息被减少为
Figure PCTCN2018090370-appb-000027
相反的,如果e很小,则校准准确率降低。因此,为了定位分割子网络的更新可设置合适的
Figure PCTCN2018090370-appb-000028
引进更多的分类信息。可以将
Figure PCTCN2018090370-appb-000029
理解成一个在反向传播阶段的动态学习率,自适应的匹配性能。
其中,
Figure PCTCN2018090370-appb-000030
的连接部分可以写成如下形式:
Figure PCTCN2018090370-appb-000031
局部偏导数
Figure PCTCN2018090370-appb-000032
可以这样表示:
Figure PCTCN2018090370-appb-000033
除了适应性的因子
Figure PCTCN2018090370-appb-000034
分割的更新也被模板t m所引导,正如公式(1)指定的,在公式(1)的定义下,在t m(c i)=1下的模板允许
Figure PCTCN2018090370-appb-000035
来监督分割操作。另一方面,这个信号在t m(c i)=0的时候变成了
Figure PCTCN2018090370-appb-000036
这意味着通过模板的前景和背景区域可以灵活的转变这个控制信号。由于与部分区域匹配的模板掩码可用,网络不仅受到减小全局分割误差的对象区域的监督,而且受到纠正对象边界的模板形状信息的监督。如图10所示,从图中可看出包含额外形状信息确实提升了分割结果的准确性。由于用了这种自调整机制在阀门联动函数连接分类和校准,定位分割子网络在反向传播阶段也能够被加强。
如图9,为一个实施例中,深度系统图像分类系统的处理过程示意图,该系统由定位分割、校准和分类这三个子网络组成。在阀门联动函数的调整 下,在前向传播阶段,校准子网络为分类子网络输出了姿势校准的部位图像,同时,分类和校准的误差也能在反向传播阶段被传回到定位分割子网络。
进一步的,在3个数据集上(①Caltech-UCSD Bird-200-2011)②Caltech-UCSD Bird-200-2010③Standford Cars-196)评估了本算法。由于①号数据集更多的被使用于分析实验。因此,主要的评估实施在①号数据集上,然后使用其他两个数据集来和近来的一些技术做比较。具体实验过程如下:
实施过程中,鸟头和躯干被视作语义部分。我们分别给它们训练了两个深度系统得到图像分类模型。所有的卷积神经网络模型是基于VGG-16网络来调整的。在定位分割子网络中,所有的输入图像被初始化大小为224×224。删除了原始的全连接层。其中输出了一个结构,这个结构为定位边界框和为前景和背景标签的像素概率映射。训练模型时,先初始化定位分割子网络,其中分类子网络的输入为224×224的图片。第一个全连接层被扩展为4096维的特征。然后,通过卷积神经网络所提取出来的特征训练一个支持向量机分类器。
对于校准操作,在模板选择中,所有的在①号数据集中的为关于头和躯干的5994个部分标注都被使用。这5994个部分被裁剪为224×224。使用谱聚类算法将这些数据分成了30个集群。从每个集群中,选择贴近集群中心的集群区域以及其镜像版本作为两个模板。这个操作最终形成了60个模板。旋转角度θ是一个范围为[-60,60]的整数,变化间隔为10°。所有的输入图像和模板都被重设为224×224大小,图像中的待校准区域比任何一个模板都小,为了匹配待校准区域和模板的大小,需要按比例放大输入图像。为头设置了放大比例集合{1.5,2.7,4.0,7.7,15.0},为躯干设置了放大比例集合{1.2,1.4,2.0,2.5,3.5}。
依据模板的搜寻空间,旋转角度以及放缩比例在验证集的表现而发生调整,这个验证集包含了1000张从训练集中随意挑选的图像。通过扩展查找空间,发现了性能提升。因此,根据所有的实验表现,保持使用查找空间。姿势相似度函数的结果可以被预先计算而存放起来,在GPU的加速下,遍历整个姿势位置、模板、放缩比例以及旋转角度来完成计算姿势相似度,每张图片只需要5s的时间。因此,姿势相似度能够在前向传播中很快的查出来,使得每张图片的训练时间为15ms,测试时间为8ms。
在Caltech-UCSD Bird-200-2011数据集上评估了我们的方法。这个数据集包含11788张鸟的图像,分成了200个下属类别。每张图像包含标准定位分割标注框和标准类别标签。在整个训练和测试过程中,我们利用了数据集的边界框来简化分类。训练测试四角定位,定义两种语义模板,分为头和躯干。在鸟的头部和躯干的地方使用相应的矩形覆盖了标注部分。
1)定位分割子网络分析
为了获取物体和部分的联系,定位和分割共享了卷积的参数。为了调查参数共享的效率,在卷积神经网络中分别设置不共享卷积神经网络参数和参数共享,并进行比较,部分定位结果的比较结果如图10所示。在图10中,计算了正确定位部分百分比,这是根据排名靠前的部分定位来计算的,并且将与真实表现的重叠部分≥0.5的视为正确的。不共享卷积神经网络参数时,在头和躯干上定位结果正确率分别为93.2和94.3,通过参数共享的分割,在头和躯干上有更好的定位结果,正确率分别为95.0和97.0。
在图11中展示了在卷积神经网络中分别设置不共享卷积神经网络参数和参数共享,并进行比较分割精度,其中,“bg”和“fg”缩写词分别表示背景和前景,使用了检测评价函数分数来评估分割表现。计算一个平均的检测评价函数分数来评估总体的分割精度。通过比较结果看出,参数共享提升了 在前景和背景区域上的分割精度,如图12所示,展示了输入图像和各种情况下的分割结果,其中,图12(a)表示输入图像,图12(b)表示分割真实结果,图12(c)表示没有进行参数共享的分割结果,图12(d)表示不含阀门联动函数的分割结果,图12(e)表示基于深度系统框架的分割结果,可见图12(c)和图12(e)的视觉差异很明显,包含阀门联动函数的分割结果精准很多。
为了更进一步的理解阀门联动函数对定位分割的性能提升,将这个子网从联合的深度系统模块移除,然后再与完整的深度系统相比较。
在图8中展示了定位精度上的比较,就重叠部分大于0.5,0.6,0.7的正确定位部分百分比测试了表现。在全部的配置上,定位分支比深度系统模型的表现差一点。比较而言,去除阀门联动函数的系统,其中分割子网也会遭受性能退化,如图13所示,展示了在CUB-200-2011数据集上有和没有使用阀门联动功能的物体分割精度,图12(d)展示了分割结果。造成性能降低的原因是,在是缺少阀门联动函数时,定位分割子网络没有从校准和分类操作中得到回馈,而存在阀门联动函数的深度系统在迭代中更新了校准和分类,使得结果更准确。
为了评估部分定位的性能,在图14中展示了本申请方法与其他方法在头和躯干的定位准确率比较结果,我们使用VGG-6的结构。在使用相同的实验设置下,图14中显示了比较结果。
对于头和躯干部位,相比早前的最好的结果93.4和96.2,本申请算法结果是95.0和97.0。图15展示了一些例子,是包含头和躯干的预测边界框的。与先前的定位-校准-分割模型相比,我们的深度系统模型在整体部位的定位操作上提升了性能。特别是由于小区域而改变的头部的定位被显著的从90.0提高到了95.0。性能差距表明本申请的定位分割子网络捕获物体部分关系的重要性,这对于边界框的回归是有益的。
本申请的深度模型包括分割,为物体的分割来训练一个基准的全卷积神经网络。除了基于卷积神经网络的解决办法,交互式的物体分割工具GrabCut 和协同分割方法可以被使用。在图16中给出了这些方法的分割精度,表示在CUB-200-2011数据集上就物体分割本申请方法与其他分割方法的比较。
正如图16中展示的,与本申请的深度图像分类模型的分数84.5相比,基准全卷积神经网络产生的平均检测评价函数分数仅仅为78.8。这个性能退化源于基准的全卷积神经网络没有从参数共享中得到提升。对于不含卷积神经网络的方法等,GrabCut和协同分割方法,由于它们依赖于丢失语义对象信息的低级图像表示,因而表现了更低的精度。图17体现了这一点,图17展示了不同的分割结果示意图,其中图17(a)表示输入图像,图17(b)表示分割真实结果,图17(c)表示GrabCut的结果,图17(d)表示协同分割的结果,图17(e)表示基准全卷积神经网络的分割结果,图17(f)表示本申请深度系统的分割分支的分割结果。
2)子网络组合分析
上面的实验结果表明伴随三个子网络的深度系统框架在部分定位和物体分割中的表现很好。我们也在下面5个案例中评估了在细粒度分类的表现和删除一个或者两个子网络的实验。
第一、在删除了定位分割子网络的情况下验证图像上的分割精度,验证结果显示在图18的第一行,没有这个模块,全图的分类精度只有76.3,其中图18展示了CUB-200-2011数据集中鸟头和躯干语义部分的分类精度,分别用定位子网络和校准子网络来评估实验表现。
第二、在深度系统框架中阻断了校准子网络来阻断前向传播和反向传播。定位分割子网络被用来为分类提出部分假设,剩下的定位分割和分类模块在反向传播阶段中被独立的训练。图18中的第二行的验证结果表明在校准过程中缺少信息传播是不可取的。
第三、在校准子网中使用阀门联动函数来为前向传播阶段的分类来输出姿势校准部分,但是阀门联动函数在反向传播阶段被禁用,以防止分类和校 准误差从反向传播到定位和分割。在这种方法中,在鸟的头部只达到了78.2的精确度。因此,在前向传播和反向传播阶段的校准子网络是很有必要的。
第四、在前向和反向的过程中使阀门联动函数生效。但是,分割分支被移除后,框架就降级成了我们早先的定位校准分类模型,由定位、校准、分类组成。没有分割分支,只有定位结果,是不能够像图14中的第四行表现的那样充分的。不出意料,这个模型结构导致了在头部和躯干部位分类的性能退化。
第五、使用了完整的深度系统架构,通过图18显示,在头部识别中产生了最好的分数79.5,证实了包含阀门联动函数的深度系统能很好的进行细粒度识别,实际上在分类,定位和分割上也起到了促进作用。
通过用躯干部分来替换整个图像,发现了一个关于分类精确度的巨大的表现差距(76.3VS52.2)。在图14中关于躯干定位的高正确定位部分百分比显示了差的定位造成了细微的表现差距。通过对比包括更多的有区别的头部的图像,总结出鸟的躯干在鸟类的种类鉴别中重要,在校准子网络通过对鸟的躯干的分类提升分类精确度。在添加了校准和阀门联动函数时,分类结果获得了11.1的提升。说明提取了更好的躯干部位的特征,躯干部分的值得信赖的特征很重要,它能将头部和整张图像结合在一起,这有利于最终的分类效果。
3)全局比较
在图19中展示了在CUB-200-2011数据集上本申请最后的分类精确度与其他的前沿的方法的比较结果,所有的比较方法使用的卷积神经网络模型总结在表8的第一列,在训练和测试阶段就给定了整个鸟的标准边界框,所有实现都是基于这样一个设置的。在系统中,将每一个图像投放到训练网络中来提取头部和躯干部位的特征。
表8显示使用头部和躯干部位的特征得到了79.5和63.3的精度,连接了两个特征向量来组成一个联合表现,产生了83.7的精度。最后,基于一个使用预训练模型的全图,微调了深度卷积神经网络模型。第六层为一个SVM分类器而提取特征,获得了76.3的精度。在连接了头部,躯干,和全图的特征之后, 精度提升到了88.5。作为比较,[35],[62]的方法也考虑了头部和躯干,同时组合了全图的卷积神经网络特征。本申请方法精度提升主要是由于使用了阀门联动函数的深度系统框架中进行了值得信赖的定位、分割和校准操作。4)应用Caltech-UCSDBird-200-2010数据集
Caltech-UCSDBird-200-2010数据集提供了200个鸟的种类的共6033张图片。数据集没有提供部分标注,只包含了很少的训练集和测试集。因此,它能验证在Caltech-UCSDBird-200-2010数据集上训练的深度系统框架在这个数据集上的表现。
在图20中展示了不同方法在CUB-200-2010数据集中鸟头和躯干语义部分的分类精度。通过使用Caltech-UCSDBird-200-2010数据集的训练集而获得定位分割子网和校准子网。当得到了姿势校准的部分图像之后,在这个数据集上更新分类子网络。
本申请方法对应的全图分类精确度是63.7,通过定位-分割子网,鸟头部的分类精确度为67.3。在这个方法中,获得了3.6的表现提升。在结合了校准操作之后,这个提升幅度变成了6.5,最好的躯干识别精度49.1是通过添加定位、分割、校准操作而实现的。
在最后的实验中,就分类精确度与其他的方法做了比较,结果显示在图21中。在[62]的方法中,结果为66.1,在[35]的定位-校准-分类模型中,结果为66.5。本申请方法的鸟头部表现出了70.2的精确度,头和躯干的结合表现出了74.9的精确度。
与早前的实验相似,将全图考虑其中,在联合了全部的特征后,本申请的分类结果精准度提高为77.5。本申请分类方法的表现超过了以前的最佳结果,达到了显著的水平。如果使用部分标注来调整局部分割和对齐子网络,则可以在这个数据集中获得更好的性能。
5)应用4StandfordCars-96数据集
除鸟的种类分类之外,本申请的深度系统图像分类模型可以应用于其他对象类型的细粒度识别。在这个部分使用StandfordCars-96数据集来作为评估基 准。这个汽车的数据集包含从196个种类的16185个图像,也是为细粒度识别任务而准备的,共有8144个训练图像和8041个测试图像。不同于Caltech-UCSD Bird-200-2011数据集,StandfordCars-96数据集没有提供对象任务。为了有利于此数据集上的深度系统图像分类模型,我们额外提供了16185张图像的所有车的二元掩膜。图22展现了掩膜标注的示例,在StandfordCars-96数据集上标注掩膜的例子。
在图23中,比较了本申请的深度系统和其他的方法在StandfordCars-96数据集上的分类精确度。当对车的种类应用本申请的深度系统图像分类模型时,在没有任何细分部分的情况下对于车执行了定位、分割和校准的操作。相似的,比较的方法也将整车作为输入。通过使用VGG架构,早先的最好的结果是[29]中提出方法对应的92.6。通过使用相同的VGG结构来构造我们的深度系统图像分类模型,获得了比别的方法更好的表现。在StandfordCars-96数据集上的96.3的结果阐释了本申请的深度系统提供给车更精确的分类。
应该理解的是,虽然上述流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述流程图中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在一个实施例中,还提供了一种计算机设备,该计算机设备的内部结构可如图26所示,该计算机设备包括图像分类装置,图像分类装置中包括各个模块,每个模块可全部或部分通过软件、硬件或其组合来实现。
在一个实施例中,如图24所示,提供了一种图像分类装置,包括:
输入模块510,用于获取待分类图像,将待分类图像输入已训练的图像 分类模型,已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,校准子网络被公式化为阀门联动函数,图像分类模型是通过阀门联动函数调整定位分割子网络和分类子网络的参数训练得到的,在训练的正向传播阶段,阀门联动函数的输出为已校准图像,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数。
分割模块520,用于待分类图像经过定位分割子网络进行目标对象定位和分割得到包含定位区域和分割区域的已分割图像。
校准模块530,用于已分割图像经过校准子网络,校准子网络对目标对象进行校准得到已校准图像。
类别确定模块540,用于已校准图像经过分类子网络进行细粒度分类,得到待分类图像对应的类别。
在一个实施例中,定位分割子网络包括定位子网络和分割子网络,所述定位子网络与分割子网络共享卷积神经网络的参数。
在一个实施例中,如图25所示,装置还包括:
训练模块550,用于获取训练图像集合,训练图像集合中的各个训练图像包括标准定位标注框,标准分割标注框和标准类别标签;从训练图像集合获取各个类别对应的模板;将训练图像集合中的各个训练图像输入定位分割子网络,得到包含当前定位区域和当前分割区域的已分割训练图像;根据模板对已分割训练图像进行校准得到已校准训练图像;将已校准训练图像输入分类子网络得到对应的当前输出类别;获取图像分类模型对应的总目标函数,总目标函数包括定位分割子网络目标函数和分类子网络目标函数,其中定位分割子网络目标函数是关于阀门联动函数的函数,根据当前输出类别、标准定位标注框,标准分割标注框和标准类别标签计算得到总目标函数的取值;根据阀门联动函数调整定位分割子网络参数和分类子网络参数,直到总目标函数的取值满足收敛条件;得到已训练的图像分类模型。
在一个实施例中,训练模块550还用于计算所述训练图像集合中任意两 个训练图像之间的相似性,组成相似性矩阵;将相似性矩阵经过谱聚类算法,将各个训练图像分成对应的多个集群;获取各个集群中心,根据各个集群中各个训练图像与对应的集群中心的相似度,确定各个集群对应的目标训练图像得到各个类别对应的模板,模板用于对图像进行校准。
在一个实施例中,训练模块550还用于获取校准目标函数,所述校准目标函数包括相似度函数、距离函数和前景置信度函数;调整模板中心点、旋转角度、缩放比和当前模板,直到所述校准目标函数满足收敛条件,得到对应的目标模板中心点、目标旋转角度、目标缩放比和目标模板;根据所述目标模板中心点、目标旋转角度、目标缩放比和目标模板对所述已分割训练图像进行校准,得到已校准训练图像。
在一个实施例中,总目标函数通过以下公式定义:
J(W c,W ls;I,L gt,y gt,o gt)=E c(W c;V(L,O;I,L f,O f),y gt)+E ls(W ls;I,L gt,o gt)
其中J为总目标函数,E c表示定位分割子网络目标函数,E ls表示分类子网络目标函数,W c表示定位分割子网络需要确定的参数,W ls表示分类子网络需要确定的参数,V表示阀门联动函数,L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割区域,I是输入的原始图像,y gt是标准类别标签,L gt是标准定位标注框,o gt是标准分割标注框。
在一个实施例中,阀门联动函数通过以下公式定义:
Figure PCTCN2018090370-appb-000037
其中:V表示阀门联动函数,
L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,在前向过程中,L=L f,O=O f,在反向过程中L和O是变量,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络 在前向过程输出的分割区域,c *是校准时采用的模板中心点,θ *是校准时采用的旋转角度,α *是校准时采用的目标缩放比,I表示对所述原始图像校准后的图像,E a为校准能量函数,所述校准能量函数通过以下公式定义:
E a(c,θ,α,t;I,L,O)=S(I(c,θ,α),t)+λ dD(c,L)+λ sF(O,t m),其中c表示模板中心点,θ表示旋转角度,α表示目标缩放比,t表示模板,S为相似度函数,其中λ d和λ s是自定义的常量,D为距离函数,F为前景置信度函数,t m为模板的二元掩膜。
上述图像分类装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图26所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现上述实施例所述的图像分类方法。
本领域技术人员可以理解,图26中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器存储有计算机可读指令,处理器执行计算机可读指令时实现以下步骤:获取待分类图像,将待分类图像输入已训练的图像分类模型,已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,校准子网络被公式化为阀门联动函数,图像分类模型是通过阀门联动函数调整定位分割子网络和分类子网络的参数训练得到的,在训练的正向传播阶段,阀门联动函数的输出为已校准图像,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数;待分类图像经过定位分割子网络进行目标对象定位和分割得到包含定位区域和分割区域的已分割图像;已分割图像经过校准子网络,校准子网络对目标对象进行校准得到已校准图像;已校准图像经过分类子网络进行细粒度分类,得到待分类图像对应的类别。
在一个实施例中,定位分割子网络包括定位子网络和分割子网络,定位子网络与分割子网络共享卷积神经网络的参数。
在一个实施例中,图像分类模型的训练包括:获取训练图像集合,训练图像集合中的各个训练图像包括标准定位标注框,标准分割标注框和标准类别标签;从训练图像集合获取各个类别对应的模板;将训练图像集合中的各个训练图像输入定位分割子网络,得到包含当前定位区域和当前分割区域的已分割训练图像;根据模板对已分割训练图像进行校准得到已校准训练图像;将已校准训练图像输入分类子网络得到对应的当前输出类别;获取图像分类模型对应的总目标函数,总目标函数包括定位分割子网络目标函数和分类子网络目标函数,其中定位分割子网络目标函数是关于阀门联动函数的函数,根据当前输出类别、标准定位标注框,标准分割标注框和标准类别标签计算得到总目标函数的取值;根据阀门联动函数调整定位分割子网络参数和分类子网络参数,直到总目标函数的取值满足收敛条件;得到已训练的图像分类模型。
在一个实施例中,从训练图像集合获取各个类别对应的模板,包括:计 算训练图像集合中任意两个训练图像之间的相似性,组成相似性矩阵;将相似性矩阵经过谱聚类算法,将各个训练图像分成对应的多个集群,获取各个集群中心,根据各个集群中各个训练图像与对应的集群中心的相似度,确定各个集群对应的目标训练图像得到各个类别对应的模板,模板用于对图像进行校准。
在一个实施例中,根据模板对已分割训练图像进行校准得到已校准训练图像,包括:获取校准目标函数,校准目标函数包括相似度函数、距离函数和前景置信度函数;调整模板中心点、旋转角度、缩放比和当前模板,直到校准目标函数满足收敛条件,得到对应的目标模板中心点、目标旋转角度、目标缩放比和目标模板;根据目标模板中心点、目标旋转角度、目标缩放比和目标模板对所述已分割训练图像进行校准,得到已校准训练图像。
在一个实施例中,总目标函数通过以下公式定义:
J(W c,W ls;I,L gt,y gt,o gt)=E c(W c;V(L,O;I,L f,O f),y gt)+E ls(W ls;I,L gt,o gt)
其中J为总目标函数,E c表示定位分割子网络目标函数,E ls表示分类子网络目标函数,W c表示定位分割子网络需要确定的参数,W ls表示分类子网络需要确定的参数,V表示阀门联动函数,L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割区域,I是输入的原始图像,y gt是标准类别标签,L gt是标准定位标注框,o gt是标准分割标注框。
在一个实施例中,阀门联动函数通过以下公式定义:
Figure PCTCN2018090370-appb-000038
其中:V表示阀门联动函数,
L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,在前向过程中,L=L f,O=O f,在反向过程中L和O是变量,I是输入的原始图 像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割区域,c *是校准时采用的模板中心点,θ *是校准时采用的旋转角度,α *是校准时采用的目标缩放比,I表示对所述原始图像校准后的图像,E a为校准能量函数,所述校准能量函数通过以下公式定义:
E a(c,θ,α,t;I,L,O)=S(I(c,θ,α),t)+λ dD(c,L)+λ sF(O,t m),其中c表示模板中心点,θ表示旋转角度,α表示目标缩放比,t表示模板,S为相似度函数,其中λ d和λ s是自定义的常量,D为距离函数,F为前景置信度函数,t m为模板的二元掩膜。
在一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行如下步骤::获取待分类图像,将待分类图像输入已训练的图像分类模型,已训练的图像分类模型包括定位分割子网络、校准子网络和分类子网络,校准子网络被公式化为阀门联动函数,图像分类模型是通过阀门联动函数调整定位分割子网络和分类子网络的参数训练得到的,在训练的正向传播阶段,阀门联动函数的输出为已校准图像,在训练的反向传播阶段,阀门联动函数的输出为关于定位分割子网络输出的定位区域和分割区域的函数;待分类图像经过定位分割子网络进行目标对象定位和分割得到包含定位区域和分割区域的已分割图像;已分割图像经过校准子网络,校准子网络对目标对象进行校准得到已校准图像;已校准图像经过分类子网络进行细粒度分类,得到待分类图像对应的类别。
在一个实施例中,定位分割子网络包括定位子网络和分割子网络,定位子网络与分割子网络共享卷积神经网络的参数。
在一个实施例中,图像分类模型的训练包括:获取训练图像集合,训练图像集合中的各个训练图像包括标准定位标注框,标准分割标注框和标准类 别标签;从训练图像集合获取各个类别对应的模板;将训练图像集合中的各个训练图像输入定位分割子网络,得到包含当前定位区域和当前分割区域的已分割训练图像;根据模板对已分割训练图像进行校准得到已校准训练图像;将已校准训练图像输入分类子网络得到对应的当前输出类别;获取图像分类模型对应的总目标函数,总目标函数包括定位分割子网络目标函数和分类子网络目标函数,其中定位分割子网络目标函数是关于阀门联动函数的函数,根据当前输出类别、标准定位标注框,标准分割标注框和标准类别标签计算得到总目标函数的取值;根据阀门联动函数调整定位分割子网络参数和分类子网络参数,直到总目标函数的取值满足收敛条件;得到已训练的图像分类模型。
在一个实施例中,从训练图像集合获取各个类别对应的模板,包括:计算训练图像集合中任意两个训练图像之间的相似性,组成相似性矩阵;将相似性矩阵经过谱聚类算法,将各个训练图像分成对应的多个集群,获取各个集群中心,根据各个集群中各个训练图像与对应的集群中心的相似度,确定各个集群对应的目标训练图像得到各个类别对应的模板,模板用于对图像进行校准。
在一个实施例中,根据模板对已分割训练图像进行校准得到已校准训练图像,包括:获取校准目标函数,校准目标函数包括相似度函数、距离函数和前景置信度函数;调整模板中心点、旋转角度、缩放比和当前模板,直到校准目标函数满足收敛条件,得到对应的目标模板中心点、目标旋转角度、目标缩放比和目标模板;根据目标模板中心点、目标旋转角度、目标缩放比和目标模板对所述已分割训练图像进行校准,得到已校准训练图像。
在一个实施例中,总目标函数通过以下公式定义:
J(W c,W ls;I,L gt,y gt,o gt)=E c(W c;V(L,O;I,L f,O f),y gt)+E ls(W ls;I,L gt,o gt)
其中J为总目标函数,E c表示定位分割子网络目标函数,E ls表示分类子网络目标函数,W c表示定位分割子网络需要确定的参数,W ls表示分类子网络需要确 定的参数,V表示阀门联动函数,L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割区域,I是输入的原始图像,y gt是标准类别标签,L gt是标准定位标注框,o gt是标准分割标注框。
在一个实施例中,阀门联动函数通过以下公式定义:
Figure PCTCN2018090370-appb-000039
其中:V表示阀门联动函数,
L是定位分割子网络输出的定位区域,O是定位分割子网络输出的分割区域,在前向过程中,L=L f,O=O f,在反向过程中L和O是变量,I是输入的原始图像,L f是定位分割子网络在前向过程输出的定位区域,O f是定位分割子网络在前向过程输出的分割区域,c *是校准时采用的模板中心点,θ *是校准时采用的旋转角度,α *是校准时采用的目标缩放比,I表示对所述原始图像校准后的图像,E a为校准能量函数,所述校准能量函数通过以下公式定义:
E a(c,θ,α,t;I,L,O)=S(I(c,θ,α),t)+λ dD(c,L)+λ sF(O,t m),其中c表示模板中心点,θ表示旋转角度,α表示目标缩放比,t表示模板,S为相似度函数,其中λ d和λ s是自定义的常量,D为距离函数,F为前景置信度函数,t m为模板的二元掩膜。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM (PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (21)

  1. An image classification method, comprising:
    a computer device obtaining an image to be classified and inputting the image to be classified into a trained image classification model, the trained image classification model comprising a localization-segmentation sub-network, a calibration sub-network, and a classification sub-network, wherein the calibration sub-network is formulated as a valve linkage function, the image classification model is trained by adjusting parameters of the localization-segmentation sub-network and the classification sub-network through the valve linkage function, in a forward-propagation phase of training the output of the valve linkage function is a calibrated image, and in a back-propagation phase of training the output of the valve linkage function is a function of the localization region and the segmentation region output by the localization-segmentation sub-network;
    the computer device passing the image to be classified through the localization-segmentation sub-network to localize and segment a target object, obtaining a segmented image containing a localization region and a segmentation region;
    the computer device passing the segmented image through the calibration sub-network, the calibration sub-network calibrating the target object to obtain a calibrated image; and
    the computer device passing the calibrated image through the classification sub-network for fine-grained classification to obtain a class corresponding to the image to be classified.
  2. The method according to claim 1, wherein the localization-segmentation sub-network comprises a localization sub-network and a segmentation sub-network, and the localization sub-network and the segmentation sub-network share parameters of a convolutional neural network.
  3. The method according to claim 1, wherein the training of the image classification model comprises:
    the computer device obtaining a training image set, each training image in the training image set comprising a standard localization annotation box, a standard segmentation annotation, and a standard class label;
    the computer device obtaining, from the training image set, a template corresponding to each class;
    the computer device inputting each training image in the training image set into the localization-segmentation sub-network to obtain a segmented training image containing a current localization region and a current segmentation region;
    the computer device calibrating the segmented training image according to the template to obtain a calibrated training image;
    the computer device inputting the calibrated training image into the classification sub-network to obtain a corresponding current output class;
    the computer device obtaining a total objective function of the image classification model, the total objective function comprising a localization-segmentation sub-network objective function and a classification sub-network objective function, wherein the classification sub-network objective function is a function of the valve linkage function, and computing the value of the total objective function from the current output class, the standard localization annotation box, the standard segmentation annotation, and the standard class label;
    the computer device adjusting parameters of the localization-segmentation sub-network and of the classification sub-network according to the valve linkage function until the value of the total objective function satisfies a convergence condition; and
    the computer device obtaining the trained image classification model.
  4. The method according to claim 3, wherein obtaining, from the training image set, the template corresponding to each class comprises:
    the computer device computing the similarity between any two training images in the training image set to form a similarity matrix;
    the computer device applying a spectral clustering algorithm to the similarity matrix to partition the training images into corresponding clusters; and
    the computer device obtaining the center of each cluster and, according to the similarity between each training image in a cluster and the corresponding cluster center, determining a target training image for each cluster to obtain the template corresponding to each class, the template being used to calibrate images.
  5. The method according to claim 3, wherein calibrating the segmented training image according to the template to obtain the calibrated training image comprises:
    the computer device obtaining a calibration objective function comprising a similarity function, a distance function, and a foreground confidence function;
    the computer device adjusting a template center point, a rotation angle, a scaling ratio, and a current template until the calibration objective function satisfies a convergence condition, obtaining a corresponding target template center point, target rotation angle, target scaling ratio, and target template; and
    the computer device calibrating the segmented training image according to the target template center point, target rotation angle, target scaling ratio, and target template to obtain the calibrated training image.
  6. The method according to claim 3, wherein the total objective function is defined by the following formula:
    J(W_c, W_ls; I, L_gt, y_gt, o_gt) = E_c(W_c; V(L, O; I, L_f, O_f), y_gt) + E_ls(W_ls; I, L_gt, o_gt)
    where J is the total objective function, E_c denotes the classification sub-network objective function, E_ls denotes the localization-segmentation sub-network objective function, W_c denotes the parameters of the classification sub-network to be determined, W_ls denotes the parameters of the localization-segmentation sub-network to be determined, V denotes the valve linkage function, L is the localization region output by the localization-segmentation sub-network, O is the segmentation region output by the localization-segmentation sub-network, I is the input original image, L_f is the localization region output by the localization-segmentation sub-network in the forward pass, O_f is the segmentation region output by the localization-segmentation sub-network in the forward pass, y_gt is the standard class label, L_gt is the standard localization annotation box, and o_gt is the standard segmentation annotation.
  7. The method according to claim 1, wherein the valve linkage function is defined by the following formula:
    [Valve linkage function formula, rendered as image PCTCN2018090370-appb-100001 in the original filing]
    where V denotes the valve linkage function, L is the localization region output by the localization-segmentation sub-network, O is the segmentation region output by the localization-segmentation sub-network (in the forward pass L = L_f and O = O_f, while in the backward pass L and O are variables), I is the input original image, L_f is the localization region output by the localization-segmentation sub-network in the forward pass, O_f is the segmentation region output by the localization-segmentation sub-network in the forward pass, c* is the template center point used in calibration, θ* is the rotation angle used in calibration, α* is the target scaling ratio used in calibration, I(c*, θ*, α*) denotes the image obtained by calibrating the original image, and E_a is the calibration energy function, defined by the following formula:
    E_a(c, θ, α, t; I, L, O) = S(I(c, θ, α), t) + λ_d·D(c, L) + λ_s·F(O, t_m), where c denotes the template center point, θ denotes the rotation angle, α denotes the target scaling ratio, t denotes the template, S is the similarity function, λ_d and λ_s are user-defined constants, D is the distance function, F is the foreground confidence function, and t_m is the binary mask of the template.
  8. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    obtaining an image to be classified and inputting the image to be classified into a trained image classification model, the trained image classification model comprising a localization-segmentation sub-network, a calibration sub-network, and a classification sub-network, wherein the calibration sub-network is formulated as a valve linkage function, the image classification model is trained by adjusting parameters of the localization-segmentation sub-network and the classification sub-network through the valve linkage function, in a forward-propagation phase of training the output of the valve linkage function is a calibrated image, and in a back-propagation phase of training the output of the valve linkage function is a function of the localization region and the segmentation region output by the localization-segmentation sub-network;
    passing the image to be classified through the localization-segmentation sub-network to localize and segment a target object, obtaining a segmented image containing a localization region and a segmentation region;
    passing the segmented image through the calibration sub-network, the calibration sub-network calibrating the target object to obtain a calibrated image; and
    passing the calibrated image through the classification sub-network for fine-grained classification to obtain a class corresponding to the image to be classified.
  9. The computer device according to claim 8, wherein the localization-segmentation sub-network comprises a localization sub-network and a segmentation sub-network, and the localization sub-network and the segmentation sub-network share parameters of a convolutional neural network.
  10. The computer device according to claim 8, wherein the training of the image classification model comprises:
    obtaining a training image set, each training image in the training image set comprising a standard localization annotation box, a standard segmentation annotation, and a standard class label;
    obtaining, from the training image set, a template corresponding to each class;
    inputting each training image in the training image set into the localization-segmentation sub-network to obtain a segmented training image containing a current localization region and a current segmentation region;
    calibrating the segmented training image according to the template to obtain a calibrated training image;
    inputting the calibrated training image into the classification sub-network to obtain a corresponding current output class;
    obtaining a total objective function of the image classification model, the total objective function comprising a localization-segmentation sub-network objective function and a classification sub-network objective function, wherein the classification sub-network objective function is a function of the valve linkage function, and computing the value of the total objective function from the current output class, the standard localization annotation box, the standard segmentation annotation, and the standard class label;
    adjusting parameters of the localization-segmentation sub-network and of the classification sub-network according to the valve linkage function until the value of the total objective function satisfies a convergence condition; and
    obtaining the trained image classification model.
  11. The computer device according to claim 10, wherein obtaining, from the training image set, the template corresponding to each class comprises:
    computing the similarity between any two training images in the training image set to form a similarity matrix;
    applying a spectral clustering algorithm to the similarity matrix to partition the training images into corresponding clusters; and
    obtaining the center of each cluster and, according to the similarity between each training image in a cluster and the corresponding cluster center, determining a target training image for each cluster to obtain the template corresponding to each class, the template being used to calibrate images.
  12. The computer device according to claim 10, wherein calibrating the segmented training image according to the template to obtain the calibrated training image comprises:
    obtaining a calibration objective function comprising a similarity function, a distance function, and a foreground confidence function;
    adjusting a template center point, a rotation angle, a scaling ratio, and a current template until the calibration objective function satisfies a convergence condition, obtaining a corresponding target template center point, target rotation angle, target scaling ratio, and target template; and
    calibrating the segmented training image according to the target template center point, target rotation angle, target scaling ratio, and target template to obtain the calibrated training image.
  13. The computer device according to claim 10, wherein the total objective function is defined by the following formula:
    J(W_c, W_ls; I, L_gt, y_gt, o_gt) = E_c(W_c; V(L, O; I, L_f, O_f), y_gt) + E_ls(W_ls; I, L_gt, o_gt)
    where J is the total objective function, E_c denotes the classification sub-network objective function, E_ls denotes the localization-segmentation sub-network objective function, W_c denotes the parameters of the classification sub-network to be determined, W_ls denotes the parameters of the localization-segmentation sub-network to be determined, V denotes the valve linkage function, L is the localization region output by the localization-segmentation sub-network, O is the segmentation region output by the localization-segmentation sub-network, I is the input original image, L_f is the localization region output by the localization-segmentation sub-network in the forward pass, O_f is the segmentation region output by the localization-segmentation sub-network in the forward pass, y_gt is the standard class label, L_gt is the standard localization annotation box, and o_gt is the standard segmentation annotation.
  14. The computer device according to claim 8, wherein the valve linkage function is defined by the following formula:
    [Valve linkage function formula, rendered as image PCTCN2018090370-appb-100002 in the original filing]
    where V denotes the valve linkage function, L is the localization region output by the localization-segmentation sub-network, O is the segmentation region output by the localization-segmentation sub-network (in the forward pass L = L_f and O = O_f, while in the backward pass L and O are variables), I is the input original image, L_f is the localization region output by the localization-segmentation sub-network in the forward pass, O_f is the segmentation region output by the localization-segmentation sub-network in the forward pass, c* is the template center point used in calibration, θ* is the rotation angle used in calibration, α* is the target scaling ratio used in calibration, I(c*, θ*, α*) denotes the image obtained by calibrating the original image, and E_a is the calibration energy function, defined by the following formula:
    E_a(c, θ, α, t; I, L, O) = S(I(c, θ, α), t) + λ_d·D(c, L) + λ_s·F(O, t_m), where c denotes the template center point, θ denotes the rotation angle, α denotes the target scaling ratio, t denotes the template, S is the similarity function, λ_d and λ_s are user-defined constants, D is the distance function, F is the foreground confidence function, and t_m is the binary mask of the template.
  15. One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining an image to be classified and inputting the image to be classified into a trained image classification model, the trained image classification model comprising a localization-segmentation sub-network, a calibration sub-network, and a classification sub-network, wherein the calibration sub-network is formulated as a valve linkage function, the image classification model is trained by adjusting parameters of the localization-segmentation sub-network and the classification sub-network through the valve linkage function, in a forward-propagation phase of training the output of the valve linkage function is a calibrated image, and in a back-propagation phase of training the output of the valve linkage function is a function of the localization region and the segmentation region output by the localization-segmentation sub-network;
    passing the image to be classified through the localization-segmentation sub-network to localize and segment a target object, obtaining a segmented image containing a localization region and a segmentation region;
    passing the segmented image through the calibration sub-network, the calibration sub-network calibrating the target object to obtain a calibrated image; and
    passing the calibrated image through the classification sub-network for fine-grained classification to obtain a class corresponding to the image to be classified.
  16. The storage medium according to claim 15, wherein the localization-segmentation sub-network comprises a localization sub-network and a segmentation sub-network, and the localization sub-network and the segmentation sub-network share parameters of a convolutional neural network.
  17. The storage medium according to claim 15, wherein the training of the image classification model comprises:
    obtaining a training image set, each training image in the training image set comprising a standard localization annotation box, a standard segmentation annotation, and a standard class label;
    obtaining, from the training image set, a template corresponding to each class;
    inputting each training image in the training image set into the localization-segmentation sub-network to obtain a segmented training image containing a current localization region and a current segmentation region;
    calibrating the segmented training image according to the template to obtain a calibrated training image;
    inputting the calibrated training image into the classification sub-network to obtain a corresponding current output class;
    obtaining a total objective function of the image classification model, the total objective function comprising a localization-segmentation sub-network objective function and a classification sub-network objective function, wherein the classification sub-network objective function is a function of the valve linkage function, and computing the value of the total objective function from the current output class, the standard localization annotation box, the standard segmentation annotation, and the standard class label;
    adjusting parameters of the localization-segmentation sub-network and of the classification sub-network according to the valve linkage function until the value of the total objective function satisfies a convergence condition; and
    obtaining the trained image classification model.
  18. The storage medium according to claim 17, wherein obtaining, from the training image set, the template corresponding to each class comprises:
    computing the similarity between any two training images in the training image set to form a similarity matrix;
    applying a spectral clustering algorithm to the similarity matrix to partition the training images into corresponding clusters; and
    obtaining the center of each cluster and, according to the similarity between each training image in a cluster and the corresponding cluster center, determining a target training image for each cluster to obtain the template corresponding to each class, the template being used to calibrate images.
  19. The storage medium according to claim 17, wherein calibrating the segmented training image according to the template to obtain the calibrated training image comprises:
    obtaining a calibration objective function comprising a similarity function, a distance function, and a foreground confidence function;
    adjusting a template center point, a rotation angle, a scaling ratio, and a current template until the calibration objective function satisfies a convergence condition, obtaining a corresponding target template center point, target rotation angle, target scaling ratio, and target template; and
    calibrating the segmented training image according to the target template center point, target rotation angle, target scaling ratio, and target template to obtain the calibrated training image.
  20. The storage medium according to claim 17, wherein the total objective function is defined by the following formula:
    J(W_c, W_ls; I, L_gt, y_gt, o_gt) = E_c(W_c; V(L, O; I, L_f, O_f), y_gt) + E_ls(W_ls; I, L_gt, o_gt)
    where J is the total objective function, E_c denotes the classification sub-network objective function, E_ls denotes the localization-segmentation sub-network objective function, W_c denotes the parameters of the classification sub-network to be determined, W_ls denotes the parameters of the localization-segmentation sub-network to be determined, V denotes the valve linkage function, L is the localization region output by the localization-segmentation sub-network, O is the segmentation region output by the localization-segmentation sub-network, I is the input original image, L_f is the localization region output by the localization-segmentation sub-network in the forward pass, O_f is the segmentation region output by the localization-segmentation sub-network in the forward pass, y_gt is the standard class label, L_gt is the standard localization annotation box, and o_gt is the standard segmentation annotation.
  21. The storage medium according to claim 15, wherein the valve linkage function is defined by the following formula:
    [Valve linkage function formula, rendered as image PCTCN2018090370-appb-100003 in the original filing]
    where V denotes the valve linkage function, L is the localization region output by the localization-segmentation sub-network, O is the segmentation region output by the localization-segmentation sub-network (in the forward pass L = L_f and O = O_f, while in the backward pass L and O are variables), I is the input original image, L_f is the localization region output by the localization-segmentation sub-network in the forward pass, O_f is the segmentation region output by the localization-segmentation sub-network in the forward pass, c* is the template center point used in calibration, θ* is the rotation angle used in calibration, α* is the target scaling ratio used in calibration, I(c*, θ*, α*) denotes the image obtained by calibrating the original image, and E_a is the calibration energy function, defined by the following formula:
    E_a(c, θ, α, t; I, L, O) = S(I(c, θ, α), t) + λ_d·D(c, L) + λ_s·F(O, t_m), where c denotes the template center point, θ denotes the rotation angle, α denotes the target scaling ratio, t denotes the template, S is the similarity function, λ_d and λ_s are user-defined constants, D is the distance function, F is the foreground confidence function, and t_m is the binary mask of the template.
PCT/CN2018/090370 2018-05-15 2018-06-08 Image classification method, computer device, and storage medium WO2019218410A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/606,166 US11238311B2 (en) 2018-05-15 2018-06-08 Method for image classification, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810462613.5A CN108764306B (zh) 2018-05-15 2018-05-15 图像分类方法、装置、计算机设备和存储介质
CN201810462613.5 2018-05-15

Publications (1)

Publication Number Publication Date
WO2019218410A1 true WO2019218410A1 (zh) 2019-11-21

Family

ID=64007736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090370 WO2019218410A1 (zh) 2018-05-15 2018-06-08 Image classification method, computer device, and storage medium

Country Status (3)

Country Link
US (1) US11238311B2 (zh)
CN (1) CN108764306B (zh)
WO (1) WO2019218410A1 (zh)


Also Published As

Publication number Publication date
CN108764306B (zh) 2022-04-22
CN108764306A (zh) 2018-11-06
US20210365732A1 (en) 2021-11-25
US11238311B2 (en) 2022-02-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18918739

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 09.03.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18918739

Country of ref document: EP

Kind code of ref document: A1