WO2018120740A1 - Picture classification method, device and robot - Google Patents

Picture classification method, device and robot Download PDF

Info

Publication number
WO2018120740A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
convolution
convolution kernel
error value
neural network
Prior art date
Application number
PCT/CN2017/092044
Other languages
French (fr)
Chinese (zh)
Inventor
刘若鹏
徐磊
欧阳一村
Original Assignee
深圳光启合众科技有限公司
深圳光启创新技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳光启合众科技有限公司, 深圳光启创新技术有限公司
Publication of WO2018120740A1 publication Critical patent/WO2018120740A1/en

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology

Definitions

  • the present invention relates to the field of image processing, and in particular to a picture classification method and apparatus, and a robot.
  • the fully connected layer is a very important component of neural networks.
  • in deep belief networks and automatic encoders, all layers of the network are fully connected layers; convolutional neural networks add several fully connected layers to obtain better classification accuracy.
  • the main problem of the fully connected layer is that too many parameters lead to high performance requirements of the network on the terminal.
  • the embodiments of the present invention provide a picture classification method and apparatus, and a robot, to solve at least the technical problem that the fully connected layer in the prior art has too many parameters, resulting in high performance requirements of the network on the terminal.
  • a picture classification method is provided, including: inputting a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network includes at least one convolution layer and one pooling layer; performing a convolution operation on a first matrix according to a preset first convolution kernel to obtain a first vector, wherein the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network; performing a convolution operation on the first vector according to a preset second convolution kernel to obtain a second vector, wherein the second vector is a one-dimensional vector; and classifying the target picture according to the second vector.
  • performing a convolution operation on the first matrix according to the preset first convolution kernel to obtain the first vector includes: performing a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix; and rearranging all elements of the second matrix in a predetermined order to obtain the first vector.
  • before performing a convolution operation on the first matrix according to the preset first convolution kernel, the method further includes: acquiring a training sample, wherein the training sample includes a plurality of pictures pre-divided into categories, the category being used to characterize the kind of thing indicated by the training sample; and training the convolutional neural network according to the training sample to obtain the first convolution kernel.
  • training the convolutional neural network according to the training sample to obtain the first convolution kernel includes: inputting the training sample into a convolution layer of the convolutional neural network; performing a convolution operation on the first matrix according to a convolution kernel in an initial state to obtain a first vector; performing a convolution operation on the first vector according to the second convolution kernel to obtain a second vector; determining a classification result of the training sample according to the value of a target element in the second vector, wherein the value of the target element indicates the probability that the category of the second vector is the same as the category corresponding to the target element, and the target element is any element of the second vector; comparing the classification result with the category of each picture to obtain a classification error value; determining whether the classification error value is greater than a preset error value; if the classification error value is greater than the preset error value, adjusting the weight values of the convolution kernel in the initial state until the classification error value is less than or equal to the preset error value; and if the classification error value is less than or equal to the preset error value, ending the training and using the current convolution kernel as the first convolution kernel.
  • the first convolution kernel satisfies the following formula: n = n_oc × (m − n_conv) / stride, where m is the input vector dimension in the convolutional neural network, n is the output vector dimension, stride is the step size of the first convolution kernel, n_oc is the number of output channels, and n_conv is the first convolution kernel size.
  • a picture classification apparatus is provided, comprising: an input unit, configured to input a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network includes at least one convolution layer and one pooling layer; a first operation unit, configured to perform a convolution operation on the first matrix according to a preset first convolution kernel to obtain a first vector, where the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network; a second operation unit, configured to perform a convolution operation on the first vector according to a preset second convolution kernel to obtain a second vector, where the second vector is a one-dimensional vector; and a classification unit, configured to classify the target picture according to the second vector.
  • the first operation unit includes: a first operation subunit, configured to perform a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix; and arrange a subunit for All elements of the second matrix are rearranged in a predetermined order to obtain the first vector.
  • the device further includes: an acquiring unit, configured to acquire a training sample before the first operation unit performs a convolution operation on the first matrix according to the preset first convolution kernel, wherein the training sample includes a plurality of pictures pre-divided into categories, the category being used to represent the kind of thing indicated by the training sample; and a training unit, configured to train the convolutional neural network according to the training sample to obtain the first convolution kernel.
  • the acquiring unit is configured to acquire a training sample before the first operation unit performs a convolution operation on the first matrix according to the preset first convolution kernel, wherein the training sample includes a plurality of pictures pre-divided into categories, the category being used to represent the kind of thing indicated by the training sample.
  • the training unit is configured to train the convolutional neural network according to the training sample to obtain the first convolution kernel.
  • the training unit includes: an input subunit, configured to input the training sample into the convolution layer of the convolutional neural network; a second operation subunit, configured to perform a convolution operation on the first matrix according to a convolution kernel in an initial state to obtain a first vector; and a third operation subunit, configured to perform a convolution operation on the first vector according to the second convolution kernel to obtain a second vector.
  • the training unit further includes: a first determining subunit, configured to determine a classification result of the training sample according to the value of the target element in the second vector, where the value of the target element indicates the probability that the category of the second vector is the same as the category corresponding to the target element, and the target element is any element of the second vector; a comparison subunit, configured to compare the classification result with the category of each picture to obtain a classification error value; a judging subunit, configured to determine whether the classification error value is greater than a preset error value; an adjustment subunit, configured to adjust the weight values of the convolution kernel in the initial state, if the classification error value is greater than the preset error value, until the classification error value is less than or equal to the preset error value; and a second determining subunit, configured to end the training and use the current convolution kernel as the first convolution kernel if the classification error value is less than or equal to the preset error value.
  • the first convolution kernel satisfies the following formula: n = n_oc × (m − n_conv) / stride, where m is the input vector dimension in the convolutional neural network, n is the output vector dimension, stride is the step size of the first convolution kernel, n_oc is the number of output channels, and n_conv is the first convolution kernel size.
  • a robot comprising: the above picture classification device.
  • the target picture to be classified is input into the convolution layer of the convolutional neural network; the last pooling layer of the convolutional neural network outputs the first matrix; the first matrix is convolved according to the first convolution kernel to obtain the first vector; and a convolution operation is performed on the first vector according to the second convolution kernel to obtain the second vector.
  • the value of each element of the second vector indicates the probability that the target picture belongs to a certain category, so the target picture can be classified by the second vector. This reduces the number of parameters relative to the fully connected layer and lowers the performance required of the terminal, so that the network can be deployed on a mobile phone or other embedded system.
  • this achieves the technical effect of reducing the network's requirements on terminal performance, thereby solving the technical problem in the prior art that too many fully connected layer parameters result in high performance requirements of the network on the terminal.
  • FIG. 1 is a flowchart of a picture classification method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a picture classification device according to an embodiment of the present invention.
  • an embodiment of a picture classification method is provided. It should be noted that the steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order than the one described herein.
  • FIG. 1 is a flowchart of a picture classification method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step S102: input a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network includes at least one convolution layer and one pooling layer.
  • Step S104: perform a convolution operation on the first matrix according to the preset first convolution kernel to obtain a first vector, where the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network.
  • Step S106: perform a convolution operation on the first vector according to the preset second convolution kernel to obtain a second vector, where the second vector is a one-dimensional vector.
  • Step S108: classify the target picture according to the second vector.
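Steps S102 to S108 can be sketched in plain Python. The shapes below are illustrative assumptions, not values from the patent (a 6*6 first matrix, a 3*3 first kernel, a length-4 second kernel with stride 4), and the trained kernel weights are replaced by random numbers:

```python
import random

def conv2d_valid(x, k):
    """Valid 2-D convolution (no padding, stride 1)."""
    kh, kw = len(k), len(k[0])
    h, w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(w)] for i in range(h)]

def conv1d_valid(v, k, stride=1):
    """Valid 1-D convolution, used here in place of a fully connected layer."""
    n = (len(v) - len(k)) // stride + 1
    return [sum(v[i * stride + t] * k[t] for t in range(len(k))) for i in range(n)]

random.seed(0)
# Assumed shapes: the last pooling layer outputs a 6x6 "first matrix";
# both kernels are random stand-ins for trained weights.
first_matrix = [[random.gauss(0, 1) for _ in range(6)] for _ in range(6)]
first_kernel = [[random.gauss(0, 1) for _ in range(3)] for _ in range(3)]

second_matrix = conv2d_valid(first_matrix, first_kernel)   # step S104, part 1
first_vector = [v for row in second_matrix for v in row]   # rearrange into 1-D
second_kernel = [random.gauss(0, 1) for _ in range(4)]
second_vector = conv1d_valid(first_vector, second_kernel, stride=4)  # step S106
predicted_class = second_vector.index(max(second_vector))  # step S108
```

With these assumed sizes the second matrix is 4*4, the first vector has 16 elements, and the second vector has 4 elements, one per candidate category.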
  • Image classification takes a picture as input and outputs the category that the picture belongs to (dog, cat, boat, bird), or the category it most likely belongs to.
  • each number in the input array is between 0 and 255, representing the pixel value at that point. The network returns the possible classification probabilities for this array (for example, dog 0.01, cat 0.04, boat 0.94, bird 0.02).
  • the first layer in a convolutional neural network is always a convolutional layer.
  • the input to the convolutional layer is an array full of pixel values, for example a 28*28*3 array (the 3 corresponding to the RGB channels).
  • the convolutional layer can be pictured as a beam of light that shines on a picture. This beam is called a filter (convolution kernel), and the place where the beam shines is called the receptive field. Assume the range illuminated by this beam is a 5*5 square area, and let the beam sweep from left to right and from top to bottom across every area of the picture. When all the moves are complete, a 24*24 array is obtained for each filter. Call this array a feature image.
  • this filter is an array of numeric type (the numbers inside are weight values).
  • the depth of the filter is the same as the depth of the input, so the dimension of the filter is 5*5*3. Using one 5*5*3 filter yields one 24*24 feature image; using more filters yields more feature images.
  • at each position, the weight values in the filter are multiplied by the corresponding pixel values in the real picture, and all the results are summed to obtain a single value. This process is repeated to scan the entire input image (the filter moves one unit to the right at each step, and down one row at the end of each row, and so on), with each step producing one value.
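The feature-image size quoted above (a 5*5 filter over a 28*28 input giving 24*24) follows from the standard size formula for a convolution without padding, sketched here:

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Spatial size, along one axis, of a convolution with no padding."""
    return (input_size - kernel_size) // stride + 1

size = conv_output_size(28, 5)  # a 5*5 filter over a 28*28 input -> 24*24
```

The same formula applies per axis, so a larger stride shrinks the feature image proportionally.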
  • the pooling layer is usually used after the convolutional layer.
  • the role of the pooling layer is to simplify the information output in the convolutional layer, reduce the data dimension, reduce the computational overhead, and control the overfitting.
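As a concrete illustration of how pooling simplifies the convolutional layer's output, here is a minimal 2*2 max-pooling sketch; max pooling is one common choice, and the window size here is an assumption, since the patent does not specify the pooling type:

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the largest value in each window,
    halving each spatial dimension (and the downstream computation)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]

pooled = max_pool_2x2([[1, 3, 2, 0],
                       [4, 2, 1, 5],
                       [0, 1, 3, 2],
                       [2, 6, 0, 1]])
# pooled == [[4, 5], [6, 3]]
```

A 4*4 feature image shrinks to 2*2, keeping only the strongest response per window.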
  • the output of the last pooling layer is connected to a plurality of fully connected layers, and the parameters of the plurality of fully connected layers are very large.
  • the fully connected layer maps one vector space to another vector space. If the rank of the mapping matrix W equals the input vector dimension, no information is lost.
  • the convolution operation completes the same mapping, and in the manner generally used by convolutional neural networks, the rank of the mapping matrix W generated by the convolution operation is generally min{m,n}, where m is the input vector dimension and n is the output vector dimension. From a vector-space perspective, this means the information loss of the convolution operation is related only to the output vector dimension.
  • the number of parameters required for the fully connected layer is m*n
  • the number of parameters required for the convolutional layer is n_oc*n_conv, where n_oc is the number of output channels and n_conv is the convolution kernel size. These parameters must also satisfy the equation n = n_oc × (m − n_conv) / stride, where stride is the distance the convolution kernel moves each step.
  • substituting this constraint, the number of parameters required for the convolution is n_oc*n_conv = m*n_oc − n*stride, which is less than the m*n parameters of the fully connected layer.
  • n_oc is generally only a small fraction of n.
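Under this reading of the constraint (n = n_oc × (m − n_conv) / stride), the two parameter counts can be compared directly. The sizes below are made-up examples, not figures from the patent:

```python
def fc_params(m, n):
    """Parameter count of a fully connected layer: one weight per input-output pair."""
    return m * n

def conv1d_params(m, n, n_oc, stride):
    """Parameter count n_oc * n_conv of the replacing 1-D convolution, with
    n_conv fixed by the size relation n = n_oc * (m - n_conv) / stride."""
    n_conv = m - n * stride // n_oc
    return n_oc * n_conv  # algebraically equal to m*n_oc - n*stride

# Made-up sizes: a 1024-dim pooled vector mapped to a 64-dim output.
full = fc_params(1024, 64)                         # 65536 weights
conv = conv1d_params(1024, 64, n_oc=4, stride=16)  # 3072 weights
```

Here the convolution needs roughly 5% of the fully connected layer's weights, illustrating the claimed parameter reduction.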
  • Han et al. used parametric pruning to reduce the number of fully connected layer parameters.
  • the fully connected layer parameters can be compressed to 20%, which means that in the fully connected parameter matrix, there are only 20% non-zero elements; This experiment also demonstrates the rationality of using one-dimensional convolution.
  • the output of the last pooling layer is convolved with the first convolution kernel, and a further convolution operation with the second convolution kernel is used to obtain the second vector.
  • the value of each element of the second vector indicates the probability that the target picture belongs to a certain category, i.e. it represents the possible classification probability of the picture.
  • for example, if the second vector is [0.01, 0.04, 0.94, 0.02], a higher value indicates that the extracted feature images are closer to that category.
  • 0.94 represents a 94% probability that the picture is a boat: the picture produces a high activation with the filter, capturing many high-level features such as sails and oars.
  • 0.02 means the probability of the picture being a bird is 2%: the picture produces very low activation with the filter, and high-level features such as wings and beaks are not captured.
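Reading off the classification from the second vector amounts to taking its largest element. Using the example values and category names from the text:

```python
second_vector = [0.01, 0.04, 0.94, 0.02]   # example values from the text
categories = ["dog", "cat", "boat", "bird"]
# The index of the largest value names the predicted category.
best = categories[second_vector.index(max(second_vector))]  # "boat"
```

The 0.94 entry wins, so the picture is classified as a boat.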
  • the target picture can thus be classified by the second vector; the number of parameters relative to the fully connected layer is reduced, the performance required of the terminal is reduced, and the network can be deployed on a mobile phone or other embedded system, thereby solving the technical problem in the prior art that too many fully connected layer parameters impose high performance requirements on the terminal, and achieving the technical effect of reducing the deployed network's requirements on terminal performance.
  • performing a convolution operation on the first matrix according to the preset first convolution kernel to obtain the first vector includes: performing a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix; and rearranging all elements of the second matrix in a preset order to obtain the first vector.
  • compressed sensing is a very popular field because of its good data-recovery capability; from the perspective of coding, it is a new lossy compression coding method.
  • the idea of compressed sensing is to use a common encoding method and a special decoding method. In theory, to compress a vector x obeying a distribution p(x), one possible compressed-sensing encoding is to randomly sample this vector; decoding then uses the L1 norm as the cost function to restore the original vector. It can be proved that, if the distribution meets certain conditions, the error between the recovered vector and the original vector can be small.
  • the method further includes: acquiring the training samples, wherein the training samples include a plurality of pictures pre-divided into categories, the categories being used to characterize the kind of thing indicated by the training sample; and training the convolutional neural network according to the training samples to obtain the first convolution kernel.
  • the first convolution kernel is trained.
  • the specific training process may be as follows: input the training sample into the convolution layer of the convolutional neural network; convolve the first matrix according to the convolution kernel in its initial state to obtain the first vector; perform a convolution operation on the first vector according to the second convolution kernel to obtain a second vector; determine the classification result of the training sample according to the value of the target element in the second vector, wherein the value of the target element indicates the probability that the category of the second vector is the same as the category corresponding to the target element, and the target element is any element of the second vector; compare the classification result with the category of each picture to obtain a classification error value; determine whether the classification error value is greater than a preset error value; if the classification error value is greater than the preset error value, adjust the weight values of the convolution kernel in the initial state until the classification error value is less than or equal to the preset error value; if the classification error value is less than or equal to the preset error value, end the training and use the current convolution kernel as the first convolution kernel.
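The adjust-until-below-threshold loop described above can be sketched with a toy one-weight "network". Everything here is a hypothetical stand-in: the samples, the 0.5 decision threshold, and the crude fixed-step weight adjustment replace the real forward pass and backpropagation:

```python
def classify(w, x):
    """Toy stand-in for the network's forward pass: score, then threshold."""
    return 1 if w * x > 0.5 else 0

def error_rate(w, samples):
    """Classification error value: fraction of misclassified samples."""
    return sum(classify(w, x) != label for x, label in samples) / len(samples)

# Toy "training samples": inputs with pre-divided categories.
samples = [(0.2, 0), (0.4, 0), (0.9, 1), (1.2, 1)]
w = 0.1                  # weight of the convolution kernel in its initial state
preset_error = 0.25      # preset error value
while error_rate(w, samples) > preset_error:
    w += 0.1             # crude weight adjustment (a real net uses gradients)
first_kernel = w         # training ends; the current kernel is the result
```

The loop terminates exactly when the classification error value falls to or below the preset error value, mirroring the stopping rule in the text.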
  • the loss function helps the network update the weight values to find the desired feature image.
  • MSE (mean squared error)
  • the loss value is obtained by substituting the true classification value of the picture and the classification value produced by the network into the mean squared error formula. This loss value may be high when the network has just started training, because the weight values are randomly initialized. The ultimate goal is to make the predicted value as close as possible to the real value; to achieve this, the loss value must be minimized. The smaller the loss value, the closer the prediction. In this process the weight values must be adjusted constantly to find out which weight values reduce the network's loss; the gradient descent algorithm can be used to find these weight values.
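A worked one-weight illustration of this idea, assuming a toy model prediction = w*x and the squared-error loss, whose gradient is d(loss)/dw = 2*x*(w*x − y):

```python
def mse(pred, true):
    """Mean squared error between predicted and true values."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

# Toy sample obeying y = 3*x; gradient descent should drive w toward 3.
x, y = 2.0, 6.0
w, lr = 0.0, 0.05        # initial weight and learning rate (assumed values)
for _ in range(100):
    grad = 2 * x * (w * x - y)   # derivative of (w*x - y)^2 with respect to w
    w -= lr * grad               # step against the gradient to reduce the loss
# w is now close to 3.0, and the loss mse([w * x], [y]) is near zero
```

Each step moves the weight against the gradient, so the loss shrinks monotonically here; real networks apply the same update to every weight via backpropagation.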
  • performing a convolution operation on the first vector according to the second convolution kernel, and obtaining the second vector includes: acquiring a preset second convolution kernel, where the second convolution kernel is a one-dimensional vector; The second convolution kernel performs a convolution operation on the first vector to obtain a second vector.
  • the output of the previous layer is reshaped into a one-dimensional vector.
  • as noted above, the fully connected layer parameters can be compressed to 20%, meaning that the fully connected parameter matrix has only 20% non-zero elements. Most of the parameters are 0, indicating the sparsity of the parameters, and the parameter sharing of convolution greatly reduces the number of parameters; this experiment also demonstrates the rationality of using one-dimensional convolution.
  • the commonly used classifier is softmax, and there is also a fully connected layer before the classifier. This fully connected layer differs from the others in that its output dimension is fixed: it must equal the number of categories (the output dimensions of the other fully connected layers are generally chosen freely from experience, without strict requirements).
  • in this process there is the problem that the output dimension after convolution may not match the number of categories, so one fully connected layer is kept here.
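For reference, softmax (the classifier named above) maps the final layer's raw scores to a probability distribution over the categories; a minimal sketch:

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # shift for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # largest score gets the largest probability
```

The ordering of the scores is preserved, so taking the largest probability gives the same classification as taking the largest raw score.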
  • MNIST is a handwritten digit dataset containing images of the digits 0-9 and their labels.
  • the accuracy rate is the probability that the network correctly recognizes the handwritten digits.
  • a picture classification device is also provided, which can perform the picture classification method described above; conversely, the picture classification method may be implemented by this picture classification device.
  • the apparatus includes an input unit 10, a first arithmetic unit 20, a second arithmetic unit 30, and a sorting unit 40.
  • the input unit 10 is configured to input a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network includes at least one convolution layer and one pooling layer.
  • the first operation unit 20 is configured to perform a convolution operation on the first matrix according to the preset first convolution kernel to obtain a first vector, where the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network.
  • the second operation unit 30 is configured to perform a convolution operation on the first vector according to the second convolution kernel set in advance to obtain a second vector, where the second vector is a one-dimensional vector.
  • the classification unit 40 is configured to classify the target picture according to the second vector.
  • the first operation unit 20 includes: a first operation subunit, and an arrangement subunit.
  • the first operation subunit is configured to perform a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix.
  • the arrangement subunit is configured to rearrange all elements of the second matrix in a preset order to obtain the first vector.
  • the device further includes: an acquiring unit and a training unit.
  • An obtaining unit configured to acquire a training sample before the first operation unit 20 performs a convolution operation on the first matrix according to the first convolution kernel, wherein the training sample includes a plurality of pictures pre-divided into categories, and the category is used to represent the training sample The type of thing indicated.
  • a training unit is configured to train the convolutional neural network according to the training sample to obtain a first convolution kernel.
  • the training unit includes: an input subunit, a second operation subunit, a third operation subunit, a first determination subunit, a comparison subunit, a judgment subunit, an adjustment subunit, and a second determination subunit.
  • An input subunit for inputting training samples into the convolutional layer of the convolutional neural network.
  • a second operation subunit configured to perform a convolution operation on the first matrix according to the convolution kernel of the initial state to obtain a first vector.
  • a third operation subunit configured to perform a convolution operation on the first vector according to the second convolution kernel to obtain a second vector.
  • a first determining subunit configured to determine, according to a value of the target element in the second vector, a classification result of the training sample, where the value of the target element in the second vector indicates a probability that the category of the second vector is the same as the category corresponding to the target element , wherein the target element is any one of the second vectors.
  • the comparison subunit is used to compare the classification result with the category of each picture to obtain a classification error value.
  • the determining subunit is configured to determine whether the classification error value is greater than a preset error value.
  • the adjustment subunit is configured to adjust the weight values of the convolution kernel in the initial state, if the classification error value is greater than the preset error value, until the classification error value is less than or equal to the preset error value.
  • the second determining subunit is configured to end the training if the classification error value is less than or equal to the preset error value, and use the current convolution kernel as the first convolution kernel.
  • the first convolution kernel satisfies the following formula: n = n_oc × (m − n_conv) / stride, where m is the input vector dimension in the convolutional neural network, n is the output vector dimension, stride is the step size of the first convolution kernel, n_oc is the number of output channels, and n_conv is the size of the first convolution kernel.
  • the disclosed technical contents may be implemented in other manners.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division; in actual implementation there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the goal of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium.
  • the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and the like.


Abstract

Disclosed by the present invention are a picture classification method, device and robot. The method comprises: inputting a target picture to be classified into a convolution layer of a convolutional neural network, the convolutional neural network comprising at least a convolution layer and a pooling layer; performing a convolution operation on a first matrix according to a preset first convolution kernel so as to obtain a first vector, the first vector being a one-dimensional vector, and the first matrix being the output of the last pooling layer of the convolutional neural network; performing a convolution operation on the first vector according to a preset second convolution kernel so as to obtain a second vector, the second vector being a one-dimensional vector; and classifying the target picture according to the second vector. The present invention solves the technical problem in the prior art that the large number of parameters in a fully connected layer imposes high performance requirements on the terminal running the network.

Description

Picture classification method and device, and robot

Technical field
The present invention relates to the field of image processing, and in particular to a picture classification method and apparatus, and a robot.
Background art
Researchers currently invest a great deal of effort in neural networks, including deep belief networks, deep Boltzmann machines, autoencoders, denoising autoencoders, and convolutional neural networks. In these networks, the fully connected layer is a very important component: for example, in networks such as deep belief networks and autoencoders, all layers are fully connected layers, and a convolutional neural network adds several fully connected layers to obtain better classification accuracy. The main problem of the fully connected layer is that its large number of parameters imposes high performance requirements on the terminal running the network.
In response to the above problems, no effective solution has yet been proposed.
Summary of the invention
The embodiments of the present invention provide a picture classification method and apparatus, and a robot, so as to solve at least the technical problem in the prior art that the large number of parameters in the fully connected layer imposes high performance requirements on the terminal running the network.
According to an aspect of the embodiments of the present invention, a picture classification method is provided, comprising: inputting a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network comprises at least one convolution layer and one pooling layer; performing a convolution operation on a first matrix according to a preset first convolution kernel to obtain a first vector, wherein the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network; performing a convolution operation on the first vector according to a preset second convolution kernel to obtain a second vector, wherein the second vector is a one-dimensional vector; and classifying the target picture according to the second vector.
Further, performing a convolution operation on the first matrix according to the preset first convolution kernel to obtain the first vector comprises: performing a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix; and rearranging all elements of the second matrix in a preset order to obtain the first vector.
Further, before the convolution operation is performed on the first matrix according to the preset first convolution kernel, the method further comprises: acquiring training samples, wherein the training samples comprise a plurality of pictures whose categories are divided in advance, the categories being used to characterize the kinds of things indicated by the training samples; and training the convolutional neural network according to the training samples to obtain the first convolution kernel.
Further, training the convolutional neural network according to the training samples to obtain the first convolution kernel comprises: inputting the training samples into the convolution layer of the convolutional neural network; performing a convolution operation on the first matrix according to a convolution kernel in an initial state to obtain the first vector; performing a convolution operation on the first vector according to the second convolution kernel to obtain the second vector; determining a classification result of the training samples according to the value of a target element in the second vector, wherein the value of the target element in the second vector indicates the probability that the category of the second vector is the same as the category corresponding to the target element, and the target element is any element of the second vector; comparing the classification result with the category of each picture to obtain a classification error value; judging whether the classification error value is greater than a preset error value; if the classification error value is greater than the preset error value, adjusting the weight values of the convolution kernel in the initial state until the classification error value is less than or equal to the preset error value; and if the classification error value is less than or equal to the preset error value, ending the training and using the current convolution kernel as the first convolution kernel.
Further, the first convolution kernel satisfies the following formula:

n = n_oc*(m − n_conv)/stride

where m is the input vector dimension in the convolutional neural network, n is the output vector dimension, stride is the step size of the first convolution kernel, n_oc is the number of output channels, and n_conv is the size of the first convolution kernel.
According to another aspect of the embodiments of the present invention, a picture classification apparatus is further provided, comprising: an input unit configured to input a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network comprises at least one convolution layer and one pooling layer; a first operation unit configured to perform a convolution operation on a first matrix according to a preset first convolution kernel to obtain a first vector, wherein the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network; a second operation unit configured to perform a convolution operation on the first vector according to a preset second convolution kernel to obtain a second vector, wherein the second vector is a one-dimensional vector; and a classification unit configured to classify the target picture according to the second vector.
Further, the first operation unit comprises: a first operation subunit configured to perform a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix; and an arrangement subunit configured to rearrange all elements of the second matrix in a preset order to obtain the first vector.
Further, the apparatus further comprises: an acquisition unit configured to acquire training samples before the first operation unit performs the convolution operation on the first matrix according to the preset first convolution kernel, wherein the training samples comprise a plurality of pictures whose categories are divided in advance, the categories being used to characterize the kinds of things indicated by the training samples; and a training unit configured to train the convolutional neural network according to the training samples to obtain the first convolution kernel.
Further, the training unit comprises: an input subunit configured to input the training samples into the convolution layer of the convolutional neural network; a second operation subunit configured to perform a convolution operation on the first matrix according to a convolution kernel in an initial state to obtain the first vector; a third operation subunit configured to perform a convolution operation on the first vector according to the second convolution kernel to obtain the second vector; a first determining subunit configured to determine a classification result of the training samples according to the value of a target element in the second vector, wherein the value of the target element in the second vector indicates the probability that the category of the second vector is the same as the category corresponding to the target element, and the target element is any element of the second vector; a comparison subunit configured to compare the classification result with the category of each picture to obtain a classification error value; a judging subunit configured to judge whether the classification error value is greater than a preset error value; an adjustment subunit configured to adjust the weight values of the convolution kernel in the initial state, if the classification error value is greater than the preset error value, until the classification error value is less than or equal to the preset error value; and a second determining subunit configured to end the training and use the current convolution kernel as the first convolution kernel if the classification error value is less than or equal to the preset error value.
Further, the first convolution kernel satisfies the following formula:

n = n_oc*(m − n_conv)/stride

where m is the input vector dimension in the convolutional neural network, n is the output vector dimension, stride is the step size of the first convolution kernel, n_oc is the number of output channels, and n_conv is the size of the first convolution kernel.
According to another aspect of the embodiments of the present invention, a robot is further provided, comprising the above picture classification apparatus.
In the embodiments of the present invention, the target picture is a picture to be classified. The target picture is input into the convolution layer of the convolutional neural network, and the last pooling layer of the convolutional neural network outputs the first matrix. A convolution operation is performed on the first matrix according to the first convolution kernel to obtain the first vector, and a convolution operation is performed on the first vector according to the second convolution kernel to obtain the second vector. The value of each element of the second vector can indicate the probability that the target picture belongs to a certain category, so the target picture can be classified by the second vector. This reduces the number of parameters compared with the fully connected layer and lowers the performance requirements on the terminal, so that the network can be deployed on a mobile phone or other embedded systems, achieving the technical effect of reducing the performance requirements on the terminal deploying the network and thereby solving the technical problem in the prior art that the large number of parameters in the fully connected layer imposes high performance requirements on the terminal.
Brief description of the drawings
The drawings described herein are intended to provide a further understanding of the invention and constitute a part of the invention. The illustrative embodiments of the invention and their description are used to explain the invention and do not constitute an undue limitation of the invention. In the drawings:
Fig. 1 is a flowchart of a picture classification method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a picture classification apparatus according to an embodiment of the present invention.
Detailed description
In order to enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in an order other than those illustrated or described here. Moreover, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such a process, method, product, or device.
Example 1
According to an embodiment of the present invention, an embodiment of a picture classification method is provided. It should be noted that the steps illustrated in the flowchart of the drawing may be performed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one described here.
Fig. 1 is a flowchart of a picture classification method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S102: input a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network comprises at least one convolution layer and one pooling layer.
Step S104: perform a convolution operation on a first matrix according to a preset first convolution kernel to obtain a first vector, wherein the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network.
Step S106: perform a convolution operation on the first vector according to a preset second convolution kernel to obtain a second vector, wherein the second vector is a one-dimensional vector.
Step S108: classify the target picture according to the second vector.
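Steps S102 to S108 can be sketched with a minimal numpy implementation. The pooled matrix, kernel values, and sizes below are hypothetical placeholders chosen only for illustration, not values prescribed by the invention:

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Valid 1-D convolution of vector x with a kernel."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i*stride:i*stride + k], kernel)
                     for i in range(out_len)])

def classify(first_matrix, first_kernel, second_kernel):
    # Step S104: convolve the (flattened) output of the last pooling layer
    first_vector = conv1d(first_matrix.ravel(), first_kernel)
    # Step S106: convolve the first vector to get one score per category
    second_vector = conv1d(first_vector, second_kernel)
    # Step S108: the index of the largest element gives the predicted category
    return int(np.argmax(second_vector))

# Toy pooled output and kernels (hypothetical values)
pooled = np.arange(6, dtype=float).reshape(2, 3)
label = classify(pooled, np.array([1.0, 0.0]), np.array([1.0, 1.0]))
```

In a trained network the kernels would come from the training procedure described below; here they are fixed so the pipeline can be traced by hand.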
Picture classification means inputting a picture and outputting the category the picture corresponds to (dog, cat, boat, bird), or outputting which category the picture most likely belongs to.
To have a computer classify a picture is to input into the computer an array full of pixel values; each number in the array is in the range 0-255 and represents the pixel value at that point. The computer then returns the possible classification probabilities corresponding to this array (for example, dog 0.01, cat 0.04, boat 0.94, bird 0.02).
Humans may distinguish a picture of a boat by features such as the boat's edges and lines in the picture. Similarly, a computer distinguishes a picture of a boat by judging from these low-level features, such as the image edges and image contours in the picture, and then builds more abstract concepts through the convolutional neural network.
The first layer in a convolutional neural network is always a convolutional layer. As mentioned above, the input to the convolutional layer is an array full of pixel values, say a 28*28*3 array (3 for the RGB values). The convolutional layer can be imagined as a beam of light shining on a picture. This beam is called a filter (convolution kernel), and the area illuminated by the beam is called the receptive field. Suppose the area illuminated by this beam is a 5*5 square region. Now let this beam sweep over every region of the picture, from left to right and from top to bottom. When all the movement is complete, a 24*24*3 array is obtained. This array is called a feature map.
This filter is an array of numbers (the numbers are weight values). The depth of the filter is the same as the depth of the input, so the dimensions of the filter are 5*5*3. Using a 5*5*3 filter, an output array of 24*24*3 can be obtained. Using more filters yields more feature maps.
As the filter sweeps, or convolves, over the whole picture, the weight values in the filter are multiplied by the corresponding pixel values in the real picture, and all the results are summed to obtain a single value. This process is then repeated to scan the entire input picture (next, the filter is moved one unit to the right, then another step to the right, and so on), and each step yields one value.
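The sliding weighted-sum operation described above can be sketched as follows. This is a minimal, unoptimized single-channel example; the 28*28 input and 5*5 kernel are the sizes used in the text, and the all-ones values are placeholders:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; at each position multiply the
    overlapping values element-wise and sum them into one output value."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

feature_map = conv2d(np.ones((28, 28)), np.ones((5, 5)))
print(feature_map.shape)  # (24, 24): a 5*5 filter over a 28*28 input
```

The 24*24 output size matches the feature-map size given in the text for one filter; stacking several filters gives the extra depth dimension.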
During convolution, if there is a shape in the picture similar to the shape represented by the filter, it produces a strong activation with the filter, and the sum of the products will be a large number. Other filters can be added to detect the edges and colors of the picture, and so on. The more filters there are, the more feature maps there are, and the richer the information extracted from the input data.
The pooling layer is usually used after the convolutional layer. Its role is to simplify the information output by the convolutional layer, reduce the data dimensions, lower the computational overhead, and control overfitting.
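As an illustration of the simplification the pooling layer performs, here is a 2×2 max-pooling sketch; the window size is a common choice, not one fixed by the invention:

```python
import numpy as np

def max_pool(x, size=2):
    """Keep only the largest value in each size x size block,
    quartering the data volume for size=2."""
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]          # drop ragged edges
    return x.reshape(x.shape[0] // size, size,
                     x.shape[1] // size, size).max(axis=(1, 3))

pooled = max_pool(np.arange(16.0).reshape(4, 4))  # 4x4 input -> 2x2 output
```

Each output value summarizes a whole block of the convolutional layer's output, which is exactly the dimension reduction described above.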
In the prior art, the output of the last pooling layer is connected to multiple fully connected layers, and the parameters of these fully connected layers are very numerous.
From the perspective of linear algebra, the fully connected layer maps one vector space to another vector space; if the rank of the mapping matrix W is greater than the input vector dimension, no information is lost. The convolution operation, again from the perspective of linear algebra, completes the same mapping operation, and, in the general manner used by convolutional neural networks, the rank of the mapping matrix W generated by the convolution operation is generally min{m, n}, where m is the input vector dimension and n is the output vector dimension. From the vector-space point of view, this means that the information loss of the convolution operation is related only to the output vector dimension. As for the number of parameters of the two, the fully connected layer requires m*n parameters, whereas the convolutional layer requires n_oc*n_conv parameters, where n_oc is the number of output channels and n_conv is the convolution kernel size. In fact, these parameters also satisfy the following equation, where stride is the distance the convolution moves each time:
n = n_oc*(m − n_conv)/stride
That is, the number of parameters required by the convolution is n_oc*n_conv = m*n_oc − n*stride, which is less than the m*n parameters of the full connection. Here n_oc is generally only a small fraction of n (from a few times to tens of times smaller). Han et al. used a parameter-pruning method to reduce the number of fully connected layer parameters; in their experiments, the fully connected layer parameters could be compressed to 20%, which means that in the fully connected parameter matrix only 20% of the elements are non-zero. This experiment also demonstrates the rationality of using one-dimensional convolution.
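The parameter comparison can be checked numerically. The dimensions below are hypothetical, chosen only so that the constraint n = n_oc*(m − n_conv)/stride divides evenly:

```python
# Hypothetical dimensions for illustration
m, n = 4096, 1000            # input and output vector dimensions
stride, n_oc = 4, 16         # convolution step size and output channels

# Kernel size implied by the constraint n = n_oc * (m - n_conv) / stride
n_conv = m - n * stride // n_oc

fc_params = m * n            # fully connected layer: m*n parameters
conv_params = n_oc * n_conv  # one-dimensional convolution: n_oc*n_conv

# Matches the count given in the text: m*n_oc - n*stride
assert conv_params == m * n_oc - n * stride
```

With these numbers the fully connected layer needs 4,096,000 parameters while the convolution needs 61,536, illustrating the scale of the saving.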
In the embodiments of the present invention, the output of the last pooling layer is connected to the first convolution kernel, and the convolution operation is used to obtain the second vector. The value of each element of the second vector can indicate the probability that the target picture belongs to a certain category, representing the picture's possible classification probabilities. For example, if the second vector is [0.01, 0.04, 0.94, 0.02], a higher value indicates that the feature maps are closer to that category. Here 0.94 means the probability that the image is a boat is 94%, indicating that the predicted picture produced a strong activation with the filter and many high-level features, such as sails and oars, were captured. 0.02 means the probability that the image is a bird is 2%, indicating that the predicted picture produced a very weak activation with the filter and few high-level features, such as wings and beaks, were captured.
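Reading off the predicted category from the second vector is then just a matter of taking its largest element; the class names and the vector are the ones from the example above:

```python
classes = ["dog", "cat", "boat", "bird"]
second_vector = [0.01, 0.04, 0.94, 0.02]

# Index of the largest probability selects the predicted category
best = max(range(len(second_vector)), key=lambda i: second_vector[i])
print(classes[best])  # boat
```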
In the embodiments of the present invention, the target picture is the picture to be classified. The target picture is input into the convolution layer of the convolutional neural network, and the last pooling layer of the convolutional neural network outputs the first matrix. A convolution operation is performed on the first matrix according to the first convolution kernel to obtain the first vector, and a convolution operation is performed on the first vector according to the second convolution kernel to obtain the second vector. The value of each element of the second vector can indicate the probability that the target picture belongs to a certain category, so the target picture can be classified by the second vector. This reduces the number of parameters compared with the fully connected layer and lowers the performance requirements on the terminal, so that the network can be deployed on a mobile phone or other embedded systems, solving the technical problem in the prior art that the large number of parameters in the fully connected layer imposes high performance requirements on the terminal and achieving the technical effect of reducing the performance requirements on the terminal deploying the network.
Optionally, performing a convolution operation on the first matrix according to the preset first convolution kernel to obtain the first vector comprises: performing a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix; and rearranging all elements of the second matrix in a preset order to obtain the first vector.
Because of its good data-recovery capability, compressed sensing is currently a very popular field. From the coding point of view, it is a new lossy compression coding method. The idea of compressed sensing is to complete the encoding-decoding process with a general encoding method and a special decoding method. In theory, to compress a vector x obeying a distribution p(x), one possible approach of compressed-sensing coding is to randomly sample the vector and, during decoding, use the L1 norm as the cost function to recover the original vector. It can be proved that, if the distribution meets certain conditions, the error between the recovered vector and the original vector can be very small.
Another reason for using rearrangement is that, because of the local nature of the convolution kernel, a vector produced merely by convolution contains only local information and no global information. After rearrangement, the new vector is convolved, and, according to the compressed-sensing property above, the convolved vector contains all the information of the original vector, so such a convolution result has better global characteristics.
Optionally, before the convolution operation is performed on the first matrix according to the preset first convolution kernel, the method further comprises: acquiring training samples, wherein the training samples comprise a plurality of pictures whose categories are divided in advance, the categories being used to characterize the kinds of things indicated by the training samples; and training the convolutional neural network according to the training samples to obtain the first convolution kernel.
The first convolution kernel is obtained through training. The specific training process may be as follows: input the training samples into the convolution layer of the convolutional neural network; perform a convolution operation on the first matrix according to a convolution kernel in an initial state to obtain the first vector; perform a convolution operation on the first vector according to the second convolution kernel to obtain the second vector; determine the classification result of the training samples according to the value of a target element in the second vector, wherein the value of the target element in the second vector indicates the probability that the category of the second vector is the same as the category corresponding to the target element, and the target element is any element of the second vector; compare the classification result with the category of each picture to obtain a classification error value; judge whether the classification error value is greater than a preset error value; if the classification error value is greater than the preset error value, adjust the weight values of the convolution kernel in the initial state until the classification error value is less than or equal to the preset error value; if the classification error value is less than or equal to the preset error value, end the training and use the current convolution kernel as the first convolution kernel.
损失函数可以帮助网络更新权重值,从而找到想要的特征图像。损失函数的定义方式有很多种,一种常用方式是MSE(mean squared error)均方误差。将图片的真实分类值和图片通过网络训练出来的分类值代入均方误差公式,就得到了损失值。这个损失值在网络刚开始训练的时候可能会很高,这是因为权重值都是随机初始化出来的。最终目的就是想要得到预测值和真实值一样。为了达到这个目的,需要尽量减少损失值,损失值越小就说明预测结果越接近。在这一个过程中,需要不断的调整权重值,来寻找出哪些权重值能使网络的损失减小。可以使用到梯度下降算法来寻找这些权重值。The loss function helps the network update its weight values so as to find the desired feature maps. There are many ways to define a loss function; a common choice is MSE (mean squared error). Substituting the true classification value of a picture and the classification value predicted by the network into the mean squared error formula yields the loss value. This loss value may be high when the network first starts training, because the weight values are randomly initialized. The ultimate goal is for the predicted value to match the true value. To achieve this, the loss value should be made as small as possible; the smaller the loss value, the closer the prediction is to the truth. In this process, the weight values must be adjusted continuously to find the weights that reduce the network's loss. A gradient descent algorithm can be used to find these weight values.
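MSE and the gradient-descent weight update can be illustrated concretely (a hypothetical linear predictor, not the patent's network; the loss starts high with zero-initialized weights and shrinks as the weights are repeatedly adjusted along the negative gradient):

```python
import numpy as np

def mse(pred, true):
    """Mean squared error between predicted and true classification values."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return np.mean((pred - true) ** 2)

def gradient_step(w, x, true, lr=0.1):
    """One gradient-descent update for a linear predictor pred = x @ w,
    using d(MSE)/dw = 2/N * x.T @ (x @ w - true)."""
    grad = 2.0 / len(true) * x.T @ (x @ w - true)
    return w - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
true = x @ np.array([1.0, -2.0, 0.5])   # hypothetical ground-truth mapping
w = np.zeros(3)                          # initial weights: the loss starts high
for _ in range(100):
    w = gradient_step(w, x, true)        # repeatedly adjust the weight values
```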
可选地,根据第二卷积核对第一向量做卷积运算,得到第二向量包括:获取预先设置的第二卷积核,其中,第二卷积核为一维向量;利用预先设置的第二卷积核对第一向量做卷积运算,得到第二向量。Optionally, performing a convolution operation on the first vector according to the second convolution kernel to obtain the second vector includes: acquiring a preset second convolution kernel, where the second convolution kernel is a one-dimensional vector; and performing a convolution operation on the first vector with the preset second convolution kernel to obtain the second vector.
实施例2Example 2
为了减少全连接层的参数个数,使用如下的方案:In order to reduce the number of parameters of the fully connected layer, the following scheme is used:
1.上一层的输出reshape为一维向量。1. Reshape the output of the previous layer into a one-dimensional vector.
2.以固定的顺序重新排列这个向量;按照卷积核的滑动经过的2维向量依次排列成一维向量。2. Rearrange this vector in a fixed order: the 2-D patches traversed as the convolution kernel slides are concatenated in sequence into a one-dimensional vector.
3.对这个向量做一维卷积。3. Do one-dimensional convolution on this vector.
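The three steps might be sketched as follows (the window size, stride, and row-major sliding order are illustrative assumptions, not values fixed by the text):

```python
import numpy as np

def reorder_then_conv1d(feature_map, kernel, window=2, stride=1):
    """Steps 1-3 above: flatten the previous layer's 2-D output by walking a
    sliding window over it in a fixed order, concatenate the patches into one
    long vector, then apply a 1-D convolution to that vector."""
    h, w = feature_map.shape
    patches = [feature_map[i:i + window, j:j + window].ravel()   # step 2: fixed order
               for i in range(0, h - window + 1, stride)
               for j in range(0, w - window + 1, stride)]
    vec = np.concatenate(patches)                                # steps 1-2: one 1-D vector
    n = len(vec) - len(kernel) + 1
    return np.array([np.dot(vec[i:i + len(kernel)], kernel)      # step 3: 1-D convolution
                     for i in range(n)])
```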
从线性代数的角度讲,全连接层将一个向量空间映射到另一个向量空间,如果映射矩阵W的秩大于输入向量维数,信息不会有丢失。卷积操作从线性代数的角度讲,就是完成上面这个映射操作,而且就卷积神经网络使用的一般方式而言,卷积操作生成的映射矩阵W的秩一般为min{m,n},m为输入向量维度,n为输出向量维度;因此从向量空间角度讲,这意味着卷积操作的信息丢失仅和输出向量维度相关。对于二者的参数个数而言,全连接层需要的参数个数为m*n,卷积层需要的参数个数为noc*nconv,这里noc为输出通道个数,nconv为卷积核大小,事实上,这些参数还要满足如下的等式,stride为卷积每次移动的距离。From the standpoint of linear algebra, a fully connected layer maps one vector space to another; if the rank of the mapping matrix W is no less than the input vector dimension, no information is lost. The convolution operation, from the same standpoint, performs this mapping as well, and in the way convolutional neural networks generally use it, the rank of the mapping matrix W produced by convolution is generally min{m, n}, where m is the input vector dimension and n is the output vector dimension; from the vector-space perspective, this means the information loss of the convolution operation depends only on the output vector dimension. As for the parameter counts of the two, a fully connected layer requires m*n parameters, while a convolution layer requires n_oc*n_conv, where n_oc is the number of output channels and n_conv is the convolution kernel size. In fact, these parameters must also satisfy the following equation, where stride is the distance the convolution moves at each step.
n = n_oc * (m - n_conv) / stride
也就是说,卷积需要的参数个数是m*noc-n*stride,少于全连接参数个数的m*n。这里的noc一般只有n的几分之一到几十分之一。Han等使用参数剪枝方法降低全连接层参数数目,在他们的实验中,全连接层参数可以被压缩到20%,这意味着,在全连接参数矩阵中,仅仅有20%非零元素;这个实验也证明了使用一维卷积的合理性。大部分参数为0,说明参数的稀疏性,而卷积的参数共享正是大大减小了参数量。That is, the number of parameters required by the convolution is m*n_oc - n*stride, fewer than the m*n parameters of the full connection; here n_oc is generally only a small fraction of n, from one part in several to one part in several tens. Han et al. used parameter pruning to reduce the number of fully connected layer parameters; in their experiments, the fully connected layer parameters could be compressed to 20%, meaning that only 20% of the elements of the fully connected parameter matrix are non-zero. That experiment also supports the reasonableness of using a one-dimensional convolution: most parameters being zero shows that the parameters are sparse, and the parameter sharing of convolution is exactly what reduces the parameter count so much.
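The two parameter counts can be checked numerically, using the sliding relation implied by the text, n = n_oc*(m - n_conv)/stride, hence n_conv = m - n*stride/n_oc (the layer dimensions chosen here are hypothetical):

```python
def fc_params(m, n):
    """A fully connected layer stores one weight per (input, output) pair."""
    return m * n

def conv1d_params(m, n, n_oc, stride=1):
    """Parameter count n_oc * n_conv of the 1-D convolution, with the kernel
    size n_conv = m - n * stride / n_oc taken from the sliding relation."""
    n_conv = m - n * stride // n_oc
    return n_oc * n_conv

# hypothetical layer: m = 1024 inputs mapped to n = 128 outputs on n_oc = 8 channels
assert fc_params(1024, 128) == 131072
assert conv1d_params(1024, 128, 8) == 1024 * 8 - 128 * 1   # m*n_oc - n*stride
```

With n_oc far smaller than n, the convolution's m*n_oc - n*stride parameters are a small fraction of the fully connected layer's m*n.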
因为压缩感知良好的数据恢复能力,压缩感知是当前非常热门的领域,从编码看,它是一种新的有损压缩编码方式;压缩感知的想法是使用通用的编码方式,特殊的解码方式完成编解码过程。在理论上,压缩一个服从分布p(x)的向量x,压缩感知编码的一种可能做法是对这个向量随机采样,解码时使用L1范数作为代价方程,恢复原向量,可以证明,如果分布符合某种条件,恢复的向量和原始向量的误差可以很小。Because of its good data-recovery ability, compressed sensing is currently a very popular field. Viewed as coding, it is a new lossy compression coding scheme; the idea of compressed sensing is to complete the encoding-decoding process with a generic encoding and a special decoding. In theory, to compress a vector x obeying a distribution p(x), one possible compressed-sensing encoding is to randomly sample this vector; decoding uses the L1 norm as the cost function to recover the original vector. It can be proved that, if the distribution satisfies certain conditions, the error between the recovered vector and the original vector can be very small.
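The random-sampling encoder and L1-norm decoder can be sketched as a basis-pursuit linear program (a generic illustration using scipy's LP solver, not the patent's procedure; the vector size and sparsity are assumptions):

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, y):
    """Basis-pursuit decoding: minimize ||z||_1 subject to A z = y,
    written as a linear program over z = z_pos - z_neg with z_pos, z_neg >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                      # objective: sum of |z| components
    A_eq = np.hstack([A, -A])               # A z_pos - A z_neg = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
x = np.zeros(20)
x[3], x[11] = 1.5, -2.0                     # sparse original vector
A = rng.normal(size=(12, 20))               # generic encoding: random sampling
z = l1_recover(A, A @ x)                    # special decoding: L1 minimization
```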
使用重排的另一个原因是,因为卷积核的局部性质,仅仅经过卷积产生的向量只有局部信息,没有全局信息,重排后,对新的向量卷积,根据上面的压缩感知性质,卷积后的向量包含原始向量所有的信息,这样的卷积结果会有更好的全局特性。Another reason for the rearrangement is that, because of the local nature of the convolution kernel, a vector produced by convolution alone contains only local information and no global information. After rearrangement, the new vector is convolved again; by the compressed-sensing property above, the convolved vector contains all the information of the original vector, so the convolution result has better global characteristics.
在当前的网络中,常用的分类器是softmax,在分类器之前也有一个全连接层;这个全连接层和其它全连接层的不同之处在于,它的输出维度是确定的,必须和类别数相同(其它的全连接层的输出维度一般由经验自由给定,没有必须的要求)。直接使用上述的过程,会有卷积后输出维度和类别数不匹配的问题,因此,在这里接一个全连接层。In current networks, the commonly used classifier is softmax, and there is also a fully connected layer before the classifier. This fully connected layer differs from the other fully connected layers in that its output dimension is fixed: it must equal the number of categories (the output dimensions of the other fully connected layers are generally set freely from experience, with no hard requirement). If the above procedure were used directly, the output dimension after convolution would not match the number of categories; therefore, a fully connected layer is attached here.
最终的结果是,整个网络只有softmax之前一个全连接层,这会大大减少网络的参数。通过在Mnist上的实验说明了这一点。The end result is that the entire network has only one fully connected layer, the one before softmax, which greatly reduces the network's parameters. This is illustrated by experiments on MNIST.
mnist为手写字体数据集,包含0-9的手写数字数据集及标签,准确率即为网络对手写体数字正确识别的概率。MNIST is a handwritten digit dataset containing images of the digits 0-9 together with labels; the accuracy is the probability that the network correctly recognizes a handwritten digit.
在Mnist上的实验结果:Experimental results on MNIST:

方法 Method           参数个数 Number of parameters
Softmax               7840
Conv+fc+softmax       3273504
Conv+fc+softmax       8556
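The Softmax row can be verified by hand: a softmax classifier alone on MNIST is a single fully connected map from the 784 pixels to the 10 classes (bias terms not counted here):

```python
# MNIST images are 28 x 28 = 784 pixels, and there are 10 digit classes.
inputs, classes = 28 * 28, 10
softmax_only = inputs * classes   # one fully connected softmax layer, biases not counted
assert softmax_only == 7840       # matches the Softmax row of the table
```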
实施例3Example 3
根据本发明实施例,还提供了一种图片分类装置。该图片分类装置可以执行上述图片分类方法,上述图片分类方法也可以通过该图片分类装置实施。According to an embodiment of the invention, a picture classification device is also provided. The picture classification device may perform the picture classification method described above, and the picture classification method may be implemented by the picture classification device.
图2是根据本发明实施例的一种图片分类装置的示意图。如图2所示,该装置包括:输入单元10、第一运算单元20、第二运算单元30、分类单元40。2 is a schematic diagram of a picture classification device according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes an input unit 10, a first arithmetic unit 20, a second arithmetic unit 30, and a sorting unit 40.
输入单元10,用于将待分类的目标图片输入卷积神经网络的卷积层,其中,卷积神经网络至少包括一个卷积层和一个池化层。The input unit 10 is configured to input a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network includes at least one convolution layer and one pooling layer.
第一运算单元20,用于根据预先设置的第一卷积核对第一矩阵做卷积运算,得到第一向量,其中,第一向量为一维向量,第一矩阵是卷积神经网络的最后一个池化层的输出。The first operation unit 20 is configured to perform a convolution operation on the first matrix according to the preset first convolution kernel to obtain a first vector, where the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network.
第二运算单元30,用于根据预先设置的第二卷积核对第一向量做卷积运算,得到第二向量,其中,第二向量为一维向量。The second operation unit 30 is configured to perform a convolution operation on the first vector according to the second convolution kernel set in advance to obtain a second vector, where the second vector is a one-dimensional vector.
分类单元40,用于根据第二向量对目标图片进行分类。The classification unit 40 is configured to classify the target picture according to the second vector.
可选地,第一运算单元20包括:第一运算子单元、排列子单元。第一运算子单元,用于根据第一卷积核对第一矩阵做卷积运算,得到第二矩阵。排列子单元,用于将第二矩阵的所有元素按照预设顺序重新排列,得到第一向量。 Optionally, the first operation unit 20 includes: a first operation subunit, and an arrangement subunit. The first operation subunit is configured to perform a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix. Arranging subunits for rearranging all elements of the second matrix in a predetermined order to obtain a first vector.
可选地,装置还包括:获取单元、训练单元。获取单元,用于在第一运算单元20根据第一卷积核对第一矩阵做卷积运算之前,获取训练样本,其中,训练样本包括预先划分好类别的多张图片,类别用于表征训练样本所指示的事物的种类。训练单元,用于根据训练样本对卷积神经网络进行训练,得到第一卷积核。Optionally, the device further includes: an acquiring unit and a training unit. The acquiring unit is configured to acquire training samples before the first operation unit 20 performs the convolution operation on the first matrix according to the first convolution kernel, where the training samples include a plurality of pictures divided into categories in advance, and the categories characterize the type of thing indicated by the training samples. The training unit is configured to train the convolutional neural network according to the training samples to obtain the first convolution kernel.
可选地,训练单元包括:输入子单元、第二运算子单元、第三运算子单元、第一确定子单元、比较子单元、判断子单元、调整子单元、第二确定子单元。输入子单元,用于将训练样本输入到卷积神经网络的卷积层中。第二运算子单元,用于根据初始状态的卷积核对第一矩阵做卷积运算,得到第一向量。第三运算子单元,用于根据第二卷积核对第一向量做卷积运算,得到第二向量。第一确定子单元,用于根据第二向量中目标元素的值确定训练样本的分类结果,其中,第二向量中目标元素的值指示第二向量的类别与目标元素对应的类别相同的概率大小,其中,目标元素是第二向量中的任意一个元素。比较子单元,用于将分类结果与每张图片的类别进行比较,得到分类误差值。判断子单元,用于判断分类误差值是否大于预设误差值。调整子单元,用于如果分类误差值大于预设误差值,对初始状态的卷积核的权重值进行调整,直至分类误差值小于等于预设误差值。第二确定子单元,用于如果分类误差值小于等于预设误差值,则训练结束,并将当前的卷积核作为第一卷积核。Optionally, the training unit includes: an input subunit, a second operation subunit, a third operation subunit, a first determination subunit, a comparison subunit, a judgment subunit, an adjustment subunit, and a second determination subunit. An input subunit for inputting training samples into the convolutional layer of the convolutional neural network. And a second operation subunit, configured to perform a convolution operation on the first matrix according to the convolution kernel of the initial state to obtain a first vector. And a third operation subunit, configured to perform a convolution operation on the first vector according to the second convolution kernel to obtain a second vector. a first determining subunit, configured to determine, according to a value of the target element in the second vector, a classification result of the training sample, where the value of the target element in the second vector indicates a probability that the category of the second vector is the same as the category corresponding to the target element , wherein the target element is any one of the second vectors. The comparison subunit is used to compare the classification result with the category of each picture to obtain a classification error value. The determining subunit is configured to determine whether the classification error value is greater than a preset error value. 
The adjusting subunit is configured to adjust, if the classification error value is greater than the preset error value, the weight values of the initial-state convolution kernel until the classification error value is less than or equal to the preset error value. The second determining subunit is configured to end the training if the classification error value is less than or equal to the preset error value and take the current convolution kernel as the first convolution kernel.
可选地,第一卷积核满足如下公式:Optionally, the first convolution kernel satisfies the following formula:

n = n_oc * (m - n_conv) / stride

其中,m为卷积神经网络中输入向量维度,n为输出向量维度,stride为第一卷积核的步长,noc为输出通道个数,nconv为第一卷积核大小。Where m is the input vector dimension in the convolutional neural network, n is the output vector dimension, stride is the stride of the first convolution kernel, n_oc is the number of output channels, and n_conv is the size of the first convolution kernel.
在本发明的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for a part not detailed in one embodiment, refer to the related descriptions of the other embodiments.
在本发明所提供的几个实施例中,应该理解到,所揭露的技术内容,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,可以为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。In the several embodiments provided by the present invention, it should be understood that the disclosed technical contents may be implemented in other manners. The device embodiments described above are only schematic. For example, the division of the unit may be a logical function division. In actual implementation, there may be another division manner, for example, multiple units or components may be combined or may be Integrate into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
以上所述仅是本发明的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本发明原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本发明的保护范围。The above description covers only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (11)

  1. 一种图片分类方法,其特征在于,包括:A picture classification method, comprising:
    将待分类的目标图片输入到卷积神经网络的卷积层中,其中,所述卷积神经网络至少包括一个卷积层和一个池化层;Inputting a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network includes at least one convolution layer and one pooling layer;
    根据预先设置的第一卷积核对第一矩阵做卷积运算,得到第一向量,其中,所述第一向量为一维向量,所述第一矩阵是所述卷积神经网络的最后一个池化层的输出;Performing a convolution operation on the first matrix according to a preset first convolution kernel to obtain a first vector, wherein the first vector is a one-dimensional vector and the first matrix is the output of the last pooling layer of the convolutional neural network;
    根据预先设置的第二卷积核对所述第一向量做卷积运算,得到第二向量,其中,所述第二向量为一维向量;Performing a convolution operation on the first vector according to a preset second convolution kernel to obtain a second vector, where the second vector is a one-dimensional vector;
    根据所述第二向量对所述目标图片进行分类。The target picture is classified according to the second vector.
  2. 根据权利要求1所述的方法,其特征在于,根据预先设置的第一卷积核对第一矩阵做卷积运算,得到第一向量包括:The method according to claim 1, wherein the first matrix is convoluted according to the first convolution kernel set in advance, and the first vector is obtained:
    根据所述第一卷积核对所述第一矩阵做卷积运算,得到第二矩阵;Performing a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix;
    将所述第二矩阵的所有元素按照预设顺序重新排列,得到所述第一向量。All elements of the second matrix are rearranged in a predetermined order to obtain the first vector.
  3. 根据权利要求1所述的方法,其特征在于,在根据预先设置的第一卷积核对第一矩阵做卷积运算之前,所述方法还包括:The method according to claim 1, wherein before the convolution operation on the first matrix according to the first convolution kernel set in advance, the method further comprises:
    获取训练样本,其中,所述训练样本包括预先划分好类别的多张图片,所述类别用于表征所述训练样本所指示的事物的种类;Obtaining a training sample, wherein the training sample includes a plurality of pictures of a pre-divided category, the category being used to characterize a kind of the thing indicated by the training sample;
    根据所述训练样本对所述卷积神经网络进行训练,得到所述第一卷积核。The convolutional neural network is trained according to the training sample to obtain the first convolution kernel.
  4. 根据权利要求3所述的方法,其特征在于,根据所述训练样本对所述卷积神经网络进行训练,得到所述第一卷积核包括:The method according to claim 3, wherein the convolutional neural network is trained according to the training sample, and obtaining the first convolution kernel comprises:
    将所述训练样本输入到所述卷积神经网络的卷积层中;Importing the training sample into a convolutional layer of the convolutional neural network;
    根据初始状态的卷积核对所述第一矩阵做卷积运算,得到第一向量;Performing a convolution operation on the first matrix according to a convolution kernel of an initial state to obtain a first vector;
    根据所述第二卷积核对所述第一向量做卷积运算,得到第二向量;Performing a convolution operation on the first vector according to the second convolution kernel to obtain a second vector;
    根据所述第二向量中目标元素的值确定所述训练样本的分类结果,其中,所述第二向量中目标元素的值指示所述第二向量的类别与所述目标元素对应的类别相同的概率大小,其中,所述目标元素是所述第二向量中的任意一个元素;Determining a classification result of the training sample according to a value of a target element in the second vector, wherein the value of the target element in the second vector indicates the probability that the category of the second vector is the same as the category corresponding to the target element, and the target element is any one element of the second vector;
    将所述分类结果与每张所述图片的类别进行比较,得到分类误差值;Comparing the classification result with each category of the picture to obtain a classification error value;
    判断所述分类误差值是否大于预设误差值;Determining whether the classification error value is greater than a preset error value;
    如果所述分类误差值大于所述预设误差值,对所述初始状态的卷积核的权重值进行调整,直至分类误差值小于等于所述预设误差值;If the classification error value is greater than the preset error value, adjusting a weight value of the convolution kernel of the initial state until the classification error value is less than or equal to the preset error value;
    如果所述分类误差值小于等于所述预设误差值,则训练结束,并将当前的卷积核作为所述第一卷积核。If the classification error value is less than or equal to the preset error value, the training ends and the current convolution kernel is used as the first convolution kernel.
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述第一卷积核满足如下公式:The method according to any one of claims 1 to 4, wherein the first convolution kernel satisfies the following formula:
    n = n_oc * (m - n_conv) / stride
    其中,m为卷积神经网络中输入向量维度,n为输出向量维度,stride为所述第一卷积核的步长,noc为输出通道个数,nconv为所述第一卷积核大小。Where m is the input vector dimension in the convolutional neural network, n is the output vector dimension, stride is the stride of the first convolution kernel, n_oc is the number of output channels, and n_conv is the size of the first convolution kernel.
  6. 一种图片分类装置,其特征在于,包括:A picture classification device, comprising:
    输入单元,用于将待分类的目标图片输入到卷积神经网络的卷积层中,其中,所述卷积神经网络至少包括一个卷积层和一个池化层;An input unit, configured to input a target picture to be classified into a convolution layer of a convolutional neural network, wherein the convolutional neural network includes at least one convolution layer and one pooling layer;
    第一运算单元,用于根据预先设置的第一卷积核对第一矩阵做卷积运算,得到第一向量,其中,所述第一向量为一维向量,所述第一矩阵是所述卷积神经网络的最后一个池化层的输出;a first operation unit, configured to perform a convolution operation on the first matrix according to the preset first convolution kernel to obtain a first vector, where the first vector is a one-dimensional vector, and the first matrix is the volume The output of the last pooled layer of the neural network;
    第二运算单元,用于根据预先设置的第二卷积核对所述第一向量做卷积运算,得到第二向量,其中,所述第二向量为一维向量;a second operation unit, configured to perform a convolution operation on the first vector according to a preset second convolution kernel to obtain a second vector, where the second vector is a one-dimensional vector;
    分类单元,用于根据所述第二向量对所述目标图片进行分类。a classifying unit, configured to classify the target picture according to the second vector.
  7. 根据权利要求6所述的装置,其特征在于,所述第一运算单元包括:The apparatus according to claim 6, wherein the first arithmetic unit comprises:
    第一运算子单元,用于根据所述第一卷积核对所述第一矩阵做卷积运算,得到第二矩阵;a first operation subunit, configured to perform a convolution operation on the first matrix according to the first convolution kernel to obtain a second matrix;
    排列子单元,用于将所述第二矩阵的所有元素按照预设顺序重新排列,得到所述第一向量。An arranging subunit, configured to rearrange all elements of the second matrix in a preset order to obtain the first vector.
  8. 根据权利要求6所述的装置,其特征在于,所述装置还包括: The device according to claim 6, wherein the device further comprises:
    获取单元,用于在所述第一运算单元根据预先设置的第一卷积核对第一矩阵做卷积运算之前,获取训练样本,其中,所述训练样本包括预先划分好类别的多张图片,所述类别用于表征所述训练样本所指示的事物的种类;An acquiring unit, configured to acquire training samples before the first operation unit performs the convolution operation on the first matrix according to the preset first convolution kernel, wherein the training samples include a plurality of pictures divided into categories in advance, and the categories characterize the type of thing indicated by the training samples;
    训练单元,用于根据所述训练样本对所述卷积神经网络进行训练,得到所述第一卷积核。And a training unit, configured to train the convolutional neural network according to the training sample to obtain the first convolution kernel.
  9. 根据权利要求8所述的装置,其特征在于,所述训练单元包括:The apparatus of claim 8 wherein said training unit comprises:
    输入子单元,用于将所述训练样本输入到所述卷积神经网络的卷积层中;An input subunit, configured to input the training sample into a convolution layer of the convolutional neural network;
    第二运算子单元,用于根据初始状态的卷积核对所述第一矩阵做卷积运算,得到第一向量;a second operation subunit, configured to perform a convolution operation on the first matrix according to a convolution kernel of an initial state to obtain a first vector;
    第三运算子单元,用于根据所述第二卷积核对所述第一向量做卷积运算,得到第二向量;a third operation subunit, configured to perform a convolution operation on the first vector according to the second convolution kernel to obtain a second vector;
    第一确定子单元,用于根据所述第二向量中目标元素的值确定所述训练样本的分类结果,其中,所述第二向量中目标元素的值指示所述第二向量的类别与所述目标元素对应的类别相同的概率大小,其中,所述目标元素是所述第二向量中的任意一个元素;a first determining subunit, configured to determine a classification result of the training sample according to a value of a target element in the second vector, wherein the value of the target element in the second vector indicates the probability that the category of the second vector is the same as the category corresponding to the target element, and the target element is any one element of the second vector;
    比较子单元,用于将所述分类结果与每张所述图片的类别进行比较,得到分类误差值;a comparison subunit, configured to compare the classification result with a category of each of the pictures to obtain a classification error value;
    判断子单元,用于判断所述分类误差值是否大于预设误差值;a determining subunit, configured to determine whether the classification error value is greater than a preset error value;
    调整子单元,用于如果所述分类误差值大于所述预设误差值,对所述初始状态的卷积核的权重值进行调整,直至分类误差值小于等于所述预设误差值;an adjusting subunit, configured to adjust, if the classification error value is greater than the preset error value, the weight values of the initial-state convolution kernel until the classification error value is less than or equal to the preset error value;
    第二确定子单元,用于如果所述分类误差值小于等于所述预设误差值,则训练结束,并将当前的卷积核作为所述第一卷积核。And a second determining subunit, configured to end the training if the classification error value is less than or equal to the preset error value, and use the current convolution kernel as the first convolution kernel.
  10. 根据权利要求6至9任一项所述的装置,其特征在于,所述第一卷积核满足如下公式:The apparatus according to any one of claims 6 to 9, wherein said first convolution kernel satisfies the following formula:
    n = n_oc * (m - n_conv) / stride
    其中,m为卷积神经网络中输入向量维度,n为输出向量维度,stride为所述第一卷积核的步长,noc为输出通道个数,nconv为所述第一卷积核大小。Where m is the input vector dimension in the convolutional neural network, n is the output vector dimension, stride is the stride of the first convolution kernel, n_oc is the number of output channels, and n_conv is the size of the first convolution kernel.
  11. 一种机器人,其特征在于,包括权利要求6至10任一项所述的图片分类装置。 A robot comprising the picture classification device according to any one of claims 6 to 10.
PCT/CN2017/092044 2016-12-29 2017-07-06 Picture classification method, device and robot WO2018120740A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611266567.9 2016-12-29
CN201611266567.9A CN108256544B (en) 2016-12-29 2016-12-29 Picture classification method and device, robot

Publications (1)

Publication Number Publication Date
WO2018120740A1 true WO2018120740A1 (en) 2018-07-05

Family

ID=62707761

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/092044 WO2018120740A1 (en) 2016-12-29 2017-07-06 Picture classification method, device and robot

Country Status (2)

Country Link
CN (1) CN108256544B (en)
WO (1) WO2018120740A1 (en)

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614856A (en) * 2018-10-31 2019-04-12 西安理工大学 Fungi image classification method based on convolutional neural networks
CN109671020A (en) * 2018-12-17 2019-04-23 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer storage medium
CN109800817A (en) * 2019-01-25 2019-05-24 西安电子科技大学 Image classification method based on fusion Semantic Neural Network
CN109828251A (en) * 2019-03-07 2019-05-31 中国人民解放军海军航空大学 Radar target identification method based on feature pyramid light weight convolutional neural networks
CN109858261A (en) * 2019-01-18 2019-06-07 芜湖智久机器人有限公司 A kind of data storage medium, encryption method
CN110222718A (en) * 2019-05-09 2019-09-10 华为技术有限公司 The method and device of image procossing
CN110263965A (en) * 2019-05-06 2019-09-20 平安科技(深圳)有限公司 Method for early warning, device, computer equipment and storage medium based on video
CN110298394A (en) * 2019-06-18 2019-10-01 中国平安财产保险股份有限公司 A kind of image-recognizing method and relevant apparatus
CN110298346A (en) * 2019-05-23 2019-10-01 平安科技(深圳)有限公司 Image-recognizing method, device and computer equipment based on divisible convolutional network
CN110874627A (en) * 2018-09-04 2020-03-10 华为技术有限公司 Data processing method, data processing apparatus, and computer readable medium
CN110874556A (en) * 2018-09-04 2020-03-10 上海集光安防科技股份有限公司 License plate detecting system in natural scene based on deep learning
CN111046933A (en) * 2019-12-03 2020-04-21 东软集团股份有限公司 Image classification method and device, storage medium and electronic equipment
CN111079639A (en) * 2019-12-13 2020-04-28 中国平安财产保险股份有限公司 Method, device and equipment for constructing garbage image classification model and storage medium
CN111160517A (en) * 2018-11-07 2020-05-15 杭州海康威视数字技术股份有限公司 Convolutional layer quantization method and device of deep neural network
CN111192334A (en) * 2020-01-02 2020-05-22 苏州大学 Trainable compressed sensing module and image segmentation method
CN111191583A (en) * 2019-12-30 2020-05-22 郑州科技学院 Space target identification system and method based on convolutional neural network

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109254946B (en) * 2018-08-31 2021-09-17 郑州云海信息技术有限公司 Image feature extraction method, device and equipment and readable storage medium
CN113302657B (en) * 2018-11-16 2024-04-26 华为技术有限公司 Neural network compression method and device
CN110378372A (en) * 2019-06-11 2019-10-25 中国科学院自动化研究所南京人工智能芯片创新研究院 Diagram data recognition methods, device, computer equipment and storage medium
CN110309837B (en) * 2019-07-05 2021-07-06 北京迈格威科技有限公司 Data processing method and image processing method based on convolutional neural network characteristic diagram
CN112580772B (en) * 2019-09-30 2024-04-26 华为技术有限公司 Compression method and device for convolutional neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679206A (en) * 2013-12-24 2014-03-26 Tcl集团股份有限公司 Image classification method and device
US20140180989A1 (en) * 2012-12-24 2014-06-26 Google Inc. System and method for parallelizing convolutional neural networks
CN104268521A (en) * 2014-09-23 2015-01-07 朱毅 Image recognition method based on convolutional neural network in non-finite category
CN105868785A (en) * 2016-03-30 2016-08-17 乐视控股(北京)有限公司 Image identification method based on convolutional neural network and image identification system thereof
CN106250911A (en) * 2016-07-20 2016-12-21 南京邮电大学 Picture classification method based on convolutional neural networks

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112106034A (en) * 2018-07-13 2020-12-18 华为技术有限公司 Convolution method and device for neural network
CN112106034B (en) * 2018-07-13 2024-05-24 华为技术有限公司 Convolution method and device for neural network
CN110874627A (en) * 2018-09-04 2020-03-10 华为技术有限公司 Data processing method, data processing apparatus, and computer readable medium
CN110874556B (en) * 2018-09-04 2024-02-09 上海集光安防科技股份有限公司 License plate detection system in natural scene based on deep learning
CN110874556A (en) * 2018-09-04 2020-03-10 上海集光安防科技股份有限公司 License plate detecting system in natural scene based on deep learning
CN109614856A (en) * 2018-10-31 2019-04-12 西安理工大学 Fungi image classification method based on convolutional neural networks
CN111160517B (en) * 2018-11-07 2024-02-06 杭州海康威视数字技术股份有限公司 Convolutional layer quantization method and device for deep neural network
CN111160517A (en) * 2018-11-07 2020-05-15 杭州海康威视数字技术股份有限公司 Convolutional layer quantization method and device of deep neural network
CN109671020A (en) * 2018-12-17 2019-04-23 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer storage medium
CN109671020B (en) * 2018-12-17 2023-10-24 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer storage medium
CN109858261A (en) * 2019-01-18 2019-06-07 芜湖智久机器人有限公司 Data storage medium and encryption method
CN109800817B (en) * 2019-01-25 2023-03-24 西安电子科技大学 Image classification method based on fusion semantic neural network
CN109800817A (en) * 2019-01-25 2019-05-24 西安电子科技大学 Image classification method based on fusion Semantic Neural Network
CN109828251A (en) * 2019-03-07 2019-05-31 中国人民解放军海军航空大学 Radar target identification method based on feature pyramid lightweight convolutional neural network
CN110263965A (en) * 2019-05-06 2019-09-20 平安科技(深圳)有限公司 Method for early warning, device, computer equipment and storage medium based on video
CN110222718A (en) * 2019-05-09 2019-09-10 华为技术有限公司 The method and device of image procossing
CN110222718B (en) * 2019-05-09 2023-11-03 华为技术有限公司 Image processing method and device
CN110298346A (en) * 2019-05-23 2019-10-01 平安科技(深圳)有限公司 Image-recognizing method, device and computer equipment based on divisible convolutional network
CN110298394A (en) * 2019-06-18 2019-10-01 中国平安财产保险股份有限公司 Image recognition method and related apparatus
CN110298394B (en) * 2019-06-18 2024-04-05 中国平安财产保险股份有限公司 Image recognition method and related device
CN111783813B (en) * 2019-11-20 2024-04-09 北京沃东天骏信息技术有限公司 Image evaluation method, training image model method, device, equipment and medium
CN111783813A (en) * 2019-11-20 2020-10-16 北京沃东天骏信息技术有限公司 Image evaluation method, image model training device, image model training equipment and medium
CN111046933B (en) * 2019-12-03 2024-03-05 东软集团股份有限公司 Image classification method, device, storage medium and electronic equipment
CN111046933A (en) * 2019-12-03 2020-04-21 东软集团股份有限公司 Image classification method and device, storage medium and electronic equipment
CN111079639A (en) * 2019-12-13 2020-04-28 中国平安财产保险股份有限公司 Method, device and equipment for constructing garbage image classification model and storage medium
CN111079639B (en) * 2019-12-13 2023-09-19 中国平安财产保险股份有限公司 Method, device, equipment and storage medium for constructing garbage image classification model
CN111191583A (en) * 2019-12-30 2020-05-22 郑州科技学院 Space target identification system and method based on convolutional neural network
CN111191583B (en) * 2019-12-30 2023-08-25 郑州科技学院 Space target recognition system and method based on convolutional neural network
CN111192334A (en) * 2020-01-02 2020-05-22 苏州大学 Trainable compressed sensing module and image segmentation method
CN111242228A (en) * 2020-01-16 2020-06-05 武汉轻工大学 Hyperspectral image classification method, device, equipment and storage medium
CN111242228B (en) * 2020-01-16 2024-02-27 武汉轻工大学 Hyperspectral image classification method, hyperspectral image classification device, hyperspectral image classification equipment and storage medium
CN111339871A (en) * 2020-02-18 2020-06-26 中国电子科技集团公司第二十八研究所 Target group distribution pattern studying and judging method and device based on convolutional neural network
CN111339871B (en) * 2020-02-18 2022-09-16 中国电子科技集团公司第二十八研究所 Target group distribution pattern studying and judging method and device based on convolutional neural network
CN111382791A (en) * 2020-03-07 2020-07-07 北京迈格威科技有限公司 Deep learning task processing method, image recognition task processing method and device
CN111382791B (en) * 2020-03-07 2023-12-26 北京迈格威科技有限公司 Deep learning task processing method, image recognition task processing method and device
CN111402217B (en) * 2020-03-10 2023-10-31 广州视源电子科技股份有限公司 Image grading method, device, equipment and storage medium
CN111402217A (en) * 2020-03-10 2020-07-10 广州视源电子科技股份有限公司 Image grading method, device, equipment and storage medium
CN111523561A (en) * 2020-03-19 2020-08-11 深圳市彬讯科技有限公司 Image style recognition method and device, computer equipment and storage medium
CN111428033B (en) * 2020-03-20 2023-04-07 北京邮电大学 Automatic threat information extraction method based on double-layer convolutional neural network
CN111428033A (en) * 2020-03-20 2020-07-17 北京邮电大学 Automatic threat information extraction method based on double-layer convolutional neural network
CN111681292A (en) * 2020-05-18 2020-09-18 陕西科技大学 Task fMRI brain decoding and visualization method based on convolutional neural network
CN111681292B (en) * 2020-05-18 2023-04-07 陕西科技大学 Task fMRI brain decoding and visualization method based on convolutional neural network
CN111881729A (en) * 2020-06-16 2020-11-03 深圳数联天下智能科技有限公司 Living body flow direction discrimination method, device, equipment and storage medium based on thermal imaging
CN111881729B (en) * 2020-06-16 2024-02-06 深圳数联天下智能科技有限公司 Living body flow direction screening method, device, equipment and storage medium based on thermal imaging
CN111967315A (en) * 2020-07-10 2020-11-20 华南理工大学 Human body comprehensive information acquisition method based on face recognition and infrared detection
CN111967315B (en) * 2020-07-10 2023-08-22 华南理工大学 Human body comprehensive information acquisition method based on face recognition and infrared detection
CN112102239A (en) * 2020-08-10 2020-12-18 北京工业大学 Image processing method and system for full-layer brain CT image
CN112102239B (en) * 2020-08-10 2024-05-21 北京工业大学 Image processing method and system for full-layer brain CT image
CN112052758A (en) * 2020-08-25 2020-12-08 西安电子科技大学 Hyperspectral image classification method based on attention mechanism and recurrent neural network
CN112052758B (en) * 2020-08-25 2023-05-23 西安电子科技大学 Hyperspectral image classification method based on attention mechanism and recurrent neural network
CN112529973A (en) * 2020-10-13 2021-03-19 重庆英卡电子有限公司 Animal identification algorithm for snap-shot picture of field self-powered animal
CN112529973B (en) * 2020-10-13 2023-06-02 重庆英卡电子有限公司 Method for identifying field self-powered animal snap-shot pictures
CN113239899B (en) * 2021-06-17 2024-05-28 阿波罗智联(北京)科技有限公司 Method for processing image and generating convolution kernel, road side equipment and cloud control platform
CN113657421B (en) * 2021-06-17 2024-05-28 中国科学院自动化研究所 Convolutional neural network compression method and device, and image classification method and device
CN113239899A (en) * 2021-06-17 2021-08-10 阿波罗智联(北京)科技有限公司 Method for processing image and generating convolution kernel, road side equipment and cloud control platform
CN113657421A (en) * 2021-06-17 2021-11-16 中国科学院自动化研究所 Convolutional neural network compression method and device and image classification method and device
CN113807363A (en) * 2021-09-08 2021-12-17 西安电子科技大学 Image classification method based on lightweight residual network
CN113807363B (en) * 2021-09-08 2024-04-19 西安电子科技大学 Image classification method based on lightweight residual network
CN114297940A (en) * 2021-12-31 2022-04-08 合肥工业大学 Method and device for determining unsteady reservoir parameters
CN114297940B (en) * 2021-12-31 2024-05-07 合肥工业大学 Method and device for determining unsteady state reservoir parameters
CN116050474A (en) * 2022-12-29 2023-05-02 上海天数智芯半导体有限公司 Convolution calculation method, SOC chip, electronic equipment and storage medium
CN116718894B (en) * 2023-06-19 2024-03-29 上饶市广强电子科技有限公司 Circuit stability test method and system for corn lamp
CN116718894A (en) * 2023-06-19 2023-09-08 上饶市广强电子科技有限公司 Circuit stability test method and system for corn lamp

Also Published As

Publication number Publication date
CN108256544A (en) 2018-07-06
CN108256544B (en) 2019-07-23

Similar Documents

Publication Publication Date Title
WO2018120740A1 (en) Picture classification method, device and robot
US11516473B2 (en) Bandwidth compression for neural network systems
WO2020177651A1 (en) Image segmentation method and image processing device
CN110163197B (en) Target detection method, target detection device, computer-readable storage medium and computer equipment
Sameen et al. Classification of very high resolution aerial photos using spectral‐spatial convolutional neural networks
CN113469073B (en) SAR image ship detection method and system based on lightweight deep learning
EP3333768A1 (en) Method and apparatus for detecting target
CN110826596A (en) Semantic segmentation method based on multi-scale deformable convolution
US9710697B2 (en) Method and system for exacting face features from data of face images
CN112308152B (en) Hyperspectral image ground object classification method based on spectrum segmentation and homogeneous region detection
CN109615614B (en) Method for extracting blood vessels in fundus image based on multi-feature fusion and electronic equipment
CN113052216B (en) Oil spill hyperspectral image detection method based on two-way graph U-NET convolutional network
EP4283876A1 (en) Data coding method and related device
CN110287938B (en) Event identification method, system, device and medium based on key fragment detection
CN115565045A (en) Hyperspectral and multispectral image fusion method based on multi-scale space-spectral transformation
CN112801063A (en) Neural network system and image crowd counting method based on neural network system
WO2022228142A1 (en) Object density determination method and apparatus, computer device and storage medium
CN111242228A (en) Hyperspectral image classification method, device, equipment and storage medium
CN108985346B (en) Existing exploration image retrieval method fusing low-level image features and CNN features
CN114612709A (en) Multi-scale target detection method guided by image pyramid characteristics
Jiang et al. Semantic segmentation network combined with edge detection for building extraction in remote sensing images
Demir et al. Phase correlation based redundancy removal in feature weighting band selection for hyperspectral images
CN109583584B (en) Method and system for enabling CNN with full connection layer to accept indefinite shape input
CN114782821B (en) Coastal wetland vegetation remote sensing identification method combined with multiple migration learning strategies
WO2023061465A1 (en) Methods, systems, and media for computer vision using 2d convolution of 4d video data tensors

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17886242

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 24.10.2019)

122 Ep: PCT application non-entry in European phase

Ref document number: 17886242

Country of ref document: EP

Kind code of ref document: A1