WO2021102762A1 - Perceptual network and image processing method - Google Patents

Perceptual network and image processing method

Info

Publication number
WO2021102762A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
convolutional
fully connected
subnet
image
Prior art date
Application number
PCT/CN2019/121373
Other languages
English (en)
French (fr)
Inventor
谭文伟
邓鹏
许占
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2019/121373 priority Critical patent/WO2021102762A1/zh
Priority to CN201980101029.XA priority patent/CN114467121A/zh
Publication of WO2021102762A1 publication Critical patent/WO2021102762A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • the embodiments of the present application relate to the field of image processing, and in particular, to a perceptual network and an image processing method.
  • CNN: convolutional neural network
  • FC layer: fully connected layer
  • Figure 1 is a schematic diagram of a multi-task neural network architecture.
  • However, the optimization targets of different task types differ greatly; sharing one fully connected layer to train the data of different task types cannot guarantee an optimal output for every task, so the accuracy of the output results is low.
  • the present application provides a perceptual network and an image processing method, which solve the problem of low accuracy in multi-task output results when a single fully connected layer is shared.
  • this application provides a perceptual network that includes a backbone network, a convolutional network, and a fully connected network.
  • the convolutional network includes N convolutional subnets
  • the fully connected network includes N fully connected subnets.
  • the backbone network is connected to N convolutional subnets
  • N convolutional subnets are connected to N fully connected subnets
  • the i-th convolutional subnet is connected to the i-th fully connected subnet.
  • N is an integer greater than or equal to 2, i is an integer, and i ∈ [1, N].
  • the backbone network is used to perform convolution processing on the first image to obtain M initialization images and to output the M initialization images to each of the N convolutional subnets; the features of each of the M initialization images are different, and M is an integer greater than or equal to 1.
  • the i-th convolutional subnet is used to perform convolution pooling processing on the M initialization images to obtain an image of the i-th type of feature and to output that image to the i-th fully connected subnet; the i-th fully connected subnet is used to classify the image of the i-th type of feature to obtain the classification result of the i-th type of feature.
  • the perceptual network provided by this application includes multiple convolutional subnets and fully connected subnets connected to those convolutional subnets. Since each convolutional subnet and its connected fully connected subnet jointly process images of one feature type, the data of different task types can be trained separately, which guarantees an optimal output for each task and effectively improves the accuracy of the output results.
  • the backbone network includes at least one convolutional layer, and the backbone network is used to perform convolution processing on the first image according to the first through Kth convolutional layers to obtain the M initialization images.
  • K is an integer greater than or equal to 1.
  • the value of K is 3 or 5.
  • the i-th convolutional subnet includes Ki convolutional layers and Li pooling layers, where Ki and Li are integers greater than or equal to 1; the i-th convolutional subnet is used to perform Ki convolution operations and Li pooling operations on the M initialization images to obtain an image of the i-th type of feature. Since different convolutional subnets contain different numbers of convolutional layers and pooling layers, different convolutional subnets perform different numbers of convolution pooling operations on the same image to obtain images with different types of features, thereby maintaining high-resolution convolution feature maps and effectively improving the accuracy of image processing.
  • the i-th fully connected subnet includes Ri fully connected layers, where Ri is an integer greater than or equal to 2. Since different fully connected subnets contain different numbers of fully connected layers, different feature images are classified through different fully connected subnets to obtain the classification results of the corresponding features, which ensures the classification of higher-resolution convolution feature maps and effectively improves classification accuracy.
  • for example, when N=3, the first convolutional subnet includes the first convolutional layer, the second convolutional layer, and the first pooling layer
  • the second convolutional subnet includes the third convolutional layer and the second pooling layer.
  • the third convolutional subnet includes the fourth convolutional layer, the fifth convolutional layer and the third pooling layer
  • the first fully connected subnet includes the first fully connected layer, the second fully connected layer and the third fully connected layer
  • the second fully connected subnet includes the fourth fully connected layer and the fifth fully connected layer
  • the third fully connected subnet includes the sixth fully connected layer, the seventh fully connected layer and the eighth fully connected layer.
  • the backbone network is connected to the first convolutional layer, the first convolutional layer to the second convolutional layer, the second convolutional layer to the first pooling layer, the first pooling layer to the first fully connected layer, the first fully connected layer to the second fully connected layer, and the second fully connected layer to the third fully connected layer; the backbone network is connected to the third convolutional layer, the third convolutional layer to the second pooling layer, the second pooling layer to the fourth fully connected layer, and the fourth fully connected layer to the fifth fully connected layer; the backbone network is connected to the fourth convolutional layer, the fourth convolutional layer to the third pooling layer, the third pooling layer to the fifth convolutional layer, the fifth convolutional layer to the sixth fully connected layer, and the sixth fully connected layer to both the seventh fully connected layer and the eighth fully connected layer.
  • the value T of the convolution kernels of the first convolutional subnet is 128, 256 or 512; the value of the convolution kernels of the second convolutional subnet is 0.5*T; the value of the convolution kernels of the third convolutional subnet is 2*T; and the dimension of each of the first through eighth fully connected layers is 1024, 2048 or 4096.
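  • To make the N=3 example above concrete, the following is a minimal PyTorch sketch of this topology. PyTorch is an assumption (the patent names no framework), T is interpreted here as the number of kernels (channel width), the fully connected dimension 1024 is one of the listed options, the ReLU placements and 3x3 kernel sizes are assumptions, and the backbone is reduced to a single-convolution stub since the text only requires "at least one convolutional layer":

```python
import torch
import torch.nn as nn

class PerceptualNetwork(nn.Module):
    # Sketch of the N=3 example: one backbone feeding three parallel
    # convolutional subnets, each followed by its own fully connected subnet.
    def __init__(self, in_ch=3, T=128, fc_dim=1024):
        super().__init__()
        # Backbone: "at least one convolutional layer" producing the M
        # initialization images (realized here as T feature maps).
        self.backbone = nn.Sequential(nn.Conv2d(in_ch, T, 3, padding=1), nn.ReLU())
        # First subnet: first conv -> second conv -> first pooling, T kernels.
        self.conv_sub1 = nn.Sequential(
            nn.Conv2d(T, T, 3, padding=1), nn.ReLU(),
            nn.Conv2d(T, T, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2))
        # Second subnet: third conv -> second pooling, 0.5*T kernels.
        self.conv_sub2 = nn.Sequential(
            nn.Conv2d(T, T // 2, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2))
        # Third subnet: fourth conv -> third pooling -> fifth conv, 2*T kernels.
        self.conv_sub3 = nn.Sequential(
            nn.Conv2d(T, 2 * T, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(2 * T, 2 * T, 3, padding=1), nn.ReLU())
        # FC subnet 1: first, second and third fully connected layers.
        self.fc_sub1 = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(fc_dim), nn.ReLU(),
            nn.Linear(fc_dim, fc_dim), nn.ReLU(),
            nn.Linear(fc_dim, fc_dim))
        # FC subnet 2: fourth and fifth fully connected layers.
        self.fc_sub2 = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(fc_dim), nn.ReLU(),
            nn.Linear(fc_dim, fc_dim))
        # FC subnet 3: sixth FC layer feeding the parallel seventh and eighth.
        self.fc6 = nn.Sequential(nn.Flatten(), nn.LazyLinear(fc_dim), nn.ReLU())
        self.fc7 = nn.Linear(fc_dim, fc_dim)
        self.fc8 = nn.Linear(fc_dim, fc_dim)

    def forward(self, x):
        m = self.backbone(x)                    # the M initialization images
        out1 = self.fc_sub1(self.conv_sub1(m))  # e.g. detail-feature result
        out2 = self.fc_sub2(self.conv_sub2(m))  # e.g. contour-feature result
        h = self.fc6(self.conv_sub3(m))
        return out1, out2, (self.fc7(h), self.fc8(h))  # two parallel heads
```

  • A forward pass such as net(torch.randn(1, 3, 64, 64)) returns one output per task branch; the two outputs of the third branch correspond to the parallel seventh and eighth fully connected layers.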
  • the present application provides an image processing method, which can be applied to terminal devices, or the method can be applied to a communication device that can support the terminal device to implement the method, for example, the communication device includes a chip system.
  • the terminal device is provided with a perception network, the perception network includes a backbone network, a convolutional network and a fully connected network.
  • the convolutional network includes N convolutional subnets.
  • the fully connected network includes N fully connected subnets.
  • the backbone network is connected to the N convolutional subnets, the N convolutional subnets are connected to the N fully connected subnets, and the i-th convolutional subnet is connected to the i-th fully connected subnet, where N is an integer greater than or equal to 2, i is an integer, and i ∈ [1, N].
  • the method includes: first, performing convolution processing on the first image through the backbone network to obtain M initialization images, and outputting the M initialization images to each of the N convolutional subnets; the features of each of the M initialization images are different, and M is an integer greater than or equal to 1.
  • how each convolutional subnet processes the M initialization images can be understood from the following description of the i-th convolutional subnet: the i-th convolutional subnet performs convolution pooling processing on the M initialization images to obtain an image of the i-th type of feature and outputs that image to the i-th fully connected subnet; the i-th fully connected subnet classifies the image of the i-th type of feature to obtain the classification result of the i-th type of feature.
  • the perceptual network provided by this application includes multiple convolutional subnets and fully connected subnets connected to those convolutional subnets. Since each convolutional subnet and its connected fully connected subnet jointly process images of one feature type, the data of different task types can be trained separately, which guarantees an optimal output for each task and effectively improves the accuracy of the output results.
  • the backbone network includes at least one convolutional layer, and performing convolution processing on the first image by the backbone network to obtain M initialization images includes: performing convolution processing on the first image according to the first through Kth convolutional layers to obtain the M initialization images
  • K is an integer greater than or equal to 1.
  • the value of K is 3 or 5.
  • the i-th convolutional subnet includes Ki convolutional layers and Li pooling layers, and both Ki and Li are integers greater than or equal to 1;
  • performing convolution pooling processing on the M initialization images by the i-th convolutional subnet to obtain the image of the i-th type of feature includes: the i-th convolutional subnet performs Ki convolution operations and Li pooling operations on the M initialization images to obtain the image of the i-th type of feature;
  • the i-th fully connected subnet includes Ri fully connected layers, where Ri is an integer greater than or equal to 2.
  • Convolution processing is performed on the initialization image through multiple convolution layers, so that a higher resolution convolution feature map is maintained, which is beneficial to improve the accuracy of processing the image.
  • the first convolutional subnet includes the first convolutional layer, the second convolutional layer, and the first pooling layer
  • the second convolutional subnet includes the third convolutional layer and the second pooling layer.
  • the third convolutional subnet includes the fourth convolutional layer, the fifth convolutional layer and the third pooling layer
  • the first fully connected subnet includes the first fully connected layer, the second fully connected layer and the third fully connected layer
  • the second fully connected subnet includes the fourth fully connected layer and the fifth fully connected layer
  • the third fully connected subnet includes the sixth fully connected layer, the seventh fully connected layer and the eighth fully connected layer.
  • the backbone network is connected to the first convolutional layer, the first convolutional layer to the second convolutional layer, the second convolutional layer to the first pooling layer, the first pooling layer to the first fully connected layer, the first fully connected layer to the second fully connected layer, and the second fully connected layer to the third fully connected layer; the backbone network is connected to the third convolutional layer, the third convolutional layer to the second pooling layer, the second pooling layer to the fourth fully connected layer, and the fourth fully connected layer to the fifth fully connected layer; the backbone network is connected to the fourth convolutional layer, the fourth convolutional layer to the third pooling layer, the third pooling layer to the fifth convolutional layer, the fifth convolutional layer to the sixth fully connected layer, and the sixth fully connected layer to both the seventh fully connected layer and the eighth fully connected layer.
  • the value T of the convolution kernels of the first convolutional subnet is 128, 256 or 512; the value of the convolution kernels of the second convolutional subnet is 0.5*T; the value of the convolution kernels of the third convolutional subnet is 2*T; the dimensions of the first and second fully connected layers are each 1024, 2048 or 4096; the dimensions of the fourth and fifth fully connected layers are each 1024, 2048 or 4096; and the dimensions of the sixth and seventh fully connected layers are each 1024, 2048 or 4096.
  • the embodiments of the present application also provide a communication device, and the beneficial effects can be referred to the description of the second aspect and will not be repeated here.
  • the communication device has the function of realizing the behavior in the method example of the second aspect described above.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the communication device includes: a transceiver unit and a processing unit. The processing unit is used to obtain a first image.
  • the processing unit is configured to perform convolution processing on the first image through the backbone network to obtain M initialization images and to output the M initialization images to each of the N convolutional subnets; the features of each of the M initialization images are different, and M is an integer greater than or equal to 1.
  • the processing unit is also used to perform convolution pooling processing on the M initialization images through the i-th convolutional subnet to obtain the image of the i-th type of feature, and to output the image of the i-th type of feature to the i-th fully connected subnet; the processing unit is also used to classify the image of the i-th type of feature through the i-th fully connected subnet to obtain the classification result of the i-th type of feature.
  • a communication device may be the terminal device in the foregoing method embodiment, or a chip set in the terminal device.
  • the communication device includes a communication interface, a processor, and optionally, a memory.
  • the memory is used to store a computer program or instruction, and the processor is coupled with the memory and a communication interface.
  • the processor executes the computer program or instruction
  • the communication device executes the method executed by the terminal device in the foregoing method embodiment.
  • a computer program product includes computer program code which, when run, causes the methods executed by the terminal device in the above aspects to be performed.
  • the present application provides a chip system, the chip system includes a processor, and is configured to implement the functions of the terminal device in the methods of the foregoing aspects.
  • the chip system further includes a memory for storing program instructions and/or data.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • the present application provides a computer-readable storage medium that stores a computer program, and when the computer program is executed, the method executed by the terminal device in each of the above aspects is implemented.
  • the names of the terminal device and the communication device do not constitute a limitation on the device itself. In actual implementation, these devices may appear under other names. As long as the function of each device is similar to that of this application, it falls within the scope of the claims of this application and its equivalent technologies.
  • FIG. 1 is a schematic diagram of the architecture of a multi-task neural network provided by an embodiment
  • FIG. 2 is a schematic diagram of the architecture of a perceptual network provided by an embodiment;
  • FIG. 3 is a schematic diagram of the architecture of a perceptual network provided by an embodiment;
  • Figure 4 is a schematic diagram of a pitch angle, a yaw angle, and a roll angle provided by an embodiment
  • FIG. 5 is a schematic diagram of a result of image processing provided by an embodiment
  • FIG. 6 is a flowchart of an image processing method provided by an embodiment
  • FIG. 7 is a schematic diagram of the composition of an image processing device provided by an embodiment
  • FIG. 8 is a schematic diagram of the composition of an image processing device provided by an embodiment.
  • words such as “exemplary” or “for example” are used to represent examples, illustrations, or explanations. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be construed as more preferable or advantageous than other embodiments or designs. Rather, words such as “exemplary” or “for example” are intended to present related concepts in a concrete manner.
  • Convolutional neural networks include one-dimensional convolutional neural networks, two-dimensional convolutional neural networks, and three-dimensional convolutional neural networks.
  • One-dimensional convolutional neural networks are often used in sequence data processing.
  • Two-dimensional convolutional neural networks are often used in image text recognition.
  • Three-dimensional convolutional neural networks are mainly used in medical image and video data recognition.
  • Convolutional neural network includes data input layer (Input layer), convolution layer, excitation layer (ReLU layer), pooling layer and fully connected layer.
  • the data input layer mainly preprocesses the original image data (such as pixel value). Preprocessing can include de-averaging, normalization, and principal component analysis (PCA)/whitening.
  • De-averaging is to center each dimension of the input data to 0, and its purpose is to pull the center of the sample back to the origin of the coordinate system.
  • Normalization is to normalize the amplitude to the same range, that is, to reduce the interference caused by the difference in the value range of the data of each dimension. For example, there are two dimensions of features A and B, A range is 0 to 10, and B range is 0 to 10000, normalization is to change the data of A and B to the range of 0 to 1.
  • principal component analysis uses the idea of dimensionality reduction to convert multiple indicators into a few comprehensive indicators. Whitening normalizes the amplitude on each characteristic axis of the data.
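  • As a non-authoritative sketch of these preprocessing steps (NumPy assumed; the patent prescribes no implementation), de-averaging, normalization, and PCA/whitening can look like this:

```python
import numpy as np

def normalize_and_demean(features):
    # features: (num_samples, num_dims); e.g. dimension A in [0, 10] and
    # dimension B in [0, 10000] both end up in the same [0, 1] range.
    x = features.astype(np.float32)
    x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0) + 1e-8)
    # De-averaging: center every dimension at 0, pulling the sample mean
    # back to the origin of the coordinate system.
    return x - x.mean(axis=0)

def pca_whiten(x, eps=1e-5):
    # PCA/whitening: decorrelate the features and normalize the amplitude
    # on each principal axis.
    x = x - x.mean(axis=0)
    cov = x.T @ x / len(x)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return (x @ eigvecs) / np.sqrt(eigvals + eps)
```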
  • the convolutional layer is the most important layer of the convolutional neural network, and it is also the source of the name "convolutional neural network".
  • the purpose of the convolution operation is to extract different features of the original image.
  • the convolutional neural network may include multiple convolutional layers.
  • the first convolutional layer may only extract some low-level features such as edges, lines, and corners. More layers of convolutional layers can iteratively extract more complex features from low-level features.
  • the convolutional layer may include at least one convolution kernel.
  • the features of the original image are obtained through the convolution kernel.
  • the convolution kernel can be used to slide on the original image, and the parameters in the convolution kernel and the pixel values of the original image can be convolved to obtain the characteristics of the original image.
  • the parameters of the convolution kernel can be obtained by optimization with the backpropagation algorithm.
  • the larger the convolution kernel, the less detail is obtained and the smaller the image output by the convolutional layer; conversely, the smaller the convolution kernel, the more detail is obtained and the larger the output image.
  • the sliding amplitude (stride) of the convolution kernel can be set as required, for example, 1 or 2; the larger the sliding amplitude, the fewer image features the output of the convolutional layer contains.
  • the number of convolution kernels can also be called depth.
  • the number of convolution kernels determines the number of images output by the convolution layer.
  • the parameters of the convolution kernel are different, and the image output by the convolution layer contains different image features.
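  • A small sketch of these kernel properties (PyTorch assumed): the kernel count sets how many feature maps the layer outputs, and a larger sliding amplitude (stride) yields an output with fewer features:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one 3-channel 32x32 "original image"
# 8 kernels ("depth" 8) -> the layer outputs 8 feature maps.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
print(conv(x).shape)     # torch.Size([1, 8, 32, 32])
# A sliding amplitude of 2 halves the spatial size of the output.
conv_s2 = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
print(conv_s2(x).shape)  # torch.Size([1, 8, 16, 16])
```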
  • the excitation layer applies a nonlinear mapping to the output of the convolutional layer.
  • the excitation function used by a convolutional neural network is generally the rectified linear unit (ReLU). ReLU converges quickly and has a simple gradient calculation, but it is relatively fragile.
  • the pooling layer is located in the middle of successive convolutional layers.
  • the pooling layer is used to compress the amount of data and parameters to reduce overfitting. In short, if the input is an image, then the main function of the pooling layer is to compress the image.
  • the pooling layer has the following advantages:
  • (1) Feature invariance. The scale invariance of features is often mentioned in image processing, and the pooling operation is an operation on image size (resize). After an image of a dog is shrunk by half, it can still be recognized as a photo of a dog, which shows that the most important features of the dog are retained; the information removed during compression is insignificant, and the information left is the scale-invariant feature that best expresses the image.
  • (2) Feature dimensionality reduction. An image contains a large amount of information and many features, but some of that information is of little use for the image task at hand or is redundant; pooling removes such redundant information and extracts the most important features.
  • (3) Pooling prevents overfitting to a certain extent and makes optimization easier.
  • the methods of the pooling layer include max pooling and average pooling.
  • Maximum pooling is a commonly used pooling method.
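  • For illustration (PyTorch assumed), both pooling methods compress a feature map by the same factor; max pooling keeps the strongest response in each window, while average pooling keeps the mean:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)
max_pool = nn.MaxPool2d(kernel_size=2)  # keeps the maximum of each 2x2 window
avg_pool = nn.AvgPool2d(kernel_size=2)  # keeps the average of each 2x2 window
print(max_pool(x).shape, avg_pool(x).shape)  # both: torch.Size([1, 8, 16, 16])
```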
  • fully connected means that each neuron in the next layer is connected to every neuron in the previous layer; all neurons between the two layers have weighted connections.
  • the fully connected layer and the output layer classify the images output by the pooling layer to obtain the classification results.
  • an embodiment of the present application provides a perceptual network that includes multiple convolutional subnets and fully connected subnets connected to the convolutional subnets. Because each convolutional subnet and its connected fully connected subnet jointly process images of one feature type, the data of different task types can be trained separately, which ensures an optimal output for each task and effectively improves the accuracy of the output results.
  • FIG. 2 is a schematic diagram of the architecture of a perceptual network provided by an embodiment.
  • the perceptual network 200 includes a backbone network 210, a convolutional network 220 and a fully connected network 230.
  • the convolutional network 220 includes N convolutional subnets (the first convolutional subnet 221 to the nth convolutional subnet 22n as shown in FIG. 2), and N is an integer greater than or equal to 2.
  • the fully connected network 230 includes N fully connected subnets (the first fully connected subnet 231 to the nth fully connected subnet 23n as shown in FIG. 2).
  • the backbone network 210 is respectively connected to N convolutional subnets.
  • the N convolutional subnets are connected to the N fully connected subnets, that is, the first convolutional subnet 221 is connected to the first fully connected subnet 231, the second convolutional subnet 222 is connected to the second fully connected subnet 232, and the i-th convolutional subnet 22i is connected to the i-th fully connected subnet 23i
  • the n-th convolutional subnet 22n is connected to the nth fully connected subnet 23n.
  • the backbone network 210 is used to perform convolution processing on the acquired first image to obtain M initialization images, and output the M initialization images to each of the N convolution subnets, M is an integer greater than or equal to 1.
  • the backbone network may include a neural network with an image classification function, for example, VGG16 or ResNet50.
  • the backbone network may include at least one convolutional layer; the first through Kth convolutional layers of the backbone network are selected and used to perform convolution processing on the first image to obtain the M initialization images, where K is an integer greater than or equal to 1.
  • for example, when K=3, the first, second, and third convolutional layers of the backbone network are selected, and the parameters indicated by the weights of these three convolutional layers are used to perform convolution processing on the first image to obtain the M initialization images; when K=5, the first through fifth convolutional layers are used in the same way.
  • the parameters indicated by the weight include parameters related to the convolution kernel, such as the size of the convolution kernel, the number of moving steps of the convolution kernel, and the number of convolution kernels.
  • the characteristics of each initialization image in the M initialization images are different.
  • the value of M can be determined by the number of convolution kernels.
  • the characteristics of the M initialization images can be determined by the value of the convolution kernel.
  • the initialization image may be, for example, an image of the contour features of the first image, an image of the relief features of the first image, or an image of the sharpening features of the first image.
  • thus, by selecting the first K convolutional layers to perform convolution processing on the first image, high-resolution feature images can be retained to the greatest extent, which helps improve the accuracy of processing the image.
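  • A hedged sketch of this idea using torchvision's VGG16 (one of the two backbones the text names; torchvision is an assumption): collect modules up to the Kth convolutional layer. Whether intervening pooling layers are kept is not specified, so this sketch skips them to preserve the high-resolution feature maps the text emphasizes:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

def truncated_backbone(k=3):
    # Keep VGG16 modules up to and including the k-th convolutional layer,
    # skipping pooling layers (an assumption) so resolution is preserved.
    layers, n_conv = [], 0
    for m in vgg16(weights=None).features:
        if isinstance(m, nn.MaxPool2d):
            continue
        layers.append(m)
        if isinstance(m, nn.Conv2d):
            n_conv += 1
            if n_conv == k:
                break
    return nn.Sequential(*layers)

backbone = truncated_backbone(k=3)
out = backbone(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 128, 224, 224]) -- full 224x224 resolution kept
```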
  • Each of the N convolution subnets performs convolution pooling processing on M initialization images to obtain images with different types of features.
  • each of the N convolutional subnets has a different structure.
  • Convolution pooling is performed on M initialization images through convolution subnets of different structures to obtain images with different types of features.
  • the i-th convolutional subnet includes Ki convolutional layers and Li pooling layers, where Ki and Li are integers greater than or equal to 1.
  • the i-th convolutional subnet is used to perform Ki convolution operations and Li pooling operations on the M initialization images to obtain an image of the i-th type of feature.
  • the number of convolution layers included in different convolution subnets may be the same or different.
  • the number of pooling layers included in different convolutional subnets may be the same or different. Since different convolutional subnets contain different numbers of convolutional layers and pooling layers, different convolutional subnets perform different numbers of convolution pooling operations on the same image to obtain images with different types of features, thereby maintaining high-resolution convolution feature maps and effectively improving the accuracy of image processing.
  • the image obtained by the convolutional subnet convolution pooling process includes but is not limited to the following features, for example, detailed features, contour features, and directional features.
  • the convolutional subnet may perform convolution pooling processing on at least one area in the initialization image, and a feature image after the convolution pooling processing of at least one area may be obtained. Performing convolution pooling processing on at least one area in the initialization image can be considered as a different task.
  • the image output by the convolution subnet can be called a feature image, a convolution feature map, a convolution pooling feature map, or a type feature map.
  • after each of the N convolutional subnets performs convolution pooling on the M initialization images to obtain images with different types of features (such as convolution feature maps), it outputs those images to the fully connected subnet connected to that convolutional subnet.
  • Each of the N fully-connected sub-networks classifies the received feature image to obtain a classification result of the corresponding feature.
  • each of the N fully connected subnets has a different structure.
  • the received feature images are classified through the fully connected subnets of different structures to obtain the classification results of the corresponding features.
  • the i-th fully connected subnet includes Ri fully connected layers, where Ri is an integer greater than or equal to 2.
  • the image of the i-th type of feature is output to the i-th fully connected subnet 23i.
  • the i-th fully connected subnet 23i is a fully connected subnet connected to the i-th convolutional subnet 22i.
  • the i-th fully connected subnet 23i is used to classify the image of the i-th feature to obtain the classification result of the i-th feature.
  • the number of fully connected layers included in different fully connected subnets may be the same or different. Since different fully connected subnets contain different numbers of fully connected layers, different feature images are classified through different fully connected subnets to obtain the classification results of the corresponding features, which ensures the classification of higher-resolution convolution feature maps and effectively improves classification accuracy.
  • Fully connected subnets of different structures can be connected in parallel.
  • Fully connected subnets of different structures can process the received feature images in parallel.
  • the feature image received by the fully connected subnet includes but is not limited to the following features, for example, detailed features, contour features, and directional features.
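  • The Ki/Li/Ri parameterization described above can be sketched generically (PyTorch assumed; the interleaving of convolution and pooling layers and the ReLU placements are assumptions, since the text fixes only the layer counts):

```python
import torch.nn as nn

def conv_subnet(in_ch, out_ch, k_i, l_i):
    # i-th convolutional subnet: K_i convolutional layers, L_i pooling layers.
    layers, ch = [], in_ch
    for _ in range(k_i):
        layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.ReLU()]
        ch = out_ch
    layers += [nn.MaxPool2d(2)] * l_i   # assumption: pools appended at the end
    return nn.Sequential(*layers)

def fc_subnet(dim, r_i):
    # i-th fully connected subnet: R_i >= 2 fully connected layers; the last
    # layer emits the classification result for the i-th type of feature.
    assert r_i >= 2
    layers = [nn.Flatten(), nn.LazyLinear(dim), nn.ReLU()]
    for _ in range(r_i - 2):
        layers += [nn.Linear(dim, dim), nn.ReLU()]
    return nn.Sequential(*(layers + [nn.Linear(dim, dim)]))
```

  • For example, conv_subnet(128, 128, k_i=2, l_i=1) and fc_subnet(1024, r_i=3) reproduce the first convolutional and first fully connected subnets of the N=3 example.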
  • the perceptual network provided by the embodiments of the present application includes multiple convolutional subnets and fully connected subnets connected to the convolutional subnets. Because each convolutional subnet and its connected fully connected subnet jointly process images of one feature type, the data of different task types can be trained separately, which ensures an optimal output for each task and effectively improves the accuracy of the output results.
  • the perceptual network 300 includes a backbone network 310, a convolutional network 320, and a fully connected network 330.
  • the convolutional network 320 includes 3 convolutional subnets.
  • the fully connected network 330 includes 3 fully connected subnets.
  • the backbone network 310 is connected to three convolutional subnets.
  • Three convolutional subnets are connected to three fully connected subnets, that is, the first convolutional subnet 321 is connected to the first fully connected subnet 331, the second convolutional subnet 322 is connected to the second fully connected subnet 332, and the third The convolution subnet 323 is connected to the third fully connected subnet 333.
  • the first convolutional subnet 321 includes two convolutional layers and one pooling layer, that is, the first convolutional subnet 321 includes the first convolutional layer, the second convolutional layer, and the first pooling layer.
  • the second convolutional subnet 322 includes one convolutional layer and one pooling layer, that is, the second convolutional subnet 322 includes the third convolutional layer and the second pooling layer.
  • the third convolutional subnet 323 includes two convolutional layers and one pooling layer, that is, the third convolutional subnet 323 includes the fourth convolutional layer, the fifth convolutional layer, and the third pooling layer.
  • the first fully connected subnet 331 includes the first fully connected layer, the second fully connected layer and the third fully connected layer.
  • the second fully connected subnet 332 includes the fourth fully connected layer and the fifth fully connected layer.
  • the third fully connected subnet 333 includes the sixth fully connected layer, the seventh fully connected layer and the eighth fully connected layer.
  • the backbone network 310 is connected to the first convolutional layer, the first convolutional layer is connected to the second convolutional layer, the second convolutional layer is connected to the first pooling layer, and the first pooling layer is connected to the first fully connected layer.
  • the first fully connected layer is connected with the second fully connected layer, and the second fully connected layer is connected with the third fully connected layer.
  • the backbone network 310 is connected to the third convolutional layer, the third convolutional layer is connected to the second pooling layer, the second pooling layer is connected to the fourth fully connected layer, and the fourth fully connected layer is connected to the fifth fully connected layer.
  • the backbone network 310 is connected to the fourth convolutional layer, the fourth convolutional layer is connected to the third pooling layer, the third pooling layer is connected to the fifth convolutional layer, and the fifth convolutional layer is connected to the sixth fully connected layer.
  • the sixth fully connected layer is respectively connected to the seventh fully connected layer and the eighth fully connected layer.
  • the value T of the convolution kernels of the first convolutional subnet may be 128, 256, or 512.
  • the value of the convolution kernels of the second convolutional subnet may be 0.5*T.
  • the value of the convolution kernels of the third convolutional subnet may be 2*T.
  • the dimension of each of the first through eighth fully connected layers is 1024, 2048 or 4096.
  • the backbone network 310 is used to perform convolution processing on the acquired first image to obtain M initialization images, and output the M initialization images to each of the 3 convolution subnets, where M is greater than or An integer equal to 1.
  • the first convolution subnet 321 is used to perform convolution pooling processing on M initialization images to obtain images with detailed features.
  • the first fully connected subnet 331 is used to classify the image of the detailed feature to obtain the classification result of the detailed feature.
  • the detailed feature image may be the detailed feature image of the initialization image, or the detailed feature image of a region in the initialization image.
  • the initialization image may be a face image
  • the left eye, right eye, nose, eyebrows, and mouth may each be a region in the initialization image.
  • the first convolution subnet 321 may perform convolution pooling processing on the left eye part in the initialization image to obtain an image of the details of the left eye part.
  • the first fully connected subnet 331 may perform classification processing on the image of the detailed feature of the left eye to obtain the classification result of the detailed feature of the left eye.
  • the first convolution subnet 321 may perform convolution pooling processing on the right-eye part in the initialization image to obtain an image with detailed features of the right-eye part.
  • the first fully connected subnet 331 may perform classification processing on the image of the detailed feature of the right eye to obtain the classification result of the detailed feature of the right eye.
  • the first convolution subnet 321 may perform convolution pooling processing on the nose part in the initialization image to obtain an image with detailed features of the nose part.
  • the first fully connected subnet 331 may perform classification processing on the image of the detailed features of the nose to obtain the classification result of the detailed features of the nose.
  • the first convolution subnet 321 may perform convolution pooling processing on the eyebrow parts in the initialization image to obtain an image of the detailed features of the eyebrow parts.
  • the first fully connected subnet 331 may perform classification processing on the image of the detailed features of the eyebrows to obtain a classification result of the detailed features of the eyebrows.
  • the first convolution subnet 321 may perform convolution pooling processing on the mouth part in the initialization image to obtain an image with detailed features of the mouth part.
  • the first fully connected subnet 331 may perform classification processing on the image of the detailed feature of the mouth to obtain the classification result of the detailed feature of the mouth.
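  • As a purely hypothetical illustration of this per-region processing (the region boxes are invented placeholders, and PerceptualNetwork is the earlier sketch): each facial region is cropped from the initialization images, resized to a common size, and passed through the first convolutional and first fully connected subnets:

```python
import torch
import torch.nn.functional as F

net = PerceptualNetwork()  # the earlier N=3 sketch
init_imgs = net.backbone(torch.randn(1, 3, 64, 64))  # M initialization images
regions = {                # (y0, y1, x0, x1) boxes -- assumed, not from the patent
    "left_eye":  (10, 18, 24, 32),
    "right_eye": (10, 18, 36, 44),
    "nose":      (20, 34, 26, 40),
    "eyebrows":  ( 6, 10, 20, 46),
    "mouth":     (38, 48, 24, 44),
}
for name, (y0, y1, x0, x1) in regions.items():
    crop = init_imgs[:, :, y0:y1, x0:x1]
    crop = F.interpolate(crop, size=(16, 16))  # common size for the FC subnet
    result = net.fc_sub1(net.conv_sub1(crop))  # detail-feature classification
    print(name, result.shape)
```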
  • the second convolution subnet 322 is used to perform convolution pooling processing on the M initialization images to obtain contour feature images.
  • the second fully connected subnet 332 is used to classify the contour feature image to obtain the contour feature classification result.
  • when the perceptual network 300 performs face recognition, the initialization image may be a face image.
  • the second convolution subnet 322 may perform convolution pooling processing on the facial contour area to obtain an image with facial contour features.
  • the second fully connected subnet 332 may perform classification processing on the image of the facial contour feature to obtain the classification result of the facial contour feature.
  • the third convolution subnet 323 is used to perform convolution pooling processing on M initialization images to obtain images with directional characteristics.
  • the third fully connected subnet 333 is used to classify images with directional features to obtain a classification result of directional features.
  • when the perceptual network 300 performs face recognition, the initialization image may be a face image.
  • the third convolutional subnet 323 may perform convolution pooling processing on the face image to obtain an image of the face direction features.
  • the third fully connected subnet 333 may perform classification processing on the image of the facial orientation feature to obtain the classification result of the facial orientation feature.
  • the classification result of the directional feature may be jointly determined according to the classification result of the seventh fully connected layer and the classification result of the eighth fully connected layer.
  • the directional feature may be represented by a pitch angle (pitch), a yaw angle (yaw), and a roll angle (roll) in right-hand Cartesian coordinates in a three-dimensional space.
  • the pitch angle is rotation around the X axis, the yaw angle is rotation around the Y axis, and the roll angle is rotation around the Z axis.
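  • For reference, the three angles describe a rotation in the stated right-handed coordinate system; a small NumPy sketch follows (the Z·Y·X composition order is an assumption, since the text does not specify one):

```python
import numpy as np

def rotation_matrix(pitch, yaw, roll):
    # pitch about X, yaw about Y, roll about Z; angles in radians.
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx

print(rotation_matrix(0.1, 0.2, 0.3).round(3))
```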
  • by performing convolution pooling processing and classification processing on the detailed features of different regions of the face, the contour features of the face, and the direction features of the face, high-precision facial key points and poses can be obtained, which effectively improves the accuracy of image processing. The smaller the mean error, the higher the accuracy and the better the effect.
  • FIG. 6 is a flowchart of an image processing method provided by an embodiment of the application.
  • the method is applied to a perceptual network.
  • the perceptual network includes a backbone network, a convolutional network, and a fully connected network.
  • the convolutional network includes N convolutional subnets
  • the fully connected network includes N fully connected subnets
  • the backbone network is connected to the N convolutional subnets, and the N convolutional subnets are connected to the N fully connected subnets
  • the i-th convolutional subnet is connected to the i-th fully connected subnet
  • N is an integer greater than or equal to 2
  • i is an integer, and i ∈ [1, N].
  • the method may include:
  • S601: the backbone network performs convolution processing on the first image to obtain M initialization images.
  • the backbone network includes at least one convolutional layer.
  • the backbone network may perform convolution processing on the first image according to the first through Kth convolutional layers to obtain M initialization images, where K is an integer greater than or equal to 1. For example, the value of K is 3 or 5. Thus, by selecting the first K convolutional layers to perform convolution processing on the first image, high-resolution feature images can be retained to the greatest extent, which helps improve the accuracy of processing the image.
  • S602: the backbone network outputs the M initialization images to each of the N convolutional subnets.
  • Each of the M initialization images has different characteristics, and M is an integer greater than or equal to 1.
  • S603: the i-th convolutional subnet performs convolution pooling processing on the M initialization images to obtain an image of the i-th type of feature.
  • the i-th convolutional subnet includes Ki convolutional layers and Li pooling layers, where Ki and Li are integers greater than or equal to 1.
  • the i-th convolutional subnet performs Ki convolution operations and Li pooling operations on the M initialization images to obtain the image of the i-th type of feature. Since different convolutional subnets contain different numbers of convolutional layers and pooling layers, different convolutional subnets perform different numbers of convolution pooling operations on the same image to obtain images with different types of features, thereby maintaining high-resolution convolution feature maps and effectively improving the accuracy of image processing.
  • S604: the i-th convolutional subnet outputs the image of the i-th type of feature to the i-th fully connected subnet.
  • S605: the i-th fully connected subnet performs classification processing on the image of the i-th type of feature to obtain the classification result of the i-th type of feature.
  • the i-th fully connected subnet includes Ri fully connected layers, where Ri is an integer greater than or equal to 2. Since different fully connected subnets contain different numbers of fully connected layers, different feature images are classified through different fully connected subnets to obtain the classification results of the corresponding features, which ensures the classification of higher-resolution convolution feature maps and effectively improves classification accuracy.
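  • To illustrate the separate-training claim above, here is a hypothetical training step reusing the PerceptualNetwork class from the earlier sketch: each branch gets its own loss term, so gradients for one task flow only through that task's subnets and the shared backbone. The per-task label sets and the 1024-way heads are assumptions:

```python
import torch
import torch.nn as nn

net = PerceptualNetwork()
net(torch.randn(1, 3, 64, 64))  # dummy pass materializes the lazy FC layers
opt = torch.optim.SGD(net.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, detail_y, contour_y, dir_y7, dir_y8):
    out1, out2, (out7, out8) = net(images)
    loss = (criterion(out1, detail_y) + criterion(out2, contour_y) +
            criterion(out7, dir_y7) + criterion(out8, dir_y8))  # one loss per task
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

imgs = torch.randn(4, 3, 64, 64)
rand_y = lambda: torch.randint(0, 1024, (4,))  # dummy labels below the head width
print(train_step(imgs, rand_y(), rand_y(), rand_y(), rand_y()))
```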
  • the image processing apparatus includes corresponding hardware structures and/or software modules for performing various functions.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application scenarios and design constraints of the technical solution.
  • Figures 7 and 8 are schematic structural diagrams of possible image processing apparatuses provided by embodiments of the application. These image processing devices can be used to implement the functions of the backbone network, convolutional network, and fully connected network in the foregoing method embodiments, and therefore, can also achieve the beneficial effects of the foregoing method embodiments.
  • the image processing apparatus may be a terminal device, or a module (such as a chip) applied to a terminal device.
  • the terminal device can be a device that performs image processing, face recognition, image classification, and so on.
  • for example, the terminal device may be a wearable device, an augmented reality (AR) device, or a virtual reality (VR) device.
  • wearable devices may also be called wearable smart devices, a general term for everyday wearables that are intelligently designed and developed using wearable technology, such as glasses, gloves, watches, clothing and shoes.
  • a wearable device is a portable device that is directly worn on the body or integrated into the user's clothes or accessories. Wearable devices are not only a kind of hardware device, but also realize powerful functions through software support, data interaction, and cloud interaction.
  • broadly speaking, wearable smart devices include devices that are full-featured and large-sized and can realize complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to be used together with other devices such as smartphones, for example, various smart bracelets and smart jewelry for vital-sign monitoring.
  • the image processing apparatus 700 includes a processing unit 710 and a transceiving unit 720.
  • the image processing device 700 is used to implement the functions of the backbone network, the convolutional network, and the fully connected network in the method embodiment shown in FIG. 6 above.
  • the transceiver unit 720 is used to receive the first image, and output M initialization images to each of the N convolutional subnets.
  • the processing unit 710 is configured to perform convolution processing on the first image to obtain M initialization images, that is, S601 is executed.
  • the transceiver unit 720 is used to receive the M initialization images and to output the image of the i-th type of feature to the i-th fully connected subnet, that is, to execute S604;
  • the processing unit 710 is configured to perform convolution pooling processing on the M initialization images to obtain an image of the i-th type of feature, that is, execute S603.
  • the transceiver unit 720 is used to receive the image of the i-th type of feature; the processing unit 710 is used to perform classification processing on the image of the i-th type of feature to obtain the classification result of the i-th type of feature.
  • more detailed descriptions of the processing unit 710 and the transceiver unit 720 can be obtained directly with reference to the relevant description in the method embodiment shown in FIG. 6, and are not repeated here.
  • the image processing device 800 includes a processor 810 and an interface circuit 820.
  • the processor 810 and the interface circuit 820 are coupled to each other.
  • the interface circuit 820 may be a transceiver or an input/output interface.
  • the image processing apparatus 800 may further include a memory 830 for storing instructions executed by the processor 810 or storing input data required by the processor 810 to run the instructions or storing data generated after the processor 810 runs the instructions.
  • the processor 810 is used to perform the function of the above-mentioned processing unit 710, and the interface circuit 820 is used to perform the function of the above-mentioned transceiving unit 720.
  • when the foregoing image processing apparatus is a chip applied to a terminal device, the terminal device chip implements the functions of the terminal device in the foregoing method embodiments.
  • the terminal device chip receives information from other modules in the terminal device (such as a network card, port, or camera), and the information is image information.
  • the processor in the embodiments of the present application may be a central processing unit (CPU), a graphics processing unit (GPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
  • the general-purpose processor may be a microprocessor or any conventional processor.
  • the method steps in the embodiments of the present application can be implemented by hardware, and can also be implemented by a processor executing software instructions.
  • software instructions can be composed of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC can be located in a network device or a terminal device.
  • the processor and the storage medium may also exist as discrete components in the network device or the terminal device.
  • the computer program product includes one or more computer programs or instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer program or instruction may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that integrates one or more available media.
  • the usable medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid state drive (SSD).
  • “at least one” refers to one or more, and “multiple” refers to two or more.
  • “and/or” describes an association between associated objects and indicates three possible relationships; for example, A and/or B can mean: A alone, both A and B, or B alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the associated objects are in an “or” relationship; in the formulas of this application, the character “/” indicates that the associated objects are in a “division” relationship.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a perceptual network and an image processing method, relates to image processing, and solves the problem of low accuracy of multi-task output results when a single fully connected layer is shared. The perceptual network includes a backbone network, a convolutional network and a fully connected network; the convolutional network includes N convolutional subnets, the fully connected network includes N fully connected subnets, the backbone network is connected to the N convolutional subnets, the N convolutional subnets are connected to the N fully connected subnets, and the i-th convolutional subnet is connected to the i-th fully connected subnet, where N is an integer greater than or equal to 2, i is an integer, and i ∈ [1, N]. First, the backbone network performs convolution processing on a first image to obtain M initialization images; the i-th convolutional subnet then performs convolution pooling processing on the M initialization images to obtain an image of the i-th type of feature; and the i-th fully connected subnet classifies the image of the i-th type of feature to obtain the classification result of the i-th type of feature.

Description

一种感知网络及图像处理方法 技术领域
本申请实施例涉及图像处理领域,尤其涉及一种感知网络及图像处理方法。
背景技术
在机器学习中,卷积神经网络(Convolutional Neural Network,CNN)是一种深度前馈人工神经网络,已广泛地应用于图像处理和视觉任务中,比如人脸检测、人脸识别、图像分类、自动驾驶和医疗影像诊断等。
目前,可以根据任务类型在深度神经网络的全连接层(fully connected layer,FC layer)设置多种类型的输出层,输出不同类型的任务的结果。如图1所示,为多任务神经网络的架构示意图。但是,不同类型的任务的优化目标特性差异大,共用一个全连接层对不同类型的任务的数据进行训练,无法保证每个任务的输出结果最优,输出结果的精度较低。
发明内容
本申请提供一种感知网络及图像处理方法,解决了共用一个全连接层时导致多任务的输出结果的精度较低的问题。
为达到上述目的,本申请采用如下技术方案:
第一方面,本申请提供了一种感知网络,该感知网络包括主干网络、卷积网络和全连接网络,卷积网络包括N个卷积子网,全连接网络包括N个全连接子网,主干网络与N个卷积子网连接,N个卷积子网与N个全连接子网连接,第i卷积子网连接第i全连接子网,N为大于或等于2的整数,i为整数,i∈[1,N]。其中,所述主干网络用于对第一图像进行卷积处理以得到M个初始化图像,并将M个初始化图像输出至N个卷积子网中的每个卷积子网,M个初始化图像中每个初始化图像的特征不同,M为大于或等于1的整数;所述第i卷积子网用于对M个初始化图像进行卷积池化处理以得到第i类特征的图像,并将第i类特征的图像输出至第i全连接子网;所述第i全连接子网用于对第i类特征的图像进行分类处理以得到第i类特征的分类结果。
本申请提供的感知网络包括了多个卷积子网,以及与卷积子网连接的全连接子网,由于每个卷积子网和与该卷积子网连接的全连接子网共同处理一种特征类型的图像,因此,对于不同类型的任务的数据可以分别进行训练,保证了每个任务的输出结果最优,有效地提高了输出结果的精度。
在一种可能的实现方式中,主干网络包括至少1层卷积层(convolutional layer),主干网络用于根据第1层卷积层至第K层卷积层对第一图像进行卷积处理以得到M个初始化图像,K为大于或等于1的整数。示例的,K的取值为3或5。通过选取前N层卷积层对第一图像进行卷积处理,能够最大程度地保留高分辨率的特征图像,有利于提高处理图像的精度。
在另一种可能的实现方式中,第i卷积子网包括K i层卷积层和L i层池化层(pooling  layer),K i和L i均为大于或等于1的整数;第i卷积子网用于对M个初始化图像进行K i次卷积处理和L i次池化处理以得到第i类特征的图像。由于不同的卷积子网包含了不同个数的卷积层和池化层,通过不同卷积子网对相同的图像进行不同次数的卷积池化处理以得到不同类型特征的图像,从而,保持较高分辨率的卷积特征图,有效地提高处理图像的精度。
在另一种可能的实现方式中,第i全连接子网包括R i个全连接层,R i为大于或等于2的整数。由于不同的全连接子网包含了不同个数的全连接层,通过不同的全连接子网对不同特征图像进行分类处理以得到相应特征的分类结果,从而,确保较高分辨率的卷积特征图的分类处理,有效地提高了分类处理的精度。
示例的,N=3,第一卷积子网包括第一卷积层、第二卷积层和第一池化层,第二卷积子网包括第三卷积层和第二池化层,第三卷积子网包括第四卷积层、第五卷积层和第三池化层,第一全连接子网包括第一全连接层、第二全连接层和第三全连接层,第二全连接子网包括第四全连接层和第五全连接层,第三全连接子网包括第六全连接层、第七全连接层和第八全连接层;其中,主干网络与第一卷积层连接,第一卷积层与第二卷积层连接,第二卷积层与第一池化层连接,第一池化层与第一全连接层连接,第一全连接层与第二全连接层连接,第二全连接层与第三全连接层连接;主干网络与第三卷积层连接,第三卷积层与第二池化层连接,第二池化层与第四全连接层连接,第四全连接层与第五全连接层连接;主干网络与第四卷积层连接,第四卷积层与第三池化层连接,第三池化层与第五卷积层连接,第五卷积层与第六全连接层连接,第六全连接层分别与第七全连接层和第八全连接层连接。其中,第一卷积子网的卷积核的取值T为128、256或512;第二卷积子网的卷积核的取值为0.5*T;第三卷积子网的卷积核的取值为2*T;第一全连接层的维度值为1024、2048或4096;第二全连接层的维度值为1024、2048或4096;第三全连接层的维度值为1024、2048或4096;第四全连接层的维度值为1024、2048或4096;第五全连接层的维度值为1024、2048或4096;第六全连接层的维度值为1024、2048或4096;第七全连接层的维度值为1024、2048或4096;第八全连接层的维度值为1024、2048或4096。
第二方面,本申请提供了一种图像处理方法,所述方法可应用于终端设备,或者该方法可应用于可以支持终端设备实现该方法的通信装置,例如该通信装置包括芯片系统。该终端设备上设置有感知网络,该感知网络包括主干网络、卷积网络和全连接网络,卷积网络包括N个卷积子网,全连接网络包括N个全连接子网,主干网络与N个卷积子网连接,N个卷积子网与N个全连接子网连接,第i卷积子网连接第i全连接子网,N为大于或等于2的整数,i为整数,i∈[1,N]。所述方法包括:首先,通过主干网络对第一图像进行卷积处理以得到M个初始化图像,并将M个初始化图像输出至N个卷积子网中的每个卷积子网,M个初始化图像中每个初始化图像的特征不同,M为大于或等于1的整数。每个卷积子网处理M个初始化图像可以参考如下对第i卷积子网的阐述。第i卷积子网对M个初始化图像进行卷积池化处理以得到第i类特征的图像,并将第i类特征的图像输出至第i全连接子网;第i全连接子网对第i类特征的图像进行分类处理以得到第i类特征的分类结果。
本申请提供的感知网络包括了多个卷积子网,以及与卷积子网连接的全连接子网, 由于每个卷积子网和与该卷积子网连接的全连接子网共同处理一种特征类型的图像,因此,对于不同类型的任务的数据可以分别进行训练,保证了每个任务的输出结果最优,有效地提高了输出结果的精度。
在一种可能的实现方式中,主干网络包括至少1层卷积层,主干网络对第一图像进行卷积处理以得到M个初始化图像,包括:根据第1层卷积层至第K层卷积层对第一图像进行卷积处理以得到M个初始化图像,K为大于或等于1的整数。示例的,K的取值为3或5。通过选取前N层卷积层对第一图像进行卷积处理,能够最大程度地保留高分辨率的特征图像,有利于提高处理图像的精度。
在另一种可能的实现方式中,第i卷积子网包括K i层卷积层和L i层池化层,K i和L i均为大于或等于1的整数;第i卷积子网对M个初始化图像进行卷积池化处理以得到第i类特征的图像,包括:第i卷积子网对M个初始化图像进行K i次卷积处理和L i次池化处理以得到第i类特征的图像;第i全连接子网包括R i个全连接层,R i为大于或等于2的整数。通过多个卷积层对初始化图像进行卷积处理,从而,保持较高分辨率的卷积特征图,有利于提高处理图像的精度。
示例的,N=3,第一卷积子网包括第一卷积层、第二卷积层和第一池化层,第二卷积子网包括第三卷积层和第二池化层,第三卷积子网包括第四卷积层、第五卷积层和第三池化层,第一全连接子网包括第一全连接层、第二全连接层和第三全连接层,第二全连接子网包括第四全连接层和第五全连接层,第三全连接子网包括第六全连接层、第七全连接层和第八全连接层;其中,主干网络与第一卷积层连接,第一卷积层与第二卷积层连接,第二卷积层与第一池化层连接,第一池化层与第一全连接层连接,第一全连接层与第二全连接层连接,第二全连接层与第三全连接层连接;主干网络与第三卷积层连接,第三卷积层与第二池化层连接,第二池化层与第四全连接层连接,第四全连接层与第五全连接层连接;主干网络与第四卷积层连接,第四卷积层与第三池化层连接,第三池化层与第五卷积层连接,第五卷积层与第六全连接层连接,第六全连接层分别与第七全连接层和第八全连接层连接。
其中,第一卷积子网的卷积核的取值T为128、256或512;第二卷积子网的卷积核的取值为0.5*T;第三卷积子网的卷积核的取值为2*T;第一全连接层的维度值为1024、2048或4096,第二全连接层的维度值为1024、2048或4096;第四全连接层的维度值为1024、2048或4096,第五全连接层的维度值为1024、2048或4096;第六全连接层的维度值为1024、2048或4096,第七全连接层的维度值为1024、2048或4096。
第三方面,本申请实施例还提供了一种通信装置,有益效果可以参见第二方面的描述此处不再赘述。所述通信装置具有实现上述第二方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,所述通信装置包括:收发单元和处理单元。所述处理单元,用于获取第一图像。所述处理单元,用于通过主干网络对第一图像进行卷积处理以得到M个初始化图像,并将M个初始化图像输出至N个卷积子网中的每个卷积子网,M个初始化图像中每个初始化图像的特征不同,M为大于或等于1的整数,所述处理单元,还用于通过第i卷积子网对M个初始化图像进行卷积池化处理以得到第i类特征的图像,并将第i类特征的图像输出至第i全连 接子网;所述处理单元,还用于通过第i全连接子网对第i类特征的图像进行分类处理以得到第i类特征的分类结果。
第四方面,提供了一种通信装置,该通信装置可以为上述方法实施例中的终端设备,或者为设置在终端设备中的芯片。该通信装置包括通信接口以及处理器,可选的,还包括存储器。其中,该存储器用于存储计算机程序或指令,处理器与存储器、通信接口耦合,当处理器执行所述计算机程序或指令时,使通信装置执行上述方法实施例中由终端设备所执行的方法。
第五方面,提供了一种计算机程序产品,所述计算机程序产品包括:计算机程序代码,当所述计算机程序代码并运行时,使得上述各方面中由终端设备执行的方法被执行。
第六方面,本申请提供了一种芯片系统,该芯片系统包括处理器,用于实现上述各方面的方法中终端设备的功能。在一种可能的设计中,所述芯片系统还包括存储器,用于保存程序指令和/或数据。该芯片系统,可以由芯片构成,也可以包括芯片和其他分立器件。
第七方面,本申请提供了一种计算机可读存储介质,该计算机可读存储介质存储有计算机程序,当该计算机程序被运行时,实现上述各方面中由终端设备执行的方法。
本申请中,终端设备和通信装置的名字对设备本身不构成限定,在实际实现中,这些设备可以以其他名称出现。只要各个设备的功能和本申请类似,属于本申请权利要求及其等同技术的范围之内。
附图说明
图1为一实施例提供的多任务神经网络的架构示意图;
图2为一实施例提供的感知网络的架构示意图;
图3为一实施例提供的感知网络的架构示意图;
图4为一实施例提供的俯仰角、偏航角和翻滚角的示意图;
图5为一实施例提供的一种图像处理的结果示意图;
图6为一实施例提供的一种图像处理方法的流程图;
图7为一实施例提供的一种图像处理装置的组成示意图;
图8为一实施例提供的一种图像处理装置的组成示意图。
Detailed Description of Embodiments
The terms "first", "second", and "third" in the specification, claims, and accompanying drawings of this application are used to distinguish different objects rather than to define a particular order.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, such words are intended to present a related concept in a concrete manner.
For clarity and brevity of the following descriptions of the embodiments, a brief introduction to the related technology is given first.
Convolutional neural networks include one-dimensional, two-dimensional, and three-dimensional convolutional neural networks. One-dimensional convolutional neural networks are often applied to sequence data processing; two-dimensional convolutional neural networks are often applied to the recognition of image-like text; and three-dimensional convolutional neural networks are mainly applied to medical image and video data recognition. A convolutional neural network includes a data input layer, convolutional layers, activation layers (ReLU layers), pooling layers, and fully connected layers.
1. Data input layer
The data input layer mainly preprocesses the data of the original image (for example, pixel values). Preprocessing may include mean subtraction, normalization, and principal component analysis (PCA)/whitening.
Mean subtraction centers every dimension of the input data at 0; its purpose is to pull the center of the samples back to the origin of the coordinate system.
Normalization scales the amplitudes to the same range to reduce the interference caused by differences in the value ranges of the dimensions. For example, given two features A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000, normalization rescales the data of both A and B to the range 0 to 1.
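A minimal sketch of this normalization step is shown below (the function and variable names are illustrative, not from the original disclosure):

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each column (feature dimension) of x to the range [0, 1]."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / (x_max - x_min + 1e-12)  # epsilon guards against division by zero

# Feature A ranges over [0, 10], feature B over [0, 10000]; both end up in [0, 1].
data = np.array([[0.0, 0.0], [5.0, 2500.0], [10.0, 10000.0]])
print(min_max_normalize(data))
```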
Principal component analysis uses the idea of dimensionality reduction to convert many indicators into a few composite indicators. Whitening normalizes the amplitude on each feature axis of the data.
2. Convolutional layer
The convolutional layer is the most important layer of a convolutional neural network and is the origin of the name "convolutional neural network". The purpose of the convolution operation is to extract different features of the original image.
In some embodiments, a convolutional neural network may include multiple convolutional layers. The first convolutional layer may only be able to extract low-level features such as edges, lines, and corners. Deeper convolutional layers can iteratively extract more complex features from these low-level features.
A convolutional layer may include at least one convolution kernel, through which the features of the original image are obtained. For example, the convolution kernel may be slid over the original image, and the parameters in the convolution kernel may be convolved with the pixel values of the original image to obtain the features of the original image.
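The sliding-window computation described here can be sketched as follows (a didactic implementation with no padding; the kernel values are illustrative):

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` and sum the elementwise products (no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            patch = image[r * stride:r * stride + kh, c * stride:c * stride + kw]
            out[r, c] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)    # a simple vertical-edge detector
print(conv2d(image, edge_kernel))                 # 3x3 output with stride 1
print(conv2d(image, edge_kernel, stride=2))       # 2x2 output with stride 2
```

The same sketch also illustrates the two points below: a larger kernel or a larger stride yields a smaller output map.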
The parameters of a convolution kernel can be optimized through the backpropagation algorithm.
The larger the convolution kernel, the fewer details are captured and the smaller the image output by the convolutional layer. Conversely, the smaller the convolution kernel, the more details are captured and the larger the image output by the convolutional layer.
The stride with which the convolution kernel slides can be set as required, for example, 1 or 2. The larger the stride, the fewer image features the output of the convolutional layer contains.
The number of convolution kernels may also be called the depth. The number of convolution kernels determines the number of images output by the convolutional layer, and different kernel parameters yield output images containing different image features.
3. Activation layer
The activation layer applies a non-linear mapping to the output of the convolutional layer. In some embodiments, the activation function used in a convolutional neural network is generally the Rectified Linear Unit (ReLU). ReLU converges quickly and its gradient is simple to compute, but it is relatively fragile.
4. Pooling layer
Pooling layers sit between consecutive convolutional layers. A pooling layer compresses the amount of data and parameters and reduces overfitting. In short, if the input is an image, the main role of the pooling layer is to compress the image. Pooling layers have the following advantages:
(1) Feature invariance: this is the scale invariance of features often mentioned in image processing; a pooling operation is essentially a resize of the image. After an image of a dog is shrunk by half, it can still be recognized as a picture of a dog, which shows that the image still retains the dog's most important features. The information discarded during compression is irrelevant, while the information that remains consists of scale-invariant features that best express the image.
(2) Feature dimensionality reduction: an image contains a large amount of information and many features, but some of this information is of little use for image tasks or is redundant. Such redundant information can be removed and the most important features extracted, which is another major role of pooling.
(3) Pooling prevents overfitting to a certain extent and makes optimization easier.
Pooling methods include max pooling and average pooling; max pooling is the more commonly used method.
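A sketch of the common 2x2 max-pooling case follows (the window size and example values are illustrative):

```python
import numpy as np

def max_pool2d(image: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Take the maximum over each size x size window of the feature map."""
    ih, iw = image.shape
    oh = (ih - size) // stride + 1
    ow = (iw - size) // stride + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = image[r * stride:r * stride + size,
                              c * stride:c * stride + size].max()
    return out

feature_map = np.array([[1., 3., 2., 4.],
                        [5., 6., 7., 8.],
                        [3., 2., 1., 0.],
                        [1., 2., 3., 4.]])
print(max_pool2d(feature_map))  # [[6. 8.] [3. 4.]]: a 4x4 map compressed to 2x2
```

Replacing `.max()` with `.mean()` would give average pooling instead.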
5. Fully connected layer
Fully connected means that every neuron in the next layer is connected to all neurons in the previous layer, with weighted connections between all neurons of the two layers. The fully connected layer and the output layer classify the image output by the pooling layer to obtain a classification result.
For the specific computation process of a convolutional neural network, refer to the descriptions in the prior art; details are not repeated here.
To solve the problem that sharing one fully connected layer leads to low accuracy of multi-task output results, an embodiment of this application provides a perceptual network. The perceptual network includes multiple convolutional subnets and fully connected subnets connected to the convolutional subnets. Because each convolutional subnet and the fully connected subnet connected to it jointly process images of one feature type, the data of different types of tasks can be trained separately, which guarantees an optimal output result for each task and effectively improves the accuracy of the output results.
Next, the implementations of the embodiments of this application are described in detail with reference to the accompanying drawings.
Figure 2 is a schematic architectural diagram of a perceptual network according to an embodiment. As shown in Figure 2, the perceptual network 200 includes a backbone network 210, a convolutional network 220, and a fully connected network 230. The convolutional network 220 includes N convolutional subnets (the first convolutional subnet 221 through the n-th convolutional subnet 22n shown in Figure 2), where N is an integer greater than or equal to 2. The fully connected network 230 includes N fully connected subnets (the first fully connected subnet 231 through the n-th fully connected subnet 23n shown in Figure 2). The backbone network 210 is connected to each of the N convolutional subnets, and the N convolutional subnets are connected to the N fully connected subnets: the first convolutional subnet 221 is connected to the first fully connected subnet 231, the second convolutional subnet 222 to the second fully connected subnet 232, the i-th convolutional subnet 22i to the i-th fully connected subnet 23i, and the n-th convolutional subnet 22n to the n-th fully connected subnet 23n.
The backbone network 210 is configured to perform convolution processing on an obtained first image to obtain M initialization images and output the M initialization images to each of the N convolutional subnets, where M is an integer greater than or equal to 1.
In some embodiments, the backbone network may include a neural network with an image classification capability, for example, VGG16 or ResNet50. The backbone network may include at least one convolutional layer; the first through the K-th convolutional layers of the backbone network are selected, and the first image is convolved using them to obtain the M initialization images, where K is an integer greater than or equal to 1.
For example, when K=3, the first, second, and third convolutional layers of the backbone network are selected, and the first image is convolved using the parameters indicated by the weights of these three layers to obtain the M initialization images. When K=5, the first through fifth convolutional layers of the backbone network are selected, and the first image is convolved using the parameters indicated by the weights of these five layers to obtain the M initialization images.
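As a minimal PyTorch sketch of this selection (torchvision's VGG16 is used as an assumed backbone; the helper name and the choice to keep the interleaved ReLU and pooling layers are illustrative):

```python
import torch
import torch.nn as nn
from torchvision import models

def truncated_backbone(k_conv_layers: int = 3) -> nn.Sequential:
    """Keep VGG16's feature extractor up to and including the k-th conv layer."""
    features = models.vgg16().features
    kept, conv_seen = [], 0
    for layer in features:
        kept.append(layer)
        if isinstance(layer, nn.Conv2d):
            conv_seen += 1
            if conv_seen == k_conv_layers:
                break
    return nn.Sequential(*kept)

backbone = truncated_backbone(k_conv_layers=3)
x = torch.randn(1, 3, 224, 224)   # a dummy first image
init_images = backbone(x)         # M initialization images, one per output channel
print(init_images.shape)          # torch.Size([1, 128, 112, 112])
```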
It should be noted that the parameters indicated by the weights include parameters related to the convolution kernels, such as the kernel size, the kernel stride, and the number of kernels. The features of the M initialization images differ from one another. The value of M may be determined by the number of convolution kernels, and the features of the M initialization images may be determined by the values of the convolution kernels. For example, an initialization image may be an image of the contour features of the first image, an image of the embossed features of the first image, an image of the sharpened features of the first image, or the like.
Thus, by selecting the first K convolutional layers to convolve the first image, high-resolution feature images are preserved to the greatest extent, which helps improve image processing accuracy.
Each of the N convolutional subnets performs convolution and pooling processing on the M initialization images to obtain images of different types of features. For example, the i-th convolutional subnet 22i is configured to perform convolution and pooling processing on the M initialization images to obtain an image of the i-th feature, where i is an integer and i∈[1,N]. It can be understood that different values of i from 1 to N denote different subnets among the N convolutional subnets: when i=1, the first convolutional subnet; when i=2, the second convolutional subnet; when i=3, the third convolutional subnet; and when i=n, the n-th convolutional subnet.
In some embodiments, each of the N convolutional subnets has a different structure. Convolutional subnets of different structures separately perform convolution and pooling processing on the M initialization images to obtain images of different types of features.
For example, the i-th convolutional subnet includes K_i convolutional layers and L_i pooling layers, where K_i and L_i are both integers greater than or equal to 1. The i-th convolutional subnet is configured to perform K_i convolution operations and L_i pooling operations on the M initialization images to obtain the image of the i-th feature.
Optionally, for different values of i from 1 to N, the numbers of convolutional layers in different convolutional subnets may be the same or different, and likewise for the numbers of pooling layers. Because different convolutional subnets contain different numbers of convolutional and pooling layers, the same images undergo different numbers of convolution and pooling operations in different subnets, yielding images of different types of features while keeping the convolutional feature maps at a relatively high resolution, which effectively improves image processing accuracy.
Optionally, the images obtained through the convolution and pooling processing of a convolutional subnet include, but are not limited to, features such as detail features, contour features, and orientation features.
It can be understood that the convolution and pooling processing performed on the M initialization images by convolutional subnets of different structures can be regarded as different tasks.
In some other embodiments, a convolutional subnet may perform convolution and pooling processing on at least one region of an initialization image to obtain a convolved and pooled feature image of the at least one region. The processing of different regions of the initialization image can also be regarded as different tasks.
Herein, the image output by a convolutional subnet may be referred to as a feature image, a convolutional feature map, a convolution-pooling feature map, or a type feature map.
After each of the N convolutional subnets obtains images of different types of features (for example, convolutional feature maps) from the M initialization images, the images are output to the fully connected subnets connected to the respective convolutional subnets. Each of the N fully connected subnets classifies the received feature image to obtain a classification result of the corresponding feature.
For example, the i-th fully connected subnet 23i is configured to classify the image of the i-th feature to obtain the classification result of the i-th feature, where i is an integer and i∈[1,N]. It can be understood that different values of i from 1 to N denote different subnets among the N fully connected subnets: when i=1, the first fully connected subnet; when i=2, the second fully connected subnet; when i=3, the third fully connected subnet; and when i=n, the n-th fully connected subnet.
In some embodiments, each of the N fully connected subnets has a different structure. Fully connected subnets of different structures separately classify the received feature images to obtain the classification results of the corresponding features.
For example, the i-th fully connected subnet includes R_i fully connected layers, where R_i is an integer greater than or equal to 2. After the i-th convolutional subnet obtains the image of the i-th feature from the M initialization images, it outputs that image to the i-th fully connected subnet 23i, that is, the fully connected subnet connected to the i-th convolutional subnet 22i. The i-th fully connected subnet 23i is configured to classify the image of the i-th feature to obtain the classification result of the i-th feature.
Optionally, for different values of i from 1 to N, the numbers of fully connected layers in different fully connected subnets may be the same or different. Because different fully connected subnets contain different numbers of fully connected layers, different feature images are classified by different fully connected subnets to obtain the classification results of the corresponding features, which ensures the classification of higher-resolution convolutional feature maps and effectively improves classification accuracy.
Optionally, fully connected subnets of different structures may be connected in parallel and may process the received feature images in parallel.
Optionally, the feature images received by the fully connected subnets include, but are not limited to, features such as detail features, contour features, and orientation features.
It can be understood that the classification processing performed on the received feature images by fully connected subnets of different structures can be regarded as different tasks.
It should be noted that for the specific computation processes of the convolution, pooling, and classification processing in the image processing provided in the embodiments of this application, refer to the descriptions in the prior art; details are not repeated here. The perceptual network provided in the embodiments of this application includes multiple convolutional subnets and fully connected subnets connected to the convolutional subnets. Because each convolutional subnet and the fully connected subnet connected to it jointly process images of one feature type, the data of different types of tasks can be trained separately, which guarantees an optimal output result for each task and effectively improves the accuracy of the output results.
The perceptual network is illustrated below with a specific example. Figure 3 is a schematic architectural diagram of a perceptual network according to an embodiment. Assume N=3. The perceptual network 300 includes a backbone network 310, a convolutional network 320, and a fully connected network 330. The convolutional network 320 includes three convolutional subnets, and the fully connected network 330 includes three fully connected subnets.
The backbone network 310 is connected to the three convolutional subnets, and the three convolutional subnets are connected to the three fully connected subnets: the first convolutional subnet 321 is connected to the first fully connected subnet 331, the second convolutional subnet 322 to the second fully connected subnet 332, and the third convolutional subnet 323 to the third fully connected subnet 333.
In some embodiments, when i=1, assuming K_i=2 and L_i=1, the first convolutional subnet 321 includes two convolutional layers and one pooling layer, that is, a first convolutional layer, a second convolutional layer, and a first pooling layer.
When i=2, assuming K_i=1 and L_i=1, the second convolutional subnet 322 includes one convolutional layer and one pooling layer, that is, a third convolutional layer and a second pooling layer.
When i=3, assuming K_i=2 and L_i=1, the third convolutional subnet 323 includes two convolutional layers and one pooling layer, that is, a fourth convolutional layer, a fifth convolutional layer, and a third pooling layer.
When i=1, assuming R_i=3, the first fully connected subnet 331 includes a first fully connected layer, a second fully connected layer, and a third fully connected layer.
When i=2, assuming R_i=2, the second fully connected subnet 332 includes a fourth fully connected layer and a fifth fully connected layer.
When i=3, assuming R_i=3, the third fully connected subnet 333 includes a sixth fully connected layer, a seventh fully connected layer, and an eighth fully connected layer.
The backbone network 310 is connected to the first convolutional layer, the first convolutional layer is connected to the second convolutional layer, the second convolutional layer is connected to the first pooling layer, the first pooling layer is connected to the first fully connected layer, the first fully connected layer is connected to the second fully connected layer, and the second fully connected layer is connected to the third fully connected layer.
The backbone network 310 is connected to the third convolutional layer, the third convolutional layer is connected to the second pooling layer, the second pooling layer is connected to the fourth fully connected layer, and the fourth fully connected layer is connected to the fifth fully connected layer.
The backbone network 310 is connected to the fourth convolutional layer, the fourth convolutional layer is connected to the third pooling layer, the third pooling layer is connected to the fifth convolutional layer, the fifth convolutional layer is connected to the sixth fully connected layer, and the sixth fully connected layer is connected to the seventh fully connected layer and to the eighth fully connected layer.
Optionally, the value T of the convolution kernels of the first convolutional subnet may be 128, 256, or 512; the value for the second convolutional subnet may be 0.5*T; and the value for the third convolutional subnet may be 2*T.
Optionally, the dimension of each of the first through eighth fully connected layers is 1024, 2048, or 4096.
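A minimal PyTorch sketch of this three-branch layout follows. It assumes T=128, a 64x64 RGB input, 3x3 kernels, max pooling, and 10-way classification heads; the description above fixes only the layer counts, the connections, T, and the fully connected dimensions, so everything else here is an illustrative assumption:

```python
import torch
import torch.nn as nn

class PerceptionNet300(nn.Module):
    """Sketch of Figure 3: one backbone, three conv subnets, three FC subnets."""
    def __init__(self, t: int = 128, fc_dim: int = 1024):
        super().__init__()
        # Backbone: first K=3 conv layers producing the M initialization images.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Subnet 1 (detail branch): conv1 -> conv2 -> pool1, kernels = T.
        self.conv_subnet1 = nn.Sequential(
            nn.Conv2d(64, t, 3, padding=1), nn.ReLU(),
            nn.Conv2d(t, t, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Subnet 2 (contour branch): conv3 -> pool2, kernels = 0.5*T.
        self.conv_subnet2 = nn.Sequential(
            nn.Conv2d(64, t // 2, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Subnet 3 (orientation branch): conv4 -> pool3 -> conv5, kernels = 2*T.
        self.conv_subnet3 = nn.Sequential(
            nn.Conv2d(64, 2 * t, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(2 * t, 2 * t, 3, padding=1), nn.ReLU(),
        )
        pooled = 32  # assumed spatial size after pooling, for a 64x64 input
        self.fc_subnet1 = nn.Sequential(  # fc1 -> fc2 -> fc3
            nn.Flatten(), nn.Linear(t * pooled * pooled, fc_dim), nn.ReLU(),
            nn.Linear(fc_dim, fc_dim), nn.ReLU(), nn.Linear(fc_dim, 10),
        )
        self.fc_subnet2 = nn.Sequential(  # fc4 -> fc5
            nn.Flatten(), nn.Linear((t // 2) * pooled * pooled, fc_dim),
            nn.ReLU(), nn.Linear(fc_dim, 10),
        )
        self.fc6 = nn.Sequential(  # fc6 feeds both fc7 and fc8
            nn.Flatten(), nn.Linear(2 * t * pooled * pooled, fc_dim), nn.ReLU(),
        )
        self.fc7 = nn.Linear(fc_dim, 10)
        self.fc8 = nn.Linear(fc_dim, 10)

    def forward(self, x):
        init_images = self.backbone(x)
        out1 = self.fc_subnet1(self.conv_subnet1(init_images))
        out2 = self.fc_subnet2(self.conv_subnet2(init_images))
        shared = self.fc6(self.conv_subnet3(init_images))
        return out1, out2, self.fc7(shared), self.fc8(shared)

net = PerceptionNet300()
outs = net(torch.randn(1, 3, 64, 64))
print([o.shape for o in outs])  # four heads, each [1, 10]
```

Note that fc7 and fc8 branch from fc6 in parallel, matching the description that the sixth fully connected layer is connected to both the seventh and the eighth fully connected layers; as described below for the face-orientation branch, their two outputs can jointly determine one result.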
The backbone network 310 is configured to perform convolution processing on the obtained first image to obtain M initialization images and output the M initialization images to each of the three convolutional subnets, where M is an integer greater than or equal to 1.
The first convolutional subnet 321 is configured to perform convolution and pooling processing on the M initialization images to obtain an image of detail features.
The first fully connected subnet 331 is configured to classify the image of detail features to obtain a classification result of the detail features.
In some embodiments, the image of detail features may be a detail-feature image of an initialization image, or a detail-feature image of one region of the initialization image. For example, when the perceptual network 300 performs face recognition, the initialization image may be a face image, and the left-eye region, the right-eye region, the nose region, the eyebrow region, and the mouth region may each be one region of the initialization image.
The first convolutional subnet 321 may perform convolution and pooling processing on the left-eye region of the initialization image to obtain an image of the detail features of the left-eye region, and the first fully connected subnet 331 may classify that image to obtain a classification result of the detail features of the left-eye region. The same processing applies to the right-eye region, the nose region, the eyebrow region, and the mouth region: in each case, the first convolutional subnet 321 obtains an image of the detail features of the region, and the first fully connected subnet 331 classifies that image to obtain a classification result of the detail features of that region.
The second convolutional subnet 322 is configured to perform convolution and pooling processing on the M initialization images to obtain an image of contour features.
The second fully connected subnet 332 is configured to classify the image of contour features to obtain a classification result of the contour features.
In some embodiments, when the perceptual network 300 performs face recognition, the initialization image may be a face image. The second convolutional subnet 322 may perform convolution and pooling processing on the face contour region to obtain an image of face contour features, and the second fully connected subnet 332 may classify that image to obtain a classification result of the face contour features.
The third convolutional subnet 323 is configured to perform convolution and pooling processing on the M initialization images to obtain an image of orientation features.
The third fully connected subnet 333 is configured to classify the image of orientation features to obtain a classification result of the orientation features.
In some embodiments, when the perceptual network 300 performs face recognition, the initialization image may be a face image. The third convolutional subnet 323 may perform convolution and pooling processing on the initialization image to obtain an image of face orientation features, and the third fully connected subnet 333 may classify that image to obtain a classification result of the face orientation features. Optionally, the classification result of the orientation features may be jointly determined from the classification result of the seventh fully connected layer and the classification result of the eighth fully connected layer.
In some embodiments, the orientation features may be represented by the pitch, yaw, and roll angles in the right-handed Cartesian coordinate system of three-dimensional space. As shown in Figure 4, the pitch angle is a rotation about the X axis, the yaw angle a rotation about the Y axis, and the roll angle a rotation about the Z axis. For detailed explanations of the pitch, yaw, and roll angles, refer to the prior art; details are not repeated here.
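For illustration only (these are the standard elementary rotation matrices, not part of the original disclosure), the three angles correspond to these rotations:

```python
import numpy as np

def rotation_matrices(pitch: float, yaw: float, roll: float):
    """Elementary rotations about X (pitch), Y (yaw), and Z (roll), in radians."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cr, sr = np.cos(roll), np.sin(roll)
    rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # about the X axis
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # about the Y axis
    rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # about the Z axis
    return rx, ry, rz

rx, ry, rz = rotation_matrices(np.radians(10), np.radians(20), np.radians(5))
head_pose = rz @ ry @ rx  # one common composition convention
print(head_pose)
```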
Thus, through the backbone network 310, the convolutional network 320, and the fully connected network 330 of the perceptual network 300, the detail features of different face regions, the face contour features, and the face orientation features undergo convolution-pooling and classification processing, so that high-precision facial keypoints and pose can be obtained, effectively improving image processing accuracy. A smaller mean error indicates higher accuracy and a better result.
Table 1 shows the mean error after image processing by the multi-task neural network and by the perceptual network according to the embodiments of this application.

Table 1

                          Multi-task neural network    Perceptual network
Normalized mean error     8.6                          7.2
As shown in (a) of Figure 5, the result of image processing by the multi-task neural network, relatively few facial keypoints are recognized; the accuracy of face image recognition is therefore low and the mean error is large. As shown in (b) of Figure 5, the result of image processing by the perceptual network, more facial keypoints are recognized; the accuracy of face image recognition is therefore high and the mean error is small.
Figure 6 is a flowchart of an image processing method according to an embodiment of this application. The method is applied to a perceptual network. The perceptual network includes a backbone network, a convolutional network, and a fully connected network; the convolutional network includes N convolutional subnets, and the fully connected network includes N fully connected subnets; the backbone network is connected to the N convolutional subnets, the N convolutional subnets are connected to the N fully connected subnets, and the i-th convolutional subnet is connected to the i-th fully connected subnet, where N is an integer greater than or equal to 2, i is an integer, and i∈[1,N]. For a detailed explanation of the perceptual network, refer to the foregoing description of the perceptual network 300; details are not repeated here. As shown in Figure 6, the method may include the following steps.
S601: The backbone network performs convolution processing on a first image to obtain M initialization images.
In some embodiments, the backbone network includes at least one convolutional layer. The backbone network may perform convolution processing on the first image according to the first through the K-th convolutional layers to obtain the M initialization images, where K is an integer greater than or equal to 1. For example, K is 3 or 5. Thus, by selecting the first K convolutional layers to convolve the first image, high-resolution feature images are preserved to the greatest extent, which helps improve image processing accuracy.
S602: The backbone network outputs the M initialization images to each of the N convolutional subnets.
The features of the M initialization images differ from one another, and M is an integer greater than or equal to 1.
S603: The i-th convolutional subnet performs convolution and pooling processing on the M initialization images to obtain an image of the i-th feature.
In some embodiments, the i-th convolutional subnet includes K_i convolutional layers and L_i pooling layers, where K_i and L_i are both integers greater than or equal to 1. The i-th convolutional subnet performs K_i convolution operations and L_i pooling operations on the M initialization images to obtain the image of the i-th feature. Because different convolutional subnets contain different numbers of convolutional and pooling layers, the same images undergo different numbers of convolution and pooling operations in different subnets, yielding images of different types of features while keeping the convolutional feature maps at a relatively high resolution, which effectively improves image processing accuracy.
S604: The i-th convolutional subnet outputs the image of the i-th feature to the i-th fully connected subnet.
S605: The i-th fully connected subnet classifies the image of the i-th feature to obtain a classification result of the i-th feature.
In some embodiments, the i-th fully connected subnet includes R_i fully connected layers, where R_i is an integer greater than or equal to 2. Because different fully connected subnets contain different numbers of fully connected layers, different feature images are classified by different fully connected subnets to obtain the classification results of the corresponding features, which ensures the classification of higher-resolution convolutional feature maps and effectively improves classification accuracy.
For other explanations of the image processing process, refer to the foregoing description of image processing by the perceptual network 300; details are not repeated here.
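Under the assumptions of the PerceptionNet300 sketch above (reusing its imports and the instantiated net), the flow S601 through S605 corresponds to one forward pass:

```python
# S601-S602: the backbone produces M initialization images and fans them out.
# S603-S604: each convolutional subnet convolves and pools its copy.
# S605: each fully connected subnet classifies its feature image.
image = torch.randn(1, 3, 64, 64)                 # a dummy first image
detail, contour, orient_a, orient_b = net(image)
print(detail.argmax(dim=1))                       # classification result of branch 1
```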
It can be understood that, to implement the functions in the foregoing embodiments, the image processing apparatus includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the units and method steps of the examples described in the embodiments disclosed in this application, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
Figures 7 and 8 are schematic structural diagrams of possible image processing apparatuses according to embodiments of this application. These image processing apparatuses can be used to implement the functions of the backbone network, the convolutional network, and the fully connected network in the foregoing method embodiments, and can therefore also achieve the beneficial effects of the foregoing method embodiments. In the embodiments of this application, the image processing apparatus may be a terminal device, or a module (for example, a chip) applied to a terminal device. The terminal device may be a device that performs image processing, face recognition, image classification, or the like, for example, a wearable device, an augmented reality (AR) device, or a virtual reality (VR) device.
A wearable device, also referred to as a wearable smart device, is the general term for wearable devices that apply wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothes or accessories. A wearable device is not merely a hardware device; it delivers powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include devices that are full-featured and large-sized and can implement complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to be used with other devices such as smartphones, for example, various smart bands and smart jewelry for monitoring physical signs.
As shown in Figure 7, the image processing apparatus 700 includes a processing unit 710 and a transceiver unit 720. The image processing apparatus 700 is configured to implement the functions of the backbone network, the convolutional network, and the fully connected network in the method embodiment shown in Figure 6.
When the image processing apparatus 700 is configured to implement the functions of the backbone network in the method embodiment shown in Figure 6: the transceiver unit 720 is configured to receive the first image and output the M initialization images to each of the N convolutional subnets, that is, perform S602; and the processing unit 710 is configured to perform convolution processing on the first image to obtain the M initialization images, that is, perform S601.
When the image processing apparatus 700 is configured to implement the functions of the convolutional network in the method embodiment shown in Figure 6: the transceiver unit 720 is configured to receive the M initialization images and output the image of the i-th feature to the i-th fully connected subnet, that is, perform S604; and the processing unit 710 is configured to perform convolution and pooling processing on the M initialization images to obtain the image of the i-th feature, that is, perform S603.
When the image processing apparatus 700 is configured to implement the functions of the fully connected network in the method embodiment shown in Figure 6: the transceiver unit 720 is configured to receive the image of the i-th feature; and the processing unit 710 is configured to classify the image of the i-th feature to obtain the classification result of the i-th feature.
More detailed descriptions of the processing unit 710 and the transceiver unit 720 can be obtained directly from the related descriptions in the method embodiment shown in Figure 6 and are not repeated here.
As shown in Figure 8, the image processing apparatus 800 includes a processor 810 and an interface circuit 820. The processor 810 and the interface circuit 820 are coupled to each other. It can be understood that the interface circuit 820 may be a transceiver or an input/output interface. Optionally, the image processing apparatus 800 may further include a memory 830 configured to store instructions executed by the processor 810, or to store input data required by the processor 810 to run the instructions, or to store data generated after the processor 810 runs the instructions.
When the image processing apparatus 800 is configured to implement the method shown in Figure 6, the processor 810 is configured to perform the functions of the processing unit 710, and the interface circuit 820 is configured to perform the functions of the transceiver unit 720.
When the image processing apparatus is a chip applied to a terminal device, the terminal device chip implements the functions of the terminal device in the foregoing method embodiments. The terminal device chip receives information from other modules in the terminal device (such as a network interface card, a port, or a camera), where the information is image information.
It can be understood that the processor in the embodiments of this application may be a central processing unit (CPU) or a graphics processing unit (GPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
The method steps in the embodiments of this application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from and write information to the storage medium. Certainly, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC. In addition, the ASIC may be located in a network device or a terminal device. Certainly, the processor and the storage medium may also exist as discrete components in a network device or a terminal device.
In the foregoing embodiments, the implementation may be entirely or partially realized by software, hardware, firmware, or any combination thereof. When software is used, the implementation may be entirely or partially in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are executed entirely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer programs or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium, for example, a floppy disk, a hard disk, or a magnetic tape; an optical medium, for example, a digital video disc (DVD); or a semiconductor medium, for example, a solid state drive (SSD).
In the embodiments of this application, unless otherwise specified or in case of logical conflict, the terms and/or descriptions in different embodiments are consistent and may be referenced by one another, and the technical features in different embodiments may be combined into new embodiments according to their inherent logical relationships.
In this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the cases where only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. In the text descriptions of this application, the character "/" generally indicates an "or" relationship between the associated objects; in the formulas of this application, the character "/" indicates a "division" relationship between the associated objects.
It can be understood that the various numerical designations involved in the embodiments of this application are merely distinctions made for ease of description and are not intended to limit the scope of the embodiments of this application. The sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic.

Claims (15)

  1. A perceptual network, wherein the perceptual network comprises a backbone network, a convolutional network, and a fully connected network; the convolutional network comprises N convolutional subnets; the fully connected network comprises N fully connected subnets; the backbone network is connected to the N convolutional subnets; the N convolutional subnets are connected to the N fully connected subnets; and an i-th convolutional subnet is connected to an i-th fully connected subnet, where N is an integer greater than or equal to 2, i is an integer, and i∈[1,N];
    the backbone network is configured to perform convolution processing on a first image to obtain M initialization images and output the M initialization images to each of the N convolutional subnets, wherein the features of the M initialization images differ from one another, and M is an integer greater than or equal to 1;
    the i-th convolutional subnet is configured to perform convolution and pooling processing on the M initialization images to obtain an image of an i-th feature and output the image of the i-th feature to the i-th fully connected subnet; and
    the i-th fully connected subnet is configured to classify the image of the i-th feature to obtain a classification result of the i-th feature.
  2. The perceptual network according to claim 1, wherein the backbone network comprises at least one convolutional layer, and the backbone network is configured to:
    perform convolution processing on the first image according to a first convolutional layer through a K-th convolutional layer to obtain the M initialization images, where K is an integer greater than or equal to 1.
  3. The perceptual network according to claim 2, wherein K is 3 or 5.
  4. The perceptual network according to any one of claims 1 to 3, wherein the i-th convolutional subnet comprises K_i convolutional layers and L_i pooling layers, where K_i and L_i are both integers greater than or equal to 1;
    the i-th convolutional subnet is configured to perform K_i convolution operations and L_i pooling operations on the M initialization images to obtain the image of the i-th feature; and
    the i-th fully connected subnet comprises R_i fully connected layers, where R_i is an integer greater than or equal to 2.
  5. The perceptual network according to claim 4, wherein N=3; a first convolutional subnet comprises a first convolutional layer, a second convolutional layer, and a first pooling layer; a second convolutional subnet comprises a third convolutional layer and a second pooling layer; a third convolutional subnet comprises a fourth convolutional layer, a fifth convolutional layer, and a third pooling layer; a first fully connected subnet comprises a first fully connected layer, a second fully connected layer, and a third fully connected layer; a second fully connected subnet comprises a fourth fully connected layer and a fifth fully connected layer; and a third fully connected subnet comprises a sixth fully connected layer, a seventh fully connected layer, and an eighth fully connected layer; wherein
    the backbone network is connected to the first convolutional layer, the first convolutional layer is connected to the second convolutional layer, the second convolutional layer is connected to the first pooling layer, the first pooling layer is connected to the first fully connected layer, the first fully connected layer is connected to the second fully connected layer, and the second fully connected layer is connected to the third fully connected layer;
    the backbone network is connected to the third convolutional layer, the third convolutional layer is connected to the second pooling layer, the second pooling layer is connected to the fourth fully connected layer, and the fourth fully connected layer is connected to the fifth fully connected layer; and
    the backbone network is connected to the fourth convolutional layer, the fourth convolutional layer is connected to the third pooling layer, the third pooling layer is connected to the fifth convolutional layer, the fifth convolutional layer is connected to the sixth fully connected layer, and the sixth fully connected layer is connected to the seventh fully connected layer and to the eighth fully connected layer.
  6. The perceptual network according to claim 5, wherein the value T of the convolution kernels of the first convolutional subnet is 128, 256, or 512; the value of the convolution kernels of the second convolutional subnet is 0.5*T; the value of the convolution kernels of the third convolutional subnet is 2*T; and the dimension of each of the first through eighth fully connected layers is 1024, 2048, or 4096.
  7. An image processing method, applied to a perceptual network, wherein the perceptual network comprises a backbone network, a convolutional network, and a fully connected network; the convolutional network comprises N convolutional subnets; the fully connected network comprises N fully connected subnets; the backbone network is connected to the N convolutional subnets; the N convolutional subnets are connected to the N fully connected subnets; and an i-th convolutional subnet is connected to an i-th fully connected subnet, where N is an integer greater than or equal to 2, i is an integer, and i∈[1,N]; and the method comprises:
    performing, by the backbone network, convolution processing on a first image to obtain M initialization images, and outputting the M initialization images to each of the N convolutional subnets, wherein the features of the M initialization images differ from one another, and M is an integer greater than or equal to 1;
    performing, by the i-th convolutional subnet, convolution and pooling processing on the M initialization images to obtain an image of an i-th feature, and outputting the image of the i-th feature to the i-th fully connected subnet; and
    classifying, by the i-th fully connected subnet, the image of the i-th feature to obtain a classification result of the i-th feature.
  8. The method according to claim 7, wherein the backbone network comprises at least one convolutional layer, and performing, by the backbone network, convolution processing on the first image to obtain the M initialization images comprises:
    performing convolution processing on the first image according to a first convolutional layer through a K-th convolutional layer to obtain the M initialization images, where K is an integer greater than or equal to 1.
  9. The method according to claim 8, wherein K is 3 or 5.
  10. The method according to any one of claims 7 to 9, wherein the i-th convolutional subnet comprises K_i convolutional layers and L_i pooling layers, where K_i and L_i are both integers greater than or equal to 1;
    performing, by the i-th convolutional subnet, convolution and pooling processing on the M initialization images to obtain the image of the i-th feature comprises:
    performing, by the i-th convolutional subnet, K_i convolution operations and L_i pooling operations on the M initialization images to obtain the image of the i-th feature; and
    the i-th fully connected subnet comprises R_i fully connected layers, where R_i is an integer greater than or equal to 2.
  11. The method according to claim 10, wherein N=3; a first convolutional subnet comprises a first convolutional layer, a second convolutional layer, and a first pooling layer; a second convolutional subnet comprises a third convolutional layer and a second pooling layer; a third convolutional subnet comprises a fourth convolutional layer, a fifth convolutional layer, and a third pooling layer; a first fully connected subnet comprises a first fully connected layer, a second fully connected layer, and a third fully connected layer; a second fully connected subnet comprises a fourth fully connected layer and a fifth fully connected layer; and a third fully connected subnet comprises a sixth fully connected layer, a seventh fully connected layer, and an eighth fully connected layer; wherein
    the backbone network is connected to the first convolutional layer, the first convolutional layer is connected to the second convolutional layer, the second convolutional layer is connected to the first pooling layer, the first pooling layer is connected to the first fully connected layer, the first fully connected layer is connected to the second fully connected layer, and the second fully connected layer is connected to the third fully connected layer;
    the backbone network is connected to the third convolutional layer, the third convolutional layer is connected to the second pooling layer, the second pooling layer is connected to the fourth fully connected layer, and the fourth fully connected layer is connected to the fifth fully connected layer; and
    the backbone network is connected to the fourth convolutional layer, the fourth convolutional layer is connected to the third pooling layer, the third pooling layer is connected to the fifth convolutional layer, the fifth convolutional layer is connected to the sixth fully connected layer, and the sixth fully connected layer is connected to the seventh fully connected layer and to the eighth fully connected layer.
  12. The method according to claim 11, wherein the value T of the convolution kernels of the first convolutional subnet is 128, 256, or 512; the value of the convolution kernels of the second convolutional subnet is 0.5*T; the value of the convolution kernels of the third convolutional subnet is 2*T; and the dimension of each of the first through eighth fully connected layers is 1024, 2048, or 4096.
  13. An image processing apparatus, comprising modules configured to perform the image processing method according to any one of claims 7 to 12.
  14. An image processing apparatus, comprising at least one processor, a memory, a bus, and a sensor, wherein the memory is configured to store a computer program that, when executed by the at least one processor, implements the method according to any one of claims 7 to 12.
  15. A computer-readable storage medium, wherein the storage medium stores a computer program or instructions that, when executed by a communication apparatus, implement the method according to any one of claims 7 to 12.
PCT/CN2019/121373 2019-11-27 2019-11-27 Perceptual network and image processing method WO2021102762A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/121373 WO2021102762A1 (zh) 2019-11-27 2019-11-27 Perceptual network and image processing method
CN201980101029.XA CN114467121A (zh) 2019-11-27 2019-11-27 Perceptual network and image processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/121373 WO2021102762A1 (zh) 2019-11-27 2019-11-27 Perceptual network and image processing method

Publications (1)

Publication Number Publication Date
WO2021102762A1 (zh)

Family

ID=76128721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121373 WO2021102762A1 (zh) 2019-11-27 2019-11-27 Perceptual network and image processing method

Country Status (2)

Country Link
CN (1) CN114467121A (zh)
WO (1) WO2021102762A1 (zh)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105938560A (zh) * 2016-03-23 2016-09-14 吉林大学 Fine-grained vehicle model classification system based on a convolutional neural network
CN106934404A (zh) * 2017-03-10 2017-07-07 深圳市瀚晖威视科技有限公司 Image flame recognition system based on a CNN convolutional neural network
CN107403197A (zh) * 2017-07-31 2017-11-28 武汉大学 Crack identification method based on deep learning
US20190065817A1 (en) * 2017-08-29 2019-02-28 Konica Minolta Laboratory U.S.A., Inc. Method and system for detection and classification of cells using convolutional neural networks
CN108875674A (zh) * 2018-06-29 2018-11-23 东南大学 Driver behavior recognition method based on a multi-column fusion convolutional neural network
CN109359666A (zh) * 2018-09-07 2019-02-19 佳都新太科技股份有限公司 Vehicle model recognition method based on a multi-feature fusion neural network, and processing terminal
CN110209844A (zh) * 2019-05-17 2019-09-06 腾讯音乐娱乐科技(深圳)有限公司 Multimedia data matching method, apparatus, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, CHUANPENG ET AL.: "Research on Image Denoising Based on Deep Convolutional Neural Network", COMPUTER ENGINEERING, vol. 43, no. 3, 15 March 2017 (2017-03-15), pages 253-260, XP055816603 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762342A (zh) * 2021-08-04 2021-12-07 北京旷视科技有限公司 Data processing method and apparatus, electronic device, and storage medium
CN113762342B (zh) * 2021-08-04 2024-03-29 北京旷视科技有限公司 Data processing method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN114467121A (zh) 2022-05-10

Similar Documents

Publication Publication Date Title
WO2020238293A1 (zh) Image classification method, and neural network training method and apparatus
US10452979B2 (en) 2019-10-22 Convolution neural network training apparatus and method thereof
CN109685819B (zh) Three-dimensional medical image segmentation method based on feature enhancement
US11328172B2 (en) 2022-05-10 Method for fine-grained sketch-based scene image retrieval
US20210398252A1 (en) Image denoising method and apparatus
CN109902548B (zh) Object attribute recognition method and apparatus, computing device, and system
US20210264144A1 (en) Human pose analysis system and method
CN108596833A (zh) Super-resolution image reconstruction method, apparatus, device, and readable storage medium
EP4006776A1 (en) Image classification method and apparatus
WO2020098257A1 (zh) Image classification method and apparatus, and computer-readable storage medium
US20220148291A1 (en) Image classification method and apparatus, and image classification model training method and apparatus
CN110222718B (zh) Image processing method and apparatus
US20220262093A1 (en) Object detection method and system, and non-transitory computer-readable medium
CN111209873A (zh) High-precision facial keypoint localization method and system based on deep learning
CN112651380A (zh) Face recognition method and apparatus, terminal device, and storage medium
Jia et al. Skin lesion classification using class activation map
WO2021102762A1 (zh) Perceptual network and image processing method
US20230401838A1 (en) Image processing method and related apparatus
Sardeshmukh et al. Crop image classification using convolutional neural network
WO2023207531A1 (zh) Image processing method and related device
JP7225731B2 (ja) Imaging of multivariable data sequences
US11977979B2 (en) Adaptive bounding for three-dimensional morphable models
WO2022227024A1 (zh) Operation method, training method, and apparatus for a neural network model
TWI728791B (zh) Image semantic segmentation method and apparatus, and storage medium
US20230410447A1 (en) View dependent three-dimensional morphable models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19954678

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19954678

Country of ref document: EP

Kind code of ref document: A1