WO2022213843A1 - An image processing method, training method and device - Google Patents

An image processing method, training method and device

Info

Publication number
WO2022213843A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
feature
layer
neural network
convolutional neural
Prior art date
Application number
PCT/CN2022/083614
Other languages
English (en)
French (fr)
Inventor
赵寅
哈米杜林维亚切斯拉夫
杨海涛
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from RU2021109673A external-priority patent/RU2773420C1/ru
Application filed by 华为技术有限公司
Priority to CN202280008179.8A priority Critical patent/CN116635895A/zh
Priority to EP22783910.7A priority patent/EP4303818A1/en
Publication of WO2022213843A1 publication Critical patent/WO2022213843A1/zh
Priority to US18/481,096 priority patent/US20240029406A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Definitions

  • the present application relates to the field of image processing, and in particular, to an image processing method, training method and device.
  • Images are the visual basis for human perception of the world. Humans can use images to obtain information, express information and transmit information. In order to quickly obtain image information, neural networks can be used to process images to achieve functions such as image classification, face recognition, and target detection. Usually, the device on the device side sends image data to the device on the cloud side where the neural network is deployed, and the device on the cloud side performs image processing. However, the amount of image data is large, resulting in high latency in device-cloud interaction.
  • the current technical solution provides a device-cloud collaboration solution based on feature map transmission.
  • The device on the device side extracts the original feature map of the image to be processed, and uses the principal component analysis (PCA) method to extract multiple principal components of the original feature map.
  • The terminal-side device sends the linear combination of the multiple principal components to the cloud-side device, and the cloud-side device obtains a reconstructed feature map according to the multiple principal components and obtains an image processing result according to the reconstructed feature map.
  • However, the amount of data of the multiple principal components is still large, so it takes a long time for the cloud-side device to receive the multiple principal components.
  • the present application provides an image processing method, training method and device, which solve the problem of a large amount of transmitted data during image processing.
  • The present application provides an image processing method. The method can be applied to a sending node, or to a communication device that can support a terminal device to implement the method (for example, the communication device includes a chip system). The method includes:
  • The sending node obtains the image to be processed and inputs it into the convolutional neural network. The feature extraction layer included in the convolutional neural network performs feature extraction on the image to be processed to obtain a first feature map, and the feature compression layer included in the convolutional neural network compresses the first feature map to obtain a second feature map, where the number of channels of the second feature map is smaller than the number of channels of the first feature map.
  • The sending node sends the second feature map to the receiving node. Since the feature extraction layer is used to extract the features of the image to be processed to obtain the first feature map, the data volume of the first feature map is smaller than that of the image to be processed. In addition, the first feature map is compressed by the feature compression layer so that the number of channels of the second feature map is smaller than the number of channels of the first feature map. Provided that the resolution of the first feature map is not increased, the data volume of the second feature map is therefore smaller than the data volume of the first feature map, which further reduces the data volume of the feature map sent by the sending node to the receiving node and reduces the transmission delay between the terminal side and the cloud side.
  • The resolution of the first feature map is W×H, and the resolution of the second feature map is W'×H', where W'×H' ≤ W×H.
  • The data volume of a feature map is determined by the product of its resolution and its number of channels. For example, when the resolution of the second feature map is smaller than that of the first feature map and the number of channels of the second feature map is smaller than the number of channels of the first feature map, the data volume of the second feature map is smaller than the data volume of the first feature map, which reduces the data volume of the feature map sent by the sending node to the receiving node and reduces the transmission delay between the device side and the cloud side.
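  • As a minimal numeric illustration of this relationship (the dimensions below are hypothetical values chosen only for the example, not values taken from this application):

```python
# Data volume of a feature map = resolution (W x H) x number of channels (C).
# Example dimensions are hypothetical, only to illustrate the channel reduction.
W, H, C1 = 56, 56, 256          # first feature map: W x H x C1
W2, H2, C2 = 56, 56, 64         # second feature map: same resolution, fewer channels

volume_first = W * H * C1       # 802,816 elements
volume_second = W2 * H2 * C2    # 200,704 elements

print(volume_second < volume_first)  # True: fewer channels -> less data to transmit
```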
  • the feature compression layer includes at least one convolutional layer.
  • the convolutional layer can be used to downsample the first feature map to reduce the resolution of the first feature map.
  • the convolutional layer can also be used to reduce the number of channels of the first feature map to obtain the second feature map.
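  • A minimal sketch of such a feature compression layer, assuming PyTorch and illustrative channel counts (none of the sizes below are prescribed by this application): a single stride-2 convolution that both downsamples the first feature map and reduces its number of channels.

```python
import torch
import torch.nn as nn

# Hypothetical feature compression layer: one stride-2 convolution that
# downsamples the first feature map and reduces its number of channels.
compress = nn.Conv2d(in_channels=256, out_channels=64,
                     kernel_size=3, stride=2, padding=1)

first_feature_map = torch.randn(1, 256, 56, 56)   # N x C1 x H x W (example values)
second_feature_map = compress(first_feature_map)
print(second_feature_map.shape)                    # torch.Size([1, 64, 28, 28])
```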
  • the method further includes: the sending node receives an image processing result of the image to be processed, where the image processing result is determined by the receiving node according to the second feature map.
  • The sending node performs feature extraction and compression on the image to be processed to obtain the second feature map, and the receiving node uses the second feature map to determine the image processing result of the image to be processed, which realizes the device-cloud interaction of the image processing method and overcomes the defect that the computing power and storage of the sending node are insufficient.
  • the method further includes: the sending node displays the image processing result.
  • The present application provides an image processing method. The method can be applied to a receiving node, or to a communication device that can support a terminal device to implement the method (for example, the communication device includes a chip system). The method includes:
  • The receiving node receives the second feature map and uses the feature reconstruction layer included in the convolutional neural network to reconstruct the second feature map to obtain a third feature map. The receiving node uses the feature output layer and the image processing layer included in the convolutional neural network to process the third feature map to obtain an image processing result, and the receiving node also sends the image processing result, where the image processing result indicates the information of the image to be processed.
  • The second feature map is obtained by the sending node using the convolutional neural network to extract the features of the image to be processed to obtain the first feature map and compressing the first feature map. The number of channels of the second feature map is smaller than the number of channels of the third feature map, and the number of channels of the first feature map is the same as the number of channels of the third feature map.
  • The receiving node only needs to determine the image processing result of the image to be processed according to the second feature map sent by the sending node, and the number of channels of the second feature map is smaller than that of the third feature map required for image processing. This reduces the amount of data received by the receiving node from the sending node without increasing the resolution of the second feature map, thereby reducing the transmission delay between the terminal side and the cloud side.
  • the feature reconstruction layer includes at least one deconvolution layer.
  • the deconvolution layer can be used to upsample the second feature map to increase the resolution of the second feature map.
  • the deconvolution layer can also be used to increase the number of channels of the second feature map to obtain the third feature map.
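  • A corresponding sketch of a feature reconstruction layer, again assuming PyTorch and illustrative sizes: a transposed (de)convolution that upsamples the second feature map and increases its channel count.

```python
import torch
import torch.nn as nn

# Hypothetical feature reconstruction layer: one transposed convolution that
# upsamples the second feature map and increases its number of channels.
reconstruct = nn.ConvTranspose2d(in_channels=64, out_channels=256,
                                 kernel_size=4, stride=2, padding=1)

second_feature_map = torch.randn(1, 64, 28, 28)
third_feature_map = reconstruct(second_feature_map)
print(third_feature_map.shape)   # torch.Size([1, 256, 56, 56])
```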
  • The resolution of the second feature map is W'×H', and the resolution of the third feature map is W×H, where W'×H' ≤ W×H.
  • The data volume of a feature map is determined by the product of its resolution and its number of channels. For example, when the resolution of the second feature map is smaller than that of the first feature map and the number of channels of the second feature map is smaller than the number of channels of the first feature map, the data volume of the second feature map is smaller than the data volume of the first feature map.
  • the receiving node receiving the second feature map includes: the receiving node receiving the second feature map sent by the sending node.
  • Sending the image processing result by the receiving node includes: the receiving node sending the image processing result to the sending node.
  • The present application also provides a method for training a convolutional neural network. The training method can be applied to a communication device that can support a terminal device to implement the method (for example, the communication device includes a chip system). The method includes: acquiring a training set of at least one training image, and training the first bottleneck structure layer of the first convolutional neural network according to the training set, the first feature extraction layer and the first feature output layer of the first convolutional neural network, to obtain a second convolutional neural network. The second convolutional neural network includes the first feature extraction layer, a second bottleneck structure layer and the first feature output layer; the first feature extraction layer is used to perform feature extraction on the image to be processed to obtain a first feature map, the feature compression layer in the second bottleneck structure layer is used to compress the first feature map to obtain a second feature map, and the number of channels of the second feature map is smaller than the number of channels of the first feature map.
  • When the convolutional neural network extracts the feature map of the image to be processed, the resolution of the image to be processed is reduced or unchanged.
  • The network can perform feature extraction and compression on the image to be processed, reducing the number of channels of the feature map of the image to be processed and thereby reducing the amount of feature map data sent by the sending node to the receiving node.
  • Since the first convolutional neural network and the second convolutional neural network have the same first feature extraction layer and first feature output layer, only the bottleneck structure layer needs to be trained during the training process of the convolutional neural network, which reduces the computational resources required to train the convolutional neural network.
  • Training the first bottleneck structure layer of the first convolutional neural network to obtain the second convolutional neural network includes: inputting the training set into the third convolutional neural network to obtain a first set, and inputting the training set into the first convolutional neural network to obtain a second set; further, calculating a loss function according to the fourth feature map in the first set and the fifth feature map in the second set, and updating the parameters of the first bottleneck structure layer according to the loss function to obtain a second bottleneck structure layer and thus the second convolutional neural network.
  • The third convolutional neural network includes a second feature extraction layer and a second feature output layer; the parameters of the first feature extraction layer are the same as the parameters of the second feature extraction layer, and the parameters of the first feature output layer are the same as the parameters of the second feature output layer.
  • The fourth feature map included in the first set is obtained after the second feature extraction layer and the second feature output layer perform feature extraction on the training image; the fifth feature map included in the second set is obtained after the first bottleneck structure layer and the first feature output layer reconstruct and process the second feature map.
  • The training method provided by this application calculates the loss function by comparing the distance between the corresponding multi-layer feature maps (the fourth feature map and the fifth feature map) of the first convolutional neural network and the third convolutional neural network to obtain the second convolutional neural network, which helps keep the distance between the fourth feature map and the fifth feature map as small as possible, thereby reducing the error between the first feature map and the third feature map and improving the accuracy of image processing.
  • Inputting the training set into the first convolutional neural network to obtain the second set includes: using the first feature extraction layer to perform feature extraction on the training image to obtain the first feature map; using the feature compression layer included in the first bottleneck structure layer to compress the first feature map to obtain a sixth feature map; using the feature reconstruction layer included in the first bottleneck structure layer to reconstruct the sixth feature map to obtain a third feature map; and then using the second feature output layer to process the third feature map to obtain the fifth feature map included in the second set.
  • the number of channels of the third feature map is the same as the number of channels of the first feature map; the number of channels of the sixth feature map is smaller than the number of channels of the first feature map.
  • Calculating the loss function according to the fourth feature map in the first set and the fifth feature map in the second set includes: acquiring a first distance between the fourth feature map and the fifth feature map, acquiring a second distance between the first feature map and the third feature map, and calculating the loss function according to the first distance and the second distance.
  • Adding the second distance between the first feature map and the third feature map to the calculation of the loss function helps keep the distance between the fourth feature map and the fifth feature map as small as possible and the distance between the first feature map and the third feature map as small as possible, which reduces the processing error of the feature compression layer and the feature reconstruction layer and improves the accuracy of image processing.
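  • A minimal sketch of such a loss, assuming mean-squared error as the distance measure (this application does not fix a particular distance) and placeholder modules for the frozen layers and the bottleneck being trained:

```python
import torch
import torch.nn.functional as F

def training_loss(extract, compress, reconstruct, output, extract_ref, output_ref, image):
    """Hypothetical loss combining the two distances described above.

    extract / output        : frozen first feature extraction / output layers
    compress / reconstruct  : first bottleneck structure layer (being trained)
    extract_ref / output_ref: second feature extraction / output layers (same parameters)
    """
    f1 = extract(image)                  # first feature map
    f6 = compress(f1)                    # sixth feature map (compressed)
    f3 = reconstruct(f6)                 # third feature map (reconstructed)
    f5 = output(f3)                      # fifth feature map (second set)
    f4 = output_ref(extract_ref(image))  # fourth feature map (first set)

    d1 = F.mse_loss(f5, f4)              # first distance: fourth vs fifth feature map
    d2 = F.mse_loss(f3, f1)              # second distance: first vs third feature map
    return d1 + d2                       # loss used to update only the bottleneck layer
```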
  • The resolution of the first feature map is W×H, and the resolution of the second feature map is W'×H', where W'×H' ≤ W×H.
  • the present application further provides an image processing apparatus, and the beneficial effects can be found in the description of any aspect of the first aspect, which will not be repeated here.
  • the image processing apparatus has the function of implementing the behavior in the method example of any one of the above-mentioned first aspects.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • The image processing apparatus is applied to a sending node, and the image processing apparatus includes: a transceiver unit, configured to acquire an image to be processed; a feature extraction unit, configured to use the feature extraction layer included in a convolutional neural network to perform feature extraction on the image to be processed to obtain a first feature map; a feature compression unit, configured to use the feature compression layer included in the convolutional neural network to compress the first feature map to obtain a second feature map, where the number of channels of the second feature map is smaller than the number of channels of the first feature map; the transceiver unit is further configured to send the second feature map to the receiving node.
  • the feature compression layer includes at least one convolutional layer.
  • The resolution of the first feature map is W×H, and the resolution of the second feature map is W'×H', where W'×H' ≤ W×H.
  • the transceiver unit is further configured to receive an image processing result of the image to be processed, and the image processing result is determined by the receiving node according to the second feature map.
  • the image processing apparatus further includes: a display unit, configured to display the image to be processed and/or the image processing result.
  • the present application further provides another image processing apparatus, and the beneficial effects can be found in the description of any aspect of the second aspect, which will not be repeated here.
  • the image processing apparatus has the function of implementing the behavior in the method example of any one of the above-mentioned second aspects.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • The image processing apparatus is applied to a receiving node, and the image processing apparatus includes: a transceiver unit, configured to receive a second feature map, where the second feature map is obtained by the sending node performing feature extraction on the image to be processed using a convolutional neural network to obtain a first feature map and compressing the first feature map; a feature reconstruction unit, configured to reconstruct the second feature map using the feature reconstruction layer included in the convolutional neural network to obtain a third feature map, where the number of channels of the second feature map is smaller than the number of channels of the third feature map, and the number of channels of the first feature map is the same as the number of channels of the third feature map; an image processing unit, configured to process the third feature map using the feature output layer and the image processing layer included in the convolutional neural network to obtain an image processing result, where the image processing result indicates the information of the image to be processed; the transceiver unit is further configured to send the image processing result.
  • the feature reconstruction layer includes at least one deconvolution layer.
  • The resolution of the second feature map is W'×H', and the resolution of the third feature map is W×H, where W'×H' ≤ W×H.
  • the transceiver unit is specifically configured to receive the second feature map sent by the sending node; the transceiver unit is specifically configured to send the image processing result to the sending node.
  • The present application further provides a training device for a convolutional neural network. For beneficial effects, reference may be made to the description of any aspect of the third aspect, which will not be repeated here.
  • The training device has the function of implementing the behavior in the method example of any one of the above-mentioned third aspects.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • The training device includes: an acquisition unit, configured to acquire a training set, where the training set includes at least one training image; a processing unit, configured to train the first bottleneck structure layer of the first convolutional neural network according to the training set, the first feature extraction layer and the first feature output layer of the first convolutional neural network, to obtain a second convolutional neural network, where the second convolutional neural network includes the first feature extraction layer, a second bottleneck structure layer and the first feature output layer; the first feature extraction layer is used to extract the features of the image to be processed to obtain a first feature map, the feature compression layer in the second bottleneck structure layer is used to compress the first feature map to obtain a second feature map, and the number of channels of the second feature map is smaller than the number of channels of the first feature map.
  • The processing unit includes: a first training unit, configured to input the training set into a third convolutional neural network to obtain a first set, where the third convolutional neural network includes a second feature extraction layer and a second feature output layer, the parameters of the first feature extraction layer are the same as the parameters of the second feature extraction layer, the parameters of the first feature output layer are the same as the parameters of the second feature output layer, the first set includes a fourth feature map, and the fourth feature map is obtained after the second feature extraction layer and the second feature output layer perform feature extraction on the training image;
  • a second training unit, configured to input the training set into the first convolutional neural network to obtain a second set, where the second set includes a fifth feature map, and the fifth feature map is obtained by the first bottleneck structure layer and the first feature output layer reconstructing and processing the second feature map;
  • a loss calculation unit, configured to calculate a loss function according to the fourth feature map in the first set and the fifth feature map in the second set;
  • a third training unit, configured to update the parameters of the first bottleneck structure layer according to the loss function to obtain a second bottleneck structure layer and thus the second convolutional neural network.
  • Inputting the training set into the first convolutional neural network to obtain the second set includes: using the first feature extraction layer to perform feature extraction on the training image to obtain the first feature map; using the feature compression layer included in the first bottleneck structure layer to compress the first feature map to obtain a sixth feature map, where the number of channels of the sixth feature map is smaller than the number of channels of the first feature map; using the feature reconstruction layer included in the first bottleneck structure layer to reconstruct the sixth feature map to obtain the third feature map, where the number of channels of the third feature map is the same as the number of channels of the first feature map; and using the second feature output layer to process the third feature map to obtain the fifth feature map included in the second set.
  • The loss calculation unit is specifically configured to obtain the first distance between the fourth feature map and the fifth feature map, obtain the second distance between the first feature map and the third feature map, and calculate the loss function according to the first distance and the second distance.
  • The resolution of the first feature map is W×H, and the resolution of the second feature map is W'×H', where W'×H' ≤ W×H.
  • The present application further provides a communication device, comprising a processor and an interface circuit, where the interface circuit is configured to receive signals from communication devices other than the communication device and transmit them to the processor, or to send signals from the processor to communication devices other than the communication device.
  • The processor is configured to implement, through logic circuits or by executing code instructions, the operation steps of the method of the first aspect and any possible implementation manner of the first aspect, or of the second aspect and any possible implementation manner of the second aspect, or of the third aspect and any possible implementation manner of the third aspect.
  • The present application provides a computer-readable storage medium in which a computer program or instructions are stored. When the computer program or instructions are executed by a communication device, the operation steps of the method of the first aspect and any possible implementation manner of the first aspect, or of the second aspect and any possible implementation manner of the second aspect, or of the third aspect and any possible implementation manner of the third aspect are implemented.
  • The present application provides a computer program product that, when run on a computer, enables a computing device to implement the operation steps of the method of the first aspect and any possible implementation manner of the first aspect, or of the second aspect and any possible implementation manner of the second aspect, or of the third aspect and any possible implementation manner of the third aspect.
  • The present application provides a chip, including a memory and a processor, where the memory is used for storing computer instructions and the processor is used for calling and running the computer instructions from the memory, so as to execute the operation steps of the method of the first aspect and any possible implementation manner of the first aspect, or of the second aspect and any possible implementation manner of the second aspect, or of the third aspect and any possible implementation manner of the third aspect.
  • On the basis of the implementations provided by the above aspects, the present application may further combine them to provide more implementation manners.
  • FIG. 1 is a system schematic diagram of a device-cloud collaboration solution provided by the present application.
  • FIG. 2 is a schematic flowchart of an image processing method provided by the present application.
  • FIG. 3 is a schematic structural diagram of a convolutional neural network provided by the application.
  • FIG. 4 is a schematic structural diagram of a convolutional neural network provided by the prior art
  • FIG. 5 is a schematic display diagram of an image processing provided by the present application.
  • FIG. 6 is a schematic flowchart of a training method provided by the application.
  • FIG. 7 is a first schematic diagram of training of a convolutional neural network provided by the application.
  • FIG. 8A is a second schematic diagram of training of a convolutional neural network provided by the application.
  • FIG. 8B is a third schematic diagram of training of a convolutional neural network provided by the application.
  • FIG. 10 is a schematic structural diagram of a training device and an image processing device provided by the application.
  • FIG. 11 is a schematic structural diagram of a communication device provided by the present application.
  • Words such as “exemplary” or “for example” are used to represent examples, illustrations or explanations. Any embodiment or design described in the embodiments of the present application as “exemplary” or “for example” should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “for example” is intended to present the related concepts in a specific manner.
  • FIG. 1 is a system schematic diagram of a device-cloud collaboration solution provided by the present application.
  • the system includes a device-side device 110 , an edge device 120 , and a cloud-side device 130 .
  • the end-side device 110 may be connected with the edge device 120 in a wireless or wired manner.
  • the terminal-side device 110 may be connected with the cloud-side device 130 in a wireless or wired manner.
  • the edge device 120 may be connected to the cloud-side device 130 in a wireless or wired manner.
  • the terminal-side device 110, the edge device 120, and the cloud-side device 130 may all communicate through a network, and the network may be an internet network.
  • the end-side device 110 may be a terminal device, a user equipment (UE), a mobile station (mobile station, MS), a mobile terminal (mobile terminal, MT), and the like.
  • For example, the end-side device 110 may be a mobile phone (the terminal 111 shown in FIG. 1), a tablet computer (the terminal 112 shown in FIG. 1), a computer with a wireless transceiving function (the terminal 113 shown in FIG. 1), a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving (such as the terminal 114 shown in FIG. 1), and the like.
  • The terminal 114 may be a device for image processing in an automatic driving system.
  • The terminal 115 may be a camera device for road monitoring.
  • the terminal 116 may be a collection device (such as a camera) for face recognition.
  • the embodiments of the present application do not limit the specific technology and specific device form adopted by the end-side device.
  • the terminal-side device 110 may use an artificial intelligence (artificial intelligence, AI) neural network to perform image processing on the image to be processed.
  • The image to be processed may be collected by the terminal-side device 110, or may be collected by an image acquisition device communicatively connected to the terminal-side device 110; the image acquisition device may be a video camera, a camera, or the like.
  • the image to be processed may be an image collected by a camera, and the image to be processed may also be a frame of image in a video collected by a camera.
  • the terminal-side device 110 may transmit the image to the edge device 120 or the cloud-side device 130, and the edge device 120 or the cloud-side device 130 runs an AI neural network to process the image to obtain an image processing result.
  • For example, the terminal 116 used for road monitoring collects a road image when the terminal 114 (such as a car or a truck) passes through the intersection, and sends the road image to the edge device 120.
  • The edge device 120 runs an AI neural network to determine whether the license plate of the terminal 114 is a local license plate; if the license plate of the terminal 114 is a non-local license plate, the edge device 120 sends the license plate information and the image of the terminal 114 to the terminal device of the traffic management department.
  • The terminal-side device 110 may transmit the image to the edge device 120, the edge device 120 preprocesses the image and sends the result obtained from the preprocessing to the cloud-side device 130, and the cloud-side device 130 obtains the processing result of the image.
  • the AI neural network is divided into two parts: the first part of the network is used to extract the original feature map of the image, and the second part of the network is used to obtain the image processing result of the image according to the original feature map.
  • the edge device 120 runs the first part of the network and sends the original feature map of the image to the cloud-side device 130, and the cloud-side device 130 runs the second part of the network to process the original feature map to obtain an image processing result.
  • the cloud-side device 130 may be a server for processing image data, such as the server 131 shown in FIG. 1 .
  • the cloud-side device 130 may also be a plurality of virtual machines provided by the server 131 using a virtualization technology, and the virtual machines perform image processing.
  • FIG. 1 is only a schematic diagram, and the system may also include other devices, which are not shown in FIG. 1 .
  • the embodiments of the present application do not limit the number of terminal-side devices, edge devices, and cloud-side devices included in the system.
  • the present application provides an image processing method.
  • The method includes: the sending node obtains the image to be processed and inputs the image to be processed into a convolutional neural network; the feature extraction layer included in the convolutional neural network performs feature extraction on the image to be processed to obtain a first feature map, and the feature compression layer included in the convolutional neural network compresses the first feature map to obtain a second feature map, where the number of channels of the second feature map is smaller than the number of channels of the first feature map.
  • The sending node sends the second feature map to the receiving node. Since the feature extraction layer is used to extract the features of the image to be processed to obtain the first feature map, the data volume of the first feature map is smaller than that of the image to be processed. In addition, the first feature map is compressed by the feature compression layer so that the number of channels of the second feature map is smaller than the number of channels of the first feature map. Provided that the resolution of the first feature map is not increased, the data volume of the second feature map is therefore smaller than the data volume of the first feature map, which further reduces the data volume of the feature map sent by the sending node to the receiving node and reduces the transmission delay between the terminal side and the cloud side.
  • FIG. 2 is a schematic flowchart of an image processing method provided by the present application, and the image processing method includes the following steps.
  • the sending node acquires the image to be processed.
  • the to-be-processed image may include at least one of a binary image, a grayscale image, an indexed image, or a true-color image.
  • the image to be processed may be acquired by the sending node.
  • For example, if the sending node is any one of the terminal 111 to the terminal 116, the sending node can use its own image collection unit (such as a camera) to collect images.
  • the to-be-processed image may also be collected by an image collection device communicatively connected to the sending node.
  • the image acquisition device may be a terminal 115 or a terminal 116
  • the sending node may be a server connected to the terminal 115 or the terminal 116 , or the like.
  • the sending node uses the feature extraction layer included in the convolutional neural network to perform feature extraction on the image to be processed to obtain a first feature map.
  • a convolutional neural network consists of a feature extractor consisting of convolutional and pooling layers. The feature extractor can be viewed as a filter, and the convolution process can be viewed as convolution with an input image or a convolutional feature map using a trainable filter.
  • the convolutional layer refers to a neuron layer in a convolutional neural network that performs convolution processing on an input signal.
  • the convolution layer can include many convolution operators.
  • the convolution operator is also called the kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator is essentially a weight matrix, which is usually pre-defined. During the convolution operation on the image, the weight matrix is usually slid over the input image one pixel at a time (or two pixels at a time, depending on the value of the stride) along the horizontal direction, so as to extract specific features from the image. Different weight matrices can be used to extract different features of the image.
  • one weight matrix is used to extract image edge information
  • another weight matrix is used to extract specific colors of the image
  • Another weight matrix is used to blur unwanted noise in the image, and so on.
  • The multiple weight matrices have the same size (rows × columns), the feature maps extracted by these weight matrices of the same size also have the same size, and the extracted feature maps of the same size are then combined to form the output of the convolution operation.
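  • As a small illustration of how different weight matrices extract different features and how their outputs are stacked into channels, the following sketch applies two hand-written 3×3 kernels (the kernels are arbitrary examples, not filters defined by this application):

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 8, 8)  # a single-channel example image (N x C x H x W)

# Two example weight matrices: one responds to horizontal edges, one to vertical edges.
horizontal_edge = torch.tensor([[[-1., -1., -1.],
                                 [ 0.,  0.,  0.],
                                 [ 1.,  1.,  1.]]])
vertical_edge = horizontal_edge.transpose(1, 2)
weights = torch.stack([horizontal_edge, vertical_edge])   # shape: 2 x 1 x 3 x 3

features = F.conv2d(image, weights, padding=1)
print(features.shape)   # torch.Size([1, 2, 8, 8]): one output channel per weight matrix
```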
  • the weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can be used to extract information from the input image, so that the convolutional neural network can make correct predictions.
  • The initial convolutional layers often extract more general features, which can also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by the later convolutional layers become more and more complex, such as high-level semantic features, and features with higher-level semantics are more suitable for the problem to be solved.
  • the convolutional neural network includes at least one convolutional layer, and the convolutional layer includes at least one convolutional unit.
  • The convolutional layer can be used to extract various feature maps of the image to be processed; a feature map is the output of the convolutional layer.
  • The three dimensions of a feature map are height (H), width (W) and the number of channels (C), and the product of W and H can be called the resolution of the feature map (W×H).
  • the feature map can represent various information of the image to be processed, such as edge information, lines, textures, etc. in the image.
  • For example, the resolution of the image to be processed is 96×96, and the image is divided into 144 image samples of 8×8. The feature extraction layer convolves each 8×8 image sample and aggregates all the results obtained by the convolution to obtain the first feature map of the image to be processed.
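  • A sketch consistent with this example, assuming the 8×8 image samples are taken by an 8×8 convolution with stride 8 (this kernel/stride choice and the channel count are assumptions made only to reproduce the 144 samples):

```python
import torch
import torch.nn as nn

# 96 x 96 input; an 8x8 kernel with stride 8 visits 12 x 12 = 144 non-overlapping
# 8x8 image samples, and each output channel aggregates one filter's responses.
feature_extraction = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=8, stride=8)

image = torch.randn(1, 3, 96, 96)
first_feature_map = feature_extraction(image)
print(first_feature_map.shape)   # torch.Size([1, 32, 12, 12]) -> 144 spatial positions
```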
  • the convolutional neural network may also include an activation layer, for example, a rectified linear units layer (ReLU), or a parametric rectified linear unit (PReLU), etc.
  • the convolutional neural network may also include other functional modules such as a pooling layer, a batch normalization layer (BN layer), a fully connected layer, and the like.
  • For the relevant principles of each functional module of the CNN, please refer to the relevant elaboration in the prior art, which will not be repeated here.
  • the sending node uses the feature compression layer included in the convolutional neural network to compress the first feature map to obtain the second feature map.
  • The dimension of the first feature map is denoted as W1×H1×C1, and the dimension of the second feature map is denoted as W2×H2×C2.
  • the above feature compression layer includes at least one convolution layer, the convolution layer is used to reduce the number of channels of the first feature map, and the number of channels of the second feature map is smaller than the number of channels of the first feature map.
  • the number of output channels is 1/K of the number of input channels, and K can be 2, 3, 4, 6, 8, 12, or 16, etc.
  • the convolutional layer may also be used to downsample the first feature map.
  • the convolution kernel of the above-mentioned convolution layer may be determined according to the actual computing power and image processing requirements of the sending node.
  • the convolution kernel of the convolutional layer can be 3 ⁇ 3, 5 ⁇ 5, or 7 ⁇ 7, etc.
  • the data dimensionality reduction of the neural network feature map often uses a pooling layer.
  • The pooling operation of the pooling layer mainly reduces the parameters of the feature map through a pooling kernel, such as maximum pooling, average pooling and minimum pooling.
  • The pooling layer is usually accompanied by an increase in the number of channels of the feature map, as in the VGG (Visual Geometry Group) network model in CNN.
  • The VGG network consists of a convolutional layer module followed by a fully connected layer module. The convolutional layer module is composed of several vgg_blocks connected in series, and its hyperparameters are defined by the variable conv_arch, which specifies the number of output channels of each VGG block in the VGG network.
  • After each VGG block, the height and width of the original feature map are halved and the number of channels of the original feature map is doubled. This reduces the total data volume of the feature map, but after such a feature map is transmitted to the cloud-side device, the increased number of channels and reduced resolution mean that the image information corresponding to each channel of the feature map is reduced, so the reconstructed feature map reconstructed by the cloud-side device loses more image information, resulting in a large difference between the image processing result and the information actually indicated by the image to be processed.
  • In contrast, in the present application the sending node uses a convolutional neural network including a feature compression layer to compress the first feature map to obtain the second feature map, and the difference between the two is mainly that the number of channels of the second feature map is smaller than the number of channels of the first feature map.
  • The image information corresponding to each channel of the second feature map remains unchanged or is increased, so the image information lost in the reconstructed feature map obtained by the receiving node after reconstructing the second feature map is reduced, and the difference between the image processing result obtained by the receiving node according to the second feature map and the information actually indicated by the image to be processed is reduced.
  • the sending node sends the second feature map to the receiving node.
  • the sending node may first encode the second feature map to obtain a code stream, and send the code stream to the receiving node.
  • The encoding method may adopt a lossless encoding method, such as the LZMA (Lempel-Ziv-Markov chain algorithm) algorithm.
  • The encoding method can also adopt lossy coding methods, such as joint photographic experts group (JPEG) coding, advanced video coding (AVC), high efficiency video coding (HEVC) and other image coding methods.
  • The encoding method may also adopt an entropy encoding method based on a convolutional neural network and arithmetic coding, such as an entropy encoding method oriented to variational auto-encoder (VAE) feature maps.
  • For example, the sending node may quantize the second feature map to 8-bit integers, form data in YUV400 format for each channel, and input the data into an HEVC or VVC encoder.
  • The sending node may also further compress the second feature map using a lossless encoding algorithm after quantizing the second feature map into N-bit data.
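  • A sketch of the option of quantizing to N-bit integers and then compressing losslessly, assuming 8-bit quantization and Python's built-in LZMA codec; the min-max scaling scheme below is an assumption, not something prescribed by this application:

```python
import lzma
import numpy as np

def encode_feature_map(feature_map: np.ndarray):
    """Quantize a float feature map to 8-bit integers and compress it losslessly."""
    lo, hi = float(feature_map.min()), float(feature_map.max())
    scale = (hi - lo) or 1.0
    quantized = np.round((feature_map - lo) / scale * 255).astype(np.uint8)
    return lzma.compress(quantized.tobytes()), lo, scale   # code stream + dequant params

def decode_feature_map(stream: bytes, lo: float, scale: float, shape) -> np.ndarray:
    """Decompress and map the 8-bit values back to the original value range."""
    quantized = np.frombuffer(lzma.decompress(stream), dtype=np.uint8).reshape(shape)
    return quantized.astype(np.float32) / 255 * scale + lo

feature_map = np.random.randn(64, 28, 28).astype(np.float32)   # example second feature map
stream, lo, scale = encode_feature_map(feature_map)
restored = decode_feature_map(stream, lo, scale, feature_map.shape)
```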
  • the sending node may also use an encoder designed for feature map data to perform compression processing.
  • the sending node may also not encode the second feature map, for example, the sending node sends the second feature map output by the feature compression layer to the receiving node.
  • The sending node and the receiving node may perform data transmission through the network shown in FIG. 1.
  • the sending node and the receiving node may transmit through a transmission control protocol (transmission control protocol, TCP), an Internet protocol (Internet Protocol, IP), a TCP/IP protocol, and the like.
  • the receiving node reconstructs the second feature map by using the feature reconstruction layer included in the convolutional neural network to obtain a third feature map.
  • The second feature map is obtained by the sending node using the feature extraction layer and the feature compression layer included in the convolutional neural network to perform feature extraction and compression on the image to be processed; however, in some possible examples, the second feature map may also be obtained by another processing device performing feature extraction and compression on the image to be processed and then forwarded by a network device communicating with the sending node.
  • the processing device may be a mobile phone, and the network device may be a router.
  • The dimension of the third feature map is denoted as W3×H3×C3.
  • the number of output channels can be K times the number of input channels, and K can be 2, 3, 4, 6, 8, 12, or 16, etc.
  • the convolution kernel of the above-mentioned deconvolution layer may be determined according to the actual computing power and image processing requirements of the receiving node.
  • the convolution kernel of the deconvolution layer can be 3 ⁇ 3, 5 ⁇ 5, or 7 ⁇ 7, etc.
  • the receiving node can also decode the code stream.
  • The decoding method of the receiving node matches the encoding method used by the sending node for the second feature map.
  • FIG. 3 is a schematic structural diagram of a convolutional neural network provided by the present application.
  • The convolutional neural network 300 includes a first partial network 310 and a second partial network 320, where the sending node can use the first partial network 310 to obtain a second feature map of the image to be processed, and the receiving node can use the second partial network 320 to process the second feature map to obtain an image processing result.
  • the first partial network 310 includes a feature extraction layer 311 and a feature compression layer 312
  • the second partial network 320 includes a feature reconstruction layer 321 and a feature output layer 322 .
  • the present application also provides the following possible implementation manners.
  • The feature compression layer 312 includes two convolutional layers: the first convolutional layer is used to downsample the first feature map, and the second convolutional layer is used to reduce the number of channels of the first feature map to obtain the second feature map.
  • The number of input channels of the first convolutional layer is denoted as Cin, and its stride is 2. The input feature map of the second convolutional layer is the output feature map of the first convolutional layer; its stride is 1, and its number of output channels Cout2 < Cin.
  • The feature reconstruction layer 321 includes one convolutional layer and one deconvolution layer: the convolutional layer is used to increase the number of channels of the second feature map, and then the deconvolution layer upsamples the second feature map to obtain the third feature map.
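  • One possible arrangement consistent with this description, sketched in PyTorch with illustrative channel counts (Cin = 256 and Cout2 = 64 are assumptions, as is the choice of 3×3 and 4×4 kernels):

```python
import torch
import torch.nn as nn

C_in, C_out2 = 256, 64   # example values: C_out2 < C_in

# Feature compression layer 312: a stride-2 convolution (downsampling),
# followed by a stride-1 convolution that reduces the number of channels.
feature_compression = nn.Sequential(
    nn.Conv2d(C_in, C_in, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(C_in, C_out2, kernel_size=3, stride=1, padding=1),
)

# Feature reconstruction layer 321: a convolution that increases the number of
# channels, then a transposed convolution that upsamples back to the input size.
feature_reconstruction = nn.Sequential(
    nn.Conv2d(C_out2, C_in, kernel_size=3, stride=1, padding=1),
    nn.ConvTranspose2d(C_in, C_in, kernel_size=4, stride=2, padding=1),
)

x = torch.randn(1, C_in, 56, 56)            # first feature map (example size)
second = feature_compression(x)             # 1 x 64 x 28 x 28
third = feature_reconstruction(second)      # 1 x 256 x 56 x 56
print(second.shape, third.shape)
```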
  • The feature compression layer 312 may alternatively include two convolutional layers: the first convolutional layer is used to reduce the number of channels of the first feature map, and the second convolutional layer is used to downsample the first feature map to obtain the second feature map.
  • Correspondingly, the feature reconstruction layer 321 includes one convolutional layer and one deconvolution layer: the deconvolution layer upsamples the second feature map, and then the convolutional layer increases the number of channels of the second feature map to obtain the third feature map.
  • the feature compression layer 312 and the feature reconstruction layer 321 may be asymmetric structures.
  • the feature compression layer may also include more convolutional layers
  • the feature reconstruction layer may also include more convolutional layers and deconvolutional layers.
  • The output of the above-mentioned convolutional layer or deconvolution layer can also be processed by an activation layer (such as ReLU), a BN layer, etc., and then input to the next convolutional layer or deconvolution layer, so as to improve the non-linearity of the output feature maps of the feature compression layer and the feature reconstruction layer and make the image processing more accurate, which is not limited in this application.
  • FIG. 4 is a schematic structural diagram of a convolutional neural network provided by the prior art.
  • The convolutional neural network 400 includes a feature extraction layer 410 and a feature output layer 420; the feature compression layer 312 shown in FIG. 3 can implement the same function as the feature extraction layer 410, and the feature reconstruction layer 321 shown in FIG. 3 can implement the same function as the feature output layer 420.
  • the feature extraction layer 410 may include a network layer conv1 and a network layer conv2, and both the network layer conv1 and the network layer conv2 may be convolutional layers.
  • For example, the parameters of the image to be processed are W×H×3, the parameters of the output feature map of the network layer conv1 are (W/2)×(H/2)×64, and the parameters of the output feature map of the network layer conv2 are (W/4)×(H/4)×256.
  • the feature output layer 420 may include a network layer conv3, a network layer conv4, and a network layer conv5.
  • the network layers conv3 to conv5 may be convolutional layers.
  • For example, the parameters of the output feature map of the network layer conv3 are (W/8)×(H/8)×512, the parameters of the output feature map of the network layer conv4 are (W/16)×(H/16)×1024, and the parameters of the output feature map of the network layer conv5 are (W/32)×(H/32)×2048.
  • the backbone network of the convolutional neural network 400 includes network layers conv1 to conv5, and the backbone network is used to extract multiple feature maps of the image to be processed.
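A small sketch that reproduces the feature-map shapes listed above, assuming PyTorch and a 224×224 input image; in a real backbone such as ResNet, conv1 to conv5 are multi-layer stages rather than single convolutions, so this only illustrates the stated resolution and channel progression:

    import torch
    import torch.nn as nn

    # One stride-2 convolution per stage, with the channel counts listed above.
    channels = [3, 64, 256, 512, 1024, 2048]
    backbone = nn.ModuleList(
        [nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, stride=2, padding=1) for i in range(5)]
    )

    x = torch.randn(1, 3, 224, 224)   # image to be processed, W x H x 3 (W = H = 224 assumed)
    for i, conv in enumerate(backbone, start=1):
        x = conv(x)
        print(f"conv{i}: {tuple(x.shape)}")   # (1, 64, 112, 112), ..., (1, 2048, 7, 7)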
  • the convolutional neural network further includes a neck network layer 424 (neck network) and a head network layer 425 (head network).
  • the neck network layer 424 can be used to further integrate the feature maps output by the backbone network to obtain new feature maps.
  • the neck network layer 424 may be a feature pyramid network (FPN).
  • the head network layer 425 is used to process the feature map output by the neck network layer 424 to obtain an image processing result.
  • the head network contains fully connected layers, softmax modules, etc.
  • the present application introduces a feature compression layer and a feature reconstruction layer into the backbone network of the convolutional neural network, so that, while image processing is still ensured, the first feature map can be compressed and its number of channels reduced, thereby reducing the amount of data transmitted between the sending node and the receiving node.
  • the receiving node processes the second feature map by using the feature output layer and the image processing layer included in the convolutional neural network to obtain an image processing result.
  • the image processing layer may include the neck network layer 424 and the head network layer 425 shown in FIG. 4 .
  • the image processing result indicates information of the image to be processed.
  • the image processing result may be a result of object detection on the image to be processed, and the information may be a certain area in the image to be processed.
  • the sending node may be the terminal 116 (such as a surveillance camera at an intersection), and the receiving node may be the server 131 .
  • the terminal 116 collects the image to be processed when the terminal 114 (such as a car or a truck) passes through the intersection, and sends the feature map obtained after feature extraction and compression of the to-be-processed image to the server 131; if the server 131 determines that the license plate of the terminal 114 is a non-local license plate, the server 131 sends the license plate information to the central control device of traffic management in the form of a data message.
  • the image processing result may be the result of performing face recognition on the image to be processed.
  • the sending node may be the terminal 115 (such as a surveillance camera of the administrative building), and the receiving node may be the server 131 .
  • the terminal 115 collects a face image when user 1 and user 2 enter the administrative building, and sends the feature map obtained after feature extraction and compression of the face image to the server 131; the server 131 determines whether user 1 and user 2 are legitimate users registered in the administrative building, for example, by matching the facial features in the image to be processed against a face comparison database.
  • if the match succeeds, user 1 is determined to be a legitimate user, and the server 131 sends the verification pass information to the terminal 115.
  • the terminal 115 opens the entrance and exit gates according to the verification pass information, and the user 1 can enter the administrative building through the entrance and exit gates.
  • the image processing result may be a result of classifying objects on the image to be processed, and the information of the image to be processed may be object classification information in the image.
  • the sending node may be any one of the terminal 111 to the terminal 113
  • the receiving node may be the server 131 .
  • the terminal 111 collects various images in the house, including sofas, TVs, tables, etc., and sends the feature maps obtained after feature extraction and compression of this set of images to the server 131; the server 131 determines, from this set of feature maps, the classification of each object in the images and the shopping link corresponding to each category, and sends this information to the terminal 111.
  • the image processing result may be the result of geolocating the image to be processed.
  • the sending node may be a terminal 114 (such as a driving recorder installed in a car or a truck), and the receiving node may be a server 131 .
  • the terminal 114 performs feature extraction and compression on the to-be-processed image taken when approaching an intersection (such as a road image that includes the relative position information of the houses, trees, administrative building and other reference objects shown in FIG. 1) to obtain a feature map, and sends the feature map to the server 131.
  • the server 131 performs feature reconstruction and image processing on the feature map to obtain the geographic location corresponding to the image to be processed, and sends the geographic location to the terminal 114 .
  • the image processing method further includes the following steps S270 and S280.
  • the receiving node sends the image processing result.
  • the receiving node may send the image processing result to the sending node shown in FIG. 2 .
  • the receiving node can send the image processing results to other nodes.
  • the other node may be the central control device of the traffic management system.
  • the sending node only needs to send the second feature map to the receiving node, and the number of channels of the second feature map is smaller than the number of channels of the first feature map, so that, provided the resolution of the first feature map is not increased, the amount of data sent by the sending node to the receiving node is reduced, which reduces the transmission delay between the terminal side and the cloud side.
  • the sending node displays the image processing result.
  • the sending node may have a display area, for example, the display area may include a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, an active-matrix organic light-emitting diode (AMOLED) display, a flexible light-emitting diode (FLED) display, a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED) display, and so on.
  • the sending node may include one or more display screens 194 .
  • the sending node is a mobile phone and the image processing scene is object classification.
  • the sending node only displays the image to be processed; as shown in (a) in Figure 5, the mobile phone displays the image to be processed, which includes multiple objects.
  • the sending node only displays the image processing result, as shown in (b) in Figure 5, the mobile phone displays the types of objects in the image to be processed, such as water cups, pens, etc.
  • the sending node displays the image to be processed and the image processing result; as shown in (c) in Figure 5, while displaying the image to be processed, the mobile phone marks the type of each object at the corresponding position in the image to be processed.
  • the sending node may also perform other processing. For example, the name of the object in the image captured by the sending node is notified to the user; or, if the receiving node analyzes the second feature map and gives warning information, the sending node sends a warning voice prompt to remind the user in the environment corresponding to the sending node to pay attention to safety, etc. , which is not limited in this application.
  • the sending node can implement the functions of the terminal-side device and/or edge device as shown in FIG. 1
  • the receiving node can implement the functions of the cloud-side device as shown in FIG. 1 as an example.
  • the sending node can implement the function of the end-side device as shown in FIG. 1
  • the receiving node can also implement the function of the edge device as shown in FIG. 1 .
  • the above-mentioned convolutional neural network can be obtained by adding a bottleneck structure layer to an existing image processing network (such as the convolutional neural network 400 shown in FIG. 4) and training the bottleneck structure layer according to the image processing network, where the bottleneck structure is a multi-layer network structure, the number of input channels and the number of output channels of the bottleneck structure layer are the same, and the number of channels of the intermediate feature map of the bottleneck structure layer is smaller than the number of input channels.
  • the input data of the bottleneck structure layer (such as the first feature map of the above image to be processed) first passes through one or more neural network layers to obtain intermediate data (such as the above second feature map), and the intermediate data then passes through one or more neural network layers to obtain the output data (such as the above third feature map); the data volume of the intermediate data (i.e. the product of width, height and number of channels) is lower than both the input data volume and the output data volume, as the small numeric check below illustrates.
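A small numeric check of this bottleneck property, with assumed example dimensions (the concrete sizes are illustrative only):

    # width x height x channels of the input, intermediate and output data of the bottleneck layer
    w1, h1, c1 = 64, 64, 256   # first feature map (input)
    w2, h2, c2 = 32, 32, 64    # second feature map (intermediate data)
    w3, h3, c3 = 64, 64, 256   # third feature map (output, same channel count as the input)

    volume = lambda w, h, c: w * h * c
    assert volume(w2, h2, c2) < volume(w1, h1, c1) and volume(w2, h2, c2) < volume(w3, h3, c3)
    print(volume(w1, h1, c1) / volume(w2, h2, c2))   # 16.0, i.e. a 16x reduction of the transmitted data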
  • the image processing network includes a feature extraction layer, a feature output layer and an image processing layer
  • the bottleneck structure layer includes a feature compression layer and a feature reconstruction layer
  • the convolutional neural network can be divided into two parts: the first part of the network includes the feature extraction layer and the feature compression layer, and the second part of the network includes the feature reconstruction layer, the feature output layer and the image processing layer.
  • the above-mentioned function of the sending node may be realized by the first part of the convolutional neural network deployed on the sending node, and the above-mentioned function of the receiving node may be realized by the second part of the convolutional neural network deployed on the receiving node, as the sketch below illustrates.
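A conceptual sketch of this two-part deployment, using single-layer placeholder modules that stand in for the multi-layer sub-networks described in this document (the placeholders and their sizes are assumptions):

    import torch
    import torch.nn as nn

    feature_extraction = nn.Conv2d(3, 256, kernel_size=3, stride=4, padding=1)        # placeholder for the feature extraction layer
    feature_compression = nn.Conv2d(256, 64, kernel_size=3, stride=2, padding=1)      # placeholder for the feature compression layer
    feature_reconstruction = nn.ConvTranspose2d(64, 256, kernel_size=4, stride=2, padding=1)
    feature_output_and_head = nn.AdaptiveAvgPool2d(1)                                  # placeholder for the feature output and image processing layers

    part1 = nn.Sequential(feature_extraction, feature_compression)            # deployed on the sending node
    part2 = nn.Sequential(feature_reconstruction, feature_output_and_head)    # deployed on the receiving node

    image = torch.randn(1, 3, 224, 224)
    second_feature_map = part1(image)     # only this smaller tensor is transmitted to the receiving node
    result = part2(second_feature_map)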
  • the present application also provides a training method for a convolutional neural network; FIG. 6 is a schematic flowchart of the training method provided by the present application.
  • the training method can be executed by a sending node or a receiving node, or by another electronic device, which is not limited in this application; the training method includes the following steps.
  • the training set includes at least one training image.
  • the training set may include 50,000 to 100,000 training images, and the training images may be any of the following types: binary images, grayscale images, indexed images, and true-color images.
  • according to the training set and the first feature extraction layer and the first feature output layer of the first convolutional neural network, the first bottleneck structure layer of the first convolutional neural network is trained to obtain a second convolutional neural network.
  • FIG. 7 is a schematic diagram 1 of training of a convolutional neural network provided by this application.
  • the first convolutional neural network 710 includes a first feature extraction layer T701, a first bottleneck structure layer 711 and a first feature output layer P701.
  • the second convolutional neural network 720 includes the first feature extraction layer T701, a second bottleneck structure layer 721 and the first feature output layer P701.
  • the first feature extraction layer T701 can be used to implement the function of the feature extraction layer 311 shown in FIG. 3
  • the first feature output layer P701 can be used to implement the function of the feature output layer 322 shown in FIG. 3
  • the second bottleneck structure layer 721 can be used to realize the functions of the feature compression layer 312 and the feature reconstruction layer 321 shown in FIG. 3.
  • the above-mentioned first bottleneck structure layer 711 and the second bottleneck structure layer 721 are both bottleneck structures, and the bottleneck structure is a multi-layer network structure.
  • the input data of the second bottleneck structure layer 721 (such as the above-mentioned first feature map of the image to be processed) first passes through one or more neural network layers to obtain intermediate data (such as the above-mentioned second feature map), and the intermediate data then passes through one or more neural network layers to obtain the output data (such as the above-mentioned third feature map); the data volume of the intermediate data (i.e. the product of width, height and number of channels) is lower than both the input data volume and the output data volume.
  • the number of channels of the second feature map FM2' is smaller than the number of channels of the first feature map FM1.
  • the second feature map FM2' is obtained by performing feature extraction and compression on the training image by the first feature extraction layer T701 and the second bottleneck structure layer 721.
  • the first feature extraction layer T701 performs feature extraction on the training image to obtain the first feature map FM1
  • the feature compression layer 7211 in the second bottleneck structure layer 721 compresses the first feature map FM1 to obtain a second feature map FM2'.
  • the first feature map FM1 is obtained by performing feature extraction on the training image by the first feature extraction layer T701.
  • the number of channels of the second feature map FM2 is greater than or equal to the number of channels of the second feature map FM2 ′.
  • the second feature map FM2 is obtained by performing feature extraction and compression on the training image by the first feature extraction layer T701 and the first bottleneck structure layer 711.
  • the first feature extraction layer T701 performs feature extraction on the training image to obtain the first feature map FM1
  • the feature compression layer 7111 in the first bottleneck structure layer 711 compresses the first feature map FM1 to obtain the second feature map FM2.
  • when the convolutional neural network extracts the feature map of the image to be processed, the resolution of the image to be processed is reduced or remains unchanged; by training the first convolutional neural network with the training method provided in this application, the obtained second convolutional neural network can perform feature extraction and compression on the image to be processed, which reduces the number of channels of the feature map of the image to be processed and thereby reduces the amount of feature map data sent by the sending node to the receiving node.
  • since the first convolutional neural network and the second convolutional neural network have the same first feature extraction layer and first feature output layer, only the bottleneck structure layer needs to be trained during the training process of the convolutional neural network, which reduces the computational resources required to train the convolutional neural network; a minimal sketch of this frozen-layer training setup is given below.
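A minimal sketch of this training setup, in which the shared feature extraction and feature output layers are frozen and only the bottleneck structure layer is optimized; the placeholder modules and the Adam learning rate are assumptions:

    import itertools
    import torch
    import torch.nn as nn

    # Placeholders standing in for T701 (feature extraction), the bottleneck structure layer
    # (feature compression + feature reconstruction) and P701 (feature output).
    feature_extraction = nn.Conv2d(3, 256, kernel_size=3, stride=4, padding=1)
    feature_compression = nn.Conv2d(256, 64, kernel_size=3, stride=2, padding=1)
    feature_reconstruction = nn.ConvTranspose2d(64, 256, kernel_size=4, stride=2, padding=1)
    feature_output = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)

    # Freeze the layers shared with the existing image processing network.
    for p in itertools.chain(feature_extraction.parameters(), feature_output.parameters()):
        p.requires_grad_(False)

    # Only the parameters of the bottleneck structure layer are updated during training.
    optimizer = torch.optim.Adam(
        itertools.chain(feature_compression.parameters(), feature_reconstruction.parameters()),
        lr=1e-4,   # assumed learning rate
    )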
  • the third convolutional neural network 730 includes a second feature extraction layer T702 and a second feature output layer P702.
  • the parameters of the first feature extraction layer T701 are the same as the parameters of the second feature extraction layer T702, and the parameters of the first feature output layer P701 are the same as the parameters of the second feature output layer P702.
  • the third convolutional neural network 730 may be used to implement the functions of the convolutional neural network 400 shown in FIG. 4 .
  • the above parameters may include the maximum number of iterations (max_batches) or the batch size (batch_size), etc.
  • FIG. 8B is a third schematic diagram of training of a convolutional neural network provided by this application.
  • for the technical content of the first convolutional neural network 710 shown in FIG. 8B, please refer to the first convolutional neural network 710 shown in FIG. 7 above, which will not be repeated here.
  • a conventional approach is to train the first convolutional neural network 710, i.e. the model parameters of its first bottleneck structure layer 711, using the training set used when training the third convolutional neural network 730 (including the training images and the annotation information corresponding to the training images, such as object detection boxes and feature categories).
  • however, the training set used when training the third convolutional neural network 730 includes the label information of the training images, which causes the training of the first convolutional neural network 710 to consume enormous computing resources and to be slow.
  • FIG. 9 is a schematic flowchart of another training method provided by the present application.
  • S620 may include the training method corresponding to the operation steps S621 to S624.
  • the first set includes a fourth feature map
  • the fourth feature map is obtained after the second feature extraction layer and the second feature output layer perform feature extraction and image processing on the training image.
  • the second feature extraction layer T702 shown in FIG. 8A and FIG. 8B includes network layer conv1 and network layer conv2
  • the second feature output layer P702 includes network layer conv3, network layer conv4 and network layer conv5 as an example for description.
  • the first set may include any one or more of: a fourth feature map FM4_1 obtained by processing the first feature map FM1 with the network layer conv3, a fourth feature map FM4_2 obtained by processing the fourth feature map FM4_1 with the network layer conv4, and a fourth feature map FM4_3 obtained by processing the fourth feature map FM4_2 with the network layer conv5.
  • here, the case where the third convolutional neural network does not include a bottleneck structure layer is used for illustration.
  • the third convolutional neural network may also include a bottleneck structure layer.
  • in this case, the first convolutional neural network has one more bottleneck structure layer (the first bottleneck structure layer) than the third convolutional neural network.
  • the first convolutional neural network not only includes all the structures of the third convolutional neural network, but also includes the above-mentioned first bottleneck structure layer.
  • the third convolutional neural network is used to train the first bottleneck structure layer of the first convolutional neural network to obtain the second convolutional neural network.
  • S622 Input the training set into the first convolutional neural network to obtain a second set.
  • the second set includes a fifth feature map
  • the fifth feature map is obtained by performing feature reconstruction and image processing on the second feature map by the first bottleneck structure layer and the first feature output layer.
  • the first feature extraction layer T701 shown in FIG. 8B includes the network layer conv1 and the network layer conv2
  • the first feature output layer P701 includes the network layer conv3, the network layer conv4 and the network layer conv5 as an example for description.
  • the second set may include any one or more of: the fifth feature map FM5_1 obtained by the network layer conv3 processing the third feature map FM3, the fifth feature map FM5_2 obtained by the network layer conv4 processing the fifth feature map FM5_1, and the fifth feature map FM5_3 obtained by the network layer conv5 processing the fifth feature map FM5_2.
  • the loss function may be calculated according to the distance between the fourth feature map and the fifth feature map.
  • for example, the loss function may be calculated using the mean absolute error (the L1 norm, or 1-norm) between the fourth feature map and the fifth feature map.
  • the loss function may also be calculated using the mean square error (the L2 norm, or 2-norm) between the fourth feature map and the fifth feature map.
  • for example, the loss function may be: Loss = w1×L2(FM4_1, FM5_1) + w2×L2(FM4_2, FM5_2) + w3×L2(FM4_3, FM5_3), where w1, w2 and w3 are weighting coefficients.
  • L2(A, B) represents the L2 norm (2-norm) of the difference between the two three-dimensional data A and B; a minimal code sketch of this weighted feature-map loss is given below.
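A minimal sketch of this weighted feature-map loss, assuming PyTorch, equal weights w1 = w2 = w3 = 1 and example shapes for the three pairs of feature maps:

    import torch
    import torch.nn.functional as F

    def feature_map_loss(fm4_list, fm5_list, weights=(1.0, 1.0, 1.0)):
        # Weighted sum of the mean square errors (L2-norm based distances) between the
        # fourth feature maps (from the third CNN) and the fifth feature maps (from the first CNN).
        return sum(w * F.mse_loss(fm5, fm4) for w, fm4, fm5 in zip(weights, fm4_list, fm5_list))

    fm4 = [torch.randn(1, 512, 28, 28), torch.randn(1, 1024, 14, 14), torch.randn(1, 2048, 7, 7)]
    fm5 = [t + 0.1 * torch.randn_like(t) for t in fm4]   # stand-ins for FM5_1, FM5_2, FM5_3
    loss = feature_map_loss(fm4, fm5)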
  • a regularization term of the second feature map FM2 may also be used to calculate the loss function, and the regularization term may include any one of the following three items.
  • the average amplitude of the second feature map FM2, the weighting coefficient of this item is a negative real number.
  • the L1norm of the gradient of the feature element of the second feature map FM2 and its adjacent feature elements in the same channel, the weighting coefficient of this item is a negative real number.
  • the coded bit estimation value of the second feature map FM2, the weighting coefficient of this item is a positive real number.
  • the second feature map of the image to be processed is obtained by compressing the first feature map.
  • the regularization term of the second feature map FM2 is therefore added to the loss, and training the first bottleneck structure layer according to this regularization term is beneficial for reducing the error caused by compressing the first feature map; a sketch of the first two regularization terms is given below.
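A sketch of the first two regularization terms, reading the "gradient" item as the differences between each feature element and its adjacent elements within the same channel; the third item (an estimate of the number of coded bits of FM2) would require an entropy model and is omitted here. Per the text, the first two terms enter the loss with negative weighting coefficients and the third with a positive one:

    import torch

    def fm2_regularization(fm2, mode="amplitude"):
        # fm2 has shape (N, C, H, W); both terms are reduced to scalar averages.
        if mode == "amplitude":
            # average amplitude of the second feature map FM2 (negative weighting coefficient)
            return fm2.abs().mean()
        if mode == "gradient":
            # L1 norm of the differences between adjacent feature elements in the same channel
            # (negative weighting coefficient)
            dh = (fm2[..., 1:, :] - fm2[..., :-1, :]).abs().mean()
            dw = (fm2[..., :, 1:] - fm2[..., :, :-1]).abs().mean()
            return dh + dw
        raise ValueError(mode)

    fm2 = torch.randn(2, 64, 32, 32)
    print(fm2_regularization(fm2, "amplitude"), fm2_regularization(fm2, "gradient"))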
  • the parameters of the first bottleneck structure layer are updated according to the loss function to obtain the second bottleneck structure layer, and then the second convolutional neural network is obtained.
  • BP backward propagation
  • the condition for obtaining the second convolutional neural network may be that the number of backpropagation iterations reaches a threshold, or that the value of the loss function is less than or equal to a threshold, or that the difference between the loss function values obtained in two adjacent calculations is less than or equal to a threshold, which is not limited in this application; a small sketch of these stopping conditions is given below.
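A small sketch of these stopping conditions; all threshold values are assumptions:

    def should_stop(step, loss, prev_loss,
                    max_steps=100_000, loss_threshold=1e-3, delta_threshold=1e-6):
        return (
            step >= max_steps                                    # number of backpropagation iterations reaches a threshold
            or loss <= loss_threshold                            # loss value is small enough
            or (prev_loss is not None
                and abs(prev_loss - loss) <= delta_threshold)    # loss change between two adjacent calculations is small enough
        )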
  • the training method provided by this application calculates the loss function from the distance between the corresponding multi-layer feature maps (the fourth feature map and the fifth feature map) of the first convolutional neural network and the third convolutional neural network to obtain the second convolutional neural network, which helps make the distance between the fourth feature map and the fifth feature map as small as possible, thereby reducing the error between the first feature map and the third feature map and improving the accuracy of image processing.
  • the above S622 may include the following steps S622a to S622d.
  • the first feature extraction layer T701 performs feature extraction on the training image to obtain a first feature map FM1.
  • the feature compression layer 7111 compresses the first feature map FM1 to obtain a second feature map FM2 (such as the sixth feature map above), and the number of channels of the second feature map FM2 is smaller than that of the first feature map FM1 number of channels.
  • the feature reconstruction layer in the first bottleneck structure layer 711 reconstructs the second feature map FM2 to obtain a third feature map FM3.
  • the first feature map FM1 and the third feature map FM3 have the same number of channels, and the resolution of the first feature map FM1 and the resolution of the third feature map FM3 may also be the same.
  • the first feature output layer P701 processes the third feature map FM3 to obtain a fifth feature map (for example, any one or more of the fifth feature map FM5_1, the fifth feature map FM5_2 and the fifth feature map FM5_3).
  • the above-mentioned S623 may include the following steps S623a-S623c.
  • the first distance may be L1norm or L2norm between the fourth feature map and the fifth feature map.
  • the second distance may be L1norm or L2norm between the first feature map and the third feature map.
  • the second distance may be L2norm of the first feature map FM1 and the third feature map FM3.
  • S623c Calculate a loss function according to the first distance and the second distance.
  • for example, the loss function is calculated based on the mean square error (L2 norm, 2-norm) between the fourth feature map FM4 and the fifth feature map FM5, and the mean square error between the first feature map FM1 and the third feature map FM3.
  • L2(A, B) represents the L2norm (2-norm) that calculates the difference between the two three-dimensional data of A and B.
  • on the basis of calculating the loss function from the first distance between the fourth feature map and the fifth feature map, the second distance between the first feature map and the third feature map is added to the loss function, which helps make the distance between the fourth feature map and the fifth feature map as small as possible and the distance between the first feature map and the third feature map as small as possible, reducing the processing error of the feature compression layer and the feature reconstruction layer and improving the accuracy of image processing; a sketch of this combined loss is given below.
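A sketch of this combined loss, adding the second distance between the first and third feature maps to the first distance between the fourth and fifth feature maps; the relative weights are assumed hyper-parameters:

    import torch
    import torch.nn.functional as F

    def combined_loss(fm4_list, fm5_list, fm1, fm3, weights=(1.0, 1.0, 1.0), w_rec=1.0):
        first_distance = sum(w * F.mse_loss(fm5, fm4) for w, fm4, fm5 in zip(weights, fm4_list, fm5_list))
        second_distance = F.mse_loss(fm3, fm1)   # distance between FM1 and the reconstructed FM3
        return first_distance + w_rec * second_distance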
  • the PCA method is often used to reduce the data dimension of the feature map.
  • PCA is a multivariate statistical analysis method that selects a small number of important variables by linearly transforming multiple variables.
  • for example, the end-side device applies the PCA method to the 128-channel original feature maps corresponding to a group of images to obtain the principal components of each image.
  • if the group of images includes 3 images, the numbers of principal components may be 47, 48 and 49, respectively.
  • the number of principal components of each image is different.
  • the number of channels of the reconstructed feature map for each image may become 126, 127, and 128, resulting in a difference in the number of channels of the original and reconstructed feature maps.
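A sketch of the PCA-based prior-art approach described above, using scikit-learn; the 0.99 explained-variance threshold and the feature-map sizes are assumptions, and real backbone features are far more correlated across channels than the random data used here, so far fewer components would survive in practice:

    import numpy as np
    from sklearn.decomposition import PCA

    for i in range(3):
        feature_map = np.random.randn(128, 56, 56)    # C x H x W original feature map of one image
        samples = feature_map.reshape(128, -1).T      # every spatial position becomes a 128-dimensional sample
        pca = PCA(n_components=0.99).fit(samples)     # keep enough components to explain 99% of the variance
        print(f"image {i}: {pca.n_components_} principal components")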
  • in contrast, the training method of this application takes into account the distance between the first feature map before compression and the reconstructed third feature map, so that although the second convolutional neural network obtained by training compresses and reconstructs the first feature map, the difference between the first feature map and the third feature map is significantly smaller than with the PCA method.
  • the PCA method can achieve a data reduction of about 3 times when the mean Average Precision (mAP) metric decreases by 2%.
  • mAP mean Average Precision
  • for example, when the average number of principal components generated after applying PCA to the 128-channel original feature maps corresponding to a set of images was 47.9, the mAP metric decreased by about 2%.
  • the second convolutional neural network obtained with the training method of the present application can achieve a 64-fold data reduction on the 128-channel first feature maps corresponding to a set of images (for example, the width, height and number of channels of the second feature map are each reduced to 1/4 of the width, height and number of channels of the first feature map); compared with the code stream obtained by compressing the unreduced first feature map, the code stream obtained by compressing the second feature map after this 64-fold data reduction contains 90% less data, and the mAP loss is less than 1%.
  • with the training method provided in this application, it is only necessary to input a large number of training images and to use, as the training guide, the feature maps that the training images excite in the first convolutional neural network and the third convolutional neural network, without relying on manually labeled data for vision tasks, which reduces the data dependence of the training images; using feature maps as the training guide also makes the training method provided in this application more versatile.
  • the host includes corresponding hardware structures and/or software modules for executing each function.
  • the units and method steps of each example described in conjunction with the embodiments disclosed in the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software-driven hardware depends on the specific application scenarios and design constraints of the technical solution.
  • FIG. 10 is a schematic structural diagram of a training device and an image processing device provided by the present application.
  • the structures and functions of the training device 1010, the first image processing device 1020 and the second image processing device 1030 are described below with reference to FIG. 10; it should be understood that this embodiment only exemplarily divides the structures and functional modules of the training device 1010, the first image processing device 1020 and the second image processing device 1030, and the present application does not limit the specific division.
  • the training apparatus 1010 includes an acquisition unit 1011 and a processing unit 1012 , and the training apparatus 1010 is configured to implement the training method corresponding to each operation step in the method embodiment shown in FIG. 6 or FIG. 9 above.
  • the acquisition unit 1011 is used to perform the above-mentioned S610
  • the processing unit 1012 is used to perform the above-mentioned S620.
  • the processing unit 1012 includes a first training unit 1012a, a second training unit 1012b, a loss calculation unit 1012c, and a third training unit 1012d .
  • the first training unit 1012a is used to execute S621
  • the second training unit 1012b is used to execute S622 and its possible sub-steps S622a-S622d
  • the loss calculation unit 1012c is used to execute S623 and its possible sub-steps S623a-S623c
  • the third training Unit 1012d is used to perform S624.
  • the first image processing apparatus 1020 includes a first transceiver unit 1021 , a feature extraction unit 1022 , a feature compression unit 1023 and a display unit 1024 .
  • the first image processing apparatus 1020 is configured to implement the image processing method corresponding to each operation step of the sending node in the method embodiment shown in FIG. 2 above.
  • when the first image processing apparatus 1020 is used to implement the function of the sending node in the method embodiment shown in FIG. 2, the first transceiver unit 1021 is used to perform the above-mentioned S210 and S240, the feature extraction unit 1022 is used to perform the above-mentioned S220, and the feature compression unit 1023 is used to perform the above-mentioned S230; optionally, the display unit 1024 is used to perform the above-mentioned S280.
  • the second image processing apparatus 1030 includes a second transceiver unit 1031 , a feature reconstruction unit 1032 and an image processing unit 1033 .
  • the second image processing apparatus 1030 is configured to implement the image processing method corresponding to each operation step of the receiving node in the method embodiment shown in FIG. 2 above.
  • when the second image processing apparatus 1030 is used to implement the function of the receiving node in the method embodiment shown in FIG. 2, the second transceiver unit 1031 is used to perform the above S270, the feature reconstruction unit 1032 is used to perform the above S250, and the image processing unit 1033 is used to perform the above S260.
  • more details of the first image processing apparatus 1020 and the second image processing apparatus 1030 can be obtained directly by referring to the relevant descriptions in the method embodiments shown in FIG. 2, FIG. 6 or FIG. 9, which will not be repeated here.
  • FIG. 11 is a schematic structural diagram of a communication apparatus provided by the present application.
  • the communication apparatus 1100 includes a processor 1110 and a communication interface 1120 .
  • the processor 1110 and the communication interface 1120 are coupled to each other.
  • the communication interface 1120 can be a transceiver or an input-output interface.
  • the communication apparatus 1100 may further include a memory 1130 for storing instructions executed by the processor 1110 or input data required by the processor 1110 to execute the instructions or data generated after the processor 1110 executes the instructions.
  • the communication apparatus 1100 can implement the functions of the training apparatus 1010, the first image processing apparatus 1020 or the second image processing apparatus 1030, which will not be repeated here.
  • the specific connection medium between the communication interface 1120, the processor 1110, and the memory 1130 is not limited in this embodiment of the present application.
  • the communication interface 1120, the processor 1110, and the memory 1130 are connected through a bus 1140 in FIG. 11.
  • the bus is represented by a thick line in FIG. 11; the connection manner between the other components is only schematically illustrated and is not limited.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 11, but it does not mean that there is only one bus or one type of bus.
  • the memory 1130 may be used to store software programs and modules, such as program instructions/modules corresponding to the image processing method and the training method provided in the embodiments of the present application.
  • the processor 1110 executes the software programs and modules stored in the memory 1130 to perform various functional applications and data processing.
  • the communication interface 1120 can be used for signaling or data communication with other devices. In this application, the communication device 1100 may have a plurality of communication interfaces 1120 .
  • the above-mentioned memory may be, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), etc.
  • RAM Random Access Memory
  • ROM Read Only Memory
  • PROM programmable read-only memory
  • EPROM Erasable Programmable Read-Only Memory
  • EEPROM Electrical Erasable Programmable Read-Only Memory
  • the above-mentioned processor may be an integrated circuit chip with signal processing capability.
  • the processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • CPU Central Processing Unit
  • NP Network Processor
  • DSP Digital Signal Processing
  • ASIC Application Specific Integrated Circuit
  • FPGA Field-Programmable Gate Array
  • the method steps in the embodiments of the present application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • the software instructions can be composed of corresponding software modules, and the software modules can be stored in RAM, flash memory, ROM, PROM, EPROM, EEPROM, registers, hard disk, removable hard disk, CD-ROM or any other form of storage medium well known in the art .
  • An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage medium may reside in an ASIC.
  • the ASIC can be located in the communication device or in the terminal device.
  • the processor and the storage medium may also exist in the communication device or the terminal device as discrete components.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • when software is used, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer programs or instructions.
  • the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, special purpose computer, computer network, communication device, user equipment, or other programmable device.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer program or instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, data center, or the like that integrates one or more available media.
  • the usable medium can be a magnetic medium, such as a floppy disk, a hard disk or a magnetic tape; it can also be an optical medium, such as a digital video disc (DVD); it can also be a semiconductor medium, such as a solid state drive (SSD).
  • “at least one” means one or more, and “plurality” means two or more.
  • “And/or”, which describes the association relationship of the associated objects, indicates that there can be three kinds of relationships, for example, A and/or B, which can indicate: the existence of A alone, the existence of A and B at the same time, and the existence of B alone, where A, B can be singular or plural.
  • the character "/" generally indicates an "or" relationship between the related objects; in the formulas of this application, the character "/" indicates a "division" relationship between the related objects.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application provides an image processing method, a training method and an apparatus, relating to the field of image processing. The image processing method includes: a sending node obtains an image to be processed; a feature extraction layer of a convolutional neural network performs feature extraction on the image to be processed to obtain a first feature map, and a feature compression layer included in the convolutional neural network compresses the first feature map to obtain a second feature map, where the number of channels of the second feature map is smaller than the number of channels of the first feature map; and the sending node sends the second feature map to a receiving node. The feature extraction layer is used to extract the first feature map of the image to be processed, and the feature compression layer is used to compress the first feature map to obtain the second feature map, so that the number of channels of the second feature map is smaller than that of the first feature map; when the resolution of the first feature map is not increased, the data volume of the second feature map is smaller than that of the first feature map, which reduces the amount of feature map data sent by the sending node to the receiving node and lowers the transmission delay between the terminal side and the cloud side.

Description

Image processing method, training method and apparatus
This application claims priority to Russian patent application No. 2021109673, filed with the Russian Patent Office on April 8, 2021 and entitled "Image processing method, training method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing, and in particular, to an image processing method, a training method and an apparatus.
Background
Images are the visual basis of human perception of the world; humans can use images to obtain, express and convey information. To obtain image information quickly, a neural network can be used to process images and implement functions such as image classification, face recognition and object detection. Usually, a terminal-side device sends image data to a cloud-side device on which a neural network is deployed, and the cloud-side device performs the image processing; however, the amount of image data is large, which leads to a high latency of terminal-cloud interaction.
Current technical solutions provide a terminal-cloud collaboration scheme based on feature map transmission: the terminal-side device extracts an original feature map of the image to be processed, extracts multiple principal components of the original feature map using the principal components analysis (PCA) method, and sends a linear combination of the principal components to the cloud-side device; the cloud-side device obtains a reconstructed feature map from the principal components and obtains an image processing result from the reconstructed feature map. However, the amount of data of the principal components is still large, so the cloud-side device needs a long time to receive them.
Therefore, how to reduce the amount of data transmitted between the terminal side and the cloud side during image processing is a problem that urgently needs to be solved.
发明内容
本申请提供一种图像处理方法、训练方法及装置,解决了图像处理过程中传输数据量大的问题。
为达到上述目的,本申请采用如下技术方案。
第一方面,本申请提供了一种图像处理方法,该方法可应用于发送节点,或者该方法可应用于可以支持终端设备实现该方法的通信装置,例如该通信装置包括芯片系统,该方法包括:发送节点获取到待处理图像,将待处理图像输入卷积神经网络,卷积神经网络包括的特征提取层对待处理图像进行特征提取,得到第一特征图,以及卷积神经网络包括的特征压缩层压缩第一特征图得到第二特征图,该第二特征图的通道数小于第一特征图的通道数。进而,发送节点向接收节点发送第二特征图。由于利用特征提取层对待处理图像进行特征提取得到第一特征图,使得第一特征图的数据量小于待处理图像的数据量;此外,利用特征压缩层压缩第一特征图得到第二特征图,使得第二特征图的通道数小于第一特征图的通道数,在第一特征图的分辨率未增大的情况下,第二特征图的数据量小于第一特征图的数据量,进一步减少了发送节点向接收节点发送的特征图的数据量,降低了端侧和云侧的传输时延。
在一种可能的示例中,第一特征图的分辨率为W×H,第二特征图的分辨率为W`× H`,W`×H`<W×H。特征图的数据量是由分辨率与通道数的乘积确定的,例如,在第二特征图的分辨率小于第一特征图的分辨率的情况下,第二特征图的通道数小于第一特征图的通道数,因此第二特征图的数据量小于第一特征图的数据量,这减少了发送节点向接收节点发送的特征图的数据量,降低了端侧和云侧的传输时延。
在一种可能的实现方式中,特征压缩层包括至少一层卷积层。例如,该卷积层可以用于对第一特征图进行下采样,减小第一特征图的分辨率。又如,该卷积层还可以用于减少第一特征图的通道数,得到第二特征图。
在另一种可能的实现方式中,该方法还包括:发送节点接收待处理图像的图像处理结果,该图像处理结果为接收节点依据第二特征图确定的。发送节点对待处理图像进行特征提取和压缩得到第二特征图,接收节点利用第二特征图确定待处理图像的图像处理结果,实现了图像处理方法的端云交互,克服了发送节点的算力或存储不足的缺陷。
在另一种可能的实现方式中,该方法还包括:发送节点显示图像处理结果。通过对待处理图像和图像处理结果进行显示,有利于用户获得待处理图像中的各种信息,如物理分类、人脸识别、目标检测等,减少人力获取图像信息的过程,提高获取视觉信息的效率。
第二方面,本申请提供了一种图像处理方法,该方法可应用于接收节点,或者该方法可应用于可以支持终端设备实现该方法的通信装置,例如该通信装置包括芯片系统,该方法包括:接收节点接收第二特征图,并利用卷积神经网络包括的特征重构层,对第二特征图进行重构得到第三特征图,接收节点利用卷积神经网络包括的特征输出层和图像处理层,对第三特征图进行处理得到图像处理结果,接收节点还发送该图像处理结果,该图像处理结果指示待处理图像的信息。其中,第二特征图是发送节点利用卷积神经网络,对待处理图像进行特征提取得到第一特征图,并将第一特征图进行压缩得到的;第二特征图的通道数小于第三特征图的通道数,且第一特征图的通道数和第三特征图的通道数相同。在本申请提供的图像处理方法中,接收节点仅需依据发送节点发送的第二特征图确定待处理图像的图像处理结果,该第二特征图的通道数小于图像处理所需的第三特征图的通道数,使得在第二特征图的分辨率未增大的情况下,接收节点接收来自发送节点的数据量减少,降低了端侧和云侧的传输时延。
在一种可选的实现方式中,特征重构层包括至少一层反卷积层。例如,该反卷积层可以用于对第二特征图进行上采样,提升第二特征图的分辨率。又如,该反卷积层还可以用于增加第二特征图的通道数,得到第三特征图。
在另一种可选的实现方式中,第二特征图的分辨率为W`×H`,第三特征图的分辨率为W×H,W`×H`<W×H。特征图的数据量是由分辨率与通道数的乘积确定的,例如,在第二特征图的分辨率小于第一特征图的分辨率的情况下,第二特征图的通道数小于第一特征图的通道数,因此第一特征图的数据量小于第二特征图的数据量。
在另一种可选的实现方式中,上述接收节点接收第二特征图,包括:接收节点接收发送节点发送的第二特征图。上述接收节点发送图像处理结果,包括:接收节点向发送节点发送图像处理结果。
第三方面,本申请还提供一种卷积神经网络的训练方法,该训练方法可应用于可以支持终端设备实现该方法的通信装置,例如该通信装置包括芯片系统,该方法包括:获取包括至少一幅训练图像的训练集,并依据训练集、第一卷积神经网络的第一特征提取层和第 一特征输出层,训练第一卷积神经网络的第一瓶颈结构层得到第二卷积神经网络,该第二卷积神经网络包括第一特征提取层、第二瓶颈结构层和第一特征输出层;其中,第一特征提取层用于对待处理图像进行特征提取得到第一特征图,第二瓶颈结构层中的特征压缩层用于压缩第一特征图得到第二特征图,第二特征图的通道数小于第一特征图的通道数。卷积神经网络提取待处理图像的特征图时,待处理图像的分辨率会减小或不变,采用本申请提供的训练方法对第一卷积神经网络进行训练,得到的第二卷积神经网络可以对待处理图像进行特征提取和压缩,减少了待处理图像的特征图的通道数,进而减少了发送节点向接收节点发送的特征图的数据量。另外,由于第一卷积神经网络和第二卷积神经网络具有相同的第一特征提取层和第一特征输出层,在卷积神经网络的训练过程中仅需对瓶颈结构层进行训练,减少了训练卷积神经网络所需的计算资源。
在一种可选的实现方式中,依据训练集、第一卷积神经网络的第一特征提取层和第一特征输出层,训练第一卷积神经网络的第一瓶颈结构层得到第二卷积神经网络,包括:将训练集输入第三卷积神经网络得到第一集合,以及将训练集输入第一卷积神经网络得到第二集合,进而,依据第一集合中的第四特征图和第二集合中的第五特征图计算损失函数,以及依据该损失函数更新第一瓶颈结构层的参数,获取第二瓶颈结构层,得到第二卷积神经网络。其中,第三卷积神经网络包括第二特征提取层和第二特征输出层,第一特征提取层的参数与第二特征提取层的参数相同,第一特征输出层的参数与第二特征输出层的参数相同,第一集合包括的第四特征图为第二特征提取层和第二特征输出层对训练图像进行特征提取后得到的;第二集合包括的第五特征图为第一瓶颈结构层和第一特征输出层对第二特征图进行特征重构和处理得到的。本申请提供的训练方法,特别是对于同一训练图像,对第一卷积神经网络和第三卷积神经网络中对应的多层特征图(第四特征图和第五特征图)之间的距离计算损失函数,得到第二卷积神经网络,有利于第四特征图和第五特征图之间的距离尽可能的变小,进而减小第一特征图和第三特征图之间的误差,提高图像处理的准确率。
在另一种可选的实现方式中,将训练集输入第一卷积神经网络,得到第二集合,包括:利用第一特征提取层,对训练图像进行特征提取得到第一特征图;利用第一瓶颈结构层包括的特征压缩层,压缩第一特征图得到第六特征图;还利用第一瓶颈结构层包括的特征重构层,重构第六特征图得到第三特征图,进而,利用第二特征输出层,处理第三特征图得到第二集合包括的第五特征图。其中,第三特征图的通道数和第一特征图的通道数相同;第六特征图的通道数小于第一特征图的通道数。
在另一种可选的实现方式中,依据第一集合中的第四特征图和第二集合中的第五特征图计算损失函数,包括:获取第四特征图与第五特征图之间的第一距离,以及获取第一特征图和第三特征图之间的第二距离,依据第一距离和第二距离计算损失函数。在利用第四特征图和第五特征图之间的第一距离计算损失函数的基础上,增加第一特征图和第三特征图之间的第二距离来计算损失函数,有利于第四特征图和第五特征图之间的距离尽可能的小,以及第一特征图和第三特征图之间的距离尽可能的小,减少了特征压缩层和特征重构层的处理误差,提高了图像处理的准确率。
在另一种可选的实现方式中,第一特征图的分辨率为W×H,第二特征图的分辨率为W`×H`,W`×H`<W×H。
第四方面,本申请还提供一种图像处理装置,有益效果可以参见第一方面中任一方面的描述,此处不再赘述。所述图像处理装置具有实现上述第一方面中任一方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,该图像处理装置应用于发送节点,该图像处理装置包括:收发单元,用于获取待处理图像;特征提取单元,用于利用卷积神经网络包括的特征提取层,对待处理图像进行特征提取,得到第一特征图;特征压缩单元,用于利用卷积神经网络包括的特征压缩层,压缩第一特征图得到第二特征图,第二特征图的通道数小于第一特征图的通道数;收发单元,还用于向接收节点发送第二特征图。
在一种可能的实现方式中,特征压缩层包括至少一层卷积层。
在另一种可能的实现方式中,第一特征图的分辨率为W×H,第二特征图的分辨率为W`×H`,W`×H`<W×H。
在另一种可能的实现方式中,收发单元还用于接收待处理图像的图像处理结果,图像处理结果为接收节点依据第二特征图确定的。
在另一种可能的实现方式中,该图像处理装置还包括:显示单元,用于显示待处理图像和/或图像处理结果。
第五方面,本申请还提供另一种图像处理装置,有益效果可以参见第二方面中任一方面的描述,此处不再赘述。所述图像处理装置具有实现上述第二方面中任一方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,该图像处理装置应用于接收节点,该图像处理装置包括:收发单元,用于接收第二特征图,第二特征图是发送节点利用卷积神经网络,对待处理图像进行特征提取得到第一特征图,并将第一特征图进行压缩得到的;特征重构单元,用于利用卷积神经网络包括的特征重构层,对第二特征图进行重构,得到第三特征图,第二特征图的通道数小于第三特征图的通道数,第一特征图的通道数和第三特征图的通道数相同;图像处理单元,用于利用卷积神经网络包括的特征输出层和图像处理层,对第三特征图进行处理得到图像处理结果,图像处理结果指示待处理图像的信息;收发单元,用于发送图像处理结果。
在一种可能的实现方式中,特征重构层包括至少一层反卷积层。
在另一种可能的实现方式中,第二特征图的分辨率为W`×H`,第三特征图的分辨率为W×H,W`×H`<W×H。
在另一种可能的实现方式中,收发单元,具体用于接收发送节点发送的第二特征图;收发单元,具体用于向发送节点发送图像处理结果。
第六方面,本申请还提供一种卷积神经网络的训练装置,有益效果可以参见第三方面中任一方面的描述,此处不再赘述。所述图像处理装置具有实现上述第三方面中任一方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,该训练装置包括:获取单元,用于获取训练集,训练集包括至少一幅训练图像;处理单元,用于依据训练集、第一卷积神经网络的第一特征提取层和第一特征输出层,训练第一卷积神经网络的第一瓶颈结构层得到第二卷积神经网络,第二卷积神经网络包括第一特征提取 层、第二瓶颈结构层和第一特征输出层;其中,第一特征提取层用于对待处理图像进行特征提取得到第一特征图,第二瓶颈结构层中的特征压缩层用于压缩第一特征图得到第二特征图,第二特征图的通道数小于第一特征图的通道数。
在一种可能的实现方式中,处理单元包括:第一训练单元,用于将训练集输入第三卷积神经网络,得到第一集合,第三卷积神经网络包括第二特征提取层和第二特征输出层,第一特征提取层的参数与第二特征提取层的参数相同,第一特征输出层的参数与第二特征输出层的参数相同,第一集合包括第四特征图,第四特征图为第二特征提取层和第二特征输出层对训练图像进行特征提取后得到的;第二训练单元,用于将训练集输入第一卷积神经网络,得到第二集合,第二集合包括第五特征图,第五特征图为第一瓶颈结构层和第一特征输出层对第二特征图进行特征重构和处理得到的;损失计算单元,用于依据第一集合中的第四特征图和第二集合中的第五特征图计算损失函数;第三训练单元,用于依据损失函数更新第一瓶颈结构层的参数,获取第二瓶颈结构层,得到第二卷积神经网络。
在另一种可能的实现方式中,将训练集输入第一卷积神经网络,得到第二集合,包括:利用第一特征提取层,对训练图像进行特征提取得到第一特征图;利用第一瓶颈结构层包括的特征压缩层,压缩第一特征图得到第六特征图,第六特征图的通道数小于第一特征图的通道数;利用第一瓶颈结构层包括的特征重构层,重构第六特征图得到第三特征图,第三特征图的通道数和第一特征图的通道数相同;利用第二特征输出层,处理第三特征图得到第二集合包括的第五特征图。
在另一种可能的实现方式中,损失计算单元,具体用于获取第四特征图与第五特征图之间的第一距离;损失计算单元,具体用于获取第一特征图和第三特征图之间的第二距离;损失计算单元,具体用于依据第一距离和第二距离计算损失函数。
在另一种可能的实现方式中,第一特征图的分辨率为W×H,第二特征图的分辨率为W`×H`,W`×H`<W×H。
第七方面,本申请还提供一种通信装置,包括处理器和接口电路,接口电路用于接收来自通信装置之外的其它通信装置的信号并传输至处理器或将来自处理器的信号发送给通信装置之外的其它通信装置,处理器通过逻辑电路或执行代码指令用于实现第一方面和第一方面中任一种可能实现方式,或第二方面和第二方面中任一种可能实现方式,或第三方面和第三方面中任一种可能实现方式的方法的操作步骤。
第八方面,本申请提供一种计算机可读存储介质,存储介质中存储有计算机程序或指令,当计算机程序或指令被通信装置执行时,实现第一方面和第一方面中任一种可能实现方式,或第二方面和第二方面中任一种可能实现方式,或第三方面和第三方面中任一种可能实现方式的方法的操作步骤。
第九方面,本申请提供一种计算机程序产品,当计算机程序产品在计算机上运行时,使得计算设备实现第一方面和第一方面中任一种可能实现方式,或第二方面和第二方面中任一种可能实现方式,或第三方面和第三方面中任一种可能实现方式的方法的操作步骤。
第十方面,本申请提供一种芯片,包括存储器和处理器,存储器用于存储计算机指令,处理器用于从存储器中调用并运行该计算机指令,以执行上述第一方面及其第一方面任意可能的实现方式中的方法,或第二方面和第二方面中任一种可能实现方式,或第三方面和第三方面中任一种可能实现方式的方法的操作步骤。
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a system for a terminal-cloud collaboration solution provided by the present application;
FIG. 2 is a schematic flowchart of an image processing method provided by the present application;
FIG. 3 is a schematic structural diagram of a convolutional neural network provided by the present application;
FIG. 4 is a schematic structural diagram of a convolutional neural network provided by the prior art;
FIG. 5 is a schematic display diagram of image processing provided by the present application;
FIG. 6 is a schematic flowchart of a training method provided by the present application;
FIG. 7 is a first schematic diagram of training of a convolutional neural network provided by the present application;
FIG. 8A is a second schematic diagram of training of a convolutional neural network provided by the present application;
FIG. 8B is a third schematic diagram of training of a convolutional neural network provided by the present application;
FIG. 9 is a schematic flowchart of another training method provided by the present application;
FIG. 10 is a schematic structural diagram of a training apparatus and an image processing apparatus provided by the present application;
FIG. 11 is a schematic structural diagram of a communication apparatus provided by the present application.
具体实施方式
本申请说明书和权利要求书及上述附图中的术语“第一”、“第二”和“第三”等是用于区别不同对象,而不是用于限定特定顺序。
在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
为了下述各实施例的描述清楚简洁,首先给出相关技术的简要介绍。
图1为本申请提供的一种端云协同方案的系统示意图,该系统包括端侧设备110、边缘设备120和云侧设备130。端侧设备110可以通过无线或有线方式与边缘设备120连接。端侧设备110可以通过无线或有线方式与云侧设备130连接。边缘设备120可以通过无线或有线方式与云侧设备130连接。示例的,端侧设备110、边缘设备120和云侧设备130之间均可以通过网络进行通信,该网络可以是互联网络。
端侧设备110可以是终端设备、用户设备(user equipment,UE)、移动台(mobile station,MS)、移动终端(mobile terminal,MT)等。端侧设备110可以是手机(如图1中所示出的终端111)、平板电脑(如图1中所示出的终端112)、带无线收发功能的电脑(如图1中所示出的终端113)、虚拟现实(Virtual Reality,VR)终端设备、增强现实(Augmented Reality,AR)终端设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端(如图1中所示出的终端114)、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端(如图1中所示出的终端115和终端116)、智慧家庭(smart home)中的无线终端等等。例如,终端114可以是自动驾驶系统中用于图像处理的装置,终端115可以是用于道路监控的摄像装置,终端116可以是用于人脸识别的采集装置(如相机)。本申请的实施例对端侧设 备所采用的具体技术和具体设备形态不做限定。
如果端侧设备110的算力或存储能力较强,则端侧设备110可以利用人工智能(artificial intelligence,AI)神经网络对待处理图像进行图像处理。待处理图像可以是由端侧设备110采集的,待处理图像也可以是由与端侧设备110通信连接的图像采集设备实现的,图像采集设备可以是摄像机、相机等。示例的,该待处理图像可以是相机采集的一幅图像,该待处理图像也可以是摄像机采集的视频中的一帧图像。
如果端侧设备110的算力或存储不足,无法运行复杂的AI神经网络进行图像处理。
在一种可能的实现方式中,端侧设备110可以将图像传输至边缘设备120或云侧设备130,边缘设备120或云侧设备130运行AI神经网络,对图像进行处理得到图像处理结果。例如,道路监控上的终端116在终端114(如汽车或货车等)通过路口时采集道路图像,并将道路图像发送至边缘设备120,边缘设备120运行AI神经网络,判断终端114的车牌是否为本地车牌,若终端114的车牌为外地车牌,则边缘设备120将该车牌的信息和终端114的图像发送给交通管理的终端设备。
在另一种可能的实现方式中,端侧设备110可以将图像传输至边缘设备120,边缘设备120将图像进行预处理,并将预处理得到的结果发送到云侧设备130,云侧设备130获取图像的处理结果。示例的,将AI神经网络划分为2部分网络:第一部分网络用于提取图像的原始特征图(feature map),第二部分网络用于根据该原始特征图得到图像的图像处理结果。例如,边缘设备120运行该第一部分网络并将图像的原始特征图发送到云侧设备130,云侧设备130运行该第二部分网络处理该原始特征图,得到图像处理结果。
云侧设备130可以是用于处理图像数据的服务器,如图1所示出的服务器131。另外,云侧设备130还可以是服务器131利用虚拟化技术提供的多个虚拟机,由虚拟机进行图像处理。
图1只是示意图,该系统中还可以包括其它设备,在图1中未画出。本申请的实施例对该系统中包括的端侧设备、边缘设备和云侧设备数量不做限定。
但是,传输至边缘设备或云侧设备的上述的图像和原始特征图的数据量很大,导致端侧向云侧发送数据的时延较高。为了解决该问题,本申请提供一种图像处理方法,该方法包括发送节点获取到待处理图像,将待处理图像输入卷积神经网络,卷积神经网络包括的特征提取层对待处理图像进行特征提取,得到第一特征图,以及卷积神经网络包括的特征压缩层压缩第一特征图得到第二特征图,该第二特征图的通道数小于第一特征图的通道数。进而,发送节点向接收节点发送第二特征图。由于利用特征提取层对待处理图像进行特征提取得到第一特征图,使得第一特征图的数据量小于待处理图像的数据量;此外,利用特征压缩层压缩第一特征图得到第二特征图,使得第二特征图的通道数小于第一特征图的通道数,在第一特征图的分辨率未增大的情况下,第二特征图的数据量小于第一特征图的数据量,进一步减少了发送节点向接收节点发送的特征图的数据量,降低了端侧和云侧的传输时延。
下面将结合附图对本申请实施例的实施方式进行详细描述。
这里以发送节点可以实现如图1所示出的端侧设备和/或边缘设备的功能、接收节点可以实现如图1所示出的云侧设备的功能为例进行说明,请参见图2,图2为本申请提供的一种图像处理方法的流程示意图,该图像处理方法包括以下步骤。
S210、发送节点获取待处理图像。
该待处理图像可以包括二值图像、灰度图像、索引图像或真彩图像中至少一种。
在一种可能的情形中,该待处理图像可以是由发送节点采集的。如图1所示,若发送节点为终端111~终端116中的任意一个,发送节点可以利用其自带的图像采集单元(如相机)来采集图像。
在另一种可能的情形中,该待处理图像还可以是与发送节点通信连接的图像采集装置采集的。如图1所示,该图像采集装置可以是终端115或终端116,发送节点可以是与终端115或终端116连接的服务器等。
S220、发送节点利用卷积神经网络包括的特征提取层,对待处理图像进行特征提取,得到第一特征图。
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。卷积神经网络包含了一个由卷积层和池化层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。
示例的,卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。卷积层可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络进行正确的预测。当卷积神经网络有多个卷积层的时候,初始的卷积层往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络深度的加深,越往后的卷积层提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
该卷积神经网络至少包括一层卷积层(convolutional layer),该卷积层包括至少一个卷积单元,该卷积层可以用于提取待处理图像的各类特征图,特征图是卷积神经网络中卷积层、激活层、池化层或批量归一化层等层输出的三维数据,该特征图的三个维度分别为:高度(height,H)、宽度(width,W)和通道数(channel,C),W和H的乘积可以称为特征图的分辨率(W*H)。特征图可以表示待处理图像的各种信息,例如,图像中的边缘信息、线条、纹理等。
示例的,待处理图像的分辨率为96×96,其划分为144个8×8的图像样本,特征提 取层对每个8×8的图像样本进行卷积,并将卷积得到的所有结果进行聚合,得到待处理图像的第一特征图。
在一些可能的示例中,该卷积神经网络还可以包括激活层,例如,线性整流层(rectified linear units layer,ReLU),或参数化修正线性单元(parametric rectified linear unit,PReLU)等。
在另一些可能的示例中,该卷积神经网络还可以包括池化层(pooling layer)、批量归一化层(BN layer)、全连接层(fully connected layer)等其它功能模块。关于CNN的各个功能模块的相关原理请参考现有技术的相关阐述,不予赘述。
S230、发送节点利用卷积神经网络包括的特征压缩层压缩第一特征图得到第二特征图。
在本文中,记第一特征图的维度为W 1×H 1×C 1,第二特征图的维度为W 2×H 2×C 2
上述的特征压缩层包括至少一层卷积层,该卷积层用于缩减第一特征图的通道数,该第二特征图的通道数小于第一特征图的通道数。示例的,该卷积层的输入通道数为C in1,C in1=C 1;输出通道数为C out1,C out1=C 2,C in1>C out1。例如,输出通道数为输入通道数的1/K,K可以为2、3、4、6、8、12或16等。
在一种可能的示例中,该卷积层还可以用于对第一特征图进行下采样。例如,记卷积层的步长(stride)为2,对第一特征图进行下采样,得到第二特征图,其中,W 1=2W 2,H 1=2H 2,C 1=2C 2
作为一种可选的实施方式,上述的卷积层的卷积核可以根据发送节点的实际算力和图像处理需求进行确定。例如,卷积层的卷积核可以是3×3、5×5或7×7等。
在目前的技术方案中,神经网络特征图的数据降维常常使用池化层,池化层的池化操作主要是通过一个池化核来减少特征图的参数,例如,最大值池化、平均值池化和最小值池化。然而,池化层在减少特征图的总数据量的过程中,会导致特征图的通道数据增加,如CNN中的VGG(Visual Geometry Group Network)模型,VGG网络由卷积层模块后接全连接层模块构成,VGG网络串联数个vgg_biock,其超参数由变量conv_rach定义。该变量指定了VGG网络中每个VGG块的输出通道数,在VGG块对特征图进行数据降维处理的过程中,会将原始特征图的高和宽减半,并将原始特征图的通道数翻倍,这会导致虽然特征图的总数据量减少,但是在特征图的传输到云侧设备后,由于通道数增加且分辨率减少,导致特征图中每条通道数所对应的图像信息降低,云侧设备重建得到的重建特征图会丢失较多的图像信息,导致图像处理结果与待处理图像实际所指示的信息差异较大。
相比之下,在本申请实施例所提供的数据处理方法中,发送节点利用卷积神经网络包括特征压缩层压缩第一特征图得到第二特征图,两者的差异主要体现在第二特征图的通道数小于第一特征图的通道数。在第二特征图的分辨率小于第一特征图的分辨率的情况下(如C 1/C 2=W 1/W 2=H 1/H 2),第二特征图中每条通道所对应的图像信息不变,接收节点重构该第二特征图后得到的重建特征图丢失的图像信息减少。在第一特征图和第二特征图的分辨率相同的情况下,第二特征图中每条通道所对应的图像信息增加,接收节点重构该第二特征图后得到的重建特征图丢失的图像信息减少,接收节点依据该第二特征图得到的图像处理结果与待处理图像实际所指示的信息之间的差异降低。
S240、发送节点向接收节点发送第二特征图。
作为一种可选的实施方式,发送节点向接收节点发送第二特征图时,发送节点可以先 对第二特征图进行编码得到码流,并将该码流发送到接收节点。例如,编码的方法可以采用无损编码方法,如LZMA(lempel-ziv-markov chain-algorithm)算法。又如,编码的方法还可以采用有损编码方法,如联合图像专家小组(joint photographic experts group,JPEG)编码、高级视频编码(advanced video coding,AVC)、高效率视频编码(high efficiency video coding,HEVC)以及其他图像编码方法等。又如,编码的方法还可以采用基于卷积神经网络和算术编码的熵编码方法,如面向变分自动编码器(variation auto encoder,VAE)特征图的熵编码方法。
在一种示例中,发送节点可以将第二特征图整数化为8比特,将各通道组成YUV400格式数据,输入HEVC或VVC编码器。
在另一种示例中,发送节点还可以将第二特征图整数化为N比特数据后,使用无损编码算法压缩。
在另一种示例中,发送节点还可以使用针对特征图数据设计的编码器进行压缩处理。
发送节点也可以不对第二特征图进行编码,如发送节点向接收节点发送特征压缩层输出的该第二特征图。
在一种可能的情形中,发送节点和接收节点的交互可以通过图1所示出的网络进行数据传输。例如,发送节点和接收节点可以是通过传输控制协议(transmission control protocol,TCP)、网络协议(Internet Protocol,IP)、TCP/IP协议等进行传输。
S250、接收节点利用卷积神经网络包括的特征重构层对第二特征图进行重构,得到第三特征图。
在本申请实施例中,该第二特征图是发送节点利用卷积神经网络包括的特征提取层和特征压缩层对待处理图像进行特征提取和压缩得到的,但是一些可能的示例中,该第二特征图还可以是其他处理设备对待处理图像进行特征提取和压缩后,由与发送节点通信的网络设备转发的,例如,该处理设备可以是手机,网络设备可以是路由器。
在本文中,记第三特征图的维度为W 3×H 3×C 3
上述的特征重构层可以包括至少一层反卷积层(de-convolution layer),该反卷积层用于增加第二特征图的通道数,以使C 2<C 3,且C 3=C 1。示例的,该反卷积层的输入通道数为C in2,C in2=C 2;输出通道数为C out2,C out2=C 3,C in2<C out2。例如,输出通道数可以为输入通道数的K倍,K可以为2、3、4、6、8、12或16等。
在一种可能的示例中,该反卷积层还可以用于对第二特征图进行上采样。例如,记反卷积层的stride=2,对第二特征图进行上采样,得到第三特征图,其中,W 3=2W 2=W 1,C 3=2C 2=C 1
作为一种可选的实施方式,上述的反卷积层的卷积核可以根据接收节点的实际算力和图像处理需求进行确定。例如,反卷积层的卷积核可以是3×3、5×5或7×7等。
在一种可能的设计中,若第二特征图是通过码流的形式进行传输,接收节点还可以对该码流进行解码,接收节点解码的方式和发送节点对第二特征图编码的方式是匹配的。
如图3所示,图3为本申请提供的一种卷积神经网络的结构示意图,该卷积神经网络300包括第一部分网络310和第二部分网络320,其中,发送节点可以利用第一部分网络310获取待处理图像的第二特征图,接收节点可以利用第二部分网络320处理第二特征图,获取图像处理结果。
其中,第一部分网络310包括特征提取层311和特征压缩层312,第二部分网络320包括特征重构层321和特征输出层322。
针对于特征压缩层312和特征重构层321,本申请还提供以下可能的实现方式。
在第一种可能的情形中,特征压缩层312包括2个卷积层,第一个卷积层用于实现对第一特征图的下采样,第二个卷积层用于减少第一特征图的通道数,得到第二特征图。例如,第一个卷积层的输入通道数记为C in,步长stride为2,输出通道数C out1=C in1;第二个卷积层的输入特征图为第一个卷积层的输出特征图,步长stride=1,输出通道数C out2<C in。相应的,特征重构层321包含1个卷积层和1个反卷积层,卷积层用于增加第二特征图的通道数,再由反卷积层实现对第二特征图的上采样,得到第三特征图。例如,卷积层的输入特征图为特征压缩层的输出特征图(如上述的第二特征图),stride=1,输出通道数C out3=C in;反卷积层的stride=2,输出通道数C out4=C in
在第二种可能的情形中,特征压缩层312包含2个卷积层,第一个卷积层用于减少第一特征图的通道数,第二个卷积层用于实现对第一特征图的下采样,得到第二特征图。例如,第一个卷积层的输入通道数记为C in,步长stride=1,输出通道数C out1<C in;第二个卷积层的输入为第一个卷积层的输出,步长stride=2,输出通道数C out2<C out1。相应的,特征重构层321包含1个卷积层和1个反卷积层,首先由反卷积层实现对第二特征图的上采样,再由卷积层增加第二特征图的通道数,得到第三特征图,例如,反卷积层的输入为特征压缩层的输出,stride=2,输出通道数C out3Cout1;卷积层的stride=1,输出通道数C out4=C in
在第三种可能的情形中,特征压缩层312和特征重构层321可以为非对称的结构。例如,特征压缩层包含3个卷积层,第一个卷积层的输入通道数记为C in,步长stride为1,输出通道数C out1<C in;第二个卷积层的输入为第一个卷积层的输出,步长stride=2,输出通道数C out2=C out1;第三个卷积层的输入为第二个卷积层的输出,步长stride=2,输出通道数C out3=C out1。相应的,特征重构层包含2个反卷积层:第1个反卷积层的输入为特征压缩层的输出,stride=2,输出通道数C out4=C out3;第2个反卷积层的输入为第1个反卷积层的输出,卷积核为3×3、5×5或7×7,stride=2,输出通道数C out5=C in
上述第一种~第三种可能的情形仅为本申请提供的实施例,不代表对本申请的限定。在另一些可能的情形中,特征压缩层中还可以包含更多的卷积层,特征重构层中也可以包含更多的卷积层和反卷积层。
作为一种可选的实施方式,上述卷积层或反卷积层的输出还可以经过ReLU等激活层、BN层等处理之后再输入下一个卷积层或反卷积层,以提高特征压缩层和特征重构层的输出特征图的非线性,使得图像处理的精度更高,本申请对此不作限定。
作为一种可选的实施方式,包括特征提取层311和特征输出层322的神经网络也可以用于对待处理图像进行图像处理。如图4所示,图4为现有技术提供的一种卷积神经网络的结构示意图,该卷积神经网络400包括特征提取层410和特征输出层420,图3所示出的特征压缩层312可以实现和特征提取层410相同的功能,图3所示出的特征重构层321可以实现和特征输出层420相同的功能。
如图4所示,特征提取层410可以包括网络层conv1和网络层conv2,网络层conv1和网络层conv2均可以是卷积层。示例的,若待处理图像的参数为:W×H×3,网络层conv1的输出特征图的参数为:(W/2)×(H/2)×64,网络层conv2的输出特征图(第 一特征图)的参数为:(W/4)×(H/4)×256。
特征输出层420可以包括网络层conv3、网络层conv4、网络层conv5。示例的,网络层conv3~网络层conv5可以是卷积层,如网络层conv3的输出特征图的参数为:(W/8)×(H/8)×512、网络层conv4的输出特征图的参数为:(W/16)×(H/16)×1024、网络层conv5的输出特征图的参数为:(W/32)×(H/32)×2048。
如图4所示,卷积神经网络400的主干网络包括网络层conv1~网络层conv5,该主干网络用于提取待处理图像的多个特征图。
另外,该卷积神经网络还包括脖子网络层424(neck network)和头部网络层425(head network)。
脖子网络层424可以用于对头部网络输出的特征图进行进一步整合处理,得到新的特征图。例如,脖子网络层424可以是特征金字塔网络(feature pyramid network,FPN)。
头部网络层425用于处理脖子网络层424输出的特征图得到图像处理结果。例如,头部网络包含全连接层、softmax模块等。关于脖子网络(neck network)和头部网络(head network)的更多内容可以参考现有技术的相关阐述,这里不加赘述。
也就是说,本申请在卷积神经网络的主干网络中引入了特征压缩层和特征重构层,使得在保证图像处理的基础上,可以对第一特征图进行压缩,减少第一特征图的通道数,从而减少了发送节点和接收节点传输的数据量。
S260、接收节点利用卷积神经网络包括的特征输出层和图像处理层对第二特征图进行处理,得到图像处理结果。
该图像处理层可以包括图4所示出的脖子网络层424和头部网络层425。该图像处理结果指示待处理图像的信息。
在第一种可能的设计中,该图像处理结果可以是对待处理图像进行目标检测的结果,该信息可以是待处理图像中的某个区域。如图1所示,在道路监控的场景中,发送节点可以是终端116(如路口的监控摄像头),接收节点可以是服务器131。例如,终端116在终端114(如汽车或货车等)通过路口时采集待处理图像,并将待处理图像进行特征提取和压缩后得到的特征图发送到服务器131,若服务器131确定终端114的车牌为外地车牌,则服务器131将该外地车牌信息通过数据报文的形式发送给交通管理的中央控制设备。
在第二种可能的设计中,该图像处理结果可以是对待处理图像进行人脸识别的结果。如图1所示,在行政楼的进出口场景中,发送节点可以是终端115(如行政楼的监控摄像头),接收节点可以是服务器131。例如,终端115在用户1和用户2进入行政楼时采集人脸图像,并将该人脸图像进行特征提取和压缩后得到的特征图发送到服务器131,服务器131判断用户1和用户2是否为行政楼注册的合法用户,例如,服务器131将待处理图像中的人脸特征与人脸比对库进行匹配,若匹配成功,则确定用户1为合法用户,服务器131发送验证通过信息至终端115,终端115依据该验证通过信息打开进出口闸门,用户1可以通过该进出口闸门进入行政楼。
在第三种可能的设计中,该图像处理结果可以是对待处理图像进行物体分类的结果,待处理图像的信息可以是图像中的物体分类信息。如图1所示,在住宅场景中,发送节点可以是终端111~终端113中的任意一个,接收节点可以是服务器131。例如,终端111采集住宅内的各个图像,其中包括沙发、电视、桌子等,并将这一组图像进行特征提取和压 缩后得到的特征图发送到服务器131,服务器131根据这一组特征图确定图像中各个物体的分类,以及每一类图像对应的购物链接,并将这些信息发送到终端111。
在第四种可能的设计中,该图像处理结果可以是对待处理图像进行地理定位的结果。如图1所示,在车辆行驶场景中,发送节点可以是终端114(如汽车或货车中安装的行车记录仪),接收节点可以是服务器131。例如,终端114将接近路口时拍摄的待处理图像(如道路图像,道路图像包括图1所示出的住宅、树、行政楼以及各参考物的相对位置信息)进行特征提取和压缩得到特征图,并将该特征图发送到服务器131,服务器131对该特征图进行特征重构和图像处理后得到该待处理图像对应的地理位置,并将该地理位置发送到终端114。
上述可能的设计仅为本申请为了说明图像处理方法而提供的可能的实现方式,不代表对本申请的限定,本申请提供的图像处理方法还可以应用于更多的图像处理场景中。
可选的,如图2所示,该图像处理方法还包括以下步骤S270和S280。
S270、接收节点发送图像处理结果。
在一种可能的情况中,接收节点可以向图2所示出的发送节点发送图像处理结果。
在另一种可能的情况中,接收节点可以向其他节点发送图像处理结果。例如,在道路监控场景中,该其他节点可以是交通管理系统的中央控制设备。
在本申请提供的图像处理方法中,发送节点仅需向接收节点发送第二特征图,该第二特征图的通道数小于第一特征图的通道数,使得在第一特征图的分辨率未增大的情况下,发送节点向接收节点发送的数据量减少,降低了端侧和云侧的传输时延。
S280、发送节点显示图像处理结果。
发送节点上可以具有显示区域，如该显示区域可以包括显示面板。显示面板可以采用液晶显示屏（Liquid Crystal Display，LCD），有机发光二极管（Organic Light-Emitting Diode，OLED），有源矩阵有机发光二极体或主动矩阵有机发光二极体（Active-Matrix Organic Light Emitting Diode，AMOLED），柔性发光二极管（Flex Light-Emitting Diode，FLED），Miniled，MicroLed，Micro-oLed，量子点发光二极管（Quantum Dot Light Emitting Diodes，QLED）等。在一些实施例中，发送节点可以包括1个或多个显示屏194。
这里以发送节点是手机、图像处理的场景是物体分类为例进行说明,在第一种可能的示例中,发送节点只显示待处理图像,如图5中的(a)所示,手机显示待处理图像,其包括多个图形。
在第二种可能的示例中,发送节点只显示图像处理结果,如图5中的(b)所示,手机显示待处理图像中存在的物体种类,如水杯、笔等。
在第三种可能的示例中,发送节点显示待处理图像和图像处理结果,如图5中的(c)所示,手机在显示待处理图像的情况下,将每种物体的种类标记在待处理图像的相应位置。
通过对待处理图像和图像处理结果进行显示，有利于用户获得待处理图像中的各种信息，如物体分类、人脸识别、目标检测等，减少人力获取图像信息的过程，提高获取视觉信息的效率。
作为一种可选的实施方式，发送节点接收到该图像处理结果后，还可以执行其他处理。例如，将发送节点拍摄的图像中物体的名称告知用户；或者，如果接收节点分析第二特征图给出警告信息，则发送节点发出警告语音提示，提醒处于发送节点对应环境中的用户注意安全等，本申请对此不作限定。
值得注意的是,上述实施例以发送节点可以实现如图1所示出的端侧设备和/或边缘设备的功能、接收节点可以实现如图1所示出的云侧设备的功能为例进行说明,在另一种可能的情况下,若发送节点可以实现如图1所示出的端侧设备的功能,接收节点还可以实现如图1所示出的边缘设备的功能。
上述的卷积神经网络可以是在已有的图像处理网络（如图4所示出的卷积神经网络400）中增加瓶颈结构层，并依据该图像处理网络对瓶颈结构层进行训练得到的，其中，瓶颈结构（bottleneck structure）是一种多层网络结构，瓶颈结构层的输入通道数和输出通道数相同，而瓶颈结构层的中间特征图的通道数小于输入通道数。例如，瓶颈结构层的输入数据（如上述待处理图像的第一特征图）先经过一层或多层神经网络层，得到中间数据（如上述的第二特征图），中间数据再经过1层或多层神经网络层得到输出数据（如上述的第三特征图）；其中，中间数据的数据量（即宽、高和通道数的乘积）低于输入数据量和输出数据量。
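例如（仅为便于理解的假设性数值）：设瓶颈结构层的输入数据为分辨率56×56、通道数256的特征图，其数据量为56×56×256≈80万；中间数据为分辨率28×28、通道数64的特征图，其数据量为28×28×64≈5万，约为输入数据量的1/16；输出数据恢复为56×56×256，则中间数据的数据量明显低于输入数据量和输出数据量，满足上述瓶颈结构的定义。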
在本申请的上述实施例所提供的图像处理方法中,图像处理网络包括特征提取层、特征输出层和图像处理层,瓶颈结构层包括特征压缩层和特征重构层,卷积神经网络可以被分为2部分:第一部分网络包括特征提取层和特征压缩层,第二部分网络包括特征重构层、特征输出层和图像处理层,例如,上述的发送节点的功能可以是通过部署在发送节点的卷积神经网络的第一部分网络实现的,上述的接收节点的功能可以是通过部署在接收节点的卷积神经网络的第二部分网络实现的。
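下面给出将卷积神经网络按上述方式拆分为两部分并分别部署的一个示意性草图（其中feature_extractor、compressor等模块名称均为假设，接口形式也仅为可能的示例之一）：

import torch.nn as nn

class SenderPart(nn.Module):
    # 第一部分网络：特征提取层 + 特征压缩层，部署在发送节点
    def __init__(self, feature_extractor, compressor):
        super().__init__()
        self.feature_extractor = feature_extractor  # 对应特征提取层（如conv1~conv2）
        self.compressor = compressor                # 对应特征压缩层

    def forward(self, image):
        fm1 = self.feature_extractor(image)         # 第一特征图
        fm2 = self.compressor(fm1)                  # 第二特征图，通道数更少
        return fm2                                  # 可进一步编码为码流后发送给接收节点

class ReceiverPart(nn.Module):
    # 第二部分网络：特征重构层 + 特征输出层 + 图像处理层，部署在接收节点
    def __init__(self, reconstructor, feature_output, image_processor):
        super().__init__()
        self.reconstructor = reconstructor          # 对应特征重构层
        self.feature_output = feature_output        # 对应特征输出层（如conv3~conv5）
        self.image_processor = image_processor      # 对应脖子网络层和头部网络层

    def forward(self, fm2):
        fm3 = self.reconstructor(fm2)               # 第三特征图
        feats = self.feature_output(fm3)
        return self.image_processor(feats)          # 图像处理结果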
为了获取上述实施例中的卷积神经网络,以实现前述的图像处理方法,本申请还提供一种卷积神经网络的训练方法,如图6所示,图6为本申请提供的一种训练方法的流程示意图,该训练方法可以由发送节点或接收节点执行,也可以由其他电子设备执行,本申请对此不做限定,该训练方法包括以下步骤。
S610、获取训练集。
该训练集包括至少一幅训练图像。例如,训练集可以包括50000~100000幅训练图像,该训练图像可以为以下任意一种类型:二值图像、灰度图像、索引图像和真彩图像等。
S620、依据训练集、第一卷积神经网络的第一特征提取层和第一特征输出层,训练第一卷积神经网络的第一瓶颈结构层得到第二卷积神经网络。
如图7所示,图7为本申请提供的一种卷积神经网络的训练示意图一,该第一卷积神经网络710包括第一特征提取层T701、第一瓶颈结构层711和第一特征输出层P701,该第二卷积神经网络720包括第一特征提取层T701、第二瓶颈结构层721和第一特征输出层P701。第一特征提取层T701可以用于实现图3示出的特征提取层311的功能,第一特征输出层P701可以用于实现图3示出的特征输出层322的功能,第二瓶颈结构层721可以用于实现图3示出的特征压缩层312和特征重构层321的功能。
上述的第一瓶颈结构层711和第二瓶颈结构层721均为瓶颈结构(bottleneck structure),瓶颈结构是一种多层网络结构。例如,第二瓶颈结构层721的输入数据(如上述待处理图像的第一特征图)先经过一层或多层神经网络层,得到中间数据(如上述的第二特征图),中间数据再经过1层或多层神经网络层得到输出数据(如上述的第三特征图);其中,中间数据的数据量(即宽、高和通道数的乘积)低于输入数据量和输出数据量。
如图7所示，第二特征图FM2`的通道数小于第一特征图FM1的通道数。
第二特征图FM2`是第一特征提取层T701和第二瓶颈结构层721对训练图像进行特征提取和压缩得到的,例如,第一特征提取层T701对训练图像进行特征提取得到第一特征图FM1,第二瓶颈结构层721中的特征压缩层7211压缩第一特征图FM1得到第二特征图FM2`。
第一特征图FM1是第一特征提取层T701对训练图像进行特征提取得到的。
在一种可能的情况下,如图7所示,第二特征图FM2的通道数大于或等于第二特征图FM2`的通道数。
第二特征图FM2是第一特征提取层T701和第一瓶颈结构层711对训练图像进行特征提取和压缩得到的,例如,第一特征提取层T701对训练图像进行特征提取得到第一特征图FM1,第一瓶颈结构层711中的特征压缩层7111压缩第一特征图FM1得到第二特征图FM2。
卷积神经网络中的池化层提取待处理图像的特征图时,待处理图像的分辨率会减小或不变,采用本申请提供的训练方法对第一卷积神经网络进行训练,得到的第二卷积神经网络可以对待处理图像进行特征提取和压缩,减少了待处理图像的特征图的通道数,进而减少了发送节点向接收节点发送的特征图的数据量。
另外,由于第一卷积神经网络和第二卷积神经网络具有相同的第一特征提取层和第一特征输出层,在卷积神经网络的训练过程中仅需对瓶颈结构层进行训练,减少了训练卷积神经网络所需的计算资源。
图8A为本申请提供的一种卷积神经网络的训练示意图二,第三卷积神经网络730包括第二特征提取层T702和第二特征输出层P702,第一特征提取层T701的参数与第二特征提取层T702的参数相同,第一特征输出层P701的参数与第二特征输出层P702的参数相同。第三卷积神经网络730可以用于实现图4所示出的卷积神经网络400的功能。上述的参数可以包括最大迭代次数(max_batches)或批大小(batch_size)等,关于卷积神经网络的各种参数可以参考现有技术的相关阐述,这里不加赘述。
图8B为本申请提供的一种卷积神经网络的训练示意图三,关于图8B所示出的第一卷积神经网络710的技术内容可以参考上述图7所示出的第一卷积神经网络710,这里不加赘述。
针对于上述的S620，一种常规的做法是，利用训练第三卷积神经网络730时使用的训练集（包含训练图像和训练图像对应的标注信息，如物体检测框和特征类别等），对第一卷积神经网络710进行训练，得到第一瓶颈结构层711的模型参数。然而，训练第三卷积神经网络730时使用的训练集包含训练图像的标注信息，导致第一卷积神经网络710的训练会消耗极大的计算资源，且训练速度较慢。
因此,为了解决上述问题,请参见图9,图9为本申请提供的另一种训练方法的流程示意图,针对于上述的S620,其可以包括S621~S624的操作步骤对应的训练方法。
S621、将训练集输入第三卷积神经网络,得到第一集合。
该第一集合包括第四特征图，第四特征图为第二特征提取层和第二特征输出层对训练图像进行特征提取和图像处理后得到的。这里以图8A和图8B所示出的第二特征提取层T702包括网络层conv1和网络层conv2，第二特征输出层P702包括网络层conv3、网络层conv4和网络层conv5为例进行说明，该第一集合可以包括网络层conv3处理第一特征图FM1得到的第四特征图FM4_1，网络层conv4处理第四特征图FM4_1得到的第四特征图FM4_2、以及网络层conv5处理第四特征图FM4_2得到的第四特征图FM4_3中的任意一个或多个。
值得注意的是,这里是以第三卷积神经网络不包括瓶颈结构层来进行说明的。但在一些可能的实现方式中,第三卷积神经网络也可以包括瓶颈结构层,示例的,在第三卷积神经网络包括N(N为正整数)个瓶颈结构层的情况下,第一卷积神经网络要比第三卷积神经网络多一个第一瓶颈结构层。例如,第一卷积神经网络不仅包括第三卷积神经网络所有的结构,还包括一个上述的第一瓶颈结构层,在本申请实施例所提供的训练方法中,是利用第三卷积神经网络对第一卷积神经网络的该第一瓶颈结构层进行训练,得到第二卷积神经网络。
S622、将训练集输入第一卷积神经网络,得到第二集合。
该第二集合包括第五特征图,第五特征图为第一瓶颈结构层和第一特征输出层对第二特征图进行特征重构和图像处理得到的。这里以图8B所示出的第一特征提取层T701包括网络层conv1和网络层conv2,第一特征输出层P701包括网络层conv3、网络层conv4和网络层conv5为例进行说明,该第二集合可以包括网络层conv3处理第三特征图FM3得到的第五特征图FM5_1,网络层conv4处理第五特征图FM5_1得到的第五特征图FM5_2、以及网络层conv5处理第五特征图FM5_2得到的第五特征图FM5_3中的任意一个或多个。
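作为一个示意性草图，从两个网络中收集对应的多层特征图（第一集合与第二集合）可以按如下方式实现（函数与变量名称均为假设）：

def collect_feature_maps(conv1, conv2, conv3, conv4, conv5, image, bottleneck=None):
    # 依次前向计算并返回conv3/conv4/conv5的输出（即多层特征图）
    # 当bottleneck不为None时，在conv2之后插入瓶颈结构层（先压缩再重构）
    x = conv2(conv1(image))        # 第一特征图FM1
    if bottleneck is not None:
        x = bottleneck(x)          # 经特征压缩层得到FM2，再经特征重构层得到FM3
    f3 = conv3(x)
    f4 = conv4(f3)
    f5 = conv5(f4)
    return [f3, f4, f5]

# 第一集合：将训练图像输入第三卷积神经网络（不含瓶颈结构层），得到FM4_1、FM4_2、FM4_3
# 第二集合：将训练图像输入第一卷积神经网络（含第一瓶颈结构层），得到FM5_1、FM5_2、FM5_3
# set1 = collect_feature_maps(conv1, conv2, conv3, conv4, conv5, image)
# set2 = collect_feature_maps(conv1, conv2, conv3, conv4, conv5, image, bottleneck=bottleneck_layer)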
S623、依据第一集合中的第四特征图和第二集合中的第五特征图计算损失函数。
在一种可能的实现方式中,可以依据第四特征图和第五特征图之间的距离来计算损失函数。
在一种示例中,该损失函数可以利用第四特征图和第五特征图的平均绝对误差L1norm(1-范数)来计算。例如,
L1=(1/N)×(|x_1|+|x_2|+…+|x_N|)
其中，i为序号，x_i为第i组的第四特征图和第五特征图之间的距离，N为第一集合中包括的第四特征图和第二集合中包括的第五特征图的总组数。
在另一种示例中,该损失函数还可以利用第四特征图和第五特征图的均方误差L2norm(2-范数)来计算。例如,
L2=(1/N)×(x_1²+x_2²+…+x_N²)
其中，i为序号，x_i为第i组的第四特征图和第五特征图之间的距离，N为第一集合中包括的第四特征图和第二集合中包括的第五特征图的总组数。
例如,记损失函数为Loss,这里以损失函数是依据第四特征图和第五特征图的均方误差L2norm(2-范数)来计算为例,Loss=w1×L2(FM4_1,FM5_1)+w2×L2(FM4_2,FM5_2)+w3×L2(FM4_3,FM5_3)。
其中,w1、w2和w3为预设的加权系数,w1~w3均可以为正实数,例如,w1=0.3,w2=0.3,w3=0.4。L2(A,B)表示计算A和B两种三维数据的差值的L2norm(2-范数)。
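按照上述定义，损失函数的计算可以用如下示意性草图表示（加权系数取上文示例中的假设值）：

import torch.nn.functional as F

def multi_layer_feature_loss(set1, set2, weights=(0.3, 0.3, 0.4)):
    # set1为第一集合[FM4_1, FM4_2, FM4_3]，set2为第二集合[FM5_1, FM5_2, FM5_3]
    # 对每一组对应的特征图计算均方误差（L2norm），再按加权系数求和
    loss = 0.0
    for w, fm4, fm5 in zip(weights, set1, set2):
        loss = loss + w * F.mse_loss(fm5, fm4)
    return loss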
作为一种可选的实施方式，计算损失函数还可以使用第二特征图FM2的正则化项，该正则化项可以包括以下三项的任意一种。
1、第二特征图FM2的平均幅度,此项的加权系数为负实数。
2、第二特征图FM2的特征元素与它同一通道内相邻特征元素的梯度的L1norm,此项的加权系数为负实数。
3、第二特征图FM2的编码比特估计值,此项的加权系数为正实数。
由于图像处理过程中,待处理图像的第二特征图是压缩第一特征图得到的,在计算损失函数的过程中增加第二特征图FM2的正则化项,根据第二特征图的正则化项来训练第一瓶颈结构层,有利于降低压缩第一特征图导致的误差。
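上述三种正则化项的计算可以用如下示意性草图表示（其中编码比特估计值采用量化取整后的经验熵作粗略近似，仅为可能的实现方式之一，并非本申请的限定）：

import torch

def fm2_regularizers(fm2):
    # fm2：第二特征图，形状为(N, C, H, W)
    # 1. 平均幅度（使用时加权系数为负实数）
    mean_magnitude = fm2.abs().mean()

    # 2. 同一通道内相邻特征元素的梯度的L1norm（使用时加权系数为负实数）
    grad_h = (fm2[:, :, 1:, :] - fm2[:, :, :-1, :]).abs().sum()
    grad_w = (fm2[:, :, :, 1:] - fm2[:, :, :, :-1]).abs().sum()
    gradient_l1 = grad_h + grad_w

    # 3. 编码比特估计值（使用时加权系数为正实数）：以量化后取值的经验熵×元素个数粗略估计
    # 注：此近似不可微，若需参与梯度反传，实际中通常采用可微的熵模型来估计比特数
    q = fm2.detach().round().flatten()
    _, counts = torch.unique(q, return_counts=True)
    p = counts.float() / q.numel()
    bits_estimate = -(p * p.log2()).sum() * q.numel()

    return mean_magnitude, gradient_l1, bits_estimate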
S624、依据损失函数更新第一瓶颈结构层的参数,获取第二瓶颈结构层,得到第二卷积神经网络。
例如,利用反向传播(backward propagation,BP)算法,并依据计算得到的损失函数更新第一瓶颈结构层的参数,得到第二瓶颈结构层,进而获得第二卷积神经网络。关于BP算法的相关原理请参考现有技术的相关阐述,此处不予赘述。
在本申请提供的训练方法中,获取到第二卷积神经网络的条件可以是反向传播的次数达到阈值,也可以是损失函数的值小于或等于阈值,还可以是相邻两次计算得到的损失函数值的差值小于或等于阈值,本申请对此不做限定。
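结合上述S621~S624与训练终止条件，仅更新第一瓶颈结构层参数的训练过程可以用如下示意性草图表示（优化器类型、学习率与各阈值均为假设值）：

import torch

def train_bottleneck(first_cnn_forward, third_cnn_forward, bottleneck, train_loader,
                     loss_fn, max_steps=10000, loss_threshold=1e-3, lr=1e-4):
    # first_cnn_forward/third_cnn_forward：分别返回第二集合与第一集合的前向函数，
    # 假设first_cnn_forward内部使用的正是此处传入的bottleneck模块；
    # 第一特征提取层和第一特征输出层的参数可预先设置requires_grad=False
    optimizer = torch.optim.Adam(bottleneck.parameters(), lr=lr)
    step = 0
    for images in train_loader:
        with torch.no_grad():
            set1 = third_cnn_forward(images)   # 第一集合（第四特征图），不参与梯度计算
        set2 = first_cnn_forward(images)       # 第二集合（第五特征图）
        loss = loss_fn(set1, set2)

        optimizer.zero_grad()
        loss.backward()                        # 反向传播，仅瓶颈结构层的参数被更新
        optimizer.step()

        step += 1
        if step >= max_steps or loss.item() <= loss_threshold:
            break                              # 反向传播次数或损失函数值达到阈值时停止
    return bottleneck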
本申请提供的训练方法，针对同一训练图像，依据第一卷积神经网络和第三卷积神经网络中对应的多层特征图（第四特征图和第五特征图）之间的距离计算损失函数，训练得到第二卷积神经网络，有利于使第四特征图和第五特征图之间的距离尽可能小，进而减小第一特征图和第三特征图之间的误差，提高图像处理的准确率。
作为另一种可选的实施方式,如图9所示,图9为本申请提供的另一种训练方法流程示意图,上述的S622可以包括以下步骤S622a~S622d。
S622a、利用第一特征提取层,对训练图像进行特征提取,得到第一特征图。
如图8B所示,第一特征提取层T701对训练图像进行特征提取,得到第一特征图FM1。
S622b、利用特征压缩层,压缩第一特征图得到第六特征图。
如图8B所示,特征压缩层7111对第一特征图FM1进行压缩,得到第二特征图FM2(如上述的第六特征图),该第二特征图FM2的通道数小于第一特征图FM1的通道数。
S622c、利用特征重构层,重构第六特征图得到第三特征图。
如图8B所示，特征重构层7112对第二特征图FM2进行重构，得到第三特征图FM3，上述第一特征图FM1和第三特征图FM3的通道数相同，第一特征图FM1和第三特征图FM3的分辨率也可以相同。
S622d、利用第二特征输出层,处理第三特征图得到第二集合包括的第五特征图。
如图8B所示，第一特征输出层P701对第三特征图FM3进行处理，获得第五特征图（如第五特征图FM5_1、第五特征图FM5_2、第五特征图FM5_3中的任意一个或多个）。
上述的S623可以包括以下步骤S623a~S623c。
S623a、获取第四特征图与第五特征图之间的第一距离。
该第一距离可以是第四特征图与第五特征图之间的L1norm或L2norm。
S623b、获取第一特征图和第三特征图之间的第二距离。
该第二距离可以是第一特征图和第三特征图之间的L1norm或L2norm。
例如,如图8A和图8B所示,第二距离可以为第一特征图FM1与第三特征图FM3的L2norm。
S623c、依据第一距离和第二距离计算损失函数。
例如,记损失函数为Loss,这里以损失函数是依据第四特征图FM4和第五特征图FM5的均方误差L2norm(2-范数),以及第一特征图FM1和第三特征图FM3的均方误差L2norm(2-范数)来计算为例,Loss=w1×L2(FM4_1,FM5_1)+w2×L2(FM4_2,FM5_2)+w3×L2(FM4_3,FM5_3)+w4×L2(FM1,FM3)。
其中，w1、w2、w3和w4为预设的加权系数，w1~w4均可以为正实数，例如w1=w2=w3=w4=0.25，或者w1=0.35，w2=w3=0.25，w4=0.15，或者w1=0.4，w2=0.3，w3=0.2，w4=0.1等。L2(A,B)表示计算A和B两种三维数据的差值的L2norm(2-范数)。
在利用第四特征图和第五特征图之间的第一距离计算损失函数的基础上,增加第一特征图和第三特征图之间的第二距离来计算损失函数,有利于第四特征图和第五特征图之间的距离尽可能的小,以及第一特征图和第三特征图之间的距离尽可能的小,减少了特征压缩层和特征重构层的处理误差,提高了图像处理的准确率。
在现有技术中,特征图的数据降维常常采用PCA方法,PCA是一种将多个变量通过线性变换以选出较少个数重要变量的多元统计分析方法。例如,在神经网络的特征图压缩过程中,一组图像对应的128通道的原始特征图经过端侧设备利用PCA方法得到每张图像的主分量,如该组图像包括3张图像,主分量的数目依次为47、48和49,在云侧设备依据每张图像的主分量进行特征图重构的过程中,由于各图像的主分量的数目不同,导致重构这3张图像后,每张图像的重建特征图的通道数可能变为126、127和128,导致原始特征图和重建特征图的通道数出现差异。
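作为对照，下述草图示意了基于PCA的特征图降维中“每张图像的主分量数目不同”这一现象（使用scikit-learn，方差阈值0.95等数值均为假设）：

import numpy as np
from sklearn.decomposition import PCA

def pca_components_per_image(feature_map, variance_ratio=0.95):
    # feature_map：形状为(128, H, W)的原始特征图；
    # 将每个空间位置的128维通道向量作为一个样本做PCA，
    # 返回累计解释方差达到阈值所需的主分量数目（随图像内容不同而变化）
    c, h, w = feature_map.shape
    samples = feature_map.reshape(c, h * w).T   # (H*W, 128)
    pca = PCA(n_components=variance_ratio)
    pca.fit(samples)
    return pca.n_components_

# 对一组图像分别执行上述过程时，得到的主分量数目可能各不相同（如47、48、49），
# 这正是上文所述原始特征图与重建特征图出现差异的来源。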
相较于现有技术中PCA方法的原始特征图和重建特征图的数值差异,使用本申请的训练方法训练第一卷积神经网络时,考虑到了压缩前的第一特征图,以及重构后的第三特征图的距离,使得训练得到的第二卷积神经网络虽然存在将第一特征图进行压缩和重构的过程,但是第一特征图和第三特征图的差异要明显小于PCA方法中原始特征图和重建特征图的数值差异。换句话说,本申请提供的图像处理方法,相比于PCA方法具有更优的压缩性能。
在一种示例中,PCA方法可在平均精度均值(mean Average Precision,mAP)指标下降2%情况下,实现约3倍的数据缩减,例如,一组图像对应的128通道的原始特征图经过PCA后产生的主分量数目平均值为47.9时,mAP指标下降约2%。
而使用本申请的训练方法得到的第二卷积神经网络,可以对一组图像对应的128通道的第一特征图进行64倍数据缩减(例如,第二特征图的宽、高、通道数分别减少为第一特征图的宽、高、通道数的1/4),64倍数据缩减后的第二特征图压缩成的码流相比未缩减的第一特征图压缩成的码流,数据量减少90%,且损失mAP小于1%。
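示意性地换算：以图4中网络层conv2输出的第一特征图(W/4)×(H/4)×256为例，宽、高、通道数各缩减为1/4后，第二特征图约为(W/16)×(H/16)×64，数据量为原来的(1/4)×(1/4)×(1/4)=1/64，即对应上述的64倍数据缩减。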
另外,采用本申请提供的训练方法,只需要输入大量训练图像,以训练图像激励第一卷积神经网络和第三卷积神经网络中产生的特征图作为指导,而不需要依赖视觉任务的人工标注数据,减少了训练图像的数据依赖;而使用特征图作为训练的指导,使得本申请提供的训练方法具有更优的通用性。
可以理解的是，为了实现上述实施例中的功能，主机包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到，结合本申请中所公开的实施例描述的各示例的单元及方法步骤，本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行，取决于技术方案的特定应用场景和设计约束条件。
图10为本申请提供的一种训练装置和图像处理装置的结构示意图。下面结合图10对训练装置1010、第一图像处理装置1020和第二图像处理装置1030的结构和功能进行介绍,应理解,本实施例仅对训练装置1010、第一图像处理装置1020和第二图像处理装置1030的结构和功能模块进行示例性划分,本申请并不对其具体划分做任何限定。
如图10所示,训练装置1010包括获取单元1011和处理单元1012,训练装置1010用于实现上述图6或图9中所示的方法实施例中各个操作步骤对应的训练方法。
当训练装置1010用于实现图6所示的方法实施例中的功能时,获取单元1011用于执行上述的S610,处理单元1012用于执行上述的S620。
可选的,当训练装置1010用于实现图9所示的方法实施例中的功能时,处理单元1012包括第一训练单元1012a、第二训练单元1012b、损失计算单元1012c和第三训练单元1012d。第一训练单元1012a用于执行S621,第二训练单元1012b用于执行S622及其可能的子步骤S622a~S622d,损失计算单元1012c用于执行S623及其可能的子步骤S623a~S623c,第三训练单元1012d用于执行S624。
如图10所示,第一图像处理装置1020包括第一收发单元1021、特征提取单元1022、特征压缩单元1023和显示单元1024。第一图像处理装置1020用于实现上述图2中所示的方法实施例中发送节点的各个操作步骤对应的图像处理方法。
当第一图像处理装置1020用于实现图2所示的方法实施例中发送节点的功能时,第一收发单元1021用于执行上述的S210和S240,特征提取单元1022用于执行上述的S220,特征压缩单元1023用于执行上述的S230。可选的,显示单元1024用于执行上述的S280。
如图10所示，第二图像处理装置1030包括第二收发单元1031、特征重构单元1032和图像处理单元1033。第二图像处理装置1030用于实现上述图2中所示的方法实施例中接收节点的各个操作步骤对应的图像处理方法。
当第二图像处理装置1030用于实现图2所示的方法实施例中接收节点的功能时,第二收发单元1031用于执行上述的S270,特征重构单元1032用于执行上述的S250,图像处理单元1033用于执行上述的S260。
有关上述训练装置1010、第一图像处理装置1020和第二图像处理装置1030更详细的描述可以直接参考上述图2、图6或图9所示的方法实施例中相关描述直接得到,这里不加赘述。
图11为本申请提供的一种通信装置的结构示意图,该通信装置1100包括处理器1110和通信接口1120。处理器1110和通信接口1120之间相互耦合。可以理解的是,通信接口1120可以为收发器或输入输出接口。可选的,通信装置1100还可以包括存储器1130,用于存储处理器1110执行的指令或存储处理器1110运行指令所需要的输入数据或存储处理器1110运行指令后产生的数据。
当通信装置1100用于实现图2、图6或图9所示的方法时,可以实现上述训练装置1010、第一图像处理装置1020和第二图像处理装置1030的功能,此处不予赘述。
本申请实施例中不限定上述通信接口1120、处理器1110以及存储器1130之间的具体连接介质。本申请实施例在图11中以通信接口1120、处理器1110以及存储器1130之间通过总线1140连接为例，总线在图11中以粗线表示；其它部件之间的连接方式仅是示意性说明，并不以此为限。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示，图11中仅用一条粗线表示，但并不表示仅有一根总线或一种类型的总线。
存储器1130可用于存储软件程序及模块,如本申请实施例所提供的图像处理方法以及训练方法对应的程序指令/模块,处理器1110通过执行存储在存储器1130内的软件程序及模块,从而执行各种功能应用以及数据处理。该通信接口1120可用于与其他设备进行信令或数据的通信。在本申请中该通信装置1100可以具有多个通信接口1120。
其中,上述的存储器可以是但不限于,随机存取存储器(Random Access Memory,RAM),只读存储器(Read Only Memory,ROM),可编程只读存储器(Programmable Read-Only Memory,PROM),可擦除只读存储器(Erasable Programmable Read-Only Memory,EPROM),电可擦除只读存储器(Electric Erasable Programmable Read-Only Memory,EEPROM)等。
上述的处理器可以是一种集成电路芯片,具有信号处理能力。该处理器可以是通用处理器,包括中央处理器(Central Processing Unit,CPU)、网络处理器(Network Processor,NP)等;还可以是数字信号处理器(Digital Signal Processing,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。
本申请的实施例中的方法步骤可以通过硬件的方式来实现,也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于RAM、闪存、ROM、PROM、EPROM、EEPROM、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于通信装置或终端设备中。当然,处理器和存储介质也可以作为分立组件存在于通信装置或终端设备中。
在上述实施例中，可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时，可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机程序或指令。在计算机上加载和执行所述计算机程序或指令时，全部或部分地执行本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、通信装置、用户设备或者其它可编程装置。所述计算机程序或指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，所述计算机程序或指令可以从一个网站站点、计算机、服务器或数据中心通过有线或无线方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是集成一个或多个可用介质的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质，例如，软盘、硬盘、磁带；也可以是光介质，例如，数字视频光盘（digital video disc，DVD）；还可以是半导体介质，例如，固态硬盘（solid state drive，SSD）。
在本申请的各个实施例中,如果没有特殊说明以及逻辑冲突,不同的实施例之间的术语和/或描述具有一致性、且可以相互引用,不同的实施例中的技术特征根据其内在的逻辑关系可以组合形成新的实施例。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A,B可以是单数或者复数。在本申请的文字描述中,字符“/”,一般表示前后关联对象是一种“或”的关系;在本申请的公式中,字符“/”,表示前后关联对象是一种“相除”的关系。
可以理解的是,在本申请的实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定。

Claims (30)

  1. 一种图像处理方法,其特征在于,应用于发送节点,所述方法包括:
    获取待处理图像;
    利用卷积神经网络包括的特征提取层,对所述待处理图像进行特征提取,得到第一特征图;
    利用所述卷积神经网络包括的特征压缩层,压缩所述第一特征图得到第二特征图,所述第二特征图的通道数小于所述第一特征图的通道数;
    向接收节点发送所述第二特征图。
  2. 根据权利要求1所述的方法,其特征在于,所述特征压缩层包括至少一层卷积层。
  3. 根据权利要求1或2所述的方法,其特征在于,所述第一特征图的分辨率为W×H,所述第二特征图的分辨率为W`×H`,W`×H`<W×H。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述方法还包括:
    接收所述待处理图像的图像处理结果,所述图像处理结果为所述接收节点依据所述第二特征图确定的。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    显示所述图像处理结果。
  6. 一种图像处理方法,其特征在于,应用于接收节点,所述方法包括:
    接收第二特征图,所述第二特征图是发送节点利用卷积神经网络,对待处理图像进行特征提取得到第一特征图,并将所述第一特征图进行压缩得到的;
    利用所述卷积神经网络包括的特征重构层,对所述第二特征图进行重构,得到第三特征图,所述第二特征图的通道数小于所述第三特征图的通道数,所述第一特征图的通道数和所述第三特征图的通道数相同;
    利用所述卷积神经网络包括的特征输出层和图像处理层,对所述第三特征图进行处理得到图像处理结果,所述图像处理结果指示所述待处理图像的信息;
    发送所述图像处理结果。
  7. 根据权利要求6所述的方法,其特征在于,所述特征重构层包括至少一层反卷积层。
  8. 根据权利要求6或7所述的方法,其特征在于,所述第二特征图的分辨率为W`×H`,所述第三特征图的分辨率为W×H,W`×H`<W×H。
  9. 根据权利要求6-8中任一项所述的方法,其特征在于,所述接收第二特征图,包括:
    接收所述发送节点发送的第二特征图;
    所述发送所述图像处理结果,包括:
    向所述发送节点发送所述图像处理结果。
  10. 一种卷积神经网络的训练方法,其特征在于,所述方法包括:
    获取训练集,所述训练集包括至少一幅训练图像;
    依据所述训练集、第一卷积神经网络的第一特征提取层和第一特征输出层，训练所述第一卷积神经网络的第一瓶颈结构层得到第二卷积神经网络，所述第二卷积神经网络包括所述第一特征提取层、第二瓶颈结构层和所述第一特征输出层；
    其中,所述第一特征提取层用于对待处理图像进行特征提取得到第一特征图,所述第二瓶颈结构层中的特征压缩层用于压缩所述第一特征图得到第二特征图,所述第二特征图的通道数小于所述第一特征图的通道数。
  11. 根据权利要求10所述的方法,其特征在于,依据所述训练集、第一卷积神经网络的第一特征提取层和第一特征输出层,训练所述第一卷积神经网络的第一瓶颈结构层得到第二卷积神经网络,包括:
    将所述训练集输入第三卷积神经网络,得到第一集合,所述第三卷积神经网络包括第二特征提取层和第二特征输出层,所述第一特征提取层的参数与所述第二特征提取层的参数相同,所述第一特征输出层的参数与所述第二特征输出层的参数相同,所述第一集合包括第四特征图,所述第四特征图为所述第二特征提取层和所述第二特征输出层对所述训练图像进行特征提取后得到的;
    将所述训练集输入所述第一卷积神经网络,得到第二集合,所述第二集合包括第五特征图,所述第五特征图为所述第一瓶颈结构层和所述第一特征输出层对所述第二特征图进行特征重构和处理得到的;
    依据所述第一集合中的第四特征图和所述第二集合中的第五特征图计算损失函数;
    依据所述损失函数更新所述第一瓶颈结构层的参数,获取所述第二瓶颈结构层,得到所述第二卷积神经网络。
  12. 根据权利要求11所述的方法,其特征在于,将所述训练集输入所述第一卷积神经网络,得到第二集合,包括:
    利用所述第一特征提取层,对所述训练图像进行特征提取得到所述第一特征图;
    利用所述第一瓶颈结构层包括的特征压缩层,压缩所述第一特征图得到第六特征图,所述第六特征图的通道数小于所述第一特征图的通道数;
    利用所述第一瓶颈结构层包括的特征重构层,重构所述第六特征图得到第三特征图,所述第三特征图的通道数和所述第一特征图的通道数相同;
    利用所述第二特征输出层,处理所述第三特征图得到所述第二集合包括的所述第五特征图。
  13. 根据权利要求12所述的方法,其特征在于,依据所述第一集合中的第四特征图和所述第二集合中的第五特征图计算损失函数,包括:
    获取所述第四特征图与所述第五特征图之间的第一距离;
    获取所述第一特征图和所述第三特征图之间的第二距离;
    依据所述第一距离和所述第二距离计算所述损失函数。
  14. 根据权利要求10-13中任一项所述的方法,其特征在于,所述第一特征图的分辨率为W×H,所述第二特征图的分辨率为W`×H`,W`×H`<W×H。
  15. 一种图像处理装置,其特征在于,应用于发送节点,所述装置包括:
    收发单元,用于获取待处理图像;
    特征提取单元,用于利用卷积神经网络包括的特征提取层,对所述待处理图像进行特征提取,得到第一特征图;
    特征压缩单元，用于利用所述卷积神经网络包括的特征压缩层，压缩所述第一特征图得到第二特征图，所述第二特征图的通道数小于所述第一特征图的通道数；
    所述收发单元,还用于向接收节点发送所述第二特征图。
  16. 根据权利要求15所述的装置,其特征在于,所述特征压缩层包括至少一层卷积层。
  17. 根据权利要求15或16所述的装置,其特征在于,所述第一特征图的分辨率为W×H,所述第二特征图的分辨率为W`×H`,W`×H`<W×H。
  18. 根据权利要求15-17中任一项所述的装置,其特征在于,所述收发单元,还用于接收所述待处理图像的图像处理结果,所述图像处理结果为所述接收节点依据所述第二特征图确定的。
  19. 根据权利要求18所述的装置,其特征在于,所述装置还包括:
    显示单元,用于显示所述图像处理结果。
  20. 一种图像处理装置,其特征在于,应用于接收节点,所述装置包括:
    收发单元,用于接收第二特征图,所述第二特征图是发送节点利用卷积神经网络,对待处理图像进行特征提取得到第一特征图,并将所述第一特征图进行压缩得到的;
    特征重构单元,用于利用所述卷积神经网络包括的特征重构层,对所述第二特征图进行重构,得到第三特征图,所述第二特征图的通道数小于所述第三特征图的通道数,所述第一特征图的通道数和所述第三特征图的通道数相同;
    图像处理单元,用于利用所述卷积神经网络包括的特征输出层和图像处理层,对所述第三特征图进行处理得到图像处理结果,所述图像处理结果指示所述待处理图像的信息;
    所述收发单元,用于发送所述图像处理结果。
  21. 根据权利要求20所述的装置,其特征在于,所述特征重构层包括至少一层反卷积层。
  22. 根据权利要求20或21所述的装置,其特征在于,所述第二特征图的分辨率为W`×H`,所述第三特征图的分辨率为W×H,W`×H`<W×H。
  23. 根据权利要求20-22中任一项所述的装置,其特征在于,所述收发单元,具体用于接收所述发送节点发送的第二特征图;
    所述收发单元,具体用于向所述发送节点发送所述图像处理结果。
  24. 一种卷积神经网络的训练装置,其特征在于,所述装置包括:
    获取单元,用于获取训练集,所述训练集包括至少一幅训练图像;
    处理单元,用于依据所述训练集、第一卷积神经网络的第一特征提取层和第一特征输出层,训练所述第一卷积神经网络的第一瓶颈结构层得到第二卷积神经网络,所述第二卷积神经网络包括所述第一特征提取层、第二瓶颈结构层和所述第一特征输出层;
    其中,所述第一特征提取层用于对待处理图像进行特征提取得到第一特征图,所述第二瓶颈结构层中的特征压缩层用于压缩所述第一特征图得到第二特征图,所述第二特征图的通道数小于所述第一特征图的通道数。
  25. 根据权利要求24所述的装置,其特征在于,所述处理单元包括:
    第一训练单元，用于将所述训练集输入第三卷积神经网络，得到第一集合，所述第三卷积神经网络包括第二特征提取层和第二特征输出层，所述第一特征提取层的参数与所述第二特征提取层的参数相同，所述第一特征输出层的参数与所述第二特征输出层的参数相同，所述第一集合包括第四特征图，所述第四特征图为所述第二特征提取层和所述第二特征输出层对所述训练图像进行特征提取后得到的；
    第二训练单元,用于将所述训练集输入所述第一卷积神经网络,得到第二集合,所述第二集合包括第五特征图,所述第五特征图为所述第一瓶颈结构层和所述第一特征输出层对所述第二特征图进行特征重构和处理得到的;
    损失计算单元,用于依据所述第一集合中的第四特征图和所述第二集合中的第五特征图计算损失函数;
    第三训练单元,用于依据所述损失函数更新所述第一瓶颈结构层的参数,获取所述第二瓶颈结构层,得到所述第二卷积神经网络。
  26. 根据权利要求25所述的装置,其特征在于,所述第一瓶颈结构层包括特征压缩层和特征重构层;
    将所述训练集输入所述第一卷积神经网络,得到第二集合,包括:
    利用所述第一特征提取层,对所述训练图像进行特征提取得到所述第一特征图;
    利用所述第一瓶颈结构层包括的特征压缩层,压缩所述第一特征图得到第六特征图,所述第六特征图的通道数小于所述第一特征图的通道数;
    利用所述第一瓶颈结构层包括的特征重构层,重构所述第六特征图得到第三特征图,所述第三特征图的通道数和所述第一特征图的通道数相同;
    利用所述第二特征输出层,处理所述第三特征图得到所述第二集合包括的所述第五特征图。
  27. 根据权利要求26所述的装置,其特征在于,所述损失计算单元,具体用于获取所述第四特征图与所述第五特征图之间的第一距离;
    所述损失计算单元,具体用于获取所述第一特征图和所述第三特征图之间的第二距离;
    所述损失计算单元,具体用于依据所述第一距离和所述第二距离计算所述损失函数。
  28. 根据权利要求24-27中任一项所述的装置,其特征在于,所述第一特征图的分辨率为W×H,所述第二特征图的分辨率为W`×H`,W`×H`<W×H。
  29. 一种通信装置,其特征在于,包括处理器和接口电路,所述接口电路用于接收来自所述通信装置之外的其它通信装置的信号并传输至所述处理器或将来自所述处理器的信号发送给所述通信装置之外的其它通信装置,所述处理器通过逻辑电路或执行代码指令用于实现如权利要求1至9中任一项所述的方法,或权利要求10-14中任一项所述的方法。
  30. 一种计算机可读存储介质,其特征在于,所述存储介质中存储有计算机程序或指令,当所述计算机程序或指令被通信装置执行时,实现如权利要求1至9中任一项所述的方法,或权利要求10-14中任一项所述的方法。
PCT/CN2022/083614 2021-04-08 2022-03-29 一种图像处理方法、训练方法及装置 WO2022213843A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202280008179.8A CN116635895A (zh) 2021-04-08 2022-03-29 一种图像处理方法、训练方法及装置
EP22783910.7A EP4303818A1 (en) 2021-04-08 2022-03-29 Image processing method and apparatus, and training method and apparatus
US18/481,096 US20240029406A1 (en) 2021-04-08 2023-10-04 Image processing method, training method, and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2021109673A RU2773420C1 (ru) 2021-04-08 Способ обработки изображений, способ и устройство обучения
RU2021109673 2021-04-08

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/481,096 Continuation US20240029406A1 (en) 2021-04-08 2023-10-04 Image processing method, training method, and apparatus

Publications (1)

Publication Number Publication Date
WO2022213843A1 true WO2022213843A1 (zh) 2022-10-13

Family

ID=83545031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083614 WO2022213843A1 (zh) 2021-04-08 2022-03-29 一种图像处理方法、训练方法及装置

Country Status (4)

Country Link
US (1) US20240029406A1 (zh)
EP (1) EP4303818A1 (zh)
CN (1) CN116635895A (zh)
WO (1) WO2022213843A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3660785A1 (en) * 2018-11-30 2020-06-03 Laralab UG Method and system for providing an at least 3-dimensional medical image segmentation of a structure of an internal organ
CN111340901A (zh) * 2020-02-19 2020-06-26 国网浙江省电力有限公司 基于生成式对抗网络的复杂环境下输电网图片的压缩方法
CN112203098A (zh) * 2020-09-22 2021-01-08 广东启迪图卫科技股份有限公司 基于边缘特征融合和超分辨率的移动端图像压缩方法

Also Published As

Publication number Publication date
CN116635895A (zh) 2023-08-22
US20240029406A1 (en) 2024-01-25
EP4303818A1 (en) 2024-01-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22783910

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202280008179.8

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2022783910

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022783910

Country of ref document: EP

Effective date: 20231005

NENP Non-entry into the national phase

Ref country code: DE