WO2020216227A9 - Image classification method and apparatus, and data processing method and apparatus - Google Patents

Image classification method and apparatus, and data processing method and apparatus

Info

Publication number
WO2020216227A9
WO2020216227A9 (PCT application PCT/CN2020/086015)
Authority
WO
WIPO (PCT)
Prior art keywords
mask
tensors
convolution
image
groups
Prior art date
Application number
PCT/CN2020/086015
Other languages
English (en)
Chinese (zh)
Other versions
WO2020216227A1 (fr)
Inventor
韩凯
王云鹤
许春景
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2020216227A1
Publication of WO2020216227A9

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The N groups of mask tensors can be obtained from the register relatively quickly (obtaining parameters from a register is faster than obtaining them from external storage), which can improve the execution speed of the above method to a certain extent.
  • The convolution kernel parameters of the M reference convolution kernels and the N groups of mask tensors are obtained by training the neural network on training images.
  • processing the multimedia data according to multiple convolution feature maps of the multimedia data includes: classifying or identifying the multimedia data according to the multiple convolution feature maps of the multimedia data.
  • The above-mentioned multimedia data may be text, sound, pictures (images), video, animation, and so on.
  • Performing deconvolution processing on multiple convolution feature maps of the road image to obtain the semantic segmentation result of the road image includes: splicing the multiple convolution feature maps of the road image to obtain a target convolution feature map of the road image, and performing deconvolution on the target convolution feature map of the road image to obtain the semantic segmentation result of the road image.
  • Each group of mask tensors in the N groups of mask tensors is composed of multiple mask tensors, and the number of bits occupied by the elements in the N groups of mask tensors is less than the number of bits occupied by the elements of the convolution kernel parameters in the M reference convolution kernels.
  • Each reference convolution kernel in the M reference convolution kernels corresponds to a group of mask tensors in the N groups of mask tensors.
  • FIG. 7 is a schematic flowchart of an image classification method according to an embodiment of the present application.
  • Figure 10 is a schematic diagram of the process of image classification using neural networks
  • FIG. 13 is a schematic diagram of the hardware structure of the neural network training device according to an embodiment of the present application.
  • Object detection on terminal equipment
  • When a user uses a mobile phone to take a selfie, the mobile phone can automatically recognize the face according to the neural network model and automatically frame the face to generate a prediction box.
  • the neural network model in Figure 4 can be a target detection convolutional neural network model located in a mobile phone.
  • The target detection convolutional neural network model has fewer parameters (the convolution kernels contain fewer parameters) and can therefore be deployed on mobile phones with limited storage resources.
  • The prediction box shown in FIG. 4 is only for illustration. For ease of understanding, the prediction box is drawn directly on the picture; in fact, the prediction box is displayed on the shooting interface of the mobile phone taking the selfie.
  • the camera of an autonomous vehicle will capture the road image in real time.
  • The smart device in the autonomous vehicle needs to segment the captured road image to separate objects such as the road surface, roadbed, vehicles, and pedestrians, and feed this information back to the control system of the autonomous vehicle, so that the vehicle can drive on the correct road area. Since autonomous driving has extremely high safety requirements, smart devices in autonomous vehicles need to be able to quickly process and analyze captured real-time road images to obtain semantic segmentation results.
  • A deep neural network (DNN), also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • The layers of a DNN are divided according to their positions.
  • The layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • The layers in the middle are all hidden layers.
  • The layers are fully connected; that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
  • The coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as
  • The neural network can use the error back propagation (BP) algorithm to modify the parameters in the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • The back propagation algorithm is a back propagation movement dominated by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
  • The I/O interface 112 returns the processing result, such as the denoised image obtained as described above, to the client device 140 to provide it to the user.
  • The initial convolutional layers (such as 221) often extract more general features, which can also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by the later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features, and features with higher semantics are more suitable for the problem to be solved.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
  • The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 502 and buffers it on each PE in the arithmetic circuit.
  • The arithmetic circuit fetches the data of matrix A from the input memory 501, performs matrix operations on it with matrix B, and stores partial or final results of the matrix in the accumulator 508.
  • the execution device 110 in FIG. 1 introduced above can execute each step of the image classification method or data processing method of the embodiment of this application.
  • The CNN model shown in FIG. 2 and the chip shown in FIG. 3 can also be used to execute the methods of this application.
  • the image classification method of the embodiment of the present application and the data processing method of the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
  • The size of each mask tensor in the first group of mask tensors is the same as the size of the first reference convolution kernel.
  • all mask tensors in each group of mask tensors in the foregoing N groups of mask tensors satisfy pairwise orthogonality.
  • In the second way, convolution processing is first performed on the image to be processed according to the M reference convolution kernels to obtain M reference convolution feature maps, and then multiple convolution feature maps of the image to be processed are obtained according to the M reference convolution feature maps and the N groups of mask tensors.
  • In the formula, F11 to Fks are the multiple sub-convolution kernels, X represents the image block to be processed, ⊙ represents the element-wise multiplication (Hadamard product) operation, Y represents the convolution feature map obtained by the convolution, Bi represents the i-th reference convolution kernel, and Mj represents the j-th mask tensor.
  • The sizes of these 3 convolution feature maps are c1×d1×d2, c2×d1×d2, and c3×d1×d2, respectively.
  • Compared with a convolution kernel, the elements of a mask tensor occupy less storage space. Therefore, when the sub-convolution kernels are obtained by combining the reference convolution kernels with the mask tensors, the number of convolution kernel parameters is reduced and compression of the convolution kernel parameters is realized, so that the neural network can be deployed on devices with limited storage resources to perform image classification tasks.
  • Sub-convolution kernel A, sub-convolution kernel B, and sub-convolution kernel C are essentially convolution kernels in a neural network and are used to perform convolution processing on input data.
  • Sub-convolution kernel A performs convolution processing on the input data to obtain feature map A, sub-convolution kernel B performs convolution processing on the input data to obtain feature map B, and sub-convolution kernel C performs convolution processing on the input data to obtain feature map C.
  • The updates to the convolution kernel parameters of the first reference convolution kernel and to the parameters of the first group of mask tensors can be determined from their gradients according to parameters such as the learning rate. After S6 is executed, S2 to S5 can be executed repeatedly until the preset loss function converges.
  • the data processing method shown in Figure 12 can be applied to the scene shown in Figure 5.
  • the multimedia data is a face image.
  • the convolution feature map of the face image can be obtained.
  • the identity of the person being photographed can be determined.
  • FIG. 15 is a schematic diagram of the hardware structure of a data processing device according to an embodiment of the present application.
  • the data processing device 5000 shown in FIG. 15 is similar to the image classification device 4000 in FIG. 14.
  • the data processing device 5000 includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004. Among them, the memory 5001, the processor 5002, and the communication interface 5003 implement communication connections between each other through the bus 5004.
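The reference-kernel/mask-tensor construction described in the definitions above can be sketched in a few lines of NumPy. This is an illustrative toy, not the patent's implementation: the shapes (one 8×3×3 reference kernel, a group of 4 one-bit mask tensors) are assumptions chosen for the example. Each sub-convolution kernel is the Hadamard product of the reference kernel with one mask tensor, and the mask tensors within a group are pairwise orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)

c, d = 8, 3   # channels and spatial size of the reference kernel (assumed)
s = 4         # number of mask tensors in the group (assumed)

# One reference convolution kernel, stored in full precision (float32).
B = rng.standard_normal((c, d, d)).astype(np.float32)

# One group of s binary mask tensors, each the same size as B. Assigning
# every element position to exactly one mask makes the masks pairwise
# orthogonal: masks[i] * masks[j] == 0 elementwise whenever i != j.
owner = rng.integers(0, s, size=(c, d, d))
masks = np.stack([(owner == j).astype(np.float32) for j in range(s)])

# Sub-convolution kernels: F_j is the Hadamard product of B and mask M_j.
sub_kernels = B[None, :, :, :] * masks   # shape (s, c, d, d)

# Storage comparison: s independent float32 kernels versus one float32
# reference kernel plus s one-bit masks.
full_bits = s * c * d * d * 32
compressed_bits = c * d * d * 32 + s * c * d * d * 1
print(full_bits, compressed_bits)   # prints: 9216 2592
```

Because the masks here partition the element positions, the sub-kernels sum back to the reference kernel; the definitions above only require pairwise orthogonality, and a partition is simply one easy way to obtain it.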

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to an image classification method and apparatus, which belong to the field of artificial intelligence and specifically to the field of computer vision. The image classification method comprises: acquiring a convolution kernel parameter of a reference convolution kernel of a neural network and a mask tensor of the neural network, and performing a Hadamard product operation on the reference convolution kernel of the neural network and the mask tensor corresponding to the reference convolution kernel to obtain a plurality of sub-convolution kernels; and, according to the plurality of sub-convolution kernels, performing convolution processing on an image to be processed and, according to a convolution feature map finally obtained from the convolution, classifying the image to be processed to obtain a classification result of the image to be processed. Because the mask tensor occupies less storage space than the convolution kernel, some devices with limited storage resources can also deploy a neural network comprising the reference convolution kernel and the mask tensor, so that image classification can be carried out.
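The storage claim in the abstract can be made concrete with a back-of-the-envelope calculation. The numbers below (4 reference kernels, 8 masks per group, 64-channel 3×3 kernels, float32 weights, 1-bit mask elements) are hypothetical and not taken from the patent:

```python
def kernel_bits(num_kernels: int, c: int, d: int, bits_per_elem: int) -> int:
    """Bits needed to store num_kernels kernels of shape c x d x d."""
    return num_kernels * c * d * d * bits_per_elem

M, s = 4, 8    # reference kernels, and mask tensors per group (assumed)
c, d = 64, 3   # kernel shape: 64 input channels, 3x3 spatial (assumed)

# Baseline: M * s independent full-precision (float32) kernels.
baseline = kernel_bits(M * s, c, d, 32)

# Compressed: M float32 reference kernels plus M groups of s 1-bit masks.
compressed = kernel_bits(M, c, d, 32) + kernel_bits(M * s, c, d, 1)

print(baseline // 8, "bytes vs", compressed // 8, "bytes")
# prints: 73728 bytes vs 11520 bytes
```

Because each mask element costs one bit rather than 32, adding more sub-kernels per reference kernel grows storage slowly; this is the compression effect the abstract attributes to the mask tensors.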
PCT/CN2020/086015 2019-04-24 2020-04-22 Image classification method and apparatus, and data processing method and apparatus WO2020216227A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910335678.8A CN110188795B (zh) 2019-04-24 2019-04-24 Image classification method, data processing method and apparatus
CN201910335678.8 2019-04-24

Publications (2)

Publication Number Publication Date
WO2020216227A1 (fr) 2020-10-29
WO2020216227A9 (fr) 2020-11-26

Family

ID=67715037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086015 WO2020216227A1 (fr) 2019-04-24 2020-04-22 Procédé et appareil de classification d'image et procédé et appareil de traitement de données

Country Status (2)

Country Link
CN (1) CN110188795B (fr)
WO (1) WO2020216227A1 (fr)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188795B (zh) * 2019-04-24 2023-05-09 华为技术有限公司 图像分类方法、数据处理方法和装置
CN110738235B (zh) * 2019-09-16 2023-05-30 平安科技(深圳)有限公司 肺结核判定方法、装置、计算机设备及存储介质
CN110780923B (zh) * 2019-10-31 2021-09-14 合肥工业大学 应用于二值化卷积神经网络的硬件加速器及其数据处理方法
CN110995688B (zh) * 2019-11-27 2021-11-16 深圳申朴信息技术有限公司 一种用于互联网金融平台的个人数据共享方法、装置及终端设备
CN110991643B (zh) * 2019-12-25 2024-01-30 北京奇艺世纪科技有限公司 一种模型部署方法、装置、电子设备及存储介质
CN111126572B (zh) * 2019-12-26 2023-12-08 北京奇艺世纪科技有限公司 一种模型参数处理方法、装置、电子设备及存储介质
CN111275166B (zh) * 2020-01-15 2023-05-02 华南理工大学 基于卷积神经网络的图像处理装置、设备及可读存储介质
CN111260037B (zh) * 2020-02-11 2023-10-13 深圳云天励飞技术股份有限公司 图像数据的卷积运算方法、装置、电子设备及存储介质
CN111381968B (zh) * 2020-03-11 2023-04-25 中山大学 一种高效运行深度学习任务的卷积运算优化方法及系统
CN111539462B (zh) * 2020-04-15 2023-09-19 苏州万高电脑科技有限公司 模仿生物视觉神经元的图像分类方法、系统、装置及介质
CN111860582B (zh) * 2020-06-11 2021-05-11 北京市威富安防科技有限公司 图像分类模型构建方法、装置、计算机设备和存储介质
CN111708641B (zh) * 2020-07-14 2024-03-19 腾讯科技(深圳)有限公司 一种内存管理方法、装置、设备及计算机可读存储介质
CN111860522B (zh) * 2020-07-23 2024-02-02 中国平安人寿保险股份有限公司 身份证图片处理方法、装置、终端及存储介质
CN112215243A (zh) * 2020-10-30 2021-01-12 百度(中国)有限公司 图像特征提取方法、装置、设备及存储介质
CN112686249B (zh) * 2020-12-22 2022-01-25 中国人民解放军战略支援部队信息工程大学 一种基于对抗补丁的Grad-CAM攻击方法
WO2022141511A1 (fr) * 2020-12-31 2022-07-07 深圳市优必选科技股份有限公司 Procédé de classification d'image, dispositif informatique et support de stockage
CN112686320B (zh) * 2020-12-31 2023-10-13 深圳市优必选科技股份有限公司 图像分类方法、装置、计算机设备及存储介质
CN113138957A (zh) * 2021-03-29 2021-07-20 北京智芯微电子科技有限公司 用于神经网络推理的芯片及加速神经网络推理的方法
CN112990458B (zh) * 2021-04-14 2024-06-04 北京灵汐科技有限公司 卷积神经网络模型的压缩方法及装置
CN113392899B (zh) * 2021-06-10 2022-05-10 电子科技大学 一种基于二值化图像分类网络的图像分类方法
CN113239899B (zh) * 2021-06-17 2024-05-28 阿波罗智联(北京)科技有限公司 用于处理图像和生成卷积核的方法、路侧设备和云控平台
CN113536943B (zh) * 2021-06-21 2024-04-12 上海赫千电子科技有限公司 一种基于图像增强的道路交通标志识别方法
CN113537325B (zh) * 2021-07-05 2023-07-11 北京航空航天大学 一种用于图像分类的基于提取高低层特征逻辑的深度学习方法
CN113537492B (zh) * 2021-07-19 2024-04-26 第六镜科技(成都)有限公司 模型训练及数据处理方法、装置、设备、介质、产品
CN113642589B (zh) * 2021-08-11 2023-06-06 南方科技大学 图像特征提取方法及装置、计算机设备和可读存储介质
CN114491399A (zh) * 2021-12-30 2022-05-13 深圳云天励飞技术股份有限公司 数据处理方法、装置、终端设备及计算机可读存储介质
CN114239814B (zh) * 2022-02-25 2022-07-08 杭州研极微电子有限公司 用于图像处理的卷积神经网络模型的训练方法
CN115294381B (zh) * 2022-05-06 2023-06-30 兰州理工大学 基于特征迁移和正交先验的小样本图像分类方法及装置
CN115170917B (zh) * 2022-06-20 2023-11-07 美的集团(上海)有限公司 图像处理方法、电子设备及存储介质
CN115797709B (zh) * 2023-01-19 2023-04-25 苏州浪潮智能科技有限公司 一种图像分类方法、装置、设备和计算机可读存储介质
CN117726808A (zh) * 2023-09-21 2024-03-19 书行科技(北京)有限公司 一种模型生成方法、图像处理方法及相关设备
CN117314938B (zh) * 2023-11-16 2024-04-05 中国科学院空间应用工程与技术中心 一种基于多尺度特征融合译码的图像分割方法及装置

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9202144B2 (en) * 2013-10-30 2015-12-01 Nec Laboratories America, Inc. Regionlets with shift invariant neural patterns for object detection
CN104517103A (zh) * 2014-12-26 2015-04-15 广州中国科学院先进技术研究所 一种基于深度神经网络的交通标志分类方法
EP3408798B1 (fr) * 2016-01-29 2020-07-15 FotoNation Limited Un reseau neuronal convolutionnel
CN106127297B (zh) * 2016-06-02 2019-07-12 中国科学院自动化研究所 基于张量分解的深度卷积神经网络的加速与压缩方法
US9779786B1 (en) * 2016-10-26 2017-10-03 Xilinx, Inc. Tensor operations and acceleration
US10037490B2 (en) * 2016-12-13 2018-07-31 Google Llc Performing average pooling in hardware
US11586905B2 (en) * 2017-10-11 2023-02-21 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for customizing kernel machines with deep neural networks
CN107886164A (zh) * 2017-12-20 2018-04-06 东软集团股份有限公司 一种卷积神经网络训练、测试方法及训练、测试装置
CN108229360B (zh) * 2017-12-26 2021-03-19 美的集团股份有限公司 一种图像处理的方法、设备及存储介质
CN108304795B (zh) * 2018-01-29 2020-05-12 清华大学 基于深度强化学习的人体骨架行为识别方法及装置
CN110188795B (zh) * 2019-04-24 2023-05-09 华为技术有限公司 图像分类方法、数据处理方法和装置

Also Published As

Publication number Publication date
WO2020216227A1 (fr) 2020-10-29
CN110188795A (zh) 2019-08-30
CN110188795B (zh) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2020216227A9 (fr) Image classification method and apparatus, and data processing method and apparatus
WO2020221200A1 (fr) Procédé de construction de réseau neuronal, procédé et dispositifs de traitement d'image
US20220092351A1 (en) Image classification method, neural network training method, and apparatus
WO2021043168A1 (fr) Procédé d'entraînement de réseau de ré-identification de personnes et procédé et appareil de ré-identification de personnes
WO2021042828A1 (fr) Procédé et appareil de compression de modèle de réseau neuronal, ainsi que support de stockage et puce
WO2020253416A1 (fr) Procédé et dispositif de détection d'objet et support de stockage informatique
WO2021043112A1 (fr) Procédé et appareil de classification d'images
WO2021120719A1 (fr) Procédé de mise à jour de modèle de réseau neuronal, procédé et dispositif de traitement d'image
WO2021057056A1 (fr) Procédé de recherche d'architecture neuronale, procédé et dispositif de traitement d'image, et support de stockage
WO2021218517A1 (fr) Procédé permettant d'acquérir un modèle de réseau neuronal et procédé et appareil de traitement d'image
WO2020177607A1 (fr) Procédé et appareil de débruitage d'image
WO2021018163A1 (fr) Procédé et appareil de recherche de réseau neuronal
WO2021147325A1 (fr) Procédé et appareil de détection d'objets, et support de stockage
US12026938B2 (en) Neural architecture search method and image processing method and apparatus
WO2022052601A1 (fr) Procédé d'apprentissage de modèle de réseau neuronal ainsi que procédé et dispositif de traitement d'image
WO2022001805A1 (fr) Procédé et dispositif de distillation de réseau neuronal
WO2021022521A1 (fr) Procédé de traitement de données et procédé et dispositif d'apprentissage de modèle de réseau neuronal
US20220335583A1 (en) Image processing method, apparatus, and system
WO2021155792A1 (fr) Appareil de traitement, procédé et support de stockage
WO2022001372A1 (fr) Procédé et appareil d'entraînement de réseau neuronal, et procédé et appareil de traitement d'image
US12039440B2 (en) Image classification method and apparatus, and image classification model training method and apparatus
WO2021018251A1 (fr) Procédé et dispositif de classification d'image
WO2021164750A1 (fr) Procédé et appareil de quantification de couche convolutive
CN113065645B (zh) 孪生注意力网络、图像处理方法和装置
CN110222718B (zh) 图像处理的方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20794289; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20794289; Country of ref document: EP; Kind code of ref document: A1)