WO2020237519A1 - 识别方法、装置、设备以及存储介质 (Recognition method, apparatus, device and storage medium) - Google Patents
- Publication number
- WO2020237519A1 (international application PCT/CN2019/088960)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- neural network
- feedforward
- computer
- feature
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
Definitions
- The present invention relates to the field of artificial intelligence, and in particular to a recognition method, apparatus, device and storage medium.
- Models based on neural networks have achieved excellent performance in many tasks, such as computer vision and natural language processing.
- These models rely on gradient-based optimization (training). Vector multiplication is one of the most basic operations of a neural network, so the behavior of its gradient strongly influences how well the network can be optimized.
- Vector multiplication is generally performed with the inner product. Take as an example the weight vector w and the feedforward vector x (that is, the input vector passed to this layer) in a space of any dimension. If P denotes the vector inner product and θ the angle between w and x, then $P(w,x) = w^{T}x = \|w\|_2\,\|x\|_2\cos\theta$.
- FIG. 1 shows the orthogonal decomposition of the local gradient with respect to the weight vector w.
- The local gradient of the inner product with respect to w is x itself, so x is orthogonally decomposed into its vector projection onto the weight vector w, $P_x$, and its vector rejection (deviation vector) from w, $R_x$. Because $P_x$ is parallel to w, it changes only the modulus of w and is called the modulus gradient of w; because $R_x$ is perpendicular to w, it changes the direction of w and is called the directional gradient of w.
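A minimal plain-Python sketch of this decomposition (standard library only; the helper names `dot` and `project_reject` are illustrative and not taken from the patent). It computes $P_x = \frac{w^T x}{\|w\|_2^2}\,w$ and $R_x = x - P_x$, and shows that the inner product depends on $P_x$ alone:

```python
def dot(a, b):
    """Inner product a^T b of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def project_reject(w, x):
    """Split x into P_x (parallel to w) and R_x (perpendicular to w)."""
    scale = dot(w, x) / dot(w, w)               # w^T x / ||w||_2^2
    p_x = [scale * wi for wi in w]              # vector projection of x onto w
    r_x = [xi - pi for xi, pi in zip(x, p_x)]   # vector rejection of x from w
    return p_x, r_x

w = [2.0, 0.0]
x = [3.0, 4.0]
p_x, r_x = project_reject(w, x)
print(p_x, r_x)                # [3.0, 0.0] [0.0, 4.0]
print(dot(w, x), dot(w, p_x))  # 6.0 6.0: the inner product ignores R_x
```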
- The present invention provides a recognition method, apparatus, device and storage medium to solve the technical problem that the existing inner product of the weight vector w and the feedforward vector x depends only on the projection vector $P_x$, which hampers updating the direction of the weight vector w.
- The present invention provides a recognition method, including:
- acquiring an object to be recognized;
- training the object to be recognized using a neural network to output a feature vector, where the neural network includes an input layer, an intermediate layer and an output layer, and the inner product operation between the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector;
- recognizing the object to be recognized according to the feature vector.
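- The inner product operation of the two vectors is specifically: $\mathrm{PR}(w,x)=\|w\|_2\left[\overline{|\sin\theta|}\,\|P_x\|_2\,\mathrm{sign}(\cos\theta)+\overline{\cos\theta}\,\left(\|x\|_2-\|R_x\|_2\right)\right]$
- where w and x denote the weight vector and the feedforward vector, respectively, and $P_x$ and $R_x$ are the projection of x onto w and the rejection of x from w;
- θ is the angle between vector w and vector x;
- $\|\cdot\|_2$ denotes the modulus of a vector;
- $\overline{*}$ means that * is detached from the neural network model.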
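As a sanity check on the formula above, this forward-only sketch (plain Python; `pr_forward` is a hypothetical name) evaluates each term directly. Without an autodiff framework the detach marks have no effect, and substituting $\|P_x\|_2 = \|x\|_2|\cos\theta|$ and $\|R_x\|_2 = \|x\|_2|\sin\theta|$ shows that the result collapses to $w^T x$:

```python
import math

def pr_forward(w, x):
    """Forward value of the PR product.

    PR(w, x) = ||w|| * [ |sin t| * ||P_x|| * sign(cos t) + cos t * (||x|| - ||R_x||) ]
    Detach marks only affect gradients, so they are ignored here.
    """
    wx = sum(a * b for a, b in zip(w, x))
    nw = math.sqrt(sum(a * a for a in w))
    nx = math.sqrt(sum(b * b for b in x))
    cos_t = max(-1.0, min(1.0, wx / (nw * nx)))  # clamp rounding error
    sin_t = math.sqrt(1.0 - cos_t ** 2)          # |sin t|, since t is in [0, pi]
    norm_px = nx * abs(cos_t)                    # ||P_x||_2
    norm_rx = nx * sin_t                         # ||R_x||_2
    sign_cos = 1.0 if cos_t >= 0 else -1.0       # sign(0) is irrelevant: ||P_x|| = 0 then
    return nw * (sin_t * norm_px * sign_cos + cos_t * (nx - norm_rx))

w = [1.0, 2.0, -1.0]
x = [0.5, -3.0, 2.0]
print(pr_forward(w, x), sum(a * b for a, b in zip(w, x)))  # both approximately -7.5
```

So the PR product leaves the forward pass unchanged; any difference from the inner product comes from the gradients.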
- the inner product operation of the two vectors is specifically:
- $*^{T}$ denotes the transpose of vector *.
- The object to be recognized is a picture, and the neural network is trained on the picture to obtain a recognition result for the picture.
- The feature vector is pixel feature information of the picture.
- The object to be recognized is speech, and the neural network is trained on the speech to obtain a recognition result for the speech.
- The feature vector is word feature information of the speech.
- The present invention also provides a recognition device, including:
- an acquisition module, configured to acquire the object to be recognized;
- a training module, configured to train the object to be recognized using a neural network to output a feature vector, where the neural network includes an input layer, an intermediate layer and an output layer, and the inner product operation of two vectors in the intermediate layer is related to the projection of one vector in the direction perpendicular to the other vector;
- a recognition module, configured to recognize the object to be recognized according to the feature vector.
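- The training module is specifically configured to compute the inner product of the two vectors with the PR product formula of claim 2, where:
- w and x denote the weight vector and the feedforward vector, respectively;
- θ is the angle between vector w and vector x;
- $\|\cdot\|_2$ denotes the modulus of a vector;
- $\overline{*}$ means that * is detached from the neural network model.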
- the training module specifically includes:
- $*^{T}$ denotes the transpose of vector *.
- the object to be identified is a picture.
- the feature vector is pixel feature information of the picture.
- the object to be recognized is a sentence.
- the feature vector is word feature information of the sentence.
- the present invention provides an electronic device, including: at least one processor and a memory;
- the memory stores computer-executable instructions;
- the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the recognition method described in the first aspect and its optional solutions.
- The present invention also provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the recognition method described in the first aspect and its optional solutions is implemented.
- the present invention provides a recognition method, device, equipment, and storage medium.
- A neural network is used to train the object to be recognized and output a feature vector, where the inner product operation between the weight vector and the feedforward vector in the intermediate layer of the neural network is related to the projection of the feedforward vector in the direction perpendicular to the weight vector. This makes the modulus of the local directional gradient of the weight vector w independent of the included angle θ: whatever the angle, the modulus of the local directional gradient of w always equals $\|x\|_2$, the modulus of the feedforward vector x.
- Figure 1 is an orthogonal decomposition diagram of the local gradient of the weight vector w;
- Fig. 2 is a schematic flowchart of a recognition method according to an exemplary embodiment of the present invention.
- Fig. 3 is an orthogonal decomposition diagram of the local gradient of the weight vector w proposed by the present invention.
- Fig. 4 is a schematic flowchart of a recognition device according to an exemplary embodiment of the present invention.
- Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
- the traditional vector inner product only contains the information of the projection vector Px of the vector x on the vector w, and does not contain the information of the deviation vector Rx of the vector x from the vector w. Therefore, in Euclidean space, the vector inner product is also called the projection product.
- The local gradient of the vector inner product with respect to the weight vector w is: $\frac{\partial P(w,x)}{\partial w} = x = P_x + R_x$
- $P_x$ is parallel to w and is its modulus gradient;
- $R_x$ is perpendicular to w and is its directional gradient.
- The directional gradient $R_x$ changes with the included angle θ (its modulus is $\|R_x\|_2 = \|x\|_2\,|\sin\theta|$, which approaches 0 as θ approaches 0 or π), which makes optimization harder.
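To see the angle dependence concretely, the following sketch (plain Python; the function name is hypothetical) places w on the first axis and x at angle θ, then measures the modulus of the directional part $R_x$ of the inner-product gradient. It tracks $\|x\|_2|\sin\theta|$ and collapses toward 0 near 0° and 180°:

```python
import math

def inner_product_directional_grad_norm(theta, x_norm=1.0):
    """||R_x||_2 for the inner-product gradient d(w^T x)/dw = x = P_x + R_x.

    w is placed on the first axis and x at angle theta from it.
    """
    w = (1.0, 0.0)
    x = (x_norm * math.cos(theta), x_norm * math.sin(theta))
    scale = (w[0] * x[0] + w[1] * x[1]) / (w[0] ** 2 + w[1] ** 2)
    r_x = (x[0] - scale * w[0], x[1] - scale * w[1])  # rejection of x from w
    return math.hypot(*r_x)

for deg in (1, 30, 90, 150, 179):
    g = inner_product_directional_grad_norm(math.radians(deg))
    s = abs(math.sin(math.radians(deg)))
    print(f"theta={deg:3d} deg  ||R_x||={g:.4f}  |sin(theta)|={s:.4f}")
```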
- FIG. 2 is a schematic flowchart of an identification method according to an exemplary embodiment of the present invention. As shown in FIG. 2, this embodiment provides an identification method, including:
- The recognition method can be applied to artificial intelligence fields such as computer vision, natural language processing and recommendation systems.
- Computer vision covers technical fields such as image recognition, video classification, object detection, object tracking, visual saliency analysis, image and video captioning, face recognition, visual question answering, behavior understanding and abnormal behavior detection, with applications in video surveillance, robotics, intelligent driving, drones and other areas.
- When the object to be recognized is a picture, the picture information can be collected with a camera or with other existing techniques, which are not described again here.
- Natural language processing fields include: machine translation, speech recognition, part-of-speech tagging, natural language generation, text classification, information retrieval and extraction, question answering systems, automatic summarization, etc.
- the object to be recognized is sentence information.
- the user can input sentence information through the input interface to collect the sentence information to be recognized.
- Other existing technologies can also be used to collect sentence information, which will not be repeated here.
- The object to be recognized may be a picture, in which case the method performs picture recognition and is applied in computer vision.
- The object to be recognized may also be speech, in which case the method performs speech recognition and is applied in natural language processing.
- The neural network includes an input layer, an intermediate layer and an output layer; the inner product operation between the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector.
- The inner product operation of the weight vector and the feedforward vector is specifically: $\mathrm{PR}(w,x)=\|w\|_2\left[\overline{|\sin\theta|}\,\|P_x\|_2\,\mathrm{sign}(\cos\theta)+\overline{\cos\theta}\,\left(\|x\|_2-\|R_x\|_2\right)\right]$
- $\overline{*}$ means that * is detached from the neural network model.
- Detaching means that * is treated as a constant when computing the gradient, so no derivative is taken through *.
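Because the forward value of the PR product equals $w^T x$, detaching is what makes it differ from the inner product. The sketch below (plain Python, hypothetical names, central finite differences standing in for back-propagation) evaluates the gradient with respect to w twice: once with the barred factors $\overline{|\sin\theta|}$ and $\overline{\cos\theta}$ frozen as constants, and once with them recomputed. It assumes θ ∈ [0, π], so that $|\sin\theta| = \sqrt{1-\cos^2\theta}$:

```python
import math

def _stats(w, x):
    """Return w^T x, ||w||_2, ||x||_2 and cos(theta)."""
    wx = sum(a * b for a, b in zip(w, x))
    nw = math.sqrt(sum(a * a for a in w))
    nx = math.sqrt(sum(b * b for b in x))
    return wx, nw, nx, max(-1.0, min(1.0, wx / (nw * nx)))

def detached_factors(w, x):
    """Current values of the barred factors |sin(theta)| and cos(theta)."""
    *_, cos_t = _stats(w, x)
    return math.sqrt(1.0 - cos_t ** 2), cos_t

def pr_value(w, x, s_bar, c_bar):
    """PR(w, x) with the barred factors supplied as plain numbers."""
    wx, nw, nx, cos_t = _stats(w, x)
    norm_rx = nx * math.sqrt(1.0 - cos_t ** 2)  # ||R_x||_2
    # ||P_x||_2 * sign(cos theta) is rewritten as w^T x / ||w||_2
    return nw * (s_bar * wx / nw + c_bar * (nx - norm_rx))

def numeric_grad(w, x, detach, h=1e-6):
    """Central finite-difference gradient of PR(w, x) with respect to w."""
    s0, c0 = detached_factors(w, x)
    grad = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        if detach:
            # Barred factors frozen at their values at w (treated as constants).
            fp, fm = pr_value(wp, x, s0, c0), pr_value(wm, x, s0, c0)
        else:
            # Barred factors recomputed: PR(w, x) collapses to w^T x.
            fp = pr_value(wp, x, *detached_factors(wp, x))
            fm = pr_value(wm, x, *detached_factors(wm, x))
        grad.append((fp - fm) / (2 * h))
    return grad

w = [1.0, 0.0]
x = [1.0, 1.0]  # theta = 45 degrees, P_x = [1, 0], R_x = [0, 1]
print(numeric_grad(w, x, detach=False))  # close to x = [1, 1]
print(numeric_grad(w, x, detach=True))   # close to P_x + ||x|| E_Rx = [1, 1.4142]
```

Without detaching, the gradient is just x, exactly as for the inner product; with detaching, the perpendicular component grows from $\|R_x\|_2 = 1$ to $\|x\|_2 = \sqrt{2}$.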
- The vector multiplication proposed by the present invention uses not only the information of the projection vector $P_x$ of x onto w but also the information of the rejection (deviation) vector $R_x$ of x from w, so it is called the Projection and Rejection Product (PR Product), also referred to here as the projection deviation product.
- Formula (6) is the same as formula (2) and is not derived again here.
- The local gradient of the projection deviation product with respect to the weight vector w is derived as: $\frac{\partial\,\mathrm{PR}(w,x)}{\partial w} = P_x + \|x\|_2\,E_{R_x}$
- where $E_{R_x}$ is the unit vector along $R_x$;
- $P_x$ is parallel to w and is the modulus gradient of w, the same as for the conventional vector inner product;
- $\|x\|_2\,E_{R_x}$ is perpendicular to w and is the directional gradient of w.
- Fig. 3 is an orthogonal decomposition diagram of the local gradient of the weight vector w proposed by the present invention.
- $\|*\|_2$ denotes the modulus of vector *;
- $E_{R_x}$ denotes the unit vector along $R_x$ (same direction as $R_x$, modulus 1).
- the directional gradient does not change with the change of the included angle ⁇ .
- The two directional gradients point in the same direction, but the directional gradient of the projection deviation product with respect to w is never smaller than that of the conventional vector inner product (because $\|x\|_2 \ge \|R_x\|_2$) and always equals the modulus $\|x\|_2$.
- * means to separate * from the neural network model, that is, treat * as a constant when calculating the gradient in back propagation.
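The derivation can be checked with a short analytic sketch (plain Python; `pr_local_grad` is a hypothetical name). It returns the two parts $P_x$ and $\|x\|_2 E_{R_x}$ and compares the modulus of the directional part with that of the inner product, $\|R_x\|_2$, as θ varies with $\|x\|_2 = 2$. The text does not say what happens when x is parallel to w ($R_x = 0$, so $E_{R_x}$ is undefined); this sketch simply raises an error in that case:

```python
import math

def pr_local_grad(w, x):
    """Return (modulus part, directional part) of d PR(w, x) / d w.

    modulus part     = P_x             (parallel to w, same as the inner product)
    directional part = ||x||_2 * E_Rx  (perpendicular to w)
    """
    wx = sum(a * b for a, b in zip(w, x))
    ww = sum(a * a for a in w)
    nx = math.sqrt(sum(b * b for b in x))
    p_x = [wx / ww * a for a in w]             # vector projection of x onto w
    r_x = [b - p for b, p in zip(x, p_x)]      # vector rejection of x from w
    n_rx = math.sqrt(sum(r * r for r in r_x))
    if n_rx == 0.0:
        raise ValueError("x is parallel to w, so E_Rx is undefined")
    return p_x, [nx * r / n_rx for r in r_x]   # ||x||_2 * E_Rx

w = [1.0, 0.0]
for deg in (5, 45, 90, 135, 175):
    t = math.radians(deg)
    x = [2.0 * math.cos(t), 2.0 * math.sin(t)]  # ||x||_2 = 2
    _, d = pr_local_grad(w, x)
    inner = 2.0 * abs(math.sin(t))              # ||R_x||_2 for the inner product
    print(f"{deg:3d} deg: PR {math.hypot(*d):.3f}  inner product {inner:.3f}")
```

The PR column stays at 2.000 for every angle, while the inner-product column shrinks toward 0 near 0° and 180°.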
- the pixel information of the picture is input into the above-mentioned neural network, and after the above-mentioned neural network is processed, a feature vector is output.
- the aforementioned feature vector contains pixel information, and the recognition result of the picture can be obtained according to the aforementioned feature vector.
- the word information of the speech is input into the aforementioned neural network, and after the aforementioned neural network is processed, a feature vector is output.
- the aforementioned feature vector contains word information, and the speech recognition result can be obtained according to the aforementioned feature vector.
- the feature vector is pixel feature information of the picture, and the recognition of the object to be recognized is realized according to the pixel feature information.
- the feature vector is the word feature information of the sentence, and the recognition of the object to be recognized is realized according to the word feature information.
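The text does not specify the output layer or how the recognition result is read from the feature vector. As a purely illustrative sketch, assume a final layer with one weight vector per class (the names `recognize`, `weight_rows` and `labels` are hypothetical), scored with the PR product and read out by argmax. Because the PR product's forward value equals $w^T x$, this layer produces the same scores as an ordinary linear layer without bias; the PR product only changes the gradients used during training:

```python
import math

def pr_forward(w, x):
    """Forward value of the PR product; numerically equal to w^T x."""
    wx = sum(a * b for a, b in zip(w, x))
    nw = math.sqrt(sum(a * a for a in w))
    nx = math.sqrt(sum(b * b for b in x))
    cos_t = max(-1.0, min(1.0, wx / (nw * nx)))
    sin_t = math.sqrt(1.0 - cos_t ** 2)
    sign_cos = 1.0 if cos_t >= 0 else -1.0
    return nw * (sin_t * nx * abs(cos_t) * sign_cos + cos_t * (nx - nx * sin_t))

def recognize(feature, weight_rows, labels):
    """Score each class with PR(w_j, feature); return the top label and all scores."""
    scores = [pr_forward(w_j, feature) for w_j in weight_rows]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores

feature = [0.9, 0.1, 0.3]  # hypothetical feature vector (must be nonzero)
weight_rows = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
label, scores = recognize(feature, weight_rows, ["cat", "dog", "car"])
print(label, [round(s, 3) for s in scores])  # cat [0.9, 0.1, 0.3]
```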
- The projection deviation product is used as the multiplication operation between two vectors.
- Its main advantage in principle is that the modulus of the local directional gradient of w is independent of the included angle and always equals $\|x\|_2$, the modulus of the feedforward vector x.
- The proposed projection deviation product was used in feedforward, convolutional and recurrent neural networks; experiments on multiple tasks and datasets show that, compared with the conventional vector inner product, it robustly improves the performance of neural network models.
- FIG. 4 is a schematic flowchart of an identification device according to an exemplary embodiment of the present invention. As shown in FIG. 4, this embodiment provides an identification device, including:
- the obtaining module 201 is used to obtain the object to be identified
- the training module 202 is configured to train the object to be recognized using a neural network to output a feature vector, where the neural network includes an input layer, an intermediate layer and an output layer, and the inner product operation of two vectors in the intermediate layer is related to the projection of one vector in the direction perpendicular to the other vector;
- the recognition module 203 is configured to recognize the object to be recognized according to the feature vector.
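The three modules of FIG. 4 can be pictured as a simple pipeline. The sketch below is hypothetical: the class `RecognitionDevice` and its callables stand in for modules 201, 202 and 203, and the toy lambdas replace a real camera or microphone input, a trained network and a classifier:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class RecognitionDevice:
    """Hypothetical wiring of the obtaining, training and recognition modules."""
    acquire: Callable[[], Sequence[float]]             # obtaining module 201
    extract: Callable[[Sequence[float]], List[float]]  # training module 202 (network forward pass)
    recognize: Callable[[List[float]], str]            # recognition module 203

    def run(self) -> str:
        obj = self.acquire()          # object to be recognized
        feature = self.extract(obj)   # feature vector output by the network
        return self.recognize(feature)

device = RecognitionDevice(
    acquire=lambda: [0.2, 0.7, 0.1],
    extract=lambda obj: [2.0 * v for v in obj],
    recognize=lambda f: ["a", "b", "c"][f.index(max(f))],
)
print(device.run())  # b
```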
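- the training module 202 is specifically configured to compute the inner product of the two vectors with the PR product formula of claim 2, where:
- w and x denote the weight vector and the feedforward vector, respectively;
- θ is the angle between vector w and vector x;
- $\|\cdot\|_2$ denotes the modulus of a vector;
- $\overline{*}$ means that * is detached from the neural network model.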
- the training module 202 specifically includes:
- $*^{T}$ denotes the transpose of vector *.
- the object to be identified is a picture.
- the feature vector is pixel feature information of the picture.
- the object to be recognized is a sentence.
- the feature vector is word feature information of the sentence.
- Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
- the electronic device 300 of this embodiment includes: a processor 301 and a memory 302.
- the memory 302 is used to store computer-executable instructions;
- the processor 301 is configured to execute computer-executable instructions stored in the memory to implement each step executed by the receiving device in the foregoing embodiment. For details, refer to the related description in the foregoing method embodiment.
- the memory 302 may be independent or integrated with the processor 301.
- the electronic device 300 further includes a bus 303 for connecting the memory 302 and the processor 301.
- An embodiment of the present invention also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the processor executes the computer-executable instructions, the aforementioned identification method is implemented.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Claims (10)
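- A recognition method, characterized by comprising: acquiring an object to be recognized; training the object to be recognized using a neural network to output a feature vector, wherein the neural network comprises an input layer, an intermediate layer and an output layer, and the inner product operation between the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector; and recognizing the object to be recognized according to the feature vector.
- The method according to claim 1, characterized in that the inner product operation between the weight vector and the feedforward vector is specifically: $\mathrm{PR}(w,x)=\|w\|_2\left[\overline{|\sin\theta|}\,\|P_x\|_2\,\mathrm{sign}(\cos\theta)+\overline{\cos\theta}\,\left(\|x\|_2-\|R_x\|_2\right)\right]$, wherein w and x denote the weight vector and the feedforward vector, respectively, θ is the angle between vector w and vector x, $\|\cdot\|_2$ denotes the modulus of a vector, and $\overline{*}$ denotes that * is detached from the neural network model.
- The method according to any one of claims 1 to 3, characterized in that the object to be recognized is a picture, so that the neural network is trained on the picture to obtain a recognition result of the picture.
- The method according to claim 4, characterized in that the feature vector is pixel feature information of the picture.
- The method according to any one of claims 1 to 3, characterized in that the object to be recognized is speech, so that the neural network is trained on the speech to obtain a recognition result of the speech.
- The method according to claim 4, characterized in that the feature vector is word feature information of the speech.
- A recognition device, characterized by comprising: an acquisition module, configured to acquire an object to be recognized; a training module, configured to train the object to be recognized using a neural network to output a feature vector, wherein the neural network comprises an input layer, an intermediate layer and an output layer, and the inner product operation of two vectors in the intermediate layer is related to the projection of one vector in the direction perpendicular to the other vector; and a recognition module, configured to recognize the object to be recognized according to the feature vector.
- An electronic device, characterized by comprising: at least one processor and a memory; wherein the memory stores computer-executable instructions, and the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the recognition method according to any one of claims 1 to 7.
- A computer-readable storage medium, characterized in that computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the recognition method according to any one of claims 1 to 7 is implemented.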
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/088960 WO2020237519A1 (zh) | 2019-05-29 | 2019-05-29 | Recognition method, apparatus, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020237519A1 (zh) | 2020-12-03 |
Family
ID=73553011
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/088960 WO2020237519A1 (zh) | 2019-05-29 | 2019-05-29 | Recognition method, apparatus, device and storage medium |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020237519A1 (zh) |
- 2019-05-29: application PCT/CN2019/088960 filed with WO (WO2020237519A1, zh); status: active
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19930401; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 19930401; Country of ref document: EP; Kind code of ref document: A1 |
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.03.2022) |
122 | EP: PCT application non-entry in European phase | Ref document number: 19930401; Country of ref document: EP; Kind code of ref document: A1 |