WO2020237519A1 - Recognition method, apparatus, device, and storage medium - Google Patents

Recognition method, apparatus, device, and storage medium

Info

Publication number
WO2020237519A1
WO2020237519A1 (PCT/CN2019/088960)
Authority
WO
WIPO (PCT)
Prior art keywords
vector
neural network
feedforward
computer
feature
Prior art date
Application number
PCT/CN2019/088960
Other languages
English (en)
French (fr)
Inventor
邹文斌
王振楠
徐晨
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学
Priority to PCT/CN2019/088960
Publication of WO2020237519A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Definitions

  • The present invention relates to the field of artificial intelligence technology, and in particular to a recognition method, apparatus, device, and storage medium.
  • Models based on neural networks have achieved excellent performance in many tasks, such as computer vision and natural language processing.
  • These models rely on gradient-based optimization for their training. Vector multiplication is therefore one of the most basic operations of a neural network, and the behavior of its gradient has a great influence on how the network is optimized.
  • In neural networks, the inner product is generally used as the vector multiplication. Take a weight vector w and a feedforward vector x (that is, the input vector passed to the layer) in a space of arbitrary dimension, and let P denote the inner product; then P = wᵀx.
  • FIG. 1 shows the orthogonal decomposition of the local gradient of the weight vector w.
  • The vector x is orthogonally decomposed into a vector projection Px along the weight vector w and a vector rejection Rx perpendicular to it. Since the projection Px is parallel to w, Px changes the modulus of w and is called the modulus gradient of w; since Rx is perpendicular to w, Rx changes the direction of w and is called the direction gradient of w.
  • The present invention provides a recognition method, apparatus, device, and storage medium to solve the technical problem that, in the prior art, the inner product of the weight vector w and the feedforward vector x is related only to the projection vector Px, which prevents the direction of the weight vector w from being updated.
  • In a first aspect, the present invention provides a recognition method, including:
  • acquiring an object to be recognized; training the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network includes an input layer, an intermediate layer, and an output layer, and the inner product operation of the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector;
  • recognizing the object to be recognized according to the feature vector.
  • Optionally, the inner product operation of the two vectors is specifically: PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ], where w and x denote the weight vector and the feedforward vector respectively, θ is the angle between the vector w and the vector x, ‖·‖₂ denotes the modulus of a vector, and ⟨*⟩ means that * is separated (detached) from the neural network model.
  • Optionally, the inner product operation of the two vectors is specifically: PR(w, x) = ⟨|sinθ|⟩·wᵀx + ⟨cosθ⟩·( ‖w‖₂‖x‖₂ − √(‖w‖₂²‖x‖₂² − (wᵀx)²) ), where *ᵀ denotes the transpose of the vector *.
  • Optionally, the object to be recognized is a picture, and the neural network is used to process the picture so as to obtain a recognition result of the picture.
  • Optionally, the feature vector is pixel feature information of the picture.
  • Optionally, the object to be recognized is speech, and the neural network is used to process the speech so as to obtain a recognition result of the speech.
  • Optionally, the feature vector is word feature information of the speech.
  • In a second aspect, the present invention provides a recognition apparatus, including:
  • an acquisition module, configured to acquire the object to be recognized;
  • a training module, configured to train the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network includes an input layer, an intermediate layer, and an output layer, and the inner product operation of the two vectors in the intermediate layer is related to the projection of one vector in the direction perpendicular to the other vector;
  • a recognition module, configured to recognize the object to be recognized according to the feature vector.
  • Optionally, the training module specifically uses: PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ], where w and x denote the weight vector and the feedforward vector respectively, θ is the angle between the vector w and the vector x, ‖·‖₂ denotes the modulus of a vector, and ⟨*⟩ means that * is separated from the neural network model.
  • Optionally, the training module specifically uses: PR(w, x) = ⟨|sinθ|⟩·wᵀx + ⟨cosθ⟩·( ‖w‖₂‖x‖₂ − √(‖w‖₂²‖x‖₂² − (wᵀx)²) ), where *ᵀ denotes the transpose of the vector *.
  • the object to be identified is a picture.
  • the feature vector is pixel feature information of the picture.
  • the object to be recognized is a sentence.
  • the feature vector is word feature information of the sentence.
  • In a third aspect, the present invention provides an electronic device, including at least one processor and a memory;
  • the memory stores computer-executable instructions;
  • the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the recognition method described in the first aspect and its optional solutions.
  • In a fourth aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the recognition method described in the first aspect and its optional solutions.
  • The present invention provides a recognition method, apparatus, device, and storage medium. A neural network is used to train the object to be recognized so as to output a feature vector, wherein the inner product operation of the weight vector and the feedforward vector in the intermediate layer of the neural network is related to the projection of the feedforward vector in the direction perpendicular to the weight vector. This makes the modulus of the local direction gradient of the weight vector w independent of the included angle θ: whatever the value of the angle, the modulus of the local direction gradient of w is the modulus ‖x‖₂ of the feedforward vector x, so the update of the direction of w is not hindered.
  • Figure 1 is an orthogonal decomposition diagram of the local gradient of the weight vector w;
  • Fig. 2 is a schematic flowchart of a recognition method according to an exemplary embodiment of the present invention;
  • Fig. 3 is an orthogonal decomposition diagram of the local gradient of the weight vector w as proposed by the present invention;
  • Fig. 4 is a schematic diagram of a recognition apparatus according to an exemplary embodiment of the present invention;
  • Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
  • The traditional vector inner product contains only the information of the projection vector Px of the vector x on the vector w, and none of the information of the rejection vector Rx of x from w. Therefore, in Euclidean space, the vector inner product is also called the projection product.
  • The local gradient of the vector inner product with respect to the weight vector w is ∂P/∂w = x = Px + Rx.
  • Px, parallel to w, is its modulus gradient;
  • Rx, perpendicular to w, is its direction gradient.
  • The direction gradient Rx changes as the included angle θ changes, which causes certain difficulties for optimization.
  • The present invention provides a recognition method, apparatus, device, and storage medium to solve the technical problem that, in the prior art, the inner product of the weight vector w and the feedforward vector x is related only to the projection vector Px, which prevents the direction of the weight vector w from being updated.
  • FIG. 2 is a schematic flowchart of a recognition method according to an exemplary embodiment of the present invention. As shown in FIG. 2, this embodiment provides a recognition method, including the following steps.
  • The recognition method can be applied to artificial intelligence fields such as computer vision, natural language processing, and recommendation systems.
  • The field of computer vision includes image recognition, video classification, object detection, object tracking, visual saliency analysis, image and video captioning, face recognition, visual question answering, behavior understanding, abnormal behavior detection, and related technical fields, with applications in video surveillance, robotics, intelligent driving, drones, and so on.
  • In the field of computer vision, the object to be recognized is a picture; picture information can be collected through a camera, or by other existing techniques, which will not be repeated here.
  • The field of natural language processing includes machine translation, speech recognition, part-of-speech tagging, natural language generation, text classification, information retrieval and extraction, question answering systems, automatic summarization, and so on.
  • In this field, the object to be recognized is sentence information.
  • The user can input sentence information through an input interface so as to collect the sentence information to be recognized.
  • Other existing techniques can also be used to collect sentence information, which will not be repeated here.
  • The object to be recognized may be a picture, in which case this recognition method performs picture recognition and is applied to the field of computer vision.
  • The object to be recognized may also be speech, in which case the method performs speech recognition and is applied to the field of natural language processing.
  • The neural network includes an input layer, an intermediate layer, and an output layer; the inner product operation of the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector.
  • The inner product operation of the weight vector and the feedforward vector is specifically: PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ].
  • ⟨*⟩ means that * is separated from the neural network model.
  • Separation means that when the gradient is computed, * is regarded as a constant and no derivative is taken with respect to it.
  • The vector multiplication proposed by the present invention uses not only the information of the projection vector Px of the vector x on the vector w but also the information of the rejection vector Rx of x from w. It is therefore called the Projection and Rejection Product (PR Product).
  • In forward propagation of the neural network, formula (6) is identical to formula (2), so the local gradient of the PR product with respect to the weight vector w need not be derived separately for the forward pass.
  • In backpropagation, the local gradient of the PR product with respect to the weight vector w is derived as ∂PR/∂w = Px + ‖x‖₂·Erx.
  • Erx is the unit vector of the vector Rx.
  • Px is parallel to w and is the modulus gradient of w, the same as for the traditional vector inner product;
  • ‖x‖₂·Erx is perpendicular to w and is the direction gradient of w.
  • Fig. 3 is an orthogonal decomposition diagram of the local gradient of the weight vector w proposed by the present invention.
  • ‖*‖₂ denotes the modulus of the vector *;
  • Erx denotes the unit vector along the vector Rx (the vector with the same direction as Rx and modulus 1).
  • This direction gradient does not change as the included angle θ changes.
  • Its direction agrees with that of the inner product's direction gradient in the prior art, but the modulus of the PR product's direction gradient with respect to w is always at least as large as that of the prior-art vector inner product, and is constantly equal to the modulus ‖x‖₂ of the feedforward vector x, so the update of the direction of w is not hindered.
  • ⟨*⟩ means that * is separated from the neural network model, that is, * is treated as a constant when the gradient is computed in backpropagation.
  • When the object to be recognized is a picture, the pixel information of the picture is input into the above neural network, and after processing by the neural network, a feature vector is output.
  • The feature vector contains pixel information, and the recognition result of the picture can be obtained from the feature vector.
  • When the object to be recognized is speech, the word information of the speech is input into the above neural network, and after processing by the neural network, a feature vector is output.
  • The feature vector contains word information, and the speech recognition result can be obtained from the feature vector.
  • the feature vector is pixel feature information of the picture, and the recognition of the object to be recognized is realized according to the pixel feature information.
  • the feature vector is the word feature information of the sentence, and the recognition of the object to be recognized is realized according to the word feature information.
  • In this embodiment, the PR product is used to perform the two-vector operation.
  • Its advantage in principle is that the modulus of the local direction gradient of w is independent of the included angle and always equals the modulus ‖x‖₂ of the feedforward vector x.
  • In application, the PR product proposed by the present invention has been used in feedforward neural networks, convolutional neural networks, and recurrent neural networks; experiments on multiple tasks and multiple datasets show that, compared with the traditional vector inner product, the PR product robustly improves the performance of neural network models.
  • FIG. 4 is a schematic diagram of a recognition apparatus according to an exemplary embodiment of the present invention. As shown in FIG. 4, this embodiment provides a recognition apparatus, including:
  • an acquisition module 201, configured to acquire the object to be recognized;
  • a training module 202, configured to train the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network includes an input layer, an intermediate layer, and an output layer, and the inner product operation of the two vectors in the intermediate layer is related to the projection of one vector in the direction perpendicular to the other vector;
  • a recognition module 203, configured to recognize the object to be recognized according to the feature vector.
  • Optionally, the training module 202 specifically uses: PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ], where w and x denote the weight vector and the feedforward vector respectively, θ is the angle between the vector w and the vector x, ‖·‖₂ denotes the modulus of a vector, and ⟨*⟩ means that * is separated from the neural network model.
  • Optionally, the training module 202 specifically uses: PR(w, x) = ⟨|sinθ|⟩·wᵀx + ⟨cosθ⟩·( ‖w‖₂‖x‖₂ − √(‖w‖₂²‖x‖₂² − (wᵀx)²) ), where *ᵀ denotes the transpose of the vector *.
  • the object to be identified is a picture.
  • the feature vector is pixel feature information of the picture.
  • the object to be recognized is a sentence.
  • the feature vector is word feature information of the sentence.
  • Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
  • the electronic device 300 of this embodiment includes: a processor 301 and a memory 302.
  • the memory 302 is used to store computer-executable instructions;
  • the processor 301 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed in the foregoing embodiments. For details, refer to the related description in the foregoing method embodiments.
  • the memory 302 may be independent or integrated with the processor 301.
  • the electronic device 300 further includes a bus 303 for connecting the memory 302 and the processor 301.
  • An embodiment of the present invention also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the processor executes the computer-executable instructions, the aforementioned identification method is implemented.
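As a concrete sketch of how the projection and rejection product described above could be dropped into a network layer (an assumption for illustration only; the patent does not give an implementation, and `pr_product` / `pr_linear` are hypothetical names), each output unit of a fully connected layer applies the PR product of its weight row to the feedforward vector. In the forward pass the result equals the ordinary matrix product W @ x:

```python
import numpy as np

def pr_product(w, x):
    """Forward value of the PR product implementation formula (no autograd here)."""
    nwnx = np.linalg.norm(w) * np.linalg.norm(x)
    wx = w @ x
    cos_t = wx / nwnx
    sin_t = np.sqrt(max(0.0, 1.0 - cos_t ** 2))
    return sin_t * wx + cos_t * (nwnx - np.sqrt(max(0.0, nwnx ** 2 - wx ** 2)))

def pr_linear(W, x):
    """Hypothetical PR-product replacement for a fully connected layer's W @ x."""
    return np.array([pr_product(row, x) for row in W])

rng = np.random.default_rng(1)
W, x = rng.normal(size=(4, 8)), rng.normal(size=8)
print(np.allclose(pr_linear(W, x), W @ x))    # True in the forward pass
```

In a framework with automatic differentiation, `sin_t` and `cos_t` would additionally be detached so that backpropagation treats them as constants, which is where the PR product differs from the plain inner product.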

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a recognition method, apparatus, device, and storage medium. The method includes: acquiring an object to be recognized; training the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network includes an input layer, an intermediate layer, and an output layer, and the inner product operation of the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector; and recognizing the object to be recognized according to the feature vector. In the recognition method provided by the present invention, the inner product operation of the weight vector and the feedforward vector in the intermediate layer of the neural network is related to the projection of the feedforward vector in the direction perpendicular to the weight vector, which makes the modulus of the local direction gradient of the weight vector independent of the included angle, so that nothing in the neural network hinders the local direction update of the weight vector; this improves the performance of the neural network and makes recognition more accurate.

Description

Recognition method, apparatus, device, and storage medium
TECHNICAL FIELD
The present invention relates to the field of artificial intelligence technology, and in particular to a recognition method, apparatus, device, and storage medium.
BACKGROUND
Models based on neural networks, such as feedforward neural networks, convolutional neural networks, and recurrent neural networks, have achieved excellent performance on many tasks, such as computer vision and natural language processing. At present, these models rely on gradient-based optimization, that is, training. Vector multiplication is one of the most basic operations of a neural network, so the behavior of its gradient has a great influence on the optimization of the network.
In neural networks, the vector inner product (Inner Product) is generally used as the vector multiplication. Take a weight vector w and a feedforward vector x (that is, the input vector passed to this layer) in a space of arbitrary dimension, and let P denote the vector inner product; then:
P = wᵀx   (1)
where *ᵀ denotes the transpose of the vector *. The local gradient of P with respect to w is then the feedforward vector x. Fig. 1 shows the orthogonal decomposition of the local gradient of the weight vector w. As shown in Fig. 1, the vector x is orthogonally decomposed into a vector projection (Vector Projection) Px along the weight vector w and a vector rejection (Vector Rejection) Rx perpendicular to w. Since the projection Px is parallel to w, Px changes the modulus of w and is called the modulus gradient of w; since Rx is perpendicular to w, Rx changes the direction of w and is called the direction gradient of w.
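The decomposition just described can be checked numerically. The following sketch (a NumPy illustration with arbitrarily chosen example vectors, not part of the patent text) computes the inner product of formula (1) and splits x into its projection Px along w and its rejection Rx perpendicular to w:

```python
import numpy as np

# Hypothetical example vectors chosen for illustration.
w = np.array([3.0, 4.0])
x = np.array([2.0, 1.0])

P = w @ x                          # inner product, formula (1)
Px = (w @ x) / (w @ w) * w         # vector projection of x onto w
Rx = x - Px                        # vector rejection of x from w

print(P)                                           # 10.0
print(abs(Px[0] * w[1] - Px[1] * w[0]) < 1e-12)    # Px is parallel to w: True
print(abs(Rx @ w) < 1e-12)                         # Rx is perpendicular to w: True
```

Note that x = Px + Rx exactly, which is the orthogonal decomposition shown in Fig. 1.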
As the angle between the weight vector w and the feedforward vector x approaches 0 or π, the modulus of Rx becomes smaller and smaller, that is, the direction gradient of w becomes smaller and smaller. This directly causes difficulties in updating the direction of the weight vector w.
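This shrinking direction gradient can be seen numerically. In the sketch below (an illustration with assumed values, not from the patent), w points along the first axis and x is a unit vector at angle θ to it; the modulus of the rejection Rx, which is the direction gradient of w under the plain inner product, equals |sin θ| and vanishes as θ approaches 0:

```python
import numpy as np

w = np.array([1.0, 0.0])
norms = []
for theta in [np.pi / 2, np.pi / 4, 0.1, 0.01]:
    x = np.array([np.cos(theta), np.sin(theta)])    # unit vector at angle theta to w
    Rx = x - (w @ x) / (w @ w) * w                  # rejection of x from w
    norms.append(float(np.linalg.norm(Rx)))         # equals |sin(theta)|

print([round(n, 4) for n in norms])    # [1.0, 0.7071, 0.0998, 0.01]
```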
TECHNICAL PROBLEM
The present invention provides a recognition method, apparatus, device, and storage medium to solve the technical problem that, in the prior art, the inner product of the weight vector w and the feedforward vector x is related only to the projection vector Px, which prevents the direction of the weight vector w from being updated.
TECHNICAL SOLUTION
In a first aspect, the present invention provides a recognition method, including:
acquiring an object to be recognized;
training the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network includes an input layer, an intermediate layer, and an output layer, and the inner product operation of the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector; and
recognizing the object to be recognized according to the feature vector.
Optionally, the inner product operation of the two vectors is specifically:
PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ]
where w and x denote the weight vector and the feedforward vector respectively, θ is the angle between the vector w and the vector x, ‖·‖₂ denotes the modulus of a vector, and ⟨*⟩ means that * is separated from the neural network model.
Optionally, the inner product operation of the two vectors is specifically:
PR(w, x) = ⟨|sinθ|⟩·wᵀx + ⟨cosθ⟩·( ‖w‖₂‖x‖₂ − √(‖w‖₂²‖x‖₂² − (wᵀx)²) )
where *ᵀ denotes the transpose of the vector *.
Optionally, the object to be recognized is a picture, and the neural network is used to process the picture so as to obtain a recognition result of the picture.
Optionally, the feature vector is pixel feature information of the picture.
Optionally, the object to be recognized is speech, and the neural network is used to process the speech so as to obtain a recognition result of the speech.
Optionally, the feature vector is word feature information of the speech.
In a second aspect, the present invention provides a recognition apparatus, including:
an acquisition module, configured to acquire an object to be recognized;
a training module, configured to train the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network includes an input layer, an intermediate layer, and an output layer, and the inner product operation of the two vectors in the intermediate layer is related to the projection of one vector in the direction perpendicular to the other vector; and
a recognition module, configured to recognize the object to be recognized according to the feature vector.
Optionally, the training module specifically uses:
PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ]
where w and x denote the weight vector and the feedforward vector respectively, θ is the angle between the vector w and the vector x, ‖·‖₂ denotes the modulus of a vector, and ⟨*⟩ means that * is separated from the neural network model.
Optionally, the training module specifically uses:
PR(w, x) = ⟨|sinθ|⟩·wᵀx + ⟨cosθ⟩·( ‖w‖₂‖x‖₂ − √(‖w‖₂²‖x‖₂² − (wᵀx)²) )
where *ᵀ denotes the transpose of the vector *.
Optionally, the object to be recognized is a picture.
Optionally, the feature vector is pixel feature information of the picture.
Optionally, the object to be recognized is a sentence.
Optionally, the feature vector is word feature information of the sentence.
In a third aspect, the present invention provides an electronic device, including: at least one processor and a memory;
wherein the memory stores computer-executable instructions; and
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the recognition method described in the first aspect and its optional solutions.
In a fourth aspect, the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the recognition method described in the first aspect and its optional solutions.
BENEFICIAL EFFECTS
The present invention provides a recognition method, apparatus, device, and storage medium. In the recognition method, a neural network is used to train the object to be recognized so as to output a feature vector, wherein the inner product operation of the weight vector and the feedforward vector in the intermediate layer of the neural network is related to the projection of the feedforward vector in the direction perpendicular to the weight vector. This makes the modulus of the local direction gradient of the weight vector w independent of the included angle θ: whatever the value of the angle, the modulus of the local direction gradient of w is the modulus ‖x‖₂ of the feedforward vector x. Since ‖x‖₂ is generally larger than ‖Rx‖₂, the two being equal only when the angle is π/2 or 3π/2 (which is almost impossible), nothing hinders the local direction update of the weight vector w in the recognition algorithm provided by the present invention, in contrast to the traditional vector inner product; this improves the performance of the neural network and makes recognition more accurate.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is an orthogonal decomposition diagram of the local gradient of the weight vector w;
Fig. 2 is a schematic flowchart of a recognition method according to an exemplary embodiment of the present invention;
Fig. 3 is an orthogonal decomposition diagram of the local gradient of the weight vector w as proposed by the present invention;
Fig. 4 is a schematic diagram of a recognition apparatus according to an exemplary embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
EMBODIMENTS OF THE INVENTION
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In Euclidean space, the existing vector inner product has another geometric definition:
P(w, x) = wᵀx = ‖w‖₂‖x‖₂cosθ   (2)
that is, the product of the moduli of the two vectors and the cosine of the angle between them.
Since the modulus of the projection vector Px of the vector x on w is:
‖Px‖₂ = ‖x‖₂|cosθ|   (3)
formula (2) can be written as:
P(w, x) = ‖w‖₂‖Px‖₂sign(cosθ)   (4)
where sign(*) denotes the sign of *. From formula (4) it can be seen that the traditional vector inner product contains only the information of the projection vector Px of x on w, and none of the information of the rejection vector Rx of x from w. Therefore, in Euclidean space, the vector inner product is also called the projection product.
The local gradient of the vector inner product with respect to the weight vector w is:
∂P/∂w = x = Px + Rx   (5)
where Px, parallel to w, is its modulus gradient, and Rx, perpendicular to w, is its direction gradient. The direction gradient Rx changes as the included angle θ changes, which causes certain difficulties for optimization.
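Formula (5) can be verified with central finite differences. The following sketch (illustrative, with arbitrarily chosen vectors, not part of the patent) confirms that the local gradient of P = wᵀx with respect to w is x itself, and that x splits into the modulus gradient Px and the direction gradient Rx:

```python
import numpy as np

w = np.array([3.0, 4.0])
x = np.array([2.0, 1.0])
eps = 1e-6

# Central finite-difference estimate of d(w^T x)/dw, one coordinate at a time.
grad = np.array([((w + eps * e) @ x - (w - eps * e) @ x) / (2 * eps)
                 for e in np.eye(len(w))])

Px = (w @ x) / (w @ w) * w    # modulus gradient (parallel to w)
Rx = x - Px                   # direction gradient (perpendicular to w)

print(np.allclose(grad, x))       # True: dP/dw = x, formula (5)
print(np.allclose(Px + Rx, x))    # True: x = Px + Rx
```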
The present invention provides a recognition method, apparatus, device, and storage medium to solve the technical problem that, in the prior art, the inner product of the weight vector w and the feedforward vector x is related only to the projection vector Px, which prevents the direction of the weight vector w from being updated.
Fig. 2 is a schematic flowchart of a recognition method according to an exemplary embodiment of the present invention. As shown in Fig. 2, this embodiment provides a recognition method, including the following steps.
S101: acquire an object to be recognized.
More specifically, in this embodiment, the recognition method can be applied to artificial intelligence fields such as computer vision, natural language processing, and recommendation systems.
The field of computer vision includes image recognition, video classification, object detection, object tracking, visual saliency analysis, image and video captioning, face recognition, visual question answering, behavior understanding, abnormal behavior detection, and related technical fields, with applications in video surveillance, robotics, intelligent driving, drones, and so on. In the field of computer vision, the object to be recognized is a picture; picture information can be collected through a camera, or by other existing techniques, which will not be repeated here.
The field of natural language processing includes machine translation, speech recognition, part-of-speech tagging, natural language generation, text classification, information retrieval and extraction, question answering systems, automatic summarization, and so on. In this field, the object to be recognized is sentence information; the user can input sentence information through an input interface so as to collect the sentence information to be recognized, or other existing techniques can be used to collect it, which will not be repeated here.
The object to be recognized may be a picture, in which case this recognition method performs picture recognition and is applied to the field of computer vision. The object to be recognized may also be speech, in which case the method performs speech recognition and is applied to the field of natural language processing.
S102: train the object to be recognized by using a neural network so as to output a feature vector.
More specifically, the neural network includes an input layer, an intermediate layer, and an output layer; the inner product operation of the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector.
The inner product operation of the weight vector and the feedforward vector is specifically:
PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ] = ‖w‖₂‖x‖₂[ ⟨|sinθ|⟩cosθ + ⟨cosθ⟩(1 − |sinθ|) ]   (6)
From Fig. 1 and the properties of right triangles, the following formulas are easily obtained:
‖Px‖₂ = ‖x‖₂|cosθ|
‖Rx‖₂ = ‖x‖₂|sinθ|
In formula (6), ⟨*⟩ means that * is separated from the neural network model; here, separation means that when the gradient is computed, * is regarded as a constant and no derivative is taken with respect to it.
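Formula (6) can be sketched in NumPy. NumPy has no automatic differentiation, so the separated factors ⟨|sin θ|⟩ and ⟨cos θ⟩ are simply ordinary numbers here, which matches how separation behaves in the forward pass; the sketch (an illustration under assumed example values, not from the patent) also confirms that the forward value of the PR product equals the plain inner product wᵀx:

```python
import numpy as np

def pr_product_forward(w, x):
    """Forward value of formula (6); sin_t and cos_t stand for the separated factors."""
    nw, nx = np.linalg.norm(w), np.linalg.norm(x)
    cos_t = (w @ x) / (nw * nx)
    sin_t = np.sqrt(max(0.0, 1.0 - cos_t ** 2))   # |sin θ|
    px_len = nx * abs(cos_t)                      # ||Px||2 = ||x||2 |cos θ|
    rx_len = nx * sin_t                           # ||Rx||2 = ||x||2 |sin θ|
    return nw * (sin_t * px_len * np.sign(cos_t) + cos_t * (nx - rx_len))

w = np.array([3.0, 4.0])
x = np.array([2.0, 1.0])
print(np.isclose(pr_product_forward(w, x), w @ x))   # True
```

The difference from the plain inner product appears only in the backward pass, where the separated factors are held constant.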
It can be seen that the vector multiplication proposed by the present invention uses not only the information of the projection vector Px of the vector x on the vector w but also the information of the rejection vector Rx of x from w. It is therefore called the Projection and Rejection Product (PR Product for short).
In forward propagation of the neural network, formula (6) is identical to formula (2), so the local gradient of the PR product with respect to the weight vector w need not be derived there. In backpropagation, the local gradient of the PR product with respect to the weight vector w is derived as:
∂PR/∂w = Px + ‖x‖₂·Erx   (7)
where Mw denotes the projection matrix of the weight vector w (basic knowledge in matrix theory, with the property Mw·x = Px), and Erx is the unit vector of the vector Rx. Px, parallel to w, is the modulus gradient of w, the same as for the traditional vector inner product; ‖x‖₂·Erx, perpendicular to w, is the direction gradient of w.
Fig. 3 is an orthogonal decomposition diagram of the local gradient of the weight vector w proposed by the present invention. As shown in Fig. 3, ‖*‖₂ denotes the modulus of the vector *, and Erx denotes the unit vector along the vector Rx (the vector with the same direction as Rx and modulus 1). This direction gradient does not change as the included angle θ changes. Compared with the direction gradient of the vector inner product with respect to w in the prior art, the two have the same direction, but the modulus of the PR product's direction gradient with respect to w is always at least as large as that of the prior-art vector inner product, and is constantly equal to the modulus ‖x‖₂ of the feedforward vector x, so the update of the direction of w is not hindered.
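The constant modulus of the direction gradient in formula (7) can be checked numerically. In this sketch (illustrative values only), the component of Px + ‖x‖₂·Erx perpendicular to w always has modulus ‖x‖₂, whatever the angle θ, whereas the plain inner product's direction gradient Rx has modulus ‖x‖₂|sin θ|:

```python
import numpy as np

x_norm = 2.0
dir_norms = []
for theta in [np.pi / 3, 0.3, 0.01]:
    w = np.array([5.0, 0.0])
    x = x_norm * np.array([np.cos(theta), np.sin(theta)])
    Px = (w @ x) / (w @ w) * w
    Rx = x - Px
    Erx = Rx / np.linalg.norm(Rx)      # unit vector along Rx
    grad_pr = Px + x_norm * Erx        # local gradient of the PR product, formula (7)
    # Component of the gradient perpendicular to w (the direction gradient).
    dir_part = grad_pr - (grad_pr @ w) / (w @ w) * w
    dir_norms.append(float(np.linalg.norm(dir_part)))

print([round(n, 6) for n in dir_norms])    # [2.0, 2.0, 2.0]: always ||x||2
```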
Since θ cannot be obtained directly in a neural network, the PR product of two vectors cannot be computed directly from formula (6). From formula (2):
cosθ = wᵀx / (‖w‖₂‖x‖₂)   (8)
and from the Pythagorean theorem:
|sinθ| = √(1 − cos²θ)   (9)
Substituting formulas (8) and (9) into formula (6) yields the implementation formula of the PR product:
PR(w, x) = ⟨|sinθ|⟩·wᵀx + ⟨cosθ⟩·( ‖w‖₂‖x‖₂ − √(‖w‖₂²‖x‖₂² − (wᵀx)²) )   (10)
Likewise, ⟨*⟩ means that * is separated from the neural network model, that is, * is regarded as a constant when the gradient is computed in backpropagation.
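Formulas (8) to (10) can be put together in a short sketch. In a framework with automatic differentiation, the factors `sin_t` and `cos_t` below would be detached (treated as constants during backpropagation); plain NumPy has no autograd, so only the forward value is computed, and the sketch (illustrative, with random vectors) confirms that it coincides with wᵀx:

```python
import numpy as np

def pr_product(w, x):
    nwnx = np.linalg.norm(w) * np.linalg.norm(x)
    wx = w @ x
    cos_t = wx / nwnx                              # formula (8)
    sin_t = np.sqrt(max(0.0, 1.0 - cos_t ** 2))    # formula (9)
    # Formula (10); sin_t and cos_t play the roles of the separated factors.
    return sin_t * wx + cos_t * (nwnx - np.sqrt(max(0.0, nwnx ** 2 - wx ** 2)))

rng = np.random.default_rng(0)
for _ in range(5):
    w, x = rng.normal(size=8), rng.normal(size=8)
    print(np.isclose(pr_product(w, x), w @ x))    # True: the forward pass is unchanged
```

The `max(0.0, ...)` guards are a hedge against tiny negative arguments from floating-point rounding; they are not part of the patent's formula.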
Any vector multiplication carried out according to formula (10), regardless of the order in which the components of formula (10) are computed, falls within the protection scope of this patent. The PR product proposed by the present invention can be used in any type of neural network, such as feedforward neural networks, convolutional neural networks, and recurrent neural networks.
When the object to be recognized is a picture, the pixel information of the picture is input into the above neural network, and after processing by the neural network, a feature vector is output. The feature vector contains pixel information, and the recognition result of the picture can be obtained from the feature vector.
When the object to be recognized is speech, the word information of the speech is input into the above neural network, and after processing by the neural network, a feature vector is output. The feature vector contains word information, and the speech recognition result can be obtained from the feature vector.
S103: recognize the object to be recognized according to the feature vector.
More specifically, when the object to be recognized is a picture, the feature vector is pixel feature information of the picture, and the object is recognized according to the pixel feature information. When the object to be recognized is a sentence, the feature vector is word feature information of the sentence, and the object is recognized according to the word feature information.
In the recognition method provided by this embodiment, the PR product is used for the two-vector operation. Its advantage in principle is that the modulus of the local direction gradient of w is independent of the included angle and always equals the modulus ‖x‖₂ of the feedforward vector x, while ‖x‖₂ is generally larger than ‖Rx‖₂, the two being equal only when the angle is π/2 or 3π/2 (which is almost impossible). Therefore, compared with the traditional vector inner product, this algorithm does not hinder the update of the direction of the weight vector w.
Its advantage in application: using the PR product proposed by the present invention in feedforward, convolutional, and recurrent neural networks, experiments on multiple tasks and multiple datasets show that, compared with the traditional vector inner product, the PR product robustly improves the performance of neural network models.
Fig. 4 is a schematic diagram of a recognition apparatus according to an exemplary embodiment of the present invention. As shown in Fig. 4, this embodiment provides a recognition apparatus, including:
an acquisition module 201, configured to acquire an object to be recognized;
a training module 202, configured to train the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network includes an input layer, an intermediate layer, and an output layer, and the inner product operation of the two vectors in the intermediate layer is related to the projection of one vector in the direction perpendicular to the other vector; and
a recognition module 203, configured to recognize the object to be recognized according to the feature vector.
Optionally, the training module 202 specifically uses:
PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ]
where w and x denote the weight vector and the feedforward vector respectively, θ is the angle between the vector w and the vector x, ‖·‖₂ denotes the modulus of a vector, and ⟨*⟩ means that * is separated from the neural network model.
Optionally, the training module 202 specifically uses:
PR(w, x) = ⟨|sinθ|⟩·wᵀx + ⟨cosθ⟩·( ‖w‖₂‖x‖₂ − √(‖w‖₂²‖x‖₂² − (wᵀx)²) )
where *ᵀ denotes the transpose of the vector *.
Optionally, the object to be recognized is a picture.
Optionally, the feature vector is pixel feature information of the picture.
Optionally, the object to be recognized is a sentence.
Optionally, the feature vector is word feature information of the sentence.
Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention. As shown in Fig. 5, the electronic device 300 of this embodiment includes: a processor 301 and a memory 302.
The memory 302 is configured to store computer-executable instructions;
the processor 301 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed in the foregoing embodiments. For details, refer to the related description in the foregoing method embodiments.
Optionally, the memory 302 may be independent or integrated with the processor 301. When the memory 302 is provided independently, the electronic device 300 further includes a bus 303 for connecting the memory 302 and the processor 301.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the above recognition method.
Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A recognition method, characterized by comprising:
    acquiring an object to be recognized;
    training the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network comprises an input layer, an intermediate layer, and an output layer, and the inner product operation of the weight vector and the feedforward vector in the intermediate layer is related to the projection of the feedforward vector in the direction perpendicular to the weight vector; and
    recognizing the object to be recognized according to the feature vector.
  2. The method according to claim 1, characterized in that the inner product operation of the weight vector and the feedforward vector is specifically:
    PR(w, x) = ‖w‖₂[ ⟨|sinθ|⟩·‖Px‖₂·sign(cosθ) + ⟨cosθ⟩·(‖x‖₂ − ‖Rx‖₂) ]
    wherein w and x denote the weight vector and the feedforward vector respectively, θ is the angle between the vector w and the vector x, ‖·‖₂ denotes the modulus of a vector, and ⟨*⟩ means that * is separated from the neural network model.
  3. The method according to claim 2, characterized in that the inner product operation of the two vectors is specifically:
    PR(w, x) = ⟨|sinθ|⟩·wᵀx + ⟨cosθ⟩·( ‖w‖₂‖x‖₂ − √(‖w‖₂²‖x‖₂² − (wᵀx)²) )
    wherein *ᵀ denotes the transpose of the vector *.
  4. The method according to any one of claims 1 to 3, characterized in that the object to be recognized is a picture, and the neural network is used to process the picture so as to obtain a recognition result of the picture.
  5. The method according to claim 4, characterized in that the feature vector is pixel feature information of the picture.
  6. The method according to any one of claims 1 to 3, characterized in that the object to be recognized is speech, and the neural network is used to process the speech so as to obtain a recognition result of the speech.
  7. The method according to claim 6, characterized in that the feature vector is word feature information of the speech.
  8. A recognition apparatus, characterized by comprising:
    an acquisition module, configured to acquire an object to be recognized;
    a training module, configured to train the object to be recognized by using a neural network so as to output a feature vector, wherein the neural network comprises an input layer, an intermediate layer, and an output layer, and the inner product operation of the two vectors in the intermediate layer is related to the projection of one vector in the direction perpendicular to the other vector; and
    a recognition module, configured to recognize the object to be recognized according to the feature vector.
  9. An electronic device, characterized by comprising: at least one processor and a memory;
    wherein the memory stores computer-executable instructions; and
    the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the recognition method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the recognition method according to any one of claims 1 to 7.
PCT/CN2019/088960 2019-05-29 2019-05-29 Recognition method, apparatus, device, and storage medium WO2020237519A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/088960 WO2020237519A1 (zh) 2019-05-29 2019-05-29 Recognition method, apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/088960 WO2020237519A1 (zh) 2019-05-29 2019-05-29 Recognition method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020237519A1 (zh)

Family

ID=73553011

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088960 WO2020237519A1 (zh) 2019-05-29 2019-05-29 Recognition method, apparatus, device, and storage medium

Country Status (1)

Country Link
WO (1) WO2020237519A1 (zh)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015180397A1 (zh) Data category recognition method and apparatus based on a deep neural network
CN105631899A (zh) Ultrasound-image moving-target tracking method based on gray-scale texture features
CN106778882A (zh) Automatic smart-contract classification method based on feedforward neural networks
CN109190496A (zh) Monocular static gesture recognition method based on multi-feature fusion


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115630613A (zh) Automatic coding system and method for evaluation questions in questionnaire surveys
CN115630613B (zh) Automatic coding system and method for evaluation questions in questionnaire surveys

Similar Documents

Publication Publication Date Title
EP3968179A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
CN109919209B (zh) 一种领域自适应深度学习方法及可读存储介质
CN112307940A (zh) Model training method, human posture detection method, apparatus, device, and medium
JP2022554068A (ja) Video content recognition method, apparatus, program, and computer device
WO2022218012A1 (zh) Feature extraction method, apparatus, device, storage medium, and program product
CN111382647B (zh) Picture processing method, apparatus, device, and storage medium
CN110349161A (zh) Image segmentation method, apparatus, electronic device, and storage medium
CN115331150A (zh) Image recognition method, apparatus, electronic device, and storage medium
WO2020237519A1 (zh) Recognition method, apparatus, device, and storage medium
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
Gu et al. A robust attention-enhanced network with transformer for visual tracking
Huang et al. Joint representation learning for text and 3D point cloud
KR20160128869A (ko) 사전 정보를 이용한 영상 물체 탐색 방법 및 이를 수행하는 장치
CN110263820A (zh) Recognition method, apparatus, device, and storage medium
WO2023102724A1 (zh) Image processing method and system
CN115457268A (zh) Segmentation method and apparatus based on a hybrid structure, and storage medium
WO2021223747A1 (zh) Video processing method, apparatus, electronic device, storage medium, and program product
JP2022068146A (ja) Data annotation method, apparatus, electronic device, storage medium, and computer program
Jambhulkar et al. Real-Time Object Detection and Audio Feedback for the Visually Impaired
Kwon et al. An introduction to face-recognition methods and its implementation in software applications
CN113722514A (zh) Deep-learning-based method for screening and extracting images from Internet education videos
Chae et al. Smart advisor: Real-time information provider with mobile augmented reality
CN111695526A (zh) Network model generation method, and pedestrian re-identification method and apparatus
Liu et al. Feature-driven motion model-based particle-filter tracking method with abrupt motion handling
Zhang et al. Multi-level Cross-attention Siamese Network For Visual Object Tracking.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19930401

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19930401

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.03.2022)
