WO2020147369A1 - Natural language processing method, training method and data processing device - Google Patents

Natural language processing method, training method and data processing device

Info

Publication number
WO2020147369A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
processing
granularity
feature
words
Prior art date
Application number
PCT/CN2019/114146
Other languages
English (en)
Chinese (zh)
Inventor
李梓超
蒋欣
刘群
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2020147369A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of natural language processing, in particular to a natural language processing method, training method and data processing equipment.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Natural language processing tasks can be divided by granularity, generally into character level, word level, phrase level, sentence level, discourse level, and so on; these granularities become progressively coarser.
  • part-of-speech tagging is a word-level task
  • named entity recognition is a phrase-level task
  • syntactic analysis is usually a sentence-level task.
  • Information at different granularities is not isolated; rather, it is passed between granularity levels.
  • For tasks such as sentence classification, sentence-to-sentence semantic matching, and sentence translation or rewriting, word-level and phrase-level features are usually considered; it is usually necessary to use information at multiple granularities and finally synthesize it.
  • The current mainstream natural language processing methods based on deep learning process natural language text through neural networks.
  • However, during processing, such neural networks mix together words of different granularities, so the probability of obtaining the correct processing result is low. Therefore, new solutions need to be studied.
  • the embodiments of the present application provide a natural language processing method, training method, and data processing device, which can avoid the process of obtaining coarser-grained information from finer-grained information, and can effectively improve the performance of processing natural language processing tasks.
  • the embodiments of the present application provide a natural language processing method, which includes: obtaining natural language text to be processed; processing the natural language text using a trained deep neural network, and outputting the target result obtained by processing the natural language text. The deep neural network includes: a granularity annotation network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network, and the processing includes: using the granularity annotation network to determine the granularity of each word in the natural language text; using the first feature network to perform feature extraction on the words of the first granularity in the natural language text, and outputting the obtained first feature information to the first processing network; using the second feature network to perform feature extraction on the words of the second granularity in the natural language text, and outputting the obtained second feature information to the second processing network; using the first processing network to process the first feature information, and outputting the obtained first processing result to the fusion network; using the second processing network to process the second feature information, and outputting the obtained second processing result to the fusion network; and using the fusion network to fuse the first processing result and the second processing result to obtain the target result; the first granularity and the second granularity are different.
  • the deep neural network may include N feature networks and N processing networks.
  • the N feature networks and the N processing networks have a one-to-one correspondence, and N is an integer greater than one.
  • a pair of corresponding feature network and processing network is used to process words of the same granularity. Since the data processing device separates words of different granularities for processing, the processing operations for words of each granularity do not depend on the processing results of words of other granularities; this avoids the process of obtaining coarser-grained information from finer-grained information and greatly reduces the probability that the data processing device will produce wrong results.
  • the data processing device uses a deep neural network to independently process words of different granularities, avoiding the process of obtaining coarser-grained information from finer-grained information, and can effectively improve the performance of processing natural language processing tasks.
  • the architectures of the first feature network and the second feature network are different, and/or the architectures of the first processing network and the second processing network are different.
  • Words with different granularities have different characteristics, and using networks with different architectures for different granularities allows each granularity to be handled in a more targeted way.
  • words of different granularities are processed through feature networks of different architectures or processing networks of different architectures, which further improves the performance of the data processing device in processing natural language processing tasks.
  • the input of the granular annotation network is the natural language text
  • using the granularity annotation network to determine the granularity of each word in the natural language text includes: using the granularity annotation network to determine, according to N granularities, the granularity of each word in the natural language text to obtain the annotation information of the natural language text, and outputting the annotation information to the first feature network and the second feature network, where the annotation information is used to describe the granularity of each word or the probability that each word belongs to each of the N granularities; N is an integer greater than 1;
  • the using the first feature network to perform feature extraction on the words of the first granularity in the natural language text includes: using the first feature network to process the words of the first granularity to obtain the first feature information,
  • the first feature information is a vector or matrix representing words of the first granularity;
  • the using the second feature network to perform feature extraction on the words of the second granularity in the natural language text includes: using the second feature network to process the words of the second granularity to obtain the second feature information,
  • the second feature information is a vector or matrix representing words of the second granularity.
  • the granular annotation network can accurately determine the granularity of each word in the natural language text, so that each feature network can process words with a specific granularity.
  • the granularity annotation network includes a long short-term memory network (LSTM) and a bidirectional long short-term memory network (BiLSTM); using the granularity annotation network to determine the granularity of each word in the natural language text includes computing, for each word:
  • h_l = BiLSTM(x)_l; g_l = LSTM([h_l, z_{l-1}]; g_{l-1}); z_l = GS(W_g·g_l, τ);
  • in these formulas, BiLSTM() represents the processing operation of the BiLSTM, and LSTM() represents the processing operation of the LSTM; x represents the natural language text, and x_l represents the l-th word of the natural language text x; h represents the hidden state variable in the BiLSTM network, and h_{l-1}, h_l, and h_{l+1} in turn represent the hidden state variables when the BiLSTM network processes the (l-1)-th, l-th, and (l+1)-th words; g represents the hidden state variable in the LSTM network, and g_{l-1} and g_l respectively represent the hidden state variables when the LSTM network processes the (l-1)-th and l-th words of the natural language text; z represents the probability that a word belongs to the reference granularity, and z_{l-1} and z_l respectively represent the probabilities that the (l-1)-th and l-th words of the natural language text belong to the reference granularity; the reference granularity is any one of the N granularities; GS represents the Gumbel-Softmax function, τ is a hyperparameter (the temperature) in the Gumbel-Softmax function, and W_g is a parameter matrix in the granularity annotation network.
  • the granularity annotation network uses a multi-layer LSTM architecture to determine the granularity of each word in the natural language text, and can make full use of the granularities already determined for earlier words when determining the granularity of a new word (a word whose granularity is to be determined); it is simple to implement and has high processing efficiency.
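  • As an illustrative sketch only, and not the patent's implementation, the formulas above could be realized in PyTorch roughly as follows; all layer sizes, the temperature value, and names such as GranularityAnnotator are assumptions:

```python
# Hypothetical sketch of the granularity annotation network: a BiLSTM reads the
# sentence (h_l), a unidirectional LSTM cell consumes [h_l ; z_{l-1}] to produce
# g_l, and a Gumbel-Softmax over W_g g_l yields the per-word granularity z_l.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GranularityAnnotator(nn.Module):
    def __init__(self, emb_dim=64, hidden=64, n_granularities=2, tau=1.0):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.cell = nn.LSTMCell(2 * hidden + n_granularities, hidden)
        self.W_g = nn.Linear(hidden, n_granularities)  # parameter matrix W_g
        self.n, self.tau = n_granularities, tau        # tau: Gumbel-Softmax temperature

    def forward(self, x_emb):                          # x_emb: (B, L, emb_dim)
        h_seq, _ = self.bilstm(x_emb)                  # h_l for every word l
        B, L, _ = h_seq.shape
        g = x_emb.new_zeros(B, self.cell.hidden_size)  # g_0
        c = torch.zeros_like(g)
        z = x_emb.new_zeros(B, self.n)                 # z_0
        zs = []
        for l in range(L):
            # g_l = LSTM([h_l, z_{l-1}]; g_{l-1})
            g, c = self.cell(torch.cat([h_seq[:, l], z], dim=-1), (g, c))
            # z_l = GS(W_g g_l, tau)
            z = F.gumbel_softmax(self.W_g(g), tau=self.tau)
            zs.append(z)
        return torch.stack(zs, dim=1)                  # (B, L, N) annotation info Z_X
```

  • A call such as GranularityAnnotator()(torch.randn(1, 7, 64)) would return a 1×7×2 tensor of per-word granularity probabilities.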
  • using the first feature network to perform feature extraction on words of the first granularity in the natural language text includes computing: U_z = ENC_z(X, Z_X);
  • in this formula, ENC_z represents the first feature network, which is a Transformer model, and ENC_z() represents the processing operation performed by the first feature network; X represents the natural language text; Z_X = [z_1, z_2, ..., z_L] represents the annotation information, where z_1 to z_L sequentially represent the granularities of the first word to the L-th (last) word in the natural language text; and U_z represents the first feature information output by the first feature network.
  • the feature network can be used to accurately and quickly extract the feature information of the corresponding granular words.
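  • Again as a hedged sketch rather than the patent's actual ENC_z, one plausible reading is that the annotation probabilities gate the word embeddings so that each feature network attends mainly to words of its own granularity; the gating scheme, layer sizes, and the class name GranularFeatureNetwork are assumptions:

```python
# Hypothetical granularity-specific feature network computing U_z = ENC_z(X, Z_X):
# words whose annotation weight for granularity z is low are attenuated before a
# standard Transformer encoder extracts the feature information U_z.
import torch
import torch.nn as nn

class GranularFeatureNetwork(nn.Module):
    def __init__(self, d_model=64, nhead=4, layers=2, z_index=0):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.z_index = z_index  # the granularity z this ENC_z is responsible for

    def forward(self, X_emb, Z_X):
        # X_emb: (B, L, d_model) embeddings of X; Z_X: (B, L, N) annotation info
        gate = Z_X[..., self.z_index].unsqueeze(-1)  # per-word weight of granularity z
        return self.encoder(X_emb * gate)            # U_z: (B, L, d_model)
```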
  • the first processing result is a sequence containing one or more words,
  • and processing the first feature information using the first processing network includes: using the first processing network to process the input first feature information together with the words that the first processing network has already output in the course of processing the first feature information, to obtain the first processing result.
  • the first processing network adopts a recursive manner to process the feature information output by the corresponding feature network, which can make full use of the relevance of each word in the natural language text, thereby improving the efficiency and accuracy of processing.
  • the target result output by the fusion network is a sequence containing one or more words, and using the fusion network to fuse the first processing result and the second processing result to obtain the target result includes: using the fusion network to process the first processing result, the second processing result, and the words that the fusion network has already output in the course of processing the first processing result and the second processing result, to determine the target word to be output, and outputting the target word.
  • the fusion network uses a recursive method to process the processing results input to it by each processing network, which can make full use of the relevance of each word in the natural language text, thereby improving the efficiency and accuracy of its processing.
  • the fusion network includes at least one LSTM network, and using the fusion network to process the first processing result, the second processing result, and the sequence that the fusion network has already output in the course of processing the first processing result and the second processing result to determine the target word to be output includes: using the LSTM network to calculate the probability of a word of the reference granularity currently to be output with the following formulas:
  • h_t = LSTM(h_{t-1}, y_{t-1}, v_0, v_1); p(z_t = z | y_{1:t-1}, X) = GS(W_z·h_t, τ);
  • in these formulas, h_t represents the hidden state variable in the LSTM network when the LSTM network processes the t-th word, h_{t-1} represents the hidden state variable in the LSTM network when the LSTM network processes the (t-1)-th word, and LSTM() represents the processing operation done by the LSTM; the LSTM network has currently output (t-1) words, and y_{t-1} represents the (t-1)-th word output by the fusion network; v_0 represents the first processing result, and v_1 represents the second processing result; W_z is a parameter matrix in the fusion network, and τ is a hyperparameter; p(z_t = z | y_{1:t-1}, X) is the probability that the word currently to be output is of the reference granularity (granularity z); t is an integer greater than 1.
  • p(y_t | z_t = z, y_{1:t-1}, X) represents the probability of outputting the target word y_t at the reference granularity, and p(y_t | y_{1:t-1}, X) = Σ_z p(z_t = z | y_{1:t-1}, X)·p(y_t | z_t = z, y_{1:t-1}, X) represents the probability of outputting the target word.
  • p(y_t | z_t = z, y_{1:t-1}, X) can be given by the processing network: the processing network of granularity z can input to the fusion network the probability of each word among the words (the words of granularity z) currently to be output.
  • the fusion network can then calculate the probability of each word being output among the words currently to be output, and output the word with the highest probability of being output (the target word).
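  • A minimal sketch of this fusion step under the formulas above; the use of an LSTMCell, a plain softmax gate, and names such as FusionNetwork are assumptions rather than the patent's exact design:

```python
# Hypothetical fusion network: at step t an LSTM consumes the previous output
# word and the per-granularity processing results (v_0, v_1, ...), a gate gives
# p(z_t = z | y_{1:t-1}, X), and the word distributions from the processing
# networks are mixed: p(y_t | .) = sum_z p(z | .) * p(y_t | z, .).
import torch
import torch.nn as nn

class FusionNetwork(nn.Module):
    def __init__(self, hidden=64, n_granularities=2):
        super().__init__()
        # input at step t: [embedding of y_{t-1} ; v_0 ; v_1], all hidden-dim here
        self.cell = nn.LSTMCell(hidden * (1 + n_granularities), hidden)
        self.W_z = nn.Linear(hidden, n_granularities)  # parameter matrix W_z

    def step(self, y_prev, vs, state):
        # y_prev: (B, hidden); vs: list of N processing results, each (B, hidden)
        h, c = self.cell(torch.cat([y_prev] + vs, dim=-1), state)
        gate = torch.softmax(self.W_z(h), dim=-1)      # p(z_t = z | y_{1:t-1}, X)
        return gate, (h, c)

def fuse(gate, word_probs):
    # word_probs: (B, N, vocab) holding p(y_t | z_t = z, ...) per processing net;
    # returns p(y_t | y_{1:t-1}, X); the argmax word is the target word output.
    return torch.einsum('bn,bnv->bv', gate, word_probs)
```

  • Here state is the pair (h_{t-1}, c_{t-1}), initialized to zeros for the first step.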
  • the embodiments of the present application provide a training method, which includes: inputting training samples into a deep neural network for processing to obtain a predicted processing result; the deep neural network includes: a granularity annotation network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network.
  • the processing includes: using the granularity annotation network to determine the granularity of each word in the training sample; using the first feature network to perform feature extraction on the words of the first granularity in the training sample, and outputting the obtained third feature information to the first processing network; using the second feature network to perform feature extraction on the words of the second granularity in the training sample, and outputting the obtained fourth feature information to the second processing network; using the first processing network to perform target processing on the third feature information, and outputting the obtained third processing result to the fusion network; using the second processing network to perform the target processing on the fourth feature information, and outputting the obtained fourth processing result to the fusion network; and using the fusion network to fuse the third processing result and the fourth processing result to obtain the predicted processing result, where the first granularity and the second granularity are different. The method further includes: determining, according to the predicted processing result and a standard result, the loss corresponding to the training sample, where the standard result is the processing result expected to be obtained by using the deep neural network to process the training sample; and using the loss corresponding to the training sample to update the parameters of the deep neural network through an optimization algorithm.
  • the data processing device trains a deep neural network that can independently process words of different granularities, so as to obtain a deep neural network that can avoid the process of obtaining coarser-grained information from finer-grained information, and is simple to implement.
  • the architectures of the first feature network and the second feature network are different, and/or the architectures of the first processing network and the second processing network are different.
  • the input of the granular annotation network is the natural language text
  • using the granularity annotation network to determine the granularity of each word in the natural language text includes: using the granularity annotation network to determine, according to N granularities, the granularity of each word in the natural language text to obtain the annotation information of the natural language text, and outputting the annotation information to the first feature network and the second feature network, where the annotation information is used to describe the granularity of each word or the probability that each word belongs to each of the N granularities; N is an integer greater than 1;
  • the using the first feature network to perform feature extraction on the words of the first granularity in the natural language text includes: using the first feature network to process the words of the first granularity to obtain the third feature information,
  • the third feature information is a vector or matrix representing words of the first granularity;
  • the using the second feature network to perform feature extraction on the words of the second granularity in the natural language text includes: using the second feature network to process the words of the second granularity to obtain the fourth feature information,
  • the fourth feature information is a vector or matrix representing words of the second granularity.
  • the granularity annotation network includes a long short-term memory network (LSTM) and a bidirectional long short-term memory network (BiLSTM); using the granularity annotation network to determine the granularity of each word in the natural language text includes computing, for each word:
  • h_l = BiLSTM(x)_l; g_l = LSTM([h_l, z_{l-1}]; g_{l-1}); z_l = GS(W_g·g_l, τ);
  • in these formulas, BiLSTM() represents the processing operation of the BiLSTM, and LSTM() represents the processing operation of the LSTM; x represents the natural language text, and x_l represents the l-th word of the natural language text x; h represents the hidden state variable in the BiLSTM network, and h_{l-1}, h_l, and h_{l+1} in turn represent the hidden state variables when the BiLSTM network processes the (l-1)-th, l-th, and (l+1)-th words; g represents the hidden state variable in the LSTM network, and g_{l-1} and g_l respectively represent the hidden state variables when the LSTM network processes the (l-1)-th and l-th words of the natural language text; z represents the probability that a word belongs to the reference granularity, and z_{l-1} and z_l respectively represent the probabilities that the (l-1)-th and l-th words of the natural language text belong to the reference granularity; the reference granularity is any one of the N granularities; GS represents the Gumbel-Softmax function, τ is a hyperparameter (the temperature) in the Gumbel-Softmax function, and W_g is a parameter matrix in the granularity annotation network.
  • the granularity annotation network uses a multi-layer LSTM architecture to determine the granularity of each word in the natural language text, and can make full use of the granularities already determined for earlier words when determining the granularity of a new word (a word whose granularity is to be determined); it is simple to implement and has high processing efficiency.
  • using the first feature network to perform feature extraction on words of the first granularity in the natural language text includes computing: U_z = ENC_z(X, Z_X);
  • in this formula, ENC_z represents the first feature network, which is a Transformer model, and ENC_z() represents the processing operation performed by the first feature network; X represents the natural language text; Z_X = [z_1, z_2, ..., z_L] represents the annotation information, where z_1 to z_L sequentially represent the granularities of the first word to the L-th (last) word in the natural language text; and U_z represents the third feature information output by the first feature network.
  • the third processing result is a sequence containing one or more words,
  • and processing the third feature information using the first processing network includes: using the first processing network to process the input third feature information together with the words that the first processing network has already output in the course of processing the third feature information, to obtain the third processing result.
  • the target result output by the fusion network is a sequence containing one or more words, and using the fusion network to fuse the third processing result and the fourth processing result to obtain the target result includes: using the fusion network to process the third processing result, the fourth processing result, and the words that the fusion network has already output in the course of processing the third processing result and the fourth processing result, to determine the target word to be output, and outputting the target word.
  • the fusion network includes at least one LSTM network, and using the fusion network to process the third processing result, the fourth processing result, and the sequence that the fusion network has already output in the course of processing the third processing result and the fourth processing result to determine the target word to be output includes: using the LSTM network to calculate the probability of a word of the reference granularity currently to be output with the following formulas:
  • h_t = LSTM(h_{t-1}, y_{t-1}, v_2, v_3); p(z_t = z | y_{1:t-1}, X) = GS(W_z·h_t, τ);
  • in these formulas, h_t represents the hidden state variable in the LSTM network when the LSTM network processes the t-th word, h_{t-1} represents the hidden state variable in the LSTM network when the LSTM network processes the (t-1)-th word, and LSTM() represents the processing operation done by the LSTM; the LSTM network has currently output (t-1) words, and y_{t-1} represents the (t-1)-th word output by the fusion network; v_2 represents the third processing result, and v_3 represents the fourth processing result; W_z is a parameter matrix in the fusion network, and τ is a hyperparameter; p(z_t = z | y_{1:t-1}, X) is the probability that the word currently to be output is of the reference granularity (granularity z); t is an integer greater than 1.
  • p(y_t | z_t = z, y_{1:t-1}, X) represents the probability of outputting the target word y_t at the reference granularity, and p(y_t | y_{1:t-1}, X) = Σ_z p(z_t = z | y_{1:t-1}, X)·p(y_t | z_t = z, y_{1:t-1}, X) represents the probability of outputting the target word.
  • using the loss corresponding to the training sample to update the parameters of the deep neural network through an optimization algorithm includes: updating the parameters of at least one network included in the deep neural network by using the gradient value of the loss function relative to that network, where the loss function is used to calculate the loss between the predicted processing result and the standard result; during the update of any one of the first feature network, the second feature network, the first processing network, and the second processing network, the parameters of the other three networks remain unchanged.
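  • As a hedged illustration of this decoupled update rule (not the patent's training code), the sketch below freezes every sub-network except the one being updated before computing the forward pass; model, batch_x, batch_y, and loss_fn are hypothetical names:

```python
# Hypothetical single update step: only `target_net` (e.g. the first feature
# network) receives gradients; the other sub-networks' parameters stay constant.
import torch

def update_one_subnetwork(model, target_net, batch_x, batch_y, loss_fn, lr=1e-3):
    for p in model.parameters():          # freeze everything first...
        p.requires_grad_(False)
    for p in target_net.parameters():     # ...then unfreeze the chosen sub-network
        p.requires_grad_(True)
    opt = torch.optim.Adam(target_net.parameters(), lr=lr)
    pred = model(batch_x)                 # predicted processing result
    loss = loss_fn(pred, batch_y)         # loss vs. the standard result
    opt.zero_grad()
    loss.backward()                       # gradients flow only into target_net
    opt.step()
```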
  • the embodiments of the application provide a data processing device.
  • the data processing device includes: an acquisition unit for obtaining natural language text to be processed; and a processing unit for processing the natural language text using a trained deep neural network and outputting the target result obtained by processing the natural language text. The deep neural network includes: a granularity annotation network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network, and the processing includes: using the granularity annotation network
  • to determine the granularity of each word in the natural language text; using the first feature network to perform feature extraction on the words of the first granularity in the natural language text, and outputting the obtained first feature information to the first processing network; using the second feature network to perform feature extraction on the words of the second granularity in the natural language text, and outputting the obtained second feature information to the second processing network; using the first processing network to process the first feature information, and outputting the obtained first processing result to the fusion network; using the second processing network to process the second feature information, and outputting the obtained second processing result to the fusion network; and using the fusion network to fuse the first processing result and the second processing result to obtain the target result; the first granularity and the second granularity are different.
  • the data processing device uses a deep neural network to independently process words of different granularities, avoiding the process of obtaining coarser-grained information from finer-grained information, and can effectively improve the performance of processing natural language processing tasks.
  • the architectures of the first feature network and the second feature network are different, and/or the architectures of the first processing network and the second processing network are different.
  • the input of the granularity annotation network is the natural language text; the processing unit is specifically configured to use the granularity annotation network to determine, according to N granularities, the granularity of each word in the natural language text to obtain the annotation information of the natural language text, and to output the annotation information to the first feature network and the second feature network, where the annotation information is used to describe the granularity of each word or the probability that each word belongs to each of the N granularities; N is an integer greater than 1;
  • the processing unit is specifically configured to use the first feature network to process the words of the first granularity to obtain the first feature information, where the first feature information is a vector or matrix representing the words of the first granularity;
  • the processing unit is specifically configured to use the second feature network to process the words of the second granularity to obtain the second feature information, where the second feature information is a vector or matrix representing the words of the second granularity.
  • the granularity annotation network includes a long short-term memory network (LSTM) and a bidirectional long short-term memory network (BiLSTM); the processing unit is specifically configured to use the granularity annotation network to determine the granularity of each word in the natural language text with the following formulas:
  • h_l = BiLSTM(x)_l; g_l = LSTM([h_l, z_{l-1}]; g_{l-1}); z_l = GS(W_g·g_l, τ);
  • in these formulas, BiLSTM() represents the processing operation of the BiLSTM, and LSTM() represents the processing operation of the LSTM; x represents the natural language text, and x_l represents the l-th word of the natural language text x; h represents the hidden state variable in the BiLSTM network, and h_{l-1}, h_l, and h_{l+1} in turn represent the hidden state variables when the BiLSTM network processes the (l-1)-th, l-th, and (l+1)-th words; g represents the hidden state variable in the LSTM network, and g_{l-1} and g_l respectively represent the hidden state variables when the LSTM network processes the (l-1)-th and l-th words of the natural language text; z represents the probability that a word belongs to the reference granularity, and z_{l-1} and z_l respectively represent the probabilities that the (l-1)-th and l-th words of the natural language text belong to the reference granularity; the reference granularity is any one of the N granularities; GS represents the Gumbel-Softmax function, τ is a hyperparameter (the temperature) in the Gumbel-Softmax function, and W_g is a parameter matrix in the granularity annotation network.
  • the processing unit is specifically configured to use the first feature network to perform feature extraction on words of the first granularity in the natural language text with the following formula: U_z = ENC_z(X, Z_X);
  • in this formula, ENC_z represents the first feature network, which is a Transformer model, and ENC_z() represents the processing operation performed by the first feature network; X represents the natural language text; Z_X = [z_1, z_2, ..., z_L] represents the annotation information, where z_1 to z_L sequentially represent the granularities of the first word to the L-th (last) word in the natural language text; and U_z represents the first feature information output by the first feature network.
  • the first processing result is a sequence containing one or more words; the processing unit is specifically configured to use the first processing network to process the input first feature information together with the words that the first processing network has already output in the course of processing the first feature information, to obtain the first processing result.
  • the target result output by the fusion network is a sequence containing one or more words; the processing unit is specifically configured to use the fusion network to process the first processing result, the second processing result, and the words that the fusion network has already output in the course of processing the first processing result and the second processing result, to determine the target word to be output, and to output the target word.
  • the fusion network includes at least one LSTM network;
  • the processing unit is specifically configured to input a vector obtained by combining the first processing result and the second processing result into the LSTM network;
  • the processing unit is specifically configured to use the LSTM network to calculate the probability of a word of the reference granularity currently to be output with the following formulas:
  • h_t = LSTM(h_{t-1}, y_{t-1}, v_0, v_1); p(z_t = z | y_{1:t-1}, X) = GS(W_z·h_t, τ);
  • in these formulas, h_t represents the hidden state variable in the LSTM network when the LSTM network processes the t-th word, h_{t-1} represents the hidden state variable in the LSTM network when the LSTM network processes the (t-1)-th word, and LSTM() represents the processing operation done by the LSTM; the LSTM network has currently output (t-1) words, and y_{t-1} represents the (t-1)-th word output by the fusion network; v_0 represents the first processing result, and v_1 represents the second processing result; W_z is a parameter matrix in the fusion network, and τ is a hyperparameter; p(z_t = z | y_{1:t-1}, X) is the probability that the word currently to be output is of the reference granularity (granularity z); t is an integer greater than 1.
  • the processing unit is specifically configured to use the fusion network to calculate the probability of the target word to be output by using the following formula:
  • p(y_t | y_{1:t-1}, X) = Σ_z p(z_t = z | y_{1:t-1}, X)·p(y_t | z_t = z, y_{1:t-1}, X), where p(y_t | z_t = z, y_{1:t-1}, X) represents the probability of outputting the target word y_t at the reference granularity, and p(y_t | y_{1:t-1}, X) represents the probability of outputting the target word.
  • the embodiments of the present application provide another data processing device.
  • the data processing device includes: a processing unit for inputting training samples into a deep neural network for processing to obtain a predicted processing result; the deep neural network includes: a granularity annotation network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network.
  • the processing includes: using the granularity annotation network to determine the granularity of each word in the training sample; using the first feature network to perform feature extraction on the words of the first granularity in the training sample, and outputting the obtained third feature information to the first processing network; using the second feature network to perform feature extraction on the words of the second granularity in the training sample, and outputting the obtained fourth feature information to the second processing network; using the first processing network to perform target processing on the third feature information, and outputting the obtained third
  • processing result to the fusion network; using the second processing network to perform the target processing on the fourth feature information, and outputting the obtained fourth processing result to the fusion network; and using the fusion network to fuse
  • the third processing result and the fourth processing result to obtain the predicted processing result; the first granularity and the second granularity are different. The processing unit is further configured to determine, according to the predicted processing result and a standard result, the loss corresponding to the training sample, where the standard result is the processing result expected to be obtained by using the deep neural network to process the training sample, and to use the loss corresponding to the training sample to update the parameters of the deep neural network through an optimization algorithm.
  • the data processing device trains a deep neural network that can independently process words of different granularities, so as to obtain a deep neural network that can avoid the process of obtaining coarser-grained information from finer-grained information, and is simple to implement.
  • the first feature network and the second feature network have different architectures, and/or the first processing network and the second processing network have different architectures.
  • the input of the granularity annotation network is the natural language text; the processing unit is specifically configured to use the granularity annotation network to determine, according to N granularities, the granularity of each word in the natural language text to obtain the annotation information of the natural language text, and to output the annotation information to the first feature network and the second feature network, where the annotation information is used to describe the granularity of each word or the probability that each word belongs to each of the N granularities; N is an integer greater than 1;
  • the processing unit is specifically configured to use the first feature network to process the words of the first granularity to obtain the third feature information, where the third feature information is a vector or matrix representing the words of the first granularity;
  • the processing unit is specifically configured to use the second feature network to process the words of the second granularity to obtain the fourth feature information, where the fourth feature information is a vector or matrix representing the words of the second granularity.
  • the granularity annotation network includes a long short-term memory network (LSTM) and a bidirectional long short-term memory network (BiLSTM); the processing unit is specifically configured to use the granularity annotation network to determine the granularity of each word in the natural language text with the following formulas:
  • h_l = BiLSTM(x)_l; g_l = LSTM([h_l, z_{l-1}]; g_{l-1}); z_l = GS(W_g·g_l, τ);
  • in these formulas, BiLSTM() represents the processing operation of the BiLSTM, and LSTM() represents the processing operation of the LSTM; x represents the natural language text, and x_l represents the l-th word of the natural language text x; h represents the hidden state variable in the BiLSTM network, and h_{l-1}, h_l, and h_{l+1} in turn represent the hidden state variables when the BiLSTM network processes the (l-1)-th, l-th, and (l+1)-th words; g represents the hidden state variable in the LSTM network, and g_{l-1} and g_l respectively represent the hidden state variables when the LSTM network processes the (l-1)-th and l-th words of the natural language text; z represents the probability that a word belongs to the reference granularity, and z_{l-1} and z_l respectively represent the probabilities that the (l-1)-th and l-th words of the natural language text belong to the reference granularity; the reference granularity is any one of the N granularities; GS represents the Gumbel-Softmax function, τ is a hyperparameter (the temperature) in the Gumbel-Softmax function, and W_g is a parameter matrix in the granularity annotation network.
  • the processing unit is specifically configured to use the first feature network to perform feature extraction on words of the first granularity in the natural language text with the following formula: U_z = ENC_z(X, Z_X);
  • in this formula, ENC_z represents the first feature network, which is a Transformer model, and ENC_z() represents the processing operation performed by the first feature network; X represents the natural language text; Z_X = [z_1, z_2, ..., z_L] represents the annotation information, where z_1 to z_L sequentially represent the granularities of the first word to the L-th (last) word in the natural language text; and U_z represents the third feature information output by the first feature network.
  • the third processing result is a sequence containing one or more words; the processing unit is specifically configured to use the first processing network to process the input third feature information together with the words that the first processing network has already output in the course of processing the third feature information, to obtain the third processing result.
  • the target result output by the fusion network is a sequence containing one or more words; the processing unit is specifically configured to use the fusion network to process the third processing result, the fourth processing result, and the words that the fusion network has already output in the course of processing the third processing result and the fourth processing result, to determine the target word to be output, and to output the target word.
  • the fusion network includes at least one LSTM network; the processing unit is specifically configured to input a vector obtained by combining the third processing result and the fourth processing result into the LSTM network;
  • the LSTM network uses the following formulas to calculate the probability of a word of the reference granularity currently to be output:
  • h_t = LSTM(h_{t-1}, y_{t-1}, v_2, v_3); p(z_t = z | y_{1:t-1}, X) = GS(W_z·h_t, τ);
  • in these formulas, h_t represents the hidden state variable in the LSTM network when the LSTM network processes the t-th word, h_{t-1} represents the hidden state variable in the LSTM network when the LSTM network processes the (t-1)-th word, and LSTM() represents the processing operation done by the LSTM; the LSTM network has currently output (t-1) words, and y_{t-1} represents the (t-1)-th word output by the fusion network; v_2 represents the third processing result, and v_3 represents the fourth processing result; W_z is a parameter matrix in the fusion network, and τ is a hyperparameter; p(z_t = z | y_{1:t-1}, X) is the probability that the word currently to be output is of the reference granularity (granularity z); t is an integer greater than 1.
  • p(y_t | z_t = z, y_{1:t-1}, X) represents the probability of outputting the target word y_t at the reference granularity, and p(y_t | y_{1:t-1}, X) = Σ_z p(z_t = z | y_{1:t-1}, X)·p(y_t | z_t = z, y_{1:t-1}, X) represents the probability of outputting the target word.
  • the processing unit is specifically configured to update the parameters of at least one network included in the deep neural network by using the gradient value of the loss function relative to that network; the loss function is used to calculate the loss between the predicted processing result and the standard result; during the update of any one of the first feature network, the second feature network, the first processing network, and the second processing network, the parameters of the other three networks remain unchanged.
  • the embodiments of the present application provide yet another data processing device.
  • the data processing device includes: a processor, a memory, an input device, and an output device.
  • the memory is used to store code;
  • the code is used to execute the method provided in the first aspect or the second aspect, the input device is used to obtain the natural language text to be processed, and the output device is used to output the target result obtained by the processor processing the natural language text.
  • the embodiments of the present application provide a computer program product.
  • the computer program product includes program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect or the second aspect described above.
  • the embodiments of the present application provide a computer-readable storage medium, the computer storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to Perform the method of the above-mentioned first aspect or the above-mentioned second aspect.
  • Figures 1A to 1C are application scenarios of natural language processing systems
  • Fig. 2 is a flowchart of a natural language processing method provided by an embodiment of the application
  • FIG. 3 is a schematic structural diagram of a deep neural network provided by an embodiment of this application.
  • FIG. 4 is a schematic structural diagram of a granular labeling network 301 provided by an embodiment of this application.
  • FIG. 5 is a schematic structural diagram of a feature network provided by an embodiment of this application.
  • FIG. 6 is a schematic structural diagram of a deep neural network provided by an embodiment of this application.
  • FIG. 7 is a flowchart of a training method provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of a data processing device provided by an embodiment of this application.
  • FIG. 9 is a schematic structural diagram of a neural network processor provided by an embodiment of this application.
  • FIG. 10 is a block diagram of a partial structure of an intelligent terminal provided by an embodiment of the application.
  • FIG. 11 is a block diagram of a part of the structure of another data processing device provided by an embodiment of the application.
  • Currently, the network models used to process natural language processing tasks do not separate the operations performed on words of different granularities in natural language text. That is to say, in the currently adopted schemes, operations on words of different granularities are not decoupled.
  • a pooling operation is usually used to synthesize finer-grained features to form coarser-grained features.
  • the word-level and phrase-level features are integrated through the pooling operation to form sentence-level features. It can be understood that if the finer-grained features obtained are wrong, the coarser-grained features obtained from the finer-grained features will also be wrong.
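  • For concreteness, a tiny sketch of the pooling approach described here, with illustrative shapes:

```python
# Prior-art style: a coarser-grained (sentence-level) feature is pooled from
# finer-grained (word-level) features, so errors in the word-level features
# propagate directly into the sentence-level feature.
import torch

word_feats = torch.randn(7, 64)          # one 64-dim feature per word
sentence_feat = word_feats.mean(dim=0)   # sentence-level feature via mean pooling
```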
  • in this application, the sub-networks of the deep neural network that realize operations at different granularities can be analyzed or adjusted separately.
  • the deep neural network used in this application includes multiple decoupled sub-networks for processing words of different granularities; these sub-networks can be optimized in a targeted manner to ensure that operations at each granularity are controllable.
  • Reusability and transferability: operations at different granularities have different reusability and transferability characteristics.
  • sentence-level operations, such as sentence translation or transformation, transfer more readily across fields, while phrase-level or word-level operations have more field-specific features.
  • since the deep neural network includes multiple independent sub-networks for processing words of different granularities, a part of the sub-networks obtained by training with samples from a certain field can be applied to other fields.
  • a natural language processing system includes user equipment and data processing equipment.
  • the user equipment may be a mobile phone, a personal computer, a tablet computer, a wearable device, a personal digital assistant, a game console, an information processing center, and other smart terminals.
  • the user equipment is the initiator of natural language data processing and serves as the initiator of natural language processing tasks (for example, translation tasks, paraphrase tasks, etc.).
  • users initiate natural language processing tasks through the user equipment.
  • the paraphrase task is the task of transforming a natural language text into another text with the same meaning but different expressions as the natural language text. For example, "What makes the second world war happen" can be repeated as "What is the reason of world war II".
  • the data processing device may be a device or server with data processing functions such as a cloud server, a network server, an application server, and a management server.
  • the data processing device receives query sentences/voice/text questions from the smart terminal through an interactive interface, and then performs language data processing through machine learning, deep learning, search, reasoning, decision-making, and other methods, using a memory that stores data and a processor that performs data processing.
  • the memory may be a general term that includes local storage and a database storing historical data.
  • the database may be on a data processing device or on other network servers.
  • FIG. 1B shows another application scenario of the natural language processing system.
  • the smart terminal is directly used as the data processing device, directly receiving input from the user, and the input is processed directly by the hardware of the smart terminal itself.
  • the specific process is similar to that of FIG. 1A, and the above description can be referred to, which will not be repeated here.
  • the user equipment may be a local device 101 or 102
  • the data processing device may be an execution device 210
  • the data storage system 250 may be integrated on the execution device 210, or set on the cloud or on other network servers.
  • FIG. 2 is a flowchart of a natural language processing method provided by an embodiment of the application. As shown in FIG. 2, the method may include:
  • the natural language text to be processed may be a sentence currently to be processed by the data processing device.
  • the data processing device can process the received natural language text or the natural language text obtained by recognizing voice sentence by sentence.
  • obtaining the natural language text to be processed may be that the data processing device receives data such as voice or text sent by the user equipment, and obtains the natural language text to be processed according to the received voice or text data.
  • for example, if the data processing device receives two sentences sent by the user equipment, the data processing device obtains the first sentence (the natural language text to be processed), processes the first sentence using the trained deep neural network, and outputs the result obtained by processing the first sentence; it then obtains the second sentence (the natural language text to be processed), processes the second sentence using the trained deep neural network, and outputs the result obtained by processing the second sentence.
  • obtaining the natural language text to be processed may be that the smart terminal directly receives data such as voice or text input by the user, and obtains the natural language text to be processed according to the received voice or text data.
  • for example, if the smart terminal receives two sentences input by the user, the smart terminal obtains the first sentence (the natural language text to be processed), processes the first sentence using the trained deep neural network, and outputs the result obtained by processing the first sentence; it then obtains the second sentence (the natural language text to be processed), processes the second sentence using the trained deep neural network, and outputs the result obtained by processing the second sentence.
  • the deep neural network may include: a granular annotation network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network.
  • the processing that the data processing device performs on the natural language text using the deep neural network may include: using the granularity annotation network to determine the granularity of each word in the natural language text; using the first feature network to perform feature extraction on the words of the first granularity in the natural language text, and outputting the obtained first feature information to the first processing network; using the second feature network to perform feature extraction on the words of the second granularity in the natural language text, and outputting the obtained second feature information to the second processing network; using the first processing network to perform target processing on the first feature information, and outputting the obtained first processing result to the fusion network; using the second processing network to perform the target processing on the second feature information, and outputting the obtained second processing result to the fusion network; and using the fusion network to fuse the first processing result and the second processing result to obtain the target result; the first granularity and the second granularity are different.
  • the first granularity and the second granularity may be any two different granularities among character level, word level, phrase level, and sentence level.
  • the granularity of a word refers to the granularity of the word in the natural language text (sentence).
  • the target processing can be translation, paraphrasing, abstract generation, and so on.
  • the target result is another natural language text obtained by processing the natural language text.
  • the target result is a natural language text obtained by translating the natural language text.
  • the target result is another natural language text obtained by paraphrasing the natural language text.
  • the natural language text to be processed can be regarded as an input sequence, and the target result (another natural language text) obtained by the data processing device processing the natural language text can be regarded as a generated sequence.
  • the deep neural network may include N feature networks and N processing networks.
  • the N feature networks and the N processing networks have a one-to-one correspondence, and N is an integer greater than one.
  • a pair of corresponding feature network and processing network is used to process words of the same granularity.
  • the first feature network performs feature extraction on words of the first granularity in the natural language text to obtain first feature information
  • the first processing network performs target processing on the first feature information.
  • the deep neural network may also include feature networks for performing feature extraction on words of other granularities (granularities other than the first granularity and the second granularity), and processing networks for performing target processing on the feature information of words of those other granularities.
  • the number of feature networks and the number of processing networks included in the deep neural network are not limited. If the words in the natural language text are classified into N granularities, the deep neural network includes N feature networks and N processing networks.
  • for example, if the words in the natural language text are divided into phrase-level words and sentence-level words, the deep neural network includes two feature networks: one feature network is used to perform feature extraction on phrase-level words to obtain the feature information of phrase-level words, and the other feature network is used to perform feature extraction on sentence-level words to obtain the feature information of sentence-level words. The deep neural network likewise includes two processing networks: one processing network is used to perform target processing on the feature information of phrase-level words, and the other processing network is used to perform target processing on the feature information of sentence-level words.
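  • Tying the earlier sketches together for this two-granularity example (again purely illustrative; the processing networks are stubbed as linear layers, and all names are assumptions):

```python
# Hypothetical wiring of the sub-networks for N = 2 (phrase level and sentence
# level), reusing the GranularityAnnotator and GranularFeatureNetwork sketches.
import torch
import torch.nn as nn

class GranularModel(nn.Module):
    def __init__(self, d=64, n_gran=2):
        super().__init__()
        self.annotator = GranularityAnnotator(emb_dim=d, hidden=d // 2,
                                              n_granularities=n_gran)
        self.feature_nets = nn.ModuleList(
            GranularFeatureNetwork(d_model=d, z_index=z) for z in range(n_gran))
        self.processing_nets = nn.ModuleList(nn.Linear(d, d) for _ in range(n_gran))

    def forward(self, X_emb):                      # X_emb: (B, L, d)
        Z_X = self.annotator(X_emb)                # per-word granularity annotation
        results = []
        for enc, proc in zip(self.feature_nets, self.processing_nets):
            U_z = enc(X_emb, Z_X)                  # granularity-specific features
            results.append(proc(U_z.mean(dim=1)))  # granularity-specific result
        return results                             # N results, fed to the fusion network
```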
  • the deep neural network includes N feature networks and N processing networks
  • the N feature networks output N feature information
  • the N processing networks output N processing results
  • the fusion network is used to fuse the N processing results, and the fused processing result is the final output result.
  • the fusion network is not limited to fusing two processing results.
  • any two of the N feature networks perform feature extraction on words with different granularities in natural language text; any two of the N processing networks perform target processing on the feature information of words with different granularities.
  • any two characteristic networks of the N characteristic networks do not share parameters; any two of the N processing networks do not share parameters.
  • the target processing can be translation, paraphrasing, abstract generation, and so on.
  • the parameters of the first feature network and the second feature network are different, and the architectures adopted are the same or different.
  • the first feature network uses a deep neural network architecture
  • the second feature network uses a Transformer architecture.
  • the first processing network and the second processing network have different parameters and adopt the same or different architectures.
  • the first processing network uses a deep neural network architecture
  • the second processing network uses a Transformer architecture. It can be understood that the multiple feature networks included in the deep neural network may adopt different architectures, and the multiple processing networks included in the deep neural network may adopt different architectures.
  • the data processing device uses the mutually decoupled networks in the deep neural network to process words of different granularities respectively, which can effectively improve the performance of processing natural language processing tasks.
  • FIG. 3 is a schematic structural diagram of a deep neural network provided by an embodiment of the application.
  • the deep neural network may include N feature networks and N processing networks; to facilitate understanding, only two feature networks (the first feature network and the second feature network) and two processing networks (the first processing network and the second processing network) are shown.
  • 301 is a granular annotation network
  • 302 is a first feature network
  • 303 is a second feature network
  • 304 is a first processing network
  • 305 is a second processing network
  • 306 is a converged network.
  • the data processing equipment uses the deep neural network in Figure 3 to process natural language text as follows:
  • the granularity labeling network 301 determines the granularity of each word in the natural language text according to N types of granularities to obtain the labeling information of the natural language text, and outputs the labeling information to the first feature network 302 and the second feature network 303.
  • the input of the granular annotation network 301 is the natural language text to be processed; the output may be annotation information, or annotation information and the natural language text.
  • the input of the first feature network 302 and the input of the second feature network 303 are both the annotation information and the natural language text.
  • the annotation information is used to describe the granularity of each word in the natural language text or the probability that each word in the natural language text belongs to the N types of granularities; N is an integer greater than 1.
• The granularity labeling network 301 labels the granularity to which each word (assuming the word is the basic processing unit) in the input natural language text (input sequence) belongs, that is, it determines the granularity label of each word in the natural language text. Assuming that two granularities are considered, phrase-level granularity and sentence-level granularity, the granularity of each word in the input natural language text (sentence) is determined to be one of these two granularities.
• For example, the granularity annotation network 301 determines the granularity of each word in the input natural language text "what makes the second world war happen": words such as "what", "makes", and "happen" are determined to be of sentence-level granularity, while words such as "the", "second", "world", and "war" are determined to be of phrase-level granularity. It is worth noting that the granularity of each word in the natural language text to be processed is not given by labeled data (labels); rather, the granularity annotation network 301 itself determines the granularity of each word in the input natural language text.
  • the first feature network 302 uses the input natural language text and annotation information to perform feature extraction, and outputs the obtained first feature information to the first processing network 304.
  • the first feature information is a vector or matrix representing words of the first granularity.
• The input of the first feature network 302 is the natural language text and the annotation information; it performs feature extraction on the words of the first granularity in the natural language text to obtain a vector or matrix representation of those words, that is, the first feature information.
  • the second feature network 303 uses the input natural language text and annotation information to perform feature extraction, and outputs the obtained second feature information to the second processing network 305.
  • the second feature information is a vector or matrix representing words of the second granularity.
• The input of the second feature network 303 is the natural language text and the annotation information; it performs feature extraction on the words of the second granularity in the natural language text to obtain a vector or matrix representation of those words, that is, the second feature information.
  • the embodiment of the present application does not limit the order in which the data processing device performs step 313 and step 312. Step 313 and step 312 can be performed at the same time, or step 312 can be performed before step 313, or step 313 can be performed before step 312.
• The first processing network 304 performs the target processing using the input first feature information together with the processing results it has already output while processing the first feature information, to obtain the first processing result.
• That is, the first processing network 304 processes the input first feature information in a recursive manner (for example, for translation, paraphrasing, abstract extraction, etc.): the first processing network 304 takes the output of the first feature network 302 (the first feature information) and the previously output processing results (a sequence) as input, and calculates a vector or matrix representation (the first processing result) through the deep neural network.
• The second processing network 305 performs the target processing using the input second feature information together with the processing results it has already output while processing the second feature information, to obtain the second processing result. That is, the second processing network 305 processes the input second feature information in a recursive manner (for example, for translation, paraphrasing, abstract extraction, etc.): the second processing network 305 takes the output of the second feature network 303 (the second feature information) and the previously output processing results (a sequence) as input, and calculates a vector or matrix representation (the second processing result) through the deep neural network.
  • the embodiment of the present application does not limit the order in which the data processing device executes step 314 and step 315. Step 314 and step 315 can be executed simultaneously, or step 314 can be executed first and then step 315 can be executed, or step 315 can be executed before step 314 is executed.
• The fusion network 306 uses the first processing result, the second processing result, and the words it has already output in the course of processing the first and second processing results to determine the target word to be output, and outputs the target word.
  • the target word is included in the first processing result or the second processing result.
  • the fusion network 306 can merge the output of processing networks of different granularities, that is, determine the granularity of the current word to be output and then determine the word to be output.
• For example, in the first step the fusion network determines that the word to be output is of "sentence-level" granularity and outputs "what"; in the second step it determines that the word to be output is of "sentence-level" granularity and outputs "is"; the previous operation is repeated until the generation of the final output sentence (corresponding to the target result) is completed. It should be noted that the above steps 311 to 316 are all completed by deep neural network calculations.
  • the data processing device uses feature networks of different granularities and processing networks of different granularities to independently process words of different granularities, which can effectively improve the probability of obtaining correct results.
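• To make the data flow of steps 311 to 316 concrete, the following is a minimal PyTorch-style sketch of how the mutually decoupled networks could be wired together. All class names, dimensions, and layer choices here are illustrative assumptions, not the patent's implementation, and the conditioning of the feature networks on the annotation information is omitted for brevity (it is sketched with attention masks further below).

```python
import torch
import torch.nn as nn

class MultiGranularityNet(nn.Module):
    """Illustrative wiring of the networks in FIG. 3 (names/sizes are assumptions)."""
    def __init__(self, vocab_size=1000, dim=256, n_granularities=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # 301: granularity annotation network (a fuller sketch is given later).
        self.annotate = nn.LSTM(dim, n_granularities, batch_first=True)
        # 302/303: one feature network per granularity; no shared parameters.
        self.features = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
                num_layers=2)
            for _ in range(n_granularities)])
        # 304/305: one processing network per granularity; no shared parameters.
        self.processors = nn.ModuleList([
            nn.GRU(dim, dim, batch_first=True) for _ in range(n_granularities)])
        # 306: fusion network combining the N processing results.
        self.fuse = nn.LSTM(dim * n_granularities, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                       # tokens: (batch, length)
        x = self.embed(tokens)
        z, _ = self.annotate(x)                      # per-word granularity scores
        feats = [f(x) for f in self.features]        # N pieces of feature information
        results = [p(f)[0] for p, f in zip(self.processors, feats)]  # N results
        fused, _ = self.fuse(torch.cat(results, dim=-1))
        return self.out(fused), z.softmax(-1)

logits, annotation = MultiGranularityNet()(torch.randint(0, 1000, (1, 7)))
```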
  • FIG. 4 is a schematic structural diagram of a granular labeling network 301 provided by an embodiment of this application.
• The granularity annotation network 301 includes a Long Short-Term Memory (LSTM) network 402 and a BiLSTM (Bi-directional LSTM) network 401. It can be seen from FIG. 4 that the granularity annotation network 301 uses a multilayer LSTM network architecture.
• The input of the BiLSTM network 401 is the natural language text, and the output of the LSTM network 402 is the annotation information, that is, the granularity label of each word or the probability that each word belongs to each granularity.
  • the granularity annotation network 301 is used to predict the granularity corresponding to each word in the input sentence (natural language text).
  • the BiLSTM network 401 is used to convert the input natural language text into a vector, which is used as the input of the next layer of the LSTM network 402; the LSTM network 402 calculates and outputs the probability that each word in the natural language text belongs to each granularity.
• When computing the labeling information, the GS (Gumbel-Softmax) function can be used instead of the commonly used Softmax operation.
  • each word has a probability of belonging to each granularity, and this value is close to 0 or 1.
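• As a quick illustration of the behavior described above, the following snippet (with made-up logits) shows that Gumbel-Softmax with a hard sample produces values that are exactly 0 or 1 in the forward pass while remaining differentiable through the straight-through estimator:

```python
import torch
import torch.nn.functional as F

# Made-up per-word scores for two granularities (phrase-level, sentence-level).
logits = torch.tensor([[2.0, -1.0], [0.4, 0.5]], requires_grad=True)

# tau is the temperature; hard=True snaps each row to a one-hot sample while
# keeping gradients flowing to the logits (straight-through estimator).
z = F.gumbel_softmax(logits, tau=0.5, hard=True)
print(z)  # e.g. tensor([[1., 0.], [0., 1.]]) -- stochastic because of sampling
```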
  • the following uses mathematical formulas to describe the manner in which the granularity annotation network 301 predicts the granularity of each word in the natural language text.
• The mathematical formulas corresponding to the processing of the BiLSTM network 401 and the LSTM network 402 are as follows:
• h = BiLSTM(x);
• g_l = LSTM([h_l, z_(l-1); g_(l-1)]);
• z_l = GS(W_g·g_l, τ);
• where BiLSTM() represents the processing of a bidirectional recursive deep neural network and LSTM() represents the processing of a (one-way) recursive deep neural network; l is the position index of a word; x is the input sentence (natural language text) and x_l is the l-th word in the input sentence x; h represents the hidden states in the BiLSTM network 401, and h_l and h_(l-1) are the hidden states of the BiLSTM network 401 for the l-th and (l-1)-th words; g represents the hidden state variable in the (one-way) LSTM network, whose calculation follows the calculation rules of the LSTM network, and g_l and g_(l-1) are the hidden state variables when the LSTM network 402 processes the l-th and (l-1)-th words in the input sentence; z represents the probability that a word belongs to a certain granularity (phrase-level granularity, sentence-level granularity, or another granularity), and z_l and z_(l-1) are these probabilities for the l-th and (l-1)-th words in the input sentence; GS is the Gumbel-Softmax function; τ is a hyperparameter (the temperature) in the Gumbel-Softmax function; and W_g is a parameter matrix in the granularity annotation network.
• The granularity annotation network 301 uses a multi-layer LSTM architecture to determine the granularity of each word in a natural language text, and can make full use of the granularities already determined for previous words when determining the granularity of a new word (a word whose granularity is yet to be determined), which is simple to implement and efficient.
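• A minimal sketch of this multilayer LSTM architecture, following the formulas above (BiLSTM 401, LSTM 402, and Gumbel-Softmax), might look as follows; the dimensions and class name are illustrative assumptions, and the explicit per-word loop makes the dependence of z_l on z_(l-1) visible:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GranularityAnnotator(nn.Module):
    """Sketch of granularity annotation network 301 (sizes are assumptions)."""
    def __init__(self, dim=128, n_granularities=2, tau=0.5):
        super().__init__()
        self.bilstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)  # 401
        self.cell = nn.LSTMCell(2 * dim + n_granularities, dim)                 # 402
        self.Wg = nn.Linear(dim, n_granularities)                               # W_g
        self.tau = tau                                                          # temperature

    def forward(self, x):                     # x: (batch, length, dim) embeddings
        h, _ = self.bilstm(x)                 # h_l: contextual hidden states
        batch, length, _ = h.shape
        g = x.new_zeros(batch, self.cell.hidden_size)
        c = torch.zeros_like(g)
        z = x.new_zeros(batch, self.Wg.out_features)
        zs = []
        for l in range(length):
            # g_l = LSTM([h_l, z_(l-1); g_(l-1)])
            g, c = self.cell(torch.cat([h[:, l], z], dim=-1), (g, c))
            # z_l = GS(W_g*g_l, tau): near one-hot granularity assignment
            z = F.gumbel_softmax(self.Wg(g), tau=self.tau, hard=True)
            zs.append(z)
        return torch.stack(zs, dim=1)         # (batch, length, n_granularities)

annotation = GranularityAnnotator()(torch.randn(1, 7, 128))
```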
• FIG. 5 is a schematic structural diagram of the first feature network 302 and the second feature network 303 provided by an embodiment of this application.
• As shown in FIG. 5, the first feature network 302 performs feature extraction on the words of the first granularity in the natural language text, and the second feature network 303 performs feature extraction on the words of the second granularity in the natural language text.
  • the network architectures adopted by the first feature network 302 and the second feature network 303 may be the same or different.
  • a feature network that processes words of a certain granularity can be understood as a feature network of that granularity, and feature networks of different granularities process words of different granularity.
• The parameters of the first feature network 302 and the second feature network 303 are not shared, and their hyperparameter settings are different.
  • both the first feature network 302 and the second feature network 303 adopt the Transformer model.
• This model is based on a multi-head self-attention mechanism, which processes the words of a certain granularity in the input sentence (natural language text) so as to construct a vector as the feature information of the words of that granularity.
• The first feature network 302 may focus only on the words of the first granularity in the input sentence (natural language text); the second feature network 303 may focus only on the words of the second granularity in the input sentence (natural language text).
• If the granularity annotation network 301 determines the probability that each word in the natural language text belongs to each of the aforementioned N granularities, the first feature network 302 can focus on the words of the first granularity in the input sentence (natural language text), and the second feature network 303 can focus on the words of the second granularity in the input sentence (natural language text).
• For the first feature network 302, it focuses on the words in the input sentence with a higher probability of belonging to the first granularity; for the second feature network 303, it focuses on the words with a higher probability of belonging to the second granularity. It can be understood that the higher the probability that a word belongs to the first granularity, the more attention the first feature network 302 pays to that word.
• The first feature network 302 can use a self-attention mechanism with a limited window (a local mechanism similar in form to a convolution, but whose weights are still calculated by attention), so that it focuses on the words of the first granularity in the input sentence and ignores words of other granularities.
• For example, the first feature network 302 can be a feature network of phrase-level granularity; when extracting the features of each word, it pays attention only to the two adjacent words of that word, as shown in FIG. 5.
• The second feature network 303 can adopt a whole-sentence Self-Attention mechanism so as to attend to the global information of the sentence. For example, the second feature network 303 can be a feature network of sentence-level granularity that attends to the entire input sentence when extracting the features of each word, as shown in FIG. 5. In this way, the second feature network 303 focuses on the words of the second granularity in the input sentence while ignoring words of other granularities.
  • the Transformer model is a commonly used model in the field, and the working principle of the model will not be described in detail here.
• In this way, the feature network at each granularity obtains the vector representation of the words of that granularity in the input sentence (natural language text), denoted as U_z.
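• One way to realize the contrast between limited-window attention and whole-sentence attention described above is through attention masks. The sketch below builds both kinds of mask for PyTorch's nn.MultiheadAttention; it is an illustration under assumed sizes, and a single attention module is reused for both masks only to keep the demo short (in the patent, the two granularities use separate, non-parameter-sharing networks):

```python
import torch
import torch.nn as nn

def window_mask(length, window=1):
    """Phrase-level style mask: each word attends only to itself and its
    `window` neighbors on each side (True marks a blocked position)."""
    idx = torch.arange(length)
    return (idx[None, :] - idx[:, None]).abs() > window

length = 7  # e.g. "what makes the second world war happen"
phrase_mask = window_mask(length)                              # local attention
sentence_mask = torch.zeros(length, length, dtype=torch.bool)  # global attention

attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
x = torch.randn(1, length, 16)                                 # dummy word vectors
u_phrase, _ = attn(x, x, x, attn_mask=phrase_mask)      # U_z, phrase granularity
u_sentence, _ = attn(x, x, x, attn_mask=sentence_mask)  # U_z, sentence granularity
```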
  • the processing operations implemented by the first feature network 302 and the second feature network 303 are described below with the aid of mathematical formulas.
• The mathematical description of the processing implemented by the first feature network 302 and the second feature network 303 is as follows: the input of each feature network is the input sentence X and the annotation information Z_X, and the feature network of granularity z outputs the vector representation U_z.
• If the annotation information output by the granularity annotation network 301 is the granularity of each word in the natural language text, then the annotation information input to the feature networks is exactly the annotation information output by the granularity annotation network 301.
• For example, the annotation information output by the granularity annotation network 301 is [1100001], where these binary values in turn represent the granularity of the first word to the last word in the input sentence; 0 means phrase-level granularity, and 1 means sentence-level granularity.
• If the annotation information output by the granularity annotation network 301 is the probability that each word in the natural language text belongs to each of the above N granularities, then the annotation information input to the feature networks is obtained from the annotation information output by the granularity annotation network 301.
• That is, the data processing device may further process the annotation information output by the granularity annotation network 301 to obtain the annotation information to be input to the feature networks.
• The data processing device takes, for each word in the natural language text, the granularity to which it belongs with maximum probability as the granularity of that word. For example, if the probabilities that a word in the input sentence (natural language text) belongs to phrase-level granularity and sentence-level granularity are 0.85 and 0.15 respectively, the granularity of the word is phrase-level granularity. In this way, the granularity of each word in the natural language text is classified according to phrase-level granularity and sentence-level granularity.
• For example, the annotation information output by the granularity annotation network 301 is [0.92 0.88 0.08 0.07 0.04 0.06 0.97], where the values in turn indicate the probability that the first word to the last word in the natural language text belong to sentence-level granularity. The data processing device can set the values less than 0.5 in the annotation information to 0 and the values greater than or equal to 0.5 to 1, obtaining the new annotation information [1100001] to be input to the feature networks.
• Optionally, the data processing device samples a granularity for each word according to the probability that the word belongs to each of the aforementioned N granularities, uses the sampled granularities to obtain the annotation information of the natural language text, and inputs it to the feature networks.
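• The two ways of turning the annotation network's probabilities into per-word granularity labels described above (thresholding versus sampling) might look like this, using the example probabilities from the text:

```python
import torch

# Probability that each word of "what makes the second world war happen"
# belongs to sentence-level granularity, as in the example above.
p_sentence = torch.tensor([0.92, 0.88, 0.08, 0.07, 0.04, 0.06, 0.97])

# Option 1: threshold at 0.5 (values >= 0.5 become 1), giving [1100001].
labels = (p_sentence >= 0.5).long()
print(labels.tolist())   # [1, 1, 0, 0, 0, 0, 1]

# Option 2: sample each word's granularity according to its probability.
sampled = torch.bernoulli(p_sentence).long()
print(sampled.tolist())  # stochastic; usually matches the thresholded labels here
```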
  • Each feature network included in the deep neural network independently processes words of different granularities, and uses networks of different architectures to process words of different granularities, with better feature extraction performance.
• The processing performed by the processing networks and by the fusion network 306 will be introduced below in conjunction with the structures of the first feature network 302, the second feature network 303, the first processing network 304, the second processing network 305, and the fusion network 306.
• FIG. 6 is a schematic structural diagram of a deep neural network provided by an embodiment of the application; FIG. 6 does not show the granularity annotation network.
• As shown in FIG. 6, the input of the first processing network 304 is the first feature information output by the first feature network 302 together with the words the first processing network 304 has already output while processing the first feature information; the input of the second processing network 305 is the second feature information output by the second feature network 303 together with the words the second processing network 305 has already output while processing the second feature information; the input of the fusion network 306 is the first processing result, the second processing result, and the words the fusion network 306 has already output while processing these processing results; and the output of the fusion network 306 is the target result obtained by fusing the first processing result and the second processing result.
  • the architectures adopted by the first processing network 304 and the second processing network 305 may be the same or different.
  • the first processing network 304 and the second processing network 305 may not share parameters.
  • a processing network that processes words of a certain granularity can be understood as a processing network of that granularity, and processing networks of different granularities process words of different granularity.
  • each granularity has a corresponding processing network.
  • the granularity of each word in a natural language text is divided into phrase-level granularity and sentence-level granularity.
  • Deep neural networks include a phrase-level granularity processing network and a sentence-level granularity processing network.
  • the processing networks of different granularities are decoupled, which means that they do not share parameters and can adopt different architectures.
  • the phrase-level granularity processing network uses a deep neural network architecture
  • the sentence-level granularity processing network uses a Transformer architecture.
  • the processing network can output one word at a time and the granularity of the word.
• The processing can be performed in a recursive manner: the processing network of each granularity takes the output of the feature network of the corresponding granularity and the words it has already output as input, calculates the probabilities of the multiple words currently to be output, and outputs the word with the highest probability together with the label information corresponding to that word.
• Optionally, the processing network uses its input to calculate the probability of each word currently to be output, samples according to these probabilities, and outputs the sampled word and the label information corresponding to that word.
• Optionally, the processing network uses its input to calculate the probability of each word currently to be output (that is, the probability of each candidate word being output), and outputs these probabilities. For example, the processing network currently has F words to be output; it uses its input to calculate the probability of the first word being output, the probability of the second word being output, ..., and the probability of the F-th word being output, and inputs these probabilities into the fusion network, where F is an integer greater than 1.
  • the label information corresponding to a word may be the probability that the word belongs to a certain granularity, or the granularity of the word, or the probability that the word belongs to various granularities.
• The processing performed by the first processing network 304 may be as follows: in the first step, the first processing network 304 processes the input first feature information to predict the first word currently required to be output, and outputs the first word and the label information corresponding to it; in the second step, the first processing network 304 processes the input first feature information and the first word to predict the second word currently required to be output, and outputs the second word and the label information corresponding to it; in the third step, the first processing network 304 processes the input first feature information, the first word, and the second word to predict the third word currently required to be output, and outputs the third word and the label information corresponding to it; the previous steps are repeated until the first processing result is completed.
• Each processing network included in the deep neural network can process its input feature information in a manner similar to the first processing network 304.
• For example, the input of a certain processing network is the feature information obtained by its corresponding feature network from "a good geologist"; the processing network processes the input feature information, predicts that "a" currently needs to be output, and outputs it; the processing network then processes the input feature information and the previously output "a", predicts that "great" currently needs to be output, and outputs it; the processing network then processes the input feature information and the previously output "a" and "great", predicts that "geologist" currently needs to be output, and outputs it.
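• The recursive, word-by-word behavior of a processing network described above can be sketched as a greedy decoding loop. The helper below and the toy step function are illustrative assumptions (any network that scores the next word given the feature information and the words output so far could play the role of step_fn):

```python
import torch

def greedy_decode(step_fn, feature_info, bos_id, eos_id, max_len=20):
    """Recursive processing: each step consumes the feature information plus
    all previously output words and emits the most probable next word."""
    output = [bos_id]
    for _ in range(max_len):
        prev = torch.tensor([output])            # words output so far
        logits = step_fn(feature_info, prev)     # (1, vocab) next-word scores
        word = int(logits.argmax(dim=-1))        # highest-probability word
        output.append(word)
        if word == eos_id:
            break
    return output[1:]

# Toy usage over a made-up five-word vocabulary:
vocab = ["<bos>", "a", "great", "geologist", "<eos>"]
def toy_step(feat, prev):
    nxt = min(prev.shape[1], len(vocab) - 1)     # just walk through the vocabulary
    return torch.eye(len(vocab))[nxt].unsqueeze(0)

print([vocab[i] for i in greedy_decode(toy_step, None, 0, 4)])
# -> ['a', 'great', 'geologist', '<eos>']
```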
• The first processing network 304 takes the output of the first feature network 302 and the words it has already output for calculation, using the Self-Attention mechanism with a limited window; the second processing network 305 takes the output of the second feature network 303 and the words it has already output for calculation, using the Self-Attention mechanism over the whole sentence.
• The processing result obtained by the processing network of each granularity is denoted as V_z, where z is the index of the granularity level, namely granularity z.
• The first processing network 304 and the second processing network 305 may also adopt different architectures. The following describes the operations performed by the fusion network 306 on the processing results input by each processing network.
  • the fusion network 306 can merge the processing results output by the processing network at different granularities to obtain the target result.
  • the output of the fusion network 306 is a sequence containing words.
  • the input of the fusion network 306 is the processing results of each processing network (the first processing result and the second processing result) and the sequence that the fusion network 306 has output in the process of processing these processing results.
• The operations performed by the fusion network 306 can be as follows: the fusion network 306 merges the processing results input by the processing networks into one vector; it inputs this vector to an LSTM network for processing to determine the granularity of the word currently to be output; the fusion network 306 then outputs the target word currently to be output by the processing network of that granularity.
• Inputting the vector into an LSTM network to determine the granularity of the word currently to be output may be: inputting the vector into an LSTM network to determine, for each of the above N granularities, the probability that a word of that granularity is currently output, and then determining the granularity of the word currently to be output as the granularity with the highest such probability; this granularity is any one of the above N granularities. The target word is the word with the highest probability of being output among the multiple words currently to be output by the processing network of that granularity.
• For example, the probabilities that the first word, the second word, and the third word currently to be output by the processing network of the reference granularity are output are 0.06, 0.8, and 0.14 respectively; the target word to be output by the processing network of the reference granularity is then the second word, the word with the highest probability of being output. It can be understood that the fusion network 306 may first determine the granularity of the word currently to be output, and then output the word to be output by the processing network of that granularity.
• The operations performed by the fusion network 306 can also be as follows: the fusion network 306 merges the processing results input by the processing networks into one vector; it inputs this vector to an LSTM network for processing to determine, for each word currently to be output by each processing network, the probability of that word being output; the fusion network 306 then outputs the target word with the highest probability of being output among these words.
  • Each processing network refers to a processing network of each granularity.
• For example, the words currently to be output by the first processing network include "a", "good", and "geologist", and the words currently to be output by the second processing network include "How", "can", "I", and "be"; the fusion network calculates the probability of each of these 7 words being output and outputs the word with the highest probability among the 7.
• The following describes how to calculate, for each word currently to be output by the processing network of a reference granularity, the probability of that word being output; the reference granularity is any one of the above N granularities.
• The (t-1) words already output by the fusion network 306 are denoted as [y_1, y_2, ..., y_(t-1)], where t is an integer greater than 1.
• The vectors (processing results) output by the first processing network and the second processing network are denoted v0 and v1, respectively.
• The fusion network 306 combines these two vectors and the sequence it has already output, and inputs the merged vector into the LSTM network for processing to calculate the probability that a word of the reference granularity is output; the fusion network 306 includes the LSTM network.
• The LSTM network can use the following formulas to calculate the probability that a word of granularity z is currently to be output:
• h_t = LSTM(h_(t-1), y_(t-1), v0, v1);
• p(z_t = z | y_(1:t-1), X) = Softmax(W_z·h_t / τ);
• where h_t represents the hidden state variable in the LSTM network when it processes the t-th word; LSTM() represents the processing operation performed by the LSTM network; y_(t-1) represents the (t-1)-th word already output; W_z is a parameter matrix in the fusion network; τ is a hyperparameter; and p(z_t = z | y_(1:t-1), X) is the probability that the word currently to be output is of granularity z.
  • the fusion network 306 can use a similar method to calculate the probability of currently outputting words of any one of the above N granularities.
  • the mixed probability model is used to calculate the probability of outputting the target word.
  • the target word is a word currently to be output by the processing network of the granularity z.
• The formula for calculating the probability of outputting the target word is as follows:
• p(y_t | y_(1:t-1), X) = Σ_z p(z_t = z | y_(1:t-1), X) · p(y_t | z_t = z, y_(1:t-1), X);
• where p(y_t | z_t = z, y_(1:t-1), X) represents the probability of outputting the target word y_t at granularity z, and p(y_t | y_(1:t-1), X) represents the overall probability of outputting the target word.
• p(y_t | z_t = z, y_(1:t-1), X) can be given by the processing network: the processing network of granularity z can input to the fusion network the probability of each word (each word of granularity z) currently to be output, that is, the probability of each of the words it currently has to output being output.
• For example, the input of the first processing network is the feature information obtained by the first feature network from "a good geologist"; the processing network processes this feature information to obtain the probability that "a" is currently output, the probability that "great" is currently output, and the probability that "geologist" is currently output, and inputs these words and the corresponding probabilities to the fusion network. Here, p("great" | z_t = z, y_(1:t-1), X) represents the probability of outputting "great" at granularity z.
• That is, the fusion network 306 may first calculate, for each of the above N granularities, the probability that a word of that granularity is output, then calculate the probability of each candidate word being output, and finally output the word with the highest probability of being output.
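• The mixed probability computation just described (granularity probabilities first, then per-granularity word probabilities, then the mixture) can be sketched in a few lines; all numbers below are illustrative, with the per-granularity distributions standing in for what the processing networks would supply:

```python
import torch

# p(z_t = z | y_(1:t-1), X): probability that the next word comes from each
# granularity, as computed by the fusion network's LSTM (made-up values).
p_granularity = torch.tensor([0.7, 0.3])   # [phrase-level, sentence-level]

# p(y_t | z_t = z, y_(1:t-1), X): per-granularity distributions over three
# candidate words, as supplied by the two processing networks (made-up values).
p_word_given_z = torch.tensor([
    [0.06, 0.80, 0.14],   # phrase-level processing network
    [0.50, 0.30, 0.20],   # sentence-level processing network
])

# Mixture: p(y_t | y_(1:t-1), X) = sum over z of p(z) * p(y_t | z).
p_word = p_granularity @ p_word_given_z
print(p_word.tolist())        # approximately [0.192, 0.65, 0.158]
print(int(p_word.argmax()))   # 1 -> the second word is output
```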
  • the foregoing embodiment describes the use of a deep neural network obtained by training to implement a natural language processing method.
  • the following describes how to train a required deep neural network.
  • FIG. 7 is a flowchart of a training method provided by an embodiment of the application. As shown in FIG. 7, the method may include:
  • the data processing device inputs the training samples to the deep neural network for processing, and obtains a prediction processing result.
  • the deep neural network includes: a granular labeling network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network.
• The processing includes: using the granularity labeling network to determine the granularity of each word in the training sample; using the first feature network to perform feature extraction on the words of the first granularity in the training sample and outputting the obtained third feature information to the first processing network; using the second feature network to perform feature extraction on the words of the second granularity in the training sample and outputting the obtained fourth feature information to the second processing network; using the first processing network to perform target processing on the third feature information and outputting the obtained third processing result to the fusion network; using the second processing network to perform the target processing on the fourth feature information and outputting the obtained fourth processing result to the fusion network; and using the fusion network to fuse the third processing result and the fourth processing result to obtain the prediction processing result; the first granularity and the second granularity are different.
  • the architectures of the first feature network and the second feature network are different, and/or the architectures of the first processing network and the second processing network are different.
• Optionally, the input of the granularity annotation network is the natural language text; the granularity annotation network is used to determine the granularity of each word in the natural language text according to N granularities to obtain the annotation information of the natural language text, and to output the annotation information to the first feature network and the second feature network, where the annotation information describes the granularity of each word or the probability that each word belongs to each of the N granularities; N is an integer greater than 1.
• The first feature network is used to perform feature extraction using the input natural language text and the annotation information, and to output the obtained third feature information to the first processing network, where the third feature information is a vector or matrix representing the words of the first granularity; the first processing network is used to perform the target processing using the input third feature information and the processing results it has already output, to obtain the third processing result.
• The fusion network outputs one word at a time; it is used to determine the target word to be output, using the third processing result, the fourth processing result, and the words it has already output in the course of processing the third and fourth processing results, and to output the target word.
  • the data processing device determines the loss corresponding to the training sample according to the predicted processing result and the standard result.
  • the standard result is the expected processing result obtained by using the deep neural network to process the training sample.
  • each training sample corresponds to a standard result, so that the data processing device can calculate and use the deep neural network to process the loss of each training sample, thereby optimizing the deep neural network.
• The following takes training a deep neural network to process paraphrase tasks as an example to introduce the training samples and standard results that can be used by the data processing device to train the deep neural network.
• Optionally, the granularity annotation network 301 is obtained through end-to-end learning. Because of the end-to-end learning, in order to ensure that the granularity annotation network 301 remains differentiable, during training the granularity annotation network 301 actually gives the probability that each word belongs to each granularity, rather than an absolute 0/1 label.
• When the data processing device trains the deep neural network to process different natural language processing tasks, it uses different training samples and standard results. For example, if the deep neural network is trained to handle paraphrase tasks, training samples and standard results similar to those in Table 1 can be used. For another example, if it is trained to handle translation tasks, the training samples are English texts, and the standard results are the standard Chinese texts corresponding to the training samples.
  • the data processing device uses the loss corresponding to the training sample to update the parameters of the deep neural network through an optimization algorithm.
• The data processing device can train the deep neural network to handle different natural language processing tasks. When the deep neural network is trained for different natural language processing tasks, the data processing device calculates the loss between the prediction processing result and the standard result differently, that is, the way of calculating the loss corresponding to the training sample differs.
• The data processing device using the loss corresponding to the training sample to update the parameters of the deep neural network through an optimization algorithm (such as a gradient descent algorithm) may be: using the gradient values of the loss function with respect to at least one network included in the deep neural network to update the parameters of the at least one network, where the loss function is used to calculate the loss between the prediction processing result and the standard result; during the update of any one of the first feature network, the second feature network, the first processing network, and the second processing network, the parameters of the other three networks remain unchanged.
  • the deep neural network used in the foregoing embodiment is a network obtained by using the training method in FIG. 7. It should be understood that the structure and processing process of the deep neural network in FIG. 7 are the same as the deep neural network in the foregoing embodiment.
  • the data processing device trains a deep neural network that can independently process words of different granularities, so as to obtain a deep neural network that can avoid the process of obtaining coarser-grained information from finer-grained information, and is simple to implement.
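• A minimal training step along the lines of FIG. 7 might look like the following, assuming the illustrative MultiGranularityNet class from the sketch after the FIG. 3 walkthrough is in scope. The random data and the choice of token-level cross-entropy as the loss function are assumptions for illustration (reasonable for paraphrase or translation tasks, per the discussion above):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, sample, standard_result):
    """One iteration: forward pass -> loss against the standard result -> update."""
    logits, _ = model(sample)                   # prediction processing result
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           standard_result.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                             # gradients of the loss function
    optimizer.step()                            # optimization algorithm update
    return loss.item()

model = MultiGranularityNet()                   # illustrative model from earlier
# Per the text, any one of the four networks can be updated while the other
# three stay unchanged; here only the first feature network's parameters train
# ("features.0" is the assumed submodule name from the earlier sketch).
for name, p in model.named_parameters():
    p.requires_grad_(name.startswith("features.0"))
optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad])

sample = torch.randint(0, 1000, (8, 7))          # random training samples
standard_result = torch.randint(0, 1000, (8, 7)) # random standard results
print(train_step(model, optimizer, sample, standard_result))
```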
  • FIG. 8 is a schematic structural diagram of a data processing device provided by an embodiment of the application. As shown in FIG. 8, the data processing device may include:
  • the obtaining unit 801 is configured to obtain the natural language text to be processed
  • the processing unit 802 is configured to process the natural language text by using the deep neural network obtained by training;
  • the output unit 803 is configured to output the target result obtained by processing the natural language text.
  • the deep neural network includes: a granular labeling network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network.
• The processing includes: using the granularity labeling network to determine the granularity of each word in the natural language text; using the first feature network to perform feature extraction on the words of the first granularity in the natural language text and outputting the obtained first feature information to the first processing network; using the second feature network to perform feature extraction on the words of the second granularity in the natural language text and outputting the obtained second feature information to the second processing network; using the first processing network to process the first feature information and outputting the obtained first processing result to the fusion network; using the second processing network to process the second feature information and outputting the obtained second processing result to the fusion network; and using the fusion network to fuse the first processing result and the second processing result to obtain the target result; the first granularity and the second granularity are different.
  • the processing unit 802 may be a central processing unit (Central Processing Unit, CPU) in a data processing device, a neural network processor (Neural-network Processing Unit, NPU), or other types of processors.
  • the output unit 803 may be a display, a display screen, an audio device, etc.
  • the target result may be another natural language text obtained from the natural language text, and the obtained natural language text is displayed on the display screen of the data processing device.
  • the target result can be a voice corresponding to another natural language text obtained from the natural language text, and the audio device in the data processing device plays the voice.
  • the processing unit 802 is also used to input training samples into the deep neural network for processing to obtain prediction processing results; according to the prediction processing results and standard results, determine the loss corresponding to the training samples;
  • the standard result is the processing result expected to be obtained by using the deep neural network to process the training sample; using the loss corresponding to the training sample, the parameters of the deep neural network are updated through an optimization algorithm.
  • the deep neural network includes: a granular labeling network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network.
• The processing includes: using the granularity labeling network to determine the granularity of each word in the training sample; using the first feature network to perform feature extraction on the words of the first granularity in the training sample and outputting the obtained third feature information to the first processing network; using the second feature network to perform feature extraction on the words of the second granularity in the training sample and outputting the obtained fourth feature information to the second processing network; using the first processing network to perform target processing on the third feature information and outputting the obtained third processing result to the fusion network; using the second processing network to perform the target processing on the fourth feature information and outputting the obtained fourth processing result to the fusion network; and using the fusion network to fuse the third processing result and the fourth processing result to obtain the prediction processing result; the first granularity and the second granularity are different.
• A deep neural network (DNN) can be understood as a neural network with many hidden layers; the "many" here has no special metric. The multi-layer neural network and the deep neural network that we often speak of are essentially the same thing.
• The layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the number of layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1th layer.
• For example, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W^3_24, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In general, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_jk. Note that the input layer has no W parameters.
  • more hidden layers allow the network to better describe complex situations in the real world. Theoretically speaking, a model with more parameters is more complex and has a greater "capacity", which means it can complete more complex learning tasks.
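• The indexing convention above maps directly onto the weight matrices of fully connected layers in common frameworks; a small sketch with layer sizes chosen so that the text's example entry exists:

```python
import torch.nn as nn

# Layer 1 (input) has 5 neurons, layer 2 has 4, layer 3 (output) has 3.
net = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 3))

W3 = net[2].weight   # W^3: coefficients into layer 3, stored as (output j, input k)
w_3_24 = W3[1, 3]    # W^3_24: from the 4th neuron of layer 2 to the 2nd of layer 3
# (0-based indices 1 and 3 correspond to the 1-based indices 2 and 4.)
# The input layer itself has no W parameter, matching the text.
```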
  • FIG. 9 is a schematic structural diagram of a neural network processor provided by an embodiment of the application.
• The neural network processor NPU 90 is mounted on the main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks (for example, natural language processing tasks) to it.
• The core part of the NPU is the arithmetic circuit 903; the arithmetic circuit 903 is controlled by the controller 904 to extract matrix data from the memories and perform multiplication operations.
  • the arithmetic circuit 903 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 903 is a two-dimensional systolic array. The arithmetic circuit 903 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 903 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the data corresponding to matrix B from the weight memory 902 and caches it on each PE in the arithmetic circuit.
• The arithmetic circuit takes the matrix A data from the input memory 901 and performs matrix operations with matrix B; the partial or final results of the obtained matrix are stored in the accumulator 908.
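• The dataflow just described (matrix B cached from the weight memory, matrix A streamed from the input memory, partial sums collected in the accumulator) can be mimicked functionally in a few lines; this is an arithmetic illustration, not a model of the hardware:

```python
import numpy as np

A = np.random.rand(4, 6)        # streamed from the input memory 901
B = np.random.rand(6, 3)        # weights cached from the weight memory 902

acc = np.zeros((4, 3))          # accumulator 908
for k in range(A.shape[1]):     # accumulate rank-1 partial results step by step
    acc += np.outer(A[:, k], B[k, :])

assert np.allclose(acc, A @ B)  # the accumulated result equals the full product
```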
  • the unified memory 906 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 902 through the direct memory access controller (DMAC) 905.
  • the input data is also transferred to the unified memory 906 through the DMAC.
  • the Bus Interface Unit (BIU) 510 is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (Instruction Fetch Buffer) 909.
  • the bus interface unit 510 is also used for the instruction fetch memory 909 to obtain instructions from the external memory, and also used for the storage unit access controller 905 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 906 or the weight data to the weight memory 902 or the input data to the input memory 901.
• The vector calculation unit 907 has multiple arithmetic processing units and, if necessary, further processes the output of the arithmetic circuit, performing operations such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
• The vector calculation unit 907 can store the processed output vector in the unified memory 906.
  • the vector calculation unit 907 may apply a nonlinear function to the output of the arithmetic circuit 903, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 907 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 903, for example for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 909 connected to the controller 904 is used to store instructions used by the controller 904;
  • the unified memory 906, the input memory 901, the weight memory 902, and the fetch memory 909 are all On-Chip memories.
• The operation of each layer in the deep neural network shown in FIG. 3 may be executed by the arithmetic circuit 903 or the vector calculation unit 907.
• Using the NPU to implement the deep-neural-network-based natural language processing method and training method can greatly improve the efficiency with which the data processing device processes natural language processing tasks and trains the deep neural network.
  • FIG. 10 is a block diagram of a partial structure of an intelligent terminal provided by an embodiment of the application.
  • the smart terminal includes: a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a system on chip (System On Chip, SoC) 1080 and power supply 1090 and other components.
• The memory 1020 includes DDR memory, and of course may also include high-speed random access memory or other storage units such as non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the structure of the smart terminal shown in FIG. 10 does not constitute a limitation on the smart terminal, and may include more or less components than those shown in the figure, or a combination of certain components, or different component arrangements.
• The RF circuit 1010 can be used for receiving and sending signals in the course of sending and receiving information or during a call. In particular, downlink information received from the base station is handed to the SoC 1080 for processing; in addition, uplink data is sent to the base station.
  • the RF circuit 1010 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • the RF circuit 1010 can also communicate with the network and other devices through wireless communication.
  • the above wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (Code Division) Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), Email, Short Messaging Service (SMS), etc.
  • the memory 1020 may be used to store software programs and modules.
  • the SoC 1080 runs the software programs and modules stored in the memory 1020 to execute various functional applications and data processing of the smart terminal.
  • the memory 1020 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, a translation function, a retelling function, etc.), etc.;
  • the data storage area can store data (such as audio data, phone book, etc.) created according to the use of the smart terminal.
  • the input unit 1030 can be used to receive input natural language text and voice data, and generate key signal inputs related to user settings and function control of the smart terminal.
  • the input unit 1030 may include a touch panel 1031 and other input devices 1032.
• The touch panel 1031, also known as a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 1031 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program.
• In this embodiment of the application, the touch panel 1031 is used to receive the natural language text input by the user and input the natural language text into the SoC 1080.
  • the touch panel 1031 may include two parts: a touch detection device and a touch controller.
• The touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and then sends it to the SoC 1080; it can also receive commands from the SoC 1080 and execute them.
  • the touch panel 1031 can be realized by various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the input unit 1030 may also include other input devices 1032.
  • other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, joystick, touch screen, microphone, etc.
• The microphone included in the other input devices 1032 can receive the voice data input by the user and input the voice data to the SoC 1080.
  • the SoC 1080 runs the software programs and modules stored in the memory 1020 to execute the data processing method provided in this application to process the natural language text input by the input unit 1030 to obtain the target result. SoC 1080 may also convert the voice data input by the input unit 1030 into natural language text, and then execute the data processing method provided in this application to process the natural language text to obtain the target result.
  • the display unit 1040 may be used to display information input by the user or information provided to the user and various menus of the smart terminal.
  • the display unit 1040 may include a display panel 1041, and optionally, the display panel 1041 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), etc.
  • the display unit 1040 can be used to display the target result obtained by the SoC 1080 processing natural language text. Further, the touch panel 1031 can cover the display panel 1041.
• When the touch panel 1031 detects a touch operation on or near it, it transmits the operation to the SoC 1080 to determine the type of touch event, and the SoC 1080 then provides a corresponding visual output on the display panel 1041 according to the type of touch event.
• Although in FIG. 10 the touch panel 1031 and the display panel 1041 are used as two independent components to implement the input and output functions of the smart terminal, in some embodiments the touch panel 1031 and the display panel 1041 can be integrated to realize the input and output functions of the smart terminal.
  • the smart terminal may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor can include an ambient light sensor and a proximity sensor.
  • the ambient light sensor can adjust the brightness of the display panel 1041 according to the brightness of the ambient light.
• As a kind of motion sensor, the proximity sensor can turn off the display panel 1041 and/or the backlight when the smart terminal is moved to the ear.
• As a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three axes) and can detect the magnitude and direction of gravity when stationary; it can be used for applications that recognize the posture of the smart terminal (such as switching between horizontal and vertical screens, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer or tapping). As for other sensors that can be configured in the smart terminal, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, they will not be described here.
  • the audio circuit 1060, the speaker 1061, and the microphone 1062 can provide an audio interface between the user and the smart terminal.
• The audio circuit 1060 can transmit the electrical signal converted from the received audio data to the speaker 1061, and the speaker 1061 converts it into a sound signal for output; on the other hand, the microphone 1062 converts the collected sound signal into an electrical signal, which is received by the audio circuit 1060 and converted into audio data; the audio data is then output to the SoC 1080 for processing and sent to another smart terminal through the RF circuit 1010, or the audio data is output to the memory 1020 for further processing.
  • WiFi is a short-distance wireless transmission technology.
  • the smart terminal can help users send and receive emails, browse web pages, and access streaming media through the WiFi module 1070. It provides users with wireless broadband Internet access.
• Although FIG. 10 shows the WiFi module 1070, it is understandable that it is not a necessary component of the smart terminal and can be omitted as needed without changing the essence of the invention.
• The SoC 1080 is the control center of the intelligent terminal. It uses various interfaces and lines to connect the various parts of the entire intelligent terminal, and performs the various functions of the smart terminal and processes data by running or executing the software programs and/or modules stored in the memory 1020 and calling the data stored in the memory 1020, thereby monitoring the smart terminal as a whole.
• Optionally, the SoC 1080 may include multiple processing units, such as CPUs or various service processors; the SoC 1080 may also integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, and application programs, and the modem processor mainly handles wireless communication. It is understandable that the above modem processor may not be integrated into the SoC 1080.
  • the smart terminal also includes a power supply 1090 (such as a battery) for supplying power to various components.
  • the power supply can be logically connected to the SoC 1080 through a power management system, so that functions such as charging, discharging, and power management are realized through the power management system.
  • the smart terminal may also include a camera, a Bluetooth module, etc., which will not be repeated here.
  • FIG. 11 is a block diagram of a partial structure of a data processing device provided by an embodiment of the present application.
  • the data processing device 1100 may include a processor 1101, a memory 1102, an input device 1103, an output device 1104, and a bus 1105.
  • the processor 1101, the memory 1102, the input device 1103, and the output device 1104 are communicatively connected to one another through the bus 1105.
  • the processor 1101 may be a general-purpose CPU, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for executing related programs, so as to implement the technical solutions provided by the embodiments of the present application.
  • the processor 1101 corresponds to the processing unit 802 in FIG. 8 (see the unit-mapping sketch after this list).
  • the memory 1102 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 1102 may store an operating system and other application programs.
  • the program code used to implement, through software or firmware, the modules and components of the data processing device provided in the embodiments of the present application, or the program code used to implement the methods provided in the method embodiments of the present application, is stored in the memory 1102, and the processor 1101 reads the code in the memory 1102 to perform the operations required by the modules and components of the data processing device, or to execute the above-mentioned methods provided in the embodiments of the present application.
  • the input device 1103, corresponding to the acquiring unit 801, is used to input natural language text to be processed by the data processing device.
  • the output device 1104, corresponding to the output unit 803, is used to output the target result obtained by the data processing device.
  • the bus 1105 may include a path for transferring information between various components of the data processing device (for example, the processor 1101, the memory 1102, the input device 1103, and the output device 1104).
  • although the data processing device 1100 shown in FIG. 11 only shows the processor 1101, the memory 1102, the input device 1103, the output device 1104, and the bus 1105, those skilled in the art should understand that, in a specific implementation process, the data processing device 1100 also includes other devices necessary for normal operation; according to specific needs, the data processing device 1100 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the data processing device 1100 may also include only the components necessary to implement the embodiments of the present application, and not necessarily all the components shown in FIG. 11.
  • An embodiment of the present application provides a computer-readable storage medium.
  • the above-mentioned computer-readable storage medium stores a computer program.
  • the above-mentioned computer program includes software program instructions; when the program instructions are executed by a processor in a data processing device, the data processing method and/or the training method in the foregoing embodiments is implemented.
  • an embodiment of the present application further provides a computer program product; the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
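
As referenced in the sensor description above, the following is a minimal sketch of the ambient-light and accelerometer logic, assuming a gravity-dominated 3-axis reading in m/s² and a logarithmic brightness response; the function names, threshold, and axis convention are illustrative assumptions, not part of the disclosed embodiments.

```python
import math

GRAVITY = 9.81  # m/s^2; magnitude of the acceleration vector at rest

def backlight_level(lux: float, max_level: int = 255) -> int:
    """Map ambient illuminance to a panel brightness level (log response assumed)."""
    return min(max_level, int(max_level * math.log1p(lux) / math.log1p(10000.0)))

def is_stationary(ax: float, ay: float, az: float, tol: float = 0.5) -> bool:
    """When the terminal is stationary, the measured acceleration is roughly gravity."""
    return abs(math.sqrt(ax * ax + ay * ay + az * az) - GRAVITY) < tol

def screen_orientation(ax: float, ay: float) -> str:
    """Pick horizontal/vertical screen mode from the gravity projection on x and y."""
    return "landscape" if abs(ax) > abs(ay) else "portrait"

# Device held upright in dim light: gravity mostly along the y axis.
print(backlight_level(50.0))          # 108, a dimmed level
print(is_stationary(0.3, 9.7, 0.4))   # True
print(screen_orientation(0.3, 9.7))   # portrait
```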
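Likewise, as referenced at the processor correspondence above, this is a minimal sketch of how the units of FIG. 8 map onto the hardware of FIG. 11; the class and the stand-in callables are hypothetical illustrations, not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DataProcessingDevice1100:
    acquire: Callable[[], str]      # input device 1103  <-> acquiring unit 801
    process: Callable[[str], str]   # processor 1101     <-> processing unit 802
    emit: Callable[[str], None]     # output device 1104 <-> output unit 803

    def run(self) -> None:
        text = self.acquire()        # natural language text to be processed
        result = self.process(text)  # e.g., run the trained deep neural network
        self.emit(result)            # output the target result

# Usage with stand-in callables:
device = DataProcessingDevice1100(
    acquire=lambda: "part-of-speech tag this sentence",
    process=lambda text: f"processed({text})",
    emit=print,
)
device.run()
```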

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a natural language processing method, a training method, and a data processing device, and relates to the field of artificial intelligence. The method comprises the steps of: obtaining natural language text to be processed; and processing the natural language text by means of a trained deep neural network, and outputting a target result obtained by processing the natural language text, the deep neural network comprising: a granularity marking network, a first feature network, a second feature network, a first processing network, a second processing network, and a fusion network. According to the present invention, the data processing device uses networks decoupled from one another to process words of different granularities in the natural language text, which effectively improves the performance of processing a natural language processing task.
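
For illustration only, the decoupled structure named in the abstract can be sketched as follows. This is a minimal sketch assuming PyTorch, embedding/LSTM/linear internals, soft granularity marks, and mean-pooling before fusion; none of these concrete choices are fixed by the disclosure.

```python
import torch
import torch.nn as nn

class MultiGranularityModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden_dim: int = 256, num_labels: int = 10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Granularity marking network: scores each word as fine- vs coarse-grained.
        self.granularity_marker = nn.Linear(embed_dim, 2)
        # First and second feature networks, decoupled per granularity.
        self.feature_net_fine = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.feature_net_coarse = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # First and second processing networks over the extracted features.
        self.processing_net_fine = nn.Linear(hidden_dim, hidden_dim)
        self.processing_net_coarse = nn.Linear(hidden_dim, hidden_dim)
        # Fusion network combining both branches into the target result.
        self.fusion_net = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids)                              # (B, T, E)
        marks = torch.softmax(self.granularity_marker(x), dim=-1)  # (B, T, 2)
        # Soft marks route each word's embedding to the two branches.
        fine_feat, _ = self.feature_net_fine(x * marks[..., 0:1])
        coarse_feat, _ = self.feature_net_coarse(x * marks[..., 1:2])
        fine_out = torch.relu(self.processing_net_fine(fine_feat.mean(dim=1)))
        coarse_out = torch.relu(self.processing_net_coarse(coarse_feat.mean(dim=1)))
        return self.fusion_net(torch.cat([fine_out, coarse_out], dim=-1))

# Usage: a batch of 2 token sequences of length 12.
model = MultiGranularityModel(vocab_size=30000)
logits = model(torch.randint(0, 30000, (2, 12)))
print(logits.shape)  # torch.Size([2, 10])
```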
PCT/CN2019/114146 2019-01-18 2019-10-29 Procédé de traitement du langage naturel, procédé d'apprentissage et dispositif de traitement des données WO2020147369A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910108559.9 2019-01-18
CN201910108559.9A CN109902296B (zh) 2019-01-18 2019-01-18 自然语言处理方法、训练方法及数据处理设备

Publications (1)

Publication Number Publication Date
WO2020147369A1 true WO2020147369A1 (fr) 2020-07-23

Family

ID=66944544

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114146 WO2020147369A1 (fr) 2019-01-18 2019-10-29 Procédé de traitement du langage naturel, procédé d'apprentissage et dispositif de traitement des données

Country Status (2)

Country Link
CN (1) CN109902296B (fr)
WO (1) WO2020147369A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116032798A (zh) * 2022-12-28 2023-04-28 天翼云科技有限公司 一种针对零信任身份授权的自动化测试方法及装置

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902296B (zh) * 2019-01-18 2023-06-30 华为技术有限公司 自然语言处理方法、训练方法及数据处理设备
CN110472063B (zh) * 2019-07-12 2022-04-08 新华三大数据技术有限公司 社交媒体数据处理方法、模型训练方法及相关装置
CN112329465B (zh) * 2019-07-18 2024-06-25 株式会社理光 一种命名实体识别方法、装置及计算机可读存储介质
CN110705273B (zh) * 2019-09-02 2023-06-13 腾讯科技(深圳)有限公司 基于神经网络的信息处理方法及装置、介质和电子设备
CN110837738B (zh) * 2019-09-24 2023-06-30 平安科技(深圳)有限公司 相似问识别方法、装置、计算机设备及存储介质
CN110674783B (zh) * 2019-10-08 2022-06-28 山东浪潮科学研究院有限公司 一种基于多级预测架构的视频描述方法及系统
CN111444686B (zh) * 2020-03-16 2023-07-25 武汉中科医疗科技工业技术研究院有限公司 医学数据标注方法、装置、存储介质及计算机设备
CN112488290B (zh) * 2020-10-21 2021-09-07 上海旻浦科技有限公司 具有依赖关系的自然语言多任务建模、预测方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160162478A1 (en) * 2014-11-25 2016-06-09 Lionbridge Techologies, Inc. Information technology platform for language translation and task management
CN107145483A (zh) * 2017-04-24 2017-09-08 北京邮电大学 一种基于嵌入式表示的自适应中文分词方法
CN107797985A (zh) * 2017-09-27 2018-03-13 百度在线网络技术(北京)有限公司 建立同义鉴别模型以及鉴别同义文本的方法、装置
CN108268643A (zh) * 2018-01-22 2018-07-10 北京邮电大学 一种基于多粒度lstm网络的深层语义匹配实体链接方法
CN109902296A (zh) * 2019-01-18 2019-06-18 华为技术有限公司 自然语言处理方法、训练方法及数据处理设备

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10635949B2 (en) * 2015-07-07 2020-04-28 Xerox Corporation Latent embeddings for word images and their semantics
CN107918782B (zh) * 2016-12-29 2020-01-21 中国科学院计算技术研究所 一种生成描述图像内容的自然语言的方法与系统
EP3376400A1 (fr) * 2017-03-14 2018-09-19 Fujitsu Limited Réglage de contexte dynamique dans des modèles de langage
CN108460089B (zh) * 2018-01-23 2022-03-01 海南师范大学 基于Attention神经网络的多元特征融合中文文本分类方法

Also Published As

Publication number Publication date
CN109902296B (zh) 2023-06-30
CN109902296A (zh) 2019-06-18

Legal Events

Date Code Title Description
121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19910183; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122: Ep: pct application non-entry in european phase (Ref document number: 19910183; Country of ref document: EP; Kind code of ref document: A1)