WO2020211720A1 - Data processing method and pronoun resolution neural network training method - Google Patents

Data processing method and pronoun resolution neural network training method

Info

Publication number
WO2020211720A1
WO2020211720A1 (PCT/CN2020/084432)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
training
word
positive
neural network
Prior art date
Application number
PCT/CN2020/084432
Other languages
English (en)
French (fr)
Inventor
张金超
孟凡东
周杰
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2020211720A1
Priority to US17/339,933 (published as US20210294972A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/237 Lexical tools
    • G06F 40/247 Thesauruses; Synonyms
    • G06F 40/253 Grammatical analysis; Style critique
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F 18/24133 Distances to prototypes
    • G06F 18/24137 Distances to cluster centroids
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/08 Learning methods

Definitions

  • This application relates to the field of computer technology, and in particular to a data processing method, device, computer-readable storage medium and computer device, and to a pronoun resolution neural network training method, device, computer-readable storage medium and computer device.
  • Pronoun resolution technology refers to, given a text to be detected, locating by algorithm the candidate replacement word to which a pronoun in the text refers.
  • The current approach to the pronoun resolution problem is to model it with a neural network and to predict the target candidate replacement word referred to by the pronoun.
  • However, current neural networks directly classify the pronoun against its corresponding candidate replacement words to obtain the target candidate replacement word, resulting in low pronoun resolution accuracy.
  • The embodiments of the present application provide a data processing method, device, computer-readable storage medium and computer device that can improve the accuracy of pronoun resolution, as well as a pronoun resolution neural network training method, device, computer-readable storage medium and computer device.
  • a data processing method including:
  • the pronoun resolution neural network performs positive example iterative processing based on the first feature and the second feature to obtain the corresponding positive example feature vector modulus length, and performs negative example iterative processing based on the first feature and the second feature to obtain the corresponding negative example feature vector modulus length; the substitution probability corresponding to each candidate replacement word in the candidate replacement word set is calculated according to the positive example feature vector modulus length and the negative example feature vector modulus length;
  • a data processing device which includes:
  • the to-be-detected text acquisition module is used to acquire the text to be detected and determine the context word set and the candidate replacement word set corresponding to the word to be detected in the text to be detected;
  • the feature extraction module is used to input the context word set and the candidate replacement word set into the pronoun resolution neural network, where the pronoun resolution neural network performs feature extraction on the context word set and the candidate replacement word set to obtain the corresponding first feature and second feature;
  • the iterative processing module is used for the pronoun resolution neural network to perform positive example iterative processing based on the first feature and the second feature to obtain the corresponding positive example feature vector modulus length, perform negative example iterative processing based on the first feature and the second feature to obtain the corresponding negative example feature vector modulus length, and calculate the substitution probability corresponding to each candidate replacement word in the candidate replacement word set according to the two modulus lengths;
  • the target substitute word determination module is used to determine the target substitute word according to the substitution probability corresponding to each candidate replacement word;
  • the target substitute word insertion module is used to insert the target substitute word into the text to be detected at the position corresponding to the word to be detected to obtain the target text.
  • a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the program:
  • the pronoun resolution neural network performs positive example iterative processing based on the first feature and the second feature to obtain the corresponding positive example feature vector modulus length, and performs negative example iterative processing based on the first feature and the second feature to obtain the corresponding negative example feature vector modulus length; the substitution probability corresponding to each candidate replacement word in the candidate replacement word set is calculated according to the positive example feature vector modulus length and the negative example feature vector modulus length;
  • a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to execute the following steps:
  • the pronoun resolution neural network performs positive example iterative processing based on the first feature and the second feature to obtain the corresponding positive example feature vector modulus length, and performs negative example iterative processing based on the first feature and the second feature to obtain the corresponding negative example feature vector modulus length; the substitution probability corresponding to each candidate replacement word in the candidate replacement word set is calculated according to the positive example feature vector modulus length and the negative example feature vector modulus length;
  • a pronoun resolution neural network training method which includes:
  • the training text has a corresponding standard training text label
  • the initial pronoun resolution neural network performs positive example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive example training feature vector modulus length, and performs negative example iterative processing based on the first training feature and the second training feature to obtain the corresponding negative example training feature vector modulus length; the training substitution probability corresponding to each training candidate replacement word in the training candidate replacement word set is calculated according to the positive example training feature vector modulus length and the negative example training feature vector modulus length;
  • the model parameters of the initial pronoun resolution neural network are adjusted until the convergence condition is met, and the pronoun resolution neural network is obtained.
  • a pronoun resolution neural network training device which includes:
  • the training text acquisition module is used to acquire training text, and the training text has a corresponding standard training text label;
  • the training text processing module is used to determine the training context word set corresponding to the word to be detected in the training text and the training candidate replacement word set;
  • the training feature representation module is used to input the training context word set and the training candidate replacement word set into the initial pronoun resolution neural network, where the initial pronoun resolution neural network performs feature extraction on the training context word set and the training candidate replacement word set respectively to obtain the corresponding first training feature and second training feature;
  • the training feature iterative processing module is used for the initial pronoun resolution neural network to perform positive example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive example training feature vector modulus length, perform negative example iterative processing according to the first training feature and the second training feature to obtain the corresponding negative example training feature vector modulus length, and calculate the training substitution probability corresponding to each training candidate replacement word in the training candidate replacement word set according to the two modulus lengths;
  • the training loss value calculation module is used to calculate the training loss value according to the training substitution probability corresponding to each training candidate replacement word and the corresponding standard training text label;
  • the neural network training module is used to adjust the model parameters of the initial pronoun resolution neural network according to the training loss value until the convergence condition is met, and the pronoun resolution neural network is obtained.
  • a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the program:
  • the training text has a corresponding standard training text label
  • the initial pronoun resolution neural network performs positive example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive example training feature vector modulus length, and performs negative example iterative processing based on the first training feature and the second training feature to obtain the corresponding negative example training feature vector modulus length; the training substitution probability corresponding to each training candidate replacement word in the training candidate replacement word set is calculated according to the positive example training feature vector modulus length and the negative example training feature vector modulus length;
  • the model parameters of the initial pronoun resolution neural network are adjusted until the convergence condition is met, and the pronoun resolution neural network is obtained.
  • a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to execute the following steps:
  • the training text has a corresponding standard training text label
  • the initial pronoun resolution neural network performs positive example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive example training feature vector modulus length, and performs negative example iterative processing based on the first training feature and the second training feature to obtain the corresponding negative example training feature vector modulus length; the training substitution probability corresponding to each training candidate replacement word in the training candidate replacement word set is calculated according to the positive example training feature vector modulus length and the negative example training feature vector modulus length;
  • the model parameters of the initial pronoun resolution neural network are adjusted until the convergence condition is met, and the pronoun resolution neural network is obtained.
  • In the above solutions, the pronoun resolution neural network can make good use of the features corresponding to the context word sequence and the candidate replacement words in the text to be detected: positive example iterative processing and negative example iterative processing of these features yield the corresponding positive example and negative example feature vector modulus lengths, from which the substitution probability corresponding to each candidate replacement word in the candidate replacement word set is obtained. Furthermore, the pronoun resolution neural network can well fuse the features corresponding to the context word sequence and the candidate replacement words, and these features include not only the features of the word sequences themselves but also the features of the character sequences corresponding to the word sequences, which well alleviates the data-level sparseness problem, improves the accuracy of the substitution probability of each candidate replacement word in the candidate replacement word set, and thereby improves the accuracy of pronoun resolution.
  • Figure 1 is an application environment diagram of a data processing method or a pronoun resolution neural network training method in an embodiment
  • Figure 2 is a schematic flow chart of a data processing method in an embodiment
  • FIG. 3 is a schematic flowchart of a step of determining the context word set and candidate replacement word set corresponding to the word to be detected in the text to be detected in an embodiment
  • FIG. 4 is a schematic flowchart of the feature extraction step of the pronoun resolution neural network in an embodiment
  • FIG. 5 is a schematic flowchart of a positive example iterative processing step in an embodiment
  • FIG. 6 is a schematic diagram of code implementation of positive example iterative processing or negative example iterative processing in an embodiment
  • FIG. 7 is a schematic flowchart of a counter-example iterative processing step in an embodiment
  • Fig. 8 is a schematic flowchart of a neural network training method for pronoun resolution in an embodiment
  • FIG. 9 is a schematic diagram of the network structure of the pronoun resolution neural network in an embodiment
  • FIG. 10 is a schematic diagram comparing verification results of the pronoun resolution neural network in an embodiment
  • Figure 11 is a structural block diagram of a data processing device in an embodiment
  • Figure 12 is a structural block diagram of a text acquisition module to be detected in an embodiment
  • FIG. 13 is a structural block diagram of a pronoun resolution neural network training device in an embodiment
  • Fig. 14 is a structural block diagram of a computer device in an embodiment.
  • The embodiments of the present application provide an efficient method for processing natural language containing zero-anaphora problems. For details, please refer to the following embodiments.
  • Fig. 1 is an application environment diagram of a data processing method in an embodiment.
  • the data processing method is applied to a data processing system.
  • the data processing system includes a terminal 110 and a server 120.
  • the terminal 110 and the server 120 are connected through a network.
  • the terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, and a notebook computer.
  • the server 120 may be implemented as an independent server or a server cluster composed of multiple servers.
  • the data processing method provided in the embodiments of the present application can be executed by any device having a processor and a memory.
  • the device can independently complete the data processing method provided in the embodiment of this application.
  • the device can cooperate with other devices to jointly complete the data processing method.
  • For example, a storage server cluster and a computing server cluster may cooperate with each other to complete the data processing method provided in the embodiments of the present application.
  • the terminal 110 may send the text to be detected to the server 120; the server 120 obtains the text to be detected, determines the context word set and the candidate replacement word set corresponding to the word to be detected in the text to be detected, and inputs the context word set and the candidate replacement word set into the pronoun resolution neural network.
  • the pronoun resolution neural network performs feature extraction on the context word set and candidate replacement word set to obtain the corresponding first and second features.
  • the pronoun resolution neural network performs positive example iterative processing based on the first feature and the second feature to obtain the corresponding positive example feature vector modulus length, and performs negative example iterative processing based on the first feature and the second feature to obtain the corresponding negative example feature vector modulus length; the substitution probability corresponding to each candidate replacement word in the candidate replacement word set is calculated according to the two modulus lengths, the target substitute word is determined according to the substitution probability corresponding to each candidate replacement word, and the target substitute word is inserted into the text to be detected at the position corresponding to the word to be detected to obtain the target text.
  • the server 120 sends the target text to the terminal 110 for display.
  • FIG. 1 may also serve as the application environment diagram of the pronoun resolution neural network training method.
  • the pronoun resolution neural network training method is applied to the pronoun resolution neural network training system.
  • the pronoun resolution neural network training system includes a terminal 110 and a server 120.
  • the terminal 110 and the server 120 are connected through a network.
  • the terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, and a notebook computer.
  • the server 120 may be implemented as an independent server or a server cluster composed of multiple servers.
  • the terminal 110 may send the training text to the server 120; the server 120 obtains the training text, which has a corresponding standard training text label, determines the training context word set and the training candidate replacement word set corresponding to the word to be detected in the training text, and inputs the training context word set and the training candidate replacement word set into the initial pronoun resolution neural network. The initial pronoun resolution neural network performs feature extraction on the training context word set and the training candidate replacement word set respectively to obtain the corresponding first training feature and second training feature, performs positive example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive example training feature vector modulus length, and performs negative example iterative processing based on the first training feature and the second training feature to obtain the corresponding negative example training feature vector modulus length. It then calculates the training substitution probability corresponding to each training candidate replacement word in the training candidate replacement word set according to the two modulus lengths, calculates the training loss value according to the training substitution probability corresponding to each training candidate replacement word and the corresponding standard training text label, and adjusts the model parameters of the initial pronoun resolution neural network according to the training loss value until the convergence condition is met, obtaining the pronoun resolution neural network.
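The following is a minimal PyTorch-style sketch of this training procedure. It is illustrative only: the network body `PronounResolutionNet`, the toy feature batch, and the use of binary cross-entropy as the training loss are assumptions, since the text only states that a training loss value is computed from the training substitution probabilities and the standard training text labels and minimized until the convergence condition is met.

```python
# Hedged sketch of the training loop; the scorer and loss form are assumptions.
import torch
import torch.nn as nn

class PronounResolutionNet(nn.Module):
    """Stand-in scorer: maps pooled context and candidate features to a
    training substitution probability for each (pronoun, candidate) pair."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, ctx, cand):
        return torch.sigmoid(self.score(torch.cat([ctx, cand], dim=-1))).squeeze(-1)

model = PronounResolutionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

ctx = torch.randn(8, 16)                    # toy first training features
cand = torch.randn(8, 16)                   # toy second training features
labels = torch.randint(0, 2, (8,)).float()  # standard training text labels

for step in range(100):                     # iterate until the convergence condition is met
    probs = model(ctx, cand)                # training substitution probabilities
    loss = nn.functional.binary_cross_entropy(probs, labels)  # training loss value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                        # adjust the model parameters
```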
  • a data processing method is provided.
  • The following description takes the method being applied to the terminal 110 or the server 120 in FIG. 1 as an example.
  • the data processing method includes the following steps:
  • Step 202 Obtain the text to be detected, and determine the context word set and candidate replacement word set corresponding to the word to be detected in the text to be detected.
  • the text to be detected is text on which pronoun resolution needs to be performed, and there may be one or more texts to be detected.
  • the text to be detected can be obtained in real time or stored in advance.
  • For example, the text to be detected can be news information or forum posts crawled in real time through a web crawler when an instruction for pronoun resolution is received.
  • the text to be detected may also be stored in the database in advance.
  • pronoun resolution is to detect the substitute words referred to by the words to be detected in the text to be detected, and the words to be detected are the omitted or missing parts of the text to be detected.
  • For example, the text to be detected is "Xiao Ming ate an apple, & is sweet", where & represents the word to be detected.
  • In an embodiment, preset object names may be stored, and text including the preset object names may be obtained as the text to be detected.
  • For example, object names such as "A company", "B product" and "C company" can be stored in advance, and text including one or more of these words can then be crawled from the network through web crawler technology as the text to be detected.
  • the data source corresponding to the text to be detected is preset.
  • the data source corresponding to the text to be detected may be preset to be the D website, the E website, or the F website.
  • the text to be detected may be obtained by further filtering the text information.
  • one or more of the title, abstract, first paragraph, and last paragraph of the article may be used as the text to be detected.
  • The word to be detected in the text to be detected is the omitted or missing part of the text to be detected. The context word set is a word set composed of the preceding word sequence and the following word sequence of the word to be detected: the preceding word sequence is the word sequence composed of the forward words before the position of the word to be detected, and the following word sequence is the word sequence composed of the backward words after the position of the word to be detected, both taking the position of the word to be detected as the center.
  • In the embodiment of the present application, the text to be detected is first segmented to obtain multiple words, syntactic analysis is performed on the multiple words to determine the position of the word to be detected, forward words and backward words are obtained according to that position, the obtained forward words form the preceding word sequence and the backward words form the following word sequence, and the context word set is then formed from the preceding word sequence and the following word sequence.
  • the candidate replacement word set is a word set composed of the candidate replacement words of the word to be detected; a candidate replacement word is a candidate word used to replace the word to be detected, and may be a noun phrase.
  • Candidate replacement words can be selected from the words corresponding to the text to be detected according to preset filtering rules.
  • The preset filtering rules can be customized: for example, noun phrases may be filtered from the words corresponding to the text to be detected as candidate replacement words, or adjective words may be filtered from the words corresponding to the text to be detected as candidate replacement words.
  • In the embodiment of the present application, the text to be detected is first segmented to obtain multiple words, syntactic analysis is performed on the multiple words, candidate replacement words are obtained according to the syntactic analysis result, and the candidate replacement word set is formed from the obtained candidate replacement words.
  • Syntactic analysis analyzes the grammatical function of the words in the text to be detected to obtain a syntactic analysis result. For example, in "I came late", "I" is the subject, "came" is the predicate, and "late" is the complement.
  • For example, the acquired text to be detected is: "Xiao Ming ate a little apple, it is very sweet, and he is in a wonderful mood." The text to be detected is segmented to obtain multiple words: "Xiao Ming", "ate", "a", "little apple", "very sweet", "he", "mood", "super", and "wonderful". The preceding word sequence and the following word sequence are determined from the position of the word to be detected and together compose the context word set. The preset screening rule for candidate replacement words is to select noun phrases from the words corresponding to the text to be detected as candidate replacement words, so the candidate replacement words obtained by screening are "Xiao Ming" and "little apple", which compose the candidate replacement word set.
  • Step 204 Input the context word set and the candidate replacement word set into the pronoun resolution neural network, and the pronoun resolution neural network respectively performs feature extraction on the context word set and the candidate replacement word set to obtain corresponding first and second features.
  • pronoun resolution is to detect substitute words referred to by the words to be detected in the text to be detected, and the words to be detected are the omitted or missing parts of the text to be detected.
  • the pronoun resolution neural network is used to determine the candidate replacement words corresponding to the words to be detected.
  • the pronoun resolution neural network is pre-trained.
  • The pronoun resolution neural network can be a capsule network (Capsule Network), or a classification model such as a support vector machine (Support Vector Machine, SVM) classifier, an artificial neural network (Artificial Neural Network, ANN) classifier, or a logistic regression (Logistic Regression, LR) classifier.
  • The pronoun resolution neural network combines the features of the word sequence and of the character sequence corresponding to the word sequence to obtain the substitution probability between the word to be detected and each candidate replacement word, improving the accuracy of pronoun resolution.
  • the embodiment of the present application adopts a supervised learning method.
  • feature extraction refers to mapping one or more input features to additional features.
  • the context word set and the candidate replacement word set are input into the pronoun resolution neural network.
  • The pronoun resolution neural network can perform feature extraction on the context word set through a feature representation sub-network to obtain the first feature corresponding to the context word set, and perform feature extraction on the candidate replacement word set through the feature representation sub-network to obtain the second feature corresponding to the candidate replacement word set.
  • The first feature includes, but is not limited to, a word vector feature based on the word sequence in the context word set and a character vector feature based on the character sequence corresponding to that word sequence. The second feature includes, but is not limited to, a word vector feature based on the word sequence in the candidate replacement word set and a character vector feature based on the character sequence corresponding to that word sequence. Feature extraction based on the word sequence of the context words or candidate replacement words means extracting the word vector of the word sequence as a whole, and feature extraction based on the character sequence corresponding to the word sequence means extracting the character vector of that character sequence as a whole.
  • In an embodiment, the context word set and the candidate replacement word set are input into the pronoun resolution neural network, which includes a forward feature representation sub-network, a backward feature representation sub-network, and a character vector feature representation sub-network. The pronoun resolution neural network performs feature extraction on the word sequence in the context word set through the forward feature representation sub-network to obtain the corresponding first forward sub-feature, performs feature extraction on the word sequence in the context word set through the backward feature representation sub-network to obtain the corresponding first backward sub-feature, and performs feature extraction on the character sequence corresponding to the word sequence in the context word set through the character vector feature representation sub-network to obtain the corresponding first character vector sub-feature; the first forward sub-feature, the first backward sub-feature, and the first character vector sub-feature constitute the first feature corresponding to the context word set.
  • Likewise, the pronoun resolution neural network performs feature extraction on the word sequence in the candidate replacement word set through the forward feature representation sub-network to obtain the corresponding second forward sub-feature, performs feature extraction on that word sequence through the backward feature representation sub-network to obtain the corresponding second backward sub-feature, and performs feature extraction on the character sequence corresponding to the word sequence in the candidate replacement word set through the character vector feature representation sub-network to obtain the corresponding second character vector sub-feature; the second forward sub-feature, the second backward sub-feature, and the second character vector sub-feature are combined into the second feature corresponding to the candidate replacement word set.
  • Step 206: The pronoun resolution neural network performs positive example iterative processing based on the first feature and the second feature to obtain the corresponding positive example feature vector modulus length, performs negative example iterative processing based on the first feature and the second feature to obtain the corresponding negative example feature vector modulus length, and calculates the substitution probability corresponding to each candidate replacement word in the candidate replacement word set according to the positive example feature vector modulus length and the negative example feature vector modulus length.
  • Positive example iterative processing refers to the process of repeated iterative calculation over the features to obtain the positive example feature vector modulus length. The positive example feature vector modulus length is the length of the positive example feature vector, and the positive example feature vector is the feature vector corresponding to the positive substitution probability, that is, the probability that the word to be detected matches each candidate replacement word.
  • A custom dynamic routing algorithm can be used for both positive example iterative processing and negative example iterative processing; because the preset weight coefficients corresponding to the positive example iterative processing and the negative example iterative processing differ, calculating the first feature and the second feature through the custom dynamic routing algorithm yields the positive example feature vector modulus length and the negative example feature vector modulus length respectively.
  • Negative example iterative processing refers to the process of repeated iterative calculation over the features to obtain the negative example feature vector modulus length; the negative example feature vector modulus length is the length of the negative example feature vector, and the negative example feature vector is the feature vector corresponding to the negative substitution probability, i.e., the probability that the word to be detected in the text to be detected does not match each candidate replacement word.
  • Calculating the positive example feature vector modulus length and the negative example feature vector modulus length through the custom dynamic routing algorithm may proceed as follows. An initial iteration center is calculated based on the first feature and the second feature and used as the current iteration center for both the positive example iterative processing and the negative example iterative processing. The first feature and the second feature are then linearly transformed according to the preset weight coefficients corresponding to the positive example iterative processing and the negative example iterative processing, yielding the first intermediate feature and the second intermediate feature corresponding to each process, from which the initial feature vector modulus lengths corresponding to the positive example iterative processing and the negative example iterative processing are calculated. The iteration center is updated according to the first intermediate similarity and the second intermediate similarity (the similarities of the first intermediate feature and the second intermediate feature to the current iteration center), the updated iteration center is taken as the current iteration center, and the process returns to the similarity calculation step until the convergence condition is met, at which point the positive example feature vector modulus length corresponding to the positive example iterative processing and the negative example feature vector modulus length corresponding to the negative example iterative processing are obtained. The convergence condition can be customized: for example, it may be considered satisfied when the number of iterations reaches a preset number, or when the initial feature vector modulus length no longer changes.
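As a concrete illustration of this routing loop, here is a minimal NumPy sketch. It is a reconstruction under stated assumptions rather than the patent's exact algorithm: the preset weight matrices `W_pos` and `W_neg`, the dimensions, the softmax coupling coefficients, and the dot-product agreement update are choices made to match the description above.

```python
# Hedged NumPy sketch of a dynamic-routing-style iteration producing the
# positive example and negative example feature vector modulus lengths.
import numpy as np

def squash(v, eps=1e-8):
    """Scale a vector's length into [0, 1) while keeping its direction."""
    norm_sq = np.sum(v * v)
    return (norm_sq / (1.0 + norm_sq)) * v / (np.sqrt(norm_sq) + eps)

def routing_modulus(features, W, num_iters=3):
    """Route the input features to one output vector and return its modulus length."""
    u_hat = features @ W                      # linear transform with preset weights
    b = np.zeros(u_hat.shape[0])              # routing logits (agreement "similarities")
    for _ in range(num_iters):                # convergence condition: fixed iteration count
        c = np.exp(b) / np.exp(b).sum()       # coupling coefficients
        s = (c[:, None] * u_hat).sum(axis=0)  # weighted sum = current iteration center
        v = squash(s)
        b = b + u_hat @ v                     # update by similarity to the iteration center
    return np.linalg.norm(v)                  # the feature vector modulus length

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 16))              # stacked first and second features (toy)
W_pos = 0.1 * rng.normal(size=(16, 8))        # preset weights for positive example processing
W_neg = 0.1 * rng.normal(size=(16, 8))        # preset weights for negative example processing
print(routing_modulus(feats, W_pos), routing_modulus(feats, W_neg))
```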
  • The substitution probability refers to the probability that each candidate replacement word in the candidate replacement word set replaces the word to be detected; the substitution probability may be a percentage probability, a score value, or the like.
  • In the embodiment of the present application, the substitution probability corresponding to each candidate replacement word in the candidate replacement word set is calculated according to the positive example feature vector modulus length and the negative example feature vector modulus length.
  • The substitution probability includes, but is not limited to, the positive substitution probability and the negative substitution probability. The positive substitution probability is the probability that a candidate replacement word in the candidate replacement word set can replace the word to be detected, and the negative substitution probability is the probability that a candidate replacement word in the candidate replacement word set cannot replace the word to be detected.
  • In an embodiment, the substitution probability corresponding to each candidate replacement word in the candidate replacement word set can be calculated by a formula over the two modulus lengths, where P_pos refers to the positive substitution probability, P_neg refers to the negative substitution probability, V_pos refers to the positive example feature vector modulus length, and V_neg refers to the negative example feature vector modulus length.
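The formula itself is not reproduced above. One normalization consistent with these variable definitions, offered as an assumption rather than the patent's exact formula, is:

$$P_{pos} = \frac{V_{pos}}{V_{pos} + V_{neg}}, \qquad P_{neg} = \frac{V_{neg}}{V_{pos} + V_{neg}}$$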
  • Step 208: Determine the target substitute word according to the substitution probability corresponding to each candidate replacement word.
  • The target substitute word refers to a word in the candidate replacement word set that can replace the word to be detected in the text to be detected.
  • In the embodiment of the present application, the target substitute word is determined from the substitution probabilities corresponding to the candidate replacement words according to a preset rule, and the preset rule can be customized. For example, the candidate replacement word with the highest substitution probability may be determined as the target substitute word. Alternatively, since the substitution probability of each candidate replacement word may include a positive substitution probability (the probability that the candidate replacement word can replace the word to be detected) and a negative substitution probability (the probability that it cannot), the target substitute word can be determined from the candidate replacement word set according to the positive substitution probability, for example by determining the candidate replacement word with the highest positive substitution probability as the target substitute word.
  • For example, the substitution probability corresponding to each candidate replacement word includes the positive substitution probability and the negative substitution probability. The candidate replacement word set includes word a, word b, and word c: the positive substitution probability corresponding to word a is 0.7 and its negative substitution probability is 0.3, the positive substitution probability corresponding to word b is 0.8 and its negative substitution probability is 0.2, and the positive substitution probability corresponding to word c is 0.4 and its negative substitution probability is 0.6. If the rule for determining the target substitute word is to select the candidate replacement word with the highest positive substitution probability, the target substitute word is word b.
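A one-line sketch of this selection rule, using the probabilities from the example above:

```python
# Pick the candidate replacement word with the highest positive substitution probability.
positive_probs = {"word a": 0.7, "word b": 0.8, "word c": 0.4}
target_substitute = max(positive_probs, key=positive_probs.get)
print(target_substitute)  # -> word b
```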
  • Step 210: Insert the target substitute word into the text to be detected at the position corresponding to the word to be detected to obtain the target text.
  • Inserting refers to writing or placing the target substitute word into the position corresponding to the word to be detected in the text to be detected.
  • the position of the word to be detected in the text to be detected is determined, and the target substitute word is inserted into the position of the word to be detected, thereby obtaining the target text.
  • Determining the position of the word to be detected in the text to be detected may involve first segmenting the text to be detected to obtain multiple words, performing syntactic analysis on the multiple words to obtain the syntactic analysis result, and then determining the position of the word to be detected in the text to be detected according to the syntactic analysis result.
  • For example, the text to be detected is: "Xiao Ming ate a little apple, it is very sweet", and the target substitute word determined from the candidate replacement word set in the embodiment of the application is "little apple". First, the position of the word to be detected in the text to be detected is determined to be just before "very sweet"; the target substitute word is then inserted at the position corresponding to the word to be detected, finally yielding the target text: "Xiao Ming ate a little apple, the little apple is very sweet".
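A minimal sketch of this insertion step, using an English gloss of the running example; the token list and detected position are illustrative assumptions:

```python
# Splice the target substitute word into the position of the word to be detected.
tokens = ["Xiao Ming", "ate", "a", "little apple", ",", "very sweet"]
detected_position = 5                  # the omitted word sits just before "very sweet"
target_substitute = "little apple"
tokens.insert(detected_position, target_substitute)
print(" ".join(tokens))                # "Xiao Ming ate a little apple , little apple very sweet"
```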
  • The pronoun resolution neural network can make good use of the features corresponding to the context word sequence and the candidate replacement words in the text to be detected: performing positive example iterative processing and negative example iterative processing on these features yields the corresponding positive example feature vector modulus length and negative example feature vector modulus length, and the substitution probability corresponding to each candidate replacement word in the candidate replacement word set is finally calculated from the two. Because the pronoun resolution neural network can well fuse the features corresponding to the context word sequence and the candidate replacement words, and because these features include the features of the character sequences corresponding to the word sequences in addition to the features of the word sequences themselves, the data-level sparseness problem is well alleviated, which improves the accuracy of the substitution probability corresponding to each candidate replacement word in the candidate replacement word set and thereby the accuracy of pronoun resolution.
  • In an embodiment, after the pronoun resolution neural network performs feature extraction on the context word set and the candidate replacement word set to obtain the corresponding first feature and second feature, the method further includes: the pronoun resolution neural network performs dimension transformation and length scaling on the first feature and the second feature to obtain the corresponding first target feature and second target feature.
  • Because the first feature and the second feature suffer from dimension diversity and length-range diversity, their dimensions and lengths are not uniform, which affects the accuracy of the subsequent calculation of the positive example feature vector modulus length and the negative example feature vector modulus length. Therefore, after the pronoun resolution neural network performs feature extraction on the context word set and the candidate replacement word set to obtain the corresponding first feature and second feature, dimension transformation and length scaling are performed on the first feature and the second feature so that they overcome the dimension diversity and length diversity problems, ensuring the accuracy of the subsequent calculation of the positive example and negative example feature vector modulus lengths.
  • The first target feature refers to the first feature after dimension transformation and length scaling, and the second target feature refers to the second feature after dimension transformation and length scaling.
  • A feature transformation sub-network in the pronoun resolution neural network can perform the dimension transformation and length scaling on the first feature and the second feature to obtain the first target feature and the second target feature: the first feature and the second feature are first dimension-scaled through a linear transformation function to obtain the corresponding intermediate features, and the intermediate features are then length-scaled through a length scaling function to obtain the first target feature corresponding to the first feature and the second target feature corresponding to the second feature.
  • The dimension scaling that produces the corresponding intermediate feature can be performed by a linear transformation formula in which w_i is the preset weight coefficient corresponding to the first feature or the second feature, f_i is the first feature or the second feature, b_i is a bias parameter obtained by training, the squash function is the squeeze function, and u_i is the intermediate feature corresponding to the first feature or the second feature.
  • The corresponding intermediate feature is then scaled by the length scaling function to obtain the first target feature corresponding to the first feature and the second target feature corresponding to the second feature, where the squash function is the squeeze function and u_i is the intermediate feature corresponding to the first feature or the second feature.
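Given these variable definitions, a plausible reconstruction of the two formulas, assuming the standard squeeze function from the capsule network literature (which matches the description of scaling a vector's length while preserving its direction), is:

$$u_i = \operatorname{squash}(w_i f_i + b_i), \qquad \operatorname{squash}(x) = \frac{\|x\|^2}{1 + \|x\|^2} \cdot \frac{x}{\|x\|}$$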
  • In an embodiment, the pronoun resolution neural network performing positive example iterative processing based on the first feature and the second feature to obtain the corresponding positive example feature vector modulus length, and performing negative example iterative processing based on the first feature and the second feature to obtain the corresponding negative example feature vector modulus length, includes: the pronoun resolution neural network performs positive example iterative processing according to the first target feature and the second target feature to obtain the corresponding positive example feature vector modulus length, and performs negative example iterative processing according to the first target feature and the second target feature to obtain the corresponding negative example feature vector modulus length.
  • That is, the pronoun resolution neural network performs positive example iterative processing according to the first target feature and the second target feature to obtain the corresponding positive example feature vector modulus length, and performs negative example iterative processing according to the first target feature and the second target feature to obtain the corresponding negative example feature vector modulus length. For the specific process, refer to the steps of performing positive example iterative processing based on the first feature and the second feature to obtain the corresponding positive example feature vector modulus length and performing negative example iterative processing based on the first feature and the second feature to obtain the corresponding negative example feature vector modulus length, which will not be repeated here.
  • determining the context word set and candidate replacement word set corresponding to the word to be detected in the text to be detected includes:
  • Step 302: Segment the text to be detected to obtain multiple words.
  • Segmentation refers to dividing a piece of text data into multiple words, and the segmentation method can be set according to actual needs. For example, one or more of a string-matching-based segmentation method, an understanding-based segmentation method, or a statistics-based segmentation method may be used. It is also possible to use segmentation tools such as the jieba ("stutter") segmentation tool or the HanLP segmentation tool to segment the text to be detected. After segmentation, the word sequence arranged according to the word order of the text to be detected is obtained, as sketched below.
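For example, segmentation with the jieba tool mentioned above can be sketched as follows; the exact segmentation output depends on the tool version:

```python
# Segment the text to be detected into its word sequence with jieba.
import jieba

text = "小明吃了个小苹果，很甜"  # "Xiao Ming ate a little apple, (it) is very sweet"
words = jieba.lcut(text)        # words in the original order of the text
print(words)                    # e.g. ['小明', '吃', '了', '个', '小苹果', '，', '很', '甜']
```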
  • Step 304: Perform syntactic analysis on each word, and determine the position of the word to be detected according to the syntactic analysis result.
  • Syntactic analysis analyzes the grammatical function, in the text to be detected, of the words obtained by segmentation, yielding the syntactic analysis result, which can be a syntactic structure. A syntactic structure is the combination of words according to certain rules. For example, in "I came late", "I" is the subject, "came" is the predicate, and "late" is the complement, so the corresponding syntactic structure can be: subject + predicate + complement. For "Xiao Ming ate a little apple, it is very sweet", the corresponding syntactic structure can be: noun phrase + verb phrase + quantifier + noun phrase + word to be detected + adjective phrase.
  • The position of the word to be detected can be determined according to the syntactic analysis result: since the word to be detected is the omitted or missing part of the text to be detected, once syntactic analysis is performed on each word and the syntactic analysis result is obtained, the position of the word to be detected can be detected from that result.
  • For example, the text to be detected is: "Xiao Ming ate a little apple, it is very sweet", and the multiple words obtained by segmentation are: "Xiao Ming", "ate", "a", "little apple", and "very sweet". Performing syntactic analysis on the segmented words gives the result: noun phrase + verb phrase + quantifier + noun phrase + word to be detected + adjective phrase. It follows that the position of the word to be detected in the text to be detected is the position before "very sweet", that is, part of the content before "very sweet" is omitted or missing.
  • Step 306: Obtain the preceding word sequence and the following word sequence according to the position of the word to be detected, and compose the context word set from the preceding word sequence and the following word sequence.
  • The preceding word sequence of the word to be detected is the word sequence composed of the forward words before the position of the word to be detected, and the following word sequence is the word sequence composed of the backward words after that position, both taking the position of the word to be detected as the center.
  • For example, the text to be detected is: "Xiao Ming ate a little apple, it is very sweet, and he is in a wonderful mood". The text to be detected is first segmented into multiple words: "Xiao Ming", "ate", "a", "little apple", "very sweet", "he", "mood", "super", and "wonderful". The preceding word sequence and the following word sequence are obtained from the position of the word to be detected and together compose the context word set.
  • Step 308: Obtain candidate replacement words according to the syntactic analysis result, and compose the candidate replacement word set from the candidate replacement words.
  • The candidate replacement words are candidate words used to replace the word to be detected, and may be noun phrases and the like.
  • In the embodiment of the present application, candidate replacement words are obtained from the syntactic analysis result according to a preset screening rule. The preset screening rule can be customized: for example, noun phrases may be taken as candidate replacement words based on the syntactic structure, or adjectives may be taken as candidate replacement words based on the syntactic structure. Finally, the candidate replacement word set is composed of the candidate replacement words.
  • For example, the text to be detected is: "Xiao Ming ate a little apple, it is very sweet, and he is in a wonderful mood". The text to be detected is first segmented into multiple words: "Xiao Ming", "ate", "a", "little apple", "very sweet", "he", "mood", "super", and "wonderful". The preset screening rule for candidate replacement words is to select noun phrases from the words corresponding to the text to be detected as candidate replacement words, so the candidate replacement words obtained by screening are "Xiao Ming" and "little apple", which compose the candidate replacement word set.
  • the pronoun resolution neural network separately performs feature extraction on the context word set and the candidate replacement word set to obtain the corresponding first feature and second feature, including:
  • The pronoun resolution neural network compresses and represents the word sequences in the context word set through the forward feature representation sub-network and the backward feature representation sub-network to obtain the corresponding first forward sub-feature and first backward sub-feature.
  • the forward feature representation sub-network and the backward feature representation sub-network are both used to perform feature operations on word sequences to obtain corresponding forward sub-features and backward sub-features.
  • compressed representation is the process of performing feature operations on word sequences to obtain corresponding sub-features.
  • the forward feature representation sub-network and the backward feature representation sub-network can be two LSTM neural sub-networks.
  • The pronoun resolution neural network performs feature extraction on the preceding word sequence in the context word set through the forward feature representation sub-network to obtain the first forward sub-feature corresponding to the preceding word sequence, and at the same time performs feature extraction on the following word sequence in the context word set through the backward feature representation sub-network to obtain the first backward sub-feature corresponding to the following word sequence.
  • The pronoun resolution neural network compresses and represents the character sequence corresponding to the word sequences in the context word set to obtain the first character vector sub-feature; the first forward sub-feature, the first backward sub-feature, and the first character vector sub-feature constitute the first feature corresponding to the context word set.
  • the pronoun resolution neural network also includes a character vector feature representation sub-network for performing feature extraction on the character sequence corresponding to a word sequence.
  • the pronoun resolution neural network therefore performs feature extraction, through the character vector feature representation sub-network, on the character sequence corresponding to the word sequences in the context word set, and obtains the corresponding first character vector sub-feature.
  • specifically, the pronoun resolution neural network performs feature extraction on the character sequence of the above word sequence in the context word set through the character vector feature representation sub-network to obtain the character vector sub-feature corresponding to the above word sequence, and performs feature extraction on the character sequence of the following word sequence through the same sub-network to obtain the character vector sub-feature corresponding to the following word sequence; the first character vector sub-feature is composed of these two character vector sub-features.
  • the first forward sub-feature, the first backward sub-feature and the first character vector sub-feature are combined into the first feature corresponding to the context word set.
  • this can be expressed as:

    f_0 = LSTM_forward(zp_pre_words_1, ..., zp_pre_words_N)
    f_1 = LSTM_backward(zp_pre_words_1, ..., zp_pre_words_N)
    f_2 = BERT(zp_pre_chars_1, ..., zp_pre_chars_M)

  • where f_0 is the first forward sub-feature, f_1 is the first backward sub-feature, f_2 is the first character vector sub-feature, LSTM_forward is the forward feature representation sub-network, LSTM_backward is the backward feature representation sub-network, BERT is the character vector feature representation sub-network, zp_pre_words is the word sequence in the context word set, zp_pre_chars is the character sequence corresponding to that word sequence, N is the number of words in the word sequence, and M is the number of characters in the character sequence.
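  • A minimal sketch of this feature representation step, assuming single-layer LSTMs and a stand-in embedding in place of BERT (all dimensions and names here are illustrative):

```python
import torch
import torch.nn as nn

class FeatureRepresentation(nn.Module):
    """Sketch of f0/f1/f2 extraction; shapes and the BERT stand-in are assumptions."""
    def __init__(self, word_dim=128, char_vocab=6000, hidden=128):
        super().__init__()
        self.fwd_lstm = nn.LSTM(word_dim, hidden, batch_first=True)   # LSTM_forward
        self.bwd_lstm = nn.LSTM(word_dim, hidden, batch_first=True)   # LSTM_backward
        self.char_embed = nn.EmbeddingBag(char_vocab, hidden)         # stand-in for BERT

    def forward(self, word_vecs, char_ids):
        # word_vecs: (1, N, word_dim) word sequence; char_ids: (1, M) character sequence
        _, (f0, _) = self.fwd_lstm(word_vecs)                    # first forward sub-feature
        _, (f1, _) = self.bwd_lstm(torch.flip(word_vecs, [1]))   # first backward sub-feature
        f2 = self.char_embed(char_ids)                           # first character vector sub-feature
        return f0.squeeze(0), f1.squeeze(0), f2
```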
  • Step 406 The pronoun resolution neural network compresses and represents the word sequences in the candidate replacement word set through the forward feature representation sub-network and the backward feature representation sub-network to obtain the corresponding second forward sub-feature and second backward sub-feature.
  • the pronoun resolution neural network performs feature extraction on the candidate replacement words in the candidate replacement word set through the forward feature representation sub-network to obtain the second forward sub-feature corresponding to the candidate replacement words, and at the same time performs feature extraction on the candidate replacement words through the backward feature representation sub-network to obtain the second backward sub-feature corresponding to the candidate replacement words.
  • Step 408 The pronoun resolution neural network compresses and represents the character sequence corresponding to the word sequences in the candidate replacement word set to obtain the second character vector sub-feature, and the second forward sub-feature, the second backward sub-feature and the second character vector sub-feature constitute the second feature corresponding to the candidate replacement word set.
  • specifically, the pronoun resolution neural network includes a character vector feature representation sub-network, which is a sub-network for performing feature extraction on the character sequence corresponding to a word sequence; therefore, the pronoun resolution neural network performs feature extraction, through the character vector feature representation sub-network, on the character sequence corresponding to the candidate replacement words in the candidate replacement word set to obtain the second character vector sub-feature corresponding to the candidate replacement words.
  • the second forward sub-feature, the second backward sub-feature and the second character vector sub-feature are combined into the second feature corresponding to the candidate replacement word set.
  • the pronoun resolution neural network performs positive example iterative processing according to the first feature and the second feature to obtain the corresponding positive example feature vector modulus, including:
  • Step 502 Calculate the initial positive iteration center of the positive iteration process based on the first feature and the second feature, and use the initial positive iteration center as the current positive iteration center.
  • after the pronoun resolution neural network obtains the first feature and the second feature, it needs to perform positive example iterative processing on the first feature and the second feature. First, it needs to obtain the initial positive example iteration center of the positive example iterative processing, and use the initial positive example iteration center as the current positive example iteration center.
  • the current positive iteration center here is the reference center that is undergoing positive iteration processing.
  • the initial positive example iteration center can be calculated based on the first feature and the second feature, and the calculation method can be customized: for example, a weighted summation of the first feature and the second feature can be performed and the weighted summation result used as the initial positive example iteration center, or the mean value of the first feature and the second feature can be calculated and the mean value used as the initial positive example iteration center, and so on.
  • FIG. 6 shows a schematic diagram of the code implementation of positive example iterative processing (and, with different weights, negative example iterative processing) in one embodiment. As shown in FIG. 6, u_i represents the first feature or the second feature, and k_j represents the current positive example iteration center.
  • in FIG. 6, the initial positive example iteration center is obtained by performing a weighted summation of the first feature and the second feature and then applying a tanh transformation.
  • the initial positive example iteration center can be calculated, for example, as:

    k_j = tanh( Σ_{i=1..l} w_i · u_i )

  • where l represents the total number of the first features and second features, u_i represents the first feature or the second feature, w_i is the summation weight (for example, 1/l for a mean), and k_j represents the initial positive example iteration center.
  • Step 504 Perform linear transformation on the first feature and the second feature respectively according to the preset positive example weight coefficients to obtain the corresponding first positive intermediate feature and the second positive intermediate feature.
  • the preset positive example weight coefficients are the weight coefficients used to linearly transform the first feature and the second feature in the positive example iterative processing, and they are obtained from the training of the pronoun resolution neural network; that is, during positive example iterative processing, the weight coefficients for the linear transformation of the first feature and of the second feature are both preset positive example weight coefficients.
  • the first feature and the second feature are respectively linearly transformed according to preset positive example weight coefficients to obtain the corresponding first positive example intermediate feature and second positive example intermediate feature.
  • the linear transformation may specifically be the product of the preset positive weight coefficient and the first feature to obtain the first positive intermediate feature, and the product of the preset positive weight coefficient and the second feature to obtain the second positive intermediate feature.
  • in FIG. 6, u_i represents the first feature or the second feature, and û_i represents the positive example intermediate feature corresponding to u_i: if u_i is the first feature, û_i is the first positive example intermediate feature; if u_i is the second feature, û_i is the second positive example intermediate feature. W^pos is the preset positive example weight coefficient used during positive example iterative processing.
  • the first feature and the second feature can each be linearly transformed, for example, as:

    û_i = W^pos · u_i
  • Step 506 Perform similarity calculations on the first positive example intermediate feature and the second positive example intermediate feature respectively with the current positive example iteration center to obtain the corresponding first positive example similarity and second positive example similarity.
  • the similarity is a measure to comprehensively evaluate the similarity between two things.
  • the similarity here evaluates how similar a positive example intermediate feature is to the current positive example iteration center: the higher the similarity, the closer the positive example intermediate feature is to the current positive example iteration center; conversely, a low similarity means the positive example intermediate feature is not similar to the current positive example iteration center.
  • the first positive example intermediate feature and the second positive example intermediate feature are each subjected to a similarity calculation with the current positive example iteration center to obtain the corresponding first positive example similarity and second positive example similarity.
  • the similarity calculation method can be customized, and the customization can be but not limited to Euclidean distance, cosine similarity, and so on.
  • the positive example similarity can be calculated, for example, as a dot product between the positive example intermediate feature and the current positive example iteration center:

    b_ij = û_i · k_j
  • Step 508 Perform a normalization operation on the similarity of the first positive example and the similarity of the second positive example to obtain the corresponding intermediate similarity of the first positive example and the intermediate similarity of the second positive example.
  • the normalization operation is a way to simplify calculations, that is, a dimensional expression is transformed into a dimensionless expression and becomes a scalar. For example, change the similarity of the positive example to a decimal between (0,1), or change the similarity of the positive example to 0 or 1, etc., to convert a dimensional expression to a dimensionless expression.
  • specifically, the normalization operation can be performed by a softmax function (normalized exponential function).
  • in FIG. 6, c_j is the positive example intermediate similarity obtained after the normalization operation, and b_j is the first positive example similarity or the second positive example similarity: if b_j is the first positive example similarity, c_j is the first positive example intermediate similarity; if b_j is the second positive example similarity, c_j is the second positive example intermediate similarity.
  • the positive example intermediate similarity can be calculated, for example, as:

    c_ij = exp(b_ij) / Σ_{j'} exp(b_ij')
  • Step 510 Calculate the initial positive example feature vector modulus length according to the intermediate similarity of the first positive example and the corresponding intermediate feature of the first positive example, the second positive example similarity and the corresponding intermediate feature of the second positive example.
  • the initial positive example feature vector modulus length refers to the positive example feature vector modulus length obtained in the first positive example iteration. Specifically, it can be calculated based on the first positive example intermediate similarity and the corresponding first positive example intermediate feature, and the second positive example intermediate similarity and the corresponding second positive example intermediate feature. The calculation method can be customized: for example, the intermediate similarities can be used to weight and sum the corresponding intermediate features, and the result used as the initial positive example feature vector modulus length, and so on.
  • in FIG. 6, v_j represents the positive example feature vector modulus length and c_ij represents the positive example intermediate similarity.
  • the positive example feature vector modulus length can be calculated, for example, as:

    v_j = squash( Σ_{i=1..l} c_ij · û_i )

  • where the squash function is a squeeze function that maps an input of arbitrary magnitude to a norm in the interval 0 to 1, for example squash(s) = (‖s‖² / (1 + ‖s‖²)) · (s / ‖s‖), and l is the total number of the first features and second features.
  • Step 512 Calculate the positive example update iteration center based on the initial positive example feature vector modulus length and the initial positive example iteration center, use the positive example update iteration center as the current positive example iteration center, and return to the step of performing similarity calculations on the first positive example intermediate feature and the second positive example intermediate feature with the current positive example iteration center, until the convergence condition is met and the positive example feature vector modulus length is obtained.
  • since the convergence condition of the positive example iterative processing is set in advance, the initial positive example feature vector modulus length obtained by the first calculation cannot be taken as the final positive example feature vector modulus length; the positive example iterative processing must continue until the convergence condition is satisfied, after which the positive example feature vector modulus length can be output.
  • the convergence condition can be customized: for example, it can be considered satisfied when a preset number of iterations is reached, or when the positive example feature vector modulus length meets a preset modulus length value.
  • specifically, the positive example update iteration center is calculated according to the initial positive example feature vector modulus length and the initial positive example iteration center, the positive example update iteration center is used as the current positive example iteration center, and the process returns to the step of calculating the similarity between the first positive example intermediate feature and the second positive example intermediate feature and the current positive example iteration center; the positive example iterative processing continues until the convergence condition is met and the positive example feature vector modulus length is obtained.
  • the calculation method of the positive example update iteration center can be customized: for example, the mean of the initial positive example feature vector modulus length and the initial positive example iteration center can be calculated and used as the positive example update iteration center, or the two can be weighted and summed and the weighted summation result used as the positive example update iteration center.
  • the 14th step in FIG. 6 calculates the positive example update iteration center. Here the positive example update iteration center is the mean of the initial positive example feature vector modulus length and the initial positive example iteration center, for example:

    k_j ← (k_j + v_j) / 2
  • the 16th step obtains the final positive example feature vector modulus length from the positive example feature vector modulus length obtained when the convergence condition was last satisfied, for example:

    v_j ← w_j · v_j

  • where w_j is the preset weight coefficient corresponding to the positive example iterative processing, the v_j on the left side of the equation is the final positive example feature vector modulus length, and the v_j on the right side is the positive example feature vector modulus length obtained when the convergence condition was last satisfied.
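  • Putting steps 502-512 together, a minimal sketch of the positive example iterative processing might look as follows (the dot-product similarity, the fixed iteration count, and all names are assumptions for illustration, not the patent's exact computation):

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squeeze a vector so that its norm falls in the interval 0 to 1."""
    n2 = float(np.dot(s, s))
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def iterate_cluster(features, W, w_out, n_iters=3):
    """Sketch of the iterative processing of steps 502-512.

    features: (l, d) array of first and second features u_i
    W:        (d, d) preset weight coefficient for the linear transformation
    w_out:    scalar preset weight applied to the final modulus length
    """
    k = np.tanh(features.mean(axis=0))                 # step 502: initial iteration center
    u_hat = features @ W                               # step 504: linear transformation
    for _ in range(n_iters):                           # convergence: fixed iteration count
        b = u_hat @ k                                  # step 506: similarities to center
        c = np.exp(b) / np.exp(b).sum()                # step 508: softmax normalization
        v = squash((c[:, None] * u_hat).sum(axis=0))   # step 510: feature vector
        k = (k + v) / 2.0                              # step 512: update iteration center
    return w_out * float(np.linalg.norm(v))            # final feature vector modulus length
```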
  • performing counter-example iterative processing according to the first feature and the second feature to obtain the corresponding counter-example feature vector modulus length includes:
  • Step 602 Calculate the initial counter example iteration center of the counter example iteration process based on the first feature and the second feature, and use the initial counter example iteration center as the current counter example iteration center.
  • after the pronoun resolution neural network obtains the first feature and the second feature, it needs to perform counter-example iterative processing on the first feature and the second feature. First, it needs to obtain the initial counter-example iteration center of the counter-example iterative processing, and use the initial counter-example iteration center as the current counter-example iteration center.
  • the current counter-example iteration center here is the reference center that is undergoing counter-example iteration processing.
  • the initial counter-example iteration center can be calculated based on the first feature and the second feature, and the calculation method can be customized: for example, a weighted summation of the first feature and the second feature can be performed and the weighted summation result used as the initial counter-example iteration center, or the mean value of the first feature and the second feature can be calculated and the mean value used as the initial counter-example iteration center, and so on.
  • FIG. 6 likewise shows a schematic diagram of the code implementation of counter-example iterative processing in one embodiment. As shown in FIG. 6, u_i represents the first feature or the second feature, and k_j represents the current counter-example iteration center.
  • in FIG. 6, the initial counter-example iteration center is obtained by performing a weighted summation of the first feature and the second feature and then applying a tanh transformation.
  • the initial counter-example iteration center can be calculated, for example, as:

    k_j = tanh( Σ_{i=1..l} w_i · u_i )

  • where l represents the total number of the first features and second features, u_i represents the first feature or the second feature, w_i is the summation weight, and k_j represents the initial counter-example iteration center.
  • Step 604 Perform linear transformation on the first feature and the second feature respectively according to the preset counterexample weight coefficients to obtain the corresponding first counterexample intermediate feature and the second counterexample intermediate feature.
  • the preset counter-example weight coefficients are the weight coefficients used to linearly transform the first feature and the second feature in the counter-example iterative processing, and they are obtained from the training of the pronoun resolution neural network; that is, during counter-example iterative processing, the weight coefficients for the linear transformation of the first feature and of the second feature are both preset counter-example weight coefficients.
  • linear transformation is performed on the first feature and the second feature respectively according to preset counterexample weight coefficients to obtain the corresponding first counterexample intermediate feature and second counterexample intermediate feature.
  • the linear transformation may specifically be the product calculation of the preset counterexample weight coefficient and the first feature to obtain the first counterexample intermediate feature, and the product calculation of the preset counterexample weight coefficient and the second feature to obtain the second counterexample intermediate feature.
  • in FIG. 6, u_i represents the first feature or the second feature, and û_i represents the counter-example intermediate feature corresponding to u_i: if u_i is the first feature, û_i is the first counter-example intermediate feature; if u_i is the second feature, û_i is the second counter-example intermediate feature. W^neg is the preset counter-example weight coefficient used during counter-example iterative processing.
  • the first feature and the second feature can each be linearly transformed, for example, as:

    û_i = W^neg · u_i
  • Step 606 Perform similarity calculations on the first counterexample intermediate feature and the second counterexample intermediate feature respectively with the current counterexample iteration center to obtain the corresponding first counterexample similarity and second counterexample similarity.
  • similarity is a measure to comprehensively evaluate the similarity between two things.
  • the similarity here evaluates how similar a counter-example intermediate feature is to the current counter-example iteration center: the higher the similarity, the closer the counter-example intermediate feature is to the current counter-example iteration center; conversely, a low similarity means the counter-example intermediate feature is not similar to the current counter-example iteration center.
  • the first counter-example intermediate feature and the second counter-example intermediate feature are each subjected to a similarity calculation with the current counter-example iteration center to obtain the corresponding first counter-example similarity and second counter-example similarity.
  • the similarity calculation method can be customized, and the customization can be but not limited to Euclidean distance, cosine similarity, and so on.
  • the counter-example similarity can be calculated, for example, as a dot product between the counter-example intermediate feature and the current counter-example iteration center:

    b_ij = û_i · k_j
  • Step 608 Perform a normalization operation on the similarity of the first counterexample and the similarity of the second counterexample to obtain the corresponding intermediate similarity of the first counterexample and the intermediate similarity of the second counterexample.
  • the normalization operation is a way to simplify calculations, that is, a dimensional expression is transformed into a dimensionless expression and becomes a scalar. For example, change the counterexample similarity to a decimal between (0,1), or change the counterexample similarity to 0 or 1, etc., to convert a dimensional expression to a dimensionless expression.
  • specifically, the normalization operation can be performed by a softmax function (normalized exponential function).
  • in FIG. 6, c_j is the counter-example intermediate similarity obtained after the normalization operation, and b_j is the first counter-example similarity or the second counter-example similarity: if b_j is the first counter-example similarity, c_j is the first counter-example intermediate similarity; if b_j is the second counter-example similarity, c_j is the second counter-example intermediate similarity.
  • the counter-example intermediate similarity can be calculated, for example, as:

    c_ij = exp(b_ij) / Σ_{j'} exp(b_ij')
  • Step 610 Calculate the initial counterexample feature vector modulus length based on the first counterexample intermediate similarity and the corresponding first counterexample intermediate feature, the second counterexample similarity and the corresponding second counterexample intermediate feature.
  • the initial counter-example feature vector modulus length refers to the counter-example feature vector modulus length obtained in the first counter-example iteration. Specifically, it can be calculated based on the first counter-example intermediate similarity and the corresponding first counter-example intermediate feature, and the second counter-example intermediate similarity and the corresponding second counter-example intermediate feature.
  • the calculation method can be customized: for example, the intermediate similarities and the corresponding intermediate features can be summed and the summation result used as the initial counter-example feature vector modulus length, or their mean can be calculated and the mean calculation result used as the initial counter-example feature vector modulus length, and so on.
  • in FIG. 6, v_j represents the counter-example feature vector modulus length and c_ij represents the counter-example intermediate similarity.
  • the counter-example feature vector modulus length can be calculated, for example, as:

    v_j = squash( Σ_{i=1..l} c_ij · û_i )

  • where the squash function is a squeeze function that maps an input of arbitrary magnitude to a norm in the interval 0 to 1, and l is the total number of the first features and second features.
  • Step 612 Calculate the counter-example update iteration center based on the initial counter-example feature vector modulus length and the initial counter-example iteration center, use the counter-example update iteration center as the current counter-example iteration center, and return to the step of performing similarity calculations on the first counter-example intermediate feature and the second counter-example intermediate feature with the current counter-example iteration center, until the convergence condition is met and the counter-example feature vector modulus length is obtained.
  • since the convergence condition of the counter-example iterative processing is set in advance, the initial counter-example feature vector modulus length obtained by the first calculation cannot be taken as the final counter-example feature vector modulus length; the counter-example iterative processing must continue until the convergence condition is satisfied, after which the counter-example feature vector modulus length can be output.
  • the convergence condition can be customized: for example, it can be considered satisfied when a preset number of iterations is reached, or when the counter-example feature vector modulus length meets a preset modulus length value.
  • specifically, the counter-example update iteration center is calculated according to the initial counter-example feature vector modulus length and the initial counter-example iteration center, the counter-example update iteration center is used as the current counter-example iteration center, and the process returns to the step of calculating the similarity between the first counter-example intermediate feature and the second counter-example intermediate feature and the current counter-example iteration center; the counter-example iterative processing continues until the convergence condition is met and the counter-example feature vector modulus length is obtained.
  • the calculation method of the counter-example update iteration center can be customized: for example, the mean of the initial counter-example feature vector modulus length and the initial counter-example iteration center can be calculated and used as the counter-example update iteration center, or the two can be weighted and summed and the weighted summation result used as the counter-example update iteration center.
  • the 14th step in FIG. 6 calculates the counter-example update iteration center. Here the counter-example update iteration center is the mean of the initial counter-example feature vector modulus length and the initial counter-example iteration center, for example:

    k_j ← (k_j + v_j) / 2

  • when the convergence condition is met, the counter-example feature vector modulus length can be output.
  • the 16th step obtains the final counter-example feature vector modulus length from the counter-example feature vector modulus length obtained when the convergence condition was last satisfied, for example:

    v_j ← w_j · v_j

  • where w_j is the preset weight coefficient corresponding to the counter-example iterative processing, the v_j on the left side of the equation is the final counter-example feature vector modulus length, and the v_j on the right side is the counter-example feature vector modulus length obtained when the convergence condition was last satisfied.
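  • Since the positive example and counter-example iterative processing differ only in their preset weight coefficients, the same routine from the sketch above could serve both passes (all variable names here are illustrative):

```python
# one routine, two sets of preset weight coefficients (illustrative names)
v_pos = iterate_cluster(features, W_pos, w_pos)  # positive example feature vector modulus length
v_neg = iterate_cluster(features, W_neg, w_neg)  # counter-example feature vector modulus length
```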
  • a training method for pronoun resolution neural network is provided.
  • in this embodiment, the method is described by taking its application to the terminal 110 or the server 120 in FIG. 1 as an example.
  • the training method of the pronoun resolution neural network specifically includes the following steps:
  • Step 702 Obtain training text, and the training text has a corresponding standard training text label.
  • the training text is input data that needs to be trained on the pronoun resolution neural network, and the training text may be one or more.
  • the training text can be obtained in real time or stored in advance.
  • Step 704 Determine the training context word set and training candidate replacement word set corresponding to the word to be detected in the training text.
  • the training text needs to be preprocessed; specifically, the training context word set and the training candidate replacement word set corresponding to the word to be detected in the training text are determined.
  • specifically, first segment the training text to obtain multiple words, perform syntactic analysis on the multiple words, and determine the position of the word to be detected; obtain the training forward words and the training backward words according to the position of the word to be detected, with the obtained training forward words forming the training above word sequence and the training backward words forming the training following word sequence; the training context word set is then formed from the training above word sequence and the training following word sequence, as sketched below.
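  • A minimal sketch of this preprocessing split, assuming the segmented words and the detected position are already available (both inputs are illustrative):

```python
def build_training_context(words, zp_index):
    """Split segmented training words around the word-to-detect position."""
    above_sequence = words[:zp_index]      # training above word sequence
    following_sequence = words[zp_index:]  # training following word sequence
    return above_sequence, following_sequence

words = ["Xiao Ming", "eat", "apple", "very sweet"]
above, following = build_training_context(words, zp_index=3)
context_word_set = (above, following)  # training context word set
```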
  • the training candidate substitute words are obtained according to the syntactic analysis result, and a training candidate substitute word set is formed by the obtained training candidate substitute words.
  • Step 706 Input the training context word set and the training candidate replacement word set into the initial pronoun resolution neural network, and the initial pronoun resolution neural network respectively performs feature extraction on the training context word set and the training candidate replacement word set to obtain the corresponding first training feature And the second training feature.
  • the initial pronoun resolution neural network is an untrained pronoun resolution neural network.
  • the initial pronoun resolution neural network can be a capsule network (Capsule Network), a support vector machine (Support Vector Machine, SVM) classifier model, an artificial neural network (Artificial Neural Network, ANN) classifier model, a logistic regression (Logistic Regression, LR) algorithm model, or the like.
  • the training context word set and the training candidate replacement word set are input into the initial pronoun resolution neural network, and the initial pronoun resolution neural network performs feature extraction on the training context word set through the feature representation sub-network to obtain the first training feature corresponding to the training context word set, and performs feature extraction on the training candidate replacement word set through the feature representation sub-network to obtain the second training feature corresponding to the training candidate replacement word set.
  • Step 708 The initial pronoun resolution neural network performs positive example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive example training feature vector modulus length, performs negative example iterative processing according to the first training feature and the second training feature to obtain the corresponding negative example training feature vector modulus length, and calculates, according to the positive example training feature vector modulus length and the negative example training feature vector modulus length, the training substitution possibility corresponding to each training candidate replacement word in the training candidate replacement word set.
  • the positive example iterative processing refers to the process of repeatedly and iteratively calculating features to obtain the positive example training feature vector modulus length; the positive example training feature vector modulus length is the length of the positive example training feature vector, and the positive example training feature vector is the feature vector corresponding to the positive example substitution possibility.
  • the positive example substitution possibility refers to the probability that the word to be detected in the training text matches each training candidate replacement word. The positive example iterative processing can be carried out through a custom dynamic routing algorithm to obtain the positive example training feature vector modulus length.
  • the same custom dynamic routing algorithm can be used for both the positive example iterative processing and the negative example iterative processing, because the two differ only in their corresponding preset training weight coefficients; the first training feature and the second training feature are therefore calculated by the custom dynamic routing algorithm to obtain the positive example training feature vector modulus length and the negative example training feature vector modulus length respectively.
  • the counter-example iterative processing refers to the process of repeatedly iteratively calculating the features to obtain the length of the counter-example training feature vector.
  • the counter-example training feature vector length refers to the length of the counter-example training feature vector.
  • the counter-example training feature vector is the feature vector corresponding to the possibility of counter-example substitution.
  • the possibility of counterexample substitution refers to the probability that the word to be detected in the training text does not match each candidate substitution word.
  • calculating the positive example training feature vector modulus length and the negative example training feature vector modulus length through the custom dynamic routing algorithm can proceed as follows: an initial iteration center is calculated based on the first training feature and the second training feature and used as the current iteration center for both the positive example iterative processing and the negative example iterative processing; the first training feature and the second training feature are then linearly transformed according to the preset training weight coefficients corresponding to the positive example iterative processing and to the negative example iterative processing, giving the first intermediate training feature and the second intermediate training feature for each of the two processes.
  • next, the similarity between each of the first and second intermediate training features and the current iteration center is calculated to obtain the first training similarity and the second training similarity for the positive example and negative example iterative processing, and these similarities are normalized to obtain the corresponding first intermediate training similarity and second intermediate training similarity.
  • the initial training feature vector modulus length corresponding to each of the positive example and negative example iterative processing is then calculated from the first intermediate training similarity and the second intermediate training similarity together with the corresponding first and second intermediate training features; the iteration center is updated from these intermediate training similarities and the initial iteration center, the updated iteration center is used as the current iteration center, and the iteration is repeated until convergence, yielding the positive example and negative example training feature vector modulus lengths, from which the training substitution possibility corresponding to each training candidate replacement word in the training candidate replacement word set is calculated.
  • the training substitution possibility includes but not limited to the positive training substitution possibility and the negative training substitution possibility.
  • the positive example training substitution possibility refers to the substitution possibility that each training candidate replacement word in the training candidate replacement word set can replace the word to be detected.
  • Counterexample substitution possibility refers to the substitution possibility that each training candidate substitution word in the training candidate substitution word set cannot replace the word to be detected.
  • the training substitution possibility corresponding to each candidate replacement word in the training candidate replacement word set can be calculated from the two modulus lengths, for example by a softmax normalization:

    P_pos = e^{V_pos} / (e^{V_pos} + e^{V_neg})
    P_neg = e^{V_neg} / (e^{V_pos} + e^{V_neg})

  • where P_pos refers to the positive example training substitution possibility, P_neg refers to the negative example training substitution possibility, V_pos refers to the positive example training feature vector modulus length, and V_neg refers to the negative example training feature vector modulus length.
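  • A minimal sketch of that normalization, under the softmax assumption above:

```python
import numpy as np

def training_substitution_possibility(v_pos, v_neg):
    """Normalize the two modulus lengths into paired possibilities (assumed softmax)."""
    e_pos, e_neg = np.exp(v_pos), np.exp(v_neg)
    return e_pos / (e_pos + e_neg), e_neg / (e_pos + e_neg)

p_pos, p_neg = training_substitution_possibility(0.9, 0.2)  # approx. (0.668, 0.332)
```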
  • Step 710 Calculate a training loss value according to the training substitution possibility corresponding to each training candidate replacement word and the corresponding standard training text label.
  • the training loss value is used to adjust the parameters of the initial pronoun resolution neural network model.
  • the training loss value can be calculated according to the training substitution possibility corresponding to each training candidate replacement word and the corresponding standard training text label.
  • the calculation method of the training loss value can be customized; for example, it can be calculated from the positive example training substitution possibility together with the corresponding standard training text label, and the negative example training substitution possibility together with the corresponding standard training text label.
  • the training loss value can be calculated, for example, by a cross-entropy formula of the form:

    J(θ) = - Σ_i [ y_i · log(P_pos) + (1 - y_i) · log(P_neg) ]

  • where P_pos refers to the positive example training substitution possibility, P_neg refers to the negative example training substitution possibility (computed from the positive example training feature vector modulus length V_pos and the negative example training feature vector modulus length V_neg), J(θ) is the training loss value, and y_i is the standard training text label of the training sample.
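  • A minimal sketch of such a loss, under the cross-entropy assumption above (array shapes are illustrative):

```python
import numpy as np

def training_loss(p_pos, p_neg, labels):
    """Assumed cross-entropy over paired substitution possibilities.

    p_pos, p_neg: possibilities per training candidate replacement word
    labels:       standard training text labels (1 = can replace, 0 = cannot)
    """
    p_pos, p_neg, y = map(np.asarray, (p_pos, p_neg, labels))
    return -np.sum(y * np.log(p_pos) + (1 - y) * np.log(p_neg))

loss = training_loss([0.9, 0.3], [0.1, 0.7], [1, 0])  # small worked example
```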
  • Step 712 Adjust the model parameters of the initial pronoun resolution neural network according to the training loss value until the convergence condition is met, obtaining the pronoun resolution neural network.
  • the model parameters of the initial pronoun resolution neural network are continuously adjusted according to the training loss value until the convergence condition is met, and the pronoun resolution neural network is obtained.
  • the convergence condition can be customized.
  • the customization can be that the training loss value no longer changes, or that the number of adjustments reaches a preset number, and so on; the initial pronoun resolution neural network can then be considered to meet the convergence condition, and the pronoun resolution neural network is obtained. A sketch of this adjustment loop follows.
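  • A minimal sketch of the parameter-adjustment loop, assuming a gradient-based optimizer and a model that returns the training loss value (both are assumptions; the patent does not fix the optimizer):

```python
import torch

def train(model, batches, epochs=10, lr=1e-3):
    """Adjust model parameters from the training loss value until convergence
    (here approximated by a fixed number of epochs)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for context_set, candidate_set, labels in batches:
            loss = model(context_set, candidate_set, labels)  # returns J(theta)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```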
  • the pronoun resolution neural network training method makes good use of the features of the context word set and the candidate replacement word set in the training text, and fuses the features corresponding to the context word set and the candidate replacement word set well, thereby improving the output accuracy of the pronoun resolution neural network.
  • the pronoun resolution neural network training method further includes: obtaining an anti-interference feature set; inputting the anti-interference features in the anti-interference feature set into the initial pronoun resolution neural network, and the initial pronoun resolution neural network is based on the first training feature , The second training feature and the anti-interference feature generate additional training features.
  • the anti-interference feature set is composed of features used to prevent interference from other features during the training of the initial pronoun resolution neural network.
  • the anti-interference feature set is input to the initial pronoun resolution neural network training to improve the output accuracy of the pronoun resolution neural network.
  • the anti-interference feature set may be composed of a set of artificially designed comprehensive features.
  • the anti-interference feature set is input to the initial pronoun resolution neural network, and the initial pronoun resolution neural network generates additional training features according to the first training feature, the second training feature, and the anti-interference feature.
  • the specific process of generating additional training features by the initial pronoun resolution neural network for the first training feature, the second training feature and the anti-interference feature can refer to the description of step 204 in FIG. 2, which will not be repeated here.
  • the initial pronoun resolution neural network performs positive iterative processing based on the first training feature and the second training feature to obtain the corresponding positive training feature vector modulus, and performs negative iterative processing based on the first training feature and the second training feature Obtain the corresponding negative training feature vector modulus, including: the initial pronoun resolution neural network performs positive example iterative processing according to the first training feature, second training feature, anti-interference feature and additional training feature to obtain the corresponding positive training feature vector modulus , According to the first training feature, the second training feature, the anti-interference feature, and the additional training feature, perform counter-example iterative processing to obtain the corresponding counter-example training feature vector modulus length.
  • after the initial pronoun resolution neural network generates the additional training features from the first training feature, the second training feature and the anti-interference feature, it performs positive example iterative processing according to the first training feature, the second training feature, the anti-interference feature and the additional training feature to obtain the corresponding positive example training feature vector modulus length, and performs negative example iterative processing according to the same features to obtain the corresponding negative example training feature vector modulus length.
  • for details, reference may be made to the description of step 206 in FIG. 2, which is not repeated here.
  • the training substitution possibility corresponding to each training candidate replacement word in the training candidate replacement word set is calculated according to the positive example training feature vector modulus length and the negative example training feature vector modulus length, including: calculating, according to the positive example training feature vector modulus length and the negative example training feature vector modulus length, the positive example training substitution possibility and the negative example training substitution possibility corresponding to each training candidate replacement word in the training candidate replacement word set.
  • the positive example training substitution possibility refers to the probability that the word to be detected in the training text matches each training candidate replacement word, and the negative example training substitution possibility refers to the probability that the word to be detected in the training text does not match each training candidate replacement word.
  • specifically, the positive example training substitution possibility and the negative example training substitution possibility corresponding to each training candidate replacement word in the training candidate replacement word set can be calculated according to the positive example training feature vector modulus length and the negative example training feature vector modulus length.
  • the calculation method can be customized, and the customization can be the custom dynamic routing algorithm; for the custom dynamic routing algorithm, refer to the description of step 206 in FIG. 2, which is not repeated here.
  • the positive example training substitution possibility and the negative example training substitution possibility corresponding to each training candidate replacement word in the training candidate replacement word set can be calculated from the positive example training feature vector modulus length and the negative example training feature vector modulus length, for example by the same softmax normalization given above, where P_pos refers to the positive example training substitution possibility, P_neg refers to the negative example training substitution possibility, V_pos refers to the positive example training feature vector modulus length, and V_neg refers to the negative example training feature vector modulus length.
  • calculating the training loss value according to the training substitution possibility corresponding to each training candidate substitution word and the corresponding standard training text label includes: according to the positive training substitution possibility corresponding to each training candidate substitution word and the corresponding standard training The training loss value is calculated by the text label, the counterexample training substitution possibility and the corresponding standard training text label.
  • the training loss value can be calculated according to the positive training substitution probability corresponding to each training candidate substitution word and the corresponding standard training text label, the negative training substitution probability and the corresponding standard training text label.
  • the training loss value can be calculated, for example, by the same cross-entropy formula given above, where P_pos refers to the positive example training substitution possibility, P_neg refers to the negative example training substitution possibility, V_pos refers to the positive example training feature vector modulus length, V_neg refers to the negative example training feature vector modulus length, J(θ) is the training loss value, and y_i is the standard training text label of the training sample.
  • a data processing method and a pronoun resolution neural network training method are provided, which specifically include the following steps:
  • the training text has a corresponding standard training text label.
  • the initial pronoun resolution neural network performs positive example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive example training feature vector modulus length, performs counter-example iterative processing according to the first training feature and the second training feature to obtain the corresponding negative example training feature vector modulus length, and calculates, according to the positive example training feature vector modulus length and the negative example training feature vector modulus length, the training substitution possibility corresponding to each training candidate replacement word in the training candidate replacement word set.
  • the pronoun resolution neural network compresses and represents the word sequences in the context word set through the forward feature representation sub-network and the backward feature representation sub-network to obtain the corresponding first forward sub-feature and first backward sub-feature.
  • the pronoun resolution neural network compresses and represents the character sequence corresponding to the word sequences in the context word set to obtain the first character vector sub-feature, and the first forward sub-feature, the first backward sub-feature and the first character vector sub-feature constitute the first feature corresponding to the context word set.
  • the pronoun resolution neural network compresses and represents the word sequences in the candidate replacement word set through the forward feature representation sub-network and the backward feature representation sub-network to obtain the corresponding second forward sub-feature and second backward sub-feature.
  • the pronoun resolution neural network compresses and represents the character sequence corresponding to the word sequences in the candidate replacement word set to obtain the second character vector sub-feature, and the second forward sub-feature, the second backward sub-feature and the second character vector sub-feature constitute the second feature corresponding to the candidate replacement word set.
  • the pronoun resolution neural network performs dimensional transformation and length scaling processing on the first feature and the second feature to obtain the corresponding first target feature and second target feature.
  • the pronoun resolution neural network performs positive example iterative processing according to the first target feature and the second target feature to obtain the corresponding positive example feature vector modulus length, performs counter-example iterative processing according to the first target feature and the second target feature to obtain the corresponding counter-example feature vector modulus length, and calculates, according to the positive example feature vector modulus length and the counter-example feature vector modulus length, the substitution possibility corresponding to each candidate replacement word in the candidate replacement word set.
  • the initial positive iteration center of the positive iteration process is calculated, and the initial positive iteration center is taken as the current positive iteration center.
  • the first feature and the second feature are respectively linearly transformed according to the preset positive example weight coefficient to obtain the corresponding first positive example intermediate feature and the second positive example intermediate feature.
  • the initial counter-example feature vector modulus length is calculated according to the first counter-example intermediate similarity and the corresponding first counter-example intermediate feature, and the second counter-example intermediate similarity and the corresponding second counter-example intermediate feature.
  • FIG. 9 shows a schematic diagram of the network structure of the pronoun resolution neural network in an embodiment.
  • the pronoun resolution neural network includes: a feature representation layer (Feature Representation), a feature transformation and combination layer (Feature Transformation & Combination), a feature aggregation layer (Feature Clustering), and a classification layer (Classification).
  • the text to be detected is obtained and preprocessed to obtain the context word set and the candidate replacement word set corresponding to the word to be detected in the text to be detected; the context word set and the candidate replacement word set are input into the pronoun resolution neural network, where the feature representation layer extracts features from the context word set and the candidate replacement word set to obtain the corresponding first feature and second feature.
  • specifically, the feature representation layer uses a bidirectional long short-term memory sub-network to represent the word sequences in the context word set and the candidate replacement word set, and uses the character vector feature representation sub-network BERT to represent the character sequences corresponding to those word sequences.
  • three sets of features can thus be obtained for each of the context word set and the candidate replacement word set: the first feature includes the two sets of features f_0 and f_1 corresponding to the word sequence in the context word set and the one set of features f_2 corresponding to the character sequence of that word sequence, and the second feature includes the features f_0 and f_1 corresponding to the word sequences in the candidate replacement word set and the feature f_2 corresponding to their character sequences.
  • the feature conversion and combination layer is to unify the dimensions and scale the length of the features extracted from the feature representation layer. Since the vector output by the feature representation layer has the problem of dimensional diversity and length range diversity, it is necessary to perform dimensional transformation and length scaling for each feature. Specifically, a linear transformation function can be used to scale the dimension of the feature, and a length scaling function (squash) can be used to scale the length of the feature, and finally the corresponding first target feature and second target feature are obtained.
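  • A minimal sketch of this layer, assuming an affine map for the dimensional transformation and the squash function from above for the length scaling (both forms are assumptions):

```python
import numpy as np

def transform_and_combine(feature, W, b):
    """Feature transformation and combination layer sketch:
    unify the dimension with a linear transformation, then scale the length."""
    s = W @ feature + b                                  # dimensional transformation (assumed affine)
    n2 = float(s @ s)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + 1e-8)    # squash length scaling
```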
  • the feature aggregation layer performs iterative weight aggregation on various features.
  • the pronoun resolution neural network performs positive example iterative processing based on the first target feature and the second target feature through the feature aggregation layer to obtain the corresponding positive example feature vector modulus, and performs a negative example based on the first target feature and the second target feature Iterative processing obtains the corresponding counter-example feature vector modulus length. That is, multiple features output by the feature conversion and combination layer are input to the feature aggregation layer, and the feature aggregation layer obtains two vector modulus lengths as the positive example feature vector modulus length and the negative feature vector modulus length through calculation.
  • the pronoun resolution neural network inputs the positive example feature vector modulus length and the counter-example feature vector modulus length output by the feature aggregation layer into the classification layer, and the classification layer calculates, based on the positive example feature vector modulus length and the counter-example feature vector modulus length, the substitution possibility corresponding to each candidate replacement word in the candidate replacement word set.
  • the target replacement word is determined according to the substitution possibility corresponding to each candidate replacement word in the candidate replacement word set; for example, the candidate replacement word with the highest substitution possibility is used as the target replacement word, and finally the target replacement word is inserted into the position corresponding to the word to be detected in the text to be detected to obtain the target text.
• For example, suppose the text to be detected is: "Xiao Ming ate an apple today, it is very sweet". The position of the word to be detected is before "very sweet"; that is, part of the content is omitted there. The candidate replacement words are "Xiao Ming" and "apple".
• The pronoun resolution neural network calculates a substitution probability of 0.4 between "Xiao Ming" and the word to be detected, and of 0.9 between "apple" and the word to be detected, so the target replacement word is determined to be "apple". Inserting "apple" at the position corresponding to the word to be detected yields the target text: "Xiao Ming ate an apple today, the apple is very sweet".
• The training method of the pronoun resolution neural network is similar to its application; the input data is training text with standard training text labels.
• During training, in order to improve the accuracy of the pronoun resolution neural network in application, an anti-interference feature set is added, and the anti-interference features in the set are input into the pronoun resolution neural network for training.
• The training loss value of the pronoun resolution neural network is calculated from the training substitution probability corresponding to each training candidate replacement word obtained during training and the corresponding standard training text label.
• The pronoun resolution neural network is trained according to the training loss value, and the model parameters are continuously adjusted until the convergence condition is met, yielding the final pronoun resolution neural network.
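• A compact sketch of such a training loop, assuming PyTorch, a model that returns the two substitution probabilities, and a cross-entropy-style loss; all names here are illustrative, not the patent's exact formulation:

```python
import torch

def train(model, optimizer, loader, epochs=30):
    for epoch in range(epochs):
        for context_set, candidate_set, label in loader:
            p_pos, p_neg = model(context_set, candidate_set)
            # Cross-entropy against the standard training text label (assumed form).
            loss = -(label * torch.log(p_pos) + (1 - label) * torch.log(p_neg)).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # A convergence condition (e.g., loss no longer decreasing) would stop training early.
```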
  • FIG. 10 shows a schematic diagram of the comparison of the verification results of the pronoun resolution neural network in an embodiment.
• The last row (ZP-CapsNet) in FIG. 10 shows the verification results of the pronoun resolution neural network of this application on six test data sets, and the other rows show the verification results of the comparison pronoun resolution neural networks on the same six test data sets.
• The six test data sets are broadcast news BN (Broadcast News), newswire NW (Newswires), broadcast conversations BC (Broadcast Conversations), telephone conversations TC (Telephone Conversation), web blogs WB (Web Blogs), and magazines MZ (Magazines); the verification result is a combined value calculated from the precision rate and the recall rate (an F-measure).
• The Overall value of each pronoun resolution neural network in FIG. 10 is a comprehensive value calculated from the verification results on the six test data sets. From the Overall column in FIG. 10, it can be seen that the comprehensive effect of the pronoun resolution neural network of this application in actual pronoun resolution is better than that of the other comparison pronoun resolution neural networks.
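• For reference, a combined precision/recall value of this kind is typically the F-score; a one-line sketch (the exact metric in FIG. 10 is an assumption):

```python
def f_score(precision, recall):
    # Harmonic mean of precision and recall (F1).
    return 2 * precision * recall / (precision + recall)
```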
  • a data processing device 800 includes:
  • the to-be-detected text acquisition module 802 is configured to acquire the to-be-detected text, and determine the contextual word set and candidate replacement word set corresponding to the to-be-detected word in the to-be-detected text.
  • the feature extraction module 804 is used to input the context word set and the candidate replacement word set into the pronoun resolution neural network.
• The pronoun resolution neural network respectively performs feature extraction on the context word set and the candidate replacement word set to obtain the corresponding first feature and second feature.
• The iterative processing module 806 is used for the pronoun resolution neural network to perform positive-example iterative processing according to the first feature and the second feature to obtain the corresponding positive-example feature vector modulus length, perform negative-example iterative processing according to the first feature and the second feature to obtain the corresponding negative-example feature vector modulus length, and calculate, according to the positive-example and negative-example feature vector modulus lengths, the substitution probability corresponding to each candidate replacement word in the candidate replacement word set.
• The target replacement word determination module 808 is configured to determine the target replacement word according to the substitution probability corresponding to each candidate replacement word.
• The target replacement word insertion module 810 is configured to insert the target replacement word into the text to be detected at the position corresponding to the word to be detected to obtain the target text.
  • the to-be-detected text acquisition module 802 includes:
  • the to-be-detected text segmentation unit 802a is used to segment the to-be-detected text to obtain multiple words.
  • the syntactic analysis unit 802b is used to perform syntactic analysis on each word, and determine the position of the word to be detected according to the syntactic analysis result.
  • the word sequence obtaining unit 802c is configured to obtain the above word sequence and the following word sequence according to the position of the word to be detected, and form a context word set according to the above word sequence and the following word sequence.
  • the candidate replacement word acquiring unit 802d is configured to acquire candidate replacement words according to the syntactic analysis result, and form a candidate replacement word set according to the candidate replacement words.
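• A sketch of this preprocessing pipeline, using the jieba segmenter's part-of-speech mode as a lightweight stand-in for full syntactic analysis; the noun filter and the externally supplied gap position are simplifying assumptions:

```python
import jieba.posseg as pseg

def preprocess(text, gap_index):
    # Segment the text to be detected into words with POS tags.
    pairs = [(w, flag) for w, flag in pseg.cut(text)]
    words = [w for w, _ in pairs]
    # Context word set: preceding and following word sequences around the gap.
    context = {"above": words[:gap_index], "below": words[gap_index:]}
    # Candidate replacement words: noun-like tokens (POS flags starting with 'n').
    candidates = [w for w, flag in pairs if flag.startswith("n")]
    return context, candidates

context, candidates = preprocess("小明吃了个小苹果，很甜", gap_index=4)
# candidates collects noun-like words such as 小明 / 苹果; the exact
# segmentation depends on jieba's dictionary.
```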
• The data processing device 800 further includes a feature processing module, which is used to perform dimension transformation and length scaling on the first feature and the second feature through the pronoun resolution neural network to obtain the corresponding first target feature and second target feature. The iterative processing module 806 then, through the pronoun resolution neural network, performs positive-example iterative processing according to the first target feature and the second target feature to obtain the corresponding positive-example feature vector modulus length, and performs negative-example iterative processing according to the first target feature and the second target feature to obtain the corresponding negative-example feature vector modulus length.
• The feature extraction module 804 is further used for the pronoun resolution neural network to compress and represent the word sequence in the context word set through the forward feature representation sub-network and the backward feature representation sub-network, obtaining the corresponding first forward sub-feature and first backward sub-feature, and to compress and represent the character sequence corresponding to the word sequence in the context word set, obtaining the first character vector sub-feature.
• The first forward sub-feature, the first backward sub-feature, and the first character vector sub-feature constitute the first feature corresponding to the context word set. The pronoun resolution neural network likewise compresses and represents the word sequence in the candidate replacement word set through the forward and backward feature representation sub-networks, obtaining the corresponding second forward sub-feature and second backward sub-feature.
• The pronoun resolution neural network compresses and represents the character sequence corresponding to the word sequence in the candidate replacement word set, obtaining the second character vector sub-feature.
• The second forward sub-feature, the second backward sub-feature, and the second character vector sub-feature constitute the second feature corresponding to the candidate replacement word set.
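• A sketch of such a feature representation module, assuming PyTorch; a small character-level LSTM stands in for the pretrained BERT encoder used in the original:

```python
import torch
import torch.nn as nn

class FeatureRepresentation(nn.Module):
    def __init__(self, word_dim, char_dim, hidden):
        super().__init__()
        self.fwd = nn.LSTM(word_dim, hidden, batch_first=True)    # forward sub-network
        self.bwd = nn.LSTM(word_dim, hidden, batch_first=True)    # backward sub-network
        self.chars = nn.LSTM(char_dim, hidden, batch_first=True)  # stand-in for BERT

    def forward(self, word_emb, char_emb):
        f0, _ = self.fwd(word_emb)                        # forward sub-feature
        f1, _ = self.bwd(torch.flip(word_emb, dims=[1]))  # backward sub-feature
        f2, _ = self.chars(char_emb)                      # character vector sub-feature
        # Compressed representation: the final state of each sequence.
        return f0[:, -1], f1[:, -1], f2[:, -1]
```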
• The iterative processing module 806 is further configured to: calculate the initial positive-example iteration center of the positive-example iterative processing from the first feature and the second feature, and take the initial positive-example iteration center as the current positive-example iteration center; linearly transform the first feature and the second feature according to the preset positive-example weight coefficient to obtain the corresponding first positive-example intermediate feature and second positive-example intermediate feature; calculate the similarity between each of these intermediate features and the current positive-example iteration center to obtain the corresponding first positive-example similarity and second positive-example similarity; normalize the first and second positive-example similarities to obtain the corresponding first and second positive-example intermediate similarities; calculate the initial positive-example feature vector modulus length from the first positive-example intermediate similarity with its corresponding first positive-example intermediate feature and the second positive-example intermediate similarity with its corresponding second positive-example intermediate feature; and calculate the positive-example update iteration center from the initial positive-example feature vector modulus length and the initial positive-example iteration center, take it as the current positive-example iteration center, and return to the similarity calculation step until the convergence condition is met, obtaining the positive-example feature vector modulus length.
• The iterative processing module 806 is likewise configured to: calculate the initial negative-example iteration center of the negative-example iterative processing from the first feature and the second feature, and take it as the current negative-example iteration center; linearly transform the first feature and the second feature according to the preset negative-example weight coefficient to obtain the corresponding first negative-example intermediate feature and second negative-example intermediate feature; calculate the similarity between each of these intermediate features and the current negative-example iteration center to obtain the corresponding first and second negative-example similarities; normalize them to obtain the corresponding first and second negative-example intermediate similarities; calculate the initial negative-example feature vector modulus length from the first negative-example intermediate similarity with its corresponding first negative-example intermediate feature and the second negative-example intermediate similarity with its corresponding second negative-example intermediate feature; and calculate the negative-example update iteration center from the initial negative-example feature vector modulus length and the initial negative-example iteration center, take it as the current negative-example iteration center, and return to the similarity calculation step until the convergence condition is met, obtaining the negative-example feature vector modulus length.
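• A sketch of one such routing branch, under stated assumptions: dot-product similarity, softmax normalization, a mean-based center update, and a fixed iteration count as the convergence condition. It would be run once with the positive-example weights and once with the negative-example weights:

```python
import numpy as np

def squash(u, eps=1e-9):
    n2 = np.sum(u ** 2)
    return (n2 / (1.0 + n2)) * (u / np.sqrt(n2 + eps))

def iterative_aggregation(features, W, n_iters=3):
    # features: the first and second (target) features; W: preset weight matrix
    # for this branch (positive-example or negative-example).
    u_hat = [W @ f for f in features]              # intermediate features
    k = np.tanh(sum(u_hat))                        # initial iteration center (tanh of a sum)
    for _ in range(n_iters):                       # fixed-count convergence condition
        b = np.array([u @ k for u in u_hat])       # similarities to the current center
        c = np.exp(b) / np.exp(b).sum()            # normalized intermediate similarities
        v = squash(sum(ci * ui for ci, ui in zip(c, u_hat)))
        k = (v + k) / 2.0                          # update iteration center
    return np.linalg.norm(v)                       # feature vector modulus length
```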
• A pronoun resolution neural network training device 1000 is provided, which includes:
  • the training text obtaining module 1002 is used to obtain training text, and the training text has a corresponding standard training text label.
  • the training text processing module 1004 is used to determine the training context word set and the training candidate replacement word set corresponding to the words to be detected in the training text.
• The training feature representation module 1006 is used to input the training context word set and the training candidate replacement word set into the initial pronoun resolution neural network, which respectively performs feature extraction on the two sets to obtain the corresponding first training feature and second training feature.
• The training feature iterative processing module 1008 is used for the initial pronoun resolution neural network to perform positive-example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive-example training feature vector modulus length, perform negative-example iterative processing according to the same two training features to obtain the corresponding negative-example training feature vector modulus length, and calculate, from the positive-example and negative-example training feature vector modulus lengths, the training substitution probability corresponding to each training candidate replacement word in the training candidate replacement word set.
• The training loss value calculation module 1010 is used to calculate the training loss value according to the training substitution probability corresponding to each training candidate replacement word and the corresponding standard training text label.
  • the neural network training module 1012 is used to adjust the model parameters of the initial pronoun resolution neural network according to the training loss value until the convergence condition is met, and the pronoun resolution neural network is obtained.
• The training text acquisition module is also used to obtain the anti-interference feature set. The training feature iterative processing module is also used to input the anti-interference features in the anti-interference feature set into the initial pronoun resolution neural network, which generates additional training features based on the first training feature, the second training feature, and the anti-interference features. The initial pronoun resolution neural network then performs positive-example iterative processing based on the first training feature, the second training feature, the anti-interference features, and the additional training features to obtain the corresponding positive-example training feature vector modulus length, and performs negative-example iterative processing based on the same features to obtain the corresponding negative-example training feature vector modulus length.
• The training feature iterative processing module 1008 is further configured to calculate, from the positive-example and negative-example training feature vector modulus lengths, the positive-example training substitution probability and the negative-example training substitution probability corresponding to each training candidate replacement word in the training candidate replacement word set.
• The training loss value calculation module 1010 is further configured to calculate the training loss value according to the positive-example training substitution probability corresponding to each training candidate replacement word with its corresponding standard training text label, and the negative-example training substitution probability with its corresponding standard training text label.
  • Fig. 14 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device may specifically be the terminal 110 or the server 120 in FIG. 1.
  • the computer equipment includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
• The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the data processing method or the pronoun resolution neural network training method.
• A computer program may also be stored in the internal memory; when executed by the processor, it causes the processor to perform the data processing method or the pronoun resolution neural network training method.
• The display screen of the computer device may be a liquid crystal display or an electronic ink display.
• The input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad set on the housing of the computer device, or an external keyboard, touchpad, or mouse. It should be noted that if the computer device is the server 120 in FIG. 1, the computer device does not include a display screen.
• FIG. 14 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
  • the data processing device or the pronoun resolution neural network training device provided in the present application can be implemented in the form of a computer program, and the computer program can run on the computer device as shown in FIG. 14.
• The memory of the computer device can store the program modules that make up the data processing device or the pronoun resolution neural network training device, such as the to-be-detected text acquisition module, feature extraction module, iterative processing module, target replacement word determination module, and target replacement word insertion module shown in FIG. 11.
• The computer program formed by these program modules causes the processor to execute the steps in the data processing method of the embodiments of this application described in this specification. Similarly, FIG. 13 shows the training text acquisition module, training text processing module, training feature representation module, training feature iterative processing module, training loss value calculation module, and neural network training module.
• For example, the computer device shown in FIG. 14 can, through the to-be-detected text acquisition module of the data processing device shown in FIG. 11, obtain the text to be detected and determine the context word set and candidate replacement word set corresponding to the word to be detected; the feature extraction module inputs the context word set and the candidate replacement word set into the pronoun resolution neural network, which performs feature extraction on the two sets to obtain the corresponding first feature and second feature.
• The iterative processing module has the pronoun resolution neural network perform positive-example iterative processing according to the first feature and the second feature to obtain the corresponding positive-example feature vector modulus length, perform negative-example iterative processing according to the first feature and the second feature to obtain the corresponding negative-example feature vector modulus length, and calculate, from the two modulus lengths, the substitution probability corresponding to each candidate replacement word in the candidate replacement word set; the target replacement word determination module determines the target replacement word according to the substitution probability corresponding to each candidate replacement word; and the target replacement word insertion module inserts the target replacement word into the text to be detected at the position corresponding to the word to be detected to obtain the target text.
• A computer device is provided, including a memory and a processor; the memory stores a computer program which, when executed by the processor, causes the processor to execute the steps of the aforementioned data processing method or pronoun resolution neural network training method.
• The steps of the data processing method or the pronoun resolution neural network training method here may be the steps in the data processing method or the pronoun resolution neural network training method of the foregoing embodiments.
• A computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to execute the steps of the above data processing method or pronoun resolution neural network training method.
• The steps of the data processing method or the pronoun resolution neural network training method here may be the steps in the data processing method or the pronoun resolution neural network training method of the foregoing embodiments.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
• RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

The present application relates to a data processing method and a pronoun resolution neural network training method. The method includes: a pronoun resolution neural network that can make good use of the features corresponding to the context word sequences and the candidate replacement words in a text to be detected; positive-example iterative processing and negative-example iterative processing are performed on the features corresponding to the context word sequences and the candidate replacement words to obtain the corresponding positive-example feature vector modulus length and negative-example feature vector modulus length; and finally the substitution likelihood corresponding to each candidate replacement word in the candidate replacement word set is calculated from the positive-example feature vector modulus length and the negative-example feature vector modulus length.

Description

数据处理方法和代词消解神经网络训练方法
本申请要求于2019年4月19日提交的申请号为201910319013.8、发明名称为“数据处理方法和代词消解神经网络训练方法”的中国专利申请的优先权,上述申请的全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,涉及一种数据处理方法、装置、计算机可读存储介质和计算机设备,以及代词消解神经网络训练方法、装置、计算机可读存储介质和计算机设备。
背景技术
随着计算机技术的发展,出现了代词消解技术,代词消解技术是指给定待检测文本内容,通过算法定位找到该代词所指代的候选替代词语。目前的代词消解问题的解决方式是通过神经网络来对代词消解问题进行建模,通过神经网络预测得到该代词所指代的目标候选替代词语。然而,目前的神经网络直接对该代词和对应的候选替代词语进行分类得到目标候选替代词语,导致代词消解的准确率低。
发明内容
基于此,本申请实施例提供了一种能够提高代词消解的准确率的数据处理方法、装置、计算机可读存储介质和计算机设备,以及代词消解神经网络训练方法、装置、计算机可读存储介质和计算机设备。
一方面,提供了一种数据处理方法,包括:
获取待检测文本,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合;
将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征;
代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度;
根据各个候选替代词语对应的替代可能度确定目标替代词语;
根据待检测词语对应的位置将目标替代词语插入待检测文本得到目标文本。
另一方面,提供了一种数据处理装置,该装置包括:
待检测文本获取模块,用于获取待检测文本,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合;
特征提取模块,用于将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征;
迭代处理模块,用于代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度;
目标替代词语确定模块,用于根据各个候选替代词语对应的替代可能度确定目标替代 词语;
目标替代词语插入模块,用于根据待检测词语对应的位置将目标替代词语插入待检测文本得到目标文本。
另一方面,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,该处理器执行程序时实现以下步骤:
获取待检测文本,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合;
将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征;
代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度;
根据各个候选替代词语对应的替代可能度确定目标替代词语;
根据待检测词语对应的位置将目标替代词语插入待检测文本得到目标文本。
另一方面,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时,使得处理器执行以下步骤:
获取待检测文本,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合;
将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征;
代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度;
根据各个候选替代词语对应的替代可能度确定目标替代词语;
根据待检测词语对应的位置将目标替代词语插入待检测文本得到目标文本。
另一方面,提供了一种代词消解神经网络训练方法,该方法包括:
获取训练文本,训练文本存在对应的标准训练文本标签;
确定训练文本中待检测词语对应的训练上下文词语集合和训练候选替代词语集合;
将训练上下文词语集合和训练候选替代词语集合输入至初始代词消解神经网络中,初始代词消解神经网络分别对训练上下文词语集合和训练候选替代词语集合进行特征提取得到对应的第一训练特征和第二训练特征;
初始代词消解神经网络根据第一训练特征和第二训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征和第二训练特征进行反例迭代处理得到对应的反例训练特征向量模长,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度;
根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算训练损失值;
根据训练损失值对初始代词消解神经网络的模型参数进行调整,直至满足收敛条件,得到代词消解神经网络。
另一方面,提供了一种代词消解神经网络训练装置,该装置包括:
训练文本获取模块,用于获取训练文本,训练文本存在对应的标准训练文本标签;
训练文本处理模块,用于确定训练文本中待检测词语对应的训练上下文词语集合和训 练候选替代词语集合;
训练特征表示模块,用于将训练上下文词语集合和训练候选替代词语集合输入至初始代词消解神经网络中,初始代词消解神经网络分别对训练上下文词语集合和训练候选替代词语集合进行特征提取得到对应的第一训练特征和第二训练特征;
训练特征迭代处理模块,用于初始代词消解神经网络根据第一训练特征和第二训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征和第二训练特征进行反例迭代处理得到对应的反例训练特征向量模长,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度;
训练损失值计算模块,用于根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算训练损失值;
神经网络训练模块,用于根据训练损失值对初始代词消解神经网络的模型参数进行调整,直至满足收敛条件,得到代词消解神经网络。
另一方面,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,该处理器执行程序时实现以下步骤:
获取训练文本,训练文本存在对应的标准训练文本标签;
确定训练文本中待检测词语对应的训练上下文词语集合和训练候选替代词语集合;
将训练上下文词语集合和训练候选替代词语集合输入至初始代词消解神经网络中,初始代词消解神经网络分别对训练上下文词语集合和训练候选替代词语集合进行特征提取得到对应的第一训练特征和第二训练特征;
初始代词消解神经网络根据第一训练特征和第二训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征和第二训练特征进行反例迭代处理得到对应的反例训练特征向量模长,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度;
根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算训练损失值;
根据训练损失值对初始代词消解神经网络的模型参数进行调整,直至满足收敛条件,得到代词消解神经网络。
另一方面,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时,使得处理器执行以下步骤:
获取训练文本,训练文本存在对应的标准训练文本标签;
确定训练文本中待检测词语对应的训练上下文词语集合和训练候选替代词语集合;
将训练上下文词语集合和训练候选替代词语集合输入至初始代词消解神经网络中,初始代词消解神经网络分别对训练上下文词语集合和训练候选替代词语集合进行特征提取得到对应的第一训练特征和第二训练特征;
初始代词消解神经网络根据第一训练特征和第二训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征和第二训练特征进行反例迭代处理得到对应的反例训练特征向量模长,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度;
根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算训练损失值;
根据训练损失值对初始代词消解神经网络的模型参数进行调整,直至满足收敛条件,得到代词消解神经网络。
上述数据处理方法、装置、计算机可读存储介质和计算机设备以及代词消解神经网络训练方法、装置、计算机可读存储介质和计算机设备,代词消解神经网络,能够很好地利用待检测文本中的上下文词序列,和,候选替代词语对应的特征,通过对上下文词序列和 候选替代词语对应的特征,进行正例迭代处理和反例迭代处理,得到对应的正例特征向量模长和反例特征向量模长,最后根据正例特征向量模长和反例特征向量模长计算,得到候选替代词语集合中各个候选替代词语对应的替代可能度。由于代词消解神经网络能够很好地融合了上下文词序列和候选替代词语对应的特征,根据上下文词序列和候选替代词语对应的特征,计算候选替代词语对应的替代可能度,该特征除了词序列对应的特征还包括词序列对应的字序列的特征,能够很好地解决数据层面上稀疏问题,从而提高候选替代词语集合中,各个候选替代词语对应的替代可能度的准确性,进而提高代词消解的准确率。
附图说明
图1为一个实施例中数据处理方法或代词消解神经网络训练方法的应用环境图;
图2为一个实施例中数据处理方法的流程示意图;
图3为一个实施例中确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合步骤的流程示意图;
图4为一个实施例中代词消解神经网络特征提取步骤的流程示意图;
图5为一个实施例中正例迭代处理步骤的流程示意图;
图6为一个实施例中正例迭代处理或反例迭代处理的代码实施示意图;
图7为一个实施例中反例迭代处理步骤的流程示意图;
图8为一个实施例中代词消解神经网络训练方法的流程示意图;
图9为一个实施例中代词消解神经网络的网络结构示意图;
图10为一个实施例中代词消解神经网络的验证结果对比示意图;
图11为一个实施例中数据处理装置的结构框图;
图12为一个实施例中待检测文本获取模块的结构框图;
图13为一个实施例中代词消解神经网络训练装置的结构框图;
图14为一个实施例中计算机设备的结构框图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
在语言学中,在某句话语中提及了目标事物后,当再次提及该目标事物时,该话语可以通过多种方式进行上下文照应,以暗指该目标事物。该技术可以称为零指代,本申请实施例可以用于处理零指代问题。在处理自然语言的技术领域中,技术人员需要令机器明了该目标事物。因此,自然语言处理中,需要将零指代的目标事物补充到其被省略的地方。例如,自然语句“杰克受到琳达的干扰,迟到了。”其中,后半句“迟到了”省略了目标事物“杰克”,经过零指代处理的句子可以是“杰克受到琳达的干扰,杰克迟到了。”
本申请实施例提供一种高效的处理含有零指代问题的自然语言的方法,详情请参见如下实施例。
图1为一个实施例中数据处理方法的应用环境图。参照图1,该数据处理方法应用于数据处理系统。该数据处理系统包括终端110和服务器120。终端110和服务器120通过网络连接。终端110具体可以是台式终端或移动终端,移动终端具体可以手机、平板电脑、笔记本电脑等中的至少一种。服务器120可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
需要说明的是,本申请实施例提供的数据处理方法,可以由任意具有处理器和存储器的设备来执行。在一种可能的方式中,该设备可以独立完成本申请实施例提供的数据处理方法。在另一种可能的方式中,该设备可以和其它设备配合,共同完成该数据处理方法。例如,存储服务器集群和计算服务器集群相互配合来完成本申请实施例提供的数据处理方法。
可选地,终端110可将待检测文本发送至服务器120,服务器120获取待检测文本,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合,将上下文词语集合和候选替代词语集合输入至代词消解神经网络中。代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征。代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度,根据各个候选替代词语对应的替代可能度确定目标替代词语,根据待检测词语对应的位置将目标替代词语插入待检测文本得到目标文本。可选地,服务器120将目标文本发送至终端110进行显示。
在另一个实施例中,图1还可为代词消解神经网络训练方法的应用环境图。参照图1,该代词消解神经网络训练方法应用于代词消解神经网络训练系统。该代词消解神经网络训练系统包括终端110和服务器120。终端110和服务器120通过网络连接。终端110具体可以是台式终端或移动终端,移动终端具体可以手机、平板电脑、笔记本电脑等中的至少一种。服务器120可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
可选地,终端110可将训练文本发送至服务器120,服务器120获取训练文本,训练文本存在对应的标准训练文本标签,确定训练文本中待检测词语对应的训练上下文词语集合和训练候选替代词语集合,将训练上下文词语集合和训练候选替代词语集合输入至初始代词消解神经网络中,初始代词消解神经网络分别对训练上下文词语集合和训练候选替代词语集合,进行特征提取得到对应的第一训练特征和第二训练特征,初始代词消解神经网络根据第一训练特征和第二训练特征进行正例迭代处理,得到对应的正例训练特征向量模长,根据第一训练特征和第二训练特征进行反例迭代处理,得到对应的反例训练特征向量模长,根据正例训练特征向量模长和反例训练特征向量模长计算,得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度,根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签,计算训练损失值,根据训练损失值对初始代词消解神经网络的模型参数进行调整,直至满足收敛条件,得到代词消解神经网络。可选地,服务器120将代词消解神经网络存储本地或者发送至终端110,供终端110应用。
如图2所示,在一个实施例中,提供了一种数据处理方法。本实施例以该方法应用于上述图1中的终端110或服务器120来举例说明。参照图2,该数据处理方法包括如下步骤:
步骤202,获取待检测文本,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合。
可选地,待检测文本是需要进行检测代词消解的文本,待检测文本为一个或多个。待检测文本可以是实时获取的,也可以是预先存储的。比如,待检测文本可以是在收到进行代词消解的指令时实时通过网络爬虫爬取新闻资讯信息或论坛帖子。待检测文本也可以是预先将待检测文本存储至数据库。其中,代词消解是将待检测文本中待检测词语所指代的替代词检测出来,待检测词语是待检测文本中被省略或者缺失的部分。例如,待检测文本为“小明吃了一个苹果,&很甜”,&代表待检测词语。
在一个实施例中,可以存储预设对象名称,获取包括预设对象名称的文本作为待检测文本。例如,可以预先存储“A公司”、“B产品”以及“C公司”等对象名称,然后通过网络爬虫技术爬取网络中包括“A公司”、“B产品”以及“C公司”中的一个或多个词语的文本作为待检测文本。
在一个实施例中,待检测文本对应的数据源是预先设置的。例如,本申请实施例可以预先设置待检测文本对应的数据来源是D网址、E网址或F网址等。
在一个实施例中,待检测文本可以是对文本信息进一步筛选得到的。例如,对于一篇文章,可以将文章的标题、摘要、第一段、最后一段中的一种或多种作为待检测文本。
可选地,待检测文本中的待检测词语是待检测文本中被省略或者缺失的部分,而上下文词语集合是待检测词语的上文词序列和下文词序列组成的词语集合,待检测词语的上文词序列是以待检测词语所在的位置为中心,由待检测词语所在的位置的前向词语组成的词序列,而下文词序列是以待检测词语所在的位置为中心,由待检测词语所在的位置的后向词语组成的词序列。在一种可能的方式中,本申请实施例先对待检测文本进行分割,得到多个词语,对多个词语进行句法分析,确定待检测词语所在的位置,根据待检测词语所在的位置获取前向词语和后向词语,由获取到的前向词语组成上文词序列,后向词语组成下文词序列,再根据上文词序列和下文词序列组成上下文词语集合。
可选地,侯选替代词语集合是待检测词语的候选替代词语组成的词语集合,候选替代词语是用来替代待检测词语的候选词语,可以是名词性短语。候选替代词语可以根据预设筛选规则从待检测文本对应的词语中筛选得到的,预设筛选规则可自定义,自定义可以是从待检测文本对应的词语中筛选名词性短语作为候选替代词语,还可以是从待检测文本对应的词语中筛选形容词语作为候选替代词语等等。可选地,本申请实施例先对待检测文本进行分割,得到多个词语,对多个词语进行句法分析,根据句法分析结果获取候选替代词语,由获取到的候选替代词语组成候选替代词语集合。其中,句法分析是对待检测文本中的词语语法功能进行分析,得到句法分析结果。比如“我来晚了”,这里“我”是主语,“来”是谓语,“晚了”是补语。
在一个实施例中,获取到的待检测文本为:“小明吃了个小苹果,很甜,他心情超级美妙”,先对待检测文本进行分割,得到多个词语为:“小明”,“吃了”,“个”,“小苹果”,“很甜”,“他”,“心情”,“超级”和“美妙”。对各个词语进行句法分析,确定待检测文本中的待检测词语所在的位置为“很甜”前面省略的一部分内容,再根据待检测词语所在的位置获取上文词序列为:“小明”,“吃了”,“个”,“小苹果”,下文词序列为:“很甜”,“他”,“心情”,“超级”和“美妙”,由上文词序列和下文词序列组成上下文词语集合。而候选替代词的预设筛选规则为从待检测文本对应的词语中筛选名词性短语作为候选替代词语,因此筛选得到的候选替代词为:“小明”和“小苹果”,由候选替代词语组成候选替代词语集合。
步骤204,将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征。
可选地,代词消解是将待检测文本中待检测词语所指代的替代词检测出来,待检测词语是待检测文本中被省略或者缺失的部分。代词消解神经网络用于确定待检测词语对应的候选替代词语的,代词消解神经网络是预先训练得到的,代词消解神经网络可以是胶囊网络(Capsule Network)、向量机(Support Vector Machine,SVM)分类器模型、神经网络(Artificial Neural Network,ANN)分类器模型、逻辑回归算法(logistic Regression,LR)分类器模型等各种进行分类的模型。而代词消解神经网络通过对词序列和词序列对应的字序列对应的特征进行融合,能够得到待检测词语与各个候选替代词语对应的替代可能度,提高代词消解的准确率。将上下文词语集合和候选替代词语集合输入至代词消解神经网络之前,需要通过训练样本对代词消解神经网络进行模型训练,确定模型的参数,使模型能够从输入的待检测文本中确定待检测文本中待检测词语与各个候选替代词语对应的替代可能度。可选地,在进行模型训练时,本申请实施例采用有监督的学习方式。
可选地,特征提取是指将输入的一个或多个特征映射为另外的特征。例如,将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络可通过特征表示子网络对上下文词语集合进行特征提取,得到上下文词语集合对应的第一特征,通过特征表示子网络对候选替代词语集合进行特征提取得到候选替代词语集合对应的第二特征。
可选地,第一特征包括但不限于基于上下文词语集合中的词序列的词向量特征和基于 上下文词语集合中的词序列对应的字序列的字向量特征。第二特征包括但不限于基于候选替代词语集合中的词序列的词向量特征和基于候选替代词语集合中的词序列对应的字序列的字向量特征。基于上下文词语集合或者候选替代词语集合中的词序列,是指对上下文词语或者候选替代词语的词序列进行特征提取得到对应的词向量特征,可以理解的是,以词序列进行特征提取是指以词向量为一个整体进行提取的。
而基于上下文词语集合或者候选替代词语集合中的词序列对应的字序列是指对上下文词语或者候选替代词语的词序列对应的字序列进行特征提取得到对应的词向量特征,可以理解的是,以词序列对应的字序列进行特征提取是指以字向量为一个整体进行提取的。
在一个实施例中,将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络包括前向特征表示子网络、后向特征表示子网络、字向量特征表示子网络,代词消解神经网络通过前向特征表示子网络对上下文词语集合中的词序列进行特征提取,得到对应的第一前向子特征,通过后向特征表示子网络对上下文词语集合中的词序列进行特征提取,得到对应的第一后向子特征,通过字向量特征表示子网络对上下文词语集合中的词序列对应的字序列进行特征提取,得到对应的第一字向量子特征,将第一前向子特征、第一后向子特征和第一字向量子特征组成上下文词语集合对应的第一特征。
同样地,代词消解神经网络通过前向特征表示子网络对候选替代词语集合中的词序列进行特征提取,得到对应的第二前向子特征,通过后向特征表示子网络对候选替代词语集合中的词序列进行特征提取,得到对应的第二后向子特征,通过字向量特征表示子网络对候选替代词语集合中的词序列对应的字序列进行特征提取,得到对应的第二字向量子特征,将第二前向子特征、第二后向子特征和第二字向量子特征组成候选替代词语集合对应的第二特征。
步骤206,代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度。
可选地,正例迭代处理是指对特征进行重复迭代计算得到正例特征向量模长的过程,而正例特征向量模长是指正例特征向量的长度,正例特征向量是正例替代可能度对应的特征向量,正例替代可能度是指待检测文本中待检测词语与各个候选替代词语相互匹配的可能度。正例迭代处理可以是通过自定义动态路由算法迭代计算得到正例特征向量模长,自定义动态路由算法可以用于正例迭代处理和反例迭代处理,由于正例迭代处理和反例迭代处理对应的预设权重系数不同,因此将第一特征和第二特征通过自定义动态路由算法计算可分别得到正例特征向量模长和反例特征向量模长。
而反例迭代处理是指对特征进行重复迭代计算得到反例特征向量模长的过程,而反例特征向量模长是指反例特征向量的长度,反例特征向量是反例替代可能度对应的特征向量,反例替代可能度是指待检测文本中待检测词语与各个候选替代词语相互不匹配的可能度。
可选地,通过自定义动态路由算法计算正例特征向量模长和反例特征向量模长可以是根据第一特征和第二特征计算得到初始迭代中心,将初始迭代中心分别作为正例迭代处理和反例迭代处理的初始迭代中心,并将初始迭代中心作为当前迭代中心,再根据正例迭代处理和反例迭代处理对应的预设权重系数对第一特征和第二特征进行线性变换,得到正例迭代处理和反例迭代处理对应的第一中间特征和第二中间特征。再将正例迭代处理和反例迭代处理对应的第一中间特征和第二中间特征分别与当前迭代中心进行相似度计算,得到正例迭代处理和反例迭代处理对应的第一相似度和第二相似度,对正例迭代处理和反例迭代处理对应的第一相似度和第二相似度进行归一化,得到正例迭代处理和反例迭代处理对应的第一中间相似度和第二中间相似度,根据正例迭代处理和反例迭代处理对应的第一中间相似度和第二中间相似度和对应的第一中间特征和第二中间特征计算得到正例迭代处理和反例迭代处理对应的初始特征向量模长。
然后,再根据正例迭代处理和反例迭代处理对应的第一中间相似度和第二中间相似度和初始迭代中心更新迭代中心,将更新后的迭代中心作为当前迭代中心,返回将正例迭代处理和反例迭代处理对应的第一中间特征和第二中间特征分别与当前迭代中心进行相似度计算的步骤,直至满足收敛条件,得到正例迭代处理对应的正例特征向量模长和反例迭代处理对应的反例特征向量模长。可选地,收敛条件是通过自定义得到的,自定义可以是迭代次数达到预设迭代次数时,则认为满足收敛条件,还可以是初始特征向量模长不再发生变化时,则可认为满足收敛条件等等。
其中,替代可能度是指候选替代词语集合中各个候选替代词语替代待检测词语的可能度,替代可能度可以是百分制概率,或者分数值等等。在将第一特征和第二特征进行正例迭代处理和反例迭代处理得到正例特征向量模长和反例特征向量模长后,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度。替代可能度包括但不限于正例替代可能度和反例替代可能度,所谓正例替代可能度是指候选替代词语集合中各个候选替代词语能够替代待检测词语的替代可能度,反例替代可能度是指候选替代词语集合中各个候选替代词语无法替代待检测词语的替代可能度。
其中,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度,可以通过以下公式计算得到:

$P_{pos} = \dfrac{e^{V_{pos}}}{e^{V_{pos}} + e^{V_{neg}}}$

$P_{neg} = \dfrac{e^{V_{neg}}}{e^{V_{pos}} + e^{V_{neg}}}$

其中,$P_{pos}$ 是指正例替代可能度,$P_{neg}$ 是指反例替代可能度,$V_{pos}$ 是指正例特征向量模长,$V_{neg}$ 是指反例特征向量模长。
步骤208,根据各个候选替代词语对应的替代可能度确定目标替代词语。
其中,目标替代词语是指候选词语集合中能够替代待检测文本中待检测词语的替代词语。可选地,在根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度后,根据预设规则从各个候选替代词语对应的替代可能度确定目标替代词语,其中预设规则可自定义,自定义可以是将替代可能度最高的候选替代词语确定为目标替代词语,或者若各个候选替代词语对应的替代可能度包括正例替代可能度和反例替代可能度,正例替代可能度是指候选替代词语集合中各个候选替代词语能够替代待检测词语的替代可能度,反例替代可能度是指候选替代词语集合中各个候选替代词语无法替代待检测词语的替代可能度,因此可根据正例替代可能度从候选替代词语集合中确定目标替代词语,例如将正例替代可能度最高的候选替代词语确定为目标替代词语等等。
在一个实施例中,各个候选替代词语对应的替代可能度包括正例替代可能度和反例替代可能度,候选替代词语集合包括词语a、词语b和词语c,词语a对应的正例替代可能度为0.7,反例替代可能度为0.3,词语b对应的正例替代可能度为0.8,反例替代可能度为0.2,词语c对应的正例替代可能度为0.4,反例替代可能度为0.6,从各个候选替代词语对应的替代可能度确定目标替代词语的规则为将各个候选替代词语对应的正例替代可能度最高的候选替代词语确定为目标替代词语,则目标替代词语为词语b。
步骤210,根据待检测词语对应的位置将目标替代词语插入待检测文本得到目标文本。
其中,插入是指将目标替代词语写入或者放入到待检测文本中待检测词语对应的位置中。可选地,在根据各个候选替代词语对应的替代可能度确定目标替代词语后,确定待检测词语在待检测文本中的位置,将目标替代词语插入到待检测词语所在的位置上,从而得到目标文本。其中,确定待检测词语在待检测文本中的位置可以是先对待检测文本进行分 割,得到多个词语,对多个词语进行句法分析,得到句法分析结果,再根据句法分析结果确定待检测词语在待检测文本中的位置。
在一个实施例中,待检测文本为:“小明吃了个小苹果,很甜”。本申请实施例从候选替代词语集合中确定的目标替代词语为:“小苹果”。首先,确定该待检测文本中待检测词语所在的位置为“很甜”前面,再将目标替代词语插入到待检测词语对应的位置上,最后得到目标文本,目标文本为:“小明吃了个小苹果,小苹果很甜”。
上述数据处理方法,代词消解神经网络能够很好地利用待检测文本中的上下文词序列和候选替代词语对应的特征,通过对上下文词序列和候选替代词语对应的特征进行正例迭代处理和反例迭代处理得到对应的正例特征向量模长和反例特征向量模长,最后根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度。由于代词消解神经网络能够很好地融合了上下文词序列和候选替代词语对应的特征,根据上下文词序列和候选替代词语对应的特征计算候选替代词语对应的替代可能度,该特征除了词序列对应的特征还包括词序列对应的字序列的特征,能够很好地解决数据层面上稀疏问题,从而提高候选替代词语集合中各个候选替代词语对应的替代可能度的准确性,进而提高代词消解的准确率。
在一个实施例中,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征之后,还包括:代词消解神经网络对第一特征和第二特征进行维度变换和长度缩放处理,得到对应的第一目标特征和第二目标特征。
其中,由于第一特征和第二特征存在着维度多样性和长度范围多样性的问题,而第一特征和第二特征的维度和长度不是统一的,为了后续正例特征向量模长和反例特征向量模长计算的准确性,因此需要在代词消解神经网络对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征之后,对第一特征和第二特征进行维度变化和长度缩放,使得第一特征和第二特征克服维度多样性和长度多样性的问题,保证后续正例特征向量模长和反例特征向量模长计算的准确性。
其中,第一目标特征是指第一特征进行维度变换和长度缩放处理后得到的第一特征,第二目标特征是指第二特征进行维度变换和长度缩放处理后得到的第二特征。可选地,代词消解神经网络在得到第一特征和第二特征后,可通过代词消解神经网络中的特征转换结合子网络对第一特征和第二特征进行维度变换和长度缩放处理,得到第一目标特征和第二目标特征。具体可以是,首先通过线性变换函数分别对第一特征和第二特征进行维度缩放,得到对应中间特征,再通过长度缩放函数对对应的中间特征进行长度缩放,得到第一特征对应的第一目标特征,和第二特征对应的第二目标特征。其中,通过线性变换函数分别对第一特征和第二特征进行维度缩放,得到对应中间特征可以通过以下公式进行维度缩放:
$u_i = \mathrm{squash}(w_i f_i + b_i)$

其中,$w_i$ 为第一特征或者第二特征对应的预设权重系数,$f_i$ 为第一特征或者第二特征,$b_i$ 为训练得到的偏置参数,squash函数为挤压函数,挤压函数是将一个较大的输入值映射到较小的区间0~1的函数,$u_i$ 为第一特征或者第二特征对应的中间特征。

通过长度缩放函数对对应的中间特征进行长度缩放,得到第一特征对应的第一目标特征,和第二特征对应的第二目标特征,可以通过以下公式进行长度缩放:

$\mathrm{squash}(u_i) = \dfrac{\|u_i\|^2}{1 + \|u_i\|^2} \cdot \dfrac{u_i}{\|u_i\|}$

其中,squash函数为挤压函数,$u_i$ 为第一特征或者第二特征对应的中间特征。
本实施例中,代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,包括:代词消解神经网络根据第一目标特征和第二目标特征进行正例迭代处理得 到对应的正例特征向量模长,根据第一目标特征和第二目标特征进行反例迭代处理得到对应的反例特征向量模长。
其中,在对第一特征和第二特征进行维度变换和长度缩放得到对应的第一目标特征和第二目标特征后,代词消解神经网络根据第一目标特征和第二目标特征进行正例迭代处理得到对应的正例特征向量模长,根据第一目标特征和第二目标特征进行反例迭代处理得到对应的反例特征向量模长。具体过程可参考根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长的步骤,在此不作赘述。
在一个实施例中,如图3所示,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合,包括:
步骤302,对待检测文本进行分割,得到多个词语。
其中,由于待检测文本一般是以句子形式的评论或者文章,因此需要对待检测文本进行分割,得到分割后的多个词语。分割是指将一段文本数据切分为多个词语,分割的方法可以根据实际需要进行设置。例如可以采用基于字符串匹配的分割方法、基于理解的分割方法或者基于统计的分割方法中的一种或多种方法进行分割。还可以采用结巴分割应用工具或者Hanlp分割应用工具等分割工具对待检测文本进行分割。分割后,得到根据待检测文本的词语排列顺序依次排列的词序列。
步骤304,对各个词语进行句法分析,根据句法分析结果确定待检测词语所在的位置。
其中,句法分析是对从分割得到的词语在待检测文本中的语法功能进行分析得到句法分析结果。句法分析结构可以是句法结构,句法结构是指词语与词语之间按照一定的规则组合构成的,比如“我来晚了”,这里“我”是主语,“来”是谓语,“晚了”是补语,对应的句法结构可以为:主语+谓语+宾语,或者“小明吃了个小苹果,很甜”对应的句法结构可以为:名词短语+动词短语+量词+名词短语+待检测词语+形容词短语。
可选地,在对各个词语进行句法分析后,可根据句法分析结果确定待检测词语所在的位置,由于待检测词语是指待检测文本中省略部分或者是缺失部分,因此在对各个词语进行句法分析得到句法分析结果时,可根据句法分析结果检测得到待检测词语所在的位置。例如,待检测文本为:“小明吃了个小苹果,很甜”,对其分割得到的多个词语为:“小明”、“吃了”、“个”、“小苹果”、“很甜”,对分割后的词语进行句法分析,得到句法分析结果为:名词短语+动词短语+量词+名词短语+待检测词语+形容词短语,因此可见待检测文本中待检测词语所在的位置为:“很甜”前面的位置,即“很甜”前面省略或者缺失了部分内容。
步骤306,根据待检测词语所在的位置获取上文词序列和下文词序列,根据上文词序列和下文词序列组成上下文词语集合。
其中,待检测词语的上文词序列是以待检测词语所在的位置为中心,由待检测词语所在的位置的前向词语组成的词序列,而下文词序列是以待检测词语所在的位置为中心,由待检测词语所在的位置的后向词语组成的词序列。可选地,在根据句法分析结果确定待检测词语所在的位置后,以待检测词语所在的位置为中心,获取待检测词语所在的位置的前向词语组成的上文词序列,和获取待检测词语所在的位置的后向词语组成的下文词序列,再根据上文词序列和下文词序列组成上下文词语集合。
例如,待检测文本为:“小明吃了个小苹果,很甜,他心情超级美妙”,先对待检测文本进行分割,得到多个词语为:“小明”,“吃了”,“个”,“小苹果”,“很甜”,“他”,“心情”,“超级”和“美妙”。对各个词语进行句法分析,确定待检测文本中的待检测词语所在的位置为“很甜”前面省略的一部分内容,再根据待检测词语所在的位置获取上文词序列为:“小明”,“吃了”,“个”,“小苹果”,下文词序列为:“很甜”,“他”,“心情”,“超级”和“美妙”,由上文词序列和下文词序列组成上下文词语集合。
步骤308,根据句法分析结果获取候选替代词语,根据候选替代词语组成候选替代词语集合。
其中,候选替代词语是用来替代待检测词语的候选词语,可以是名词性短语等。可选地,在对分割后的词语进行句法分析得到句法分析结果后,根据预设筛选规则从句法分析结果中获取候选替代词语,预设筛选规则可自定义,自定义可以是根据句法结构将名词性短语作为候选替代词语,或者是根据句法结构将形容词语作为候选替代词语等等。可选地,根据预设筛选规则从分割后的多个词语中筛选得到候选替代词语后,根据候选替代词语组成候选替代词语集合。
例如,待检测文本为:“小明吃了个小苹果,很甜,他心情超级美妙”,先对待检测文本进行分割,得到多个词语为:“小明”,“吃了”,“个”,“小苹果”,“很甜”,“他”,“心情”,“超级”和“美妙”。而候选替代词的预设筛选规则为从待检测文本对应的词语中筛选名词性短语作为候选替代词语,因此筛选得到的候选替代词为:“小明”和“小苹果”,由候选替代词语组成候选替代词语集合。
在一个实施例中,如图4所示,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征,包括:
步骤402,代词消解神经网络通过前向特征表示子网络和后向特征表示子网络对上下文词语集合中的词序列进行压缩表示,得到对应的第一前向子特征和第一后向子特征。
其中,前向特征表示子网络和后向特征表示子网络都是用于对词序列进行特征运算,得到对应的前向子特征和后向子特征的。其中,压缩表示是对词序列进行特征运算得到对应的子特征的过程。其中,前向特征表示子网络和后向特征表示子网络可以是2个LSTM神经子网络。
可选地,代词消解神经网络通过前向特征表示子网络对上文词语集合中的上文词序列进行特征提取,得到上文词序列对应的第一前向子特征,同时,通过后向特征表示子网络对上下文词语集合中的下文词序列进行特征提取,得到上文词语序列对应的第一后向子特征。
步骤404,代词消解神经网络对上下文词语集合中的词序列对应的字序列进行压缩表示,得到第一字向量子特征,将第一前向子特征、第一后向子特征和第一字向量子特征组成上下文词语集合对应的第一特征。
其中,代词消解神经网络中还包括用于对词序列对应的字序列进行特征提取的字向量特征表示子网络,代词消解神经网络通过字向量特征表示子网络对上下文词语集合中得到词序列对应的字序列进行特征提取,得到对应的第一字向量子特征。
可选地,代词消解神经网络通过字向量特征表示子网络对上下文词语集合中的上文词序列进行特征提取,得到上文词序列对应的字向量子特征,同时字向量特征表示子网络对上下文词语集合中的下文词序列进行特征提取,得到下文词序列对应的字向量子特征,由上文词序列对应的字向量子特征和下文词序列对应的字向量子特征组成第一字向量子特征。
可选地,将第一前向子特征、第一后向子特征和第一字向量子特征组成上下文词语集合对应的第一特征。其中,可通过以下表达方式表示:
$f_0 = \mathrm{LSTM}_{forward}(\mathrm{zp\_pre\_words}[0{:}N])$

$f_1 = \mathrm{LSTM}_{reverse}(\mathrm{zp\_pre\_words}[0{:}N])$

$f_2 = \mathrm{BERT}(\mathrm{zp\_pre\_chars}[0{:}M])$

其中,$f_0$ 为第一前向子特征,$f_1$ 为第一后向子特征,$f_2$ 为第一字向量子特征,$\mathrm{LSTM}_{forward}$ 为前向特征表示子网络,$\mathrm{LSTM}_{reverse}$ 为后向特征表示子网络,BERT为字向量特征表示子网络,zp_pre_words为上下文词语集合中的词序列,zp_pre_chars为上下文词语集合中的词序列对应的字序列,N代表上下文词语集合中的词序列对应的词数量,M代表上下文词语集合中的词序列对应的字序列的数量。
步骤406,代词消解神经网络通过前向特征表示子网络和后向特征表示子网络对候选替代词语集合中的词序列进行压缩表示,得到对应的第二前向子特征和第二后向子特征。
可选地,代词消解神经网络通过前向特征表示子网络对候选替代词语集合中的候选替代词语进行特征提取,得到候选替代词语对应的第二前向子特征,同时,通过后向特征表示子网络对候选替代词语集合中的候选替代词语进行特征提取,得到候选替代词语对应的第二后向子特征。
步骤408,代词消解神经网络对候选替代词语集合中的词序列对应的字序列进行压缩表示,得到第二字向量子特征,将第二前向子特征、第二后向子特征和第二字向量子特征组成候选替代词语集合对应的第二特征。
可选地,代词消解神经网络包括字向量特征表示子网络,字向量特征表示子网络是用于对词序列对应的字序列进行特征提取的子网络,因此代词消解神经网络通过字向量特征表示子网络对候选替代词语集合中的候选替代词语对应的字序列进行特征提取,得到候选替代词语对应的第二字向量子特征。可选地,将第二前向子特征、第二后向子特征和第二字向量子特征组成候选替代词语集合对应的第二特征。
在一个实施例中,如图5所示,代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,包括:
步骤502,根据第一特征和第二特征计算得到正例迭代处理的初始正例迭代中心,将初始正例迭代中心作为当前正例迭代中心。
其中,代词消解神经网络在得到第一特征和第二特征后,需要对第一特征和第二特征进行正例迭代处理,首先,需要获取正例迭代处理的初始正例迭代中心,将初始正例迭代中心作为当前正例迭代中心。这里的当前正例迭代中心是正在进行正例迭代处理的参考中心。其中,初始正例迭代中心具体可以根据第一特征和第二特征计算得到的,计算的方式可自定义,自定义可以是对第一特征和第二特征进行加权求和,将加权求和得到的结果作为初始正例迭代中心,或者还可以是对第一特征和第二特征进行均值计算,将均值计算得到的结果作为初始正例迭代中心等等。
在一个实施例中,如图6所示,图6示出一个实施例中正例迭代处理的代码实施示意图,其中 $u_i$ 表示第一特征或者第二特征,$k_j$ 表示当前正例迭代中心。图6示出的初始正例迭代中心是将第一特征和第二特征进行加权求和,再做tanh函数变换计算得到的,具体可以如以下公式计算初始正例迭代中心:

$k_j = \tanh\Big(\sum_{i=1}^{l} w_i u_i\Big)$

其中,$l$ 表示第一特征和第二特征的总数量,$u_i$ 表示第一特征或者第二特征,$k_j$ 表示初始正例迭代中心。
步骤504,根据预设正例权重系数对第一特征和第二特征分别进行线性变换,得到对应的第一正例中间特征和第二正例中间特征。
其中,这里的预设正例权重系数是正例迭代处理过程用于对第一特征和第二特征进行线性变换的权重系数,预设正例权重系数是代词消解神经网络训练得到的,即在进行正例迭代处理时,第一特征和第二特征进行线性变化的权重系数都为预设正例权重系数。
可选地,根据预设正例权重系数分别对第一特征和第二特征进行线性变换,得到对应的第一正例中间特征和第二正例中间特征。线性变换具体可以是将预设正例权重系数和第一特征进行乘积计算得到第一正例中间特征,和将预设正例权重系数和第二特征进行乘积计算得到第二正例中间特征。
在一个实施例中,如图6所示,图6中的 $u_i$ 表示第一特征或者第二特征,$\hat{u}_i$ 表示 $u_i$ 对应的正例中间特征:若 $u_i$ 为第一特征,则 $\hat{u}_i$ 为第一正例中间特征;若 $u_i$ 为第二特征,则 $\hat{u}_i$ 为第二正例中间特征;$W^{pos}$ 为正例迭代处理时的预设正例权重系数。具体可以如以下公式对第一特征和第二特征分别进行线性变换:

$\hat{u}_i = W^{pos}\, u_i$
步骤506,将第一正例中间特征和第二正例中间特征分别与当前正例迭代中心进行相似度计算,得到对应的第一正例相似度和第二正例相似度。
其中,相似度是综合评定两个事物之间相近程度的一种度量,这里的相似度是评定正例中间特征和当前正例迭代中心之间相近程度的,相似度越高,说明正例中间特征和当前正例迭代中心越相近,反之,则说明正例中间特征和当前正例迭代中心不相近。可选地,在得到第一正例中间特征和第二正例中间特征后,将第一正例中间特征和第二正例中间特征分别与当前正例迭代中心进行相似度计算,得到对应的第一正例相似度和第二正例相似度。其中,相似度计算方式可自定义,自定义可以是但不限于欧式距离、余弦相似度等等。
在一个实施例中,如图6所示,图6中的 $\hat{u}_i$ 表示 $u_i$ 对应的正例中间特征,$k_j$ 表示当前正例迭代中心,$b_{ij}$ 代表正例相似度:若 $\hat{u}_i$ 表示第一正例中间特征,则 $b_{ij}$ 表示第一正例相似度;若 $\hat{u}_i$ 表示第二正例中间特征,则 $b_{ij}$ 表示第二正例相似度。具体可以如以下公式计算正例相似度:

$b_{ij} = \hat{u}_i \cdot k_j$
步骤508,对第一正例相似度和第二正例相似度进行归一化操作,得到对应的第一正例中间相似度和第二正例中间相似度。
其中,归一化操作是一种简化计算的方式,即将有量纲的表达式,经过变换,化为无量纲的表达式,成为标量。例如,将正例相似度变为(0,1)之间的小数,或者将正例相似度变为0或1等等,将有量纲的表达式转换为无量纲的表达式。具体可以使用但不限于softmax函数(归一化指数函数)对第一正例相似度和第二正例相似度进行归一化操作。
在一个实施例中,如图6所示,图6中的 $c_j$ 为归一化操作后得到的正例中间相似度,$b_{ij}$ 为第一正例相似度或第二正例相似度:若 $b_{ij}$ 为第一正例相似度,则 $c_j$ 为第一正例中间相似度;若 $b_{ij}$ 为第二正例相似度,则 $c_j$ 为第二正例中间相似度。具体可以如以下公式计算正例中间相似度:

$c_{ij} = \dfrac{\exp(b_{ij})}{\sum_{i'} \exp(b_{i'j})}$
步骤510,根据第一正例中间相似度和对应的第一正例中间特征、第二正例相似度和对应的第二正例中间特征计算得到初始正例特征向量模长。
其中,初始正例特征向量模长是指第一次进行正例迭代得到的正例特征向量模长,具体可以根据第一正例中间相似度和对应的第一正例中间特征、第二正例相似度和对应的第二正例中间特征计算得到初始正例特征向量模长。计算方式可自定义,可以是对第一正例中间相似度和对应的第一正例中间特征、第二正例相似度和对应的第二正例中间特征进行求和,将求和结果作为初始正例特征向量模长,或者是对第一正例中间相似度和对应的第一正例中间特征、第二正例相似度和对应的第二正例中间特征进行均值计算,将均值计算结果作为初始正例特征向量模长等等。
在一个实施例中,如图6所示,图6中的 $v_j$ 表示正例特征向量模长,$c_{ij}$ 表示正例中间相似度,$\hat{u}_i$ 表示正例中间特征:若 $c_{ij}$ 表示第一正例中间相似度,则 $\hat{u}_i$ 为对应的第一正例中间特征;若 $c_{ij}$ 表示第二正例中间相似度,则 $\hat{u}_i$ 为对应的第二正例中间特征。具体可以如以下公式计算正例特征向量模长:

$v_j = \mathrm{squash}\Big(\sum_{i=1}^{l} c_{ij}\, \hat{u}_i\Big)$
其中,squash函数为挤压函数,挤压函数是将一个较大的输入值映射到较小的区间0~1的函数,l为第一特征和第二特征的总数量。
步骤512,根据初始正例特征向量模长和初始正例迭代中心计算得到正例更新迭代中心,将正例更新迭代中心作为当前正例迭代中心,返回将第一正例中间特征和第二正例中间特征分别与当前正例迭代中心进行相似度计算的步骤,直至满足收敛条件,得到正例特征向量模长。
其中,由于预先设置了正例迭代处理的收敛条件,因此在计算得到初始正例特征向量模长无法为最终的正例特征向量模长,需不断进行正例迭代处理,直至满足收敛条件,方可输出得到正例特征向量模长。其中,收敛条件可自定义,自定义可以是迭代次数或者是正例特征向量模长满足预设模长值,则可认为满足收敛条件。
可选地,可根据初始正例特征向量模长和初始正例迭代中心计算得到正例更新迭代中心,将正例更新迭代中心作为当前正例迭代中心,返回将第一正例中间特征和第二正例中间特征分别与当前正例迭代中心进行相似度计算的步骤不断进行正例迭代处理,直至满足收敛条件,得到正例特征向量模长。其中,根据初始正例特征向量模长和初始正例迭代中心计算得到正例更新迭代中心的计算方式可自定义,自定义可以是对初始正例特征向量模长和初始正例迭代中心进行均值计算,将均值计算结果作为正例更新迭代中心,或者还可以是对初始正例特征向量模长和初始正例迭代中心进行加权求和,将加权求和结果作为正例更新迭代中心等等。
在一个实施例中,如图6所示,图6中的第14步为计算正例更新迭代中心,正例更新迭代中心可以是初始正例特征向量模长和初始正例迭代中心的均值计算结果,可以如以下公式计算得到正例更新迭代中心:

$k_j \leftarrow \dfrac{v_j + k_j}{2}$

可选地,当正例迭代处理满足收敛条件时,则可输出得到正例特征向量模长。如图6中的第16步,第16步根据最后一次满足收敛条件得到的正例特征向量模长得到最终的正例特征向量模长,具体可以如以下公式计算得到:

$\|v_j\| = \|w_j \cdot v_j\|$

其中,$w_j$ 为正例迭代处理对应的预设权重系数,等式左边的 $v_j$ 为最终的正例特征向量模长,等式右边的 $v_j$ 为最后一次满足收敛条件得到的正例特征向量模长。
在一个实施例中,如图7所示,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,包括:
步骤602,根据第一特征和第二特征计算得到反例迭代处理的初始反例迭代中心,将初始反例迭代中心作为当前反例迭代中心。
其中,代词消解神经网络在得到第一特征和第二特征后,需要对第一特征和第二特征进行反例迭代处理,首先,需要获取反例迭代处理的初始反例迭代中心,将初始反例迭代中心作为当前反例迭代中心。这里的当前反例迭代中心是正在进行反例迭代处理的参考中心。其中,初始反例迭代中心具体可以根据第一特征和第二特征计算得到的,计算的方式可自定义,自定义可以是对第一特征和第二特征进行加权求和,将加权求和得到的结果作为初始反例迭代中心,或者还可以是对第一特征和第二特征进行均值计算,将均值计算得到的结果作为初始反例迭代中心等等。
在一个实施例中,如图6所示,图6示出一个实施例中反例迭代处理的代码实施示意图,其中 $u_i$ 表示第一特征或者第二特征,$k_j$ 表示当前反例迭代中心。图6示出的初始反例迭代中心是将第一特征和第二特征进行加权求和,再做tanh函数变换计算得到的,具体可以如以下公式计算初始反例迭代中心:

$k_j = \tanh\Big(\sum_{i=1}^{l} w_i u_i\Big)$

其中,$l$ 表示第一特征和第二特征的总数量,$u_i$ 表示第一特征或者第二特征,$k_j$ 表示初始反例迭代中心。
步骤604,根据预设反例权重系数对第一特征和第二特征分别进行线性变换,得到对应的第一反例中间特征和第二反例中间特征。
其中,这里的预设反例权重系数是反例迭代处理过程用于对第一特征和第二特征进行线性变换的权重系数,预设反例权重系数是代词消解神经网络训练得到的,即在进行反例迭代处理时,第一特征和第二特征进行线性变化的权重系数都为预设反例权重系数。
可选地,根据预设反例权重系数分别对第一特征和第二特征进行线性变换,得到对应的第一反例中间特征和第二反例中间特征。线性变换具体可以是将预设反例权重系数和第一特征进行乘积计算得到第一反例中间特征,和将预设反例权重系数和第二特征进行乘积计算得到第二反例中间特征。
在一个实施例中,如图6所示,图6中的 $u_i$ 表示第一特征或者第二特征,$\hat{u}_i$ 表示 $u_i$ 对应的反例中间特征:若 $u_i$ 为第一特征,则 $\hat{u}_i$ 为第一反例中间特征;若 $u_i$ 为第二特征,则 $\hat{u}_i$ 为第二反例中间特征;$W^{neg}$ 为反例迭代处理时的预设反例权重系数。具体可以如以下公式对第一特征和第二特征分别进行线性变换:

$\hat{u}_i = W^{neg}\, u_i$
步骤606,将第一反例中间特征和第二反例中间特征分别与当前反例迭代中心进行相似度计算,得到对应的第一反例相似度和第二反例相似度。
其中,相似度是综合评定两个事物之间相近程度的一种度量,这里的相似度是评定反例中间特征和当前反例迭代中心之间相近程度的,相似度越高,说明反例中间特征和当前反例迭代中心越相近,反之,则说明反例中间特征和当前反例迭代中心不相近。可选地,在得到第一反例中间特征和第二反例中间特征后,将第一反例中间特征和第二反例中间特征分别与当前反例迭代中心进行相似度计算,得到对应的第一反例相似度和第二反例相似度。其中,相似度计算方式可自定义,自定义可以是但不限于欧式距离、余弦相似度等等。
在一个实施例中,如图6所示,图6中的 $\hat{u}_i$ 表示 $u_i$ 对应的反例中间特征,$k_j$ 表示当前反例迭代中心,$b_{ij}$ 代表反例相似度:若 $\hat{u}_i$ 表示第一反例中间特征,则 $b_{ij}$ 表示第一反例相似度;若 $\hat{u}_i$ 表示第二反例中间特征,则 $b_{ij}$ 表示第二反例相似度。具体可以如以下公式计算反例相似度:

$b_{ij} = \hat{u}_i \cdot k_j$
步骤608,对第一反例相似度和第二反例相似度进行归一化操作,得到对应的第一反例中间相似度和第二反例中间相似度。
其中,归一化操作是一种简化计算的方式,即将有量纲的表达式,经过变换,化为无量纲的表达式,成为标量。例如,将反例相似度变为(0,1)之间的小数,或者将反例相似度变为0或1等等,将有量纲的表达式转换为无量纲的表达式。具体可以使用但不限于 softmax函数(归一化指数函数)对第一反例相似度和第二反例相似度进行归一化操作。
在一个实施例中,如图6所示,图6中的 $c_j$ 为归一化操作后得到的反例中间相似度,$b_{ij}$ 为第一反例相似度或第二反例相似度:若 $b_{ij}$ 为第一反例相似度,则 $c_j$ 为第一反例中间相似度;若 $b_{ij}$ 为第二反例相似度,则 $c_j$ 为第二反例中间相似度。具体可以如以下公式计算反例中间相似度:

$c_{ij} = \dfrac{\exp(b_{ij})}{\sum_{i'} \exp(b_{i'j})}$
步骤610,根据第一反例中间相似度和对应的第一反例中间特征、第二反例相似度和对应的第二反例中间特征计算得到初始反例特征向量模长。
其中,初始反例特征向量模长是指第一次进行反例迭代得到的反例特征向量模长,具体可以根据第一反例中间相似度和对应的第一反例中间特征、第二反例相似度和对应的第二反例中间特征计算得到初始反例特征向量模长。计算方式可自定义,可以是对第一反例中间相似度和对应的第一反例中间特征、第二反例相似度和对应的第二反例中间特征进行求和,将求和结果作为初始反例特征向量模长,或者是对第一反例中间相似度和对应的第一反例中间特征、第二反例相似度和对应的第二反例中间特征进行均值计算,将均值计算结果作为初始反例特征向量模长等等。
在一个实施例中,如图6所示,图6中的 $v_j$ 表示反例特征向量模长,$c_{ij}$ 表示反例中间相似度,$\hat{u}_i$ 表示反例中间特征:若 $c_{ij}$ 表示第一反例中间相似度,则 $\hat{u}_i$ 为对应的第一反例中间特征;若 $c_{ij}$ 表示第二反例中间相似度,则 $\hat{u}_i$ 为对应的第二反例中间特征。具体可以如以下公式计算反例特征向量模长:

$v_j = \mathrm{squash}\Big(\sum_{i=1}^{l} c_{ij}\, \hat{u}_i\Big)$
其中,squash函数为挤压函数,挤压函数是将一个较大的输入值映射到较小的区间0~1的函数,l为第一特征和第二特征的总数量。
步骤612,根据初始反例特征向量模长和初始反例迭代中心计算得到反例更新迭代中心,将反例更新迭代中心作为当前反例迭代中心,返回将第一反例中间特征和第二反例中间特征分别与当前反例迭代中心进行相似度计算的步骤,直至满足收敛条件,得到反例特征向量模长。
其中,由于预先设置了反例迭代处理的收敛条件,因此在计算得到初始反例特征向量模长无法为最终的反例特征向量模长,需不断进行反例迭代处理,直至满足收敛条件,方可输出得到反例特征向量模长。其中,收敛条件可自定义,自定义可以是迭代次数或者是反例特征向量模长满足预设模长值,则可认为满足收敛条件。
可选地,可根据初始反例特征向量模长和初始反例迭代中心计算得到反例更新迭代中心,将反例更新迭代中心作为当前反例迭代中心,返回将第一反例中间特征和第二反例中间特征分别与当前反例迭代中心进行相似度计算的步骤不断进行反例迭代处理,直至满足收敛条件,得到反例特征向量模长。其中,根据初始反例特征向量模长和初始反例迭代中心计算得到反例更新迭代中心的计算方式可自定义,自定义可以是对初始反例特征向量模长和初始反例迭代中心进行均值计算,将均值计算结果作为反例更新迭代中心,或者还可以是对初始反例特征向量模长和初始反例迭代中心进行加权求和,将加权求和结果作为反例更新迭代中心等等。
在一个实施例中,如图6所示,图6中的第14步为计算反例更新迭代中心,反例更新迭代中心可以是初始反例特征向量模长和初始反例迭代中心的均值计算结果,可以如以下公式计算得到反例更新迭代中心:

$k_j \leftarrow \dfrac{v_j + k_j}{2}$

可选地,当反例迭代处理满足收敛条件时,则可输出得到反例特征向量模长。如图6中的第16步,第16步根据最后一次满足收敛条件得到的反例特征向量模长得到最终的反例特征向量模长,具体可以如以下公式计算得到:

$\|v_j\| = \|w_j \cdot v_j\|$

其中,$w_j$ 为反例迭代处理对应的预设权重系数,等式左边的 $v_j$ 为最终的反例特征向量模长,等式右边的 $v_j$ 为最后一次满足收敛条件得到的反例特征向量模长。
在一个实施例中,如图8所示,提供了一种代词消解神经网络训练方法。本实施例主要以该方法应用于上述图1中的终端110或服务器120来举例说明。参照图8,该代词消解神经网络训练方法具体包括如下步骤:
步骤702,获取训练文本,训练文本存在对应的标准训练文本标签。
其中,训练文本是需要对代词消解神经网络进行训练的输入数据,训练文本可以是为一个或多个。训练文本可以是实时获取的,也可以是预先存储的。训练文本存在对应的标准训练文本标签,由于训练文本中包括待检测词语,因此训练文本存在对应的标准训练文本标签为训练文本中的待检测词语的实际指代词语。
步骤704,确定训练文本中待检测词语对应的训练上下文词语集合和训练候选替代词语集合。
其中,在将训练文本输入至代词消解神经网络进行训练之前,需要对训练文本进行预处理,具体可以是确定训练文本中待检测词语对应的训练上下文词语集合和训练候选替代词语集合。可选地,先对训练文本进行分割,得到多个词语,对多个词语进行句法分析,确定待检测词语所在的位置,根据待检测词语所在的位置获取训练前向词语和训练后向词语,由获取到的训练前向词语组成训练上文词序列,后向词语组成训练下文词序列,再根据训练上文词序列和训练下文词序列组成训练上下文词语集合。
可选地,根据句法分析结果获取训练候选替代词语,由获取到的训练候选替代词语组成训练候选替代词语集合。
步骤706,将训练上下文词语集合和训练候选替代词语集合输入至初始代词消解神经网络中,初始代词消解神经网络分别对训练上下文词语集合和训练候选替代词语集合进行特征提取得到对应的第一训练特征和第二训练特征。
其中,初始代词消解神经网络是未进行训练的代词消解神经网络,初始代词消解神经网络可以是胶囊网络(Capsule Network)、向量机(Support Vector Machine,SVM)分类器模型、神经网络(Artificial Neural Network,ANN)分类器模型、逻辑回归算法(logistic Regression,LR)分类器模型等各种进行分类的模型。
可选地,将训练上下文词语集合和训练候选替代词语集合输入至初始代词消解神经网络中,初始代词消解神经网络可通过特征表示子网络对训练上下文词语集合进行特征提取,得到训练上下文词语集合对应的第一训练特征,通过特征表示子网络对训练候选替代词语集合进行特征提取得到训练候选替代词语集合对应的第二训练特征。
步骤708,初始代词消解神经网络根据第一训练特征和第二训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征和第二训练特征进行反例迭代处理得到对应的反例训练特征向量模长,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度。
其中,正例迭代处理是指对特征进行重复迭代计算得到正例训练特征向量模长的过程,而正例训练特征向量模长是指正例训练特征向量的长度,正例训练特征向量是正例替代可能度对应的特征向量,正例替代可能度是指训练文本中待检测词语与各个候选替代词语相互匹配的可能度。正例迭代处理可以是通过自定义动态路由算法迭代计算得到正例训练特 征向量模长,自定义动态路由算法可以用于正例迭代处理和反例迭代处理,由于正例迭代处理和反例迭代处理对应的预设训练权重系数不同,因此将第一训练特征和第二训练特征通过自定义动态路由算法计算可分别得到正例训练特征向量模长和反例训练特征向量模长。
而反例迭代处理是指对特征进行重复迭代计算得到反例训练特征向量模长的过程,反例训练特征向量模长是指反例训练特征向量的长度,反例训练特征向量是反例替代可能度对应的特征向量,反例替代可能度是指训练文本中待检测词语与各个候选替代词语相互不匹配的可能度。
可选地,通过自定义动态路由算法计算正例训练特征向量模长和反例训练特征向量模长可以是根据第一训练特征和第二训练特征计算得到初始迭代中心,将初始迭代中心分别作为正例迭代处理和反例迭代处理的初始迭代中心,并将初始迭代中心作为当前迭代中心,再根据正例迭代处理和反例迭代处理对应的预设训练权重系数对第一训练特征和第二训练特征进行线性变换,得到正例迭代处理和反例迭代处理对应的第一中间训练特征和第二中间训练特征。再将正例迭代处理和反例迭代处理对应的第一中间训练特征和第二中间训练特征分别与当前迭代中心进行相似度计算,得到正例迭代处理和反例迭代处理对应的第一训练相似度和第二训练相似度,紧接着对正例迭代处理和反例迭代处理对应的第一训练相似度和第二训练相似度进行归一化,得到正例迭代处理和反例迭代处理对应的第一中间训练相似度和第二中间训练相似度,根据正例迭代处理和反例迭代处理对应的第一中间训练相似度和第二中间训练相似度和对应的第一中间训练特征和第二中间训练特征计算得到正例迭代处理和反例迭代处理对应的初始训练特征向量模长。
然后,再根据正例迭代处理和反例迭代处理对应的第一中间训练相似度和第二中间训练相似度和初始迭代中心更新迭代中心,将更新后的迭代中心作为当前迭代中心,返回将正例迭代处理和反例迭代处理对应的第一中间训练特征和第二中间训练特征分别与当前迭代中心进行相似度计算的步骤,直至满足收敛条件,得到正例迭代处理对应的正例训练特征向量模长和反例迭代处理对应的反例训练特征向量模长。
其中,在将第一训练特征和第二训练特征进行正例迭代处理和反例迭代处理得到正例训练特征向量模长和反例训练特征向量模长后,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度。训练替代可能度包括但不限于正例训练替代可能度和反例训练替代可能度,所谓正例训练替代可能度是指训练候选替代词语集合中各个训练候选替代词语能够替代待检测词语的替代可能度,反例替代可能度是指训练候选替代词语集合中各个训练候选替代词语无法替代待检测词语的替代可能度。
其中,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个候选替代词语对应的训练替代可能度,可以通过以下公式计算得到:

$P_{pos} = \dfrac{e^{V_{pos}}}{e^{V_{pos}} + e^{V_{neg}}}$

$P_{neg} = \dfrac{e^{V_{neg}}}{e^{V_{pos}} + e^{V_{neg}}}$

其中,$P_{pos}$ 是指正例训练替代可能度,$P_{neg}$ 是指反例训练替代可能度,$V_{pos}$ 是指正例训练特征向量模长,$V_{neg}$ 是指反例训练特征向量模长。
步骤710,根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算训练损失值。
其中,训练损失值是用来调整初始代词消解神经网络模型参数的,具体可以根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算训练损失值。其中,计算训练损失值的计算方式可自定义,自定义可以是根据训练替代可能度中的正例训练替代可能度和对应的标准训练文本标签、反例训练替代可能度和对应的标准训练文本标签计算得到的。例如,训练损失值可以通过以下公式计算得到:

$P_{pos} = \dfrac{e^{V_{pos}}}{e^{V_{pos}} + e^{V_{neg}}}$

$P_{neg} = \dfrac{e^{V_{neg}}}{e^{V_{pos}} + e^{V_{neg}}}$

$J(\theta) = -\sum_{i}\big[y_i \log P_{pos} + (1 - y_i)\log P_{neg}\big]$

其中,$P_{pos}$ 是指正例训练替代可能度,$P_{neg}$ 是指反例训练替代可能度,$V_{pos}$ 是指正例训练特征向量模长,$V_{neg}$ 是指反例训练特征向量模长,$J(\theta)$ 为训练损失值,$y_i$ 为训练样本的标准训练文本标签。
步骤712,根据训练损失值对初始代词消解神经网络的模型参数进行调整,直至满足收敛条件,得到代词消解神经网络。
其中,在计算得到训练损失值后,根据训练损失值对初始代词消解神经网络的模型参数不断进行调整,直至满足收敛条件,得到代词消解神经网络。其中,收敛条件可自定义,自定义可以是训练损失值不再发生变化,或者调整次数达到预设次数等等,则可认为初始代词消解神经网络满足收敛条件,从而得到代词消解神经网络。
上述代词消解神经网络训练方法,代词消解神经网络在训练过程时,很好地利用训练文本中的上下文词语集合和候选替代词语集合对应的特征,很好地将上下文词语集合和候选替代词语集合对应的特征进行融合,进而提高代词消解神经网络的输出准确度。
在一个实施例中,代词消解神经网络训练方法还包括:获取抗干扰特征集合;将抗干扰特征集合中的抗干扰特征输入至初始代词消解神经网络中,初始代词消解神经网络根据第一训练特征、第二训练特征和抗干扰特征生成额外训练特征。
其中,抗干扰特征集合是由初始代词消解神经网络训练时用于防止其他特征干扰的特征组成的,抗干扰特征集合输入至初始代词消解神经网络训练,可提高代词消解神经网络的输出准确性。其中,抗干扰特征集合可以是由一组人工设计的综合性特征组成的。可选地,获取到抗干扰特征集合后,将抗干扰特征集合输入至初始代词消解神经网络中,初始代词消解神经网络根据第一训练特征、第二训练特征和抗干扰特征生成额外训练特征。其中,初始代词消解神经网络对第一训练特征、第二训练特征和抗干扰特征生成额外训练特征具体过程可以参考图2中步骤204的描述,在此不做赘述。
本实施例中,初始代词消解神经网络根据第一训练特征和第二训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征和第二训练特征进行反例迭代处理得到对应的反例训练特征向量模长,包括:初始代词消解神经网络根据第一训练特征、第二训练特征、抗干扰特征和额外训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征、第二训练特征、抗干扰特征和额外训练特征进行反例迭代处理得到对应的反例训练特征向量模长。
可选地,初始代词消解神经网络对第一训练特征、第二训练特征和抗干扰特征生成额外训练特征后,初始代词消解神经网络根据第一训练特征、第二训练特征、抗干扰特征和额外训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征、第二训练特征、抗干扰特征和额外训练特征进行反例迭代处理得到对应的反例训练特征向量模长。具体可参考图2中步骤206的描述,在此不做赘述。
在一个实施例中,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度,包括:根据正例训练 特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的正例训练替代可能度和反例训练替代可能度。
其中,正例训练替代可能度是指训练文本中待检测词语与各个训练候选替代词语相互匹配的可能度,而反例训练可能度是指训练文本中待检测词语与各个训练候选替代词语不相互匹配的可能度。可选地,可根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的正例训练替代可能度和反例训练替代可能度。计算方式可以自定义,自定义可以是自定义动态路由算法,自定义动态路由算法可参考图2中的步骤206的描述,再次不作赘述。
在一个实施例中,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的正例训练替代可能度和反例训练替代可能度具体可以如以下公式计算得到:
$P_{pos} = \dfrac{e^{V_{pos}}}{e^{V_{pos}} + e^{V_{neg}}}$

$P_{neg} = \dfrac{e^{V_{neg}}}{e^{V_{pos}} + e^{V_{neg}}}$

其中,$P_{pos}$ 是指正例训练替代可能度,$P_{neg}$ 是指反例训练替代可能度,$V_{pos}$ 是指正例训练特征向量模长,$V_{neg}$ 是指反例训练特征向量模长。
本实施例中,根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算训练损失值,包括:根据各个训练候选替代词语对应的正例训练替代可能度和对应的标准训练文本标签,反例训练替代可能度和对应的标准训练文本标签计算得到训练损失值。
可选地,可根据各个训练候选替代词语对应的正例训练替代可能度和对应的标准训练文本标签,反例训练替代可能度和对应的标准训练文本标签计算得到训练损失值。其中,训练损失值可以通过以下公式计算得到:

$P_{pos} = \dfrac{e^{V_{pos}}}{e^{V_{pos}} + e^{V_{neg}}}$

$P_{neg} = \dfrac{e^{V_{neg}}}{e^{V_{pos}} + e^{V_{neg}}}$

$J(\theta) = -\sum_{i}\big[y_i \log P_{pos} + (1 - y_i)\log P_{neg}\big]$

其中,$P_{pos}$ 是指正例训练替代可能度,$P_{neg}$ 是指反例训练替代可能度,$V_{pos}$ 是指正例训练特征向量模长,$V_{neg}$ 是指反例训练特征向量模长,$J(\theta)$ 为训练损失值,$y_i$ 为训练样本的标准训练文本标签。
在一个具体的实施例中,提供了一种数据处理方法以及代词消解神经网络训练方法,具体包括以下步骤:
1、获取训练文本,训练文本存在对应的标准训练文本标签。
2、确定训练文本中待检测词语对应的训练上下文词语集合和训练候选替代词语集合。
3、将训练上下文词语集合和训练候选替代词语集合输入至初始代词消解神经网络中,初始代词消解神经网络分别对训练上下文词语集合和训练候选替代词语集合进行特征提取得到对应的第一训练特征和第二训练特征。
4、初始代词消解神经网络根据第一训练特征和第二训练特征进行正例迭代处理得到对应的正例训练特征向量模长,根据第一训练特征和第二训练特征进行反例迭代处理得到对 应的反例训练特征向量模长,根据正例训练特征向量模长和反例训练特征向量模长计算得到训练候选替代词语集合中各个训练候选替代词语对应的训练替代可能度。
5、根据各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算训练损失值。
6、根据训练损失值对初始代词消解神经网络的模型参数进行调整,直至满足收敛条件,得到代词消解神经网络。
7、获取待检测文本,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合。
7-1、对待检测文本进行分割,得到多个词语。
7-2、对各个词语进行句法分析,根据句法分析结果确定待检测词语所在的位置。
7-3、根据待检测词语所在的位置获取上文词序列和下文词序列,根据上文词序列和下文词序列组成上下文词语集合。
7-4、根据句法分析结果获取候选替代词语,根据候选替代词语组成候选替代词语集合。
8、将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征。
8-1、代词消解神经网络通过前向特征表示子网络和后向特征表示子网络对上下文词语集合中的词序列进行压缩表示,得到对应的第一前向子特征和第一后向子特征。
8-2、代词消解神经网络对上下文词语集合中的词序列对应的字序列进行压缩表示,得到第一字向量子特征,将第一前向子特征、第一后向子特征和第一字向量子特征组成上下文词语集合对应的第一特征。
8-3、代词消解神经网络通过前向特征表示子网络和后向特征表示子网络对候选替代词语集合中的词序列进行压缩表示,得到对应的第二前向子特征和第二后向子特征。
8-4、代词消解神经网络对候选替代词语集合中的词序列对应的字序列进行压缩表示,得到第二字向量子特征,将第二前向子特征、第二后向子特征和第二字向量子特征组成候选替代词语集合对应的第二特征。
9、代词消解神经网络对第一特征和第二特征进行维度变换和长度缩放处理,得到对应的第一目标特征和第二目标特征。
10、代词消解神经网络根据第一目标特征和第二目标特征进行正例迭代处理得到对应的正例特征向量模长,根据第一目标特征和第二目标特征进行反例迭代处理得到对应的反例特征向量模长,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度。
10-1、根据第一特征和第二特征计算得到正例迭代处理的初始正例迭代中心,将初始正例迭代中心作为当前正例迭代中心。
10-2、根据预设正例权重系数对第一特征和第二特征分别进行线性变换,得到对应的第一正例中间特征和第二正例中间特征。
10-3、将第一正例中间特征和第二正例中间特征分别与当前正例迭代中心进行相似度计算,得到对应的第一正例相似度和第二正例相似度。
10-4、对第一正例相似度和第二正例相似度进行归一化操作,得到对应的第一正例中间相似度和第二正例中间相似度。
10-5、根据第一正例中间相似度和对应的第一正例中间特征、第二正例相似度和对应的第二正例中间特征计算得到初始正例特征向量模长。
10-6、根据初始正例特征向量模长和初始正例迭代中心计算得到正例更新迭代中心,将正例更新迭代中心作为当前正例迭代中心,返回将第一正例中间特征和第二正例中间特征分别与当前正例迭代中心进行相似度计算的步骤,直至满足收敛条件,得到正例特征向量模长。
10-7、根据第一特征和第二特征计算得到反例迭代处理的初始反例迭代中心,将初始反例迭代中心作为当前反例迭代中心。
10-8、根据预设反例权重系数对第一特征和第二特征分别进行线性变换,得到对应的第一反例中间特征和第二反例中间特征。
10-9、将第一反例中间特征和第二反例中间特征分别与当前反例迭代中心进行相似度计算,得到对应的第一反例相似度和第二反例相似度。
10-10、对第一反例相似度和第二反例相似度进行归一化操作,得到对应的第一反例中间相似度和第二反例中间相似度。
10-11、根据第一反例中间相似度和对应的第一反例中间特征、第二反例相似度和对应的第二反例中间特征计算得到初始反例特征向量模长。
10-12、根据初始反例特征向量模长和初始反例迭代中心计算得到反例更新迭代中心,将反例更新迭代中心作为当前反例迭代中心,返回将第一反例中间特征和第二反例中间特征分别与当前反例迭代中心进行相似度计算的步骤,直至满足收敛条件,得到反例特征向量模长。
11、根据各个候选替代词语对应的替代可能度确定目标替代词语。
12、根据待检测词语对应的位置将目标替代词语插入待检测文本得到目标文本。
在一个中文零指代消解的应用场景中,如图9所示,图9示出一个实施例中代词消解神经网络的网络结构示意图,代词消解神经网络包括:特征表示层(Feature Representation),特征转换和结合层(Feature Transformation&Combination),特征聚合层(Feature Clustering),分类层(Classification)。
可选地,获取待检测文本,对待检测文本进行预处理,得到待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合,将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络的特征表示层是对上下文词语集合和候选替代词语集合进行抽取特征,得到对应的第一特征和第二特征。具体特征表示层使用双向的长短时神经子网络对上下文词语集合和候选替代词语集合中的词序列进行表示,使用字向量特征表示子网络BERT对上下文词语集合和候选替代词语集合中的词序列对应的字序列进行表示,表示完毕后,上下文词语集合和候选替代词语集合可以分别得到3组特征,即第一特征包括2组上下文词语集合中词序列对应的特征 $f_0$、$f_1$,和1组词序列对应的字序列对应的特征 $f_2$;第二特征包括2组候选替代词语集合中词序列对应的特征 $f_0$、$f_1$,和1组词序列对应的字序列对应的特征 $f_2$。
紧接着,特征转换和结合层是对特征表示层提取到的特征进行维度统一和长度缩放。由于特征表示层输出的向量存在着维度多样性和长度范围多样性的问题,需对各个特征进行维度变换和长度缩放。具体可使用线性变换函数来对特征进行维度缩放,使用长度缩放函数(squash)来对特征进行长度缩放,最后得到对应的第一目标特征和第二目标特征。
然后,特征聚合层对各种特征进行迭代式的权重聚合。可选地,代词消解神经网络通过特征聚合层对根据第一目标特征和第二目标特征进行正例迭代处理得到对应的正例特征向量模长,根据第一目标特征和第二目标特征进行反例迭代处理得到对应的反例特征向量模长。即,将特征转换和结合层输出的多个特征输入至特征聚合层,特征聚合层通过计算得到两个向量模长为正例特征向量模长和反例特征向量模长。
然后,代词消解神经网络将特征聚合层输出的正例特征向量模长和反例特征向量模长输入至分类层,分类层根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代概率。紧接着,根据候选替代词语集合中各个候选替代词语对应的替代概率确定目标替代词语,例如将替代概率最大的候选替代词语作为目标替代词语,最后将目标替代词语插入到待检测文本中待检测词语对应的位置,得到目标文本。
例如,待检测文本为:“小明今天吃了个苹果,很甜”,该待检测文本的待检测词语的位置在“很甜”的前面,即“很甜”前面省略了部分内容,候选替代词语为:“小明”和“苹果”,通过代词消解神经网络计算得到“小明”与待检测词语的替代概率为0.4,而“苹果”与待检测词语的替代概率为0.9,因此确定目标替代词语为:“苹果”。将“苹果”插入到待检测文本中的待检测词语对应的位置,得到目标文本为:“小明今天吃了个苹果,苹果很甜”。
其中,代词消解神经网络的训练方法与应用方法类似,输入数据为带有标准训练文本标签的训练文本。但是在训练过程中,为了提高代词消解神经网络的应用时的准确性,需要加入抗干扰特征集合,将抗干扰特征集合中的抗干扰特征输入至代词消解神经网络中进行训练。代词消解神经网络的训练损失值是通过训练过程得到的各个训练候选替代词语对应的训练替代可能度和对应的标准训练文本标签计算得到的。最后,根据训练损失值对代词消解神经网络进行训练,不断调整模型参数,直至满足收敛条件得到最终的代词消解神经网络。
在一个实施例中,如图10所示,图10示出一个实施例中代词消解神经网络的验证结果对比示意图,如图10所示,图10中ZP-CapsNet所在的最后一行是本申请的代词消解神经网络在六个测试数据集上对应的验证结果,其他行(从第一行zhao and Ng至倒数第二行Yin et al)为比对代词消解神经网络在六个测试数据集上对应的验证结果。其中,六个测试数据集包括广播新闻BN(Broadcast News)、通讯社NW(Newswires)、广播对话BC(Broadcast Conversations)、电话对话TC(Telephone Conversation)、网络博客NW(Web Blogs)和杂志MZ(Magazines),验证结果为根据准确率和召回率计算得到的中间值。而图10中各个代词消解神经网络的Overall是根据六个测试数据集的验证结果计算得到的综合值。可以从图10中的Overall得知,本申请的代词消解神经网络在实际代词消解应用中的综合效果比其他的比对代词消解神经网络的综合效果更好。
应该理解的是,虽然上述流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述流程图中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在一个实施例中,如图11所示,一种数据处理装置800,该装置包括:
待检测文本获取模块802,用于获取待检测文本,确定待检测文本中待检测词语对应的上下文词语集合和候选替代词语集合。
特征提取模块804,用于将上下文词语集合和候选替代词语集合输入至代词消解神经网络中,代词消解神经网络分别对上下文词语集合和候选替代词语集合进行特征提取得到对应的第一特征和第二特征。
迭代处理模块806,用于代词消解神经网络根据第一特征和第二特征进行正例迭代处理得到对应的正例特征向量模长,根据第一特征和第二特征进行反例迭代处理得到对应的反例特征向量模长,根据正例特征向量模长和反例特征向量模长计算得到候选替代词语集合中各个候选替代词语对应的替代可能度。
目标替代词语确定模块808,用于根据各个候选替代词语对应的替代可能度确定目标替代词语。
目标替代词语插入模块810,用于根据待检测词语对应的位置将目标替代词语插入待检测文本得到目标文本。
在一个实施例中,如图12所示,待检测文本获取模块802包括:
待检测文本分割单元802a,用于对待检测文本进行分割,得到多个词语。
句法分析单元802b,用于对各个词语进行句法分析,根据句法分析结果确定待检测词语所在的位置。
词序列获取单元802c,用于根据待检测词语所在的位置获取上文词序列和下文词序列,根据上文词序列和下文词序列组成上下文词语集合。
候选替代词语获取单元802d,用于根据句法分析结果获取候选替代词语,根据候选替代词语组成候选替代词语集合。
在一个实施例中,数据处理装置800还包括特征处理模块,所述特征处理模块用于通过代词消解神经网络对第一特征和第二特征进行维度变换和长度缩放处理,得到对应的第一目标特征和第二目标特征;所述迭代处理模块806,通过代词消解神经网络根据第一目标特征和第二目标特征进行正例迭代处理得到对应的正例特征向量模长,根据第一目标特征和第二目标特征进行反例迭代处理得到对应的反例特征向量模长。
在一个实施例中,特征提取模块804还用于代词消解神经网络通过前向特征表示子网络和后向特征表示子网络对上下文词语集合中的词序列进行压缩表示,得到对应的第一前向子特征和第一后向子特征;代词消解神经网络对上下文词语集合中的词序列对应的字序列进行压缩表示,得到第一字向量子特征,将第一前向子特征、第一后向子特征和第一字向量子特征组成上下文词语集合对应的第一特征;代词消解神经网络通过前向特征表示子网络和后向特征表示子网络对候选替代词语集合中的词序列进行压缩表示,得到对应的第二前向子特征和第二后向子特征;代词消解神经网络对候选替代词语集合中的词序列对应的字序列进行压缩表示,得到第二字向量子特征,将第二前向子特征、第二后向子特征和第二字向量子特征组成候选替代词语集合对应的第二特征。
In one embodiment, the iterative processing module 806 is further configured to: calculate an initial positive-example iteration center for the positive-example iterative processing from the first feature and the second feature, and take it as the current positive-example iteration center; linearly transform the first feature and the second feature respectively according to preset positive-example weight coefficients to obtain a corresponding first positive-example intermediate feature and second positive-example intermediate feature; compute the similarities between each of the two positive-example intermediate features and the current positive-example iteration center to obtain a corresponding first positive-example similarity and second positive-example similarity; normalize the two similarities to obtain a corresponding first positive-example intermediate similarity and second positive-example intermediate similarity; calculate an initial positive-example feature vector norm from the first positive-example intermediate similarity and the corresponding first positive-example intermediate feature, and the second positive-example intermediate similarity and the corresponding second positive-example intermediate feature; and calculate an updated positive-example iteration center from the initial positive-example feature vector norm and the initial positive-example iteration center, take it as the current positive-example iteration center, and return to the similarity computation step until a convergence condition is satisfied, obtaining the positive-example feature vector norm.
In another embodiment, the iterative processing module 806 is further configured to: calculate an initial negative-example iteration center for the negative-example iterative processing from the first feature and the second feature, and take it as the current negative-example iteration center; linearly transform the first feature and the second feature respectively according to preset negative-example weight coefficients to obtain a corresponding first negative-example intermediate feature and second negative-example intermediate feature; compute the similarities between each of the two negative-example intermediate features and the current negative-example iteration center to obtain a corresponding first negative-example similarity and second negative-example similarity; normalize the two similarities to obtain a corresponding first negative-example intermediate similarity and second negative-example intermediate similarity; calculate an initial negative-example feature vector norm from the first negative-example intermediate similarity and the corresponding first negative-example intermediate feature, and the second negative-example intermediate similarity and the corresponding second negative-example intermediate feature; and calculate an updated negative-example iteration center from the initial negative-example feature vector norm and the initial negative-example iteration center, take it as the current negative-example iteration center, and return to the similarity computation step until a convergence condition is satisfied, obtaining the negative-example feature vector norm.
In one embodiment, as shown in FIG. 13, a pronoun resolution neural network training apparatus 1000 is provided, the apparatus including:
a training text acquisition module 1002, configured to obtain training text, the training text having corresponding standard training text labels;
a training text processing module 1004, configured to determine the training context word set and training candidate substitute word set corresponding to the word to be detected in the training text;
a training feature representation module 1006, configured to input the training context word set and training candidate substitute word set into an initial pronoun resolution neural network, the initial pronoun resolution neural network performing feature extraction on them respectively to obtain a corresponding first training feature and second training feature;
a training feature iterative processing module 1008, configured to cause the initial pronoun resolution neural network to perform positive-example iterative processing according to the first and second training features to obtain a corresponding positive-example training feature vector norm, perform negative-example iterative processing according to the first and second training features to obtain a corresponding negative-example training feature vector norm, and calculate, according to the two norms, the training substitution likelihood corresponding to each training candidate substitute word in the training candidate substitute word set;
a training loss value calculation module 1010, configured to calculate a training loss value according to the training substitution likelihood corresponding to each training candidate substitute word and the corresponding standard training text label; and
a neural network training module 1012, configured to adjust the model parameters of the initial pronoun resolution neural network according to the training loss value until a convergence condition is satisfied, to obtain the pronoun resolution neural network.
In one embodiment, the training text acquisition module is further configured to obtain an anti-interference feature set, and the training feature iterative processing module is further configured to input the anti-interference features in the anti-interference feature set into the initial pronoun resolution neural network, the initial pronoun resolution neural network generating additional training features from the first training feature, the second training feature, and the anti-interference features, performing positive-example iterative processing according to the first training feature, the second training feature, the anti-interference features, and the additional training features to obtain the corresponding positive-example training feature vector norm, and performing negative-example iterative processing according to the same features to obtain the corresponding negative-example training feature vector norm.
In one embodiment, the training feature iterative processing module 1008 is further configured to calculate, according to the positive-example and negative-example training feature vector norms, a positive-example training substitution likelihood and a negative-example training substitution likelihood corresponding to each training candidate substitute word in the training candidate substitute word set; the training loss value calculation module 1010 is further configured to calculate the training loss value according to each training candidate substitute word's positive-example training substitution likelihood and corresponding standard training text label, together with its negative-example training substitution likelihood and corresponding standard training text label.
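Where both a positive-example and a negative-example training substitution likelihood enter the loss, one possible (assumed) formulation penalizes each against the label and its complement, respectively; the application does not specify the exact combination:

```python
import torch
import torch.nn.functional as F

def training_loss_pos_neg(pos_probs: torch.Tensor,
                          neg_probs: torch.Tensor,
                          labels: torch.Tensor) -> torch.Tensor:
    """Push the positive-example likelihood toward the 0/1 label and the
    negative-example likelihood toward its complement (an assumption)."""
    y = labels.float()
    return (F.binary_cross_entropy(pos_probs, y)
            + F.binary_cross_entropy(neg_probs, 1.0 - y))
```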
FIG. 14 shows an internal structure diagram of a computer device in one embodiment. The computer device may specifically be the terminal 110 or the server 120 in FIG. 1. As shown in FIG. 14, the computer device includes a processor, a memory, a network interface, an input apparatus, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the data processing method or the pronoun resolution neural network training method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the data processing method or the pronoun resolution neural network training method. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input apparatus may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like. It should be noted that if the computer device is the server 120 in FIG. 1, the computer device does not include a display screen.
Those skilled in the art will understand that the structure shown in FIG. 14 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the data processing apparatus or pronoun resolution neural network training apparatus provided by the present application may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 14. The memory of the computer device may store the program modules constituting the data processing apparatus or the pronoun resolution neural network training apparatus, for example the text-to-be-detected acquisition module, feature extraction module, iterative processing module, target substitute word determination module, and target substitute word insertion module shown in FIG. 11, or the training text acquisition module, training text processing module, training feature representation module, training feature iterative processing module, training loss value calculation module, and neural network training module shown in FIG. 13. The computer program composed of these program modules causes the processor to perform the steps of the data processing method of the embodiments of the present application described in this specification.
For example, the computer device shown in FIG. 14 may perform, through the text-to-be-detected acquisition module of the data processing apparatus shown in FIG. 11, the steps of obtaining the text to be detected and determining the context word set and candidate substitute word set corresponding to the word to be detected; through the feature extraction module, the step of inputting the context word set and candidate substitute word set into the pronoun resolution neural network, which extracts features from them respectively to obtain the corresponding first and second features; through the iterative processing module, the steps of performing positive-example iterative processing according to the first and second features to obtain the corresponding positive-example feature vector norm, performing negative-example iterative processing according to the first and second features to obtain the corresponding negative-example feature vector norm, and calculating the substitution likelihood of each candidate substitute word in the candidate substitute word set from the two norms; through the target substitute word determination module, the step of determining the target substitute word according to each candidate substitute word's substitution likelihood; and through the target substitute word insertion module, the step of inserting the target substitute word into the text to be detected at the position corresponding to the word to be detected, to obtain the target text.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above data processing method or pronoun resolution neural network training method. Here the steps of the data processing method or pronoun resolution neural network training method may be the steps in the data processing method or pronoun resolution neural network training method of each of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above data processing method or pronoun resolution neural network training method. Here the steps of the data processing method or pronoun resolution neural network training method may be the steps in the data processing method or pronoun resolution neural network training method of each of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-volatile computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided by the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of the present application. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

  1. A data processing method, the method being performed by a computer device and comprising:
    obtaining a text to be detected, and determining a context word set and a candidate substitute word set corresponding to a word to be detected in the text to be detected;
    inputting the context word set and the candidate substitute word set into a pronoun resolution neural network, the pronoun resolution neural network performing feature extraction on the context word set and the candidate substitute word set respectively to obtain a corresponding first feature and second feature;
    performing, through the pronoun resolution neural network, positive-example iterative processing according to the first feature and the second feature to obtain a corresponding positive-example feature vector norm, performing negative-example iterative processing according to the first feature and the second feature to obtain a corresponding negative-example feature vector norm, and calculating, according to the positive-example feature vector norm and the negative-example feature vector norm, a substitution likelihood corresponding to each candidate substitute word in the candidate substitute word set;
    determining a target substitute word according to the substitution likelihood corresponding to each candidate substitute word; and
    inserting the target substitute word into the text to be detected according to a position corresponding to the word to be detected, to obtain a target text.
  2. The method according to claim 1, wherein after the pronoun resolution neural network performs feature extraction on the context word set and the candidate substitute word set respectively to obtain the corresponding first feature and second feature, the method further comprises:
    performing, through the pronoun resolution neural network, dimension transformation and length rescaling on the first feature and the second feature to obtain a corresponding first target feature and second target feature;
    wherein the performing, through the pronoun resolution neural network, positive-example iterative processing according to the first feature and the second feature to obtain the corresponding positive-example feature vector norm, and negative-example iterative processing according to the first feature and the second feature to obtain the corresponding negative-example feature vector norm comprises:
    performing, through the pronoun resolution neural network, positive-example iterative processing according to the first target feature and the second target feature to obtain the corresponding positive-example feature vector norm, and negative-example iterative processing according to the first target feature and the second target feature to obtain the corresponding negative-example feature vector norm.
  3. The method according to claim 1, wherein the determining the context word set and the candidate substitute word set corresponding to the word to be detected in the text to be detected comprises:
    segmenting the text to be detected to obtain multiple words;
    performing syntactic analysis on each of the words, and determining a position of the word to be detected according to a syntactic analysis result;
    obtaining a preceding word sequence and a following word sequence according to the position of the word to be detected, and composing the context word set from the preceding word sequence and the following word sequence; and
    obtaining candidate substitute words according to the syntactic analysis result, and composing the candidate substitute word set from the candidate substitute words.
  4. The method according to claim 3, wherein the pronoun resolution neural network performing feature extraction on the context word set and the candidate substitute word set respectively to obtain the corresponding first feature and second feature comprises:
    producing, by the pronoun resolution neural network through a forward feature representation subnetwork and a backward feature representation subnetwork, compressed representations of the word sequences in the context word set to obtain a corresponding first forward sub-feature and first backward sub-feature;
    producing, by the pronoun resolution neural network, a compressed representation of the character sequences corresponding to the word sequences in the context word set to obtain a first character-vector sub-feature, and composing the first feature corresponding to the context word set from the first forward sub-feature, the first backward sub-feature, and the first character-vector sub-feature;
    producing, by the pronoun resolution neural network through the forward feature representation subnetwork and the backward feature representation subnetwork, compressed representations of the word sequences in the candidate substitute word set to obtain a corresponding second forward sub-feature and second backward sub-feature; and
    producing, by the pronoun resolution neural network, a compressed representation of the character sequences corresponding to the word sequences in the candidate substitute word set to obtain a second character-vector sub-feature, and composing the second feature corresponding to the candidate substitute word set from the second forward sub-feature, the second backward sub-feature, and the second character-vector sub-feature.
  5. The method according to claim 1, wherein the performing, through the pronoun resolution neural network, positive-example iterative processing according to the first feature and the second feature to obtain the corresponding positive-example feature vector norm comprises:
    calculating an initial positive-example iteration center of the positive-example iterative processing from the first feature and the second feature, and taking the initial positive-example iteration center as a current positive-example iteration center;
    linearly transforming the first feature and the second feature respectively according to preset positive-example weight coefficients to obtain a corresponding first positive-example intermediate feature and second positive-example intermediate feature;
    computing similarities between each of the first positive-example intermediate feature and the second positive-example intermediate feature and the current positive-example iteration center to obtain a corresponding first positive-example similarity and second positive-example similarity;
    normalizing the first positive-example similarity and the second positive-example similarity to obtain a corresponding first positive-example intermediate similarity and second positive-example intermediate similarity;
    calculating an initial positive-example feature vector norm from the first positive-example intermediate similarity and the corresponding first positive-example intermediate feature, and the second positive-example intermediate similarity and the corresponding second positive-example intermediate feature; and
    calculating an updated positive-example iteration center from the initial positive-example feature vector norm and the initial positive-example iteration center, taking the updated positive-example iteration center as the current positive-example iteration center, and returning to the step of computing the similarities between each of the first positive-example intermediate feature and the second positive-example intermediate feature and the current positive-example iteration center, until a convergence condition is satisfied, to obtain the positive-example feature vector norm.
  6. The method according to claim 1, wherein the performing negative-example iterative processing according to the first feature and the second feature to obtain the corresponding negative-example feature vector norm comprises:
    calculating an initial negative-example iteration center of the negative-example iterative processing from the first feature and the second feature, and taking the initial negative-example iteration center as a current negative-example iteration center;
    linearly transforming the first feature and the second feature respectively according to preset negative-example weight coefficients to obtain a corresponding first negative-example intermediate feature and second negative-example intermediate feature;
    computing similarities between each of the first negative-example intermediate feature and the second negative-example intermediate feature and the current negative-example iteration center to obtain a corresponding first negative-example similarity and second negative-example similarity;
    normalizing the first negative-example similarity and the second negative-example similarity to obtain a corresponding first negative-example intermediate similarity and second negative-example intermediate similarity;
    calculating an initial negative-example feature vector norm from the first negative-example intermediate similarity and the corresponding first negative-example intermediate feature, and the second negative-example intermediate similarity and the corresponding second negative-example intermediate feature; and
    calculating an updated negative-example iteration center from the initial negative-example feature vector norm and the initial negative-example iteration center, taking the updated negative-example iteration center as the current negative-example iteration center, and returning to the step of computing the similarities between each of the first negative-example intermediate feature and the second negative-example intermediate feature and the current negative-example iteration center, until a convergence condition is satisfied, to obtain the negative-example feature vector norm.
  7. A pronoun resolution neural network training method, the method being performed by a computer device and comprising:
    obtaining training text, the training text having a corresponding standard training text label;
    determining a training context word set and a training candidate substitute word set corresponding to a word to be detected in the training text;
    inputting the training context word set and the training candidate substitute word set into an initial pronoun resolution neural network, the initial pronoun resolution neural network performing feature extraction on the training context word set and the training candidate substitute word set respectively to obtain a corresponding first training feature and second training feature;
    performing, through the initial pronoun resolution neural network, positive-example iterative processing according to the first training feature and the second training feature to obtain a corresponding positive-example training feature vector norm, performing negative-example iterative processing according to the first training feature and the second training feature to obtain a corresponding negative-example training feature vector norm, and calculating, according to the positive-example training feature vector norm and the negative-example training feature vector norm, a training substitution likelihood corresponding to each training candidate substitute word in the training candidate substitute word set;
    calculating a training loss value according to the training substitution likelihood corresponding to each training candidate substitute word and the corresponding standard training text label; and
    adjusting model parameters of the initial pronoun resolution neural network according to the training loss value until a convergence condition is satisfied, to obtain a pronoun resolution neural network.
  8. The method according to claim 7, wherein the method further comprises:
    obtaining an anti-interference feature set;
    inputting anti-interference features in the anti-interference feature set into the initial pronoun resolution neural network, the initial pronoun resolution neural network generating additional training features from the first training feature, the second training feature, and the anti-interference features;
    wherein the performing, through the initial pronoun resolution neural network, positive-example iterative processing according to the first training feature and the second training feature to obtain the corresponding positive-example training feature vector norm, and negative-example iterative processing according to the first training feature and the second training feature to obtain the corresponding negative-example training feature vector norm comprises:
    performing, by the initial pronoun resolution neural network, positive-example iterative processing according to the first training feature, the second training feature, the anti-interference features, and the additional training features to obtain the corresponding positive-example training feature vector norm, and negative-example iterative processing according to the first training feature, the second training feature, the anti-interference features, and the additional training features to obtain the corresponding negative-example training feature vector norm.
  9. The method according to claim 7, wherein the calculating, according to the positive-example training feature vector norm and the negative-example training feature vector norm, the training substitution likelihood corresponding to each training candidate substitute word in the training candidate substitute word set comprises:
    calculating, according to the positive-example training feature vector norm and the negative-example training feature vector norm, a positive-example training substitution likelihood and a negative-example training substitution likelihood corresponding to each training candidate substitute word in the training candidate substitute word set;
    wherein the calculating the training loss value according to the training substitution likelihood corresponding to each training candidate substitute word and the corresponding standard training text label comprises:
    calculating the training loss value according to each training candidate substitute word's positive-example training substitution likelihood and corresponding standard training text label, together with its negative-example training substitution likelihood and corresponding standard training text label.
  10. A data processing apparatus, wherein the apparatus comprises:
    a text-to-be-detected acquisition module, configured to obtain a text to be detected and determine a context word set and a candidate substitute word set corresponding to a word to be detected in the text to be detected;
    a feature extraction module, configured to input the context word set and the candidate substitute word set into a pronoun resolution neural network, the pronoun resolution neural network performing feature extraction on the context word set and the candidate substitute word set respectively to obtain a corresponding first feature and second feature;
    an iterative processing module, configured to perform, through the pronoun resolution neural network, positive-example iterative processing according to the first feature and the second feature to obtain a corresponding positive-example feature vector norm, perform negative-example iterative processing according to the first feature and the second feature to obtain a corresponding negative-example feature vector norm, and calculate, according to the positive-example feature vector norm and the negative-example feature vector norm, a substitution likelihood corresponding to each candidate substitute word in the candidate substitute word set;
    a target substitute word determination module, configured to determine a target substitute word according to the substitution likelihood corresponding to each candidate substitute word; and
    a target substitute word insertion module, configured to insert the target substitute word into the text to be detected according to a position corresponding to the word to be detected, to obtain a target text.
  11. The apparatus according to claim 10, wherein the apparatus further comprises a feature processing module,
    the feature processing module being configured to perform, through the pronoun resolution neural network, dimension transformation and length rescaling on the first feature and the second feature to obtain a corresponding first target feature and second target feature; and
    the iterative processing module being configured to perform, through the pronoun resolution neural network, positive-example iterative processing according to the first target feature and the second target feature to obtain the corresponding positive-example feature vector norm, and negative-example iterative processing according to the first target feature and the second target feature to obtain the corresponding negative-example feature vector norm.
  12. The apparatus according to claim 10, wherein the text-to-be-detected acquisition module comprises:
    a text segmentation unit, configured to segment the text to be detected to obtain multiple words;
    a syntactic analysis unit, configured to perform syntactic analysis on each of the words and determine a position of the word to be detected according to a syntactic analysis result;
    a word sequence acquisition unit, configured to obtain a preceding word sequence and a following word sequence according to the position of the word to be detected, and compose the context word set from the preceding word sequence and the following word sequence; and
    a candidate substitute word acquisition unit, configured to obtain candidate substitute words according to the syntactic analysis result and compose the candidate substitute word set from the candidate substitute words.
  13. The apparatus according to claim 12, wherein the feature extraction module is further configured to:
    produce, through the pronoun resolution neural network with a forward feature representation subnetwork and a backward feature representation subnetwork, compressed representations of the word sequences in the context word set to obtain a corresponding first forward sub-feature and first backward sub-feature;
    produce, through the pronoun resolution neural network, a compressed representation of the character sequences corresponding to the word sequences in the context word set to obtain a first character-vector sub-feature, and compose the first feature corresponding to the context word set from the first forward sub-feature, the first backward sub-feature, and the first character-vector sub-feature;
    produce, through the pronoun resolution neural network with the forward feature representation subnetwork and the backward feature representation subnetwork, compressed representations of the word sequences in the candidate substitute word set to obtain a corresponding second forward sub-feature and second backward sub-feature; and
    produce, through the pronoun resolution neural network, a compressed representation of the character sequences corresponding to the word sequences in the candidate substitute word set to obtain a second character-vector sub-feature, and compose the second feature corresponding to the candidate substitute word set from the second forward sub-feature, the second backward sub-feature, and the second character-vector sub-feature.
  14. The apparatus according to claim 10, wherein the iterative processing module is further configured to:
    calculate an initial positive-example iteration center of the positive-example iterative processing from the first feature and the second feature, and take the initial positive-example iteration center as a current positive-example iteration center;
    linearly transform the first feature and the second feature respectively according to preset positive-example weight coefficients to obtain a corresponding first positive-example intermediate feature and second positive-example intermediate feature;
    compute similarities between each of the first positive-example intermediate feature and the second positive-example intermediate feature and the current positive-example iteration center to obtain a corresponding first positive-example similarity and second positive-example similarity;
    normalize the first positive-example similarity and the second positive-example similarity to obtain a corresponding first positive-example intermediate similarity and second positive-example intermediate similarity;
    calculate an initial positive-example feature vector norm from the first positive-example intermediate similarity and the corresponding first positive-example intermediate feature, and the second positive-example intermediate similarity and the corresponding second positive-example intermediate feature; and
    calculate an updated positive-example iteration center from the initial positive-example feature vector norm and the initial positive-example iteration center, take the updated positive-example iteration center as the current positive-example iteration center, and return to the step of computing the similarities between each of the first positive-example intermediate feature and the second positive-example intermediate feature and the current positive-example iteration center, until a convergence condition is satisfied, to obtain the positive-example feature vector norm.
  15. The apparatus according to claim 10, wherein the iterative processing module is further configured to:
    calculate an initial negative-example iteration center of the negative-example iterative processing from the first feature and the second feature, and take the initial negative-example iteration center as a current negative-example iteration center;
    linearly transform the first feature and the second feature respectively according to preset negative-example weight coefficients to obtain a corresponding first negative-example intermediate feature and second negative-example intermediate feature;
    compute similarities between each of the first negative-example intermediate feature and the second negative-example intermediate feature and the current negative-example iteration center to obtain a corresponding first negative-example similarity and second negative-example similarity;
    normalize the first negative-example similarity and the second negative-example similarity to obtain a corresponding first negative-example intermediate similarity and second negative-example intermediate similarity;
    calculate an initial negative-example feature vector norm from the first negative-example intermediate similarity and the corresponding first negative-example intermediate feature, and the second negative-example intermediate similarity and the corresponding second negative-example intermediate feature; and
    calculate an updated negative-example iteration center from the initial negative-example feature vector norm and the initial negative-example iteration center, take the updated negative-example iteration center as the current negative-example iteration center, and return to the step of computing the similarities between each of the first negative-example intermediate feature and the second negative-example intermediate feature and the current negative-example iteration center, until a convergence condition is satisfied, to obtain the negative-example feature vector norm.
  16. A pronoun resolution neural network training apparatus, wherein the apparatus comprises:
    a training text acquisition module, configured to obtain training text, the training text having a corresponding standard training text label;
    a training text processing module, configured to determine a training context word set and a training candidate substitute word set corresponding to a word to be detected in the training text;
    a training feature representation module, configured to input the training context word set and the training candidate substitute word set into an initial pronoun resolution neural network, the initial pronoun resolution neural network performing feature extraction on the training context word set and the training candidate substitute word set respectively to obtain a corresponding first training feature and second training feature;
    a training feature iterative processing module, configured to perform, through the initial pronoun resolution neural network, positive-example iterative processing according to the first training feature and the second training feature to obtain a corresponding positive-example training feature vector norm, perform negative-example iterative processing according to the first training feature and the second training feature to obtain a corresponding negative-example training feature vector norm, and calculate, according to the positive-example training feature vector norm and the negative-example training feature vector norm, a training substitution likelihood corresponding to each training candidate substitute word in the training candidate substitute word set;
    a training loss value calculation module, configured to calculate a training loss value according to the training substitution likelihood corresponding to each training candidate substitute word and the corresponding standard training text label; and
    a neural network training module, configured to adjust model parameters of the initial pronoun resolution neural network according to the training loss value until a convergence condition is satisfied, to obtain a pronoun resolution neural network.
  17. The apparatus according to claim 16, wherein the training text acquisition module is further configured to obtain an anti-interference feature set; and the training feature iterative processing module is further configured to input anti-interference features in the anti-interference feature set into the initial pronoun resolution neural network, the initial pronoun resolution neural network generating additional training features from the first training feature, the second training feature, and the anti-interference features, performing positive-example iterative processing according to the first training feature, the second training feature, the anti-interference features, and the additional training features to obtain the corresponding positive-example training feature vector norm, and performing negative-example iterative processing according to the first training feature, the second training feature, the anti-interference features, and the additional training features to obtain the corresponding negative-example training feature vector norm.
  18. The apparatus according to claim 16, wherein
    the training feature iterative processing module is further configured to calculate, according to the positive-example training feature vector norm and the negative-example training feature vector norm, a positive-example training substitution likelihood and a negative-example training substitution likelihood corresponding to each training candidate substitute word in the training candidate substitute word set; and
    the training loss value calculation module is further configured to calculate the training loss value according to each training candidate substitute word's positive-example training substitution likelihood and corresponding standard training text label, together with its negative-example training substitution likelihood and corresponding standard training text label.
  19. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 9.
  20. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 9.
PCT/CN2020/084432 2019-04-19 2020-04-13 Data processing method and pronoun resolution neural network training method WO2020211720A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/339,933 US20210294972A1 (en) 2019-04-19 2021-06-04 Data processing method and pronoun resolution neural network training method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910319013.8 2019-04-19
CN201910319013.8A CN110162785A (zh) Data processing method and pronoun resolution neural network training method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/339,933 Continuation US20210294972A1 (en) 2019-04-19 2021-06-04 Data processing method and pronoun resolution neural network training method

Publications (1)

Publication Number Publication Date
WO2020211720A1 true WO2020211720A1 (zh) 2020-10-22

Family

ID=67639657

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084432 WO2020211720A1 (zh) 2019-04-19 2020-04-13 数据处理方法和代词消解神经网络训练方法

Country Status (3)

Country Link
US (1) US20210294972A1 (zh)
CN (1) CN110162785A (zh)
WO (1) WO2020211720A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861518A (zh) * 2020-12-29 2021-05-28 科大讯飞股份有限公司 Text error correction method and apparatus, storage medium, and electronic apparatus

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162785A (zh) 2019-04-19 2019-08-23 腾讯科技(深圳)有限公司 Data processing method and pronoun resolution neural network training method
CN110705206B (zh) * 2019-09-23 2021-08-20 腾讯科技(深圳)有限公司 Text information processing method and related apparatus
CN111401035A (zh) * 2020-02-18 2020-07-10 平安科技(深圳)有限公司 Big-data-based zero-anaphora resolution method, apparatus, device, and medium
CN111611807B (zh) * 2020-05-18 2022-12-09 北京邮电大学 Neural-network-based keyword extraction method and apparatus, and electronic device
CN111666409B (zh) * 2020-05-28 2022-02-08 武汉大学 Holistic sentiment intelligent classification method for complex review texts based on an integrated deep capsule network
CN112597753A (zh) * 2020-12-22 2021-04-02 北京百度网讯科技有限公司 Text error correction processing method and apparatus, electronic device, and storage medium
CN112989043B (zh) * 2021-03-17 2024-03-12 中国平安人寿保险股份有限公司 Anaphora resolution method and apparatus, electronic device, and readable storage medium
CN113392629B (zh) * 2021-06-29 2022-10-28 哈尔滨工业大学 Personal pronoun resolution method based on a pre-trained model
US20230222294A1 (en) * 2022-01-12 2023-07-13 Bank Of America Corporation Anaphoric reference resolution using natural language processing and machine learning
CN114579706B (zh) * 2022-03-07 2023-09-29 桂林旅游学院 Automatic subjective-question grading method based on a BERT neural network and multi-task learning
CN115344693B (zh) * 2022-07-11 2023-05-12 北京容联易通信息技术有限公司 Clustering method based on the fusion of traditional algorithms and neural network algorithms

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018174815A1 (en) * 2017-03-24 2018-09-27 Agency For Science, Technology And Research Method and apparatus for semantic coherence analysis of texts
CN108595408A (zh) * 2018-03-15 2018-09-28 中山大学 Anaphora resolution method based on an end-to-end neural network
CN109165386A (zh) * 2017-08-30 2019-01-08 哈尔滨工业大学 Chinese zero-pronoun resolution method and system
CN110162785A (zh) * 2019-04-19 2019-08-23 腾讯科技(深圳)有限公司 Data processing method and pronoun resolution neural network training method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105988990B (zh) * 2015-02-26 2021-06-01 索尼公司 Chinese zero-anaphora resolution apparatus and method, model training method, and storage medium
JP6727610B2 (ja) * 2016-09-05 2020-07-22 国立研究開発法人情報通信研究機構 Context analysis apparatus and computer program therefor
US10839284B2 (en) * 2016-11-03 2020-11-17 Salesforce.Com, Inc. Joint many-task neural network model for multiple natural language processing (NLP) tasks
US10049103B2 (en) * 2017-01-17 2018-08-14 Xerox Corporation Author personality trait recognition from short texts with a deep compositional learning approach

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018174815A1 (en) * 2017-03-24 2018-09-27 Agency For Science, Technology And Research Method and apparatus for semantic coherence analysis of texts
CN109165386A (zh) * 2017-08-30 2019-01-08 哈尔滨工业大学 Chinese zero-pronoun resolution method and system
CN108595408A (zh) * 2018-03-15 2018-09-28 中山大学 Anaphora resolution method based on an end-to-end neural network
CN110162785A (zh) * 2019-04-19 2019-08-23 腾讯科技(深圳)有限公司 Data processing method and pronoun resolution neural network training method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SAOUSSEN MATHLOUTHI BOUZID ET AL.: "How to Combine Salience Factors for Arabic Pronoun Anaphora Resolution", 2017 IEEE/ACS 14TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 30 November 2017 (2017-11-30), XP033329938, ISSN: 2161-5330, DOI: 20200612114516 *
WU, BINGBING: "Research on Chinese Zero Pronoun Resolution Based on Word Embedding and LSTM", INFORMATION SCIENCE AND TECHNOLOGY, CHINESE MASTER’S THESES FULL-TEXT DATABASE, 15 February 2017 (2017-02-15), ISSN: 1674-0246, DOI: 20200612113648X *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861518A (zh) * 2020-12-29 2021-05-28 科大讯飞股份有限公司 Text error correction method and apparatus, storage medium, and electronic apparatus
CN112861518B (zh) * 2020-12-29 2023-12-01 科大讯飞股份有限公司 Text error correction method and apparatus, storage medium, and electronic apparatus

Also Published As

Publication number Publication date
CN110162785A (zh) 2019-08-23
US20210294972A1 (en) 2021-09-23

Similar Documents

Publication Publication Date Title
WO2020211720A1 (zh) Data processing method and pronoun resolution neural network training method
US11163947B2 (en) Methods and systems for multi-label classification of text data
WO2020177230A1 (zh) Machine learning-based medical data classification method and apparatus, computer device, and storage medium
CN108595706B (zh) Document semantic representation method and text classification method and apparatus based on topic word-class similarity
US20150095017A1 (en) System and method for learning word embeddings using neural language models
CN113239700A (zh) Text semantic matching device, system, method, and storage medium with improved BERT
CN108475262A (zh) Electronic device and method for text processing
CN111914097A (zh) Entity extraction method and apparatus based on an attention mechanism and multi-level feature fusion
CN110162771B (zh) Event trigger word recognition method and apparatus, and electronic device
CN110968725B (zh) Image content description information generation method, electronic device, and storage medium
CN113434683B (zh) Text classification method and apparatus, medium, and electronic device
US20240111956A1 (en) Nested named entity recognition method based on part-of-speech awareness, device and storage medium therefor
CN112256822A (zh) Text search method and apparatus, computer device, and storage medium
US20220245353A1 (en) System and method for entity labeling in a natural language understanding (nlu) framework
WO2014073206A1 (ja) Information processing apparatus and information processing method
CN110674642B (zh) Semantic relation extraction method for noisy sparse text
LeBrun et al. Evaluating distributional distortion in neural language modeling
WO2022116444A1 (zh) Text classification method and apparatus, computer device, and medium
WO2023116572A1 (zh) Word and sentence generation method and related device
WO2023000725A1 (zh) Named entity recognition method and apparatus for electric power metering, and computer device
US20220229990A1 (en) System and method for lookup source segmentation scoring in a natural language understanding (nlu) framework
US20220229986A1 (en) System and method for compiling and using taxonomy lookup sources in a natural language understanding (nlu) framework
CN113408296B (zh) Text information extraction method, apparatus, and device
WO2021004118A1 (zh) Correlation value determination method and apparatus
CN114595324A (zh) Method, apparatus, terminal, and non-transitory storage medium for domain division of power grid service data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20791771; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20791771; Country of ref document: EP; Kind code of ref document: A1)