CN107729309B - Deep learning-based Chinese semantic analysis method and device - Google Patents


Info

Publication number
CN107729309B
Authority
CN
China
Prior art keywords
chinese
chinese text
recognition
text
mobile terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610658579.XA
Other languages
Chinese (zh)
Other versions
CN107729309A (en
Inventor
郑骁庆
陈军
吕永
尚国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
ZTE Corp
Original Assignee
Fudan University
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University, ZTE Corp
Priority to CN201610658579.XA
Priority to PCT/CN2016/105977 (WO2018028077A1)
Publication of CN107729309A
Application granted
Publication of CN107729309B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Abstract

The invention discloses a deep learning-based method and device for Chinese semantic analysis, relating to the technical field of natural language processing. The method comprises the following steps: a mobile terminal normalizes an acquired Chinese text to obtain a normalized Chinese text; the mobile terminal performs special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and takes the recognition result as a constraint condition; the mobile terminal performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint condition, using a Chinese word segmentation and part-of-speech tagging model obtained through deep learning, to obtain the word segmentation and parts of speech of the normalized Chinese text; and the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using its word segmentation, parts of speech and/or named entity types.

Description

Deep learning-based Chinese semantic analysis method and device
Technical Field
The invention relates to the technical field of natural language processing, in particular to a Chinese semantic analysis method and device based on deep learning.
Background
Chinese natural language understanding has advanced substantially; in particular, a large body of research results has been produced on Chinese word segmentation and part-of-speech analysis. Although automatic analysis technology for Chinese still lags behind that for English and Japanese, the accumulated research has made it possible to develop systems capable of high-level semantic analysis and understanding and to apply them in practice. Applying semantic analysis technology can greatly improve a system's level of intelligence and its ability to respond. Semantic analysis is both a key point and a difficulty of text information analysis and processing, and it is also the basis of information extraction, user intention analysis, information fusion, question answering, intelligent reasoning and the like.
On the other hand, deep learning is a recent breakthrough in artificial intelligence research; it ended a period of about ten years in which artificial intelligence made little progress, and it has rapidly influenced industry. Unlike a narrow artificial intelligence system (function simulation for a specific task), which can only complete a specific task, deep learning is a general artificial intelligence technique that can cope with various situations and problems. It has been applied very successfully in fields such as image recognition and speech recognition, and has also achieved results in natural language processing (mainly for English).
Disclosure of Invention
The technical problem solved by the scheme provided by the embodiments of the invention is that automatic analysis of Chinese semantics is inaccurate.
The deep learning-based Chinese semantic analysis method provided by an embodiment of the invention comprises the following steps:
the mobile terminal normalizes the acquired Chinese text to obtain a normalized Chinese text;
the mobile terminal performs special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and takes the recognition result as a constraint condition;
the mobile terminal performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint condition, using a Chinese word segmentation and part-of-speech tagging model obtained through deep learning, to obtain the word segmentation and parts of speech of the normalized Chinese text;
and the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the word segmentation, parts of speech and/or named entity types of the normalized Chinese text.
Preferably, the mobile terminal performing special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and taking the recognition result as a constraint condition, includes:
the mobile terminal performs special type vocabulary recognition on the normalized Chinese text using special type vocabulary templates to obtain a special type vocabulary recognition result, and takes that result as a first constraint condition.
Preferably, the mobile terminal performing special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and taking the recognition result as a constraint condition, includes:
the mobile terminal performs custom vocabulary recognition on the normalized Chinese text using a custom dictionary to obtain a custom vocabulary recognition result, and takes that result as a second constraint condition.
Preferably, the mobile terminal performing special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and taking the recognition result as a constraint condition, includes:
the mobile terminal performs Chinese named entity recognition on the normalized Chinese text using a Chinese named entity recognition model obtained through deep learning to obtain a Chinese named entity recognition result, and takes that result as a third constraint condition.
Preferably, the constraint condition includes at least one of the first constraint condition, the second constraint condition and the third constraint condition, or a combination thereof.
Preferably, the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segmentation, parts of speech and/or named entity types of the normalized Chinese text includes:
the mobile terminal classifies the normalized Chinese text according to its characters and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, to obtain a sentence classification result for the normalized Chinese text.
Preferably, the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segmentation, parts of speech and/or named entity types of the normalized Chinese text includes:
the mobile terminal determines a bidirectional LSTM (Long Short-Term Memory) Chinese semantic role labeling model according to the sentence classification result, and performs semantic role labeling on each segmented word and symbol of the normalized Chinese text according to its word segmentation, parts of speech and/or named entity types and the bidirectional LSTM Chinese semantic role labeling model, to obtain a semantic role labeling result for the normalized Chinese text.
Preferably, the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segmentation, parts of speech and/or named entity types of the normalized Chinese text includes:
the mobile terminal performs structured processing on the normalized Chinese text according to its semantic role labeling result and an event model, and extracts the key information of the normalized Chinese text.
Preferably, the key information of the normalized Chinese text includes an event name, key attributes and attribute values.
The deep learning-based Chinese semantic analysis device provided by an embodiment of the invention comprises:
a normalization processing module, configured to normalize the acquired Chinese text to obtain a normalized Chinese text;
a recognition module, configured to perform special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and to take the recognition result as a constraint condition;
and an analysis module, configured to perform Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint condition, using a Chinese word segmentation and part-of-speech tagging model obtained through deep learning, to obtain the word segmentation and parts of speech of the normalized Chinese text, and to perform Chinese semantic analysis on the normalized Chinese text using its word segmentation, parts of speech and/or named entity types.
According to the scheme provided by the embodiments of the invention, semantic analysis is performed on the input Chinese sentences and structured analysis results are output; these structured results are then used to complete tasks that require high-level semantic analysis support, such as event analysis, information extraction and sentiment analysis.
Drawings
FIG. 1 is a flowchart of a method for deep learning-based Chinese semantic analysis according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an apparatus for deep learning-based Chinese semantic analysis according to an embodiment of the present invention;
FIG. 3 is a block diagram of Chinese semantic analysis according to an embodiment of the present invention;
FIG. 4 is a diagram of a Chinese sequence annotation network model structure according to an embodiment of the present invention;
FIG. 5 is a structural diagram of a convolutional neural network with dynamic k-max pooling according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of semantic role labeling of bidirectional LSTM provided by an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, and it should be understood that the preferred embodiments described below are only for the purpose of illustrating and explaining the present invention, and are not to be construed as limiting the present invention.
FIG. 1 is a flowchart of a deep learning-based Chinese semantic analysis method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
step S101: the mobile terminal normalizes the acquired Chinese text to obtain a normalized Chinese text;
step S102: the mobile terminal performs special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and takes the recognition result as a constraint condition;
step S103: the mobile terminal performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint condition, using a Chinese word segmentation and part-of-speech tagging model obtained through deep learning, to obtain the word segmentation and parts of speech of the normalized Chinese text;
step S104: the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the word segmentation, parts of speech and/or named entity types of the normalized Chinese text.
The mobile terminal performing special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and taking the recognition result as a constraint condition, includes: the mobile terminal performs special type vocabulary recognition on the normalized Chinese text using special type vocabulary templates to obtain a special type vocabulary recognition result, and takes that result as a first constraint condition.
It may also include: the mobile terminal performs custom vocabulary recognition on the normalized Chinese text using a custom dictionary to obtain a custom vocabulary recognition result, and takes that result as a second constraint condition.
It may also include: the mobile terminal performs Chinese named entity recognition on the normalized Chinese text using a Chinese named entity recognition model obtained through deep learning to obtain a Chinese named entity recognition result, and takes that result as a third constraint condition.
The constraint condition comprises at least one of the first constraint condition, the second constraint condition and the third constraint condition, or a combination thereof.
The special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition amounts to pre-segmentation and pre-tagging: the special type vocabulary, custom vocabulary and/or named entities recognized in this step are not segmented or part-of-speech tagged again in the subsequent word segmentation and part-of-speech tagging step, and they thus form a constraint condition.
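As an illustration of how custom vocabulary recognition could yield pre-segmented spans that later act as constraints, the following is a minimal sketch using forward maximum matching. The matching strategy, the function name `match_custom_vocabulary`, the `max_len` parameter and the sample dictionary are all assumptions made for illustration; the patent does not specify how the custom dictionary is matched.

```python
def match_custom_vocabulary(text, dictionary, max_len=8):
    """Forward maximum matching: return (start, end, word) spans found in text."""
    spans = []
    i = 0
    while i < len(text):
        matched = None
        # Try the longest candidate first so longer entries win over their prefixes.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                matched = (i, i + length, candidate)
                break
        if matched:
            spans.append(matched)
            i = matched[1]  # skip past the recognized word
        else:
            i += 1
    return spans


# Hypothetical custom dictionary and input sentence.
custom_dictionary = {"复旦大学", "复旦", "信用卡"}
constraints = match_custom_vocabulary("我在复旦大学办了信用卡", custom_dictionary)
```

Each returned span marks a stretch of characters that the later segmentation step would treat as one fixed word.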
The mobile terminal performing Chinese semantic analysis on the normalized Chinese text using its word segmentation, parts of speech and/or named entity types includes: the mobile terminal classifies the normalized Chinese text according to its characters and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, to obtain a sentence classification result for the normalized Chinese text.
It also includes: the mobile terminal determines a bidirectional Long Short-Term Memory (LSTM) Chinese semantic role labeling model according to the sentence classification result, and performs semantic role labeling on each segmented word and symbol of the normalized Chinese text according to its word segmentation, parts of speech and/or named entity types and the bidirectional LSTM Chinese semantic role labeling model, to obtain a semantic role labeling result for the normalized Chinese text.
It also includes: the mobile terminal performs structured processing on the normalized Chinese text according to its semantic role labeling result and an event model, and extracts the key information of the normalized Chinese text. Specifically, the key information of the normalized Chinese text includes an event name, key attributes and attribute values.
FIG. 2 is a schematic diagram of a deep learning-based Chinese semantic analysis device according to an embodiment of the present invention. As shown in FIG. 2, the device includes: a normalization processing module 201, configured to normalize the acquired Chinese text to obtain a normalized Chinese text; a recognition module 202, configured to perform special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and to take the recognition result as a constraint condition; and an analysis module 203, configured to perform Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint condition, using a Chinese word segmentation and part-of-speech tagging model obtained through deep learning, to obtain the word segmentation and parts of speech of the normalized Chinese text, and to perform Chinese semantic analysis on the normalized Chinese text using its word segmentation, parts of speech and/or named entity types.
The analysis module 203 comprises a sentence classification unit, configured to classify the normalized Chinese text according to its characters and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, to obtain a sentence classification result for the normalized Chinese text.
The analysis module 203 further comprises a semantic role labeling unit, configured to determine a bidirectional Long Short-Term Memory (LSTM) Chinese semantic role labeling model according to the sentence classification result, and to perform semantic role labeling on elements of the normalized Chinese text such as single characters, segmented words and special type vocabulary, according to the word segmentation, parts of speech and/or named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, to obtain a semantic role labeling result for the normalized Chinese text.
The analysis module 203 further comprises a structured processing unit, configured to perform structured processing on the normalized Chinese text according to its semantic role labeling result and an event model, and to extract the key information of the normalized Chinese text. Specifically, the key information includes an event name, key attributes and attribute values. The event name may correspond to the sentence classification result. For example, for short message texts received by the terminal, the sentence classification model distinguishes categories such as bank bills, flights and trains, appointments, and weather forecasts; the resulting sentence category can be used as the event name. The key attributes come from the semantic role labeling result. For example, in a bank bill short message, the key attributes are labeled as categories such as bill date, consumption amount, repayment date and repayment amount, and the attribute values are the specific values in the original short message text that correspond to those categories, such as a specific date or a specific amount.
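The relationship between event name, key attributes and attribute values can be sketched as follows. This is only an illustrative assumption about the output structure: the function `build_event`, the role label names (`BILL_DATE`, `REPAYMENT_AMOUNT`, `REPAYMENT_DATE`), the event name `bank_bill` and the sample message are hypothetical and do not come from the patent.

```python
def build_event(event_name, tokens, role_labels):
    """Group tokens by semantic role; tokens labeled 'O' are unrelated to the task."""
    collected = {}
    for token, role in zip(tokens, role_labels):
        if role != "O":
            collected.setdefault(role, []).append(token)
    # Each key attribute maps to the original text of its value.
    attributes = {role: "".join(parts) for role, parts in collected.items()}
    return {"event": event_name, "attributes": attributes}


# Hypothetical segmented bank bill message with one role label per token.
tokens = ["您", "的", "信用卡", "8月", "账单", "金额", "356.50元", "，", "还款日", "9月5日"]
roles = ["O", "O", "O", "BILL_DATE", "O", "O", "REPAYMENT_AMOUNT", "O", "O", "REPAYMENT_DATE"]
event = build_event("bank_bill", tokens, roles)
```

Here the sentence category supplies the event name, the role labels supply the key attributes, and the labeled tokens supply the attribute values.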
FIG. 3 is a block diagram of Chinese semantic analysis according to an embodiment of the present invention. As shown in FIG. 3, after semantic analysis is performed on an input Chinese sentence using deep learning, a structured analysis result is output, and that result is used to complete tasks that require high-level semantic analysis support, such as event analysis, information extraction and sentiment analysis. The processing specifically includes:
Text normalization: the input Chinese sentence is normalized, including: unifying the character encoding, converting traditional Chinese characters to simplified Chinese characters, converting full-width characters to half-width characters, converting special characters, and replacing non-standard expressions (for example, replacing Internet slang with standard expressions).
Custom vocabulary recognition: custom vocabulary is recognized using the custom dictionary, including: application domain vocabulary, idioms, food, venues, works, equipment, person names, place names and organization names.
Special type vocabulary recognition: templates are defined for recognizing email addresses, web addresses, dates, times, percentages, quantifiers, currency amounts, telephone numbers, numbers and foreign words; the items of these types contained in the input sentence are recognized and replaced by special characters.
Chinese named entity recognition: a Chinese named entity recognition corpus is prepared, and the Chinese sequence labeling network model shown in FIG. 4 is used to train a Chinese named entity recognition model. The model recognizes the specific person names, place names and organization names in the input sentence and saves the corresponding entity types (which can be denoted, for example, by "Person", "Location" and "Organization" respectively).
Chinese word segmentation and part-of-speech tagging: taking the results of special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition as constraints, a corpus for joint Chinese word segmentation and part-of-speech tagging is prepared, and the Chinese sequence labeling network model shown in FIG. 4 is used to train a joint Chinese word segmentation and part-of-speech tagging model, which then performs joint word segmentation and part-of-speech analysis on the input sentence.
Sentence classification: before semantic role labeling, sentences are classified using the sentence semantic representation generated by the convolutional neural network with dynamic k-max pooling shown in FIG. 5, and input sentences of no interest to the application are filtered out. A Chinese sentence classification model based on the convolutional neural network with dynamic k-max pooling is trained on a sentence classification corpus that contains a balanced set of sentences of each type together with negative sample sentences (Chinese sentences of no interest to the application); the model then classifies the input sentences and filters out those of no interest to the application.
Semantic role labeling: a bidirectional LSTM semantic labeling network model is selected according to the sentence classification result (that is, different analysis models are used for different sentence categories), and the bidirectional LSTM semantic labeling network shown in FIG. 6 then performs semantic role labeling on the sentence based on the word segmentation, parts of speech and/or named entity types of the normalized text. For each sentence category, a semantic role labeling corpus is prepared from the word segmentation, parts of speech and/or named entity types, a bidirectional LSTM Chinese semantic role labeling model is trained, and the model performs semantic role labeling on the sentences.
Event analysis: according to the semantic role labeling result and an event template, the structured representation produced by semantic analysis is assembled, and the event name, key attributes and attribute values are extracted.
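The special type vocabulary recognition step in the list above can be sketched with regular-expression templates. The patterns, the tag names (`EMAIL`, `URL`, `DATE`, `PERCENT`, `MONEY`, `NUMBER`) and the `<TAG>` replacement format are illustrative assumptions; the patent states only that templates are defined and that recognized items are replaced by special characters.

```python
import re

# Hypothetical templates; earlier patterns take priority over later ones.
TEMPLATES = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("URL", re.compile(r"https?://[^\s，。]+")),
    ("DATE", re.compile(r"\d{4}年\d{1,2}月\d{1,2}日")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
    ("MONEY", re.compile(r"\d+(?:\.\d+)?元")),
    ("NUMBER", re.compile(r"\d+(?:\.\d+)?")),
]


def recognize_special_types(text):
    """Replace each recognized span with its tag and record the original form."""
    recognized = []
    for tag, pattern in TEMPLATES:
        def substitute(match, tag=tag):
            recognized.append((tag, match.group(0)))
            return f"<{tag}>"
        text = pattern.sub(substitute, text)
    return text, recognized


tagged, found = recognize_special_types("您2016年8月12日消费356.50元，还款比例10%")
```

Keeping the original form next to each tag allows the specific value (a date or an amount) to be restored later as an attribute value.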
The training corpus for semantic role labeling is formatted with one segmented word per row, in the order in which the words appear in the sentence. Each row has 5 columns, which represent, in order: the segmented word itself (email addresses, web addresses, dates, times, percentages, quantifiers, currency amounts, telephone numbers, foreign words and the like are replaced by English tags; single characters and punctuation marks also count as independent segmented words), the semantic tag ("O" denotes a class unrelated to the task), the part-of-speech tag, the named entity tag, and the original form of the segmented word in the sentence. Sentence samples are separated by an empty row.
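A reader for a corpus in this five-column format could look like the sketch below. The tab separator, the field names and the sample tags (`AMOUNT`, `v`, `m`, `t`) are assumptions for illustration; the patent does not state the column delimiter or the tag inventory.

```python
def parse_corpus(text):
    """Parse blank-line-separated sentence samples with five columns per row."""
    samples, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # An empty row closes the current sentence sample.
            if current:
                samples.append(current)
                current = []
            continue
        token, semantic, pos, entity, original = line.split("\t")
        current.append({
            "token": token,        # segmented word, or an English tag such as <MONEY>
            "semantic": semantic,  # semantic role tag, "O" if unrelated to the task
            "pos": pos,            # part-of-speech tag
            "entity": entity,      # named entity tag
            "original": original,  # original form in the sentence
        })
    if current:
        samples.append(current)
    return samples


# Hypothetical two-sentence corpus.
SAMPLE = "\n".join([
    "\t".join(["消费", "O", "v", "O", "消费"]),
    "\t".join(["<MONEY>", "AMOUNT", "m", "O", "356.50元"]),
    "",
    "\t".join(["今天", "O", "t", "O", "今天"]),
])
samples = parse_corpus(SAMPLE)
```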
When deep learning-based sequence labeling tasks such as Chinese word segmentation, part-of-speech tagging and Chinese named entity recognition are performed, the decoding algorithm takes the results of special type vocabulary recognition and/or custom vocabulary recognition as constraints (for Chinese word segmentation and part-of-speech tagging, the Chinese named entity recognition result can be added to the constraints), as follows:
(1) Types such as email addresses, web addresses, dates, times, percentages, quantifiers, currency amounts, telephone numbers and foreign words are recognized in advance by means of templates.
(2) Custom vocabulary is supported, including domain vocabulary, idioms, food, venues, works, equipment, person names, place names, organization names and the like.
(3) Viterbi decoding is performed on the prediction output of the deep learning network, using the results of special type vocabulary recognition and/or custom vocabulary recognition as constraints.
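One way to apply such constraints during Viterbi decoding is to mask out every label that contradicts a pre-recognized span, as in the sketch below. The masking approach, the function name `constrained_viterbi`, the `allowed` argument and the B/I/E/S index layout are assumptions for illustration; the patent does not describe how the constraint is enforced inside the decoder.

```python
import numpy as np


def constrained_viterbi(emissions, transitions, allowed=None):
    """Return the best label path.

    emissions: (n, T) network scores, one row per character.
    transitions[i, j]: score of moving from label i to label j.
    allowed: {position: set of permitted label indices} from pre-recognition.
    """
    n, num_labels = emissions.shape
    scores = emissions.astype(float).copy()
    for pos, labels in (allowed or {}).items():
        mask = np.full(num_labels, -np.inf)
        mask[list(labels)] = 0.0
        scores[pos] += mask  # forbidden labels can never be on the best path
    best = scores[0].copy()
    backpointers = np.zeros((n, num_labels), dtype=int)
    for i in range(1, n):
        candidate = best[:, None] + transitions  # candidate[p, q]: previous p, current q
        backpointers[i] = candidate.argmax(axis=0)
        best = candidate.max(axis=0) + scores[i]
    path = [int(best.argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(backpointers[i, path[-1]]))
    return path[::-1]


# Hypothetical label indices: B=0, I=1, E=2, S=3.
emissions = np.array([[0.1, 0.0, 0.0, 1.0],
                      [0.0, 0.2, 0.0, 1.0],
                      [0.0, 0.0, 0.3, 1.0]])
transitions = np.zeros((4, 4))
free_path = constrained_viterbi(emissions, transitions)
fixed_path = constrained_viterbi(emissions, transitions, {0: {0}, 1: {1}, 2: {2}})
```

Without constraints the decoder labels each character as a single-character word; with a three-character dictionary entry fixed in advance, the same scores decode to B, I, E.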
FIG. 4 is a diagram of the Chinese sequence labeling network model structure provided in an embodiment of the invention, which can be used for Chinese named entity recognition and for Chinese word segmentation and part-of-speech tagging (note: the training corpora, the trained model data and the constraint conditions differ between tasks). As shown in FIG. 4, the deep learning-based Chinese sequence labeling network model receives a Chinese sentence as input and outputs sequence labeling results in units of characters (including Chinese characters, punctuation marks and any other characters that may occur in the sentence). The label set is formed by combining word segmentation labels with task-specific labels. Taking Chinese named entity recognition as an example, if "PER" denotes the person-name label, the following sentence:
"诸葛亮是刘备军事集团的军师。" (Zhuge Liang is the military adviser of Liu Bei's military group.)
corresponds to the labeling result:
"B_PER I_PER E_PER O B_PER E_PER O O O O O O O O".
Here "B" marks the first character of a word, "I" a middle character, "E" the last character, and "O" a character unrelated to the task. In addition, "S" marks a character that forms a word by itself (e.g. a single-character word or a punctuation mark).
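The labeling scheme can be made concrete by deriving the label sequence from entity spans, as in the sketch below. The helper `spans_to_labels` is hypothetical, and the Chinese sentence is an assumed back-translation of the example; it was chosen because its 14 characters line up with the 14 labels given above.

```python
def spans_to_labels(num_chars, spans):
    """Turn (start, end, type) entity spans into per-character B/I/E/S/O labels."""
    labels = ["O"] * num_chars
    for start, end, entity_type in spans:
        if end - start == 1:
            labels[start] = f"S_{entity_type}"  # single-character word
            continue
        labels[start] = f"B_{entity_type}"
        for i in range(start + 1, end - 1):
            labels[i] = f"I_{entity_type}"
        labels[end - 1] = f"E_{entity_type}"
    return labels


sentence = "诸葛亮是刘备军事集团的军师。"
# Person names: 诸葛亮 (characters 0-2) and 刘备 (characters 4-5).
labels = spans_to_labels(len(sentence), [(0, 3, "PER"), (4, 6, "PER")])
```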
The label of a character usually depends on its surrounding characters, so a window model is used: the character and its neighbors are taken together as input when estimating the likelihood that the current character carries a given label (see FIG. 4). With a window size of 5, for example, the input window consists of the current character and the two characters on each side of it. Where there are fewer characters to the left or right than the window requires, padding symbols are used instead.
Each input character is converted to its vector representation by looking it up in a character vector table. The representation of each character may be randomly initialized or pre-trained with an unsupervised method. The vectors are then concatenated to form the feature representation of the window. This representation passes through a linear network layer (the intermediate hidden layer) followed by a nonlinear Sigmoid transformation, and finally through another linear layer that outputs a vector with as many elements as there are task labels; each element of that vector represents the likelihood of the corresponding label.
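The window construction with padding can be sketched as follows; the function name `character_windows` and the padding symbol `<PAD>` are assumptions for illustration.

```python
def character_windows(sentence, window_size=5, pad="<PAD>"):
    """Return one window per character: the character plus its neighbors on each side."""
    half = window_size // 2
    padded = [pad] * half + list(sentence) + [pad] * half
    return [padded[i:i + window_size] for i in range(len(sentence))]


windows = character_windows("刘备军", window_size=5)
```

Every character gets a window of the same size, so the network input has a fixed length regardless of the character's position in the sentence.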
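The forward pass described above (lookup, concatenation, linear layer, Sigmoid, linear layer) can be sketched with NumPy. The toy vocabulary, the random initialization, the number of labels and all variable names are assumptions for illustration; the dimensions 100, 5 and 300 follow the configuration mentioned elsewhere in the description.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"<PAD>": 0, "刘": 1, "备": 2, "军": 3}  # hypothetical toy vocabulary
num_labels, dim, window, hidden = 4, 100, 5, 300

# Randomly initialized parameters; a trained model would load learned values.
embeddings = rng.normal(scale=0.1, size=(len(vocab), dim))
W1 = rng.normal(scale=0.1, size=(window * dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, num_labels))
b2 = np.zeros(num_labels)


def label_scores(window_chars):
    """Score every label for the character at the center of the window."""
    # Look up each character vector and concatenate them into one feature vector.
    x = np.concatenate([embeddings[vocab[c]] for c in window_chars])
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # hidden linear layer + Sigmoid
    return h @ W2 + b2                        # one score per task label


scores = label_scores(["<PAD>", "刘", "备", "军", "<PAD>"])
```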
Given a Chinese sentence, the network outputs a matrix in which each element $f_\theta(t \mid i)$ is an estimate of the likelihood that the $i$-th character of the sentence carries label $t$, where $\theta$ denotes the parameters of the network. Because adjacent labels in a sequence labeling task depend strongly on each other, a matrix $A_{ij}$ is introduced to represent the likelihood of moving from label $i$ to label $j$ (it is also included in the parameter set $\theta$). Given a sentence $s_{[1:n]}$ of $n$ characters, a score can be assigned to any label sequence $t_{[1:n]}$ of equal length:
Figure BDA0001075778080000101
Given the parameters, the Viterbi decoding algorithm can be used to find the label sequence with the highest score, which is taken as the labeling result.
The network is trained so that, over the training set, the probability of the correct label sequence of each sample is maximized:
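The sequence score itself appears only as an image in the source. A form consistent with the definitions of $f_\theta(t \mid i)$ and $A_{ij}$ above, and common in window-based sequence labeling networks, is shown below; it is a reconstruction under that assumption, not the formula as published, and the start label $t_0$ is likewise assumed.

```latex
\mathrm{score}\left(s_{[1:n]}, t_{[1:n]}, \theta\right)
  = \sum_{i=1}^{n} \left( A_{t_{i-1} t_i} + f_\theta(t_i \mid i) \right)
```

Under this form, each position contributes the network's score for its label plus the transition score from the previous label, which is exactly the structure that Viterbi decoding maximizes.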
Figure BDA0001075778080000102
where $(s, t)$ denotes one sample in the training set. Training uses gradient descent, and all parameters of the network are updated with the following formula:
Figure BDA0001075778080000111
where $\lambda$ denotes the learning step size.
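The training objective and update rule are also images in the source. Assuming the usual sentence-level log-likelihood for this kind of network, with $D$ denoting the training set and the sequence probability obtained by normalizing an exponentiated sequence score over all label sequences $t'$ (both of which are assumptions, not statements from the source), they would read:

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(s,t) \in D} \log p(t \mid s, \theta),
\qquad
\log p(t \mid s, \theta) = \mathrm{score}(s, t, \theta)
  - \log \sum_{t'} \exp\left(\mathrm{score}(s, t', \theta)\right)

\theta \leftarrow \theta + \lambda \,
  \frac{\partial \log p(t \mid s, \theta)}{\partial \theta}
```

The plus sign in the update reflects that the log-likelihood is being maximized; it is equivalent to gradient descent on the negative log-likelihood.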
The deep learning-based Chinese sequence labeling network and its learning algorithm have the following characteristics:
(1) The input Chinese sentence undergoes the necessary preprocessing, including: unifying the character encoding, converting traditional Chinese characters to simplified Chinese characters, converting full-width characters to half-width characters, converting special characters, replacing non-standard expressions, and uniformly converting recognized email addresses, web addresses, dates, times, percentages, quantifiers, currency amounts, telephone numbers, numbers and foreign words into special characters.
(2) During Viterbi decoding, the results of custom vocabulary recognition, special type vocabulary recognition and Chinese named entity recognition are used as constraints.
(3) A network configuration with 100-dimensional character vectors, a window size of 3 or 5, and 300 neurons in the intermediate hidden layer is used (the specific parameters depend on the size of the corpus sample set).
Fig. 5 is a structural diagram of a convolutional neural network based on dynamic k-max pooling according to an embodiment of the present invention. As shown in Fig. 5, the network takes a Chinese sentence as input, generates a semantic representation of the whole sentence, and predicts from this representation the task-related category to which the sentence belongs.
First, the network converts each character of the input sentence into its vector representation by looking up a word vector table. The representation of each character may be randomly initialized or pre-trained with an unsupervised method. The converted sentence forms a feature matrix. Second, on each dimension of the feature matrix, a convolution with the configured window size transforms the windowed input into new features. The window slides from left to right across the feature matrix, producing as many higher-level features as the feature matrix has columns. Different convolution kernels are used for different dimensions, which yields one feature map of the input feature matrix; a set of different convolution kernels may be applied simultaneously to generate multiple feature maps. On each feature map, k-max pooling extracts the k most significant features: the k largest feature values are selected in each dimension, and their order in the input feature map is preserved. The hardtanh nonlinear function is then applied to the k-max pooled result matrix. This second step may be stacked in several layers, each new layer on top of the previous one. The k value of the k-max pooling of the last layer is fixed (a hyper-parameter of the model), and the k value of each earlier layer is the larger of the k value of the last layer and the value of the formula (H - h)/H × L rounded up. Third, all feature values obtained from the last layer are concatenated to generate the semantic representation of the whole sentence. On the basis of this semantic representation, the category of the sentence is predicted through a linear layer and a Softmax layer.
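The pooling step can be sketched as follows. The sketch reads the layer formula as ceil((H - h) / H × L), where H is taken to be the number of convolutional layers, h the index of the current layer and L the sentence length; this reading is an assumption based on the usual definition of dynamic k-max pooling, and the function names are illustrative.

```python
import math

def k_max_pool(row, k):
    """Keep the k largest values of one feature-map row in their original order."""
    if k >= len(row):
        return list(row)
    top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    return [row[i] for i in sorted(top)]

def dynamic_k(layer, num_layers, sentence_len, k_top):
    """k for `layer` (1-based): the larger of k_top and the rounded-up formula value."""
    return max(k_top, math.ceil((num_layers - layer) / num_layers * sentence_len))

def hardtanh(x):
    """Clamp x to the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))

pooled = k_max_pool([0.1, 0.9, 0.3, 0.7, 0.5], 3)
k_first = dynamic_k(layer=1, num_layers=2, sentence_len=18, k_top=5)
k_last = dynamic_k(layer=2, num_layers=2, sentence_len=18, k_top=5)
activated = [hardtanh(v) for v in [-2.0, 0.25, 3.0]]
```

For an 18-character sentence and two layers, the first layer keeps 9 values per row and the last layer keeps the fixed k of 5.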
Because of the Softmax layer, the network output can be regarded as a probability distribution over the categories. Training uses the gradient descent method; the goal of training is to increase the probability of correct predictions on the training set while reducing the probability of wrong predictions.
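As a small illustration of this output layer, the sketch below turns the scores of the final linear layer into a probability distribution and computes the negative log-probability of the correct category. Using this quantity as the training loss is an assumption of the sketch; the source only states the training goal.

```python
import math

def softmax(logits):
    """Convert raw category scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def negative_log_likelihood(logits, gold):
    """Loss that shrinks as the probability of the correct category grows."""
    return -math.log(softmax(logits)[gold])

probs = softmax([2.0, 1.0, 0.1])
loss_before = negative_log_likelihood([2.0, 1.0, 0.1], gold=0)
loss_after = negative_log_likelihood([3.0, 1.0, 0.1], gold=0)
```

Raising the score of the correct category lowers the loss and, through the normalization, lowers the probabilities of the other categories at the same time.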
The Chinese sentence classification model based on the convolutional neural network with dynamic k-max pooling is characterized as follows:
(1) The input Chinese sentence undergoes the necessary preprocessing, including: unifying the character encoding, converting traditional Chinese characters to simplified ones, converting full-width characters to half-width ones, converting special characters, replacing non-standard phrases, and uniformly replacing recognized e-mail addresses, web addresses, dates, times, percentages, quantifiers, currency amounts, telephone numbers, numbers and foreign words with special symbols.
(2) The model takes characters (including Chinese characters, punctuation marks and any other characters that may appear in a sentence) as input, which suits Chinese well and prevents Chinese word segmentation errors from propagating to the sentence classification task.
(3) One-dimensional convolution is used, and the number of columns of the feature map output by a convolutional layer equals the number of columns of the input feature matrix, which speeds up network processing.
(4) The network uses two convolutional layers: the first layer has a window size of 5 and 2 feature maps, and the second layer has a window size of 3 and 3 feature maps. The k value of the k-max pooling of the last layer is 5.
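Item (3) can be illustrated with a row-wise one-dimensional convolution whose output has as many columns as its input. The zero padding at both ends of each row is an assumption of this sketch (the source does not say how the borders are handled), as are the function and variable names.

```python
def conv_rows(matrix, kernels, window):
    """One-dimensional convolution applied separately to each row.

    matrix  : d rows (vector dimensions) x n columns (characters)
    kernels : one list of `window` weights per row
    The output keeps the n columns of the input.
    """
    half = window // 2
    output = []
    for row, kernel in zip(matrix, kernels):
        padded = [0.0] * half + list(row) + [0.0] * half  # assumed zero padding
        output.append([sum(w * padded[col + j] for j, w in enumerate(kernel))
                       for col in range(len(row))])
    return output

features = [[1.0, 2.0, 3.0, 4.0],
            [4.0, 3.0, 2.0, 1.0]]
kernels = [[1.0, 1.0, 1.0],   # sums each window of the first row
           [0.0, 1.0, 0.0]]   # copies the second row unchanged
feature_map = conv_rows(features, kernels, window=3)
```

Applying several kernel sets to the same input would yield several feature maps, as in the two-layer configuration of item (4).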
Fig. 6 is a schematic diagram of semantic role labeling with a bidirectional LSTM according to an embodiment of the present invention. As shown in Fig. 6, different semantic role labeling models are used for different sentence classification results. For semantic role labeling, the word segments, parts of speech and/or naming recognition types are arranged as input, and the sentence is labeled, word segment by word segment, with the semantic tag set associated with its sentence category.
At each time step (corresponding to each word of the input sentence), the input of the network is the concatenation of the vectors obtained from the current word, its part of speech and/or its naming recognition type (i.e., the category in Chinese naming recognition, such as "Person", "Location" and "Organization" for the names of persons, places and organizations, respectively). Two LSTMs process the input sentence from left to right (forward) and from right to left (backward), respectively. For each word, each LSTM outputs a vector representation; the concatenation of the forward and backward LSTM outputs serves as the vector representation of the word (fusing information about the word itself and its left and right context), and a linear layer takes this representation as input to predict the tag to which the word belongs.
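The bidirectional arrangement can be sketched independently of the recurrent cell. In the example below a running sum stands in for the LSTM cell purely to keep the code short and checkable; it is not an LSTM, and `run_bidirectional` and `running_sum` are illustrative names.

```python
def run_bidirectional(inputs, step, state_size):
    """Run `step` left-to-right and right-to-left and concatenate both outputs."""
    def run(sequence):
        state, outputs = [0.0] * state_size, []
        for x in sequence:
            state = step(state, x)
            outputs.append(state)
        return outputs

    forward = run(inputs)
    backward = run(inputs[::-1])[::-1]  # re-align with the original word order
    return [f + b for f, b in zip(forward, backward)]

def running_sum(state, x):
    """Toy stand-in for a recurrent cell: accumulates the inputs seen so far."""
    return [s + v for s, v in zip(state, x)]

word_inputs = [[1.0], [2.0], [3.0]]
representations = run_bidirectional(word_inputs, running_sum, state_size=1)
```

Each resulting vector combines what the forward pass has seen up to the word with what the backward pass has seen from the end of the sentence down to it.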
On top of the bidirectional LSTM model, the dependencies between the predicted word labels can be further exploited, which gives the bidirectional LSTM with transition probabilities. That is, given a Chinese sentence, the network outputs a matrix in which each element f_θ(t | i) is an estimate of the likelihood that the i-th word of the sentence belongs to tag t, where θ denotes the parameters of the network. Because adjacent labels also depend on each other in the semantic annotation task, a matrix A_ij is introduced to represent the likelihood of a transition from label i to label j (A is also included in the parameter set θ). Given a sentence s[1:n] containing n words, a score can be computed for any tag sequence t[1:n] of equal length:
Figure BDA0001075778080000131
With the network parameters fixed, the Viterbi decoding algorithm can be used to obtain the highest-scoring label sequence as the labeling result. The network is trained so that, over the training set, the probability of the correct semantic annotation sequence of each sample is maximized. If the current network parameters produce a wrong prediction, the gradient of the objective function with respect to each parameter is computed and the parameters are updated accordingly by gradient descent.
The Chinese semantic role labeling model of the bidirectional LSTM is characterized as follows:
(1) At each time step of the LSTM network (corresponding to each word of the input sentence), the input is the concatenation of the vectors corresponding to the word segment, part of speech and/or naming type.
(2) The input Chinese sentence undergoes the necessary preprocessing, including: unifying the character encoding, converting traditional Chinese characters to simplified ones, converting full-width characters to half-width ones, converting special characters, replacing non-standard phrases, and uniformly replacing recognized e-mail addresses, web addresses, dates, times, percentages, quantifiers, currency amounts, telephone numbers, numbers and foreign words with special symbols.
(3) A bidirectional LSTM is used to generate a feature representation for each Chinese word.
(4) The model uses the following key parameters: the word feature vector has 30 dimensions, the part-of-speech feature vector has 10 dimensions, the type feature vector has 10 dimensions, each LSTM has 50 blocks, and each block contains 1 cell.
(5) For the bidirectional LSTM with transition probabilities, the transition probabilities between semantic labels are additionally introduced, and Viterbi decoding is then used to label the semantic roles of the Chinese sentence.
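Items (1) and (4) together fix the size of the input at each time step: 30 + 10 + 10 = 50 values. A minimal sketch of building that input is shown below; the lazily filled `Embedding` table, its random initialization and the example keys are assumptions made for illustration.

```python
import random

WORD_DIM, POS_DIM, TYPE_DIM = 30, 10, 10  # dimensions listed in item (4)

class Embedding:
    """Lookup table that creates a random vector the first time a key is seen."""
    def __init__(self, dim, seed):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def __getitem__(self, key):
        if key not in self.table:
            self.table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.table[key]

words = Embedding(WORD_DIM, seed=1)
pos_tags = Embedding(POS_DIM, seed=2)
types = Embedding(TYPE_DIM, seed=3)

def build_input(word, pos, ne_type):
    """Concatenate the word, part-of-speech and naming-type vectors."""
    return words[word] + pos_tags[pos] + types[ne_type]

step_input = build_input("BANK", "NR", "Organization")
```

The resulting 50-value vector is what each of the two LSTMs would receive for one word.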
The following is a description of specific embodiments of the present invention:
For example, the mobile phone receives the short message "Your account with tail number 5714 completed a cash deposit transaction at 11:15 on July 16; the amount is 1300.00 yuan and the balance is 3456.03 yuan. [Agricultural Bank of China]".
First, the original text is normalized. For example, different short messages write bracket symbols such as "[ ]" in different forms, so normalization, conversion between full-width and half-width forms, and unification of the different forms of the various symbols are required; once the different forms are unified, subsequent processing is easier.
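A minimal sketch of this kind of normalization is given below. It covers only full-width to half-width conversion and bracket unification; the particular bracket variants in `BRACKETS` are assumptions chosen for the example, and the other preprocessing steps (such as traditional-to-simplified conversion) are not shown.

```python
def to_half_width(text):
    """Map full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:               # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:   # full-width forms of "!" through "~"
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)

# Bracket variants unified to "[" and "]" (an assumed, illustrative mapping).
BRACKETS = str.maketrans({"【": "[", "】": "]", "〔": "[", "〕": "]"})

def normalize(text):
    return to_half_width(text).translate(BRACKETS)

normalized_bank = normalize("【中国农业银行】")
normalized_amount = normalize("１３００．００元")
```

After this step, later recognizers only need to handle one form of each digit, punctuation mark and bracket.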
Special type vocabulary is then recognized, mainly by searching the text string with regular expressions, which identifies:
3-6: DIGIT 5714
11-16: DATE July 16
17-22: TIME 11:15
35-42: CURRENCY 1300.00 yuan
46-53: CURRENCY 3456.03 yuan
At the same time, the marker symbols in the text, such as the positions of "[" and "]", can be identified.
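The regular-expression step can be sketched as follows. The Chinese message in `SMS` is a reconstruction inferred from the English rendering and the character offsets listed above, so it is an assumption rather than a quotation of the original; the patterns and their priority order are likewise illustrative.

```python
import re

# Assumed reconstruction of the normalized short message (63 characters).
SMS = "您尾号5714的账户于07月16日11时15分完成一笔现存交易，金额为1300.00元，余额3456.03元。[中国农业银行]"

# More specific patterns come first so that bare digits are matched last.
PATTERNS = [
    ("DATE", re.compile(r"\d{1,2}月\d{1,2}日")),
    ("TIME", re.compile(r"\d{1,2}时\d{1,2}分")),
    ("CURRENCY", re.compile(r"\d+(?:\.\d+)?元")),
    ("DIGIT", re.compile(r"\d+")),
]

def recognize_special(text):
    """Return (start, end, label, word) tuples with inclusive, 0-based offsets."""
    taken = [False] * len(text)
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            start, end = match.start(), match.end() - 1
            if any(taken[start:end + 1]):
                continue  # part of a span already claimed by an earlier pattern
            for i in range(start, end + 1):
                taken[i] = True
            spans.append((start, end, label, match.group()))
    return sorted(spans)

special_spans = recognize_special(SMS)
```

On the reconstructed message this yields the five spans of the list above, with the same start and end offsets.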
According to the naming recognition unit or the user-defined dictionary (specific words that the naming recognition unit cannot recognize can usually be added to the user-defined dictionary; for example, bank keywords are added to it in advance), the following can be recognized:
56-61: BANK Agricultural Bank of China
Note: the two numbers in the first column are the start and end positions of the special word in the original text (the first character is counted as 0).
Then, after preprocessing, the recognized word segments form constraints for the next step (i.e., these words are not segmented or part-of-speech tagged again). The constraints can be represented by a string that gives the word segmentation and part-of-speech tag of each character, for example:
"O O O B_D I_D I_D E_D O O O O B_NT I_NT I_NT I_NT I_NT E_NT B_NT I_NT I_NT I_NT I_NT E_NT O O O O O O O O S_PU O O O B_D I_D I_D I_D I_D I_D I_D E_D S_PU O O B_D I_D I_D I_D I_D I_D I_D E_D S_PU S_PU B_NR I_NR I_NR I_NR I_NR E_NR S_PU"
In the above string, "O" denotes other characters, for which word segmentation and part-of-speech recognition are performed in the next step. For example, "B_D" denotes the beginning of a number word, "I_D" its middle, and "E_D" its end. In each tag, the part before the underscore "_" gives the position of the character in the word and the part after it gives the part of speech, so that segmentation and part-of-speech tagging are performed jointly. "B", "I" and "E" denote a character at the beginning, in the middle and at the end of a word segment, respectively. "S" denotes a single-character word; for example, a punctuation mark is represented by "S_PU". "NT" denotes a temporal noun and "NR" a proper noun; other parts of speech, such as verbs and adjectives, may be specified in advance.
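The constraint string can be derived mechanically from the recognized spans, as the following sketch shows. The span list restates the offsets of the example (numbers, date, time, bank name and the punctuation positions implied by the constraint string); the function name and the tuple layout are assumptions.

```python
def constraint_tags(length, spans):
    """Build one position_part-of-speech tag per character; "O" = unconstrained."""
    tags = ["O"] * length
    for start, end, pos in spans:  # inclusive, 0-based character offsets
        if start == end:
            tags[start] = "S_" + pos       # single-character word
            continue
        tags[start] = "B_" + pos           # first character of the word
        for i in range(start + 1, end):
            tags[i] = "I_" + pos           # characters inside the word
        tags[end] = "E_" + pos             # last character of the word
    return tags

spans = [
    (3, 6, "D"), (11, 16, "NT"), (17, 22, "NT"), (31, 31, "PU"),
    (35, 42, "D"), (43, 43, "PU"), (46, 53, "D"), (54, 54, "PU"),
    (55, 55, "PU"), (56, 61, "NR"), (62, 62, "PU"),
]
tags = constraint_tags(63, spans)
constraint_string = " ".join(tags)
```

Positions tagged "O" remain free for the joint segmentation and part-of-speech tagging model, while the other positions are fixed during decoding.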
After word segmentation and part-of-speech tagging, each word in the text is separated as follows (the word before "/" and its part of speech after "/"; the English gloss of a single Chinese word is hyphenated):
"you/PN tail-number/NN 5714/D of/U account/NN at/P July-16/NT 11:15/NT complete/V one/D (measure-word)/M cash-deposit/V transaction/V ,/PU amount/NN is/V 1300.00-yuan/D ,/PU balance/NN 3456.03-yuan/D ./PU [/PU Agricultural-Bank-of-China/NR ]/PU"
In the above example, the word "tail number" is a common noun, denoted by "NN". Likewise, the word "5714" is a number, denoted by "D"; the word "transaction" is a verb, denoted by "V"; and the word segment "[" is a punctuation mark, denoted by "PU". In this way, the normalized text is segmented into word segment units (single characters and punctuation marks also count as independent word segments), and the part of speech of each word in the text is tagged.
For semantic analysis, words of special types can be represented uniformly, i.e., each such word is replaced by a label symbol, which gives:
"you/PN tail-number/NN DIGIT/D of/U account/NN at/P DATE/NT TIME/NT complete/V one/D (measure-word)/M cash-deposit/V transaction/V ,/PU amount/NN is/V CURRENCY/D ,/PU balance/NN CURRENCY/D ./PU [/PU BANK/NR ]/PU"
Through semantic analysis based on word segmentation, part of speech and/or naming recognition type, the words of interest to the user can be extracted. For a bank notification short message, for example, key information such as the date, time, account number, amount credited or debited, balance and bank name can be extracted. This key information, i.e., the semantic role, is labeled after the corresponding word and separated from it by "/". A "/" followed by "O" marks content that does not need to be extracted.
Semantic analysis results for this example: "you/O tail-number/O 5714/ACCOUNT of/O account/O at/O July-16/DATE 11:15/TIME complete/O one/O (measure-word)/O cash-deposit/O transaction/O ,/O amount/O is/O 1300.00-yuan/INCOME ,/O balance/O 3456.03-yuan/BALANCE ./O [/O Agricultural-Bank-of-China/BANK ]/O".
Here "ACCOUNT", "DATE", "TIME", "INCOME", "BALANCE" and "BANK" are semantic role labels attached to the corresponding word segments.
Finally, prompts, interaction and the like are carried out in an interface or application according to the extracted key information. For example, on receiving the above short message, the user may be shown:
Event: account credit
Account number: 5714
Date: July 16
Time: 11:15
Credited: 1300.00 yuan
Balance: 3456.03 yuan
Bank: Agricultural Bank of China
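Turning the role-labeled words into such a prompt amounts to collecting every word whose label is not "O". The sketch below does this for the example; the English glosses, the `FIELD_NAMES` mapping and the rule that an INCOME role implies an "account credit" event are assumptions standing in for the event model, which the source does not spell out.

```python
LABELED = [
    ("you", "O"), ("tail-number", "O"), ("5714", "ACCOUNT"), ("of", "O"),
    ("account", "O"), ("at", "O"), ("July 16", "DATE"), ("11:15", "TIME"),
    ("complete", "O"), ("transaction", "O"), ("1300.00 yuan", "INCOME"),
    ("balance", "O"), ("3456.03 yuan", "BALANCE"),
    ("Agricultural Bank of China", "BANK"),
]

# Hypothetical display names for the semantic role labels.
FIELD_NAMES = {
    "ACCOUNT": "Account number", "DATE": "Date", "TIME": "Time",
    "INCOME": "Credited", "BALANCE": "Balance", "BANK": "Bank",
}

def extract_event(labeled):
    """Collect role -> word pairs and attach an (assumed) event name."""
    roles = {role: word for word, role in labeled if role != "O"}
    event = "account credit" if "INCOME" in roles else "unknown"
    return {"event": event, "attributes": roles}

def render_prompt(record):
    lines = ["Event: " + record["event"]]
    for role, word in record["attributes"].items():
        lines.append(FIELD_NAMES.get(role, role) + ": " + word)
    return "\n".join(lines)

record = extract_event(LABELED)
prompt = render_prompt(record)
```

The record keeps an event name, attribute keys and attribute values, which is the kind of structure an interface needs to display the prompt.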
The scheme provided by the embodiments of the present invention adopts the deep learning-based Chinese sequence labeling network and learning algorithm, the Chinese sentence classification model based on the convolutional neural network with dynamic k-max pooling, and the Chinese semantic role labeling model of the bidirectional LSTM with transition probabilities, together with the way in which these key technologies are integrated. The resulting system can be deployed on mobile computing platforms with relatively limited computing resources, such as mobile phones, can complete complex Chinese semantic analysis tasks without additional computing resources or equipment, and can greatly improve the response speed of related applications and user satisfaction.
Although the present invention has been described in detail hereinabove, the present invention is not limited thereto, and various modifications can be made by those skilled in the art in light of the principle of the present invention. Thus, modifications made in accordance with the principles of the present invention should be understood to fall within the scope of the present invention.

Claims (10)

1. A method for Chinese semantic analysis based on deep learning comprises the following steps:
the mobile terminal obtains a standard Chinese text by performing standardized processing on the acquired Chinese text;
the mobile terminal performs special type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese naming recognition on the standard Chinese text, and takes a recognition result as a constraint condition;
the mobile terminal performs Chinese word segmentation and part-of-speech analysis on the standard Chinese text according to the constraint conditions, using a Chinese word segmentation and part-of-speech tagging model obtained through deep learning, to obtain the word segments and parts of speech of the standard Chinese text;
and the mobile terminal performs Chinese semantic analysis on the standardized Chinese text by utilizing the word segmentation, the part of speech and/or the naming identification type of the standardized Chinese text.
2. The method of claim 1, wherein the mobile terminal performs special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese naming recognition on the canonical Chinese text, and the recognition result is used as a constraint condition, comprising:
the mobile terminal performs special type vocabulary recognition on the standard Chinese text by using a special type vocabulary template to obtain a special type vocabulary recognition result of the standard Chinese text, and the obtained special type vocabulary recognition result is used as a first constraint condition.
3. The method of claim 1, wherein the mobile terminal performs special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese naming recognition on the canonical Chinese text, and the recognition result is used as a constraint condition, comprising:
and the mobile terminal carries out user-defined vocabulary recognition on the standard Chinese text by using the user-defined dictionary to obtain a user-defined vocabulary recognition result of the standard Chinese text, and the obtained user-defined vocabulary recognition result is used as a second constraint condition.
4. The method of claim 1, wherein the mobile terminal performs special type vocabulary recognition and/or custom vocabulary recognition and/or Chinese naming recognition on the canonical Chinese text, and the recognition result is used as a constraint condition, comprising:
the mobile terminal conducts Chinese naming recognition on the standard Chinese text by utilizing the Chinese naming recognition model obtained through deep learning to obtain a Chinese naming recognition result of the standard Chinese text, and the obtained Chinese naming recognition result is used as a third constraint condition.
5. The method of any of claims 2-4, the constraints comprising at least one of a first constraint, a second constraint, and a third constraint, or a combination thereof.
6. The method according to any of claims 1-5, wherein the mobile terminal performing the chinese semantic analysis on the normalized chinese text by using the segmentation, the part of speech and/or the named recognition type of the normalized chinese text comprises:
and the mobile terminal classifies the standard Chinese text according to the characters of the standard Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling to obtain a sentence classification result of the standard Chinese text.
7. The method of claim 6, wherein the mobile terminal performing the chinese semantic analysis on the normalized chinese text using the segmentation, part of speech and/or named recognition type of the normalized chinese text comprises:
and the mobile terminal determines a Chinese semantic role labeling model based on a bidirectional long short-term memory (LSTM) network according to the sentence classification result, and performs semantic role labeling on each word segment and symbol of the standard Chinese text according to the word segments, parts of speech and/or naming recognition types of the standard Chinese text and the determined model, to obtain a semantic role labeling result of the standard Chinese text.
8. The method of claim 7, wherein the mobile terminal performing the chinese semantic analysis on the normalized chinese text using the segmentation, the part of speech, and/or the named recognition type of the normalized chinese text comprises:
and the mobile terminal carries out structural processing on the standard Chinese text according to the semantic role labeling result and the event model of the standard Chinese text, and extracts key information of the standard Chinese text.
9. The method of claim 8, the key information of the canonical chinese text including an event name, a key attribute, and an attribute value.
10. An apparatus for deep learning based Chinese semantic analysis, comprising:
the normalization processing module is used for performing normalization processing on the acquired Chinese text to obtain a normalized Chinese text;
the recognition module is used for carrying out special type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese naming recognition on the standard Chinese text, and taking a recognition result as a constraint condition;
and the analysis module is used for performing Chinese word segmentation and part-of-speech analysis on the standard Chinese text according to the constraint conditions, using a Chinese word segmentation and part-of-speech tagging model obtained through deep learning, to obtain the word segments and parts of speech of the standard Chinese text, and for performing Chinese semantic analysis on the standard Chinese text by using the word segments, parts of speech and/or naming recognition types of the standard Chinese text.
CN201610658579.XA 2016-08-11 2016-08-11 Deep learning-based Chinese semantic analysis method and device Active CN107729309B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610658579.XA CN107729309B (en) 2016-08-11 2016-08-11 Deep learning-based Chinese semantic analysis method and device
PCT/CN2016/105977 WO2018028077A1 (en) 2016-08-11 2016-11-15 Deep learning based method and device for chinese semantics analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610658579.XA CN107729309B (en) 2016-08-11 2016-08-11 Deep learning-based Chinese semantic analysis method and device

Publications (2)

Publication Number Publication Date
CN107729309A CN107729309A (en) 2018-02-23
CN107729309B true CN107729309B (en) 2022-11-08

Family

ID=61161388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610658579.XA Active CN107729309B (en) 2016-08-11 2016-08-11 Deep learning-based Chinese semantic analysis method and device

Country Status (2)

Country Link
CN (1) CN107729309B (en)
WO (1) WO2018028077A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval
CN103077164A (en) * 2012-12-27 2013-05-01 新浪网技术(中国)有限公司 Text analysis method and text analyzer
WO2014087506A1 (en) * 2012-12-05 2014-06-12 三菱電機株式会社 Word meaning estimation device, word meaning estimation method, and word meaning estimation program
CN104915386A (en) * 2015-05-25 2015-09-16 中国科学院自动化研究所 Short text clustering method based on deep semantic feature learning
CN105677802A (en) * 2015-12-31 2016-06-15 宁波公众信息产业有限公司 Internet information analysis system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7047183B2 (en) * 2001-08-21 2006-05-16 Microsoft Corporation Method and apparatus for using wildcards in semantic parsing
US8326809B2 (en) * 2008-10-27 2012-12-04 Sas Institute Inc. Systems and methods for defining and processing text segmentation rules
CN104268200A (en) * 2013-09-22 2015-01-07 中科嘉速(北京)并行软件有限公司 Unsupervised named entity semantic disambiguation method based on deep learning
CN104965822B (en) * 2015-07-29 2017-08-25 中南大学 A kind of Chinese text sentiment analysis method based on Computerized Information Processing Tech
CN105243055B (en) * 2015-09-28 2018-07-31 北京橙鑫数据科技有限公司 Based on multilingual segmenting method and device

Non-Patent Citations (1)

Title
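Unsupervised Chinese part-of-speech tagging based on conditional random fields; 孙静 et al.; Computer Applications and Software (《计算机应用与软件》); 2011-04-15 (No. 04); full text *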

Also Published As

Publication number Publication date
WO2018028077A1 (en) 2018-02-15
CN107729309A (en) 2018-02-23

Similar Documents

Publication Publication Date Title
CN107729309B (en) Deep learning-based Chinese semantic analysis method and device
CN111966917B (en) Event detection and summarization method based on pre-training language model
CN109657230B (en) Named entity recognition method and device integrating word vector and part-of-speech vector
CN109753660B (en) LSTM-based winning bid web page named entity extraction method
CN111291566B (en) Event main body recognition method, device and storage medium
CN111274394A (en) Method, device and equipment for extracting entity relationship and storage medium
CN110263325A (en) Automatic Chinese word segmentation
CN111709242B (en) Chinese punctuation mark adding method based on named entity recognition
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN110276069A (en) Automatic Chinese braille error detection method, system and storage medium
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN111339260A (en) BERT and QA thought-based fine-grained emotion analysis method
CN112434535A (en) Multi-model-based factor extraction method, device, equipment and storage medium
CN108829823A (en) File classification method
CN114416942A (en) Automatic question-answering method based on deep learning
CN114298035A (en) Text recognition desensitization method and system thereof
CN115080750B (en) Weak supervision text classification method, system and device based on fusion prompt sequence
CN112287100A (en) Text recognition method, spelling error correction method and voice recognition method
CN113051887A (en) Method, system and device for extracting announcement information elements
CN114781392A (en) Text emotion analysis method based on BERT improved model
CN115269834A (en) High-precision text classification method and device based on BERT
CN115064154A (en) Method and device for generating mixed language voice recognition model
CN113221553A (en) Text processing method, device and equipment and readable storage medium
CN111950281B (en) Demand entity co-reference detection method and device based on deep learning and context semantics
CN116522165B (en) Public opinion text matching system and method based on twin structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant