CN107729309A - Method and device for Chinese semantic analysis based on deep learning
- Publication number
- CN107729309A, CN201610658579.XA
- Authority
- CN
- China
- Prior art keywords
- chinese
- text
- chinese text
- identification
- mobile terminal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method and device for Chinese semantic analysis based on deep learning, relating to the technical field of natural language processing. The method includes: a mobile terminal normalizes an acquired Chinese text to obtain a normalized Chinese text; the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and uses the recognition results as constraints; the mobile terminal, according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text to obtain its word segments and parts of speech; and the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using its word segments, parts of speech and/or named entity types.
Description
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a method and device for Chinese semantic analysis based on deep learning.
Background
Chinese natural language understanding has made rapid progress, and a large body of research results has been produced, particularly in Chinese word segmentation and part-of-speech analysis. Although automatic analysis of Chinese still lags behind that of English and Japanese, the accumulated research makes it possible to develop systems that perform high-level semantic analysis and understanding, and to apply them in practice. A system equipped with semantic analysis technology gains greatly in intelligence and adaptability. Semantic analysis is both the key and the main difficulty of text information analysis and processing, and is the basis of information extraction, user intent analysis, information fusion, question answering, intelligent reasoning and the like.
On the other hand, deep learning is a recent breakthrough in artificial intelligence research. It ended a period of as long as ten years in which artificial intelligence failed to achieve a breakthrough, and quickly had an impact on industry. Unlike narrow artificial intelligence systems that can only complete a particular task (functional simulation aimed at that task), deep learning, as a general artificial intelligence technique, can deal with a variety of situations and problems. It has been applied with great success in fields such as image recognition and speech recognition, and has also produced results in natural language processing (mainly for English).
Summary of the invention
The technical problem solved by the scheme provided in the embodiments of the present invention is that automatic analysis of Chinese semantics is inaccurate.
A method for Chinese semantic analysis based on deep learning provided according to an embodiment of the present invention includes:
A mobile terminal normalizes an acquired Chinese text to obtain a normalized Chinese text;
The mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and uses the recognition results as constraints;
The mobile terminal, according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text to obtain the word segments and parts of speech of the normalized Chinese text;
The mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech and/or named entity types of the normalized Chinese text.
Preferably, the mobile terminal performing special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and using the recognition results as constraints includes:
The mobile terminal performs special-type vocabulary recognition on the normalized Chinese text using special-type vocabulary templates, obtains the special-type vocabulary recognition result of the normalized Chinese text, and uses the obtained special-type vocabulary recognition result as a first constraint.
Preferably, the mobile terminal performing special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and using the recognition results as constraints includes:
The mobile terminal performs user-defined vocabulary recognition on the normalized Chinese text using a user-defined dictionary, obtains the user-defined vocabulary recognition result of the normalized Chinese text, and uses the obtained user-defined vocabulary recognition result as a second constraint.
Preferably, the mobile terminal performing special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and using the recognition results as constraints includes:
The mobile terminal performs Chinese named entity recognition on the normalized Chinese text using a Chinese named entity recognition model obtained by deep learning, obtains the Chinese named entity recognition result of the normalized Chinese text, and uses the obtained Chinese named entity recognition result as a third constraint.
Preferably, the constraints include at least one of the first constraint, the second constraint and the third constraint, or a combination thereof.
Preferably, the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech and/or named entity types of the normalized Chinese text includes:
The mobile terminal classifies the sentences of the normalized Chinese text according to the characters of the normalized Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, obtaining the sentence classification result of the normalized Chinese text.
Preferably, the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech and/or named entity types of the normalized Chinese text includes:
The mobile terminal determines a bidirectional LSTM (Long Short-Term Memory) Chinese semantic role labeling model according to the sentence classification result, and then, according to the word segments, parts of speech and/or named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, labels the semantic role of each word segment and symbol of the normalized Chinese text, obtaining the semantic role labeling result of the normalized Chinese text.
Preferably, the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech and/or named entity types of the normalized Chinese text includes:
The mobile terminal structures the normalized Chinese text according to the semantic role labeling result of the normalized Chinese text and an event model, extracting the key information of the normalized Chinese text.
Preferably, the key information of the normalized Chinese text includes an event name, key attributes and attribute values.
A device for Chinese semantic analysis based on deep learning provided according to an embodiment of the present invention includes:
A normalization module, configured to normalize an acquired Chinese text to obtain a normalized Chinese text;
A recognition module, configured to perform special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and to use the recognition results as constraints;
An analysis module, configured to perform, according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, Chinese word segmentation and part-of-speech analysis on the normalized Chinese text to obtain its word segments and parts of speech, and to perform Chinese semantic analysis on the normalized Chinese text using its word segments, parts of speech and/or named entity types.
According to the scheme provided in the embodiments of the present invention, an input Chinese sentence is semantically analyzed and a structured analysis result is output; the structured result is then used to complete tasks that require high-level semantic analysis, such as event analysis, information extraction and sentiment analysis.
Brief description of the drawings
Fig. 1 is a flowchart of a method for Chinese semantic analysis based on deep learning provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a device for Chinese semantic analysis based on deep learning provided by an embodiment of the present invention;
Fig. 3 is a module diagram of Chinese semantic analysis provided by an embodiment of the present invention;
Fig. 4 is a structure diagram of the Chinese sequence labeling network provided by an embodiment of the present invention;
Fig. 5 is a structure diagram of the convolutional neural network with dynamic k-max pooling provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of bidirectional LSTM semantic role labeling provided by an embodiment of the present invention.
Detailed description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the preferred embodiments described below serve only to illustrate and explain the present invention and are not intended to limit it.
Fig. 1 is a flowchart of a method for Chinese semantic analysis based on deep learning provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S101: a mobile terminal normalizes an acquired Chinese text to obtain a normalized Chinese text;
Step S102: the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and uses the recognition results as constraints;
Step S103: the mobile terminal, according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text to obtain the word segments and parts of speech of the normalized Chinese text;
Step S104: the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech and/or named entity types of the normalized Chinese text.
Here, the mobile terminal performing special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and using the recognition results as constraints includes: the mobile terminal performs special-type vocabulary recognition on the normalized Chinese text using special-type vocabulary templates, obtains the special-type vocabulary recognition result of the normalized Chinese text, and uses it as a first constraint.
It also includes: the mobile terminal performs user-defined vocabulary recognition on the normalized Chinese text using a user-defined dictionary, obtains the user-defined vocabulary recognition result of the normalized Chinese text, and uses it as a second constraint.
It also includes: the mobile terminal performs Chinese named entity recognition on the normalized Chinese text using a Chinese named entity recognition model obtained by deep learning, obtains the Chinese named entity recognition result of the normalized Chinese text, and uses it as a third constraint.
The constraints include at least one of the first constraint, the second constraint and the third constraint, or a combination thereof.
Special-type vocabulary recognition, user-defined vocabulary recognition and Chinese named entity recognition amount to a pre-segmentation and pre-tagging: the special-type words, user-defined words and Chinese named entities recognized in this step are not segmented or tagged again in the subsequent word segmentation and part-of-speech tagging step, and therefore constitute a constraint.
Here, the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech and/or named entity types of the normalized Chinese text includes: the mobile terminal classifies the sentences of the normalized Chinese text according to the characters of the normalized Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, obtaining the sentence classification result of the normalized Chinese text.
It also includes: the mobile terminal determines a bidirectional long short-term memory (LSTM) Chinese semantic role labeling model according to the sentence classification result, and then, according to the word segments, parts of speech and/or named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, labels the semantic role of each word segment and symbol of the normalized Chinese text, obtaining the semantic role labeling result of the normalized Chinese text.
It also includes: the mobile terminal structures the normalized Chinese text according to the semantic role labeling result of the normalized Chinese text and an event model, extracting the key information of the normalized Chinese text. Specifically, the key information of the normalized Chinese text includes an event name, key attributes and attribute values.
Fig. 2 is a schematic diagram of a device for Chinese semantic analysis based on deep learning provided by an embodiment of the present invention. As shown in Fig. 2, the device includes: a normalization module 201, configured to normalize an acquired Chinese text to obtain a normalized Chinese text; a recognition module 202, configured to perform special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and to use the recognition results as constraints; and an analysis module 203, configured to perform, according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, Chinese word segmentation and part-of-speech analysis on the normalized Chinese text to obtain its word segments and parts of speech, and to perform Chinese semantic analysis on the normalized Chinese text using its word segments, parts of speech and/or named entity types.
The analysis module 203 includes: a sentence classification unit, configured to classify the sentences of the normalized Chinese text according to the characters of the normalized Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, obtaining the sentence classification result of the normalized Chinese text.
The analysis module 203 also includes: a semantic role labeling unit, configured to determine a bidirectional long short-term memory (LSTM) Chinese semantic role labeling model according to the sentence classification result, and, according to the word segments, parts of speech and/or named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, to label the semantic roles of elements of the normalized Chinese text such as single characters, word segments and special-type vocabulary, obtaining the semantic role labeling result of the normalized Chinese text.
The analysis module 203 also includes: a structuring unit, configured to structure the normalized Chinese text according to the semantic role labeling result of the normalized Chinese text and an event model, extracting the key information of the normalized Chinese text. Specifically, the key information of the normalized Chinese text includes an event name, key attributes and attribute values. The event name may correspond to the sentence classification result. For example, for short message texts received by a terminal, the sentence classification model divides them into bank statements, flights and trains, appointments, weather forecasts, and others; the class produced by sentence classification can then serve as the event name. The key attributes are the semantic role labeling results. For example, in a bank statement message the labels fall into categories such as statement date, amount spent, repayment date and repayment amount. The attribute values are the concrete values in the original message text that carry those labels, such as the exact date or the exact amount.
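The mapping from a sentence class and semantic role labels to an event name, key attributes and attribute values can be sketched as follows. This is a minimal illustration only: the `Event` class, the `build_event` function, the label names `STATEMENT_DATE` and `REPAY_AMOUNT`, and the sample message are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str                                        # event name = sentence class
    attributes: dict = field(default_factory=dict)   # key attribute -> attribute value


def build_event(sentence_class, tokens, role_labels):
    """Collect every token whose semantic role label is not "O" into an event."""
    event = Event(name=sentence_class)
    for token, label in zip(tokens, role_labels):
        if label == "O":
            continue  # "O" marks tokens unrelated to the task
        # Tokens sharing a label are concatenated in sentence order.
        event.attributes[label] = event.attributes.get(label, "") + token
    return event


# Hypothetical bank statement message, already segmented and labeled.
tokens = ["您", "本期", "账单", "日", "8月10日", "，", "还款", "金额", "1200元"]
labels = ["O", "O", "O", "O", "STATEMENT_DATE", "O", "O", "O", "REPAY_AMOUNT"]
event = build_event("bank_statement", tokens, labels)
```

Here the label plays the role of the key attribute and the labeled text span plays the role of the attribute value, matching the bank statement example in the paragraph above.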
Fig. 3 is a module diagram of Chinese semantic analysis provided by an embodiment of the present invention. As shown in Fig. 3, deep learning is used to semantically analyze the input Chinese sentence and output a structured analysis result, and the structured result is used to complete tasks that require high-level semantic analysis, such as event analysis, information extraction and sentiment analysis. Specifically:
Text normalization: the input Chinese sentence is normalized, including: unifying the character encoding, converting Traditional Chinese to Simplified Chinese, converting full-width characters to half-width, converting special characters, and replacing non-standard expressions (for example, replacing Internet slang with its standard form).
User-defined vocabulary recognition: user-defined words are recognized using a user-defined dictionary, including: application-domain vocabulary, idioms, foods, venues, works, devices, person names, place names and organization names.
Special-type vocabulary recognition: templates are defined for recognizing e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers and foreign words; the e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers and foreign words contained in the input sentence are recognized with these templates and replaced with special symbols.
Chinese named entity recognition: a corpus for Chinese named entity recognition is prepared, and the Chinese sequence labeling network model shown in Fig. 4 is trained on it to obtain a Chinese named entity recognition model. This model recognizes the person names, place names and organization names in the input sentence; that is, it identifies the specific person names, place names and organization names in the sentence and stores the corresponding entity type (represented, for example, by "Person", "Location" and "Organization" respectively).
Chinese word segmentation and part-of-speech tagging: with the results of special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition as constraints, a corpus for joint Chinese word segmentation and part-of-speech tagging is prepared, and the Chinese sequence labeling network model shown in Fig. 4 is trained on it to obtain a Chinese word segmentation and part-of-speech tagging model for joint labeling. This model performs joint Chinese word segmentation and part-of-speech analysis on the input sentence.
Sentence classification: before semantic role labeling, the sentence is classified using the sentence semantic representation produced by the convolutional neural network with dynamic k-max pooling shown in Fig. 5, and input sentences that the application is not interested in are filtered out. A sentence classification corpus containing balanced numbers of sentences of each type, together with negative samples (Chinese sentences the application is not interested in), is used to train the Chinese sentence classification model of the convolutional neural network with dynamic k-max pooling. This model classifies the input sentence and filters out sentences of no interest to the application.
Semantic role labeling: a bidirectional LSTM semantic labeling network model is determined according to the sentence classification result (that is, different sentence classes use different analysis models). The word segments, parts of speech and/or named entity types of the normalized text are then fed to the bidirectional LSTM semantic labeling network shown in Fig. 6 to label the semantic roles of the sentence. That is, a semantic role labeling corpus is prepared for each sentence class from the word segments, parts of speech and/or named entity types, a bidirectional LSTM Chinese semantic role labeling model is trained on it, and this model labels the semantic roles of the sentence.
Event analysis: according to the semantic role labeling result and an event template, the result is packaged into the structured representation produced by semantic analysis, and the name, key attributes and attribute values of the event are extracted.
The training corpus for semantic role labeling has one token per line, in sentence order, and five columns per line. The columns give, in turn: the token itself (e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers, foreign words and the like are replaced with English tags; these, as well as single characters and punctuation marks, are each treated as independent tokens), the semantic label ("O" denotes a class unrelated to the task), the part-of-speech label, the named entity label, and the original form of the token in the sentence. Samples are separated by one blank line.
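A reader for a corpus in this layout might look like the sketch below. The column separator is not stated in the text, so a tab is assumed here; the function name `read_srl_corpus`, the field names, and the sample labels are likewise hypothetical.

```python
def read_srl_corpus(lines):
    """Parse five-column lines into samples; a blank line ends a sample."""
    samples, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if current:
                samples.append(current)
                current = []
            continue
        # Assumed separator: one tab between the five columns.
        token, role, pos, entity, surface = line.split("\t")
        current.append({
            "token": token,      # token, or an English tag for special types
            "role": role,        # semantic label, "O" if unrelated to the task
            "pos": pos,          # part-of-speech label
            "entity": entity,    # named entity label
            "surface": surface,  # original form of the token in the sentence
        })
    if current:
        samples.append(current)
    return samples


corpus = [
    "还款\tO\tv\tO\t还款\n",
    "NUMBER\tREPAY_AMOUNT\tm\tO\t1200\n",
    "\n",
    "你好\tO\tl\tO\t你好\n",
]
samples = read_srl_corpus(corpus)
```

Keeping the original surface form in the fifth column is what allows a placeholder such as `NUMBER` to be mapped back to the concrete value when the event is assembled.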
When performing deep-learning-based sequence labeling tasks such as Chinese word segmentation and part-of-speech tagging or Chinese named entity recognition, the decoding algorithm uses the results of special-type vocabulary recognition and/or user-defined vocabulary recognition as constraints (for Chinese word segmentation and part-of-speech tagging, the Chinese named entity recognition result can be added as a further constraint). This includes:
(1) Recognizing in advance, by template, types such as e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers and foreign words.
(2) Supporting user definition of vocabulary including domain vocabulary, idioms, foods, venues, works, devices, person names, place names and organization names.
(3) A Viterbi decoding algorithm that combines the predictions output by the deep learning network with the results of special-type vocabulary recognition and/or user-defined vocabulary recognition as constraints.
Fig. 4 is a structure diagram of the Chinese sequence labeling network provided by an embodiment of the present invention, which can be used for Chinese named entity recognition and for Chinese word segmentation and part-of-speech tagging (note: with different training corpora, the trained model parameters differ, and so do the constraints). As shown in Fig. 4, the deep learning Chinese sequence labeling network model takes a Chinese sentence as input and outputs a sequence labeling result whose unit is the character (including Chinese characters, punctuation marks and any other characters that may appear in the sentence). The label set consists of the segmentation labels extended with task-specific labels. Taking Chinese named entity recognition as an example, if "PER" denotes the person-name label, then the following sentence:
"Zhuge Liang is the military adviser of Liu's military group."
has the corresponding labeling result (one label per character of the Chinese original):
"B_PER I_PER E_PER O B_PER E_PER O O O O O O O O".
Here "B" denotes the first character of a word, "I" a middle character of a word, "E" the last character of a word, and "O" a character unrelated to the task. In addition, "S" denotes a character that forms a word on its own (such as a single-character word or a punctuation mark).
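The labeling scheme can be illustrated by generating the labels from entity spans. The Chinese sentence used below is an assumption: it is a back-translation of the English example chosen so that its 14 characters line up with the 14 labels given above, and it is not confirmed by this text. The function name `spans_to_tags` is also hypothetical.

```python
def spans_to_tags(n_chars, spans):
    """Turn (start, end, label) spans, end exclusive, into B/I/E/S/O tags."""
    tags = ["O"] * n_chars
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S_{label}"      # single-character word
            continue
        tags[start] = f"B_{label}"          # first character
        for i in range(start + 1, end - 1):
            tags[i] = f"I_{label}"          # middle characters
        tags[end - 1] = f"E_{label}"        # last character
    return tags


# Assumed original of the example sentence (back-translation).
sentence = "诸葛亮是刘备军事集团的军师。"
# Two person names: characters 0-2 and characters 4-5.
tags = spans_to_tags(len(sentence), [(0, 3, "PER"), (4, 6, "PER")])
```

Under that assumption, the three-character name receives B, I, E and the two-character name receives B, E, which is the pattern shown in the labeling result above.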
The label of a character generally depends on the characters around it, so a window model is used: when estimating the likelihood that the current character has a given label, the character and its surrounding characters are taken as input (as shown in Fig. 4). If the window size is set to 5, the input window consists of the character together with the two characters on its left and the two on its right. If there are not enough characters on the left or right to fill the window, padding symbols are used in their place.
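A minimal sketch of this windowing step follows; the padding symbol `<PAD>` and the function name `char_windows` are placeholders chosen here.

```python
PAD = "<PAD>"  # hypothetical padding symbol


def char_windows(sentence, size=5):
    """Return one window of `size` characters centred on each character."""
    half = size // 2
    padded = [PAD] * half + list(sentence) + [PAD] * half
    return [padded[i:i + size] for i in range(len(sentence))]


windows = char_windows("诸葛亮", size=5)
```

Each window has the same length regardless of the character's position, so the network input has a fixed size.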
Each input character is converted into its vector representation by lookup in a character embedding table. The representation of each character can be generated randomly or pre-trained with an unsupervised method. These vectors are then concatenated to form the feature representation of the window. After a linear network layer (the hidden layer), a Sigmoid function applies a non-linear transformation. Finally, another linear layer outputs a vector whose length equals the number of task labels; each element of the vector indicates the likelihood of the corresponding label.
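The forward pass just described (embedding lookup, concatenation, linear layer, Sigmoid, linear layer) can be sketched in plain Python as below. The class name `WindowTagger`, the random initialization range and the zero biases are assumptions made for illustration; the default sizes (100-dimensional character vectors, window of 5, 300 hidden units) follow the configuration stated later in this description. Training is not shown.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; the initialization range is an assumption."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, bias, x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]


class WindowTagger:
    def __init__(self, vocab, n_labels, dim=100, window=5, hidden=300, seed=0):
        rng = random.Random(seed)
        self.index = {ch: i for i, ch in enumerate(vocab)}
        self.emb = make_matrix(len(vocab), dim, rng)      # character embedding table
        self.w1 = make_matrix(hidden, dim * window, rng)  # hidden linear layer
        self.b1 = [0.0] * hidden
        self.w2 = make_matrix(n_labels, hidden, rng)      # output linear layer
        self.b2 = [0.0] * n_labels

    def scores(self, window_chars):
        """Return one score per label for the character at the window centre."""
        x = [v for ch in window_chars for v in self.emb[self.index[ch]]]
        return linear(self.w2, self.b2, sigmoid(linear(self.w1, self.b1, x)))


tagger = WindowTagger(["<PAD>", "诸", "葛", "亮"], n_labels=5, dim=4, hidden=6)
out = tagger.scores(["<PAD>", "<PAD>", "诸", "葛", "亮"])
```

Calling `scores` once per window yields one row of label scores per character, which is the matrix that the decoding step consumes.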
Given a Chinese sentence, the network outputs a matrix in which each element f_θ(t | i) estimates the likelihood that the i-th character of the sentence has label t, where θ denotes the network parameters. Because adjacent labels in a sequence labeling task depend strongly on each other, a matrix A_ij is introduced to represent the likelihood of a transition from label i to label j (it is also included in the parameter set θ). Given a sentence s[1:n] containing n characters, a score can be computed for any label sequence t[1:n] of the same length:
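The score formula itself is not reproduced in this text. A form consistent with the definitions above, in which each position contributes a transition term and a network output term, would be the following. This is a reconstruction from the surrounding definitions, not the patent's own typesetting; the name "score" and the convention that t_0 is a start label are assumptions.

```latex
\mathrm{score}\left(s_{[1:n]},\, t_{[1:n]},\, \theta\right)
  = \sum_{i=1}^{n} \left( A_{t_{i-1}\, t_i} + f_\theta(t_i \mid i) \right)
```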
With the parameters given, the Viterbi decoding algorithm finds the highest-scoring label sequence, which is taken as the labeling result.
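How recognition results can act as constraints during Viterbi decoding is sketched below. The patent does not spell out the mechanism, so this sketch assumes one simple reading: at positions covered by a pre-recognized word, only the labels implied by that word are allowed, and all other labels receive a score of negative infinity. The function name `viterbi`, the `allowed` argument and the toy scores are hypothetical, and the start transition is omitted for brevity.

```python
def viterbi(emissions, transitions, labels, allowed=None):
    """Highest-scoring label path.

    emissions[i][t]   : network score for label t at position i
    transitions[a][b] : score for moving from label a to label b
    allowed[i]        : optional set of labels permitted at position i
    """
    neg_inf = float("-inf")

    def ok(i, t):
        return allowed is None or allowed[i] is None or t in allowed[i]

    best = [{t: (emissions[0][t] if ok(0, t) else neg_inf) for t in labels}]
    back = []
    for i in range(1, len(emissions)):
        row, ptr = {}, {}
        for t in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][t])
            ptr[t] = prev
            if ok(i, t):
                row[t] = best[-1][prev] + transitions[prev][t] + emissions[i][t]
            else:
                row[t] = neg_inf  # label forbidden by the constraint
        best.append(row)
        back.append(ptr)
    path = [max(labels, key=lambda t: best[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


labels = ["B", "E", "S"]
flat = {a: {b: 0.0 for b in labels} for a in labels}
emissions = [{"B": 0.1, "E": 0.0, "S": 1.0}, {"B": 0.0, "E": 0.2, "S": 1.0}]
free_path = viterbi(emissions, flat, labels)
# A dictionary word covering both characters forces the labels B then E.
forced_path = viterbi(emissions, flat, labels, allowed=[{"B"}, {"E"}])
```

In the toy example the network alone prefers two single-character words, while the constraint from a recognized two-character word overrides that preference.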
Training seeks to maximize, over the training set, the probability of the correct label sequence of each sample:
where (s, t) denotes a sample in the training set. Training uses gradient descent, and all network parameters are updated with the following formula:
where λ denotes the learning rate (step size).
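The two formulas referred to above (the training objective and the update rule) are not reproduced in this text. Forms consistent with the surrounding description would be as follows. Both are reconstructions: the symbol for the training set and the definition of p(t | s, θ) as the sequence score normalized over all label sequences are assumptions, not statements from the patent.

```latex
% Objective: maximize the log-probability of the correct label sequences,
% where \mathcal{D} denotes the training set (assumed notation).
\theta^{*} = \arg\max_{\theta} \sum_{(s,\,t) \in \mathcal{D}} \log p(t \mid s, \theta)

% Update for one sample (s, t) with learning rate \lambda.
\theta \leftarrow \theta + \lambda \, \frac{\partial \log p(t \mid s, \theta)}{\partial \theta}
```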
The Chinese sequence labeling network and learning algorithm based on deep learning are characterized as follows:
(1) The input Chinese sentence undergoes the necessary preprocessing, including: unifying the character encoding, converting Traditional Chinese to Simplified Chinese, converting full-width characters to half-width, converting special characters, replacing non-standard expressions, and recognizing e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers and foreign words and converting them uniformly into special symbols.
(2) During Viterbi decoding, the results of user-defined vocabulary recognition, special-type vocabulary recognition and Chinese named entity recognition are used as constraints.
(3) The network configuration uses character vectors of dimension 100, a window size of 3 or 5, and 300 hidden-layer neurons (the specific parameters depend on the size of the corpus sample set).
Fig. 5 is provided in an embodiment of the present invention based on the convolutional neural networks structure chart with dynamic k-max ponds, such as Fig. 5
It is shown, using Chinese sentence as input, the semantic expressiveness of full sentence, according to belonging to sentence is predicted in the expression and task are produced by network
Related classification.
First, the network converts each character of the input sentence into its vector representation by looking it up in a character vector table. Character representations can be generated randomly or pre-trained with an unsupervised method. After conversion, the sentence forms a feature matrix.
Second, along each dimension of the feature matrix, a convolution with the configured window size converts the windowed input features into new features. The window slides from left to right over the feature matrix and produces a higher-level feature representation with the same number of columns as the feature matrix. Different dimensions use different convolution kernels, which together produce one feature map of the input feature matrix; using several different sets of kernels at the same time produces multiple feature maps. On each feature map, k-max pooling selects the k most salient features, that is, the k largest feature values along each dimension, while keeping those values in the order in which they appear in the input feature map. The hardTanh nonlinear function is then applied to the result matrix after k-max pooling. This second step can be stacked in multiple layers, with each new layer operating on the result of the previous layer. The k value of the k-max pooling in the last layer is fixed (a hyperparameter of the model); for each earlier layer, k is the larger of the last layer's k value and the value of ((H − h)/H) × L rounded up.
Third, all feature values obtained from the last layer are concatenated to produce the semantic representation of the whole sentence. On top of this semantic representation, a linear layer and a Softmax layer predict the type to which the sentence belongs.
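The pooling rule described above can be sketched in Python as follows. The function names are hypothetical, and the meaning of the symbols is an assumption based on the usual dynamic k-max pooling convention: H is taken to be the total number of convolution layers, h the index of the current layer, and L the sentence length.

```python
import math
from typing import List, Sequence


def k_max_pooling(row: Sequence[float], k: int) -> List[float]:
    """Keep the k largest values of one feature-map row in their original order."""
    k = min(k, len(row))
    # Indices of the k largest values; ties go to the earlier position.
    top = sorted(range(len(row)), key=lambda i: (-row[i], i))[:k]
    return [row[i] for i in sorted(top)]


def dynamic_k(k_top: int, layer: int, num_layers: int, sentence_len: int) -> int:
    """k for a non-final layer: max(k_top, ceil((H - h) / H * L)).

    num_layers, layer, and sentence_len stand for H, h, and L (assumed meanings).
    """
    return max(k_top, math.ceil((num_layers - layer) / num_layers * sentence_len))


def hard_tanh(x: float) -> float:
    """Clamp x to the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))


if __name__ == "__main__":
    row = [0.1, 0.9, -0.3, 0.5, 0.7]
    pooled = k_max_pooling(row, 3)
    print([hard_tanh(v) for v in pooled])
    # First of two layers on an 18-character sentence, with k fixed at 5 on top.
    print(dynamic_k(k_top=5, layer=1, num_layers=2, sentence_len=18))
```

Because the selected values keep their relative order, the pooled row still reflects where in the sentence the strongest features occurred, while longer sentences are allowed to retain more features in the lower layers.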
Because a Softmax layer is used, the network output can be regarded as a probability distribution over the different categories. Training uses gradient descent; the goal of network training is to increase the probability of correct predictions on the training set while reducing the probability of incorrect predictions.
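The relation between the Softmax output and the training goal can be illustrated with a short Python sketch. The negative log-likelihood loss used here is an assumption (a common choice for this kind of objective), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw category scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(scores: Sequence[float], gold: int) -> float:
    """Loss that shrinks as the probability of the correct category grows."""
    return -math.log(softmax(scores)[gold])
```

Lowering this loss by gradient descent raises the probability assigned to the correct category and, since the probabilities sum to one, lowers the probability assigned to the others.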
The Chinese sentence classification model based on the convolutional neural network with dynamic k-max pooling is characterized as follows:
(1) The input Chinese sentence undergoes the necessary preprocessing, including: encoding unification, traditional-to-simplified conversion, full-width to half-width conversion, special character conversion, and replacement of non-standard terms; recognized e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers, and foreign words are marked and uniformly converted into special characters.
(2) The input is at the character level (including Chinese characters, punctuation, and any other characters that may appear in a sentence), which is well suited to Chinese and avoids propagating word segmentation errors into the sentence classification task.
(3) One-dimensional convolution is used, and the number of columns of the feature map output by a convolution layer equals the number of columns of the input feature matrix, which increases the processing speed of the network.
(4) The network uses two convolution layers: the first layer has a window size of 5 and 2 feature maps, and the second layer has a window size of 3 and 3 feature maps. The k value of the k-max pooling in the last layer is 5.
Fig. 6 is a schematic diagram of semantic role labeling with a bidirectional LSTM provided in an embodiment of the present invention. As shown in Fig. 6, different semantic role labeling models are used for different sentence classification results. For semantic role labeling, the segmented words, parts of speech, and/or named entity types are collated and used as input, and the sentence is labeled word by word using the set of semantic labels associated with its sentence category.
The input to the network at each time step (corresponding to each word of the input sentence) is the concatenation of the vectors for the current word, its part of speech, and/or its named entity type (the category assigned by Chinese named entity recognition, for example "Person", "Location", and "Organization" for person names, place names, and organization names respectively). Two LSTMs process the input sentence, one from left to right (forward) and one from right to left (backward). For each word, each LSTM outputs a vector representation; the outputs of the forward and backward LSTMs are concatenated to form the vector representation of the word, which fuses information about the word itself and its left and right context. With this representation as input, a linear layer predicts the label to which the word belongs.
On top of the bidirectional LSTM model, the dependence between the predicted word labels can also be exploited, which gives the bidirectional LSTM with transition probabilities. Given a Chinese sentence, the network outputs a matrix in which each element fθ(t | i) is the estimated likelihood that the i-th word of the sentence belongs to label t, where θ denotes the parameters of the network. In the semantic labeling task there is also some dependence between adjacent labels, so a matrix A_ij is introduced to represent the likelihood of transitioning from label i to label j (A is also included in the parameter set θ). Given a sentence s[1:n] of n words, any label sequence t[1:n] of the same length can be assigned a score that combines the per-word estimates fθ with the transition terms A_ij between consecutive labels.
With the network parameters fixed, the Viterbi decoding algorithm finds the highest-scoring label sequence, which is taken as the labeling result. Training seeks, over the training set, to maximize the probability of the correct semantic label sequence of each sample. If the current network parameters produce an incorrect prediction, gradient descent is used to compute the gradient of the objective function with respect to each parameter, and the parameters are updated accordingly.
The bidirectional LSTM Chinese semantic role labeling model is characterized as follows:
(1) At each time step of the LSTM network (corresponding to each word of the input sentence), the input is the concatenation of the vectors for the word, its part of speech, and/or its named entity type.
(2) The input Chinese sentence undergoes the necessary preprocessing, including: encoding unification, traditional-to-simplified conversion, full-width to half-width conversion, special character conversion, and replacement of non-standard terms; recognized e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers, and foreign words are marked and uniformly converted into special characters.
(3) A bidirectional LSTM produces the feature representation of each Chinese word.
(4) The model uses the following key parameters: the word feature vector dimension is 30, the part-of-speech feature vector dimension is 10, the type feature vector dimension is 10, each LSTM has 50 blocks, and each block contains 1 cell.
(5) The bidirectional LSTM with transition probabilities additionally introduces transition probabilities between semantic labels and then uses Viterbi decoding to carry out semantic role labeling of the Chinese sentence.
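Items (1) and (4) above can be illustrated with a small Python sketch of how the per-word input vector might be assembled. The `EmbeddingTable` class, the random initialization, and the example tokens are hypothetical; only the dimensions 30, 10, and 10 come from the text, and the LSTM itself is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

WORD_DIM, POS_DIM, TYPE_DIM = 30, 10, 10  # dimensions given in item (4)


class EmbeddingTable:
    """Hypothetical lookup table that creates a random vector per new key."""

    def __init__(self, dim: int, seed: int) -> None:
        self.dim = dim
        self._rng = random.Random(seed)
        self._table: Dict[str, List[float]] = {}

    def lookup(self, key: str) -> List[float]:
        if key not in self._table:
            self._table[key] = [self._rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self._table[key]


def build_inputs(
    tokens: Sequence[Tuple[str, str, str]],
    words: EmbeddingTable,
    pos_tags: EmbeddingTable,
    types: EmbeddingTable,
) -> List[List[float]]:
    """Concatenate word, part-of-speech, and type vectors for each time step."""
    return [
        words.lookup(word) + pos_tags.lookup(pos) + types.lookup(entity_type)
        for word, pos, entity_type in tokens
    ]
```

Each time step then receives a 50-dimensional vector (30 + 10 + 10), which the forward and backward LSTMs would consume in opposite directions.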
The embodiments of the present invention are illustrated below with a specific example.
Suppose a mobile phone receives the following SMS: "Your account ending in 5714 completed a cash deposit transaction at 11:15 on July 16; the amount was 1300.00 yuan and the balance is 3456.03 yuan. [Agricultural Bank of China]".
First, the original text is normalized. For example, some messages write "[" as "【", which requires normalization: full-width and half-width forms and the different forms of the various symbols are unified to simplify subsequent processing.
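The normalization step can be sketched in Python as follows. This is one possible approach, not the method of the patent: it relies on Unicode NFKC normalization to fold full-width letters, digits, and punctuation to half-width, plus a small hypothetical mapping for brackets that NFKC leaves unchanged. NFKC also rewrites some other compatibility characters, and traditional-to-simplified conversion is not covered here.

```python
import unicodedata

# Hypothetical symbol mapping; NFKC does not change these brackets.
SYMBOL_MAP = str.maketrans({"【": "[", "】": "]"})


def normalize(text: str) -> str:
    """Fold full-width forms to half-width and unify bracket variants."""
    folded = unicodedata.normalize("NFKC", text)
    return folded.translate(SYMBOL_MAP)


if __name__ == "__main__":
    print(normalize("金额为１３００．００元，【中国农业银行】"))
```

Keeping the symbol mapping in a separate table makes it easy to extend with further variants found in real messages.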
Next, special-type words are recognized, mainly by searching the text string with regular expressions. This identifies:
3-6: DIGIT 5714
11-16: DATE July 16
17-22: TIME 11:15
35-42: CURRENCY 1300.00 yuan
46-53: CURRENCY 3456.03 yuan
The positions of the punctuation marks ",", "。", "[" and "]" in the text are identified at the same time.
Using the named entity recognition unit or a custom dictionary (words that the named entity recognition unit generally cannot recognize can be added to the custom dictionary; for example, bank keywords were added to the custom dictionary in advance), the following can also be recognized:
56-61: BANK Agricultural Bank of China
Note: the two numbers in the first column are the start and end positions of the special word in the original Chinese text (counting from 0 at the first character).
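The recognition step above can be sketched with Python regular expressions. The patterns, the dictionary entry, and the function name are hypothetical, since the text does not list the actual expressions; the Chinese string is a reconstruction of the message assumed from the character offsets given in the example.

```python
import re
from typing import List, Tuple

# Reconstruction of the Chinese message, assumed from the offsets in the example.
TEXT = "您尾号5714的账户于07月16日11时15分完成一笔现存交易,金额为1300.00元,余额3456.03元。[中国农业银行]"

# Hypothetical patterns, tried in order; earlier patterns win on overlap.
PATTERNS = [
    ("DATE", re.compile(r"\d{1,2}月\d{1,2}日")),
    ("TIME", re.compile(r"\d{1,2}时\d{1,2}分")),
    ("CURRENCY", re.compile(r"\d+(?:\.\d+)?元")),
    ("DIGIT", re.compile(r"\d+")),
]
# Hypothetical custom dictionary entry for bank names.
CUSTOM_DICT = {"中国农业银行": "BANK"}


def recognize(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, label, word) tuples with inclusive, 0-based offsets."""
    taken = [False] * len(text)
    spans: List[Tuple[int, int, str, str]] = []

    def claim(start: int, end: int, label: str) -> None:
        # Skip a candidate that overlaps an already recognized span.
        if any(taken[start:end]):
            return
        taken[start:end] = [True] * (end - start)
        spans.append((start, end - 1, label, text[start:end]))

    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            claim(match.start(), match.end(), label)
    for word, label in CUSTOM_DICT.items():
        start = text.find(word)
        while start != -1:
            claim(start, start + len(word), label)
            start = text.find(word, start + 1)
    return sorted(spans)


if __name__ == "__main__":
    for start, end, label, word in recognize(TEXT):
        print(f"{start}-{end}: {label} {word}")
```

Trying the more specific patterns first keeps the digits inside dates, times, and amounts from also being reported as standalone numbers.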
After this preprocessing, the words identified above form the constraints for the next step (that is, these words are not segmented or part-of-speech tagged again). The constraints can be represented as a string that gives the segmentation and part of speech of each character, for example:
"O O O B_D I_D I_D E_D O O O O B_NT I_NT I_NT I_NT I_NT E_NT B_NT I_NT I_NT I_NT I_NT E_NT O O O O O O O O S_PU O O O B_D I_D I_D I_D I_D I_D I_D E_D S_PU O O B_D I_D I_D I_D I_D I_D I_D E_D S_PU S_PU B_NR I_NR I_NR I_NR I_NR E_NR S_PU"
Here "O" marks the remaining characters, whose segmentation and part of speech are determined in the next step. "B_D" marks the beginning of a number word, "I_D" the middle of a number word, and "E_D" the end of a number word. The part before the underscore "_" gives the position of the character within its word and the part after it gives the part of speech; this is joint word segmentation and part-of-speech tagging. "B", "I", and "E" denote a character at the beginning, in the middle, and at the end of a word respectively. "S" denotes a single-character word; punctuation marks, for example, are written "S_PU". "NT" denotes a temporal noun and "NR" a proper noun; other parts of speech, such as verbs and adjectives, can also be specified in advance.
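The constraint string can be derived mechanically from the recognized spans, as the following Python sketch shows. The function name, the span format, and the punctuation set are hypothetical; the spans and part-of-speech codes are those of the example above.

```python
from typing import List, Sequence, Tuple


def build_constraints(
    length: int,
    spans: Sequence[Tuple[int, int, str]],
    punctuation_positions: Sequence[int],
) -> List[str]:
    """Build one tag per character: O, S_x, or B_x / I_x / E_x.

    spans holds (start, end, part_of_speech) with inclusive offsets.
    """
    tags = ["O"] * length
    for position in punctuation_positions:
        tags[position] = "S_PU"
    for start, end, pos in spans:
        if start == end:
            tags[start] = "S_" + pos
            continue
        tags[start] = "B_" + pos
        for i in range(start + 1, end):
            tags[i] = "I_" + pos
        tags[end] = "E_" + pos
    return tags


if __name__ == "__main__":
    spans = [(3, 6, "D"), (11, 16, "NT"), (17, 22, "NT"),
             (35, 42, "D"), (46, 53, "D"), (56, 61, "NR")]
    print(" ".join(build_constraints(63, spans, [31, 43, 54, 55, 62])))
```

Characters left as "O" are the only ones the segmentation and tagging model still has to decide, which is how the earlier recognition results narrow the decoding search.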
After word segmentation and part-of-speech tagging, each word in the text has been separated out (the original word appears before "/" and its part of speech after it). With each Chinese word shown as an English gloss, the result is:
"you/PN tail-number/NN 5714/D of/U account/NN at/P July-16/NT 11:15/NT complete/V one/D (measure-word)/M cash-deposit/V transaction/V ,/PU amount/NN is/V 1300.00-yuan/D ,/PU balance/NN 3456.03-yuan/D ./PU [/PU Agricultural-Bank-of-China/NR ]/PU".
In this example, the word "tail-number" is a common noun, written "NN". The word "5714" is a number, written "D". The word "transaction" is a verb, written "V". The word "[" is a punctuation mark, written "PU". In the same way, the normalized text is split into word units (single characters and punctuation marks also count as separate words), and the part of speech of each word in the text is marked.
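Turning per-character tags back into word/part-of-speech pairs can be sketched in Python as follows. The function name is hypothetical, and the short Chinese fragment is an assumed reconstruction of part of the example message; the sketch expects every character to carry a B, I, E, or S tag, as it would after the model has resolved the "O" positions.

```python
from typing import List, Sequence, Tuple


def decode_words(chars: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group characters into (word, part_of_speech) pairs from B/I/E/S tags."""
    words: List[Tuple[str, str]] = []
    buffer = ""
    for char, tag in zip(chars, tags):
        position, pos = tag.split("_", 1)
        buffer += char
        # A word is complete at its end character or when it is a single character.
        if position in ("E", "S"):
            words.append((buffer, pos))
            buffer = ""
    return words


if __name__ == "__main__":
    chars = "尾号5714的"
    tags = ["B_NN", "E_NN", "B_D", "I_D", "I_D", "E_D", "S_U"]
    print(" ".join(f"{word}/{pos}" for word, pos in decode_words(chars, tags)))
```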
For semantic analysis, each special-type word can be replaced by a label symbol so that it has a uniform representation, which gives:
"you/PN tail-number/NN DIGIT/D of/U account/NN at/P DATE/NT TIME/NT complete/V one/D (measure-word)/M cash-deposit/V transaction/V ,/PU amount/NN is/V CURRENCY/D ,/PU balance/NN CURRENCY/D ./PU [/PU BANK/NR ]/PU"
Semantic analysis based on the words, parts of speech, and/or named entity types makes it possible to extract the words the user is interested in. For a bank notification message, for example, key information such as the date, time, account number, amount credited or debited, balance, and bank name can be extracted. This key information is expressed as semantic role labels, each placed after the corresponding word and separated from it by "/". A word followed by "/O" is content that does not need to be extracted.
The semantic analysis result for this example is: "you/O tail-number/O 5714/ACCOUNT of/O account/O at/O July-16/DATE 11:15/TIME complete/O one/O (measure-word)/O cash-deposit/O transaction/O ,/O amount/O is/O 1300.00-yuan/INCOME ,/O balance/O 3456.03-yuan/BALANCE ./O [/O Agricultural-Bank-of-China/BANK ]/O".
Here "ACCOUNT", "DATE", "TIME", "INCOME", "BALANCE", and "BANK" are the semantic role labels, each attached to the corresponding word.
Finally, the extracted key information is used for prompts, interaction, and so on in the interface or application. For the SMS received above, for example, the user can be shown:
Event: Credit
Account: 5714
Date: July 16
Time: 11:15
Credited: 1300.00 yuan
Balance: 3456.03 yuan
Bank: Agricultural Bank of China
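The step from semantic role labels to a structured prompt can be sketched in Python as follows. The function name and the dictionary layout are hypothetical, and the Chinese words are an assumed reconstruction of the example message; only the role labels come from the example.

```python
from typing import Dict, Sequence, Tuple


def extract_event(labeled: Sequence[Tuple[str, str]]) -> Dict[str, str]:
    """Collect the first word found for each semantic role other than O."""
    event: Dict[str, str] = {}
    for word, role in labeled:
        if role != "O":
            event.setdefault(role, word)
    return event


if __name__ == "__main__":
    labeled = [
        ("您", "O"), ("尾号", "O"), ("5714", "ACCOUNT"), ("的", "O"), ("账户", "O"),
        ("于", "O"), ("07月16日", "DATE"), ("11时15分", "TIME"), ("完成", "O"),
        ("金额", "O"), ("为", "O"), ("1300.00元", "INCOME"), ("余额", "O"),
        ("3456.03元", "BALANCE"), ("[", "O"), ("中国农业银行", "BANK"), ("]", "O"),
    ]
    for role, word in extract_event(labeled).items():
        print(f"{role}: {word}")
```

An application could map each role to a display field, such as the account, date, time, amount, balance, and bank shown in the prompt above.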
The scheme provided by the embodiments of the present invention comprises a Chinese sequence labeling network and learning algorithm based on deep learning, a Chinese sentence classification model based on a convolutional neural network with dynamic k-max pooling, a bidirectional LSTM Chinese semantic role labeling model with transition probabilities, and the way these key technologies are combined and integrated. A system developed with this scheme can be deployed on mobile computing platforms with relatively limited computing resources, such as mobile phones, and can complete complex Chinese semantic analysis tasks without additional computing resources or equipment, which can significantly improve the response speed and user satisfaction of related applications.
Although the present invention has been described in detail above, the invention is not limited to this description, and those skilled in the art can make various modifications according to the principles of the invention. All modifications made according to the principles of the invention should therefore be understood to fall within the protection scope of the invention.
Claims (10)
1. A method of Chinese semantic analysis based on deep learning, comprising:
a mobile terminal obtaining normalized Chinese text by normalizing acquired Chinese text;
the mobile terminal performing special-type word recognition and/or custom word recognition and/or Chinese named entity recognition on the normalized Chinese text, and using the recognition result as a constraint condition;
the mobile terminal performing Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint condition and a Chinese word segmentation and part-of-speech tagging model obtained using deep learning, to obtain the words and parts of speech of the normalized Chinese text;
the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the words, parts of speech, and/or named entity types of the normalized Chinese text.
2. The method according to claim 1, wherein the mobile terminal performing special-type word recognition and/or custom word recognition and/or Chinese named entity recognition on the normalized Chinese text, and using the recognition result as a constraint condition, comprises:
the mobile terminal performing special-type word recognition on the normalized Chinese text using a special-type word template to obtain a special-type word recognition result of the normalized Chinese text, and using the obtained special-type word recognition result as a first constraint condition.
3. The method according to claim 1, wherein the mobile terminal performing special-type word recognition and/or custom word recognition and/or Chinese named entity recognition on the normalized Chinese text, and using the recognition result as a constraint condition, comprises:
the mobile terminal performing custom word recognition on the normalized Chinese text using a custom dictionary to obtain a custom word recognition result of the normalized Chinese text, and using the obtained custom word recognition result as a second constraint condition.
4. The method according to claim 1, wherein the mobile terminal performing special-type word recognition and/or custom word recognition and/or Chinese named entity recognition on the normalized Chinese text, and using the recognition result as a constraint condition, comprises:
the mobile terminal performing Chinese named entity recognition on the normalized Chinese text using a Chinese named entity recognition model obtained using deep learning, to obtain a Chinese named entity recognition result of the normalized Chinese text, and using the obtained Chinese named entity recognition result as a third constraint condition.
5. The method according to any one of claims 2-4, wherein the constraint condition comprises at least one of the first constraint condition, the second constraint condition, and the third constraint condition, or a combination thereof.
6. The method according to any one of claims 1-5, wherein the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the words, parts of speech, and/or named entity types of the normalized Chinese text comprises:
the mobile terminal performing sentence classification on the normalized Chinese text according to the characters of the normalized Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, to obtain a sentence classification result of the normalized Chinese text.
7. The method according to claim 6, wherein the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the words, parts of speech, and/or named entity types of the normalized Chinese text comprises:
the mobile terminal determining a bidirectional long short-term memory (LSTM) Chinese semantic role labeling model according to the sentence classification result, and then performing semantic role labeling on each word and symbol of the normalized Chinese text according to the words, parts of speech, and/or named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, to obtain a semantic role labeling result of the normalized Chinese text.
8. The method according to claim 7, wherein the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the words, parts of speech, and/or named entity types of the normalized Chinese text comprises:
the mobile terminal performing structuring processing on the normalized Chinese text according to the semantic role labeling result of the normalized Chinese text and an event model, and extracting key information of the normalized Chinese text.
9. The method according to claim 8, wherein the key information of the normalized Chinese text comprises an event name, key attributes, and attribute values.
10. A device for Chinese semantic analysis based on deep learning, comprising:
a normalization module, configured to obtain normalized Chinese text by normalizing acquired Chinese text;
a recognition module, configured to perform special-type word recognition and/or custom word recognition and/or Chinese named entity recognition on the normalized Chinese text, and to use the recognition result as a constraint condition;
an analysis module, configured to perform Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint condition and a Chinese word segmentation and part-of-speech tagging model obtained using deep learning, to obtain the words and parts of speech of the normalized Chinese text, and to perform Chinese semantic analysis on the normalized Chinese text using the words, parts of speech, and/or named entity types of the normalized Chinese text.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610658579.XA CN107729309B (en) | 2016-08-11 | 2016-08-11 | Deep learning-based Chinese semantic analysis method and device |
PCT/CN2016/105977 WO2018028077A1 (en) | 2016-08-11 | 2016-11-15 | Deep learning based method and device for chinese semantics analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610658579.XA CN107729309B (en) | 2016-08-11 | 2016-08-11 | Deep learning-based Chinese semantic analysis method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107729309A (en) | 2018-02-23 |
CN107729309B (en) | 2022-11-08 |
Family
ID=61161388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610658579.XA Active CN107729309B (en) | 2016-08-11 | 2016-08-11 | Deep learning-based Chinese semantic analysis method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107729309B (en) |
WO (1) | WO2018028077A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510221A (en) * | 2009-02-17 | 2009-08-19 | 北京大学 | Enquiry statement analytical method and system for information retrieval |
CN103077164A (en) * | 2012-12-27 | 2013-05-01 | 新浪网技术(中国)有限公司 | Text analysis method and text analyzer |
WO2014087506A1 (en) * | 2012-12-05 | 2014-06-12 | 三菱電機株式会社 | Word meaning estimation device, word meaning estimation method, and word meaning estimation program |
CN104915386A (en) * | 2015-05-25 | 2015-09-16 | 中国科学院自动化研究所 | Short text clustering method based on deep semantic feature learning |
CN105677802A (en) * | 2015-12-31 | 2016-06-15 | 宁波公众信息产业有限公司 | Internet information analysis system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7047183B2 (en) * | 2001-08-21 | 2006-05-16 | Microsoft Corporation | Method and apparatus for using wildcards in semantic parsing |
US8326809B2 (en) * | 2008-10-27 | 2012-12-04 | Sas Institute Inc. | Systems and methods for defining and processing text segmentation rules |
CN104268200A (en) * | 2013-09-22 | 2015-01-07 | 中科嘉速(北京)并行软件有限公司 | Unsupervised named entity semantic disambiguation method based on deep learning |
CN104965822B (en) * | 2015-07-29 | 2017-08-25 | 中南大学 | A kind of Chinese text sentiment analysis method based on Computerized Information Processing Tech |
CN105243055B (en) * | 2015-09-28 | 2018-07-31 | 北京橙鑫数据科技有限公司 | Based on multilingual segmenting method and device |
Non-Patent Citations (1)
Title |
---|
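Sun Jing et al.: "Unsupervised Chinese part-of-speech tagging based on conditional random fields", Computer Applications and Software (《计算机应用与软件》) * |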
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110232182B (en) * | 2018-04-10 | 2023-05-16 | 蔚来控股有限公司 | Semantic recognition method and device and voice dialogue system |
CN110232182A (en) * | 2018-04-10 | 2019-09-13 | 蔚来汽车有限公司 | Method for recognizing semantics, device and speech dialogue system |
CN110413983B (en) * | 2018-04-27 | 2022-09-27 | 北京海马轻帆娱乐科技有限公司 | Method and device for identifying name |
CN110413983A (en) * | 2018-04-27 | 2019-11-05 | 北京海马轻帆娱乐科技有限公司 | Method and device for identifying name |
CN108806671B (en) * | 2018-05-29 | 2019-06-28 | 杭州认识科技有限公司 | Semantic analysis, device and electronic equipment |
CN108806671A (en) * | 2018-05-29 | 2018-11-13 | 杭州认识科技有限公司 | Semantic analysis, device and electronic equipment |
CN108764194A (en) * | 2018-06-04 | 2018-11-06 | 科大讯飞股份有限公司 | Text verification method, device, equipment and readable storage medium |
CN109101584B (en) * | 2018-07-23 | 2020-11-03 | 湖南大学 | Sentence classification improvement method combining deep learning and mathematical analysis |
CN109101584A (en) * | 2018-07-23 | 2018-12-28 | 湖南大学 | Sentence classification improvement method combining deep learning and mathematical analysis |
CN109344406A (en) * | 2018-09-30 | 2019-02-15 | 阿里巴巴集团控股有限公司 | Part-of-speech tagging method, apparatus and electronic equipment |
CN109344406B (en) * | 2018-09-30 | 2023-06-20 | 创新先进技术有限公司 | Part-of-speech tagging method and device and electronic equipment |
CN109543187A (en) * | 2018-11-23 | 2019-03-29 | 中山大学 | Method, device and storage medium for generating electronic health record features |
CN109657207A (en) * | 2018-11-29 | 2019-04-19 | 爱保科技(横琴)有限公司 | Formatting processing method and processing device for clauses |
CN109657207B (en) * | 2018-11-29 | 2023-11-03 | 爱保科技有限公司 | Formatting processing method and processing device for clauses |
CN109615006A (en) * | 2018-12-10 | 2019-04-12 | 北京市商汤科技开发有限公司 | Character recognition method and device, electronic equipment and storage medium |
CN109753564A (en) * | 2018-12-13 | 2019-05-14 | 四川大学 | Construction method of Chinese RCT intelligent classifier based on machine learning |
CN111078947A (en) * | 2019-11-19 | 2020-04-28 | 太极计算机股份有限公司 | XML-based domain element extraction configuration language system |
CN111078947B (en) * | 2019-11-19 | 2023-06-02 | 太极计算机股份有限公司 | XML-based domain element extraction configuration language system |
CN111310468A (en) * | 2020-01-15 | 2020-06-19 | 同济大学 | Method for realizing Chinese named entity recognition by using uncertain word segmentation information |
CN111310468B (en) * | 2020-01-15 | 2023-05-05 | 同济大学 | Method for realizing Chinese named entity recognition by utilizing uncertain word segmentation information |
CN111460831A (en) * | 2020-03-27 | 2020-07-28 | 科大讯飞股份有限公司 | Event determination method, related device and readable storage medium |
CN111460831B (en) * | 2020-03-27 | 2024-04-19 | 科大讯飞股份有限公司 | Event determination method, related device and readable storage medium |
CN111931481A (en) * | 2020-07-03 | 2020-11-13 | 北京新联财通咨询有限公司 | Text emotion recognition method and device, storage medium and computer equipment |
CN111966579A (en) * | 2020-07-24 | 2020-11-20 | 复旦大学 | Self-adaptive text input generation method based on natural language processing and machine learning |
CN112965909A (en) * | 2021-03-19 | 2021-06-15 | 湖南大学 | Test data, test case generation method and system, and storage medium |
CN112965909B (en) * | 2021-03-19 | 2024-04-09 | 湖南大学 | Test data, test case generation method and system and storage medium |
CN113177108A (en) * | 2021-05-27 | 2021-07-27 | 中国平安人寿保险股份有限公司 | Semantic role labeling method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107729309B (en) | 2022-11-08 |
WO2018028077A1 (en) | 2018-02-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107729309A (en) | | A kind of method and device of the Chinese semantic analysis based on deep learning |
CN111966917B (en) | | Event detection and summarization method based on pre-training language model |
CN111753081B (en) | | System and method for text classification based on deep SKIP-GRAM network |
CN110245229B (en) | | Deep learning theme emotion classification method based on data enhancement |
CN107239444B (en) | | Word vector training method and system fusing part-of-speech and position information |
CN108446271B (en) | | Text emotion analysis method of convolutional neural network based on Chinese character component characteristics |
CN110427623A (en) | | Semi-structured document knowledge extraction method, device, electronic equipment and storage medium |
CN109960728B (en) | | Method and system for identifying named entities of open domain conference information |
CN106599032A (en) | | Text event extraction method in combination of sparse coding and structural perceptron |
CN110489523B (en) | | Fine-grained emotion analysis method based on online shopping evaluation |
CN110263325A (en) | | Chinese automatic word-cut |
CN110532563A (en) | | Method and device for detecting key paragraphs in text |
CN108108468A (en) | | Short text sentiment analysis method and apparatus based on concept and text emotion |
CN112434535A (en) | | Multi-model-based factor extraction method, device, equipment and storage medium |
CN113987187A (en) | | Multi-label embedding-based public opinion text classification method, system, terminal and medium |
CN108829823A (en) | | File classification method |
CN112800184B (en) | | Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction |
Prabha et al. | | A deep learning approach for part-of-speech tagging in Nepali language |
CN109840328A (en) | | Deep learning method for analyzing emotion tendency of commodity comment text |
CN115080750B (en) | | Weak supervision text classification method, system and device based on fusion prompt sequence |
CN112905736A (en) | | Unsupervised text emotion analysis method based on quantum theory |
CN110222338A (en) | | Organization name entity recognition method |
CN114756681A (en) | | Evaluation text fine-grained suggestion mining method based on multi-attention fusion |
CN113051887A (en) | | Method, system and device for extracting announcement information elements |
CN115906816A (en) | | Text emotion analysis method of two-channel Attention model based on BERT |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||