CN107729309A - Method and device for Chinese semantic analysis based on deep learning - Google Patents

Method and device for Chinese semantic analysis based on deep learning

Info

Publication number
CN107729309A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610658579.XA
Other languages
Chinese (zh)
Other versions
CN107729309B (en)
Inventor
郑骁庆
陈军
吕永
尚国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
ZTE Corp
Original Assignee
Fudan University
ZTE Corp
Application filed by Fudan University and ZTE Corp
Priority to CN201610658579.XA (patent CN107729309B)
Priority to PCT/CN2016/105977 (patent WO2018028077A1)
Publication of CN107729309A
Application granted
Publication of CN107729309B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a method and device for Chinese semantic analysis based on deep learning, relating to the technical field of natural language processing. The method includes: a mobile terminal normalizes an acquired Chinese text to obtain a normalized Chinese text; the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and uses the recognition results as constraints; the mobile terminal, according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text to obtain its segmented words and their parts of speech; and the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types.

Description

Method and device for Chinese semantic analysis based on deep learning
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a method and device for Chinese semantic analysis based on deep learning.
Background
Chinese natural language understanding (CNLU) has made rapid progress, and Chinese word segmentation and part-of-speech analysis in particular have produced a large body of research results. Although automated analysis of Chinese still lags behind that of English and Japanese, the accumulated research makes it possible to develop systems that perform high-level semantic analysis and understanding and to apply them in practice. Semantic analysis technology greatly improves the intelligence and responsiveness of a system that uses it. It is both the key and the difficult point of text information analysis and processing, and it is the basis of information extraction, user intent analysis, information fusion, question answering and intelligent inference.
On the other hand, deep learning is a recent breakthrough in artificial intelligence research. It ended a period of roughly ten years in which artificial intelligence had made no breakthrough, and it quickly had an impact on industry. Unlike narrow artificial intelligence systems that can only complete a particular task (functional simulation aimed at that task), deep learning is a general artificial intelligence technique that can handle a wide range of situations and problems. It has been applied very successfully in fields such as image recognition and speech recognition, and it has also produced results in natural language processing (mainly for English).
Summary of the invention
The technical problem solved by the scheme provided in the embodiments of the present invention is that automated semantic analysis of Chinese is inaccurate.
A method for Chinese semantic analysis based on deep learning, provided according to an embodiment of the present invention, includes:
a mobile terminal normalizes an acquired Chinese text to obtain a normalized Chinese text;
the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and uses the recognition results as constraints;
the mobile terminal, according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text to obtain its segmented words and their parts of speech;
the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types.
Preferably, the step in which the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and uses the recognition results as constraints includes:
the mobile terminal performs special-type vocabulary recognition on the normalized Chinese text using special-type vocabulary templates, obtains the special-type vocabulary recognition result of the normalized Chinese text, and uses that result as a first constraint.
Preferably, the step in which the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and uses the recognition results as constraints includes:
the mobile terminal performs user-defined vocabulary recognition on the normalized Chinese text using a custom dictionary, obtains the user-defined vocabulary recognition result of the normalized Chinese text, and uses that result as a second constraint.
Preferably, the step in which the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and uses the recognition results as constraints includes:
the mobile terminal performs Chinese named entity recognition on the normalized Chinese text using a Chinese named entity recognition model obtained by deep learning, obtains the Chinese named entity recognition result of the normalized Chinese text, and uses that result as a third constraint.
Preferably, the constraints include at least one of the first constraint, the second constraint and the third constraint, or a combination thereof.
Preferably, the step in which the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types includes:
the mobile terminal classifies the sentences of the normalized Chinese text according to the characters of the normalized Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, and obtains the sentence classification result of the normalized Chinese text.
Preferably, the step in which the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types includes:
the mobile terminal selects a bidirectional LSTM (Long Short-Term Memory) Chinese semantic role labeling model according to the sentence classification result, then performs semantic role labeling on each segmented word and symbol of the normalized Chinese text according to the segmented words, the parts of speech and/or the named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, and obtains the semantic role labeling result of the normalized Chinese text.
Preferably, the step in which the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types includes:
the mobile terminal structures the normalized Chinese text according to its semantic role labeling result and an event model, and extracts the key information of the normalized Chinese text.
Preferably, the key information of the normalized Chinese text includes an event name, key attributes and attribute values.
A device for Chinese semantic analysis based on deep learning, provided according to an embodiment of the present invention, includes:
a normalization module, configured to normalize an acquired Chinese text to obtain a normalized Chinese text;
a recognition module, configured to perform special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and to use the recognition results as constraints;
an analysis module, configured to perform Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, to obtain the segmented words and parts of speech of the normalized Chinese text, and to perform Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types.
With the scheme provided in the embodiments of the present invention, an input Chinese sentence undergoes semantic analysis and a structured analysis result is output; the structured result is then used to complete tasks that require high-level semantic analysis, such as event analysis, information extraction and sentiment analysis.
Brief description of the drawings
Fig. 1 is a flow chart of a method for Chinese semantic analysis based on deep learning provided in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a device for Chinese semantic analysis based on deep learning provided in an embodiment of the present invention;
Fig. 3 is a module diagram of the Chinese semantic analysis provided in an embodiment of the present invention;
Fig. 4 is an architecture diagram of the Chinese sequence labeling network provided in an embodiment of the present invention;
Fig. 5 is a structure diagram of the convolutional neural network with dynamic k-max pooling provided in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the bidirectional LSTM semantic role labeling provided in an embodiment of the present invention.
Detailed description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the preferred embodiments described below serve only to illustrate and explain the present invention and are not intended to limit it.
Fig. 1 is a flow chart of a method for Chinese semantic analysis based on deep learning provided in an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S101: a mobile terminal normalizes an acquired Chinese text to obtain a normalized Chinese text;
Step S102: the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and uses the recognition results as constraints;
Step S103: the mobile terminal, according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, performs Chinese word segmentation and part-of-speech analysis on the normalized Chinese text to obtain its segmented words and their parts of speech;
Step S104: the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types.
The step in which the mobile terminal performs special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and uses the recognition results as constraints may include: the mobile terminal performs special-type vocabulary recognition on the normalized Chinese text using special-type vocabulary templates, obtains the special-type vocabulary recognition result of the normalized Chinese text, and uses that result as a first constraint.
The same step may include: the mobile terminal performs user-defined vocabulary recognition on the normalized Chinese text using a custom dictionary, obtains the user-defined vocabulary recognition result of the normalized Chinese text, and uses that result as a second constraint.
The same step may also include: the mobile terminal performs Chinese named entity recognition on the normalized Chinese text using a Chinese named entity recognition model obtained by deep learning, obtains the Chinese named entity recognition result of the normalized Chinese text, and uses that result as a third constraint.
The constraints include at least one of the first constraint, the second constraint and the third constraint, or a combination thereof.
Special-type vocabulary recognition, user-defined vocabulary recognition and Chinese named entity recognition act as a pre-segmentation and pre-tagging step: the special-type vocabulary, user-defined vocabulary and/or Chinese named entities recognized in this step are not segmented or tagged again in the subsequent word segmentation and part-of-speech tagging step, and they therefore constitute a constraint.
The step in which the mobile terminal performs Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types includes: the mobile terminal classifies the sentences of the normalized Chinese text according to the characters of the normalized Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, and obtains the sentence classification result of the normalized Chinese text.
The same step also includes: the mobile terminal selects a bidirectional long short-term memory (LSTM) Chinese semantic role labeling model according to the sentence classification result, then performs semantic role labeling on each segmented word and symbol of the normalized Chinese text according to the segmented words, the parts of speech and/or the named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, and obtains the semantic role labeling result of the normalized Chinese text.
The same step further includes: the mobile terminal structures the normalized Chinese text according to its semantic role labeling result and an event model, and extracts the key information of the normalized Chinese text. Specifically, the key information of the normalized Chinese text includes an event name, key attributes and attribute values.
Fig. 2 is a schematic diagram of a device for Chinese semantic analysis based on deep learning provided in an embodiment of the present invention. As shown in Fig. 2, the device includes: a normalization module 201, configured to normalize an acquired Chinese text to obtain a normalized Chinese text; a recognition module 202, configured to perform special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and to use the recognition results as constraints; and an analysis module 203, configured to perform Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraints and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, to obtain the segmented words and parts of speech of the normalized Chinese text, and to perform Chinese semantic analysis on the normalized Chinese text using the segmented words, the parts of speech and/or the named entity types.
The analysis module 203 includes a sentence classification unit, configured to classify the sentences of the normalized Chinese text according to the characters of the normalized Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, and to obtain the sentence classification result of the normalized Chinese text.
The analysis module 203 also includes a semantic role labeling unit, configured to select a bidirectional long short-term memory (LSTM) Chinese semantic role labeling model according to the sentence classification result, to perform semantic role labeling on elements of the normalized Chinese text such as single characters, segmented words and special-type vocabulary according to the segmented words, the parts of speech and/or the named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, and to obtain the semantic role labeling result of the normalized Chinese text.
The analysis module 203 further includes a structuring unit, configured to structure the normalized Chinese text according to its semantic role labeling result and an event model, and to extract the key information of the normalized Chinese text. Specifically, the key information of the normalized Chinese text includes an event name, key attributes and attribute values. The event name can correspond to the sentence classification result. For example, for SMS text received by the terminal, the sentence classification model distinguishes classes such as bank statement, flight or train, appointment, weather forecast and other, so the class produced by sentence classification can serve as the event name. The key attributes are the semantic role labeling results. For example, in a bank statement message the labels include categories such as statement date, amount spent, repayment date and repayment amount. The attribute values are the concrete values in the original message text that were labeled with those categories, such as the exact date or the exact amount.
Fig. 3 is a module diagram of the Chinese semantic analysis provided in an embodiment of the present invention. As shown in Fig. 3, deep learning technology is used to perform semantic analysis on an input Chinese sentence and to output a structured analysis result, which is then used to complete tasks that require high-level semantic analysis, such as event analysis, information extraction and sentiment analysis. Specifically, this includes:
Text normalization: the input Chinese sentence is normalized, which includes unifying the character encoding, converting traditional characters to simplified characters, converting full-width characters to half-width, converting special characters, and replacing non-standard terms (for example, replacing internet slang with its standard form).
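The normalization steps listed above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the two lookup tables are hypothetical and deliberately tiny, and Unicode NFKC normalization is used here as one convenient way to fold full-width characters to half-width.

```python
import unicodedata

# Hypothetical, deliberately tiny tables; a real system would load full resources.
TRAD_TO_SIMP = str.maketrans({"語": "语", "體": "体", "這": "这"})
SLANG = {"神马": "什么"}  # internet slang -> standard form

def normalize(text: str) -> str:
    # NFKC folds full-width ASCII letters, digits and punctuation to half-width.
    text = unicodedata.normalize("NFKC", text)
    # Traditional -> simplified conversion via a character table.
    text = text.translate(TRAD_TO_SIMP)
    # Replace non-standard terms with their standard form.
    for slang, standard in SLANG.items():
        text = text.replace(slang, standard)
    return text
```

Encoding unification is assumed to have happened before this function is called, for example by decoding the incoming bytes to a Python `str`.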
User-defined vocabulary recognition: user-defined vocabulary is recognized using a custom dictionary, including application-domain vocabulary, idioms, food, places, works, equipment, personal names, place names and organization names.
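The text does not say how the custom dictionary is matched against the sentence. One common choice, shown here purely as an assumption, is forward maximum matching; the function name and the `max_len` parameter are hypothetical.

```python
def match_custom_words(text, dictionary, max_len=6):
    """Return (start, end, word) spans found by forward maximum matching."""
    spans, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                spans.append((i, i + size, candidate))
                i += size
                break
        else:
            i += 1  # no dictionary word starts here
    return spans
```

The returned spans can then be passed on as fixed word boundaries for the later segmentation step.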
Special-type vocabulary recognition: templates are defined for recognizing e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers and foreign words. The items of these types contained in the input sentence are recognized with the templates and replaced with special symbols.
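A template-based recognizer of this kind could be sketched with regular expressions. The patterns, the placeholder format `<LABEL>` and the function name below are all assumptions chosen for illustration; they cover only a few of the listed types.

```python
import re

# Hypothetical templates, ordered so that more specific patterns run first.
TEMPLATES = [
    ("URL", re.compile(r"https?://[^\s，。]+")),
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")),
    ("DATE", re.compile(r"\d{4}年\d{1,2}月\d{1,2}日")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
    ("NUMBER", re.compile(r"\d+(?:\.\d+)?")),
]

def tag_special_vocab(text):
    """Replace recognized items with placeholders and remember the originals."""
    found = []
    for label, pattern in TEMPLATES:
        def replace(match, label=label):
            found.append((label, match.group(0)))
            return f"<{label}>"
        text = pattern.sub(replace, text)
    return text, found
```

Keeping the original strings in `found` allows the attribute values to be restored after semantic role labeling.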
Chinese named entity recognition: a corpus for Chinese named entity recognition is prepared, and the Chinese sequence labeling network model shown in Fig. 4 is trained on it to obtain a Chinese named entity recognition model. This model recognizes the personal names, place names and organization names in the input sentence, that is, it identifies the specific personal names, place names and organization names in the sentence and records the corresponding entity type (which may be denoted, for example, by "Person", "Location" and "Organization" respectively).
Chinese word segmentation and part-of-speech tagging: with the results of special-type vocabulary recognition and/or user-defined vocabulary recognition and/or Chinese named entity recognition as constraints, a corpus for joint Chinese word segmentation and part-of-speech tagging is prepared, and the Chinese sequence labeling network model shown in Fig. 4 is trained on it to obtain a model for joint Chinese word segmentation and part-of-speech tagging. This model performs joint Chinese word segmentation and part-of-speech analysis on the input sentence.
Sentence classification: before semantic role labeling, sentences are classified using the sentence-level semantic representation produced by the convolutional neural network with dynamic k-max pooling shown in Fig. 5, and input sentences that are of no interest to the application are filtered out at the same time. A sentence classification corpus is used that contains balanced numbers of sentences of every type together with negative samples (Chinese sentences of no interest to the application). The Chinese sentence classification model of the convolutional neural network with dynamic k-max pooling is trained on this corpus, and the trained model is used to classify input sentences and to filter out those of no interest.
Semantic role labeling: a bidirectional LSTM semantic labeling network model is selected according to the sentence classification result (that is, different sentence classes use different analysis models). The bidirectional LSTM semantic labeling network shown in Fig. 6 then performs semantic role labeling on the sentence using the segmented words, parts of speech and/or named entity types of the normalized text. In other words, a semantic role labeling corpus is prepared for each sentence class from the segmented words, parts of speech and/or named entity types, a bidirectional LSTM Chinese semantic role labeling model is trained on it, and this model is used to label the semantic roles in the sentence.
Event analysis: according to the semantic role labeling result, and in combination with an event template, the result of semantic analysis is packaged into a structured representation from which the event name, key attributes and attribute values are extracted.
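As a rough sketch of this packaging step, the function below groups labeled tokens under their role labels and uses the sentence class as the event name. The dictionary layout and the role names in the usage note are hypothetical; the text does not specify the event template format.

```python
def build_event(sentence_class, tokens, role_tags):
    """Package labeled tokens as an event name plus key attributes and values."""
    attributes = {}
    for token, role in zip(tokens, role_tags):
        if role != "O":  # "O" marks tokens unrelated to the task
            attributes.setdefault(role, []).append(token)
    return {"event": sentence_class, "attributes": attributes}
```

For a bank statement message, a call such as `build_event("bank_statement", ["应还", "500元", "到期", "8月12日"], ["O", "REPAY_AMOUNT", "O", "REPAY_DATE"])` would yield the repayment amount and repayment date as key attributes with their values.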
The training corpus for semantic role labeling has one token per line, in the order in which the tokens appear in the sentence. Each line has 5 columns, which represent in turn: the token itself (e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers, foreign words and the like are replaced with English tags, and single characters and punctuation marks are also treated as independent tokens), the semantic label ("O" denotes a class unrelated to the task), the part-of-speech label, the named entity label, and the original form of the token in the sentence. Samples are separated by a blank line.
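A reader for this five-column format might look like the sketch below. The column separator is not stated in the text, so a tab is assumed here, and the field names are hypothetical.

```python
def read_srl_corpus(text):
    """Parse a corpus with one token per line and blank lines between samples."""
    samples, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line closes the current sample
            if current:
                samples.append(current)
                current = []
            continue
        token, role, pos, entity, surface = line.split("\t")  # assumed separator
        current.append({"token": token, "role": role, "pos": pos,
                        "entity": entity, "surface": surface})
    if current:
        samples.append(current)
    return samples
```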
When performing deep-learning-based sequence labeling tasks such as Chinese word segmentation and part-of-speech tagging or Chinese named entity recognition, the results of special-type vocabulary recognition and/or user-defined vocabulary recognition are used as constraints in the decoding algorithm (for Chinese word segmentation and part-of-speech tagging, the Chinese named entity recognition result can be added as a further constraint). This includes:
(1) Recognizing in advance, by means of templates, types such as e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers and foreign words.
(2) Supporting user definition of vocabulary including domain vocabulary, idioms, food, places, works, equipment, personal names, place names and organization names.
(3) A Viterbi decoding algorithm that combines the predictions output by the deep learning network with the results of special-type vocabulary recognition and/or user-defined vocabulary recognition as constraints.
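One way to realize constrained Viterbi decoding is to mask out, at each position, the tags that the pre-recognized words rule out. The sketch below assumes that the constraints have already been converted into a boolean `allowed` matrix; that representation, and the omission of a start-transition score, are simplifications made for illustration.

```python
import numpy as np

def constrained_viterbi(emissions, transitions, allowed):
    """Best tag path given per-character scores (n x T), tag-to-tag transition
    scores (T x T) and a boolean mask (n x T) of tags permitted by constraints."""
    n, num_tags = emissions.shape
    scores = np.where(allowed, emissions, -np.inf)  # forbid excluded tags
    best = scores[0].copy()
    back = np.zeros((n, num_tags), dtype=int)
    for i in range(1, n):
        candidate = best[:, None] + transitions     # candidate[p, t]: from tag p to tag t
        back[i] = candidate.argmax(axis=0)
        best = candidate.max(axis=0) + scores[i]
    path = [int(best.argmax())]
    for i in range(n - 1, 0, -1):                   # follow back-pointers
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```

With tags ordered as B, E, S, a dictionary word covering the first two characters would permit only B at position 0 and only E at position 1, so the decoder cannot split that word.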
Fig. 4 is an architecture diagram of the Chinese sequence labeling network provided in an embodiment of the present invention, which can be used for Chinese named entity recognition and for Chinese word segmentation and part-of-speech tagging (note: with different training corpora, the trained model data differ and the constraints differ as well). As shown in Fig. 4, the deep learning Chinese sequence labeling network model receives a Chinese sentence as input and outputs a sequence labeling result whose unit is the character (including Chinese characters, punctuation marks and any other characters that may appear in the sentence). The tag set is formed by extending the segmentation tags with task-specific labels. Taking Chinese named entity recognition as an example, if "PER" denotes the personal name label, then for the following sentence:
"Zhuge Liang is the military counsellor of Liu's military bloc."
the corresponding labeling result, with one tag per character of the original Chinese sentence, is:
"B_PER I_PER E_PER O B_PER E_PER O O O O O O O O".
Here "B" denotes the first character of a word, "I" denotes a middle character of a word, "E" denotes the last character of a word, and "O" denotes a character unrelated to the task. In addition, "S" denotes a character that forms a word on its own (such as a single-character word or a punctuation mark).
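To show how such a tag sequence maps back to entities, the sketch below decodes the example labels. The Chinese sentence used in the usage note is an assumed reconstruction of the original (it has 14 characters, matching the 14 tags); the function name is hypothetical.

```python
def decode_entities(chars, tags):
    """Collect (text, label) pairs from B/I/E/S/O tags, one tag per character."""
    entities, start = [], None
    for i, tag in enumerate(tags):
        position, _, label = tag.partition("_")
        if position == "S" and label:
            entities.append((chars[i], label))      # single-character entity
        elif position == "B":
            start = i                                # entity begins
        elif position == "E" and start is not None:
            entities.append(("".join(chars[start:i + 1]), label))
            start = None
        elif position == "O":
            start = None
    return entities
```

For example, `decode_entities("诸葛亮是刘备军事集团的军师。", "B_PER I_PER E_PER O B_PER E_PER O O O O O O O O".split())` picks out the two personal names.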
The label of a character generally depends on the characters around it, so a window model is used: when estimating the likelihood that the current character belongs to a given label, the character and its surrounding characters are taken together as input (as shown in Fig. 4). If the window size is set to 5, the input window consists of the character plus the two characters to its left and the two to its right. If there are not enough characters on the left or right to fill the window, padding symbols are used instead.
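The window construction can be sketched in a few lines. The padding symbol `<PAD>` and the function name are assumptions.

```python
PAD = "<PAD>"  # hypothetical padding symbol

def char_windows(sentence, size=5):
    """Return one window of `size` characters centred on each character."""
    half = size // 2
    padded = [PAD] * half + list(sentence) + [PAD] * half
    return [padded[i:i + size] for i in range(len(sentence))]
```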
Each input character is converted into its vector representation by looking it up in a character vector table. The representation of each character can be generated randomly or pre-trained with an unsupervised method. These vectors are then concatenated to form the feature representation of the window. After a linear layer (the hidden layer), a Sigmoid function applies a non-linear transformation, and finally another linear layer outputs a vector whose length equals the number of task labels. Each element of this vector represents the likelihood of the corresponding label.
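The forward pass described above can be illustrated with NumPy. The dimensions follow the configuration mentioned in the text (100-dimensional character vectors, 300 hidden units, window size 5); the vocabulary size, the number of tags and the random initialization are assumptions, and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, WINDOW, HIDDEN, NUM_TAGS = 50, 100, 5, 300, 4  # VOCAB and NUM_TAGS are assumed

embeddings = rng.normal(scale=0.1, size=(VOCAB, DIM))      # character vector table
W1 = rng.normal(scale=0.1, size=(WINDOW * DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, NUM_TAGS))
b2 = np.zeros(NUM_TAGS)

def tag_scores(window_ids):
    """Score every tag for the centre character of one window of character ids."""
    x = embeddings[window_ids].reshape(-1)          # look up and concatenate
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))        # linear layer + Sigmoid
    return h @ W2 + b2                              # one score per tag
```

Calling `tag_scores` once per window gives the rows of the score matrix that the decoder consumes.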
Given a Chinese sentence, the network outputs a matrix in which each element f_θ(t | i) is the estimated likelihood that the i-th character of the sentence belongs to label t, where θ denotes the network parameters. Because neighbouring labels in a sequence labeling task depend strongly on each other, a matrix A_ij is introduced to represent the likelihood of a transition from label i to label j (it is also included in the parameter set θ). Given a sentence s[1:n] containing n characters, a score can be computed for any label sequence t[1:n] of the same length:
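The scoring formula itself is not reproduced in this text. A form consistent with the definitions above, offered as an assumption rather than as the patent's exact expression, sums the transition score and the network score at every position:

```latex
s\left(s_{[1:n]},\, t_{[1:n]},\, \theta\right)
  = \sum_{i=1}^{n} \left( A_{t_{i-1} t_i} + f_\theta(t_i \mid i) \right)
```

Here $A_{t_0 t_1}$ is taken to be the score of starting the sequence with label $t_1$, which is also an assumption.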
Once the parameters are given, the Viterbi decoding algorithm can be used to obtain the highest-scoring label sequence as the labeling result.
Training aims to maximize, over the training set, the probability of the correct label sequence of each sample:
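The training objective is likewise missing from this text. Under the assumption that sequence scores are turned into probabilities with a softmax over all label sequences of the same length, and writing $\mathcal{R}$ for the training set (a symbol introduced here), a consistent form is:

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(s,t) \in \mathcal{R}} \log p(t \mid s, \theta),
\qquad
p(t \mid s, \theta) =
  \frac{\exp s(s, t, \theta)}{\sum_{t'} \exp s(s, t', \theta)}
```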
Here (s, t) denotes a sample in the training set. Training uses gradient descent, and all network parameters are updated with the following formula:
Here λ denotes the learning step size.
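The update formula does not survive in this text either. For a log-likelihood objective of the kind described, a per-sample gradient step with step size λ would take the following form, given here as an assumption:

```latex
\theta \leftarrow \theta + \lambda \,
  \frac{\partial \log p(t \mid s, \theta)}{\partial \theta}
```

The plus sign reflects that the log-probability is being maximized, which is equivalent to gradient descent on its negative.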
The Chinese sequence labeling network and learning algorithm based on deep learning are characterized in that:
(1) The input Chinese sentence undergoes the necessary preprocessing, including unifying the character encoding, converting traditional characters to simplified characters, converting full-width characters to half-width, converting special characters and replacing non-standard terms; recognized e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers and foreign words are uniformly converted into special symbols.
(2) During Viterbi decoding, the results of user-defined vocabulary recognition, special-type vocabulary recognition and Chinese named entity recognition are used as constraints.
(3) The network configuration uses 100-dimensional character vectors, a window size of 3 or 5, and 300 hidden-layer neurons (the specific parameters depend on the size of the corpus sample set).
Fig. 5 is a structural diagram of the convolutional neural network with dynamic k-max pooling provided by an embodiment of the present invention. As shown in Fig. 5, the network takes a Chinese sentence as input, produces a semantic representation of the whole sentence, and uses this representation to predict the task-related category to which the sentence belongs.
The network first converts each character of the input sentence into its vector representation by looking it up in a character vector table. The representation of each character can be randomly initialized or pre-trained with an unsupervised method. After conversion, the sentence forms a feature matrix. Second step: on each dimension of the feature matrix, convolution converts the window features into new features according to the configured window size. The window slides from left to right over the feature matrix, producing a higher-level feature representation with the same number of columns as the feature matrix. Different dimensions use different convolution kernels, which yields one feature map of the input feature matrix. Using several different groups of convolution kernels at the same time yields multiple feature maps. On each feature map, k-max pooling selects the k most salient features, that is, it extracts the k largest feature values on each dimension while keeping these values in the order they had in the input feature map. The HardTanh non-linear function is then applied to the matrix that results from k-max pooling. The second step can be stacked in multiple layers, with each new layer operating on the result of the previous one. The k value of the k-max pooling in the last layer is fixed (a hyperparameter of the model); in each earlier layer, k is the larger of the last layer's k value and the value computed by the formula (H-h/H) × L, rounded up. Third step: all feature values obtained from the last layer are concatenated to produce the semantic representation of the whole sentence. On top of this semantic representation, a linear layer and a Softmax layer predict the type of the sentence.
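The pooling step can be illustrated with a short sketch. The symbols H, h, and L are not defined in this text; the code assumes, following the usual dynamic k-max pooling formulation, that H is the total number of convolutional layers, h the index of the current layer, and L the sentence length, so that the formula reads ceil((H - h) / H × L). Function names are hypothetical.

```python
import math

def k_max_pooling(row, k):
    """Keep the k largest values of `row`, preserving their original order."""
    if k >= len(row):
        return list(row)
    # Pick the indices of the k largest values (ties go to the earlier position),
    # then sort the indices so the kept values stay in positional order.
    top = sorted(sorted(range(len(row)), key=lambda i: (-row[i], i))[:k])
    return [row[i] for i in top]

def dynamic_k(layer, num_layers, sentence_len, k_top):
    """Pooling size for `layer` (1-based); the last layer uses the fixed k_top."""
    if layer == num_layers:
        return k_top
    return max(k_top, math.ceil((num_layers - layer) / num_layers * sentence_len))

def hard_tanh(row):
    """Clip every value to the interval [-1, 1]."""
    return [max(-1.0, min(1.0, v)) for v in row]

pooled = k_max_pooling([1, 9, 3, 7, 2], 3)
k_first = dynamic_k(layer=1, num_layers=2, sentence_len=18, k_top=5)
k_last = dynamic_k(layer=2, num_layers=2, sentence_len=18, k_top=5)
```

For an 18-character sentence and a two-layer network with a final k of 5, this gives k = 9 in the first layer and k = 5 in the last, so longer sentences keep more features in the early layers.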
Because a Softmax layer is used, the network output can be regarded as a probability distribution over the categories. Training uses gradient descent; the goal of network training is to increase the probability of correct predictions on the training set while decreasing the probability of incorrect predictions.
The Chinese sentence classification model based on the convolutional neural network with dynamic k-max pooling is characterized as follows:
(1) The input Chinese sentence undergoes the necessary preprocessing, including: unifying the encoding, converting traditional characters to simplified, converting full-width characters to half-width, converting special characters, replacing non-standard terms, and recognizing e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers, and foreign words and uniformly converting them into special characters.
(2) Input is taken at the character level (including Chinese characters, punctuation, and any other characters that may appear in a sentence), which suits Chinese very well and keeps Chinese word segmentation errors from propagating into the sentence classification task.
(3) Single-dimension convolution is used, and the feature map output by a convolutional layer has the same number of columns as the input feature matrix, which increases the processing speed of the network.
(4) The network uses two convolutional layers: the first layer has a window size of 5 and 2 feature maps; the second layer has a window size of 3 and 3 feature maps. The k value of the k-max pooling in the last layer is 5.
Fig. 6 is a schematic diagram of semantic role labeling with a bidirectional LSTM provided by an embodiment of the present invention. As shown in Fig. 6, different semantic role labeling models are used for different sentence classification results. For semantic role labeling, the word segments, parts of speech, and/or named entity types are collated and used as input, and the sentence is labeled with semantic roles one word segment at a time, using the semantic label set associated with the sentence category.
At each time step (corresponding to each word of the input sentence), the input to the network is the concatenation of the vectors obtained by converting the current word, its part of speech, and/or its named entity type (that is, the category from Chinese named entity recognition, such as "Person", "Location", and "Organization" for person names, place names, and organization names, respectively). Two LSTMs process the input sentence from left to right (forward) and from right to left (backward), respectively. For each word, each LSTM outputs a vector representation; the outputs of the forward and backward LSTMs are concatenated as the vector representation of the word (which merges the word itself with its left and right context). With this representation as input, a linear layer predicts the label of the word.
On top of the bidirectional LSTM model, the dependencies between predicted word labels can also be exploited, which gives a bidirectional LSTM with transition probabilities. Given a Chinese sentence, the network outputs a matrix in which each element f_θ(t | i) estimates the likelihood that the i-th word of the sentence belongs to label t, where θ denotes the network parameters. Because adjacent labels also show some dependence in the semantic labeling task, a matrix A_ij is introduced to represent the likelihood of transitioning from label i to label j (it is also included in the parameter set θ). Given a sentence s[1:n] containing n words, a score can be computed for any label sequence t[1:n] of the same length.
With the network parameters fixed, the Viterbi decoding algorithm finds the highest-scoring label sequence, which is taken as the labeling result. Training aims to maximize, over the training set, the probability of the correct semantic label sequence of each sample. If the current network parameters produce a wrong prediction, gradient descent is used to compute the gradient of the objective function with respect to each parameter, and the parameters are updated accordingly.
The bidirectional LSTM Chinese semantic role labeling model is characterized as follows:
(1) At each time step (corresponding to each word of the input sentence), the LSTM network takes as input the concatenation of the vectors for the word segment, part of speech, and/or named entity type.
(2) The input Chinese sentence undergoes the necessary preprocessing, including: unifying the encoding, converting traditional characters to simplified, converting full-width characters to half-width, converting special characters, replacing non-standard terms, and recognizing e-mail addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers, and foreign words and uniformly converting them into special characters.
(3) The feature representation of each Chinese word is produced by a bidirectional LSTM.
(4) The model uses the following key parameters: the word feature vector has 30 dimensions, the part-of-speech feature vector has 10 dimensions, the type feature vector has 10 dimensions, each LSTM has 50 blocks, and each block contains 1 cell.
(5) For the bidirectional LSTM with transition probabilities, transition probabilities between semantic labels are also introduced, and Viterbi decoding is then used to label the semantic roles of the Chinese sentence.
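As a small illustration of item (1) with the sizes from item (4), the per-word input vector can be assembled by concatenating three lookups. The `Embedding` class, the random initialization, the "NONE" type for ordinary words, and the sample tokens are assumptions made for this sketch; the LSTM itself is not shown.

```python
import random

random.seed(0)
WORD_DIM, POS_DIM, TYPE_DIM = 30, 10, 10  # dimensions given in the text

class Embedding:
    """Lookup table that creates a random vector the first time a key is seen."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}

    def __call__(self, key):
        if key not in self.table:
            self.table[key] = [random.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.table[key]

word_emb, pos_emb, type_emb = Embedding(WORD_DIM), Embedding(POS_DIM), Embedding(TYPE_DIM)

def token_input(word, pos, entity_type):
    """Concatenate word, part-of-speech, and type vectors into one input vector."""
    return word_emb(word) + pos_emb(pos) + type_emb(entity_type)

# Hypothetical tokens: an ordinary noun and a word replaced by its type label.
tokens = [("余额", "NN", "NONE"), ("CURRENCY", "D", "CURRENCY")]
inputs = [token_input(*tok) for tok in tokens]
```

Each element of `inputs` has 30 + 10 + 10 = 50 values and would be fed to the forward and backward LSTMs at the corresponding time step.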
The embodiments of the present invention are illustrated below with a specific example:
For example, a mobile phone receives the SMS message "Your account with tail number 5714 completed a cash deposit transaction at 11:15 on July 16; the amount is 1300.00 yuan and the balance is 3456.03 yuan. [Agricultural Bank of China]".
First, the original text is normalized. For example, in some SMS messages "[" is written as "【"; this requires normalization, in which full-width and half-width forms and the different forms of each symbol are unified to ease subsequent processing.
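A minimal normalization sketch is shown below. It covers only the two cases named above (full-width to half-width conversion and bracket unification); the `SYMBOL_MAP` contents and the function name are assumptions, and a real system would extend the mapping and add the other preprocessing steps.

```python
# Symbol variants mapped to one canonical form (hypothetical; extend as needed).
SYMBOL_MAP = {"【": "[", "】": "]"}

def normalize(text):
    """Unify bracket variants and convert full-width characters to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in SYMBOL_MAP:
            ch = SYMBOL_MAP[ch]
        elif code == 0x3000:             # ideographic (full-width) space
            ch = " "
        elif 0xFF01 <= code <= 0xFF5E:   # full-width forms of ASCII characters
            ch = chr(code - 0xFEE0)
        out.append(ch)
    return "".join(out)

cleaned = normalize("【中国农业银行】")
```

Characters outside these ranges, including Chinese characters and the ideographic full stop, pass through unchanged.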
Next, special-type words are recognized, mainly by searching the text string with regular expressions, which identifies:
3-6: DIGIT 5714
11-16: DATE July 16
17-22: TIME 11:15
35-42: CURRENCY 1300.00 yuan
46-53: CURRENCY 3456.03 yuan
The positions of the punctuation marks ",", ".", "[", and "]" in the text are identified at the same time.
Using the named entity recognition unit or a custom dictionary (specific words that the named entity recognition unit generally fails to recognize can be added to the custom dictionary; here, for example, bank-related keywords were added to the custom dictionary in advance), the following is also recognized:
56-61: BANK Agricultural Bank of China
Note: the two numbers in the first column above are the start and end positions of the special word in the original text, counted in characters of the original Chinese message (the first character is position 0).
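The recognition step can be sketched with Python's `re` module. The Chinese string below is a reconstruction of the example message from its translation and the listed offsets, so it is an assumption, as are the specific patterns, their priority order, and the dictionary entry.

```python
import re

# Reconstructed Chinese form of the example message (an assumption).
TEXT = "您尾号5714的账户于07月16日11时15分完成一笔现存交易,金额为1300.00元,余额3456.03元。[中国农业银行]"

# Patterns are tried in priority order; later matches may not overlap earlier ones.
PATTERNS = [
    ("DATE", re.compile(r"\d{1,2}月\d{1,2}日")),
    ("TIME", re.compile(r"\d{1,2}时\d{1,2}分")),
    ("CURRENCY", re.compile(r"\d+(?:\.\d+)?元")),
    ("DIGIT", re.compile(r"\d+")),
]
CUSTOM_DICT = {"中国农业银行": "BANK"}  # hypothetical custom dictionary entry

def recognize(text):
    """Return sorted (start, end, type, word) tuples with inclusive end offsets."""
    spans = []

    def free(start, end):
        return all(end < s or start > e for s, e, _, _ in spans)

    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            start, end = m.start(), m.end() - 1
            if free(start, end):
                spans.append((start, end, label, m.group()))
    for word, label in CUSTOM_DICT.items():
        start = text.find(word)
        while start != -1:
            end = start + len(word) - 1
            if free(start, end):
                spans.append((start, end, label, word))
            start = text.find(word, start + 1)
    return sorted(spans)

found = recognize(TEXT)
```

Trying the more specific patterns first keeps the digits inside dates, times, and amounts from being reported again as separate DIGIT words.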
After this preprocessing, the words recognized above form the constraints for the next step (that is, these words are not segmented or part-of-speech tagged again). The constraints can be represented as a string that gives the word segmentation and part of speech of each character, for example:
"O O O B_D I_D I_D E_D O O O O B_NT I_NT I_NT I_NT I_NT E_NT B_NT I_NT I_NT I_NT I_NT E_NT O O O O O O O O S_PU O O O B_D I_D I_D I_D I_D I_D I_D E_D S_PU O O B_D I_D I_D I_D I_D I_D I_D E_D S_PU S_PU B_NR I_NR I_NR I_NR I_NR E_NR S_PU"
In the string above, "O" marks other characters, whose word segmentation and part of speech are determined in the next step. "B_D" marks the beginning of a number word, "I_D" the middle of a number word, and "E_D" the end of a number word. The part before the underscore "_" gives the character's position within the word and the part after it gives the part of speech; this is joint word segmentation and part-of-speech tagging. "B", "I", and "E" indicate that the character is at the beginning, in the middle, or at the end of a word, respectively. "S" marks a single-character word; punctuation marks, for example, are represented as "S_PU". "NT" denotes a temporal noun and "NR" a proper noun; other parts of speech, such as verbs and adjectives, can also be specified in advance.
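The constraint string can be generated from the recognized spans. In this sketch the span offsets and punctuation positions are taken from the example, while the function name and the data layout are assumptions.

```python
def build_constraints(length, spans, punctuation):
    """Build one B/I/E/S tag per character; unconstrained characters get "O"."""
    tags = ["O"] * length
    for start, end, pos in spans:          # offsets are inclusive
        if start == end:
            tags[start] = "S_" + pos       # single-character word
            continue
        tags[start] = "B_" + pos
        tags[end] = "E_" + pos
        for i in range(start + 1, end):
            tags[i] = "I_" + pos
    for i in punctuation:
        tags[i] = "S_PU"
    return " ".join(tags)

# Spans from the example: number, date, time, two amounts, bank name.
SPANS = [(3, 6, "D"), (11, 16, "NT"), (17, 22, "NT"),
         (35, 42, "D"), (46, 53, "D"), (56, 61, "NR")]
PUNCTUATION = [31, 43, 54, 55, 62]

constraints = build_constraints(63, SPANS, PUNCTUATION)
print(constraints)
```

During decoding, positions tagged "O" remain free, while the other positions restrict the decoder to the given tag.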
After word segmentation and part-of-speech tagging, every word in the text has been separated out (the original word precedes the "/" and the part of speech follows it; in this English rendering, hyphens join the parts of a single Chinese word), for example:
"you/PN tail-number/NN 5714/D of/U account/NN at/P July-16/NT 11:15/NT complete/V one/D item/M cash-deposit/V transaction/V ,/PU amount/NN is/V 1300.00-yuan/D ,/PU balance/NN 3456.03-yuan/D ./PU [/PU Agricultural-Bank-of-China/NR ]/PU".
In the example above, the word "tail number" is a common noun, tagged "NN"; the word "5714" is a number, tagged "D"; the word "transaction" is a verb, tagged "V"; and the word "[" is a punctuation mark, tagged "PU". In this way, the normalized text is split into word segments (single characters and punctuation marks also count as separate segments), and the part of speech of each segment is marked in the text.
For semantic analysis, special-type words can be replaced with a label symbol so that they have a unified representation, which gives:
"you/PN tail-number/NN DIGIT/D of/U account/NN at/P DATE/NT TIME/NT complete/V one/D item/M cash-deposit/V transaction/V ,/PU amount/NN is/V CURRENCY/D ,/PU balance/NN CURRENCY/D ./PU [/PU BANK/NR ]/PU"
From the word segments, parts of speech, and/or named entity types, semantic analysis can extract the words the user is interested in. For a bank notification message, for example, key information such as the date, time, account number, transaction amount, balance, and bank name can be extracted. This key information is marked by semantic role labels, which follow the word and are separated from it by "/". Content marked "O" after the "/" does not need to be extracted.
The semantic analysis result of this example: "you/O tail-number/O 5714/ACCOUNT of/O account/O at/O July-16/DATE 11:15/TIME complete/O one/O item/O cash-deposit/O transaction/O ,/O amount/O is/O 1300.00-yuan/INCOME ,/O balance/O 3456.03-yuan/BALANCE ./O [/O Agricultural-Bank-of-China/BANK ]/O".
Here "ACCOUNT", "DATE", "TIME", "INCOME", "BALANCE", and "BANK" are the semantic role labels, each attached to its corresponding word segment.
Finally, based on the extracted key information, the interface or application can present prompts, support interaction, and so on. For example, on receiving the SMS above, the user can be prompted with:
Event: Account entry
Account: 5714
Date: July 16
Time: 11:15
Account entry: 1300.00 yuan
Balance: 3456.03 yuan
Bank: Agricultural Bank of China
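The last step, turning role-labeled word segments into a structured prompt, can be sketched as follows. The labeled tokens use a reconstructed Chinese form of the example, and the `EVENT_MODEL` layout, field titles, and function names are assumptions rather than the patent's event model.

```python
# Word segments with semantic role labels (reconstructed Chinese form, an assumption).
LABELED = [
    ("您", "O"), ("尾号", "O"), ("5714", "ACCOUNT"), ("的", "O"), ("账户", "O"),
    ("于", "O"), ("07月16日", "DATE"), ("11时15分", "TIME"), ("完成", "O"),
    ("一", "O"), ("笔", "O"), ("现存", "O"), ("交易", "O"), (",", "O"),
    ("金额", "O"), ("为", "O"), ("1300.00元", "INCOME"), (",", "O"),
    ("余额", "O"), ("3456.03元", "BALANCE"), ("。", "O"), ("[", "O"),
    ("中国农业银行", "BANK"), ("]", "O"),
]

# Hypothetical event model: an event name plus the roles to display, in order.
EVENT_MODEL = {
    "name": "Account entry",
    "fields": [("ACCOUNT", "Account"), ("DATE", "Date"), ("TIME", "Time"),
               ("INCOME", "Account entry"), ("BALANCE", "Balance"), ("BANK", "Bank")],
}

def extract(labeled):
    """Collect every word whose role label is not "O", keyed by role."""
    return {role: word for word, role in labeled if role != "O"}

def render(info, model):
    """Format the extracted key information as attribute/value lines."""
    lines = ["Event: " + model["name"]]
    lines += [f"{title}: {info[role]}" for role, title in model["fields"] if role in info]
    return "\n".join(lines)

info = extract(LABELED)
prompt = render(info, EVENT_MODEL)
print(prompt)
```

Keeping the event model as data means a different sentence category can supply its own event name and attribute list without changing the extraction code.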
The scheme provided by the embodiments of the present invention comprises the Chinese sequence labeling network and learning algorithm based on deep learning, the Chinese sentence classification model based on the convolutional neural network with dynamic k-max pooling, the Chinese semantic role labeling model based on the bidirectional LSTM with transition probabilities, and the way these key technologies are integrated. The resulting system can be deployed on mobile computing platforms with relatively limited computing resources, such as mobile phones. It completes complex Chinese semantic analysis tasks without extra computing resources or equipment, and can significantly improve the response speed and user satisfaction of related applications.
Although the present invention has been described in detail above, it is not limited to this description; those skilled in the art can make various modifications according to the principles of the present invention. Therefore, all modifications made according to the principles of the present invention should be understood as falling within its scope of protection.

Claims (10)

1. A method of Chinese semantic analysis based on deep learning, comprising:
a mobile terminal obtaining normalized Chinese text by normalizing acquired Chinese text;
the mobile terminal performing special-type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and using the recognition result as a constraint;
the mobile terminal performing Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, to obtain the word segments and parts of speech of the normalized Chinese text;
the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech, and/or named entity types of the normalized Chinese text.
2. The method according to claim 1, wherein the mobile terminal performing special-type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and using the recognition result as a constraint comprises:
the mobile terminal performing special-type vocabulary recognition on the normalized Chinese text using special-type vocabulary templates, to obtain a special-type vocabulary recognition result of the normalized Chinese text, and using the obtained special-type vocabulary recognition result as a first constraint.
3. The method according to claim 1, wherein the mobile terminal performing special-type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and using the recognition result as a constraint comprises:
the mobile terminal performing custom vocabulary recognition on the normalized Chinese text using a custom dictionary, to obtain a custom vocabulary recognition result of the normalized Chinese text, and using the obtained custom vocabulary recognition result as a second constraint.
4. The method according to claim 1, wherein the mobile terminal performing special-type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text and using the recognition result as a constraint comprises:
the mobile terminal performing Chinese named entity recognition on the normalized Chinese text using a Chinese named entity recognition model obtained by deep learning, to obtain a Chinese named entity recognition result of the normalized Chinese text, and using the obtained Chinese named entity recognition result as a third constraint.
5. The method according to any one of claims 2-4, wherein the constraint comprises at least one of the first constraint, the second constraint, and the third constraint, or a combination thereof.
6. The method according to any one of claims 1-5, wherein the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech, and/or named entity types of the normalized Chinese text comprises:
the mobile terminal performing sentence classification on the normalized Chinese text according to the characters of the normalized Chinese text and a Chinese sentence model based on a convolutional neural network with dynamic k-max pooling, to obtain a sentence classification result of the normalized Chinese text.
7. The method according to claim 6, wherein the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech, and/or named entity types of the normalized Chinese text comprises:
the mobile terminal determining a bidirectional long short-term memory (LSTM) Chinese semantic role labeling model according to the sentence classification result, and then performing semantic role labeling on each word segment and symbol of the normalized Chinese text according to the word segments, parts of speech, and/or named entity types of the normalized Chinese text and the bidirectional LSTM Chinese semantic role labeling model, to obtain a semantic role labeling result of the normalized Chinese text.
8. The method according to claim 7, wherein the mobile terminal performing Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech, and/or named entity types of the normalized Chinese text comprises:
the mobile terminal performing structuring processing on the normalized Chinese text according to the semantic role labeling result of the normalized Chinese text and an event model, to extract the key information of the normalized Chinese text.
9. The method according to claim 8, wherein the key information of the normalized Chinese text comprises an event name, key attributes, and attribute values.
10. A device for Chinese semantic analysis based on deep learning, comprising:
a normalization module, configured to obtain normalized Chinese text by normalizing acquired Chinese text;
a recognition module, configured to perform special-type vocabulary recognition and/or custom vocabulary recognition and/or Chinese named entity recognition on the normalized Chinese text, and to use the recognition result as a constraint;
an analysis module, configured to perform Chinese word segmentation and part-of-speech analysis on the normalized Chinese text according to the constraint and using a Chinese word segmentation and part-of-speech tagging model obtained by deep learning, to obtain the word segments and parts of speech of the normalized Chinese text, and to perform Chinese semantic analysis on the normalized Chinese text using the word segments, parts of speech, and/or named entity types of the normalized Chinese text.
CN201610658579.XA 2016-08-11 2016-08-11 Deep learning-based Chinese semantic analysis method and device Active CN107729309B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610658579.XA CN107729309B (en) 2016-08-11 2016-08-11 Deep learning-based Chinese semantic analysis method and device
PCT/CN2016/105977 WO2018028077A1 (en) 2016-08-11 2016-11-15 Deep learning based method and device for chinese semantics analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610658579.XA CN107729309B (en) 2016-08-11 2016-08-11 Deep learning-based Chinese semantic analysis method and device

Publications (2)

Publication Number Publication Date
CN107729309A true CN107729309A (en) 2018-02-23
CN107729309B CN107729309B (en) 2022-11-08

Family

ID=61161388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610658579.XA Active CN107729309B (en) 2016-08-11 2016-08-11 Deep learning-based Chinese semantic analysis method and device

Country Status (2)

Country Link
CN (1) CN107729309B (en)
WO (1) WO2018028077A1 (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
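Sun Jing et al.: "Unsupervised Chinese part-of-speech tagging based on conditional random fields" (基于条件随机场的无监督中文词性标注), Computer Applications and Software (《计算机应用与软件》) *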

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232182B (en) * 2018-04-10 2023-05-16 蔚来控股有限公司 Semantic recognition method and device and voice dialogue system
CN110232182A (en) * 2018-04-10 2019-09-13 蔚来汽车有限公司 Method for recognizing semantics, device and speech dialogue system
CN110413983B (en) * 2018-04-27 2022-09-27 北京海马轻帆娱乐科技有限公司 Method and device for identifying name
CN110413983A (en) * 2018-04-27 2019-11-05 北京海马轻帆娱乐科技有限公司 Method and device for identifying name
CN108806671B (en) * 2018-05-29 2019-06-28 杭州认识科技有限公司 Semantic analysis, device and electronic equipment
CN108806671A (en) * 2018-05-29 2018-11-13 杭州认识科技有限公司 Semantic analysis, device and electronic equipment
CN108764194A (en) * 2018-06-04 2018-11-06 科大讯飞股份有限公司 Text verification method, device, equipment and readable storage medium
CN109101584B (en) * 2018-07-23 2020-11-03 湖南大学 Sentence classification improvement method combining deep learning and mathematical analysis
CN109101584A (en) * 2018-07-23 2018-12-28 湖南大学 Sentence classification improvement method combining deep learning and mathematical analysis
CN109344406A (en) * 2018-09-30 2019-02-15 阿里巴巴集团控股有限公司 Part-of-speech tagging method, apparatus and electronic equipment
CN109344406B (en) * 2018-09-30 2023-06-20 创新先进技术有限公司 Part-of-speech tagging method and device and electronic equipment
CN109543187A (en) * 2018-11-23 2019-03-29 中山大学 Method, device and storage medium for generating electronic health record features
CN109657207A (en) * 2018-11-29 2019-04-19 爱保科技(横琴)有限公司 Formatting processing method and processing device for clauses
CN109657207B (en) * 2018-11-29 2023-11-03 爱保科技有限公司 Formatting processing method and processing device for clauses
CN109615006A (en) * 2018-12-10 2019-04-12 北京市商汤科技开发有限公司 Character recognition method and device, electronic equipment and storage medium
CN109753564A (en) * 2018-12-13 2019-05-14 四川大学 Construction method of Chinese RCT intelligent classifier based on machine learning
CN111078947A (en) * 2019-11-19 2020-04-28 太极计算机股份有限公司 XML-based domain element extraction configuration language system
CN111078947B (en) * 2019-11-19 2023-06-02 太极计算机股份有限公司 XML-based domain element extraction configuration language system
CN111310468A (en) * 2020-01-15 2020-06-19 同济大学 Method for realizing Chinese named entity recognition by using uncertain word segmentation information
CN111310468B (en) * 2020-01-15 2023-05-05 同济大学 Method for realizing Chinese named entity recognition by utilizing uncertain word segmentation information
CN111460831A (en) * 2020-03-27 2020-07-28 科大讯飞股份有限公司 Event determination method, related device and readable storage medium
CN111460831B (en) * 2020-03-27 2024-04-19 科大讯飞股份有限公司 Event determination method, related device and readable storage medium
CN111931481A (en) * 2020-07-03 2020-11-13 北京新联财通咨询有限公司 Text emotion recognition method and device, storage medium and computer equipment
CN111966579A (en) * 2020-07-24 2020-11-20 复旦大学 Self-adaptive text input generation method based on natural language processing and machine learning
CN112965909A (en) * 2021-03-19 2021-06-15 湖南大学 Test data, test case generation method and system, and storage medium
CN112965909B (en) * 2021-03-19 2024-04-09 湖南大学 Test data, test case generation method and system and storage medium
CN113177108A (en) * 2021-05-27 2021-07-27 中国平安人寿保险股份有限公司 Semantic role labeling method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN107729309B (en) 2022-11-08
WO2018028077A1 (en) 2018-02-15

Similar Documents

Publication Publication Date Title
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN111966917B (en) Event detection and summarization method based on pre-training language model
CN111753081B (en) System and method for text classification based on deep SKIP-GRAM network
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN107239444B (en) Word vector training method and system fusing part-of-speech and position information
CN108446271B (en) Text emotion analysis method of convolutional neural network based on Chinese character component characteristics
CN110427623A (en) Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN106599032A (en) Text event extraction method in combination of sparse coding and structural perceptron
CN110489523B (en) Fine-grained emotion analysis method based on online shopping evaluation
CN110263325A (en) Automatic Chinese word segmentation
CN110532563A (en) Method and device for detecting key paragraphs in text
CN108108468A (en) Short text sentiment analysis method and apparatus based on concept and text emotion
CN112434535A (en) Multi-model-based factor extraction method, device, equipment and storage medium
CN113987187A (en) Multi-label embedding-based public opinion text classification method, system, terminal and medium
CN108829823A (en) Text classification method
CN112800184B (en) Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction
Prabha et al. A deep learning approach for part-of-speech tagging in Nepali language
CN109840328A (en) Deep learning method for sentiment tendency analysis of product review text
CN115080750B (en) Weak supervision text classification method, system and device based on fusion prompt sequence
CN112905736A (en) Unsupervised text emotion analysis method based on quantum theory
CN110222338A (en) Organization name entity recognition method
CN114756681A (en) Evaluation text fine-grained suggestion mining method based on multi-attention fusion
CN113051887A (en) Method, system and device for extracting announcement information elements
CN115906816A (en) Text emotion analysis method of two-channel Attention model based on Bert

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant