CN103309926A - Chinese and English-named entity identification method and system based on conditional random field (CRF) - Google Patents


Info

Publication number
CN103309926A
Authority
CN
China
Legal status
Pending
Application number
CN2013100782042A
Other languages
Chinese (zh)
Inventor
张艳
李艳玲
徐为群
颜永红
Current Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date
2013-03-12
Filing date
2013-03-12
Publication date
2013-09-18
Application filed by Institute of Acoustics CAS and Beijing Kexin Technology Co Ltd

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a method and system for recognizing named entities in mixed Chinese-English text based on a conditional random field (CRF). The method comprises the following steps: (101) converting the user's spoken query into text; (102) separating the text into Chinese characters and English words with a finite state machine; (103) extracting features from the separated text; and (104) performing entity recognition on the text with a trained CRF model according to the extracted features, and labeling the entity types, wherein the CRF model is a linear-chain conditional random field. Step (102) further comprises: (102-1) separating the Chinese and English text into characters; (102-2) identifying English strings with the finite state machine, i.e., merging adjacent English letters, spaces and English symbols; and (102-3) segmenting the English strings into words.

Description

Mixed Chinese-English named entity recognition method and system based on conditional random fields
Technical field
The present invention relates to finite state machines and conditional random field sequence labeling models. In human-machine dialogue, users' queries often mix Chinese and English; the invention proposes a method and system for recognizing named entities in such mixed Chinese-English sentences.
Background art
In a human-machine dialogue system, the user makes a query request in spoken language and the system provides an information service. A typical dialogue system has four components: automatic speech recognition, spoken language understanding, dialogue management and speech synthesis. The understanding component converts the recognized query into a corresponding semantic representation. With increasing internationalization, however, utterances that mix several languages are everywhere, and this makes understanding harder. Mixed Chinese-English utterances are the most common kind; in a dialogue system that answers queries about films, TV programs and songs they are especially prominent, because they arise from foreign film, TV and song titles and from English personal names. The service must understand the user correctly however the user mixes Chinese and English. One step of understanding is named entity recognition, i.e., finding the substantive nouns the user is asking about. Traditional named entity recognition targets purely English or purely Chinese text. English entity recognition is comparatively easy because words are separated by spaces, so no word segmentation is needed; Chinese entity recognition is harder; and recognizing entities in spoken mixed Chinese-English text is harder still, because spoken language is also less grammatical and more casual than written language. Prior-art named entity recognition methods alone therefore cannot solve this problem.
Summary of the invention
The object of the invention is to overcome the above technical problems by providing a method and system for mixed Chinese-English named entity recognition based on conditional random fields.
To achieve this object, the invention provides a mixed Chinese-English named entity recognition method based on conditional random fields, the method comprising:
Step 101) converting the user's spoken query into text;
Step 102) separating the text into individual Chinese characters and English words with a finite state machine;
Step 103) extracting features from the separated text;
Step 104) performing entity recognition on the separated characters with a trained CRF model according to the extracted features, and labeling the entity types;
wherein the CRF model is a linear-chain conditional random field. Named entities are generally person names, place names and organization names; in the media domain they specifically refer to person names and titles (song titles, which here also cover film/TV titles, website names and TV stations). Entity recognition identifies the type to which each separated character belongs. For example, in 我想听刘德华的歌曲忘情水 ("I want to listen to Liu Dehua's song Wang Qing Shui"), "Liu Dehua" is a person-name entity and "Wang Qing Shui" is a song-title entity.
Studies have shown that for spoken-language text, taking individual characters as the unit of analysis gives better results than taking segmented words, so entity recognition is performed character by character. The statistical model decides the type of each character from the character itself and its context. For training, the sentence above is annotated as:
我/O 想/O 听/O 刘/B-PER 德/I-PER 华/I-PER 的/O 歌/O 曲/O 忘/B-NAME 情/I-NAME 水/I-NAME
where "O" means other (outside any entity), "B" ("begin") marks the beginning of an entity, "I" ("inside") marks a position inside an entity, and "PER" and "NAME" are the entity classes person name and title name respectively. At test time, the trained model automatically labels the class of each character of an input sentence, from which each entity is obtained.
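The BIO labels above can be turned back into entities with a single pass over the tag sequence. The following is a minimal Python sketch, not the patent's implementation; the names `extract_entities` and `join_tokens` are hypothetical, and restoring a space between two adjacent English tokens is an assumption so that character tokens and English word tokens can be rejoined.

```python
def join_tokens(tokens):
    """Rejoin tokens; a space is restored only between two ASCII alphanumeric edges."""
    text = ""
    for tok in tokens:
        if (text and text[-1].isascii() and text[-1].isalnum()
                and tok[:1].isascii() and tok[:1].isalnum()):
            text += " "
        text += tok
    return text


def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) pairs from a BIO-tagged token sequence."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    entities = []
    cur, cur_type = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur_type is not None:  # a new B- closes any open entity
                entities.append((join_tokens(cur), cur_type))
            cur, cur_type = [tok], tag[2:]
        elif tag.startswith("I-") and tag[2:] == cur_type:
            cur.append(tok)  # continue the open entity of the same type
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity (if any); the stray I- token itself is dropped
            if cur_type is not None:
                entities.append((join_tokens(cur), cur_type))
            cur, cur_type = [], None
    if cur_type is not None:
        entities.append((join_tokens(cur), cur_type))
    return entities
```

Applied to the annotated example, this yields the person name 刘德华 and the title 忘情水.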
Step 102) further comprises:
Step 102-1) separating the Chinese and English text into individual characters;
Step 102-2) identifying English strings with the finite state machine, i.e., merging adjacent English letters, spaces and English symbols;
Step 102-3) segmenting the English strings into words.
The feature extraction comprises:
whether the current character or English word is in the dictionary of common characters used in Chinese surnames and given names;
whether the current character or English word is a one- or two-character boundary word to the left or right of a person name or film/TV title;
whether the current character or English word is an English word;
wherein the feature extraction further comprises extracting combinations of the above features and context features.
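A minimal sketch of the three single features for one token follows. It assumes toy placeholder dictionaries (the patent builds its own surname/given-name character dictionary and boundary-word dictionary, whose contents are not given) and an assumed regular expression for English tokens; `token_features`, `SURNAME_NAME_CHARS`, `BOUNDARY_WORDS` and `ENGLISH_RE` are hypothetical names. Matching two-character boundary words by pairing the current token with its left or right neighbour is one plausible reading of the text, not a confirmed detail.

```python
import re

# Hypothetical toy dictionaries standing in for the patent's real ones.
SURNAME_NAME_CHARS = {"刘", "张", "王", "李", "德", "华"}
BOUNDARY_WORDS = {"听", "的", "歌曲", "唱的"}  # one- and two-character boundary words

# Assumed pattern: starts with a Latin letter, then letters or English symbols.
ENGLISH_RE = re.compile(r"[A-Za-z][A-Za-z'’.\-&]*")


def token_features(tokens, i):
    """Return the three single features of tokens[i] as Y/N values."""
    tok = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    is_boundary = (
        tok in BOUNDARY_WORDS                  # one-character boundary word
        or prev_tok + tok in BOUNDARY_WORDS    # two-character word ending here
        or tok + next_tok in BOUNDARY_WORDS    # two-character word starting here
    )
    return {
        "name_char": "Y" if tok in SURNAME_NAME_CHARS else "N",
        "boundary": "Y" if is_boundary else "N",
        "english": "Y" if ENGLISH_RE.fullmatch(tok) else "N",
    }
```

Combination and context features are then built on top of these per-token values.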
Step 104) obtains the entity types with the following strategy:
Sequence labeling is performed with a linear-chain conditional random field: the CRF model parameters are estimated with the L-BFGS algorithm, the optimal label sequence is obtained with the Viterbi decoding algorithm, and the entity types are finally read off the optimal label sequence.
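The Viterbi step can be sketched as follows, assuming the weighted features have already been summed into per-position label scores (`emissions`) and label-bigram scores (`transitions`). This is a generic linear-chain Viterbi decoder over log-space scores, not code from the patent; all names are hypothetical.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][y]: summed weighted features that look only at label y at position t
    transitions[(y_prev, y)]: weight of the label bigram y_prev -> y (missing pairs score 0)
    """
    if not emissions:
        return []
    n = len(emissions)
    best = [{y: emissions[0][y] for y in labels}]  # best[t][y]: best prefix score ending in y
    back = [{}]                                    # back[t][y]: predecessor label on that prefix
    for t in range(1, n):
        best_t, back_t = {}, {}
        for y in labels:
            prev, score = max(
                ((yp, best[t - 1][yp] + transitions.get((yp, y), 0.0)) for yp in labels),
                key=lambda pair: pair[1],
            )
            best_t[y] = score + emissions[t][y]
            back_t[y] = prev
        best.append(best_t)
        back.append(back_t)
    # Pick the best final label, then follow the back-pointers to the start.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

A strongly negative weight on a bigram such as `("O", "I-PER")` is one way to discourage invalid BIO transitions; whether the patent's model does this is not stated.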
To implement the above method, the invention also provides a mixed Chinese-English named entity recognition system based on conditional random fields, the system comprising:
a conversion module, configured to convert the user's spoken query into text;
a preprocessing module, configured to use a finite state machine to split the speech-recognized text into Chinese characters and to segment English strings;
a feature extraction module, configured to extract features from the separated text;
a type determination and recognition module, configured to perform entity recognition on the text with a trained CRF model according to the extracted features, and to label the entity types;
wherein the CRF model is a linear-chain conditional random field.
The preprocessing module further comprises:
a first processing submodule, configured to separate the text into characters;
a second processing submodule, configured to identify English strings among the separated English characters with a finite state machine, i.e., to merge adjacent English letters, spaces and English symbols;
a third processing submodule, configured to segment the English strings into words.
The feature extraction comprises:
whether the current character or English word is in the dictionary of common characters used in Chinese surnames and given names;
whether the current character or English word is a one- or two-character boundary word to the left or right of a person name or film/TV title;
whether the current character or English word is an English word;
wherein the feature extraction further comprises extracting combinations of the above features and context features.
The type determination and recognition module further comprises:
a CRF model building submodule, configured to obtain the CRF model parameters with the L-BFGS algorithm, i.e., to obtain the trained CRF model;
a type labeling submodule, configured to label types based on the built CRF model. Query utterances in the restricted domain, including mixed Chinese-English utterances, are collected manually and then annotated manually, and the CRF is trained on them. The training tool is the open-source tool CRF++. Training proceeds as follows: features are first extracted according to the format of the training text; individual characters are taken as the unit of feature extraction; feature selection uses each single feature and its context features, plus combination features between the features; training the CRF then produces a model file, on which type labeling is based;
a decoding submodule, configured to extract, from the text to be recognized, the same features as in the training process of the type labeling submodule, run the trained model on it, and decode with the Viterbi algorithm to obtain the label of each character.
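CRF++ reads training data as one token per line, with whitespace-separated columns whose last column is the label, and a blank line between sentences. The sketch below renders one annotated sentence in that layout. The column order (token, then the three feature values, then the BIO tag) and the function name `to_crfpp_block` are assumptions for illustration.

```python
def to_crfpp_block(tokens, feature_rows, tags):
    """Render one annotated sentence as CRF++ training rows.

    tokens: segmented tokens; CRF++ splits columns on whitespace, so a token
            such as "Michael Jackson" must already be split by step 102-3
    feature_rows: per-token lists of feature values, e.g. ["Y", "N", "N"]
    tags: BIO labels, written as the last (answer) column
    """
    if not (len(tokens) == len(feature_rows) == len(tags)):
        raise ValueError("tokens, feature_rows and tags must have the same length")
    lines = []
    for tok, feats, tag in zip(tokens, feature_rows, tags):
        if not tok or any(c.isspace() for c in tok):
            raise ValueError(f"token {tok!r} is empty or contains whitespace")
        lines.append("\t".join([tok, *feats, tag]))
    return "\n".join(lines) + "\n\n"  # the blank line ends the sentence
```

With a template file and a training file in this format, CRF++ is trained with `crf_learn template_file train_file model_file`; the file names are placeholders.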
Compared with the prior art, the technical advantages of the invention are:
The invention combines a finite state machine with a conditional random field to recognize mixed Chinese-English named entities in human-machine dialogue. The main steps are: first, preprocessing (the Chinese-English segmentation module), in which a finite state machine segments the speech-recognized text into Chinese characters and English words; second, feature extraction, which adds three classes of effective features: features that identify surname and given-name characters in Chinese personal names, features that identify boundary words to the left and right of person names and entity names, and a regular-expression feature that identifies English words; third, sequence labeling with a linear-chain conditional random field, in which the model parameters are estimated with the L-BFGS algorithm and the optimal label sequence is obtained with Viterbi decoding, from which the entity types are finally obtained. The method has the following advantages. First, it handles entity recognition well for mixed Chinese-English queries in dialogue systems, and therefore has broad practical value. Second, because rule-based features (including several specialized rule features in the feature extraction stage) are added within the statistical framework, it also mitigates the sparsity of the CRF training data. Finally, beyond mixed Chinese-English entity recognition in dialogue systems, the method can be extended to web-based entity recognition.
Description of the drawings
Fig. 1 is a block diagram of the mixed Chinese-English entity recognition provided by the invention.
Detailed description of the embodiments
The method of the invention is described in detail below with reference to the drawing and embodiments.
The mixed Chinese-English named entity recognition method provided by the invention mainly addresses named entity recognition for queries that mix Chinese and English in human-machine dialogue. The system based on this method covers queries about TV stations, websites, applications and media; the media part further includes queries about actors, directors and singers, and about film, TV series and song titles.
The method can be carried out with the following recognition system:
First, a speech recognition system converts the user's spoken query into text, which then enters the Chinese-English segmentation module; features are extracted from the separated sentence; finally, the trained CRF model performs entity recognition and labels the entity types. Each part is described in detail below.
(1) Preprocessing part (the Chinese-English segmentation module):
This module splits Chinese into individual characters and English into words, in three steps. The first step separates the text into characters. The second step identifies English strings with a finite state machine, i.e., merges adjacent English letters, spaces and English symbols. The third step, English string segmentation, splits each English string on spaces. For example:
Original user query: 我想听Michael Jackson的歌曲 ("I want to listen to Michael Jackson's songs")
Step 1 (after character separation): 我|想|听|M|i|c|h|a|e|l| |J|a|c|k|s|o|n|的|歌|曲
Step 2 (after the finite state machine): 我|想|听|Michael Jackson|的|歌|曲
Step 3 (after English string segmentation): 我|想|听|Michael|Jackson|的|歌|曲
Original user query: 我想听歌曲江南style ("I want to listen to the song Gangnam Style")
Step 1 (after character separation): 我|想|听|歌|曲|江|南|s|t|y|l|e
Step 2 (after the finite state machine): 我|想|听|歌|曲|江|南|style
Step 3: same result as step 2.
Original user query: 我想听S.H.E.的歌曲 ("I want to listen to S.H.E.'s songs")
Step 1 (after character separation): 我|想|听|S|.|H|.|E|.|的|歌|曲
Step 2 (after the finite state machine): 我|想|听|S.H.E.|的|歌|曲
Step 3: same result as step 2.
Original user query: 给我找一下歌曲I'll always love you ("Find me the song I'll always love you")
Step 1 (after character separation): 给|我|找|一|下|歌|曲|I|'|l|l| |a|l|w|a|y|s| |l|o|v|e| |y|o|u
Step 2 (after the finite state machine): 给|我|找|一|下|歌|曲|I'll always love you
Step 3 (after English string segmentation): 给|我|找|一|下|歌|曲|I'll|always|love|you
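The three preprocessing steps can be sketched with a two-state finite state machine. This is a minimal Python sketch under stated assumptions, not the patent's implementation: the set of English symbols allowed inside a run (space, apostrophe, period, hyphen, ampersand) is assumed, whitespace outside an English run is dropped, and all function names are hypothetical. Applied to the four example queries, it produces the step 2 and step 3 segmentations listed.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    OTHER = auto()    # not inside an English run
    ENGLISH = auto()  # inside a run of English letters, spaces and English symbols


# Assumption: symbols that may continue (but not start) an English run.
ENGLISH_SYMBOLS = set(" '’.-&")


def is_latin_letter(ch: str) -> bool:
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def separate_characters(text: str) -> list[str]:
    """Step 1: split the text into single characters."""
    return list(text)


def merge_english(chars: list[str]) -> list[str]:
    """Step 2: FSM that merges adjacent English letters, spaces and symbols."""
    tokens: list[str] = []
    buf: list[str] = []
    state = State.OTHER
    for ch in chars:
        if state is State.ENGLISH:
            if is_latin_letter(ch) or ch in ENGLISH_SYMBOLS:
                buf.append(ch)
                continue
            # Any other character closes the run; trailing spaces are dropped.
            tokens.append("".join(buf).rstrip())
            buf = []
            state = State.OTHER
        # In State.OTHER (possibly just re-entered): a letter opens a new run.
        if is_latin_letter(ch):
            buf = [ch]
            state = State.ENGLISH
        elif not ch.isspace():
            tokens.append(ch)  # Chinese character or other symbol: its own token
    if state is State.ENGLISH:
        tokens.append("".join(buf).rstrip())
    return tokens


def split_english(tokens: list[str]) -> list[str]:
    """Step 3: split each merged English string on spaces."""
    out: list[str] = []
    for tok in tokens:
        if tok and is_latin_letter(tok[0]):
            out.extend(tok.split())
        else:
            out.append(tok)
    return out


def segment(text: str) -> list[str]:
    return split_english(merge_english(separate_characters(text)))
```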
(2) Named entity recognition module:
The entity recognition problem can be stated as follows: given a string of τ characters as the observation o, find the corresponding state sequence (i.e., the entity classes)
$$s_1^\tau = s_1, s_2, \ldots, s_\tau .$$
The desired state sequence $s_1^\tau$ should, for given model parameters λ, maximize the posterior probability $p(s_1^\tau \mid o; \lambda)$. An undirected graphical model is used for the posterior probability:
$$p(s_1^\tau \mid o; \lambda) = \frac{1}{Z(o;\lambda)} \exp\big(\lambda \cdot F(s_1^\tau, o)\big) \qquad (1)$$
where $F(s_1^\tau, o)$ is the feature vector, whose features are a series of functions of the state sequence and the observation, and λ is the parameter vector giving the weights of the features;
$$Z(o;\lambda) = \sum_{s_1^\tau} \exp\big(\lambda \cdot F(s_1^\tau, o)\big)$$
is the normalization factor over all state sequences, which keeps the probabilities between 0 and 1 and makes them sum to 1. For computational tractability, it is generally assumed that $s_1^\tau$ forms a Markov chain, so that each feature $f_k$ in the feature set F depends only on two adjacent states. Hence:
$$p(s_1^\tau \mid o; \lambda) = \frac{1}{Z(o;\lambda)} \exp\Big(\sum_k \lambda_k \sum_{t=1}^{\tau} f_k(s_{t-1}, s_t, o, t)\Big) \qquad (2)$$
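Equation (2) can be evaluated without enumerating every state sequence by using the forward algorithm. The minimal sketch below assumes that each feature $f_k$ looks either only at $(s_t, o, t)$, with its weighted sum precomputed as `emissions[t][y]`, or only at the label pair $(s_{t-1}, s_t)$, as `transitions[(y_prev, y)]`, and that there is no feature on the transition into $s_1$. It computes $\log Z(o;\lambda)$ and $p(s_1^\tau \mid o;\lambda)$ in log-space, with a brute-force sum for cross-checking on tiny inputs; all names are hypothetical.

```python
import math
from itertools import product


def sequence_score(seq, emissions, transitions):
    """sum_k lambda_k sum_t f_k(s_{t-1}, s_t, o, t) under the emission/transition split."""
    score = emissions[0][seq[0]]
    for t in range(1, len(seq)):
        score += transitions.get((seq[t - 1], seq[t]), 0.0) + emissions[t][seq[t]]
    return score


def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(emissions, transitions, labels):
    """Forward algorithm: alpha[y] = log of the summed exp-scores of prefixes ending in y."""
    alpha = {y: emissions[0][y] for y in labels}
    for t in range(1, len(emissions)):
        alpha = {
            y: log_sum_exp([alpha[yp] + transitions.get((yp, y), 0.0) for yp in labels])
            + emissions[t][y]
            for y in labels
        }
    return log_sum_exp(list(alpha.values()))


def sequence_prob(seq, emissions, transitions, labels):
    """p(s_1^tau | o; lambda) as in equation (2)."""
    return math.exp(sequence_score(seq, emissions, transitions)
                    - log_partition(emissions, transitions, labels))


def brute_force_log_partition(emissions, transitions, labels):
    """Direct sum over all |labels|**n sequences; only for checking tiny inputs."""
    return log_sum_exp([sequence_score(seq, emissions, transitions)
                        for seq in product(labels, repeat=len(emissions))])
```

The forward recursion costs O(τ·|labels|²) instead of the O(|labels|^τ) of the direct sum.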
This module extracts features from the sentence and labels the sequence with the trained CRF model to obtain the entity labels. The features fall into three classes: first, whether the current character (or English word) is in the dictionary of common characters used in Chinese surnames and given names; second, whether the current character (or English word) is a one- or two-character boundary word to the left or right of a person name or film/TV title; third, whether the current character (or English word) is an English word. In addition to these single features, combination features and context features of these features are also added.
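The boundary-word dictionaries used by the second feature class could be built from annotated data by counting what occurs immediately around each entity. The sketch below assumes BIO-tagged training sentences, counts the one and two tokens immediately to the left and right of each PER or NAME entity, and keeps those seen at least `min_count` times. The threshold and all names are assumptions, since the patent only says the counts come from a large amount of data.

```python
from collections import Counter


def entity_spans(tags):
    """Yield (start, end, type) for each BIO entity; end is exclusive."""
    i, n = 0, len(tags)
    while i < n:
        if tags[i].startswith("B-"):
            etype = tags[i][2:]
            j = i + 1
            while j < n and tags[j] == "I-" + etype:
                j += 1
            yield i, j, etype
            i = j
        else:
            i += 1


def count_boundary_words(sentences, entity_types=("PER", "NAME")):
    """Count 1- and 2-token words just left and just right of each entity."""
    left, right = Counter(), Counter()
    for tokens, tags in sentences:
        n = len(tokens)
        for start, end, etype in entity_spans(tags):
            if etype not in entity_types:
                continue
            if start >= 1:
                left[tokens[start - 1]] += 1
            if start >= 2:
                left[tokens[start - 2] + tokens[start - 1]] += 1
            if end < n:
                right[tokens[end]] += 1
            if end + 1 < n:
                right[tokens[end] + tokens[end + 1]] += 1
    return left, right


def boundary_dictionaries(sentences, min_count=2, entity_types=("PER", "NAME")):
    """Keep boundary words whose count reaches min_count."""
    left, right = count_boundary_words(sentences, entity_types)
    return ({w for w, c in left.items() if c >= min_count},
            {w for w, c in right.items() if c >= min_count})
```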
CRF model parameters are usually estimated with the L-BFGS algorithm. CRF decoding is the process of finding the labels of an unlabeled string; on a linear-chain CRF this can be done with the Viterbi algorithm.
Embodiment
1. The speech-recognized text is segmented into Chinese characters and English words in three steps: first, Chinese and English characters are separated; second, English strings are identified with a finite state machine, i.e., adjacent English letters, spaces and English symbols are merged; third, English string segmentation splits each English string on spaces.
2. Construct the CRF training data, covering as far as possible the common spoken expressions in the domain.
3. Annotate the training data, i.e., mark the class of the substantive nouns in each query.
4. Feature extraction one: to better extract the various substantive nouns in the domain (including person names and other nouns), and following the word-formation characteristics of Chinese personal names, we built a dictionary of common characters used in Chinese surnames and given names, which is used to construct the feature template.
5. Feature extraction two: to extract person names and film/TV titles more accurately, the single characters and two-character words appearing immediately before and after person names and film/TV titles were counted over a large amount of data, and left and right boundary-word dictionaries for person names and titles were built from these counts for feature extraction.
6. Feature extraction three: so that a long English song title is recognized as one complete title, a feature indicating whether the current token is English is added; it is determined with a regular expression.
7. Feature extraction four: extract context features and combination features of the above three features; the context consists of the two characters before and after the current character.
8. Obtain the CRF model parameters with the L-BFGS algorithm, i.e., obtain the trained CRF model.
9. Decode with the Viterbi algorithm to obtain the entity labels, yielding the final mixed Chinese-English entity recognition result.
Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solution of the invention. Although the invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solution that do not depart from its spirit and scope shall all fall within the scope of the claims of the invention.

Claims (8)

1. A mixed Chinese-English named entity recognition method based on conditional random fields, the method comprising:
Step 101) converting the user's spoken query into text;
Step 102) separating the text into individual Chinese characters and English words with a finite state machine;
Step 103) extracting features from the separated text;
Step 104) performing entity recognition on the separated characters or words with a trained CRF model according to the extracted features, and labeling the entity types;
wherein the CRF model is a linear-chain conditional random field.
2. The mixed Chinese-English named entity recognition method based on conditional random fields according to claim 1, wherein step 102) further comprises:
Step 102-1) separating the Chinese and English text into characters;
Step 102-2) identifying English strings with the finite state machine, i.e., merging adjacent English letters, spaces and English symbols;
Step 102-3) segmenting the English strings into words.
3. The mixed Chinese-English named entity recognition method based on conditional random fields according to claim 1, wherein the feature extraction comprises:
whether the current character or English word is in the dictionary of common characters used in Chinese surnames and given names;
whether the current character or English word is a one- or two-character boundary word to the left or right of a person name or film/TV title;
whether the current character or English word is an English word;
wherein the feature extraction further comprises extracting combinations of the above features and context features.
4. The mixed Chinese-English named entity recognition method based on conditional random fields according to claim 1, wherein step 104) obtains the entity types with the following strategy:
sequence labeling is performed with a linear-chain conditional random field: the CRF model parameters are estimated with the L-BFGS algorithm, the optimal label sequence is obtained with the Viterbi decoding algorithm, and the entity types are finally read off the optimal label sequence.
5. A mixed Chinese-English named entity recognition system based on conditional random fields, the system comprising:
a conversion module, configured to convert the user's spoken query into text;
a preprocessing module, configured to split the text into Chinese characters and to segment English strings;
a feature extraction module, configured to extract features from the separated characters;
a type determination and recognition module, configured to perform entity recognition on the text with a trained CRF model according to the extracted features, and to label the entity types;
wherein the CRF model is a linear-chain conditional random field.
6. The mixed Chinese-English named entity recognition system based on conditional random fields according to claim 5, wherein the preprocessing module further comprises:
a first processing submodule, configured to separate the text into characters;
a second processing submodule, configured to identify English strings among the separated English characters with a finite state machine, i.e., to merge adjacent English letters, spaces and English symbols;
a third processing submodule, configured to segment the English strings into words.
7. The mixed Chinese-English named entity recognition system based on conditional random fields according to claim 5, wherein the feature extraction comprises:
whether the current character or English word is in the dictionary of common characters used in Chinese surnames and given names;
whether the current character or English word is a one- or two-character boundary word to the left or right of a person name or film/TV title;
whether the current character or English word is an English word;
wherein the feature extraction further comprises extracting combinations of the above features and context features.
8. The mixed Chinese-English named entity recognition system based on conditional random fields according to claim 5, wherein the type determination and recognition module further comprises:
a CRF model building submodule, configured to obtain the CRF model parameters with the L-BFGS algorithm, i.e., to obtain the trained CRF model;
a type labeling submodule, configured to label types based on the built CRF model, wherein query utterances in the restricted domain, including mixed Chinese-English utterances, are collected manually and then annotated manually, and the CRF is trained on them; the training tool is the open-source tool CRF++, and training comprises: first extracting features according to the format of the training text; taking individual characters as the unit of feature extraction; basing feature selection on each single feature and its context features, plus combination features between the features; and finally training the CRF to obtain a model file, on which type labeling is based;
a decoding submodule, configured to extract, from the text to be recognized, the same features as in the training process of the type labeling submodule, run the trained model on it, and decode with the Viterbi algorithm to obtain the label of each character.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013100782042A CN103309926A (en) 2013-03-12 2013-03-12 Chinese and English-named entity identification method and system based on conditional random field (CRF)

Publications (1)

Publication Number Publication Date
CN103309926A true CN103309926A (en) 2013-09-18

Family

ID=49135150

Country Status (1)

Country Link
CN (1) CN103309926A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101447184A (en) * 2007-11-28 2009-06-03 中国科学院声学研究所 Chinese-English bilingual speech recognition method based on phoneme confusion
CN102955773A (en) * 2011-08-31 2013-03-06 国际商业机器公司 Method and system for identifying chemical names in Chinese document

Non-Patent Citations (1)

Title
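郭家清 (Guo Jiaqing): "Research on Named Entity Recognition Based on Conditional Random Fields", China Master's Theses Full-text Database, Information Science and Technology series *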

Cited By (40)

Publication number Priority date Publication date Assignee Title
WO2015142626A1 (en) * 2014-03-18 2015-09-24 Microsoft Technology Licensing, Llc Named entitty platform and store
CN105531758A (en) * 2014-07-17 2016-04-27 微软技术许可有限责任公司 Speech recognition using foreign word grammar
US10290299B2 (en) 2014-07-17 2019-05-14 Microsoft Technology Licensing, Llc Speech recognition using a foreign word grammar
CN104166643A (en) * 2014-08-19 2014-11-26 南京金娃娃软件科技有限公司 Dialogue act analyzing method in intelligent question-answering system
CN104881398A (en) * 2014-08-29 2015-09-02 北京大学 Method for extracting author affiliation information of English literature published by Chinese authors
CN104881398B (en) * 2014-08-29 2018-03-30 北京大学 Method for extracting author affiliation information of English literature published by Chinese authors
CN104391893A (en) * 2014-11-11 2015-03-04 成都锐理开创信息技术有限公司 Method for timely discovering and tracking dynamic conditions of real estate projects
CN104391893B (en) * 2014-11-11 2018-10-30 成都锐理数据处理技术股份有限公司 Method for timely discovering and tracking dynamic conditions of real estate projects
US10229674B2 (en) 2015-05-15 2019-03-12 Microsoft Technology Licensing, Llc Cross-language speech recognition and translation
CN105138515A (en) * 2015-09-02 2015-12-09 百度在线网络技术(北京)有限公司 Named entity recognition method and device
CN105138515B (en) * 2015-09-02 2018-10-19 百度在线网络技术(北京)有限公司 Named entity recognition method and device
CN105243055A (en) * 2015-09-28 2016-01-13 北京橙鑫数据科技有限公司 Multi-language based word segmentation method and apparatus
CN105243055B (en) * 2015-09-28 2018-07-31 北京橙鑫数据科技有限公司 Multi-language based word segmentation method and apparatus
CN106933795A (en) * 2015-12-30 2017-07-07 贺惠新 A kind of extraction method of the discussion main body of discussion type article
WO2018000278A1 (en) * 2016-06-29 2018-01-04 深圳狗尾草智能科技有限公司 Context sensitive multi-round dialogue management system and method based on state machines
CN106598950A (en) * 2016-12-23 2017-04-26 东北大学 Method for recognizing named entity based on mixing stacking model
CN106899572A (en) * 2017-01-05 2017-06-27 浙江大学 Sterility testing data staging encryption method based on condition random field algorithm
CN106919793A (en) * 2017-02-24 2017-07-04 黑龙江特士信息技术有限公司 Data standardization processing method and device for medical big data
CN106919793B (en) * 2017-02-24 2019-12-06 黑龙江特士信息技术有限公司 Data standardization processing method and device for medical big data
CN106919794B (en) * 2017-02-24 2019-12-06 黑龙江特士信息技术有限公司 Multi-data-source-oriented medicine entity identification method and device
CN106919794A (en) * 2017-02-24 2017-07-04 黑龙江特士信息技术有限公司 Multi-data-source-oriented medicine entity identification method and device
CN106886516A (en) * 2017-02-27 2017-06-23 竹间智能科技(上海)有限公司 The method and device of automatic identification statement relationship and entity
US10217030B2 (en) 2017-06-14 2019-02-26 International Business Machines Corporation Hieroglyphic feature-based data processing
US10204289B2 (en) 2017-06-14 2019-02-12 International Business Machines Corporation Hieroglyphic feature-based data processing
CN107330011A (en) * 2017-06-14 2017-11-07 北京神州泰岳软件股份有限公司 The recognition methods of the name entity of many strategy fusions and device
CN107808124B (en) * 2017-10-09 2019-03-26 平安科技(深圳)有限公司 Electronic device, medical text named entity recognition method, and storage medium
CN107808124A (en) * 2017-10-09 2018-03-16 平安科技(深圳)有限公司 Electronic device, medical text named entity recognition method, and storage medium
CN110196963A (en) * 2018-02-27 2019-09-03 北京京东尚科信息技术有限公司 Model generation, the method for semantics recognition, system, equipment and storage medium
CN108363701A (en) * 2018-04-13 2018-08-03 达而观信息科技(上海)有限公司 Named entity recognition method and system
CN108959529A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Determination method, apparatus, equipment and the storage medium of problem answers type
CN108829894A (en) * 2018-06-29 2018-11-16 北京百度网讯科技有限公司 Spoken word recognition and semantic recognition method and device
CN108829894B (en) * 2018-06-29 2021-11-12 北京百度网讯科技有限公司 Spoken word recognition and semantic recognition method and device
CN109753650A (en) * 2018-12-14 2019-05-14 昆明理工大学 A kind of Laotian name place name entity recognition method merging multiple features
CN111563380A (en) * 2019-01-25 2020-08-21 浙江大学 Named entity identification method and device
CN110543638A (en) * 2019-09-10 2019-12-06 杭州橙鹰数据技术有限公司 Named entity identification method and device
CN110543638B (en) * 2019-09-10 2022-12-27 杭州橙鹰数据技术有限公司 Named entity identification method and device
CN112464667A (en) * 2020-11-18 2021-03-09 北京华彬立成科技有限公司 Text entity identification method and device, electronic equipment and storage medium
CN112464667B (en) * 2020-11-18 2021-11-16 北京华彬立成科技有限公司 Text entity identification method and device, electronic equipment and storage medium
US11966699B2 (en) 2021-06-17 2024-04-23 International Business Machines Corporation Intent classification using non-correlated features
CN114330349A (en) * 2022-01-05 2022-04-12 北京航空航天大学 Specific field named entity recognition method

Similar Documents

Publication Publication Date Title
CN103309926A (en) Chinese and English-named entity identification method and system based on conditional random field (CRF)
CN105718586B (en) The method and device of participle
CN105957518B (en) A kind of method of Mongol large vocabulary continuous speech recognition
CN108959242B (en) Target entity identification method and device based on part-of-speech characteristics of Chinese characters
Rasooli et al. Joint parsing and disfluency detection in linear time
CN110210019A (en) A kind of event argument abstracting method based on recurrent neural network
CN105404621B (en) A kind of method and system that Chinese character is read for blind person
CN100568225C (en) The Words symbolization processing method and the system of numeral and special symbol string in the text
CN109637537B (en) Method for automatically acquiring annotated data to optimize user-defined awakening model
CN103678684A (en) Chinese word segmentation method based on navigation information retrieval
CN103020230A (en) Semantic fuzzy matching method
CN104166462A (en) Input method and system for characters
CN110119510B (en) Relationship extraction method and device based on transfer dependency relationship and structure auxiliary word
CN102737013A (en) Device and method for identifying statement emotion based on dependency relation
CN102693279A (en) Method, device and system for fast calculating comment similarity
JP2007087397A (en) Morphological analysis program, correction program, morphological analyzer, correcting device, morphological analysis method, and correcting method
CN104485107A (en) Name voice recognition method, name voice recognition system and name voice recognition equipment
CN103902525A (en) Uygur language part-of-speech tagging method
CN104346326A (en) Method and device for determining emotional characteristics of emotional texts
CN106383814A (en) Word segmentation method of English social media short text
CN110222338A (en) A kind of mechanism name entity recognition method
CN103885924A (en) Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method
Álvarez et al. Towards customized automatic segmentation of subtitles
CN102184172A (en) Chinese character reading system and method for blind people
Comas et al. Sibyl, a factoid question-answering system for spoken documents

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130918