CN105373529B - Intelligent word segmentation method based on a hidden Markov model - Google Patents

Intelligent word segmentation method based on a hidden Markov model (Download PDF)

Info

Publication number
CN105373529B
CN105373529B (application CN201510708169.7A / CN201510708169A)
Authority
CN
China
Prior art keywords
state
matrix
probability
observed value
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510708169.7A
Other languages
Chinese (zh)
Other versions
CN105373529A (en)
Inventor
邓剑波
马润宇
刘毓智
Current Assignee
Gansu Zhicheng Network Technology Co Ltd
Original Assignee
Gansu Zhicheng Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Gansu Zhicheng Network Technology Co Ltd
Priority to CN201510708169.7A
Publication of CN105373529A
Application granted
Publication of CN105373529B
Legal status: Expired - Fee Related

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models


Abstract

The present invention relates to an intelligent word segmentation method based on a hidden Markov model, comprising the following steps: (1) establish the hidden Markov model parameters λ = (N, M, L, π, A, B1, B2); (2) determine the state set Θ of the article; (3) after N, M and L are determined, abbreviate λ to λ = (π, A, B1, B2); (4) using a computer language, first segment a large number of articles with the mechanical segmentation method; then label their states with the computer, thereby forming the initial π, A, B1 and B2 matrices; (5) train the initial A, B1 and B2 matrices on the articles with the BW algorithm and re-estimate them with the BW re-estimation formulas, obtaining the new π, A, B1 and B2 matrices; (6) use the parameters of the new hidden Markov model to segment articles with the Viterbi algorithm.

Description

Intelligent word segmentation method based on a hidden Markov model
Technical field
The present invention relates to a Chinese word segmentation method, and more particularly to an intelligent word segmentation method based on a hidden Markov model.
Background technology
With the development of Internet technology, people's requirements for computer text processing grow ever higher. Software needs to input, display, edit and output articles, and the basis of these functions is the identification of the words in the text. Unlike English, however, Chinese words have no natural boundaries, so to improve the ability of Chinese software to process text, Chinese word segmentation must be performed.
At present, the main methods of Chinese word segmentation are the mechanical method, the understanding method and the statistical method. The mechanical method segments text by matching character strings against an existing dictionary, but it requires a large amount of dictionary data and is helpless against newly coined words. The understanding method has the computer analyse the meaning and grammar of each sentence; its drawback is that, owing to the complexity of Chinese, its algorithms are very difficult to implement. The statistical method estimates the probabilities between characters from large-scale training, thereby achieving segmentation.
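The mechanical (dictionary-based) method mentioned above can be sketched as a forward maximum-matching scan; the function name, dictionary and sentence below are illustrative toy inputs, not data from the patent:

```python
# Forward maximum matching: a minimal sketch of the "mechanical"
# (dictionary-based) segmentation method described above. The dictionary
# and sentence are illustrative toy data, not from the patent.
def max_match(sentence, dictionary, max_len=4):
    """Greedily take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(sentence):
        for j in range(min(max_len, len(sentence) - i), 0, -1):
            cand = sentence[i:i + j]
            if j == 1 or cand in dictionary:     # fall back to a single character
                words.append(cand)
                i += j
                break
    return words

print(max_match("中国人民银行", {"中国", "人民", "银行"}))  # → ['中国', '人民', '银行']
```

A real system would also need to handle words longer than max_len and resolve ambiguity, which is exactly where the statistical model of the invention takes over.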
As a statistical analysis model, the hidden Markov model (Hidden Markov Model, HMM) has been successfully applied in fields such as speech recognition, activity recognition, text recognition and fault diagnosis. "Research on Chinese word segmentation based on hidden Markov models" (Wei Xiaoning, Computer Knowledge and Technology (Academic Exchange), No. 21, 2007) uses an HMM-based algorithm that first segments with a cascaded hidden Markov model (CHMM) and then applies layering, which both increases the accuracy of segmentation and preserves its efficiency. The hidden Markov model nevertheless lacks an analysis of the language environment, and it handles inaccurately characters whose frequency is low but which commonly occur in words, as well as characters that appear often but do not readily form words.
Asahara M, Goh C L, Wang X, et al. Combining segmenter and chunker for Chinese word segmentation[C]//Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17. Association for Computational Linguistics, 2003: 144-147.
Xue N. Chinese word segmentation as character tagging[J]. Computational Linguistics and Chinese Language Processing, 2003, 8(1): 29-48.
These two documents describe a hidden Markov Chinese word segmentation model based on character tagging; the model inherits the advantages of character tagging models and can treat the recognition of in-vocabulary words and unregistered (out-of-vocabulary) words uniformly, but it likewise lacks an analysis of the language environment.
Content of the invention
The technical problem to be solved by the invention is to provide an intelligent word segmentation method based on a hidden Markov model that can segment a large amount of Chinese text accurately and efficiently.
To solve the above problem, the intelligent word segmentation method based on a hidden Markov model of the present invention comprises the following steps:
(1) Establish the hidden Markov model parameters λ = (N, M, L, π, A, B1, B2),
where
N is the number of Markov-chain states in the model; denote the N states by θ_1, ..., θ_N and the state of the Markov chain at time t by q_t, with q_t ∈ (θ_1, ..., θ_N);
M is the number of possible single-Chinese-character observations for each state; denote the M observations by V_1, ..., V_M and the observation at time t by o_t, with o_t ∈ (V_1, ..., V_M);
L is the number of possible multi-Chinese-character observations for each state; denote the L extended observations by E_1, ..., E_L and the extended observation at time t by e_t, with e_t ∈ (E_1, ..., E_L);
π denotes the probability of choosing each state when the sequence starts, π = (π_1, ..., π_N), where 1 ≤ i ≤ N;
A denotes the transition probability matrix for choosing the next state given the current state, A = (a_ij)_{N×N}, where 1 ≤ i, j ≤ N;
B1 denotes the probability matrix of observation V_k appearing in the j-th state, B1 = (b_jk)_{N×M}, where 1 ≤ j ≤ N, 1 ≤ k ≤ M;
B2 denotes the probability matrix of observations s and k appearing consecutively in the j-th state, i.e. the extended observation probability matrix, B2 = (b′_jk)_{N×L}, where 1 ≤ j ≤ N, 1 ≤ k ≤ L;
(2) Determine the state set Θ of the article: following the linguistic rules of Chinese, take the state set of Chinese characters to be the four states word-head H, word-middle Z, word-tail E and single-character word S;
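The four-state scheme can be illustrated by labelling a pre-segmented sentence; the helper name and the example words are ours, not from the patent:

```python
# Labelling a pre-segmented sentence with the four states above:
# H (word head), Z (word middle), E (word end), S (single-character word).
# The helper name and example words are illustrative assumptions.
def label_states(words):
    states = []
    for w in words:
        if len(w) == 1:
            states.append("S")
        else:
            states.append("H")
            states.extend("Z" * (len(w) - 2))   # all interior characters
            states.append("E")
    return states

print(label_states(["北京", "大学生", "来"]))  # → ['H', 'E', 'H', 'Z', 'E', 'S']
```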
(3) After N, M and L are determined, abbreviate λ = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2);
(4) Using a computer language, first segment a large number of articles with the mechanical segmentation method; then label their states with the computer and count the probability with which each character appears in each state, forming the initial π, A, B1 and B2 matrices;
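The counting of initial matrices from a state-labelled corpus might look like the following minimal sketch; the function name, data layout and toy corpus are our assumptions, and B2 (over extended observations) would be counted analogously:

```python
from collections import Counter, defaultdict

# Counting-based initial estimates for pi, A and B1 from a state-labelled
# corpus, as in step (4). Function name, toy corpus and data layout are
# illustrative assumptions; B2 would be counted the same way over
# extended observations.
STATES = ["H", "Z", "E", "S"]

def count_hmm(labelled_sentences):
    """labelled_sentences: list of sentences, each a list of (char, state)."""
    pi = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in labelled_sentences:
        pi[sent[0][1]] += 1                      # state that starts the sentence
        for ch, st in sent:
            emit[st][ch] += 1                    # B1 counts: character given state
        for (_, s1), (_, s2) in zip(sent, sent[1:]):
            trans[s1][s2] += 1                   # A counts: state bigrams
    total = sum(pi.values())
    pi_prob = {s: pi[s] / total for s in STATES}
    A = {s1: ({s2: trans[s1][s2] / sum(trans[s1].values()) for s2 in STATES}
              if trans[s1] else {})
         for s1 in STATES}
    return pi_prob, A, emit
```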
(5) Train the initial A, B1 and B2 matrices on the articles with the BW algorithm: denote the article observation sequence by O and the extended observation sequence by EO; compute the expected values and the conditional probability P(EO|λ) with which the sequence occurs under the current parameters, re-estimate the observation probability of each observed element with the BW re-estimation formulas, and compute the new hidden Markov model parameters π̄, Ā, B̄1 and B̄2; iterate so that P(EO|λ) converges to a maximum, thereby obtaining the new π, A, B1 and B2 matrices;
where the re-estimation formulas are π̄_i = γ_1(i), ā_ij = Σ_{t=1..T−1} ξ_t(i, j) / Σ_{t=1..T−1} γ_t(i) and b̄_j(k) = Σ_{t: e_t = E_k} γ_t(j) / Σ_{t=1..T} γ_t(j);
(6) Using the parameters λ = (π, A, B1, B2) of the new hidden Markov model, perform Chinese word segmentation with the Viterbi algorithm: divide the article into multiple sentences at the punctuation marks and segment each sentence, obtaining the segmented article.
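The sentence-splitting part of step (6) can be sketched as follows; the particular punctuation set chosen here is an illustrative assumption:

```python
import re

# Splitting an article into sentences at punctuation marks before
# decoding each one, as in step (6). The punctuation set is a small
# illustrative assumption, not an exhaustive list.
def split_sentences(article):
    parts = re.split(r"[。！？；，]", article)
    return [p for p in parts if p]               # drop empty trailing pieces

print(split_sentences("今天天气好。我们去公园！"))  # → ['今天天气好', '我们去公园']
```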
The BW algorithm in step (5) refers to the following: given the observation sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a model λ = (π, A, B1, B2) that maximises the probability P(EO|λ) of the extended observation sequence EO;
Define the observation probability function b_j(e_t) as the probability of the extended observation e_t in state θ_j;
The forward algorithm is:
Initialisation: for 1 ≤ i ≤ N, α_1(i) = π_i·b_i(e_1);
Recursion: for 1 ≤ t ≤ T−1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1..N} α_t(i)·a_ij]·b_j(e_{t+1});
Termination: P(EO|λ) = Σ_{i=1..N} α_T(i);
The backward algorithm is:
Initialisation: for 1 ≤ i ≤ N, β_T(i) = 1;
Recursion: for t = T−1, T−2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1..N} a_ij·b_j(e_{t+1})·β_{t+1}(j);
Termination: P(EO|λ) = Σ_{i=1..N} π_i·b_i(e_1)·β_1(i);
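The forward and backward recursions can be sketched in pure Python for a toy two-state model; all parameter values are illustrative assumptions, not the patent's trained matrices:

```python
# Pure-Python sketch of the forward and backward recursions for a toy
# two-state HMM; all parameter values are illustrative assumptions.
def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]        # initialisation
    for o in obs[1:]:                                         # recursion
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha                                 # P(O|model) = sum(alpha[-1])

def backward(A, B, obs, N):
    beta = [[1.0] * N]                                        # beta_T(i) = 1
    for o in reversed(obs[1:]):                               # recursion, t = T-1, ..., 1
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta
```

For every t, Σ_i α_t(i)·β_t(i) equals the same sequence probability, which is a useful correctness check on an implementation.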
From the forward and backward variables so defined, the BW algorithm has P(EO|λ) = Σ_{i=1..N} Σ_{j=1..N} α_t(i)·a_ij·b_j(e_{t+1})·β_{t+1}(j) for 1 ≤ t ≤ T−1;
Define ξ_t(i, j) as the probability, for the given training sequence O and model λ, of being in state θ_i at time t and in state θ_j at time t+1, i.e. ξ_t(i, j) = α_t(i)·a_ij·b_j(e_{t+1})·β_{t+1}(j) / P(EO|λ); the probability of being in state θ_i at time t is then γ_t(i) = Σ_{j=1..N} ξ_t(i, j).
The Viterbi algorithm in step (6) refers to the following: define δ_t(i) as the maximum probability, over all paths q_1, q_2, ..., q_t with q_t = θ_i, of producing e_1, e_2, ..., e_t; the optimal state sequence Q* is then obtained as follows:
Initialisation: for 1 ≤ i ≤ N, δ_1(i) = π_i·b_i(e_1) and ψ_1(i) = 0;
Recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N}[δ_{t−1}(i)·a_ij]·b_j(e_t) and ψ_t(j) = argmax_{1≤i≤N}[δ_{t−1}(i)·a_ij];
Termination: P* = max_{1≤i≤N} δ_T(i) and q*_T = argmax_{1≤i≤N} δ_T(i);
Path backtracking: q*_t = ψ_{t+1}(q*_{t+1}) for t = T−1, T−2, ..., 1, which determines the optimal state sequence.
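The initialisation, recursion, termination and backtracking steps above can be sketched as follows, again over a toy two-state model with illustrative parameters:

```python
# Viterbi decoding following the initialisation / recursion /
# termination / backtracking steps above, for a toy two-state model;
# parameter values are illustrative assumptions.
def viterbi(pi, A, B, obs):
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]          # initialisation
    psi = []
    for o in obs[1:]:                                         # recursion
        step, new_delta = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        psi.append(step)
        delta = new_delta
    q = [max(range(N), key=lambda i: delta[i])]               # termination
    for step in reversed(psi):                                # path backtracking
        q.insert(0, step[q[0]])
    return q
```

In the segmentation setting, the returned state indices map back to the H/Z/E/S labels, from which word boundaries are read off directly.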
Compared with the prior art, the present invention has the following advantages:
1. The present invention first trains the existing observation probability matrices and the state transition probability matrix with the Baum-Welch algorithm (BW algorithm for short) to obtain new observation probability and state transition matrices, and then, based on the new matrices, segments the article with the Viterbi algorithm. Unlike the traditional hidden Markov model, the present invention adopts a new observation probability matrix, the extended observation probability matrix; this matrix covers not only the information of each individual Chinese character itself but also the information of its context, effectively reducing the errors of statistical segmentation and greatly improving the accuracy of Chinese word segmentation.
2. The present invention can segment a large amount of Chinese text accurately and efficiently, which is the premise of a series of other text processing technologies.
Brief description of the drawings
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an observation state after extension in an example of the present invention.
Fig. 2 is a schematic diagram of the initial values of the A matrix in an example of the present invention.
Embodiment
An intelligent word segmentation method based on a hidden Markov model comprises the following steps:
(1) Establish the hidden Markov model parameters λ = (N, M, L, π, A, B1, B2),
where
N is the number of Markov-chain states in the model; denote the N states by θ_1, ..., θ_N and the state of the Markov chain at time t by q_t, with q_t ∈ (θ_1, ..., θ_N);
M is the number of possible single-Chinese-character observations for each state; denote the M observations by V_1, ..., V_M and the observation at time t by o_t, with o_t ∈ (V_1, ..., V_M);
L is the number of possible multi-Chinese-character observations for each state; denote the L extended observations by E_1, ..., E_L and the extended observation at time t by e_t, with e_t ∈ (E_1, ..., E_L);
π denotes the probability of choosing each state when the sequence starts, π = (π_1, ..., π_N), where 1 ≤ i ≤ N;
A denotes the transition probability matrix for choosing the next state given the current state, A = (a_ij)_{N×N}, where 1 ≤ i, j ≤ N;
B1 denotes the probability matrix of observation V_k appearing in the j-th state, B1 = (b_jk)_{N×M}, where 1 ≤ j ≤ N, 1 ≤ k ≤ M;
B2 denotes the probability matrix of observations s and k appearing consecutively in the j-th state, i.e. the extended observation probability matrix, B2 = (b′_jk)_{N×L}, where 1 ≤ j ≤ N, 1 ≤ k ≤ L;
(2) Determine the state set Θ of the article: following the linguistic rules of Chinese, take the state set of Chinese characters to be the four states word-head H, word-middle Z, word-tail E and single-character word S.
(3) After N, M and L are determined, abbreviate λ = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2).
(4) Using a computer language, first segment a large number of articles with the mechanical segmentation method; then label their states with the computer and count the probability with which each character appears in each state, forming the initial π, A, B1 and B2 matrices.
For example: the element of the observation sequence at time t is formed jointly by times t and t−1. Specific to word segmentation, the observation is extended to two Chinese characters, namely the character at the current moment plus the character at the previous moment, giving an extended observation state (as shown in Fig. 1). The state at each moment of the state sequence is determined by the observation (o_t) at each moment of the character sequence; after extension the observation becomes the two Chinese characters of the figure (the current character and the previous one), so the observation at time t (t ≠ 1) is e_t = (o_{t−1}, o_t). The initial values of the A matrix can be obtained by counting; owing to the logical rules of Chinese, some of its values should be 0, as shown in Fig. 2 and Table 1.
Table 1
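The extension e_t = (o_{t−1}, o_t) described above can be illustrated directly; the helper name and the example string are ours, not from the patent:

```python
# Building the extended observations of Fig. 1: each position t > 1 is
# observed as the pair (previous character, current character), so B2 can
# score character bigrams in context. Helper name and example are
# illustrative assumptions.
def extend_observations(chars):
    return [chars[0]] + [chars[t - 1] + chars[t] for t in range(1, len(chars))]

print(extend_observations("南京市"))  # → ['南', '南京', '京市']
```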
(5) Train the initial A, B1 and B2 matrices on the articles with the BW algorithm: denote the article observation sequence by O and the extended observation sequence by EO; compute the expected values and the conditional probability P(EO|λ) with which the sequence occurs under the current parameters, re-estimate the observation probability of each observed element with the BW re-estimation formulas, and compute the new hidden Markov model parameters π̄, Ā, B̄1 and B̄2; iterate so that P(EO|λ) converges to a maximum, thereby obtaining the new π, A, B1 and B2 matrices;
where the re-estimation formulas are π̄_i = γ_1(i), ā_ij = Σ_{t=1..T−1} ξ_t(i, j) / Σ_{t=1..T−1} γ_t(i) and b̄_j(k) = Σ_{t: e_t = E_k} γ_t(j) / Σ_{t=1..T} γ_t(j);
Here the BW algorithm refers to the following: given the observation sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a model λ = (π, A, B1, B2) that maximises the probability P(EO|λ) of the extended observation sequence EO;
Define the observation probability function b_j(e_t) as the probability of the extended observation e_t in state θ_j;
The forward algorithm is:
Initialisation: for 1 ≤ i ≤ N, α_1(i) = π_i·b_i(e_1);
Recursion: for 1 ≤ t ≤ T−1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1..N} α_t(i)·a_ij]·b_j(e_{t+1});
Termination: P(EO|λ) = Σ_{i=1..N} α_T(i);
The backward algorithm is:
Initialisation: for 1 ≤ i ≤ N, β_T(i) = 1;
Recursion: for t = T−1, T−2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1..N} a_ij·b_j(e_{t+1})·β_{t+1}(j);
Termination: P(EO|λ) = Σ_{i=1..N} π_i·b_i(e_1)·β_1(i);
From the forward and backward variables so defined, the BW algorithm has P(EO|λ) = Σ_{i=1..N} Σ_{j=1..N} α_t(i)·a_ij·b_j(e_{t+1})·β_{t+1}(j) for 1 ≤ t ≤ T−1;
Define ξ_t(i, j) as the probability, for the given training sequence O and model λ, of being in state θ_i at time t and in state θ_j at time t+1, i.e. ξ_t(i, j) = α_t(i)·a_ij·b_j(e_{t+1})·β_{t+1}(j) / P(EO|λ); the probability of being in state θ_i at time t is then γ_t(i) = Σ_{j=1..N} ξ_t(i, j).
(6) Using the parameters λ = (π, A, B1, B2) of the new hidden Markov model, perform Chinese word segmentation with the Viterbi algorithm: divide the article into multiple sentences at the punctuation marks and segment each sentence, obtaining the segmented article.
Here the Viterbi algorithm refers to the following: define δ_t(i) as the maximum probability, over all paths q_1, q_2, ..., q_t with q_t = θ_i, of producing e_1, e_2, ..., e_t; the optimal state sequence Q* is then obtained as follows:
Initialisation: for 1 ≤ i ≤ N, δ_1(i) = π_i·b_i(e_1) and ψ_1(i) = 0;
Recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N}[δ_{t−1}(i)·a_ij]·b_j(e_t) and ψ_t(j) = argmax_{1≤i≤N}[δ_{t−1}(i)·a_ij];
Termination: P* = max_{1≤i≤N} δ_T(i) and q*_T = argmax_{1≤i≤N} δ_T(i);
Path backtracking: q*_t = ψ_{t+1}(q*_{t+1}) for t = T−1, T−2, ..., 1, which determines the optimal state sequence.

Claims (2)

1. An intelligent word segmentation method based on a hidden Markov model, comprising the following steps:
(1) Establish the hidden Markov model parameters λ = (N, M, L, π, A, B1, B2),
where
N is the number of Markov-chain states in the model; denote the N states by θ_1, ..., θ_N and the state of the Markov chain at time t by q_t, with q_t ∈ (θ_1, ..., θ_N);
M is the number of possible single-Chinese-character observations for each state; denote the M observations by V_1, ..., V_M and the observation at time t by o_t, with o_t ∈ (V_1, ..., V_M);
L is the number of possible multi-Chinese-character observations for each state; denote the L extended observations by E_1, ..., E_L and the extended observation at time t by e_t, with e_t ∈ (E_1, ..., E_L);
π denotes the probability of choosing each state when the sequence starts, π = (π_1, ..., π_N), where 1 ≤ i ≤ N;
A denotes the transition probability matrix for choosing the next state given the current state, A = (a_ij)_{N×N}, where 1 ≤ i, j ≤ N;
B1 denotes the probability matrix of the k-th of the M observations appearing for the j-th state, B1 = (b_jk)_{N×M}, where 1 ≤ j ≤ N, 1 ≤ k ≤ M;
B2 denotes the probability matrix of the k-th of the L extended observations appearing for the j-th state, i.e. the extended observation probability matrix, B2 = (b′_jk)_{N×L}, where 1 ≤ j ≤ N, 1 ≤ k ≤ L;
(2) Determine the state set Θ of the article: following the linguistic rules of Chinese, take the state set of Chinese characters to be the four states word-head H, word-middle Z, word-tail E and single-character word S;
(3) After N, M and L are determined, abbreviate λ = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2);
(4) Using a computer language, first segment a large number of articles with the mechanical segmentation method; then label their states with the computer and count the probability with which each character appears in each state, forming the initial π, A, B1 and B2 matrices;
(5) Train the initial A, B1 and B2 matrices on the articles with the BW algorithm: denote the article observation sequence by O and the extended observation sequence by EO; compute the expected values and the conditional probability P(EO|λ) with which the sequence occurs under the current parameters, re-estimate the observation probability of each observed element with the BW re-estimation formulas, and compute the new hidden Markov model parameters π̄, Ā, B̄1 and B̄2; iterate so that P(EO|λ) converges to a maximum, thereby obtaining the new π, A, B1 and B2 matrices;
where the re-estimation formulas are π̄_i = γ_1(i), ā_ij = Σ_{t=1..T−1} ξ_t(i, j) / Σ_{t=1..T−1} γ_t(i) and b̄_j(k) = Σ_{t: e_t = E_k} γ_t(j) / Σ_{t=1..T} γ_t(j); T refers to the total length of the sequence;
The BW algorithm refers to the following: given the observation sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a model λ = (π, A, B1, B2) that maximises the probability P(EO|λ) of the extended observation sequence EO;
Define the observation probability function b_j(e_t) as the probability of the extended observation e_t in state θ_j;
The forward algorithm is:
Initialisation: for 1 ≤ i ≤ N, α_1(i) = π_i·b_i(e_1);
Recursion: for 1 ≤ t ≤ T−1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1..N} α_t(i)·a_ij]·b_j(e_{t+1});
Termination: P(EO|λ) = Σ_{i=1..N} α_T(i);
The backward algorithm is:
Initialisation: for 1 ≤ i ≤ N, β_T(i) = 1;
Recursion: for t = T−1, T−2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1..N} a_ij·b_j(e_{t+1})·β_{t+1}(j);
Termination: P(EO|λ) = Σ_{i=1..N} π_i·b_i(e_1)·β_1(i);
From the forward and backward variables so defined, the BW algorithm has P(EO|λ) = Σ_{i=1..N} Σ_{j=1..N} α_t(i)·a_ij·b_j(e_{t+1})·β_{t+1}(j) for 1 ≤ t ≤ T−1;
Define ξ_t(i, j) as the probability, for the given training sequence O and model λ, of being in state θ_i at time t and in state θ_j at time t+1, i.e. ξ_t(i, j) = α_t(i)·a_ij·b_j(e_{t+1})·β_{t+1}(j) / P(EO|λ); the probability of being in state θ_i at time t is then γ_t(i) = Σ_{j=1..N} ξ_t(i, j);
(6) Using the parameters λ = (π, A, B1, B2) of the new hidden Markov model, perform Chinese word segmentation with the Viterbi algorithm: divide the article into multiple sentences at the punctuation marks and segment each sentence, obtaining the segmented article.
2. The intelligent word segmentation method based on a hidden Markov model according to claim 1, characterised in that the Viterbi algorithm in step (6) refers to the following: define δ_t(i) as the maximum probability, over all paths q_1, q_2, ..., q_t with q_t = θ_i, of producing e_1, e_2, ..., e_t; the optimal state sequence Q* is then obtained as follows:
Initialisation: for 1 ≤ i ≤ N, δ_1(i) = π_i·b_i(e_1) and ψ_1(i) = 0;
Recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N}[δ_{t−1}(i)·a_ij]·b_j(e_t) and ψ_t(j) = argmax_{1≤i≤N}[δ_{t−1}(i)·a_ij];
Termination: P* = max_{1≤i≤N} δ_T(i) and q*_T = argmax_{1≤i≤N} δ_T(i);
Path backtracking: q*_t = ψ_{t+1}(q*_{t+1}) for t = T−1, T−2, ..., 1, which determines the optimal state sequence.
CN201510708169.7A 2015-10-28 2015-10-28 Intelligent word segmentation method based on a hidden Markov model Expired - Fee Related CN105373529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510708169.7A CN105373529B (en) 2015-10-28 2015-10-28 Intelligent word segmentation method based on a hidden Markov model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510708169.7A CN105373529B (en) 2015-10-28 2015-10-28 Intelligent word segmentation method based on a hidden Markov model

Publications (2)

Publication Number Publication Date
CN105373529A CN105373529A (en) 2016-03-02
CN105373529B true CN105373529B (en) 2018-04-20

Family

ID=55375737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510708169.7A Expired - Fee Related CN105373529B (en) 2015-10-28 2015-10-28 Intelligent word segmentation method based on a hidden Markov model

Country Status (1)

Country Link
CN (1) CN105373529B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912570B * 2016-03-29 2019-11-15 北京工业大学 Resume key-field extraction method based on a hidden Markov model
CN106059829B * 2016-07-15 2019-04-12 北京邮电大学 Network utilization awareness method based on hidden Markov models
CN106569997B * 2016-10-19 2019-12-10 中国科学院信息工程研究所 Scientific compound phrase recognition method based on a hidden Markov model
CN107194176B (en) * 2017-05-23 2020-07-28 复旦大学 Method for filling data and predicting behaviors of intelligent operation of disabled person
CN107273356B (en) 2017-06-14 2020-08-11 北京百度网讯科技有限公司 Artificial intelligence based word segmentation method, device, server and storage medium
CN107273360A (en) * 2017-06-21 2017-10-20 成都布林特信息技术有限公司 Chinese notional word extraction algorithm based on semantic understanding
CN107832307B (en) * 2017-11-28 2021-02-23 南京理工大学 Chinese word segmentation method based on undirected graph and single-layer neural network
CN109933778B (en) * 2017-12-18 2024-03-05 北京京东尚科信息技术有限公司 Word segmentation method, word segmentation device and computer readable storage medium
CN108170680A (en) * 2017-12-29 2018-06-15 厦门市美亚柏科信息股份有限公司 Keyword recognition method, terminal device and storage medium based on Hidden Markov Model
CN108647208A * 2018-05-09 2018-10-12 上海应用技术大学 Novel Chinese word segmentation method
CN109408801A * 2018-08-28 2019-03-01 昆明理工大学 Chinese word segmentation method based on the naive Bayes algorithm
CN109284358B (en) * 2018-09-05 2020-08-28 普信恒业科技发展(北京)有限公司 Chinese address noun hierarchical method and device
CN109711121B (en) * 2018-12-27 2021-03-12 清华大学 Text steganography method and device based on Markov model and Huffman coding
CN110162794A * 2019-05-29 2019-08-23 腾讯科技(深圳)有限公司 Word segmentation method and server
CN110562653B (en) * 2019-07-30 2021-02-09 国网浙江省电力有限公司嘉兴供电公司 Power transformation operation detection intelligent decision system and maintenance system based on ubiquitous power Internet of things
CN111489030B * 2020-04-09 2021-10-15 河北利至人力资源服务有限公司 Resignation prediction method and system based on text word segmentation
CN111767734A * 2020-06-11 2020-10-13 安徽旅贲科技有限公司 Word segmentation method and system based on a multilayer hidden Markov model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101082908A (en) * 2007-06-26 2007-12-05 腾讯科技(深圳)有限公司 Method and system for dividing Chinese sentences
CN101201818A (en) * 2006-12-13 2008-06-18 李萍 Method for calculating language structure, executing participle, machine translation and speech recognition using HMM
CN104408034A (en) * 2014-11-28 2015-03-11 武汉数为科技有限公司 Text big data-oriented Chinese word segmentation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6633819B2 (en) * 1999-04-15 2003-10-14 The Trustees Of Columbia University In The City Of New York Gene discovery through comparisons of networks of structural and functional relationships among known genes and proteins

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101201818A (en) * 2006-12-13 2008-06-18 李萍 Method for calculating language structure, executing participle, machine translation and speech recognition using HMM
CN101082908A (en) * 2007-06-26 2007-12-05 腾讯科技(深圳)有限公司 Method and system for dividing Chinese sentences
CN104408034A (en) * 2014-11-28 2015-03-11 武汉数为科技有限公司 Text big data-oriented Chinese word segmentation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Combining Segmenter and Chunker for Chinese Word Segmentation; Masayuki Asahara et al.; Proceedings of the second SIGHAN workshop on Chinese language processing; 2003-07-12; pp. 144-147 *
HMM-based Chinese word segmentation algorithm using word-position information (基于词位信息的HMM中文分词算法); Liu Shanfeng et al.; Proceedings of the 12th National Conference on Man-Machine Speech Communication (第十二届全国人机语音通讯学术会议); 2013-08-05; pp. 205-208 *

Also Published As

Publication number Publication date
CN105373529A (en) 2016-03-02

Similar Documents

Publication Publication Date Title
CN105373529B (en) Intelligent word segmentation method based on a hidden Markov model
CN107145483B (en) A kind of adaptive Chinese word cutting method based on embedded expression
EP3767516A1 (en) Named entity recognition method, apparatus, and computer-readable recording medium
CN106598939B (en) A kind of text error correction method and device, server, storage medium
CN111046946B (en) Burma language image text recognition method based on CRNN
CN107944559B (en) Method and system for automatically identifying entity relationship
CN109325112B (en) A kind of across language sentiment analysis method and apparatus based on emoji
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN107346340A (en) A kind of user view recognition methods and system
CN106570456A (en) Handwritten Chinese character recognition method based on full-convolution recursive network
CN107203511A (en) A kind of network text name entity recognition method based on neutral net probability disambiguation
CN105068997B (en) The construction method and device of parallel corpora
CN109376242A (en) Text classification algorithm based on Recognition with Recurrent Neural Network variant and convolutional neural networks
CN107168957A (en) A kind of Chinese word cutting method
CN111680488B (en) Cross-language entity alignment method based on knowledge graph multi-view information
WO2017177809A1 (en) Word segmentation method and system for language text
CN110222329B (en) Chinese word segmentation method and device based on deep learning
CN110909549B (en) Method, device and storage medium for punctuating ancient Chinese
CN110222328B (en) Method, device and equipment for labeling participles and parts of speech based on neural network and storage medium
CN105261358A (en) N-gram grammar model constructing method for voice identification and voice identification system
CN106610937A (en) Information theory-based Chinese automatic word segmentation method
CN110826298B (en) Statement coding method used in intelligent auxiliary password-fixing system
CN107273426A (en) A kind of short text clustering method based on deep semantic route searching
CN108647191A (en) It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method
CN104050255A (en) Joint graph model-based error correction method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180420