CN105373529B - Intelligent Chinese word segmentation method based on a hidden Markov model - Google Patents
Intelligent Chinese word segmentation method based on a hidden Markov model Download PDF Info
- Publication number: CN105373529B
- Application number: CN201510708169.7A
- Authority
- CN
- China
- Prior art keywords
- state
- matrix
- probability
- observed value
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
Abstract
The present invention relates to an intelligent Chinese word segmentation method based on a hidden Markov model. The method comprises the following steps: (1) establish the hidden Markov model parameters; (2) determine the state set Θ of the text; (3) after N, M and L are determined, abbreviate the model λ = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2); (4) using a computer language, first segment a large volume of text with a mechanical (dictionary-based) segmentation method, then label the states by computer, thereby forming the initial π matrix, A matrix, B1 matrix and B2 matrix; (5) train the initial A, B1 and B2 matrices on text with the BW algorithm, re-estimate them with the BW re-estimation formulas, and obtain the new π matrix, A matrix and B1, B2 matrices; (6) use the parameters of the new hidden Markov model to segment the text.
Description
Technical field
The present invention relates to a Chinese word segmentation method, and more particularly to an intelligent Chinese word segmentation method based on a hidden Markov model.
Background technology
With the development of Internet technology, people's demands on computer text processing grow ever higher. Software must support inputting, displaying, editing and outputting articles, and the basis of all these functions is the recognition of the words in the text. Unlike English, however, Chinese words have no natural boundaries, so improving the ability of Chinese software to process text requires Chinese word segmentation.

At present, the main approaches to Chinese word segmentation are the mechanical (dictionary-based) method, the comprehension method and the statistical method. The mechanical method segments text by matching character strings that already exist in a dictionary; it requires a large amount of data and is helpless against newly coined words. The comprehension method has the computer segment by analysing the meaning and grammar of sentences; its drawback is that, owing to the complexity of Chinese, its algorithms are very difficult to realise. The statistical method performs segmentation by estimating, through large-scale training, the probabilities between characters.
The hidden Markov model (HMM), as a statistical analysis model, has been successfully applied in fields such as speech recognition, activity recognition, character recognition and fault diagnosis. "Research on Chinese Word Segmentation Based on the Hidden Markov Model" (Wei Xiaoning, Computer Knowledge and Technology (Academic Exchange), No. 21, 2007) uses an HMM-based algorithm that segments with a cascaded HMM (CHMM) and then applies layering, which both increases segmentation accuracy and preserves segmentation efficiency. However, the hidden Markov model lacks any analysis of the linguistic context, and handles poorly both low-frequency words and frequently co-occurring character sequences that do not in fact form words.

Asahara M, Goh C L, Wang X, et al. Combining segmenter and chunker for Chinese word segmentation[C]//Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Volume 17. Association for Computational Linguistics, 2003: 144-147.

Xue N. Chinese word segmentation as character tagging[J]. Computational Linguistics and Chinese Language Processing, 2003, 8(1): 29-48.

These two documents describe a character-tagging hidden Markov model for Chinese word segmentation. The model inherits the advantages of character-tagging models and can treat the recognition of in-vocabulary words and out-of-vocabulary words uniformly, but it likewise lacks an analysis of the linguistic context.
Summary of the invention
The technical problem to be solved by the present invention is to provide an intelligent Chinese word segmentation method based on a hidden Markov model that can segment large volumes of Chinese text accurately and efficiently.

To solve the above problem, the intelligent Chinese word segmentation method based on a hidden Markov model according to the present invention comprises the following steps:
(1) Establish the hidden Markov model parameters λ = (N, M, L, π, A, B1, B2), wherein
N is the number of Markov chain states in the model; the N states are denoted θ_1, ..., θ_N, and the state of the Markov chain at time t is denoted q_t, with q_t ∈ {θ_1, ..., θ_N};

M is the number of possible single-character observations per state; the M observations are denoted V_1, ..., V_M, and the observation at time t is denoted o_t, with o_t ∈ {V_1, ..., V_M};

L is the number of possible multi-character observations per state; the L extended observations are denoted E_1, ..., E_L, and the extended observation at time t is denoted e_t, with e_t ∈ {E_1, ..., E_L};

π denotes the probability of choosing each state at the start of the sequence, π = (π_1, ..., π_N) with π_i = P(q_1 = θ_i), 1 ≤ i ≤ N;

A denotes the transition probability matrix for choosing the next state given the current state, A = (a_ij)_{N×N} with a_ij = P(q_{t+1} = θ_j | q_t = θ_i), 1 ≤ i, j ≤ N;

B1 denotes the probability matrix of observation V_k appearing in state j, B1 = (b1_j(k))_{N×M}, where 1 ≤ j ≤ N, 1 ≤ k ≤ M;

B2 denotes the probability matrix of two observations appearing consecutively in state j, i.e. the extended observation probability matrix, B2 = (b2_j(k))_{N×L}, where 1 ≤ j ≤ N, 1 ≤ k ≤ L;
(2) Determine the state set Θ of the text: following the rules of the Chinese language, the Chinese character state set is chosen as the four states word-initial character H, word-internal character Z, word-final character E and single-character word S;
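As an illustrative sketch (not part of the patent text), the four-state labelling of step (2) can be expressed directly: each segmented word is mapped to a state string over {H, Z, E, S}.

```python
def label_states(words):
    """Map each segmented word to its H/Z/E/S state sequence:
    H = word-initial character, Z = word-internal character,
    E = word-final character, S = single-character word."""
    states = []
    for w in words:
        if len(w) == 1:
            states.append("S")
        else:
            states.append("H" + "Z" * (len(w) - 2) + "E")
    return "".join(states)

# A two-character word yields "HE", a three-character word "HZE".
print(label_states(["中国", "人民", "银行"]))  # -> HEHEHE
```

A corpus pre-segmented by the mechanical method can be pushed through such a labeller to produce the state-annotated training data used in step (4).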
(3) After N, M and L are determined, the model λ = (N, M, L, π, A, B1, B2) is abbreviated as λ = (π, A, B1, B2);
(4) Using a computer language, first segment a large volume of text with the mechanical segmentation method; then label the states by computer and count the probability with which each character appears in each state, thereby forming the initial π matrix, A matrix, B1 matrix and B2 matrix;
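A minimal sketch (names are illustrative, not taken from the patent) of how step (4) can form the initial matrices by counting state and character frequencies over a pre-segmented, state-labelled corpus:

```python
from collections import Counter, defaultdict

def initial_matrices(tagged_sentences):
    """tagged_sentences: list of [(char, state), ...] sequences.
    Returns the initial pi vector, A matrix and B1 matrix as
    relative frequencies; the B2 matrix would be counted the same
    way over consecutive character pairs."""
    pi = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in tagged_sentences:
        pi[sent[0][1]] += 1                      # first state of each sentence
        for (c, s) in sent:
            emit[s][c] += 1                      # character emissions per state
        for (_, s1), (_, s2) in zip(sent, sent[1:]):
            trans[s1][s2] += 1                   # state-to-state transitions
    def norm(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()}
    return (norm(pi),
            {s: norm(trans[s]) for s in trans},
            {s: norm(emit[s]) for s in emit})
```

In practice these raw relative frequencies would be smoothed before being handed to the Baum-Welch training of step (5).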
(5) Train the initial A matrix and the initial B1 and B2 matrices on text with the BW algorithm. The observation sequence of the text is denoted O and the extended observation sequence is denoted EO. Compute the expected values and the conditional probability of the sequence under the current parameters, re-estimate the observation probability of each observation element with the BW re-estimation formulas, and compute the parameters of the new hidden Markov model λ̄ = (π̄, Ā, B̄1, B̄2); iterate until P(EO|λ) converges to a maximum, thereby obtaining the new π matrix, A matrix and B1, B2 matrices;

wherein:

π̄_i = γ_1(i);

ā_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i);

b̄_j(k) = Σ_{t: e_t = E_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j);
(6) Using the parameters λ̄ = (π̄, Ā, B̄1, B̄2) of the new hidden Markov model, carry out Chinese word segmentation with the Viterbi algorithm: the text is divided into sentences at punctuation marks and each sentence is segmented, yielding the segmented text.
The BW algorithm in step (5) is as follows: given the observation sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a model λ̄ that maximizes the probability P(EO|λ̄) of the extended observation sequence EO;

Define the observation probability function b_j(e_t) as the probability of the extended observation e_t in state j;
The forward algorithm is:

Initialization: for 1 ≤ i ≤ N, α_1(i) = π_i b_i(e_1);

Recursion: for 1 ≤ t ≤ T-1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(e_{t+1});

Termination: P(EO|λ) = Σ_{i=1}^{N} α_T(i);
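The forward recursion just described can be sketched in Python (an illustrative, generic implementation, not the patent's own code; the emission probability b(j, o) is assumed supplied by the caller):

```python
def forward(pi, a, b, obs):
    """Forward algorithm: alpha[t][i] = P(e_1..e_t, q_t = i | lambda).
    pi: initial state probabilities, a: N x N transition matrix,
    b(j, o): emission probability of observation o in state j."""
    n = len(pi)
    alpha = [[pi[i] * b(i, obs[0]) for i in range(n)]]    # initialization
    for o in obs[1:]:                                     # recursion
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b(j, o)
                      for j in range(n)])
    return alpha, sum(alpha[-1])                          # termination: P(EO | lambda)
```

For long sequences a real implementation would work in log space or rescale alpha at each step to avoid underflow.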
The backward algorithm is:

Initialization: for 1 ≤ i ≤ N, β_T(i) = 1;

Recursion: for t = T-1, T-2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1}^{N} a_ij b_j(e_{t+1}) β_{t+1}(j);

Termination: P(EO|λ) = Σ_{i=1}^{N} π_i b_i(e_1) β_1(i);
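A matching sketch of the backward pass (again illustrative Python under the same conventions, with the emission function supplied by the caller); it must yield the same P(EO|λ) as the forward pass:

```python
def backward(pi, a, b, obs):
    """Backward algorithm: beta[t][i] = P(e_{t+1}..e_T | q_t = i, lambda)."""
    n = len(pi)
    beta = [[1.0] * n]                                   # initialization: beta_T = 1
    for o in reversed(obs[1:]):                          # recursion, t = T-1 .. 1
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b(j, o) * nxt[j] for j in range(n))
                        for i in range(n)])
    p = sum(pi[i] * b(i, obs[0]) * beta[0][i] for i in range(n))
    return beta, p                                       # same P(EO | lambda) as forward
```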
From the forward and backward variables thus defined, the BW algorithm gives

P(EO|λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j), 1 ≤ t ≤ T-1;

Define ξ_t(i, j) as the probability that, given the training sequence O and the model λ, the chain is in state θ_i at time t and in state θ_j at time t+1, i.e.

ξ_t(i, j) = α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j) / P(EO|λ);

and the probability of being in state θ_i at time t is γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j).
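The ξ and γ quantities can be computed directly from the forward and backward variables; the following is an illustrative sketch (function and argument names are assumptions, not the patent's):

```python
def xi_gamma(alpha, beta, a, b, obs, p):
    """xi[t][i][j] and gamma[t][i] per the Baum-Welch definitions;
    alpha, beta are the forward/backward tables and p = P(EO | lambda)."""
    n = len(a)
    T = len(obs)
    xi = [[[alpha[t][i] * a[i][j] * b(j, obs[t + 1]) * beta[t + 1][j] / p
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    gamma = [[sum(xi[t][i]) for i in range(n)] for t in range(T - 1)]
    return xi, gamma
```

Summing ξ and γ over time then yields the re-estimated ā_ij and b̄_j(k) of the formulas above, and each row of gamma sums to 1.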
The Viterbi algorithm in step (6) is as follows: define δ_t(i) as the maximum probability of producing e_1, e_2, ..., e_t along a single path q_1, q_2, ..., q_t with q_t = θ_i, i.e.

δ_t(i) = max_{q_1, ..., q_{t-1}} P(q_1, ..., q_{t-1}, q_t = θ_i, e_1, ..., e_t | λ).

The optimal state sequence Q* is then obtained as follows:

Initialization: for 1 ≤ i ≤ N, δ_1(i) = π_i b_i(e_1); ψ_1(i) = 0;

Recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(e_t); ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij];

Termination: P* = max_{1≤i≤N} δ_T(i); q*_T = argmax_{1≤i≤N} δ_T(i);

Backtracking: q*_t = ψ_{t+1}(q*_{t+1}) for t = T-1, T-2, ..., 1, which determines the optimal state sequence.
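The Viterbi recursion of step (6) admits a direct sketch (illustrative Python, not the patent's code; the emission function b is assumed supplied by the caller):

```python
def viterbi(pi, a, b, obs):
    """Return the most probable state path for obs and its probability."""
    n = len(pi)
    delta = [pi[i] * b(i, obs[0]) for i in range(n)]      # initialization
    psi = []
    for o in obs[1:]:                                     # recursion
        scores = [[delta[i] * a[i][j] for i in range(n)] for j in range(n)]
        psi.append([max(range(n), key=lambda i: scores[j][i]) for j in range(n)])
        delta = [max(scores[j]) * b(j, o) for j in range(n)]
    q = max(range(n), key=lambda i: delta[i])             # termination
    path = [q]
    for back in reversed(psi):                            # backtracking q*_t = psi_{t+1}(q*_{t+1})
        q = back[q]
        path.insert(0, q)
    return path, max(delta)
```

With the four-state H/Z/E/S set, the decoded state path translates directly into word boundaries (a boundary falls after every E or S).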
Compared with the prior art, the present invention has the following advantages:

1. The present invention first trains the existing observation probability matrices and state probability matrix with the Baum-Welch algorithm (BW algorithm for short) to obtain new observation and state probability matrices, and then, based on the new matrices, segments the text with the Viterbi algorithm. Unlike the traditional hidden Markov model, the present invention adopts a new observation probability matrix, the extended observation probability matrix. This matrix covers not only the information of each individual Chinese character but also the information of its context, which effectively reduces the errors of statistical Chinese word segmentation and greatly improves its accuracy.

2. The present invention can segment large volumes of Chinese text accurately and efficiently, which is the premise of a series of other text processing techniques.
Brief description of the drawings
The embodiment of the present invention is described in further detail below in conjunction with the accompanying drawings.
Fig. 1 is a schematic diagram of an extended observation state in an example of the present invention.
Fig. 2 is a schematic diagram of the initial values of the A matrix in an example of the present invention.
Embodiment
An intelligent Chinese word segmentation method based on a hidden Markov model comprises the following steps:
(1) Establish the hidden Markov model parameters λ = (N, M, L, π, A, B1, B2), wherein

N is the number of Markov chain states in the model; the N states are denoted θ_1, ..., θ_N, and the state of the Markov chain at time t is denoted q_t, with q_t ∈ {θ_1, ..., θ_N};

M is the number of possible single-character observations per state; the M observations are denoted V_1, ..., V_M, and the observation at time t is denoted o_t, with o_t ∈ {V_1, ..., V_M};

L is the number of possible multi-character observations per state; the L extended observations are denoted E_1, ..., E_L, and the extended observation at time t is denoted e_t, with e_t ∈ {E_1, ..., E_L};

π denotes the probability of choosing each state at the start of the sequence, π = (π_1, ..., π_N) with π_i = P(q_1 = θ_i), 1 ≤ i ≤ N;

A denotes the transition probability matrix for choosing the next state given the current state, A = (a_ij)_{N×N} with a_ij = P(q_{t+1} = θ_j | q_t = θ_i), 1 ≤ i, j ≤ N;

B1 denotes the probability matrix of observation V_k appearing in state j, B1 = (b1_j(k))_{N×M}, where 1 ≤ j ≤ N, 1 ≤ k ≤ M;

B2 denotes the probability matrix of two observations appearing consecutively in state j, i.e. the extended observation probability matrix, B2 = (b2_j(k))_{N×L}, where 1 ≤ j ≤ N, 1 ≤ k ≤ L.
(2) Determine the state set Θ of the text: following the rules of the Chinese language, the Chinese character state set is chosen as the four states word-initial character H, word-internal character Z, word-final character E and single-character word S.
(3) After N, M and L are determined, the model λ = (N, M, L, π, A, B1, B2) is abbreviated as λ = (π, A, B1, B2).
(4) Using a computer language, first segment a large volume of text with the mechanical segmentation method; then label the states by computer and count the probability with which each character appears in each state, thereby forming the initial π matrix, A matrix, B1 matrix and B2 matrix.
For example: the observation at time t and the observation at time t-1 together form one element of the observation sequence. Specifically for word segmentation, each observation is extended to two Chinese characters, i.e. the character at the current position plus the character at the previous position of the sequence, forming one extended observation state (as shown in Fig. 1). The state at each moment of the state sequence is determined by the observation (o_t) at each moment of the character sequence; after extension, each observation consists of two Chinese characters (the current character and the previous one), and this observation at time t (t ≠ 1) is e_t. The initial values of the A matrix can be obtained by counting; owing to the logical rules of Chinese, some of its entries must be 0, as shown in Fig. 2 and Table 1.
Table 1
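As an illustration of the structural zeros just mentioned (this encoding is a sketch, not the patent's Fig. 2 or Table 1): a word-initial character H can only be followed by an internal character Z or a final character E, Z only by Z or E, while E and S must be followed by the start of a new word (H or S); every other transition probability in the A matrix is 0.

```python
STATES = ["H", "Z", "E", "S"]

# Allowed successor states under the H/Z/E/S scheme; all other
# transitions are impossible in well-formed text, so their
# entries in the A matrix must be 0.
ALLOWED = {"H": {"Z", "E"}, "Z": {"Z", "E"}, "E": {"H", "S"}, "S": {"H", "S"}}

def zero_mask():
    """Return the N x N 0/1 mask of permitted transitions."""
    return [[1 if t in ALLOWED[s] else 0 for t in STATES] for s in STATES]

for row in zero_mask():
    print(row)
```

Fixing these entries at 0 before training keeps the Baum-Welch re-estimation from ever assigning them probability mass, since a zero entry stays zero under the re-estimation formulas.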
(5) Train the initial A matrix and the initial B1 and B2 matrices on text with the BW algorithm. The observation sequence of the text is denoted O and the extended observation sequence is denoted EO. Compute the expected values and the conditional probability of the sequence under the current parameters, re-estimate the observation probability of each observation element with the BW re-estimation formulas, and compute the parameters of the new hidden Markov model λ̄ = (π̄, Ā, B̄1, B̄2); iterate until P(EO|λ) converges to a maximum, thereby obtaining the new π matrix, A matrix and B1, B2 matrices;

wherein:

π̄_i = γ_1(i);

ā_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i);

b̄_j(k) = Σ_{t: e_t = E_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j).
Here the BW algorithm is as follows: given the observation sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a model λ̄ that maximizes the probability P(EO|λ̄) of the extended observation sequence EO.

Define the observation probability function b_j(e_t) as the probability of the extended observation e_t in state j.

The forward algorithm is:

Initialization: for 1 ≤ i ≤ N, α_1(i) = π_i b_i(e_1);

Recursion: for 1 ≤ t ≤ T-1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(e_{t+1});

Termination: P(EO|λ) = Σ_{i=1}^{N} α_T(i).

The backward algorithm is:

Initialization: for 1 ≤ i ≤ N, β_T(i) = 1;

Recursion: for t = T-1, T-2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1}^{N} a_ij b_j(e_{t+1}) β_{t+1}(j);

Termination: P(EO|λ) = Σ_{i=1}^{N} π_i b_i(e_1) β_1(i).

From the forward and backward variables thus defined, the BW algorithm gives

P(EO|λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j), 1 ≤ t ≤ T-1.

Define ξ_t(i, j) as the probability that, given the training sequence O and the model λ, the chain is in state θ_i at time t and in state θ_j at time t+1, i.e. ξ_t(i, j) = α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j) / P(EO|λ); and the probability of being in state θ_i at time t is γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j).
(6) Using the parameters λ̄ = (π̄, Ā, B̄1, B̄2) of the new hidden Markov model, carry out Chinese word segmentation with the Viterbi algorithm: the text is divided into sentences at punctuation marks and each sentence is segmented, yielding the segmented text.

Here the Viterbi algorithm is as follows: define δ_t(i) as the maximum probability of producing e_1, e_2, ..., e_t along a single path q_1, q_2, ..., q_t with q_t = θ_i. The optimal state sequence Q* is then obtained as follows:

Initialization: for 1 ≤ i ≤ N, δ_1(i) = π_i b_i(e_1); ψ_1(i) = 0;

Recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(e_t); ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij];

Termination: P* = max_{1≤i≤N} δ_T(i); q*_T = argmax_{1≤i≤N} δ_T(i);

Backtracking: q*_t = ψ_{t+1}(q*_{t+1}) for t = T-1, T-2, ..., 1, which determines the optimal state sequence.
Claims (2)
1. An intelligent Chinese word segmentation method based on a hidden Markov model, comprising the following steps:

(1) Establish the hidden Markov model parameters λ = (N, M, L, π, A, B1, B2), wherein

N is the number of Markov chain states in the model; the N states are θ_1, ..., θ_N, and the state of the Markov chain at time t is q_t, with q_t ∈ {θ_1, ..., θ_N};

M is the number of possible single-character observations per state; the M observations are V_1, ..., V_M, and the observation at time t is o_t, with o_t ∈ {V_1, ..., V_M};

L is the number of possible multi-character observations per state; the L extended observations are E_1, ..., E_L, and the extended observation at time t is e_t, with e_t ∈ {E_1, ..., E_L};

π denotes the probability of choosing each state at the start of the sequence, π = (π_1, ..., π_N), where 1 ≤ i ≤ N;

A denotes the transition probability matrix for choosing the next state given the current state, A = (a_ij)_{N×N}, where 1 ≤ i, j ≤ N;

B1 denotes the probability matrix of the occurrence of the k-th of the M observations in the j-th state, B1 = (b1_j(k))_{N×M}, where 1 ≤ j ≤ N, 1 ≤ k ≤ M;

B2 denotes the probability matrix of the occurrence of the k-th of the L extended observations in the j-th state, i.e. the extended observation probability matrix, B2 = (b2_j(k))_{N×L}, where 1 ≤ j ≤ N, 1 ≤ k ≤ L;
(2) Determine the state set Θ of the text: following the rules of the Chinese language, the Chinese character state set is chosen as the four states word-initial character H, word-internal character Z, word-final character E and single-character word S;

(3) After N, M and L are determined, the model λ = (N, M, L, π, A, B1, B2) is abbreviated as λ = (π, A, B1, B2);
(4) Using a computer language, first segment a large volume of text with the mechanical segmentation method; then label its states by computer and count the probability with which each character appears in each state, thereby forming the initial π matrix, A matrix, B1 matrix and B2 matrix;
(5) Train the initial A matrix and the initial B1 and B2 matrices on text with the BW algorithm. The observation sequence of the text is denoted O and the extended observation sequence is denoted EO. Compute the expected values and the conditional probability of the sequence under the current parameters, re-estimate the observation probability of each observation element with the BW re-estimation formulas, and compute the parameters of the new hidden Markov model λ̄ = (π̄, Ā, B̄1, B̄2); iterate until P(EO|λ) converges to a maximum, thereby obtaining the new π matrix, A matrix and B1, B2 matrices;

wherein:

π̄_i = γ_1(i);

ā_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i);

b̄_j(k) = Σ_{t: e_t = E_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j);

T refers to the total length of the sequence;
The BW algorithm is as follows: given the observation sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a model λ̄ that maximizes the probability P(EO|λ̄) of the extended observation sequence EO;

Define the observation probability function b_j(e_t) as the probability of the extended observation e_t in state j;

The forward algorithm is:

Initialization: for 1 ≤ i ≤ N, α_1(i) = π_i b_i(e_1);

Recursion: for 1 ≤ t ≤ T-1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(e_{t+1});

Termination: P(EO|λ) = Σ_{i=1}^{N} α_T(i);

The backward algorithm is:

Initialization: for 1 ≤ i ≤ N, β_T(i) = 1;

Recursion: for t = T-1, T-2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1}^{N} a_ij b_j(e_{t+1}) β_{t+1}(j);

Termination: P(EO|λ) = Σ_{i=1}^{N} π_i b_i(e_1) β_1(i);

From the forward and backward variables thus defined, the BW algorithm gives

P(EO|λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j), 1 ≤ t ≤ T-1;

Define ξ_t(i, j) as the probability that, given the training sequence O and the model λ, the chain is in state θ_i at time t and in state θ_j at time t+1, i.e. ξ_t(i, j) = α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j) / P(EO|λ); and the probability of being in state θ_i at time t is γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j);
(6) Using the parameters λ̄ = (π̄, Ā, B̄1, B̄2) of the new hidden Markov model, carry out Chinese word segmentation with the Viterbi algorithm: the text is divided into sentences at punctuation marks and each sentence is segmented, yielding the segmented text.
2. The intelligent Chinese word segmentation method based on a hidden Markov model as claimed in claim 1, characterized in that the Viterbi algorithm in step (6) is as follows: define δ_t(i) as the maximum probability of producing e_1, e_2, ..., e_t along a single path q_1, q_2, ..., q_t with q_t = θ_i, i.e. δ_t(i) = max_{q_1, ..., q_{t-1}} P(q_1, ..., q_{t-1}, q_t = θ_i, e_1, ..., e_t | λ). The optimal state sequence Q* is then obtained as follows:

Initialization: for 1 ≤ i ≤ N, δ_1(i) = π_i b_i(e_1); ψ_1(i) = 0;

Recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(e_t); ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij];

Termination: P* = max_{1≤i≤N} δ_T(i); q*_T = argmax_{1≤i≤N} δ_T(i);

Backtracking: q*_t = ψ_{t+1}(q*_{t+1}) for t = T-1, T-2, ..., 1, which determines the optimal state sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510708169.7A CN105373529B (en) | 2015-10-28 | 2015-10-28 | A kind of Word Intelligent Segmentation method based on Hidden Markov Model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105373529A CN105373529A (en) | 2016-03-02 |
CN105373529B true CN105373529B (en) | 2018-04-20 |
Family
ID=55375737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510708169.7A Expired - Fee Related CN105373529B (en) | 2015-10-28 | 2015-10-28 | A kind of Word Intelligent Segmentation method based on Hidden Markov Model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105373529B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105912570B (en) * | 2016-03-29 | 2019-11-15 | 北京工业大学 | Resume critical field abstracting method based on hidden Markov model |
CN106059829B (en) * | 2016-07-15 | 2019-04-12 | 北京邮电大学 | A kind of network utilization cognitive method based on hidden Markov |
CN106569997B (en) * | 2016-10-19 | 2019-12-10 | 中国科学院信息工程研究所 | Science and technology compound phrase identification method based on hidden Markov model |
CN107194176B (en) * | 2017-05-23 | 2020-07-28 | 复旦大学 | Method for filling data and predicting behaviors of intelligent operation of disabled person |
CN107273356B (en) | 2017-06-14 | 2020-08-11 | 北京百度网讯科技有限公司 | Artificial intelligence based word segmentation method, device, server and storage medium |
CN107273360A (en) * | 2017-06-21 | 2017-10-20 | 成都布林特信息技术有限公司 | Chinese notional word extraction algorithm based on semantic understanding |
CN107832307B (en) * | 2017-11-28 | 2021-02-23 | 南京理工大学 | Chinese word segmentation method based on undirected graph and single-layer neural network |
CN109933778B (en) * | 2017-12-18 | 2024-03-05 | 北京京东尚科信息技术有限公司 | Word segmentation method, word segmentation device and computer readable storage medium |
CN108170680A (en) * | 2017-12-29 | 2018-06-15 | 厦门市美亚柏科信息股份有限公司 | Keyword recognition method, terminal device and storage medium based on Hidden Markov Model |
CN108647208A (en) * | 2018-05-09 | 2018-10-12 | 上海应用技术大学 | A kind of novel segmenting method based on Chinese |
CN109408801A (en) * | 2018-08-28 | 2019-03-01 | 昆明理工大学 | A kind of Chinese word cutting method based on NB Algorithm |
CN109284358B (en) * | 2018-09-05 | 2020-08-28 | 普信恒业科技发展(北京)有限公司 | Chinese address noun hierarchical method and device |
CN109711121B (en) * | 2018-12-27 | 2021-03-12 | 清华大学 | Text steganography method and device based on Markov model and Huffman coding |
CN110162794A (en) * | 2019-05-29 | 2019-08-23 | 腾讯科技(深圳)有限公司 | A kind of method and server of participle |
CN110562653B (en) * | 2019-07-30 | 2021-02-09 | 国网浙江省电力有限公司嘉兴供电公司 | Power transformation operation detection intelligent decision system and maintenance system based on ubiquitous power Internet of things |
CN111489030B (en) * | 2020-04-09 | 2021-10-15 | 河北利至人力资源服务有限公司 | Text word segmentation based job leaving prediction method and system |
CN111767734A (en) * | 2020-06-11 | 2020-10-13 | 安徽旅贲科技有限公司 | Word segmentation method and system based on multilayer hidden horse model |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101082908A (en) * | 2007-06-26 | 2007-12-05 | 腾讯科技(深圳)有限公司 | Method and system for dividing Chinese sentences |
CN101201818A (en) * | 2006-12-13 | 2008-06-18 | 李萍 | Method for calculating language structure, executing participle, machine translation and speech recognition using HMM |
CN104408034A (en) * | 2014-11-28 | 2015-03-11 | 武汉数为科技有限公司 | Text big data-oriented Chinese word segmentation method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6633819B2 (en) * | 1999-04-15 | 2003-10-14 | The Trustees Of Columbia University In The City Of New York | Gene discovery through comparisons of networks of structural and functional relationships among known genes and proteins |
- 2015-10-28: application CN201510708169.7A filed; patent CN105373529B granted; status: not active, Expired - Fee Related
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101201818A (en) * | 2006-12-13 | 2008-06-18 | 李萍 | Method for calculating language structure, executing participle, machine translation and speech recognition using HMM |
CN101082908A (en) * | 2007-06-26 | 2007-12-05 | 腾讯科技(深圳)有限公司 | Method and system for dividing Chinese sentences |
CN104408034A (en) * | 2014-11-28 | 2015-03-11 | 武汉数为科技有限公司 | Text big data-oriented Chinese word segmentation method |
Non-Patent Citations (2)
Title |
---|
Combining Segmenter and Chunker for Chinese Word Segmentation; Masayuki Asahara et al.; Proceedings of the Second SIGHAN Workshop on Chinese Language Processing; 20030712; 144-147 *
HMM Chinese word segmentation algorithm based on character-position information (基于词位信息的 HMM 中文分词算法); Liu Shanfeng et al.; Proceedings of the 12th National Conference on Man-Machine Speech Communication; 20130805; 205-208 *
Also Published As
Publication number | Publication date |
---|---|
CN105373529A (en) | 2016-03-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105373529B (en) | A kind of Word Intelligent Segmentation method based on Hidden Markov Model | |
CN107145483B (en) | A kind of adaptive Chinese word cutting method based on embedded expression | |
EP3767516A1 (en) | Named entity recognition method, apparatus, and computer-readable recording medium | |
CN106598939B (en) | A kind of text error correction method and device, server, storage medium | |
CN111046946B (en) | Burma language image text recognition method based on CRNN | |
CN107944559B (en) | Method and system for automatically identifying entity relationship | |
CN109325112B (en) | A kind of across language sentiment analysis method and apparatus based on emoji | |
CN108984530A (en) | A kind of detection method and detection system of network sensitive content | |
CN107346340A (en) | A kind of user view recognition methods and system | |
CN106570456A (en) | Handwritten Chinese character recognition method based on full-convolution recursive network | |
CN107203511A (en) | A kind of network text name entity recognition method based on neutral net probability disambiguation | |
CN105068997B (en) | The construction method and device of parallel corpora | |
CN109376242A (en) | Text classification algorithm based on Recognition with Recurrent Neural Network variant and convolutional neural networks | |
CN107168957A (en) | A kind of Chinese word cutting method | |
CN111680488B (en) | Cross-language entity alignment method based on knowledge graph multi-view information | |
WO2017177809A1 (en) | Word segmentation method and system for language text | |
CN110222329B (en) | Chinese word segmentation method and device based on deep learning | |
CN110909549B (en) | Method, device and storage medium for punctuating ancient Chinese | |
CN110222328B (en) | Method, device and equipment for labeling participles and parts of speech based on neural network and storage medium | |
CN105261358A (en) | N-gram grammar model constructing method for voice identification and voice identification system | |
CN106610937A (en) | Information theory-based Chinese automatic word segmentation method | |
CN110826298B (en) | Statement coding method used in intelligent auxiliary password-fixing system | |
CN107273426A (en) | A kind of short text clustering method based on deep semantic route searching | |
CN108647191A (en) | It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method | |
CN104050255A (en) | Joint graph model-based error correction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20180420 |