CN105373529A - Intelligent word segmentation method based on hidden Markov model - Google Patents

Intelligent word segmentation method based on hidden Markov model

Info

Publication number: CN105373529A; granted version CN105373529B
Application number: CN201510708169.7A
Authority: CN (China)
Prior art keywords: matrix, state, observed value, probability, word segmentation
Legal status: Granted; currently Active (the legal status is an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 邓剑波, 马润宇, 刘毓智
Original and current assignee: Gansu Zhicheng Network Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an intelligent word segmentation method based on a hidden Markov model. The method comprises the following steps: (1) build the hidden Markov model parameter λ0 = (N, M, L, π, A, B1, B2); (2) determine the state set Θ of an article; (3) after N, M and L are determined, abbreviate λ0 = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2); (4) segment a large number of articles with a mechanical word segmentation method implemented in a computer language, then have the computer mark the states of the articles to form the initial π, A, B1 and B2 matrices; (5) train the formed initial A, B1 and B2 matrices on articles with the Baum-Welch (BW) algorithm, re-estimating via the BW re-estimation formulas to obtain new π, A, B1 and B2 matrices; and (6) using the new hidden Markov model parameter λ = (π, A, B1, B2), perform Chinese word segmentation with the Viterbi algorithm: divide the article into sentences at punctuation marks and segment each sentence, obtaining the segmented article. The method can segment large volumes of Chinese text accurately and efficiently.

Description

An intelligent word segmentation method based on a hidden Markov model (HMM)
Technical field
The present invention relates to a Chinese word segmentation method, and in particular to an intelligent word segmentation method based on a hidden Markov model (HMM).
Background technology
With the development of Internet technology, people's requirements on computer text processing keep rising. Software needs functions such as inputting, displaying, editing and outputting articles, and the basis of these functions is the recognition of words in text. Unlike English, however, Chinese words have no natural boundaries, so improving the text-processing capability of Chinese software requires Chinese word segmentation.
At present, the main approaches to Chinese word segmentation are the mechanical (dictionary-based) method, the understanding-based method and the statistical method. The mechanical method segments by matching character strings against an existing dictionary, but it needs a large amount of data and is helpless with newly coined words. The understanding-based method segments by having the computer analyse the meaning and grammar of each sentence; its drawback is that, owing to the complexity of Chinese, the algorithm is very difficult to realise. The statistical method counts, over a large training corpus, the probabilities between characters and words, and thereby realises Chinese word segmentation.
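As a concrete illustration of the mechanical (dictionary) method described above, and of the weakness the paragraph mentions, the following is a minimal sketch of forward maximum matching (this is not the patent's code; the tiny dictionary and example sentence are invented for illustration):

```python
# Forward maximum matching: greedily take the longest dictionary word starting
# at each position; unmatched single characters fall out alone. The example
# shows the classic greedy failure on 研究生命起源 ("study the origin of life").
def forward_max_match(text, dictionary, max_len=4):
    words, i = [], 0
    while i < len(text):
        for j in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + j]
            if j == 1 or cand in dictionary:  # single characters always match
                words.append(cand)
                i += j
                break
    return words

DICT = {"研究", "研究生", "生命", "起源"}
cut = forward_max_match("研究生命起源", DICT)  # greedy cut: 研究生 / 命 / 起源
```

The greedy match wrongly prefers 研究生 ("graduate student") over 研究 + 生命, illustrating why purely dictionary-driven segmentation needs the statistical refinement the invention proposes.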
The hidden Markov model (Hidden Markov Model, HMM), as a statistical analysis model, has been successfully applied to fields such as speech recognition, activity recognition, text recognition and fault diagnosis. "Research on Chinese word segmentation based on hidden Markov models" (Wei Xiaoning, Computer Knowledge and Technology (Academic Exchange), No. 21, 2007) adopts an HMM-based algorithm that segments with a cascaded hidden Markov model (CHMM) and adds a layered structure, which both increases the accuracy of segmentation and preserves its efficiency. However, the hidden Markov model lacks analysis of the language environment, and its handling of strings whose frequency is low but which are valid words, or whose frequency is high but which do not form words, is easily inaccurate.
Asahara M, Goh C L, Wang X, et al. Combining segmenter and chunker for Chinese word segmentation. Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Volume 17. Association for Computational Linguistics, 2003: 144-147.
Xue N. Chinese word segmentation as character tagging. Computational Linguistics and Chinese Language Processing, 2003, 8(1): 29-48.
These two documents describe a character-tagging hidden Markov model for Chinese word segmentation. This model inherits the advantage of the character-tagging model in that it can treat the recognition of in-vocabulary words and unregistered words uniformly, but it lacks analysis of the language environment.
Summary of the invention
The technical problem to be solved by this invention is to provide an intelligent word segmentation method based on a hidden Markov model that segments large volumes of Chinese text accurately and efficiently.
To solve the above problem, the intelligent word segmentation method based on a hidden Markov model of the present invention comprises the following steps:
(1) Establish the hidden Markov model parameter λ0 = (N, M, L, π, A, B1, B2), where:
N is the number of Markov states in the model; denote the N states θ_1, ..., θ_N, and the state of the Markov chain at time t as q_t, with q_t ∈ {θ_1, ..., θ_N};
M is the number of possible single-character observed values for each state; denote the M observed values V_1, ..., V_M, and the observed value at time t as o_t, with o_t ∈ {V_1, ..., V_M};
L is the number of possible multi-character observed values for each state; denote the L extended observed values E_1, ..., E_L, and the extended observed value at time t as e_t, with e_t ∈ {E_1, ..., E_L};
π gives the probability of each state at the start of the sequence: π = (π_1, ..., π_N), where π_i = P(q_1 = θ_i), 1 ≤ i ≤ N;
A is the transition probability matrix from the current state to the next state: A = (a_ij)N×N, where a_ij = P(q_{t+1} = θ_j | q_t = θ_i), 1 ≤ i, j ≤ N;
B1 is the probability matrix of observed value V_k occurring in the j-th state: B1 = (b1_jk)N×M, where b1_jk = P(o_t = V_k | q_t = θ_j), 1 ≤ j ≤ N, 1 ≤ k ≤ M;
B2 is the probability matrix of two observed values occurring consecutively in the j-th state, i.e. the extended observed value probability matrix: B2 = (b2_jk)N×L, where b2_jk = P(e_t = E_k | q_t = θ_j), 1 ≤ j ≤ N, 1 ≤ k ≤ L.
(2) Determine the state set Θ of the article. Combining the rules of the Chinese language, the state set of Chinese characters is chosen as the four states word head H, word middle Z, word tail E, and single-character word S.
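The four-state tagging of step (2) can be sketched as follows (an illustrative sketch, not the patent's code; the example words are invented): a word of one character gets state S, and a longer word gets H for its first character, Z for each middle character and E for its last character.

```python
# Map each word of a pre-segmented sentence to the H/Z/E/S state set:
# H = word head, Z = word middle, E = word tail, S = single-character word.
def word_states(word):
    if len(word) == 1:
        return ["S"]
    return ["H"] + ["Z"] * (len(word) - 2) + ["E"]

def tag_sentence(words):
    tags = []
    for w in words:
        tags.extend(word_states(w))
    return tags

tags = tag_sentence(["智能", "分词", "方法"])  # three two-character words
```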
(3) After N, M and L are determined, abbreviate λ0 = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2).
(4) Using a computer language, first segment a large number of articles with the mechanical word segmentation method; then have the computer mark the states, count the probability of each character appearing in each state, and form the initial π, A, B1 and B2 matrices.
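The counting in step (4) amounts to relative-frequency estimation over the state-tagged corpus. The following is a minimal sketch under that reading (not the patent's code; the one-sentence toy corpus is invented):

```python
from collections import Counter, defaultdict

# Estimate initial pi, A, B1, B2 by counting over (characters, states) pairs.
def estimate_initial(tagged):
    pi = Counter()                 # initial-state counts
    trans = defaultdict(Counter)   # A:  state -> next-state counts
    emit1 = defaultdict(Counter)   # B1: state -> single-character counts
    emit2 = defaultdict(Counter)   # B2: state -> (previous, current) pair counts
    for chars, states in tagged:
        pi[states[0]] += 1
        for t, (c, s) in enumerate(zip(chars, states)):
            emit1[s][c] += 1
            if t > 0:
                trans[states[t - 1]][s] += 1
                emit2[s][(chars[t - 1], c)] += 1
    norm = lambda cnt: {k: v / sum(cnt.values()) for k, v in cnt.items()}
    return (norm(pi), {s: norm(c) for s, c in trans.items()},
            {s: norm(c) for s, c in emit1.items()},
            {s: norm(c) for s, c in emit2.items()})

# Toy corpus: 中国/人 tagged H E / S (the word 中国 plus single-character word 人).
corpus = [(["中", "国", "人"], ["H", "E", "S"])]
pi0, A0, B1_0, B2_0 = estimate_initial(corpus)
```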
(5) Train the formed initial A, B1 and B2 matrices on articles with the BW algorithm. Call the article's observed value sequence O and its extended observed value sequence EO. Compute the expectation values and the conditional probability P(EO | λ) of the sequence under the current parameter, re-estimate the observed value probability of each observation element by the BW re-estimation formulas, and compute the new hidden Markov model parameter λ̄ = (π̄, Ā, B̄1, B̄2), making P(EO | λ) converge to a maximum, thereby obtaining the new π, A, B1 and B2 matrices.
Where the re-estimation formulas are: π̄_i = γ_1(i); ā_ij = Σ_{t=1..T-1} ξ_t(i, j) / Σ_{t=1..T-1} γ_t(i); b̄_j(k) = Σ_{t: e_t = E_k} γ_t(j) / Σ_{t=1..T} γ_t(j), with ξ_t(i, j) and γ_t(i) as defined in step (5).
(6) Using the new hidden Markov model parameter λ = (π, A, B1, B2), perform Chinese word segmentation with the Viterbi algorithm: divide the article into sentences at punctuation marks and segment each sentence, obtaining the segmented article.
The BW algorithm in step (5) means: given an observed value sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a λ̄ = (π̄, Ā, B̄1, B̄2) such that the probability P(EO | λ̄) of the extended observation sequence EO is maximal under that condition.
Define the observed value probability function: b_j(e_1) = b1_j(o_1), and b_j(e_t) = b2_j(o_{t-1}, o_t) for t > 1.
The forward algorithm computes α_t(i) = P(e_1, ..., e_t, q_t = θ_i | λ):
Initialisation: for 1 ≤ i ≤ N, α_1(i) = π_i b_i(e_1);
Recursion: for 1 ≤ t ≤ T-1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(e_{t+1});
Termination: P(EO | λ) = Σ_{i=1..N} α_T(i).
The backward algorithm computes β_t(i) = P(e_{t+1}, ..., e_T | q_t = θ_i, λ):
Initialisation: for 1 ≤ i ≤ N, β_T(i) = 1;
Recursion: for t = T-1, T-2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1..N} a_ij b_j(e_{t+1}) β_{t+1}(j);
Termination: P(EO | λ) = Σ_{i=1..N} π_i b_i(e_1) β_1(i).
From the forward and backward variables so defined, the BW algorithm has P(EO | λ) = Σ_{i=1..N} α_t(i) β_t(i), 1 ≤ t ≤ T-1.
Define ξ_t(i, j) as the probability, given the training sequence O (with extension EO) and the model λ, of being in state θ_i at time t and in state θ_j at time t+1, namely ξ_t(i, j) = α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j) / P(EO | λ); the probability of being in state θ_i at time t is γ_t(i) = Σ_{j=1..N} ξ_t(i, j) = α_t(i) β_t(i) / P(EO | λ).
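The forward and backward recursions described above can be sketched as follows (an illustrative sketch, not the patent's code; the two-state toy model and its numbers are invented). The observation probability uses B1 for the first character and B2 for the character pair (o_{t-1}, o_t) afterwards, as the text defines:

```python
import numpy as np

def b(j, t, obs, B1, B2):
    """Extended observation probability b_j(e_t)."""
    return B1[j][obs[0]] if t == 0 else B2[j][(obs[t - 1], obs[t])]

def forward_backward(obs, pi, A, B1, B2):
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))  # alpha_t(i) = P(e_1..e_t, q_t = theta_i | lambda)
    beta = np.zeros((T, N))   # beta_t(i)  = P(e_{t+1}..e_T | q_t = theta_i, lambda)
    for i in range(N):
        alpha[0, i] = pi[i] * b(i, 0, obs, B1, B2)
    for t in range(T - 1):
        for j in range(N):
            alpha[t + 1, j] = alpha[t] @ A[:, j] * b(j, t + 1, obs, B1, B2)
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t, i] = sum(A[i, j] * b(j, t + 1, obs, B1, B2) * beta[t + 1, j]
                             for j in range(N))
    return alpha, beta, float(alpha[T - 1].sum())  # P(EO | lambda) from forward pass

# Toy two-state model over the alphabet {"a", "b"}; all numbers are invented.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B1 = [{"a": 0.5, "b": 0.5}, {"a": 0.1, "b": 0.9}]
B2 = [{("a", "b"): 0.4, ("b", "a"): 0.6}, {("a", "b"): 0.7, ("b", "a"): 0.3}]
obs = ["a", "b", "a"]
alpha, beta, P = forward_backward(obs, pi, A, B1, B2)
```

The forward termination, the backward termination and the identity P(EO | λ) = Σ_i α_t(i) β_t(i) all give the same probability, which is a convenient correctness check.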
The Viterbi algorithm in step (6) means: define δ_t(i) as the maximal probability at time t, along a single path q_1, q_2, ..., q_t with q_t = θ_i, of producing e_1, e_2, ..., e_t, namely δ_t(i) = max_{q_1, ..., q_{t-1}} P(q_1, ..., q_{t-1}, q_t = θ_i, e_1, ..., e_t | λ). The procedure for finding the optimal state sequence Q* is then:
Initialisation: for 1 ≤ i ≤ N, δ_1(i) = π_i b_i(e_1); ψ_1(i) = 0.
Recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(e_t); ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij].
Termination: P* = max_{1≤i≤N} δ_T(i); q*_T = argmax_{1≤i≤N} δ_T(i).
Path backtracking determines the optimal state sequence: q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1.
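The initialisation, recursion, termination and backtracking steps above can be sketched as follows (an illustrative sketch, not the patent's implementation; the deterministic two-state toy model is invented so the optimal path is known in advance):

```python
import numpy as np

def viterbi(T, pi, A, b):
    """Most probable state path of length T; b(j, t) is the observation
    probability b_j(e_t) supplied as a callback."""
    N = len(pi)
    delta = np.zeros((T, N))           # delta_t(i): best path probability
    psi = np.zeros((T, N), dtype=int)  # psi_t(j): argmax backpointers
    delta[0] = [pi[i] * b(i, 0) for i in range(N)]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] * b(j, t)
    path = [int(np.argmax(delta[T - 1]))]   # q*_T
    for t in range(T - 1, 0, -1):           # backtrack q*_t = psi_{t+1}(q*_{t+1})
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path)), float(delta[T - 1].max())

# Deterministic toy: state 0 must go to state 1 and back, so for T = 3 the
# best path is 0 -> 1 -> 0 with probability 1.
pi2 = np.array([1.0, 0.0])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
path, prob = viterbi(3, pi2, A2, lambda j, t: 1.0)
```

In the patent's setting the decoded states would be mapped to H/Z/E/S and word boundaries placed after every E or S state.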
Compared with the prior art, the present invention has the following advantages:
1. The present invention first trains the existing observed value probability matrices and state probability matrix with the Baum-Welch algorithm (BW algorithm for short) to obtain new observed value probability matrices and a new state probability matrix, and then, based on the new matrices, applies the Viterbi algorithm to segment the article into Chinese words. Unlike the traditional hidden Markov model, the present invention adopts a novel observed value probability matrix, namely the extended observed value probability matrix. This matrix covers not only the information of the single Chinese character itself but also the information of its context, which effectively reduces the errors of statistical Chinese word segmentation and greatly improves its accuracy.
2. The present invention can segment large volumes of Chinese text accurately and efficiently, as a prerequisite for a series of other text-processing techniques.
Description of the drawings
The specific embodiments of the present invention are described in further detail below with reference to the drawings.
Fig. 1 is a schematic diagram of an extended observation state in an example of the present invention.
Fig. 2 is a schematic diagram of the initial values of the A matrix in an example of the present invention.
Embodiment
An intelligent word segmentation method based on a hidden Markov model comprises the following steps:
(1) Establish the hidden Markov model parameter λ0 = (N, M, L, π, A, B1, B2), where:
N is the number of Markov states in the model; denote the N states θ_1, ..., θ_N, and the state of the Markov chain at time t as q_t, with q_t ∈ {θ_1, ..., θ_N};
M is the number of possible single-character observed values for each state; denote the M observed values V_1, ..., V_M, and the observed value at time t as o_t, with o_t ∈ {V_1, ..., V_M};
L is the number of possible multi-character observed values for each state; denote the L extended observed values E_1, ..., E_L, and the extended observed value at time t as e_t, with e_t ∈ {E_1, ..., E_L};
π gives the probability of each state at the start of the sequence: π = (π_1, ..., π_N), where π_i = P(q_1 = θ_i), 1 ≤ i ≤ N;
A is the transition probability matrix from the current state to the next state: A = (a_ij)N×N, where a_ij = P(q_{t+1} = θ_j | q_t = θ_i), 1 ≤ i, j ≤ N;
B1 is the probability matrix of observed value V_k occurring in the j-th state: B1 = (b1_jk)N×M, where b1_jk = P(o_t = V_k | q_t = θ_j), 1 ≤ j ≤ N, 1 ≤ k ≤ M;
B2 is the probability matrix of two observed values occurring consecutively in the j-th state, i.e. the extended observed value probability matrix: B2 = (b2_jk)N×L, where b2_jk = P(e_t = E_k | q_t = θ_j), 1 ≤ j ≤ N, 1 ≤ k ≤ L.
(2) Determine the state set Θ of the article. Combining the rules of the Chinese language, the state set of Chinese characters is chosen as the four states word head H, word middle Z, word tail E, and single-character word S.
(3) After N, M and L are determined, abbreviate λ0 = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2).
(4) Using a computer language, first segment a large number of articles with the mechanical word segmentation method; then have the computer mark the states, count the probability of each character appearing in each state, and form the initial π, A, B1 and B2 matrices.
For example: the observation at time t and the observation at time t-1 jointly form one element of the observation sequence. Applied to word segmentation, each element is expanded to two Chinese characters by adding the character of the previous moment in the sequence, so that each element becomes an extended observation state (as shown in Fig. 1). In the state sequence, the state at each moment is determined by the observed value o_t at that moment in the character sequence; the observed value is extended to two Chinese characters (the current character and the previous one), so the observed value at time t (t ≠ 1) is e_t = (o_{t-1}, o_t). The initial values of the A matrix can be obtained by statistics; owing to the logical rules of Chinese, some of its entries must be 0, as shown in Fig. 2 and Table 1.
Table 1
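The extension of the observation sequence described in the example above can be sketched as follows (an illustrative sketch, not the patent's code; the example phrase is invented):

```python
# Build the extended observation sequence EO from a character sequence:
# e_1 is the first character alone; for t > 1, e_t = (o_{t-1}, o_t).
def extend_observations(chars):
    return [chars[0]] + [(chars[t - 1], chars[t]) for t in range(1, len(chars))]

eo = extend_observations(list("分词方法"))  # four characters -> one char + three pairs
```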
(5) Train the formed initial A, B1 and B2 matrices on articles with the BW algorithm. Call the article's observed value sequence O and its extended observed value sequence EO. Compute the expectation values and the conditional probability P(EO | λ) of the sequence under the current parameter, re-estimate the observed value probability of each observation element by the BW re-estimation formulas, and compute the new hidden Markov model parameter λ̄ = (π̄, Ā, B̄1, B̄2), making P(EO | λ) converge to a maximum, thereby obtaining the new π, A, B1 and B2 matrices.
Where the re-estimation formulas are: π̄_i = γ_1(i); ā_ij = Σ_{t=1..T-1} ξ_t(i, j) / Σ_{t=1..T-1} γ_t(i); b̄_j(k) = Σ_{t: e_t = E_k} γ_t(j) / Σ_{t=1..T} γ_t(j), with ξ_t(i, j) and γ_t(i) as defined below.
Here the BW algorithm means: given an observed value sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a λ̄ = (π̄, Ā, B̄1, B̄2) such that the probability P(EO | λ̄) of the extended observation sequence EO is maximal under that condition.
Define the observed value probability function: b_j(e_1) = b1_j(o_1), and b_j(e_t) = b2_j(o_{t-1}, o_t) for t > 1.
The forward algorithm computes α_t(i) = P(e_1, ..., e_t, q_t = θ_i | λ):
Initialisation: for 1 ≤ i ≤ N, α_1(i) = π_i b_i(e_1);
Recursion: for 1 ≤ t ≤ T-1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(e_{t+1});
Termination: P(EO | λ) = Σ_{i=1..N} α_T(i).
The backward algorithm computes β_t(i) = P(e_{t+1}, ..., e_T | q_t = θ_i, λ):
Initialisation: for 1 ≤ i ≤ N, β_T(i) = 1;
Recursion: for t = T-1, T-2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1..N} a_ij b_j(e_{t+1}) β_{t+1}(j);
Termination: P(EO | λ) = Σ_{i=1..N} π_i b_i(e_1) β_1(i).
From the forward and backward variables so defined, the BW algorithm has P(EO | λ) = Σ_{i=1..N} α_t(i) β_t(i), 1 ≤ t ≤ T-1.
Define ξ_t(i, j) as the probability, given the training sequence O (with extension EO) and the model λ, of being in state θ_i at time t and in state θ_j at time t+1, namely ξ_t(i, j) = α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j) / P(EO | λ); the probability of being in state θ_i at time t is γ_t(i) = Σ_{j=1..N} ξ_t(i, j) = α_t(i) β_t(i) / P(EO | λ).
(6) Using the new hidden Markov model parameter λ = (π, A, B1, B2), perform Chinese word segmentation with the Viterbi algorithm: divide the article into sentences at punctuation marks and segment each sentence, obtaining the segmented article.
Here the Viterbi algorithm means: define δ_t(i) as the maximal probability at time t, along a single path q_1, q_2, ..., q_t with q_t = θ_i, of producing e_1, e_2, ..., e_t, namely δ_t(i) = max_{q_1, ..., q_{t-1}} P(q_1, ..., q_{t-1}, q_t = θ_i, e_1, ..., e_t | λ). The procedure for finding the optimal state sequence Q* is then:
Initialisation: for 1 ≤ i ≤ N, δ_1(i) = π_i b_i(e_1); ψ_1(i) = 0.
Recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(e_t); ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij].
Termination: P* = max_{1≤i≤N} δ_T(i); q*_T = argmax_{1≤i≤N} δ_T(i).
Path backtracking determines the optimal state sequence: q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1.

Claims (3)

1. An intelligent word segmentation method based on a hidden Markov model (HMM), comprising the following steps:
(1) establish the hidden Markov model parameter λ0 = (N, M, L, π, A, B1, B2), where:
N is the number of Markov states in the model; denote the N states θ_1, ..., θ_N, and the state of the Markov chain at time t as q_t, with q_t ∈ {θ_1, ..., θ_N};
M is the number of possible single-character observed values for each state; denote the M observed values V_1, ..., V_M, and the observed value at time t as o_t, with o_t ∈ {V_1, ..., V_M};
L is the number of possible multi-character observed values for each state; denote the L extended observed values E_1, ..., E_L, and the extended observed value at time t as e_t, with e_t ∈ {E_1, ..., E_L};
π gives the probability of each state at the start of the sequence: π = (π_1, ..., π_N), where π_i = P(q_1 = θ_i), 1 ≤ i ≤ N;
A is the transition probability matrix from the current state to the next state: A = (a_ij)N×N, where a_ij = P(q_{t+1} = θ_j | q_t = θ_i), 1 ≤ i, j ≤ N;
B1 is the probability matrix of observed value V_k occurring in the j-th state: B1 = (b1_jk)N×M, where b1_jk = P(o_t = V_k | q_t = θ_j), 1 ≤ j ≤ N, 1 ≤ k ≤ M;
B2 is the probability matrix of two observed values occurring consecutively in the j-th state, i.e. the extended observed value probability matrix: B2 = (b2_jk)N×L, where b2_jk = P(e_t = E_k | q_t = θ_j), 1 ≤ j ≤ N, 1 ≤ k ≤ L;
(2) determine the state set Θ of the article: combining the rules of the Chinese language, choose the state set of Chinese characters as the four states word head H, word middle Z, word tail E, and single-character word S;
(3) after N, M and L are determined, abbreviate λ0 = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2);
(4) using a computer language, first segment a large number of articles with the mechanical word segmentation method; then have the computer mark the states, count the probability of each character appearing in each state, and form the initial π, A, B1 and B2 matrices;
(5) train the formed initial A, B1 and B2 matrices on articles with the BW algorithm: call the article's observed value sequence O and its extended observed value sequence EO, compute the expectation values and the conditional probability P(EO | λ) of the sequence under the current parameter, re-estimate the observed value probability of each observation element by the BW re-estimation formulas, and compute the new hidden Markov model parameter λ̄ = (π̄, Ā, B̄1, B̄2), making P(EO | λ) converge to a maximum, thereby obtaining the new π, A, B1 and B2 matrices;
(6) using the new hidden Markov model parameter λ = (π, A, B1, B2), perform Chinese word segmentation with the Viterbi algorithm: divide the article into sentences at punctuation marks and segment each sentence, obtaining the segmented article.
2. The intelligent word segmentation method based on a hidden Markov model as claimed in claim 1, characterised in that the BW algorithm in step (5) means: given an observed value sequence O = o_1, o_2, ..., o_T and its extension EO = e_1, e_2, ..., e_T, determine a λ̄ = (π̄, Ā, B̄1, B̄2) such that the probability P(EO | λ̄) of the extended observation sequence EO is maximal under that condition;
define the observed value probability function: b_j(e_1) = b1_j(o_1), and b_j(e_t) = b2_j(o_{t-1}, o_t) for t > 1;
the forward algorithm computes α_t(i) = P(e_1, ..., e_t, q_t = θ_i | λ):
initialisation: for 1 ≤ i ≤ N, α_1(i) = π_i b_i(e_1);
recursion: for 1 ≤ t ≤ T-1 and 1 ≤ j ≤ N, α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(e_{t+1});
termination: P(EO | λ) = Σ_{i=1..N} α_T(i);
the backward algorithm computes β_t(i) = P(e_{t+1}, ..., e_T | q_t = θ_i, λ):
initialisation: for 1 ≤ i ≤ N, β_T(i) = 1;
recursion: for t = T-1, T-2, ..., 1 and 1 ≤ i ≤ N, β_t(i) = Σ_{j=1..N} a_ij b_j(e_{t+1}) β_{t+1}(j);
termination: P(EO | λ) = Σ_{i=1..N} π_i b_i(e_1) β_1(i);
from the forward and backward variables so defined, the BW algorithm has P(EO | λ) = Σ_{i=1..N} α_t(i) β_t(i), 1 ≤ t ≤ T-1;
ξ_t(i, j) is defined as the probability, given the training sequence O (with extension EO) and the model λ, of being in state θ_i at time t and in state θ_j at time t+1, namely ξ_t(i, j) = α_t(i) a_ij b_j(e_{t+1}) β_{t+1}(j) / P(EO | λ); the probability of being in state θ_i at time t is γ_t(i) = Σ_{j=1..N} ξ_t(i, j) = α_t(i) β_t(i) / P(EO | λ).
3. The intelligent word segmentation method based on a hidden Markov model as claimed in claim 1, characterised in that the Viterbi algorithm in step (6) means: define δ_t(i) as the maximal probability at time t, along a single path q_1, q_2, ..., q_t with q_t = θ_i, of producing e_1, e_2, ..., e_t, namely δ_t(i) = max_{q_1, ..., q_{t-1}} P(q_1, ..., q_{t-1}, q_t = θ_i, e_1, ..., e_t | λ); the procedure for finding the optimal state sequence Q* is then:
initialisation: for 1 ≤ i ≤ N, δ_1(i) = π_i b_i(e_1) and ψ_1(i) = 0;
recursion: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(e_t) and ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij];
termination: P* = max_{1≤i≤N} δ_T(i) and q*_T = argmax_{1≤i≤N} δ_T(i);
path backtracking determines the optimal state sequence: q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1.
Priority Applications (1)

CN201510708169.7A, priority date and filing date 2015-10-28: An intelligent word segmentation method based on a hidden Markov model. Status: Active.

Publications (2)

CN105373529A (application), published 2016-03-02.
CN105373529B (grant), published 2018-04-20.

Family ID: 55375737
Country: CN (China)



Non-Patent Citations (2)

Masayuki Asahara et al.: "Combining Segmenter and Chunker for Chinese Word Segmentation", Proceedings of the Second SIGHAN Workshop on Chinese Language Processing (cited by examiner).
刘善峰 et al.: "An HMM Chinese word segmentation algorithm based on word-position information", Proceedings of the 12th National Conference on Man-Machine Speech Communication (cited by examiner).




Legal Events

Code: PB01 - Publication
Code: SE01 - Entry into force of request for substantive examination
Code: GR01 - Patent grant