CN102004560A - User character recognition method and online once learning method in statement-level Chinese character input method and machine learning system - Google Patents


Info

Publication number
CN102004560A
Authority
CN
China
Prior art keywords
speech
user
chinese character
iwp
character input
Legal status
Granted
Application number
CN 201010567997
Other languages
Chinese (zh)
Other versions
CN102004560B (en)
Inventor
刘秉权
王晓龙
刘峰
刘远超
林磊
孙承杰
单丽莉
刘铭
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN 201010567997
Publication of CN102004560A
Application granted
Publication of CN102004560B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a user word recognition method and an online one-shot learning method for a sentence-level Chinese character input method, together with a machine learning system, and relates to machine learning for Chinese character input. It addresses the problem that traditional machine learning methods require frequent user intervention before the final result is obtained. The user word recognition method identifies user words using the word-formation ability of characters at relative positions as its evaluation criterion. The learning method is triggered only when the optimal path output by the sentence-level input method differs from the final output path; it obtains probability values with an N-gram probability calculation, derives a user adjustment value C_A by maximum a posteriori (MAP) estimation, and stores C_A together with the corresponding words in a user language model base. The machine learning system is built by applying the user word recognition method and the learning method. The technique reduces the number of user interventions during input, making it easier for the user to obtain the desired output.

Description

User word recognition method and online one-shot learning method in a sentence-level Chinese character input method, and machine learning system
Technical field
The present invention relates to a user word recognition method and an online learning method within machine learning methods for Chinese character input.
Background art
Machine learning in sentence-level Chinese character input can automatically adjust the best Chinese character combination to the user's input habits and is applicable to all kinds of Chinese character input methods and input systems.
As natural language processing and artificial intelligence theory keep improving, Chinese character input technology has improved accordingly, but to date no input technology achieves perfect conversion; every technique has its own shortcomings. For pinyin input methods in particular, no product currently reaches 100% conversion accuracy: all of them, to varying degrees and in different ways, require user intervention before producing the output the user wants. Applying this method to such systems can greatly reduce the number of user interventions required and thereby improve conversion accuracy.
In code-based Chinese character input methods and input systems, one code frequently corresponds to several Chinese characters, as in pinyin input, speech input, and fuzzy handwriting recognition. This shows up in the following ways:
1. One code corresponds to several Chinese characters. For example, the pinyin "cheng" corresponds to several characters (rendered in translation as "one-tenth", "city", "title", "being", and so on), and the pinyin string "chengshi" corresponds to words such as "city", "honesty", "formula", and "succeeding". The same happens with longer sentence input. If the top candidate offered by the input system is not what the user wants, the user must select the desired input manually. With an online one-shot learning function, the input method can record the user's input habits and offer the user's most frequently used result as the top candidate. For example, in a typical input method, the first time the pinyin "haerbingongyedaxuezhinengjisuanzhongxin" is entered, the word "function" has a higher frequency in the statistical base, so the result is "Harbin Institute of Technology Function Computing Center"; after one user intervention, the language model is adjusted. Here, intervention means that the user manually replaces "function" with the candidate "intelligence".
2. Combination ambiguity between words. When a short sentence containing the name "Pei Jianli" (meaning roughly "call Pei Jianli to a meeting tomorrow") is entered, the pinyin string "mingtianjiaopeijianlikaihui" is mostly converted to "meeting is set up in mating tomorrow". This is because the system dictionary contains the words "mating" and "set up" but not the name "Pei Jianli", so more user intervention is needed to obtain the correct result. If the input system lacks a user word mechanism and a corresponding online learning function, the user must intervene extensively every time.
3. Changes in user habits. For example, a user who studies electronic engineering always uses the word "chip", and he is also a film fan who writes "new film" recommendations every night.
With existing input method learning approaches, the user must intervene constantly when typing these two words.
Summary of the invention
To solve the problem that existing machine learning methods often require user intervention before the desired result is obtained, the present invention proposes a user word recognition method, an online one-shot learning method, and a machine learning system for sentence-level Chinese character input methods.
The user word recognition method of the present invention is a position-based user word recognition method, in which:
For a character c, the probability that c appears at relative position rp within words is taken as its word-formation ability IWP(c, rp):
$$ IWP(c, rp) = \frac{C(Word(c, rp))}{C(c)} \qquad (1) $$
where C(Word(c, rp)) is the number of words in the corpus used to train the language model in which c occurs at position rp, and C(c) is the number of times c occurs in that corpus. When IWP(c, rp) is greater than a threshold δ (0 < δ < 1), the corresponding word is taken as a user word; otherwise it is not.
For a character string S = c_1, c_2, ..., c_l (l > 1), in which c_i occupies relative position rp_i, the geometric mean of the word-formation abilities of its characters is taken as the word-formation ability IWP(S) of the string:
$$ IWP(S) = \sqrt[l]{\prod_{i=1}^{l} IWP(c_i, rp_i)} \qquad (2) $$
When IWP(S) ≥ δ (0 < δ ≤ 1), S is taken as a user word; otherwise it is not.
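The two criteria in equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the counts are assumed to come from precomputed corpus statistics, the function names are hypothetical, and an unseen character (C(c) = 0) is assumed to have zero word-formation ability.

```python
from math import prod
from typing import Sequence


def iwp_char(word_count_at_pos: int, char_count: int) -> float:
    """Equation (1): IWP(c, rp) = C(Word(c, rp)) / C(c)."""
    if char_count == 0:
        return 0.0  # assumption: an unseen character has no word-formation ability
    return word_count_at_pos / char_count


def is_user_word_char(iwp: float, delta: float) -> bool:
    """Character-level test: accept when IWP(c, rp) > delta, with 0 < delta < 1."""
    return iwp > delta


def iwp_string(char_iwps: Sequence[float]) -> float:
    """Equation (2): geometric mean of IWP(c_i, rp_i) over the l characters of S."""
    l = len(char_iwps)
    if l < 2:
        raise ValueError("equation (2) is defined for strings with l > 1")
    return prod(char_iwps) ** (1.0 / l)


def is_user_word(char_iwps: Sequence[float], delta: float) -> bool:
    """String-level test: accept S when IWP(S) >= delta, with 0 < delta <= 1."""
    return iwp_string(char_iwps) >= delta
```

For example, a three-character string whose characters have abilities 0.5, 0.2, and 0.8 has IWP(S) equal to the cube root of 0.08 (about 0.43), so it passes a threshold of 0.3.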
When evaluating whether single characters and words combine into a user word, this position-based method uses the relative-position word-formation ability IWP(c, rp) as its criterion: candidate user words are selected by computing IWP(S), and statistical information then determines whether each candidate is a user word.
The present invention also provides an online one-shot learning method for a sentence-level Chinese character input method, which proceeds as follows:
Step 1: Align the phonetic-to-character conversion output path cRoad[M] with the final candidate path wRoad[N] by length, obtaining the aligned conversion output path cRoadA[L] and the aligned final candidate path wRoadA[L]; M, N, and L denote the number of words in the respective paths.
Step 2: Set i = 1.
Step 3: Using the information in the language model, compute p(cRoadA[i] | cRoadA[i-1]) and p(wRoadA[i] | wRoadA[i-1]); from these two values, apply the maximum a posteriori (MAP) method to compute the user adjustment value C_A that maximizes the posterior probability, and add the bigram (wRoad[i-1], wRoad[i]) together with its C_A to the user language model base.
Step 4: Set i = i + 1; if i ≤ L, return to Step 3; otherwise, one learning pass is complete.
When i = 1, p(cRoadA[1] | cRoadA[0]) = p(cRoadA[1]); with no preceding word, the conditional probability reduces to the probability of the first word.
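The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the patent does not say how the length-based alignment of Step 1 works, so `align_by_length` is a hypothetical helper that merges words until both paths break at the same character offset; `bigram_prob` and `compute_ca` are hypothetical callables standing in for the language-model lookup and the MAP computation of Step 3; and the user language model base is modeled as a plain dictionary keyed by aligned segments.

```python
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[str, ...]  # one aligned unit: the words covering a shared character span


def align_by_length(c_road: List[str], w_road: List[str]) -> Tuple[List[Segment], List[Segment]]:
    """Hypothetical Step 1: merge words until both paths break at the same character offset."""
    if sum(map(len, c_road)) != sum(map(len, w_road)):
        raise ValueError("both paths must cover the same number of characters")
    c_aligned: List[Segment] = []
    w_aligned: List[Segment] = []
    c_buf: List[str] = []
    w_buf: List[str] = []
    c_len = w_len = i = j = 0
    while i < len(c_road) or j < len(w_road):
        # Advance whichever path is behind (the conversion path first on ties).
        if c_len <= w_len and i < len(c_road):
            c_buf.append(c_road[i])
            c_len += len(c_road[i])
            i += 1
        else:
            w_buf.append(w_road[j])
            w_len += len(w_road[j])
            j += 1
        if c_len == w_len and c_buf and w_buf:
            c_aligned.append(tuple(c_buf))
            w_aligned.append(tuple(w_buf))
            c_buf, w_buf = [], []
    return c_aligned, w_aligned


def one_shot_learn(
    c_road: List[str],
    w_road: List[str],
    bigram_prob: Callable[[Optional[Segment], Segment], float],
    compute_ca: Callable[[float, float], float],
    user_lm: Dict[Tuple[Optional[Segment], Segment], float],
) -> None:
    """Steps 2-4: walk the aligned paths and store (previous, current) bigrams with C_A."""
    if c_road == w_road:
        return  # learning starts only when the output path and the final path differ
    c_al, w_al = align_by_length(c_road, w_road)
    for i in range(len(c_al)):  # 0-based i here corresponds to i = 1..L in the text
        prev_c = c_al[i - 1] if i > 0 else None  # None: no history, so p(x | None) = p(x)
        prev_w = w_al[i - 1] if i > 0 else None
        p_c = bigram_prob(prev_c, c_al[i])
        p_w = bigram_prob(prev_w, w_al[i])
        user_lm[(prev_w, w_al[i])] = compute_ca(p_c, p_w)
```

This sketch follows the steps literally and stores a bigram for every aligned position; since the description elsewhere says only the erroneous part is adjusted, an implementation might skip positions where `c_al[i] == w_al[i]`.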
The online one-shot learning method of the present invention is applicable to any existing Chinese input method or input system that uses a statistical language model, and serves to build and revise the user language model base.
To make effective use of the user's input, both the phonetic-to-character conversion result and the final result obtained after user intervention must be fed to the adaptive model; after processing by the adaptive model, they influence the phonetic-to-character conversion model in some way, thereby achieving online learning.
The present invention also provides a machine learning system for a sentence-level Chinese character input method. The system consists of a user word recognition module and an online one-shot learning module, wherein:
The user word recognition module determines whether the final output obtained after user intervention in the sentence-level input method is a user word, encodes each word judged to be a user word, and stores its machine code in the user lexicon of the sentence-level input method;
The online one-shot learning module, when the optimal path output by the sentence-level input method differs from the final path, performs online one-shot learning from that optimal path and the final path obtained after user intervention, adjusts the weights of the corresponding words according to the learning result, and then revises the user language model base.
Applying the online one-shot learning method to an existing Chinese input method or system effectively solves the problems described in the background art. For phenomenon 1 (one code corresponding to several Chinese characters), once the language model has been adjusted by the method, entering the same pinyin string a second time directly yields "Harbin Institute of Technology Intelligent Computing Center". For phenomenon 2 (combination ambiguity between words), the method adds "Pei Jianli" to the user lexicon after the user's first intervention; thereafter the user obtains the desired conversion whether the name is typed alone or within a sentence. For phenomenon 3 (changes in user habits), the method remembers a word after the first user intervention on it and outputs it as the top candidate, so repeated intervention is unnecessary and such operations are greatly reduced.
Description of drawings
Fig. 1 shows the application model of the machine learning method described in Embodiment 3.
Fig. 2 shows the storage structure of the user lexicon described in Embodiment 1.
Embodiment
Embodiment 1: The user word recognition method for the Chinese input method of this embodiment is as follows:
For a character c, the probability that c appears at relative position rp within words is taken as its word-formation ability IWP(c, rp):
$$ IWP(c, rp) = \frac{C(Word(c, rp))}{C(c)} \qquad (1) $$
where C(Word(c, rp)) is the number of words in the corpus used to train the language model in which c occurs at position rp, and C(c) is the number of times c occurs in that corpus. When IWP(c, rp) is greater than a threshold δ (0 < δ < 1), the corresponding word is taken as a user word; otherwise it is not.
For a character string S = c_1, c_2, ..., c_l (l > 1), in which c_i occupies relative position rp_i, the geometric mean of the word-formation abilities of its characters is taken as the word-formation ability IWP(S) of the string:
$$ IWP(S) = \sqrt[l]{\prod_{i=1}^{l} IWP(c_i, rp_i)} \qquad (2) $$
When IWP(S) ≥ δ (0 < δ ≤ 1), S is taken as a user word; otherwise it is not.
The user lexicon in this embodiment is stored as a hash table; the storage structure is shown in Fig. 2. Each record contains an index i, a keyword w_0, the keyword's attribute value, and a linked list of data related to w_0. That list contains the related unit (w_0, w_{k_0}), the attribute value of (w_0, w_{k_0}), a pointer to the next entry, ..., the related unit (w_0, w_{k_0+n_0}), the attribute value of (w_0, w_{k_0+n_0}), and a terminator. This storage scheme suits user words, which must change dynamically.
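The record layout of Fig. 2 can be sketched in Python as follows. This is an illustrative sketch only: the class and method names are hypothetical, a Python `dict` serves as the hash table keyed by w_0, a Python list replaces the linked list (its "next" pointers and terminator), and attribute values are assumed to be numeric because the description does not say what they hold.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LexiconEntry:
    index: int        # label i
    keyword: str      # keyword w_0
    attribute: float  # attribute value of w_0 (assumed numeric)
    # Related units (w_0, w_k) with their attribute values, in insertion order.
    related: List[Tuple[Tuple[str, str], float]] = field(default_factory=list)


class UserLexicon:
    """Hash-table user lexicon keyed by the keyword w_0."""

    def __init__(self) -> None:
        self._table: Dict[str, LexiconEntry] = {}

    def add_word(self, w0: str, attribute: float = 0.0) -> LexiconEntry:
        entry = self._table.get(w0)
        if entry is None:
            entry = LexiconEntry(index=len(self._table), keyword=w0, attribute=attribute)
            self._table[w0] = entry
        else:
            entry.attribute = attribute
        return entry

    def add_related(self, w0: str, wk: str, attribute: float) -> None:
        entry = self._table.get(w0)
        if entry is None:
            entry = self.add_word(w0)
        for pos, (unit, _) in enumerate(entry.related):
            if unit == (w0, wk):
                entry.related[pos] = (unit, attribute)  # update an existing unit in place
                return
        entry.related.append(((w0, wk), attribute))

    def lookup(self, w0: str) -> Optional[LexiconEntry]:
        return self._table.get(w0)
```

Keying the table by w_0 gives average constant-time lookup, which matches the description's point that a hash table keeps lookup and storage overhead low for a dynamically changing set of user words.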
User words are words that the user needs but that are absent from the dictionary of the Chinese input system. Recording them improves the user's input efficiency.
From a linguistic perspective, user words can be roughly divided by origin into the following classes:
1. Named entities: person names, place names, trade names, company names, organization names, etc.;
2. Abbreviations: e.g., "World Trade Organization", "South Airways";
3. Dialect words: e.g., "beautiful", "footing the bill";
4. Coinages: e.g., "Brother Sharp", "Sister Feng";
5. Technical terms: e.g., "wireless network", "trigger";
6. Transliterations: e.g., "cool", "show", "clone";
7. Letter words: e.g., "WTO", "UN";
8. Old words whose meaning or usage has changed: e.g., "step down", "charging";
9. Words coined by the user: e.g., "An Bami".
The main difficulty in user word recognition is establishing a reasonable criterion. When evaluating whether single characters and words combine into a user word, the method of this embodiment uses the relative-position word-formation ability IWP(c, rp) as its criterion: candidate user words are selected by computing IWP(S), and statistical information then determines whether each candidate is a user word.
The user word recognition method of this embodiment is illustrated below:
Because Chinese has extensive derivation (for example, "straw hat" and "sunbonnet"), the word-formation positions of a root are divided into three classes:
1. The root is at the beginning of the word: e.g., "working", "top"; relative position rp = 1.
2. The root is in the middle of the word: e.g., "not understanding puzzled", "cannot bear to part"; relative position rp = 2.
3. The root is at the end of the word: e.g., "mouse", "squirrel"; relative position rp = 3.
Therefore, judging word formation by the probability that a root appears in words at position rp improves the accuracy of the judgment.
For a string S = c_1, c_2, ..., c_l (l > 1), the word-formation ability is computed as a geometric mean to avoid a "short words first" bias: a plain product of l factors, each at most 1, tends to shrink as l grows, whereas taking the l-th root normalizes for length.
This user word recognition method performs effective recognition at very low computational cost, because all statistics are computed in advance and saved to files. When judging a user word, the method reads the precomputed statistics directly from those files instead of recomputing them. This greatly reduces the system's computation and meets the real-time requirements of an input system, and the statistics files are also small.
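The offline precomputation of IWP(c, rp) over the three position classes can be sketched as follows. This is a hedged sketch, not the patent's procedure: the input is assumed to be a segmented training corpus given as a list of words, single-character words are assumed to count toward C(c) but to carry no position class, each word is counted at most once per (c, rp) pair (matching "number of words"), and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

PREFIX, MIDDLE, SUFFIX = 1, 2, 3  # relative positions rp = 1, 2, 3 as defined above


def relative_position(index: int, word_len: int) -> int:
    """Map a character's index inside a word of length >= 2 to rp."""
    if index == 0:
        return PREFIX
    if index == word_len - 1:
        return SUFFIX
    return MIDDLE


def build_iwp_table(words: Iterable[str]) -> Dict[Tuple[str, int], float]:
    """Precompute IWP(c, rp) = C(Word(c, rp)) / C(c) from a segmented corpus."""
    char_count: Counter = Counter()      # C(c): occurrences of each character
    word_pos_count: Counter = Counter()  # C(Word(c, rp)): words containing c at rp
    for word in words:
        char_count.update(word)
        if len(word) < 2:
            continue  # assumption: single-character words get no position class
        # A set, so each word is counted at most once per (c, rp) pair.
        word_pos_count.update({(ch, relative_position(k, len(word))) for k, ch in enumerate(word)})
    return {key: n / char_count[key[0]] for key, n in word_pos_count.items()}
```

In the toy corpus `["ab", "cb", "b", "bd"]` (ASCII letters standing in for Chinese characters), "b" occurs four times and ends two words, so the table gives IWP("b", 3) = 0.5. The resulting dictionary is what would be saved to a file and read back at input time.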
The user word recognition method of this embodiment can be applied to any existing Chinese input method or input system that uses a statistical language model.
Embodiment 2: The online one-shot learning method for a sentence-level input method of this embodiment is as follows:
Step 1: Align the phonetic-to-character conversion output path cRoad[M] with the final candidate path wRoad[N] by length, obtaining the aligned conversion output path cRoadA[L] and the aligned final candidate path wRoadA[L]; M, N, and L denote the number of words in the respective paths.
Step 2: Set i = 1.
Step 3: Using the information in the language model, compute p(cRoadA[i] | cRoadA[i-1]) and p(wRoadA[i] | wRoadA[i-1]); from these two values, apply the maximum a posteriori (MAP) method to compute the user adjustment value C_A that maximizes the posterior probability, and add the bigram (wRoad[i-1], wRoad[i]) together with its C_A to the user language model base.
Step 4: Set i = i + 1; if i ≤ L, return to Step 3; otherwise, one learning pass is complete.
The online one-shot learning method of this embodiment is triggered only when the optimal path output by the sentence-level Chinese character input method differs from the final path. It can change the language model quickly, bringing the original model as close as possible to the user's language habits, and it avoids over-learning, because once its goal is reached it hardly changes the language model further.
Step 1 of this embodiment is needed because one pinyin string can correspond to many Chinese word combinations, so the lengths of the conversion output path cRoad[M] and the final candidate path wRoad[N] may differ; the two paths are therefore aligned by length, with L denoting the unified length.
In Step 3 of this embodiment, p(cRoadA[i] | cRoadA[i-1]) denotes the probability that, in the aligned conversion output path cRoadA[L], the i-th word is cRoadA[i] given that the (i-1)-th word is cRoadA[i-1]; p(wRoadA[i] | wRoadA[i-1]) denotes the probability that, in the aligned final candidate path wRoadA[L], the i-th word is wRoadA[i] given that the (i-1)-th word is wRoadA[i-1].
The probability calculation in this embodiment is based on the existing N-gram probability method used by sentence-level input methods. It gives the probability P(S) of a sentence S consisting of m words w_1, w_2, ..., w_m as:
$$ P(S) = P(w_1 w_2 \dots w_m) = \prod_{i=1}^{m} P(w_i \mid w_{i-1} w_{i-2} \dots w_{i-n+1}) \qquad (3) $$
where n is the order N of the N-gram model and P(w_i) is the statistical probability of word w_i in the language model.
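Equation (3) can be evaluated as in the sketch below. Two choices here are assumptions rather than details from the patent: the product is computed as a sum of logarithms to avoid floating-point underflow on long sentences, and `cond_prob` is a hypothetical callable standing in for the language-model lookup (it must return a positive probability).

```python
import math
from typing import Callable, Sequence, Tuple


def sentence_log_prob(
    words: Sequence[str],
    cond_prob: Callable[[Tuple[str, ...], str], float],
    n: int,
) -> float:
    """Equation (3) in log space: log P(S) = sum_i log P(w_i | w_{i-n+1} ... w_{i-1})."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = 0.0
    for i, w in enumerate(words):
        # At most n-1 preceding words, oldest first; empty for the first word.
        history = tuple(words[max(0, i - n + 1):i])
        total += math.log(cond_prob(history, w))
    return total
```

With a bigram model (n = 2) where P(a) = 0.5 and P(b | a) = 0.2, the sentence "a b" gets P(S) = 0.1, and `math.exp` of the returned value recovers it.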
In this embodiment, the user adjustment value C_A that maximizes the posterior probability is computed by a maximum a posteriori (MAP) method with a dynamic weighting factor. The main idea of the MAP method is to adjust the probability P(w) between certain words and carry the adjusted value into subsequent probability computations, so that the newly computed probability P(S') of the adjusted sentence S' exceeds the probability P(S) of the erroneous word combination S. This embodiment obtains C_A as follows:
$$ C_A = \frac{C_B \sum_{w \in W} C_B^{*} - C_B^{*} \sum_{w \in W} C_B}{\sigma\left(\sum_{w \in W} C_B - C_B\right) + \epsilon} \qquad (4) $$
where C_B is the word frequency of each word node on the user's candidate path; C_B^* is the word frequency of each word node on the computed maximum-probability path; ε is a small constant, for example a natural number between 1 and 10; w denotes a word and W denotes the path formed by the words w; and σ is a weighting factor, which may be a rational number between 0 and 2, with σ(·) meaning that the expression in parentheses is computed first and then multiplied by σ. The invention adjusts only the erroneous portion of the path.
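Equation (4) can be computed as in the sketch below. Equation (4) writes both sums over W without saying which path each one covers, so this sketch assumes that the sum of C_B runs over the user's candidate path and the sum of C_B^* runs over the maximum-probability path; the function and parameter names and the default values σ = 1 and ε = 1 are hypothetical choices within the ranges the text gives.

```python
from typing import Sequence


def user_adjustment(
    cb: float,
    cb_star: float,
    user_path_freqs: Sequence[float],
    best_path_freqs: Sequence[float],
    sigma: float = 1.0,
    eps: float = 1.0,
) -> float:
    """Equation (4): C_A = (C_B * sum C_B^* - C_B^* * sum C_B) / (sigma * (sum C_B - C_B) + eps)."""
    if not 0.0 <= sigma <= 2.0:
        raise ValueError("sigma is described as a value between 0 and 2")
    sum_cb = sum(user_path_freqs)       # assumed: sum of C_B over the user's candidate path
    sum_cb_star = sum(best_path_freqs)  # assumed: sum of C_B^* over the maximum-probability path
    numerator = cb * sum_cb_star - cb_star * sum_cb
    denominator = sigma * (sum_cb - cb) + eps  # eps keeps the denominator positive
    return numerator / denominator
```

For example, with C_B = 2 and C_B^* = 8 on paths whose frequencies sum to 17 and 23, the formula gives (46 - 136) / 16 = -5.625.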
The main problem in adapting the language model is how to incorporate the user's input habits into the background model, that is, how to re-estimate the parameters of the original language model. Online learning requires this re-estimation to be as fast as possible and to be able to revert to the original parameters when the adaptation is detected to be unreasonable.
If the probability P(w) of a word is higher than that of the other homophonous characters or words on the same path, the input system takes it as the optimal candidate. If that word is not what the user wants, the user must select a candidate manually. In an input system without learning, the same word will again appear as the optimal candidate next time and the user must select again, which greatly reduces input efficiency.
Because the language model is statistical, its parameters can be adjusted by dynamically recording the user's input history. The online one-shot learning method of the invention changes the language model quickly, bringing it as close as possible to the user's language habits, and it avoids over-learning, because once its goal is reached it hardly changes the model further.
Analysis shows that the time and space complexity of the one-shot learning method of this embodiment fully meets the real-time requirements of an input system. First, the method runs only when the result is not what the user wants, so it is invoked rarely. Second, when it runs, it modifies only the erroneous part of the sentence rather than recomputing every word. Finally, when revising, it computes only the word in question and its competing words, which are few, so the computation is small. For storage, a hash table keeps lookup and storage overhead low, and because the method stores only the words that need adjustment, the space it uses is far smaller than the system's statistical base.
Embodiment 3: The machine learning system for sentence-level Chinese character input of this embodiment is implemented with the user word recognition method of Embodiment 1 and the online one-shot learning method of Embodiment 2. The system consists of a user word recognition module and an online one-shot learning module, wherein:
The user word recognition module determines whether the final output obtained after user intervention in the sentence-level input method is a user word, encodes each word judged to be a user word, and stores its machine code in the user lexicon of the sentence-level input method;
The online one-shot learning module, when the optimal path output by the sentence-level input method differs from the final path, performs online one-shot learning from that optimal path and the final path obtained after user intervention, adjusts the weights of the corresponding words according to the learning result, and then revises the user language model base.
In the user word recognition module of this embodiment, user words are recognized with the method of Embodiment 1.
In the online one-shot learning module of this embodiment, online one-shot learning uses the method of Embodiment 2.
Fig. 1 shows the application model formed when the machine learning system of this embodiment is applied to an existing sentence-level input system or method.
The final output of this application model is the Chinese character conversion result obtained from the adaptation module: the conversion result obtained from the user language model base and the conversion result obtained from the language model base of the original input method or system are each multiplied by a weighting coefficient (a rational number between 0 and 1) and then summed to determine the most probable Chinese character combination, which is sent to the input method or input system as the final corresponding combination.
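The weighted combination above can be sketched as follows. This is a minimal sketch under stated assumptions: candidate scores are taken to be plain numbers keyed by candidate string, a candidate missing from one model is assumed to score 0 there, ties are broken by sorted order only to keep the sketch deterministic, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def combine_candidates(
    user_scores: Dict[str, float],
    base_scores: Dict[str, float],
    user_weight: float,
    base_weight: float,
) -> Tuple[str, Dict[str, float]]:
    """Weight each model's score, sum them, and return the best candidate with all sums."""
    for weight in (user_weight, base_weight):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weighting coefficients are described as values in [0, 1]")
    candidates = set(user_scores) | set(base_scores)
    if not candidates:
        raise ValueError("no candidates to combine")
    combined = {
        c: user_weight * user_scores.get(c, 0.0) + base_weight * base_scores.get(c, 0.0)
        for c in candidates
    }
    best = max(sorted(combined), key=combined.__getitem__)  # sorted() makes ties deterministic
    return best, combined
```

Raising `user_weight` lets a learned user preference override the original model's top candidate, while lowering it falls back toward the original model.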
In this model, user word recognition requires the original system to supply its own dictionary to the user word recognition method, which can then decide whether the user's input should form a user word and build the user lexicon; the original Chinese input method only needs to read user words from the lexicon built by this method and use that information in computing the optimal path.
Online one-shot learning requires the original system to supply the statistics of its own language model; from these and the user's input, the method computes the adjustment amount and generates the user language model. The original Chinese input method only needs to read statistics from the user language model built by this method and use that information in computing the optimal path.

Claims (7)

1. A user word recognition method in a sentence-level Chinese character input method, characterized in that it is a position-based user word recognition method, wherein:
For a character c, the probability that c appears at relative position rp within words is taken as its word-formation ability IWP(c, rp):
$$ IWP(c, rp) = \frac{C(Word(c, rp))}{C(c)} \qquad (1) $$
where C(Word(c, rp)) is the number of words in the corpus used to train the language model in which c occurs at position rp, and C(c) is the number of times c occurs in that corpus; when IWP(c, rp) is greater than a threshold δ (0 < δ < 1), the corresponding word is taken as a user word; otherwise it is not;
For a character string S = c_1, c_2, ..., c_l (l > 1), in which c_i occupies relative position rp_i, the geometric mean of the word-formation abilities of its characters is taken as the word-formation ability IWP(S) of the string:
$$ IWP(S) = \sqrt[l]{\prod_{i=1}^{l} IWP(c_i, rp_i)} \qquad (2) $$
When IWP(S) ≥ δ (0 < δ ≤ 1), S is taken as a user word; otherwise it is not.
2. The user word recognition method in a sentence-level Chinese character input method according to claim 1, characterized in that the user lexicon is stored as a hash table.
3. An online one-shot learning method in a sentence-level Chinese character input method, characterized in that the method proceeds as follows:
Step 1: Align the phonetic-to-character conversion output path cRoad[M] with the final candidate path wRoad[N] by length, obtaining the aligned conversion output path cRoadA[L] and the aligned final candidate path wRoadA[L]; M, N, and L denote the number of words in the respective paths;
Step 2: Set i = 1;
Step 3: Using the information in the language model, compute p(cRoadA[i] | cRoadA[i-1]) and p(wRoadA[i] | wRoadA[i-1]); from these two values, apply the maximum a posteriori (MAP) method to compute the user adjustment value C_A that maximizes the posterior probability, and add the bigram (wRoad[i-1], wRoad[i]) together with its C_A to the user language model base;
Step 4: Set i = i + 1; if i ≤ L, return to Step 3; otherwise, one learning pass is complete.
4. The online one-shot learning method in a sentence-level Chinese character input method according to claim 3, characterized in that p(cRoadA[i] | cRoadA[i-1]) denotes the probability that, in the aligned conversion output path cRoadA[L], the i-th word is cRoadA[i] given that the (i-1)-th word is cRoadA[i-1], and p(wRoadA[i] | wRoadA[i-1]) denotes the probability that, in the aligned final candidate path wRoadA[L], the i-th word is wRoadA[i] given that the (i-1)-th word is wRoadA[i-1].
5. A machine learning system in a sentence-level Chinese character input method, the system consisting of a user word recognition module and an online one-shot learning module, wherein:
The user word recognition module determines whether the final output obtained after user intervention in the sentence-level input method is a user word, encodes each word judged to be a user word, and stores its machine code in the user lexicon of the sentence-level input method;
The online one-shot learning module, when the optimal path output by the sentence-level input method differs from the final path, performs online one-shot learning from that optimal path and the final path obtained after user intervention, adjusts the weights of the corresponding words according to the learning result, and then revises the user language model base.
6. The machine learning system in a sentence-level Chinese character input method according to claim 5, characterized in that, in the user-word recognition module, user words are recognized as follows:
For a character c, the probability that c appears at position rp within words is taken as the word-formation ability IWP(c, rp) of c:
$$ \mathrm{IWP}(c, rp) = \frac{C(\mathrm{Word}(c, rp))}{C(c)} \qquad (1) $$
where C(Word(c, rp)) is the number of words, in the training corpus used to build the language model, in which character c occurs at position rp, and C(c) is the number of times c occurs in that corpus. When the word-formation ability IWP(c, rp) is greater than a threshold δ (0 < δ < 1), the corresponding word is taken as a user word; otherwise, it is not.
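Formula (1) can be computed with two counters over a word-segmented training corpus. This is a minimal sketch under two assumptions not stated in the claim: rp is read as the 0-based index of the character inside the word, and only multi-character words count as "words" in the numerator, while C(c) counts every occurrence of c. The names `iwp_counts`, `iwp` and `passes_char_threshold` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def iwp_counts(segmented_corpus: List[List[str]]) -> Tuple[Counter, Counter]:
    """Collect C(Word(c, rp)) and C(c) from a word-segmented corpus."""
    word_pos: Counter = Counter()    # (c, rp) -> multi-character words with c at rp
    char_total: Counter = Counter()  # c -> occurrences of c anywhere in the corpus
    for sentence in segmented_corpus:
        for word in sentence:
            for rp, c in enumerate(word):
                char_total[c] += 1
                if len(word) > 1:  # assumption: single characters are not word combinations
                    word_pos[(c, rp)] += 1
    return word_pos, char_total


def iwp(c: str, rp: int, word_pos: Counter, char_total: Counter) -> float:
    """Formula (1): IWP(c, rp) = C(Word(c, rp)) / C(c)."""
    total = char_total[c]
    return word_pos[(c, rp)] / total if total else 0.0


def passes_char_threshold(c: str, rp: int, word_pos: Counter,
                          char_total: Counter, delta: float) -> bool:
    """Threshold test from the claim: IWP(c, rp) > delta, with 0 < delta < 1."""
    if not 0 < delta < 1:
        raise ValueError("delta must satisfy 0 < delta < 1")
    return iwp(c, rp, word_pos, char_total) > delta


corpus = [["北京", "大"], ["北", "北京"]]
word_pos, char_total = iwp_counts(corpus)
print(iwp("北", 0, word_pos, char_total))  # 2 of the 3 occurrences of 北 start a word
```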
For a character string S = c_1, c_2, ..., c_l (l > 1), the geometric mean of the word-formation abilities of the characters in S is taken as the word-formation ability IWP(S) of the string:
$$ \mathrm{IWP}(S) = \sqrt[l]{\prod_{i=1}^{l} \mathrm{IWP}(c_i, rp)} \qquad (2) $$
When IWP(S) ≥ δ (0 < δ ≤ 1), S is taken as a user word; otherwise, S is not.
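Formula (2) and its threshold test can be sketched as follows. The sketch assumes that the position rp used for c_i is its 0-based index within S, since the formula writes rp without an index. It also assumes that per-character scores come from any function implementing formula (1). The geometric mean is computed in log space so that long strings do not underflow. `iwp_string`, `is_user_word` and the lookup table are hypothetical illustrations.

```python
import math
from typing import Callable

# Hypothetical signature: per_char_iwp(c, rp) returns IWP(c, rp) from formula (1).
IwpFn = Callable[[str, int], float]


def iwp_string(s: str, per_char_iwp: IwpFn) -> float:
    """Formula (2): geometric mean of IWP(c_i, rp) over the l characters of S."""
    l = len(s)
    if l <= 1:
        raise ValueError("formula (2) is defined for strings with l > 1")
    values = [per_char_iwp(c, rp) for rp, c in enumerate(s)]
    if any(v <= 0.0 for v in values):
        return 0.0  # a single zero factor makes the whole product zero
    # Averaging logarithms avoids underflow when many small factors are multiplied.
    return math.exp(sum(math.log(v) for v in values) / l)


def is_user_word(s: str, per_char_iwp: IwpFn, delta: float) -> bool:
    """S is taken as a user word when IWP(S) >= delta, with 0 < delta <= 1."""
    if not 0 < delta <= 1:
        raise ValueError("delta must satisfy 0 < delta <= 1")
    return iwp_string(s, per_char_iwp) >= delta


table = {("北", 0): 0.25, ("京", 1): 1.0}


def lookup(c: str, rp: int) -> float:
    return table.get((c, rp), 0.0)


print(iwp_string("北京", lookup))  # sqrt(0.25 * 1.0) = 0.5, up to floating-point rounding
print(is_user_word("北京", lookup, delta=0.4))
```

Because floating-point rounding can land just below an exact threshold, a production version would probably compare with a small tolerance when δ equals a value the mean can hit exactly.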
7. The machine learning system in a sentence-level Chinese character input method according to claim 5, characterized in that, in the user-word recognition module, the online one-time learning method performed in the online one-time learning module proceeds as follows:
Step 1: align the phonetic-to-character conversion output path cRoad[M] with the final candidate path wRoad[N] using length-based alignment, obtaining the aligned conversion output path cRoadA[L] and the aligned final candidate path wRoadA[L], where M and N are the numbers of words in the two original paths and L is the number of words in each aligned path;
Step 2: set i = 1;
Step 3: using the information in the language model, calculate p(cRoadA[i] | cRoadA[i-1]) and p(wRoadA[i] | wRoadA[i-1]). Then, using these two values, apply the maximum a posteriori (MAP) probability method to calculate the user adjustment value C_A that maximizes the posterior probability, and add (wRoad[i-1], wRoad[i]) together with its corresponding C_A to the user language model bank as a bigram entry;
Step 4: set i = i+1; if i ≤ L, return to Step 3; otherwise, the one-time learning is complete.
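Steps 1 to 4 can be sketched as below. This is an illustration under stated assumptions, not the patented procedure:

- Length-based alignment is read as merging adjacent words until both paths break at the same character offset, so both paths must cover the same number of characters.
- A hypothetical start symbol `<s>` supplies the (i-1)-th word when i = 1.
- The stored bigram is keyed on the aligned final path.
- The MAP formula for C_A is not given in this section, so `compute_adjustment` is a pluggable stand-in, and `ratio_adjustment` is a simple placeholder rather than MAP estimation.

All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

START = "<s>"  # hypothetical start symbol so that word i-1 exists when i = 1
ProbFn = Callable[[str, str], float]        # p(cur | prev) from the language model
AdjustFn = Callable[[float, float], float]  # (p_c, p_w) -> C_A


def align_by_length(c_road: List[str], w_road: List[str]) -> Tuple[List[str], List[str]]:
    """Step 1 (assumed reading): merge words until both paths end at the same offset."""
    if sum(map(len, c_road)) != sum(map(len, w_road)):
        raise ValueError("both paths must cover the same number of characters")
    c_aligned: List[str] = []
    w_aligned: List[str] = []
    c_buf = w_buf = ""
    i = j = 0
    while i < len(c_road) or j < len(w_road):
        # Extend the shorter buffer; on a tie, extend the cRoad buffer.
        if len(c_buf) <= len(w_buf):
            c_buf += c_road[i]
            i += 1
        else:
            w_buf += w_road[j]
            j += 1
        if c_buf and len(c_buf) == len(w_buf):  # shared word boundary reached
            c_aligned.append(c_buf)
            w_aligned.append(w_buf)
            c_buf = w_buf = ""
    return c_aligned, w_aligned


def one_shot_learning(c_road: List[str], w_road: List[str], prob: ProbFn,
                      compute_adjustment: AdjustFn,
                      user_model: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    c_a, w_a = align_by_length(c_road, w_road)  # Step 1
    c_a, w_a = [START] + c_a, [START] + w_a
    big_l = len(c_a) - 1                        # L, the number of aligned words
    i = 1                                       # Step 2
    while i <= big_l:                           # Step 4 loop test
        p_c = prob(c_a[i - 1], c_a[i])          # Step 3
        p_w = prob(w_a[i - 1], w_a[i])
        user_model[(w_a[i - 1], w_a[i])] = compute_adjustment(p_c, p_w)
        i += 1
    return user_model


def ratio_adjustment(p_c: float, p_w: float) -> float:
    """Placeholder for C_A (NOT the patent's MAP rule): boost the user bigram
    by the factor needed to match the conversion path's bigram probability."""
    return max(1.0, p_c / p_w) if p_w > 0 else 1.0


probs = {(START, "abc"): 0.4, ("abc", "de"): 0.3, (START, "xyz"): 0.1, ("xyz", "uv"): 0.3}
model = one_shot_learning(["ab", "c", "de"], ["x", "yz", "uv"],
                          lambda prev, cur: probs.get((prev, cur), 0.0),
                          ratio_adjustment, {})
print(model)  # {('<s>', 'xyz'): 4.0, ('xyz', 'uv'): 1.0}
```

The short strings "ab", "xyz" and so on stand in for Chinese words of the same character length. Keeping `compute_adjustment` as a parameter lets the patent's MAP-derived C_A be substituted without changing the loop.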
CN 201010567997 2010-12-01 2010-12-01 User character recognition method in sentence-level Chinese character input method and machine learning system Expired - Fee Related CN102004560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010567997 CN102004560B (en) 2010-12-01 2010-12-01 User character recognition method in sentence-level Chinese character input method and machine learning system

Publications (2)

Publication Number Publication Date
CN102004560A true CN102004560A (en) 2011-04-06
CN102004560B CN102004560B (en) 2013-07-24

Family

ID=43811964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010567997 Expired - Fee Related CN102004560B (en) 2010-12-01 2010-12-01 User character recognition method in sentence-level Chinese character input method and machine learning system

Country Status (1)

Country Link
CN (1) CN102004560B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101038596A (en) * 2007-04-29 2007-09-19 北京搜狗科技发展有限公司 Method and system for classifying website
CN101127042A (en) * 2007-09-21 2008-02-20 浙江大学 Sensibility classification method based on language model
CN101539907A (en) * 2008-03-19 2009-09-23 日电(中国)有限公司 Part-of-speech tagging model training device and part-of-speech tagging system and method thereof

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112053686A (en) * 2020-07-28 2020-12-08 出门问问信息科技有限公司 Audio interruption method and device and computer readable storage medium
CN112053686B (en) * 2020-07-28 2024-01-02 出门问问信息科技有限公司 Audio interruption method, device and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN113010693B (en) Knowledge graph intelligent question-answering method integrating pointer generation network
CN104834747B (en) Short text classification method based on convolutional neural networks
CN106484681A (en) A kind of method generating candidate's translation, device and electronic equipment
CN111949787A (en) Automatic question-answering method, device, equipment and storage medium based on knowledge graph
CN102880611B (en) A kind of Language Modeling method and Language Modeling device
CN102122298A (en) Method for matching Chinese similarity
CN102193939A (en) Realization method of information navigation, information navigation server and information processing system
CN105261358A (en) N-gram grammar model constructing method for voice identification and voice identification system
CN102214238B (en) Device and method for matching similarity of Chinese words
WO2021082086A1 (en) Machine reading method, system, device, and storage medium
CN109710921A (en) Calculation method, device, computer equipment and the storage medium of Words similarity
CN113761890A (en) BERT context sensing-based multi-level semantic information retrieval method
CN104699797A (en) Webpage data structured analytic method and device
CN110188926A (en) A kind of order information forecasting system and method
CN114997288A (en) Design resource association method
CN116258137A (en) Text error correction method, device, equipment and storage medium
CN110334362B (en) Method for solving and generating untranslated words based on medical neural machine translation
CN108694167B (en) Candidate word evaluation method, candidate word ordering method and device
CN108628826B (en) Candidate word evaluation method and device, computer equipment and storage medium
CN110717014B (en) Ontology knowledge base dynamic construction method
WO2023103914A1 (en) Text sentiment analysis method and device, and computer-readable storage medium
CN102004560B (en) User character recognition method in sentence-level Chinese character input method and machine learning system
CN107329951A (en) Build name entity mark resources bank method, device, storage medium and computer equipment
CN116701734A (en) Address text processing method and device and computer readable storage medium
CN114896966A (en) Method, system, equipment and medium for positioning grammar error of Chinese text

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130724

Termination date: 20141201

EXPY Termination of patent right or utility model