CN1085859C - Chinese characters shift learning device - Google Patents
- Publication number
- CN1085859C CN94106045A
- Authority
- CN
- China
- Prior art keywords
- word
- unit
- unisonance
- semantic
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
To provide a kanji (Chinese character) conversion learning device in which the accuracy of kanji conversion is improved based on the semantic similarity between candidate words of two syllables and on their frequency of use. A character conversion part 200 refers to a homonym learning dictionary part 400 and a fundamental dictionary part 410, using the input reading symbol as the retrieval key, to take out all candidate words corresponding to each candidate syllable together with their semantic codes. A semantic similarity calculating part 210 calculates the semantic similarity from the semantic codes of preceding and succeeding candidate words. A frequency weight calculating part 220 refers to the frequency of use in the homonym learning dictionary part to calculate the frequency weight. After finding the optimum conversion route through each candidate group by dynamic programming, based on the obtained semantic similarity and frequency weight, the character conversion part displays the words on this route from an output part as a temporary result. Based on the conversion result finally judged correct by the user, a meaning learning dictionary part and the homonym learning dictionary part are updated by a meaning learning dictionary update part and a homonym learning dictionary update part 500.
Description
The present invention relates to a Chinese character conversion learning device with a learning function, used for Chinese text input.
Before the related prior art and the embodiments are described, the key terms used in this specification are first defined and explained.
The present invention is mainly directed to Chinese word processors, Japanese word processors, and the like. Accordingly, "pronunciation code" and "reading symbol" mean the symbols a user enters so that the desired characters are output or displayed through Chinese character conversion. These naturally include phonetic scripts such as romanized Chinese and phonetic characters such as Japanese kana, and also include symbols such as Arabic numerals and Korean script, as well as the tone marks used in Chinese (rendered here as "V", "/", and similarly shaped symbols; the substitution is unrelated to the essence of the invention and is made for convenience in electronic filing).
Likewise, "Chinese character", "character", "word", and "term", as the result or target of conversion, mean the symbols the user wants output or displayed as the conversion result. Besides ideographs such as Chinese characters, they include combinations of Chinese characters and kana such as "like The Ru", symbols such as "々", grammatical words, content words, and compound words, and also phrases or sentences such as "Tenjho Tenge is overweening".
Furthermore, input reading symbols and output Chinese characters are usually processed several at a time rather than singly, so "string" in expressions such as "Chinese character string" is used to convey this plurality as accurately as possible. However, because the original Japanese makes no distinction between singular and plural, the presence of "string" does not necessarily mean plural, nor does its absence necessarily mean singular.
Although one syllable in principle corresponds to one Chinese character in Chinese, this does not necessarily hold in other languages such as Japanese. Therefore, this specification does not restrict one syllable to one Chinese character.
In a Chinese word processor, the segments of the input reading symbol string that are to be converted are usually extracted by the longest-match method. This method takes the input reading symbol string as its object, gives first priority to the segment that forms a word with the most syllables and second priority to the syllables input earlier, extracts a segment provisionally, and then checks whether it is recorded in the dictionary. This is not necessarily the case for word processors for other scripts such as Japanese.
The longest-match method is a known technique that is clearly set out in the applicant's other applications (Japanese Patent Application No. Hei 5-75911, No. 7591, etc.), so its explanation is omitted here.
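For readers unfamiliar with the technique, greedy longest-match segmentation can be sketched as follows. This is a generic illustration, not the procedure of the cited applications: the function name, the `max_len` limit, the single-symbol fallback, and the sample dictionary entry "di2 jun1" are all assumptions made for the example (only the reading "gong1sh4" appears in this specification).

```python
def longest_match_segments(symbols, dictionary, max_len=4):
    """Greedy left-to-right longest-match segmentation of reading symbols.

    symbols:    list of reading symbols, e.g. ["gong1", "sh4"]
    dictionary: set of tuples of reading symbols that form known words
    max_len:    longest word length tried (hypothetical limit)
    """
    segments = []
    i = 0
    while i < len(symbols):
        # Try the longest span first, then shorten until the dictionary matches.
        for length in range(min(max_len, len(symbols) - i), 0, -1):
            key = tuple(symbols[i:i + length])
            # Assumed fallback: a single symbol is always accepted as a segment.
            if key in dictionary or length == 1:
                segments.append(key)
                i += length
                break
    return segments


# Hypothetical dictionary of readings.
dictionary = {("di2", "jun1"), ("gong1", "sh4")}
segments = longest_match_segments(["di2", "jun1", "gong1", "sh4", "hen3"], dictionary)
print(segments)
```

Because the scan always starts from the earliest unconsumed symbol, earlier input is served first, and within that position the longest dictionary entry wins.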
The prior art directly related to the present invention is described below.
In the reading-symbol input method used by Chinese word processors and the like, reading symbols are entered, for example from a keyboard, and a preset program automatically selects, from the homophonous characters or homophones corresponding to those symbols, the characters the user presumably intends, thereby generating a Chinese character string. If the generated string is not what the user wanted, it is corrected manually. Therefore, if the desired character or word can be selected quickly and correctly from the candidate homophones, the efficiency of this input method improves. The homophone selection problem therefore has to be solved, and a learning function is usually used for this purpose: the user's usage history is recorded and then used to select among homophones. Depending on how long the recorded history is reflected, such learning is divided into short-term learning and long-term learning.
(1) In short-term learning, the homophone most recently selected by the user is given top priority. For example, the homophones of the Chinese reading symbol "i4" are "meaning, hundred million, easy, also, benefit, wing, ...". They are normally listed and converted in this order, which is based on statistics from ordinary texts. If, however, the user selects "wing" for the sound "i4" in the manuscript currently being written, the order of the "i4" homophones automatically becomes "wing, meaning, hundred million, easy, also, benefit, ...", and the next time "i4" is input, "wing" is automatically selected first.
(2) In long-term learning, for each reading symbol in each reading symbol string, the number of times each corresponding homophone has been used is accumulated and stored over a certain period. The homophones for each reading symbol are then stored in order of usage count (usage frequency), and selection priority decreases from the first word in that order.
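The two learning modes can be sketched as two reordering rules over the homophone list of one reading symbol. This is only an illustration: the function names and the usage counts are assumptions, and the English glosses stand in for the actual characters.

```python
from collections import Counter


def short_term_update(candidates, chosen):
    """Short-term learning: move the most recently chosen homophone to the front."""
    return [chosen] + [c for c in candidates if c != chosen]


def long_term_order(candidates, usage_counts):
    """Long-term learning: order homophones by accumulated usage count, highest first."""
    # sorted() is stable, so homophones with equal counts keep their dictionary order.
    return sorted(candidates, key=lambda c: -usage_counts[c])


# Default order for the reading "i4", using the glosses from the text.
i4 = ["meaning", "hundred million", "easy", "also", "benefit", "wing"]

after_short = short_term_update(i4, "wing")
print(after_short)

# Hypothetical accumulated usage counts.
counts = Counter({"meaning": 12, "easy": 30, "also": 12, "wing": 1})
after_long = long_term_order(i4, counts)
print(after_long)
```

The sketch makes the trade-off discussed next visible: the short-term rule reacts to a single selection, while the long-term rule changes only as counts accumulate.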
Both learning modes learn single characters or single words. With short-term learning, rarely used characters also become learning targets, so that, depending on the manuscript, commonly used words may fail to be preferred, which can seriously harm the conversion rate. With long-term learning, characters and words that are frequent in the manuscript currently being written are slow to be reflected. To eliminate the drawbacks of both modes, it has been proposed to learn pairs of adjacent characters or words in the manuscript. However, the number of possible pairs is practically unlimited, so this method needs a large amount of storage and remains impractical. It was therefore gradually modified to use the semantics of the two adjacent characters or words instead.
As an existing Chinese character conversion learning device with this kind of learning function, one such device is disclosed, for example, in Japanese Unexamined Patent Publication (Kokai) No. Hei 3-74763. Figure 10 is a block diagram of that device. In the figure, 1 is a keyboard for entering alphanumerics, Chinese characters, conversion commands, and the like. 2 is a central processing unit that processes the commands input from keyboard 1. 3 is a display device, such as a CRT monitor, that shows the manuscript before and after conversion. 4 is a thesaurus classification dictionary unit that stores each candidate character or word on the basis of its semantic classification; its contents are outlined in Figure 11. 5 is a learning buffer that stores the semantic classes and usage frequencies of two adjacent characters or words; its contents are outlined in Figure 12.
The operation of this Chinese character conversion learning device is described below.
When a reading symbol string is input from keyboard 1, central processing unit 2 refers to the relevant dictionary unit and extracts segments by the so-called longest-match method (first priority to the segment forming the longest character or word, second priority to syllables input earlier, followed by comparison with the dictionary), and then takes out the candidate words of each segment in dictionary order. Next, it refers to thesaurus classification dictionary unit 4, takes out the classification information of each candidate word, and forms all combinations of classification information between adjacent segments. It then checks whether these combinations exist in learning buffer 5. If one does, the corresponding character pair or word pair is preferentially selected as the conversion result. If several combinations exist in learning buffer 5, the one whose classification information has the highest usage frequency is preferred. The conversion result is then shown on display device 3. The user confirms whether the result is the intended conversion and corrects it if it is wrong. Through this processing, the combinations stored in learning buffer 5 are used for conversion, and learning is reflected by accumulating and storing the usage frequency of the corresponding combination each time it is used.
However, the above Chinese character conversion learning device has the following problems.
(1) Because only two adjacent characters or words are judged, related characters or words that are not adjacent cannot be used for the judgment. For example, suppose the sentence "enemy army * offensive * very * fierce" is input while the learning buffer holds the combination "offensive"~"fierce". Since the two words are not adjacent, the combination cannot be used. Moreover, the word "very" lies between them in the sentence; if the learning history for "very" from earlier input pairs it not with "offensive" but with "fortification", the conversion is incorrect and yields "enemy army * fortification * very * fierce". When the user corrects this to the intended sentence, the choice between "offensive" and "fortification" is learned in the short term and stored in a short-term learning buffer (not shown). When the sentence "our army * fortification * very * firm" is input next, the stored learning causes it to be converted into the unwanted "our army * offensive * very * firm". (Here "*" marks the boundary between segments, and the reading symbols of "offensive" and "fortification" are both "gong1sh4".)
(2) If the thesaurus classification is coarse, the learning effect for two adjacent characters or words declines; conversely, if it is fine, the required storage increases.
(3) Because thesaurus classes are used, a learned combination cannot be exploited unless the classes of the preceding and following words match exactly. That is, semantically close words cannot be exploited, which limits the improvement in conversion rate.
It is therefore desirable to provide a Chinese character conversion learning device that makes effective use of the semantic classes of two adjacent characters or words, requires little storage, and achieves high accuracy.
To achieve the above object, the invention of claim 1 is a Chinese character conversion learning device characterized by comprising: a basic dictionary unit that stores pronunciation codes together with all homophones corresponding to each pronunciation code and the semantic codes of those homophones; a semantic learning dictionary unit that records learned combinations of the semantic code of a following word and the semantic code of the adjacent preceding word; a candidate word detection unit that extracts, by a preset program, the syllable string to be converted from the input pronunciation codes and detects from the basic dictionary unit all homophones corresponding to those codes and the semantic code of each homophone; a semantic similarity calculation unit that, referring to the semantic learning dictionary unit, calculates the semantic similarity between the candidate words of two syllables by a predetermined computation; a character conversion unit that, according to the semantic similarity calculated by the semantic similarity calculation unit, finds the optimal conversion path through the candidate words detected by the candidate word detection unit and takes the words on this path as the preliminary conversion result; a selection unit that, following instructions the user gives with reference to the preliminary conversion result of the character conversion unit, selects the desired words from the detected homophones and obtains the final conversion result; and an updating unit of the semantic learning dictionary unit that updates the combinations in the semantic learning dictionary unit according to the final conversion result of the selection unit.
The invention of claim 2 is the Chinese character conversion learning device according to claim 1, characterized by further comprising: a homophone learning dictionary unit that records pronunciation codes together with all corresponding homophones and the past usage frequency of each homophone; and an updating unit of the homophone learning dictionary unit that updates the usage frequency records in the homophone learning dictionary unit according to the selection instructions the user gives through the selection unit; wherein the candidate word detection unit detects all corresponding homophones and their semantic codes by consulting the homophone learning dictionary unit and the basic dictionary unit in that order.
The invention of claim 3 is the Chinese character conversion learning device according to claim 2, characterized by further comprising a frequency weight calculation unit that, referring to the homophone learning dictionary unit, calculates by a preset program the usage-frequency weighting coefficient of the candidate words of the same syllable; wherein the character conversion unit finds the optimal conversion path through the candidate words of the input pronunciation code string according to the frequency weighting coefficient calculated by the frequency weight calculation unit and the semantic similarity calculated by the semantic similarity calculation unit, and takes the words on this path as the conversion result.
With the above configuration, in the invention of claim 1, the basic dictionary unit stores in advance, in non-erasable form, the pronunciation codes (encoded phonetic strings), in principle all homophones (homophonous characters or words) pronounced as each string, and the semantic code of each homophone. The candidate word detection unit extracts, by a preset program, the syllable string that forms the current conversion target from the input and detects from the basic dictionary unit the corresponding homophones (at this stage, the candidate words for conversion) and their semantic codes. The semantic learning dictionary unit records the learned combinations of the semantic code of a following word and the semantic code of the adjacent preceding word (the results of learning, updates, changes, and so on). The semantic similarity calculation unit refers to the semantic learning dictionary unit and calculates the semantic similarity between the candidate words of two syllables by a predetermined computation. The character conversion unit, according to the calculated semantic similarity, finds the optimal path through the candidate words detected by the candidate word detection unit and takes the words on this path as the preliminary conversion result. The selection unit, following the selection instructions the user gives with reference to the preliminary conversion result, selects the desired words from the detected homophones and obtains the final conversion result. The updating unit of the semantic learning dictionary unit then updates the combinations in the semantic learning dictionary unit according to the final conversion result of the selection unit.
In the invention of claim 2, the homophone learning dictionary unit records the pronunciation codes together with all corresponding homophones and their past usage frequency (meaning data related to usage frequency, including the notion of usage count). The updating unit of the homophone learning dictionary unit updates the usage frequency records in the homophone learning dictionary unit according to the selection instructions the user gives through the selection unit. The candidate word detection unit consults the homophone learning dictionary unit and the basic dictionary unit in that order to detect all corresponding homophones and their semantic codes.
In the invention of claim 3, the frequency weight calculation unit refers to the homophone learning dictionary unit and calculates, by a preset program, the usage-frequency weighting coefficient of the candidate words of the same pronunciation code. The character conversion unit, according to the frequency weighting coefficient calculated by the frequency weight calculation unit and the semantic similarity calculated by the semantic similarity calculation unit, finds the optimal conversion path through the candidate words of the input pronunciation codes and takes the words on this path as the conversion result.
The Chinese character conversion learning device of the present invention described above has the following effects.
(1) Because long-term and short-term learning are used together and the learning method also takes into account the semantics between adjacent words, homophones can be selected correctly and the correct conversion rate is improved.
(2) Because dynamic programming is adopted, semantic learning can also be exploited for non-adjacent words. For example, when the reading of "dog" in a Chinese phrase is revised to the reading of "pencil", the homophone rendered "only" is automatically corrected to the one rendered "branch" once the reading symbols of "pencil" are input, so the correct conversion for "pencil" is obtained. Intelligent input can thus be achieved without supplying complex grammatical or terminological information.
(3) Because hierarchical semantic codes are adopted, the learning of adjacent word semantics can be stored with little memory.
As described above, the present invention is highly practical.
The present invention is described in detail below with reference to the embodiments shown in the accompanying drawings.
Fig. 1 is a block diagram of an embodiment of the Chinese character conversion learning device of the present invention.
Fig. 2 is an operational flowchart centered on the character conversion unit in the above embodiment.
Fig. 3 is an operational flowchart centered on the semantic similarity calculation unit in the above embodiment.
Fig. 4 is an operational flowchart centered on the frequency weight calculation unit in the above embodiment.
Fig. 5 is an operational flowchart centered on the updating unit of the semantic learning dictionary unit in the above embodiment.
Fig. 6 is a schematic diagram of the data structure stored in the homophone learning dictionary unit in the above embodiment.
Fig. 7 is a schematic diagram of the data structure stored in the basic dictionary unit in the above embodiment.
Fig. 8 is a schematic diagram of the data structure stored in the semantic learning dictionary unit in the above embodiment.
Fig. 9 is an explanatory diagram of the semantic codes in the above embodiment.
Fig. 10 is a block diagram of an existing Chinese character conversion learning device.
Fig. 11 is a schematic diagram of the contents stored in the thesaurus classification dictionary unit of the existing Chinese character conversion learning device.
Fig. 12 is a schematic diagram of the contents stored in the learning buffer of the existing Chinese character conversion learning device.
Fig. 13 is an explanatory diagram of the directed network in the above embodiment.
The present invention is described below by way of an embodiment.
The present embodiment uses semantic classification (semantic categorization) as the basis for constructing semantic codes. This classification represents all the classification information of a morpheme as a four-digit hexadecimal number: major class (first digit), middle class (second digit), minor class (third digit), and fine class (fourth digit). Hexadecimal digits are used because hexadecimal numbers (2 bytes here) are widely used in computers and one hexadecimal digit is sufficient for each level of the classification. As shown in Fig. 9, this kind of dictionary divides all characters and words into ten major classes (nature, properties, change, action, mood, persons, personality, society, arts and learning, and articles), divides each major class into ten middle classes, and subdivides each middle class and each minor class in the same way into progressively finer classes. In such a hierarchical classification code, the semantic range of a higher-level code is wider than that of a lower-level one; that is, the lower the level of a semantic code, the narrower its semantic range. For example:
0 (" nature " class)
02 (" meteorology " class in " nature " class)
028 (" wind " class in " meteorology " class)
028a ("power" class in "wind" class)
This classification is documented in thesauri ("similar language dictionaries") such as the one published by Kadokawa Shoten (1985 edition). It is not an inventive point of the present invention but a premise of it, so its explanation is omitted.
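The hierarchy of the example codes above can be read as a prefix relation: each additional hexadecimal digit narrows the class. The sketch below illustrates only that reading; the helper names are assumptions and are not part of the specification.

```python
def ancestors(code):
    """Return the code and every broader code above it, broadest first."""
    return [code[:k] for k in range(1, len(code) + 1)]


def covers(upper, lower):
    """True if the class `upper` contains the class `lower` (prefix relation)."""
    return lower.startswith(upper)


levels = ancestors("028a")
print(levels)                 # major, middle, minor, and fine class of "028a"
print(covers("02", "028a"))   # the "meteorology" class contains "028a"
print(covers("03", "028a"))   # a different middle class does not
```

Storing one four-digit code per word therefore gives access to four levels of generality at once, which is why the specification can claim a small memory footprint.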
The essential point of the present invention is to perform correct Chinese character conversion by calculating semantic similarity from the above semantic classification codes.
Fig. 1 is a block diagram of an embodiment of the Chinese character conversion learning device of the present invention. In the figure, 400 is a homophone learning dictionary unit which, for each reading symbol, records the corresponding homophones together with their past usage counts; its recording format is shown in Fig. 6. 410 is a basic dictionary unit that stores reading symbols together with all corresponding homophones and the semantic code of each homophone; its storage format is shown in Fig. 7. 420 is a semantic learning dictionary unit that stores the semantic codes of two adjacent words, preceding and following; its recording format is shown in Fig. 8. 100 is an input unit, consisting of a keyboard, for entering reading symbol strings and commands. 210 is a semantic similarity calculation unit which, referring to semantic learning dictionary unit 420, calculates the semantic similarity of the homophones that are conversion candidates for each input syllable string. 220 is a frequency weight calculation unit which, referring to the usage frequencies in homophone learning dictionary unit 400, calculates the usage-frequency weighting coefficient of each candidate homophone by a preset program. 200 is a character conversion unit. It extracts syllables from the input reading symbol string with a built-in syllable extraction device (not shown), refers to homophone learning dictionary unit 400 or basic dictionary unit 410 to take out the possible candidate words of each syllable and their semantic codes, then starts semantic similarity calculation unit 210 and frequency weight calculation unit 220, finds the optimal conversion path through the candidate word groups, and takes the characters or words on that path as the conversion result. 600 is an updating unit of the semantic learning dictionary unit, which learns the semantics of adjacent characters or words in the converted character string and updates semantic learning dictionary unit 420 accordingly. 300 is a homophone selection unit which, according to the user's correction instructions and with reference to homophone learning dictionary unit 400 and basic dictionary unit 410, corrects erroneous characters or words in the converted string or confirms a correct conversion. 500 is an updating unit of the homophone learning dictionary unit, which updates homophone learning dictionary unit 400 according to the finally converted string. 700 is an output unit, consisting of a display, that outputs the converted string; the display is also used by the user to confirm the preliminary conversion result and to indicate corrections when it is wrong.
In addition to the above, the device also has the components a word processor needs in order to function, such as a converted-character transfer device that moves correctly converted character strings and words to the predetermined position in the file being edited, and a reserve candidate display device that shows second- and third-priority candidate words at a predetermined position on the display. Since such components are self-evident, their illustration and description are omitted.
The operating process of the present embodiment is described below with reference to the drawings.
Fig. 2 is a flowchart of the overall processing of the present embodiment. Each step of the figure is described below.
In (S201), syllables are extracted from the input reading symbols according to a predetermined syllable extraction rule (the longest-match method in the present embodiment).
In (S202), using the pronunciation of each extracted syllable as the retrieval key, homophone learning dictionary unit 400 and basic dictionary unit 410 are consulted in that order to take out the homophones corresponding to each syllable and the semantic codes of those homophones.
In (S203), forward dynamic programming is applied: the following adjacent word is evaluated first and the preceding adjacent word is then inferred, giving for each candidate word segment i the accumulated maximum f(i) = max[t_ij + f(j)], where the maximum is taken over j. Here j is a following segment linked to i, and t_ij is the semantic similarity between candidate segment i and candidate segment j plus the frequency weight. These are explained in detail later.
In (S204), the words on the optimal conversion path, that is, the path with the accumulated maximum f, are output as the conversion result.
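Steps (S203) and (S204) can be sketched as a dynamic program over a lattice of candidate words, one candidate list per segment. The sketch is an illustration under assumptions: the function and variable names are invented, f is taken to be 0 for the last segment, and the score table is hypothetical. In the device, t_ij would be the semantic similarity plus the frequency weight.

```python
def best_path(lattice, t):
    """Find the candidate sequence maximising the accumulated score.

    lattice: list of candidate-word lists, one per segment, in input order
    t(a, b): score t_ij for candidate a immediately followed by candidate b
    Returns (best accumulated score, list of chosen words).
    """
    n = len(lattice)
    f = [dict() for _ in range(n)]    # f[k][a]: best score from a to the end
    nxt = [dict() for _ in range(n)]  # nxt[k][a]: best following candidate
    for a in lattice[-1]:
        f[-1][a] = 0.0                # assumed boundary value for the last segment
    for k in range(n - 2, -1, -1):    # following segments are evaluated first
        for a in lattice[k]:
            b = max(lattice[k + 1], key=lambda c: t(a, c) + f[k + 1][c])
            f[k][a] = t(a, b) + f[k + 1][b]   # f(i) = max[t_ij + f(j)]
            nxt[k][a] = b
    start = max(lattice[0], key=lambda a: f[0][a])
    path = [start]
    for k in range(n - 1):
        path.append(nxt[k][path[-1]])
    return f[0][start], path


# Hypothetical scores for the homophones "offensive" / "fortification".
scores = {
    ("enemy army", "offensive"): 0.75,
    ("enemy army", "fortification"): 0.25,
    ("offensive", "fierce"): 1.0,
    ("fortification", "firm"): 1.0,
}
lattice = [["enemy army"], ["offensive", "fortification"], ["fierce", "firm"]]
total, path = best_path(lattice, lambda a, b: scores.get((a, b), 0.0))
print(total, path)
```

Because every segment's best continuation is stored, the choice made for one segment is influenced by words further along the path, not only by its immediate neighbour.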
Fig. 3 is a processing flowchart of semantic similarity calculation unit 210. Each step of the figure is described below.
In (S301), the semantic code of the preceding segment and the semantic codes of the following segments are assigned to the variables sem1 and sem2[i], respectively, where i = 1, ..., n (n is determined by the number of following segments).
In (S302), using each sem2[i], i = 1, ..., n, as a retrieval key, semantic learning dictionary unit 420 is consulted, and the possible adjacent semantics are taken out and stored in the variables possisem[i], i = 1, ..., n.
In (S303), the set intersection of sem1 and possisem[i] is computed and the result is stored in the variable result[i], i = 1, ..., n. This computation is illustrated later with a concrete example.
In (S304), the semantic similarity is determined from each result[i]. For example, if the result value is 7124, all four digits agree with the code of the preceding segment, so the semantic similarity is 1. If the result value is 712, three digits agree, so the semantic similarity is 3/4. The semantic similarities of the result[i] are then summed and output to the character conversion unit.
The above steps complete the semantic similarity calculation.
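A minimal sketch of steps (S301) to (S304) follows. It assumes that the "intersection" of two hierarchical codes is their longest common prefix and that, when several learned codes are available, the longest agreement counts; neither detail is confirmed by this part of the specification. The dictionary contents and all names other than sem1 and sem2 are hypothetical.

```python
def common_prefix(a, b):
    """Longest common leading part of two semantic codes."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def semantic_similarity(sem1, sem2_list, learned):
    """Sum of per-code similarities between a preceding and a following word.

    sem1:      semantic code of the preceding word (S301)
    sem2_list: semantic codes on the following side (S301)
    learned:   maps a following code to the preceding codes learned for it
    """
    total = 0.0
    for sem2 in sem2_list:
        possisem = learned.get(sem2, set())                  # S302
        result = max((common_prefix(sem1, p) for p in possisem),
                     key=len, default="")                    # S303 (assumed prefix reading)
        total += len(result) / 4                             # S304: agreeing digits / 4
    return total


# Hypothetical learned combinations: following code -> preceding codes.
learned = {"028a": {"7124"}, "0311": {"7120"}}
print(semantic_similarity("7124", ["028a"], learned))   # four digits agree
print(semantic_similarity("7124", ["0311"], learned))   # three digits agree
```

Under this reading, a partial match on the upper digits still contributes to the score, which is how semantically close but not identical classes can be exploited.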
Fig. 4 is a processing flowchart of the frequency weight calculation unit 220. Each step of the figure is described below.
In (S401), the reading symbol of the candidate word and the candidate word itself are first stored in the A register and the B register, respectively, and the four registers C, D, E, and F are initialized.
In (S402), the reading symbol stored in the A register is used as a search key to look up the homophone learning dictionary unit 400, and the corresponding homophones and their usage frequencies (usage counts in this embodiment) are stored in the C register.
In (S403), the candidate word stored in the B register is used as a search key to look up the C register, and the corresponding usage frequency is retrieved and stored in the D register.
In (S404), the usage frequencies of all homophones in the C register are retrieved and summed, and the total is stored in the E register.
In (S405), the value of the D register is divided by the value of the E register, and the resulting frequency weight is stored in the F register. This calculation is explained later in the embodiment.
In (S406), the value of the F register is output to the character conversion unit 200.
The above steps complete the processing of the frequency weight calculation unit 220.
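The register steps of Fig. 4 amount to dividing one usage count by the sum of the counts for the same reading. The following Python sketch illustrates this; the function name `frequency_weight` and the nested-`dict` layout of the homophone learning dictionary are assumptions made for illustration, and the fallback of 1.0 for a reading with no recorded homophones is inferred from the statement in the worked example that such syllables have a frequency weight of 1.

```python
from typing import Dict


def frequency_weight(reading: str, candidate: str,
                     homophones: Dict[str, Dict[str, int]]) -> float:
    """S401-S406: usage count of the candidate over the total for its reading."""
    counts = homophones.get(reading, {})   # S402: homophones and counts (C)
    used = counts.get(candidate, 0)        # S403: count of the candidate (D)
    total = sum(counts.values())           # S404: sum of all counts (E)
    if total == 0:
        return 1.0                         # assumed: no homophones recorded
    return used / total                    # S405: frequency weight (F)


# Usage counts taken from the worked example (28, 13, 13, 12).
homophones = {
    "gong1sh4": {"formula": 28, "public affair": 13,
                 "offensive": 13, "fortification": 12},
}
print(round(frequency_weight("gong1sh4", "fortification", homophones), 2))  # 0.18
print(round(frequency_weight("gong1sh4", "formula", homophones), 2))        # 0.42
```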
Fig. 5 is a processing flowchart of the update unit 600 of the semantic learning dictionary unit. Each step of the figure is described below.
In (S501), the character string converted via the homophone selection unit 300 and the semantic codes of its words are received.
In (S502), for each pair of adjacent words in the character string, taken from left to right, the semantic code of the preceding word is stored in the A register and the semantic code of the following word is stored in the B register.
In (S503), it is determined whether the B register is an empty set. If it is not empty, processing proceeds to (S504); if it is empty, processing proceeds to (S512).
In (S504), the value of the B register is used as a search key to look up the semantic learning dictionary unit 420, and the matching data are stored in the C register.
In (S505), it is determined whether the C register is an empty set. If it is empty, processing proceeds to (S506); otherwise it proceeds to (S507).
In (S506), the value of the A register is stored in the semantic learning dictionary unit 420 with the value of the B register as the search key, and processing returns to (S502).
In (S507), the set intersection of the A register and the B register is computed, and the result is stored in the D register.
In (S508), it is determined whether the D register is an empty set. If it is not empty, processing returns to (S502). If it is empty, processing proceeds to (S509).
In (S509), the set union of the A register and the C register is computed, and the result is stored in the A register.
In (S510), the value of the A register is compressed by a semantic code compression method, for example semantic code normalization, and stored back in the A register.
In (S511), the value of the A register is stored in the semantic learning dictionary unit 420 with the value of the B register as the search key, and processing returns to (S502).
In (S512), the learning results are sorted by the following-word semantic code of the semantic learning dictionary unit 420 and stored in the semantic learning dictionary unit 420.
The above steps complete the update processing of the semantic learning dictionary unit 420.
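The update flow of Fig. 5 can be approximated by the Python sketch below. It is a simplified illustration rather than the procedure of the embodiment: the function name `update_semantic_dictionary` and the `dict`-of-sets layout are hypothetical, the compression step (S510) is omitted, and the emptiness tests of (S505) and (S508) are collapsed into a check of whether the preceding code is already recorded. The initial dictionary contents are assumed values chosen so that the result matches the update listed in the worked example of the description.

```python
from typing import Dict, List, Set


def update_semantic_dictionary(codes: List[str],
                               learned: Dict[str, Set[str]]) -> None:
    """Record each (preceding, following) semantic code pair of a sentence."""
    # S502: walk the adjacent word pairs from left to right.
    for prev_code, next_code in zip(codes, codes[1:]):
        known = learned.get(next_code)         # S504: look up by following code
        if not known:                          # S505/S506: no entry yet
            learned[next_code] = {prev_code}
        elif prev_code not in known:           # S507-S511: merge the new code
            learned[next_code] = known | {prev_code}
        # otherwise the pair is already recorded and nothing changes


# Assumed initial contents; codes follow the worked example:
# enemy army 7140, fortification 3950, very 1950, firm 135a.
learned = {"3950": {"714a", "7330"}, "135a": {"3950"}}
update_semantic_dictionary(["7140", "3950", "1950", "135a"], learned)
for key in sorted(learned):                    # S512: sort by following code
    print(key, sorted(learned[key]))
```

Keying the dictionary by the following word's code mirrors the lookups in (S504) and in the similarity calculation, which both search by the following code.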
In addition, semantic codes are represented hierarchically, as shown in Fig. 9. The semantic similarity between the candidate words of two syllables can therefore be obtained by a set intersection operation. For example, the set intersection of the semantic codes "7140" and "714a" is "714". Since three code digits agree, the semantic similarity is 3/4. When all four digits agree the semantic similarity is 1, when two digits agree it is 2/4, when one digit agrees it is 1/4, and when none agree it is 0.
The above operation is illustrated below with a concrete example of a Chinese reading symbol string. The explanation of the longest-match method is omitted. The Chinese character conversion is described for the input "di2jyuen1gong1sh4fei1ccang2jian1gu4".
In (S201), the syllable string to be converted into Chinese characters is extracted from the input supplied by the input unit, giving "di2jyuen1gong1sh4fei1ccang2jian1gu4". The possible candidate words at this point are shown in Fig. 13.
In (S202), for each syllable shown in Fig. 13, the cumulative maximum f(i), the maximum of semantic similarity + frequency weight + cumulative value of the determined path, is computed by forward dynamic programming, with the following results.
f(7) = max[1] = 1
f(6) = max[0+2+f(7)] = 3
f(5) = max[1+1.18+f(6)] = 5.18
f(4) = max[0+1.20+f(6)] = 4.20
f(3) = max[0+1.20+f(6)] = 4.20
f(2) = max[0+1.42+f(6)] = 4.42
f(1) = max[0+1.42+f(2), 0+1.2+f(3), 0+1.2+f(4), 0.75+1.18+f(5)]
     = max(5.84, 5.4, 5.4, 7.11) = 7.11
(Here, max(...) denotes the largest of the values listed in the parentheses.)
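The recurrence above can be reproduced with a short Python sketch. The edge values are transcribed from the listed formulas; the function name `optimal_path`, the edge-table layout, and the assumption that syllable numbers increase along the sentence (so that values can be filled in from the last syllable backward, as in the listing) are illustrative choices, not details taken from the embodiment.

```python
from typing import Dict, List, Tuple

# node -> list of (next node, semantic similarity, frequency weight),
# transcribed from the computation of f(1)..f(6) above.
EDGES: Dict[int, List[Tuple[int, float, float]]] = {
    6: [(7, 0.0, 2.0)],
    5: [(6, 1.0, 1.18)],
    4: [(6, 0.0, 1.20)],
    3: [(6, 0.0, 1.20)],
    2: [(6, 0.0, 1.42)],
    1: [(2, 0.0, 1.42), (3, 0.0, 1.20), (4, 0.0, 1.20), (5, 0.75, 1.18)],
}
TERMINAL = {7: 1.0}  # f(7) = 1


def optimal_path(edges, terminal, start):
    """Fill in f(i) from the last syllable backward and trace the best path."""
    f = dict(terminal)
    successor = {}
    for node in sorted(edges, reverse=True):   # successors are computed first
        nxt, value = max(
            ((j, sim + freq + f[j]) for j, sim, freq in edges[node]),
            key=lambda item: item[1],
        )
        f[node], successor[node] = value, nxt
    path = [start]
    while path[-1] in successor:               # follow the stored choices
        path.append(successor[path[-1]])
    return f, path


f, path = optimal_path(EDGES, TERMINAL, 1)
print(round(f[1], 2), path)  # 7.11 [1, 5, 6, 7]
```

Storing the chosen successor of each syllable alongside f(i) is what allows the optimal path to be read off after the maximum has been found.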
The content of the above computation is explained below.
The frequency weight of the word "fortification" is calculated, by referring to the contents of the homophone learning dictionary unit 400 in Fig. 6, as 12/(28+13+13+12)=0.18. In the same way, "formula", "public affair", and "offensive" have frequency weights of 0.42, 0.20, and 0.20, respectively. The syllables other than "gong1sh4" have no homophones, so their frequency weights are all 1. The frequency weight in f(6) is therefore 1+1=2, and those in f(5), f(4), f(3), and f(2) are 1+0.18, 1+0.2, 1+0.2, and 1+0.42, respectively, which gives the values in the formulas above.
For the semantic similarity, the semantic code of each syllable shown in Fig. 13 is used as a search key to look up the semantic learning dictionary unit 420, whose contents are shown in Fig. 8. Since only the semantic code "3950" of "fortification" is recorded in the semantic learning dictionary unit 420, the semantic code "714a" of its possible preceding word is compared with the semantic code "7140" of the preceding word "enemy army". Three code digits agree, so the semantic similarity is 3/4. Likewise, when f(5) is calculated, as shown in Fig. 8, the semantic similarity between "very" and "fortification" is 0, but the semantic similarity between "firm", which lies on the determined path, and "fortification" is 1.
As the above results show, the optimal conversion path consists of syllables 1-5-6-7, so the correct conversion result "the enemy army's fortifications are very firm" is obtained. Based on these results, the semantic learning dictionary unit 420 is then updated as follows.
3950……7140,714a,7330
1950……3950
135a……1950,3950
In addition, the homophone learning dictionary unit 400 is updated as follows.
di2jyuen1 ... enemy army, 1
gong1sh4 ... formula, 28; fortification, 13; public affair, 13; offensive, 13
fei1ccang2 ... very, 1
jian1gu4 ... firm, 1
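The update of the usage counts shown above can be sketched in Python as a simple increment per confirmed word. The function name `update_homophone_dictionary` and the nested-`dict` layout are assumptions made for illustration; the starting counts for "gong1sh4" are those used in the frequency weight calculation of the worked example, and the other readings are assumed to have no entry before the update.

```python
from typing import Dict, List, Tuple


def update_homophone_dictionary(confirmed: List[Tuple[str, str]],
                                homophones: Dict[str, Dict[str, int]]) -> None:
    """Increment the usage count of every (reading, word) pair the user confirmed."""
    for reading, word in confirmed:
        counts = homophones.setdefault(reading, {})  # create the entry if new
        counts[word] = counts.get(word, 0) + 1


homophones = {
    "gong1sh4": {"formula": 28, "fortification": 12,
                 "public affair": 13, "offensive": 13},
}
confirmed = [("di2jyuen1", "enemy army"), ("gong1sh4", "fortification"),
             ("fei1ccang2", "very"), ("jian1gu4", "firm")]
update_homophone_dictionary(confirmed, homophones)
print(homophones["gong1sh4"]["fortification"])  # 13
```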
Although the present invention has been described above on the basis of an embodiment, it goes without saying that the embodiment does not limit the invention. For example:
(1) Other hierarchical or classification-based semantic codes may be used.
(2) Only the semantic similarity may be used.
(3) Other computation methods may be used, for example integer programming.
(4) Of two adjacent words, the preceding word may be used as the search key and the following word as the search target, or the user may be allowed to select such a function. This function may be better suited to Japanese.
Specifically, when the input phonetic string is "かみをきる" (kami o kiru), it is converted to "髪を切る" (cut hair) if the preceding phrase is "バリカンで" (with clippers) and to "紙を切る" (cut paper) if it is "ナイフで" (with a knife); if the preceding phrase is "with pincers", the usage frequency has a large influence.
(5) The syllable string to be converted into Chinese characters may also be extracted from the input reading code by having the person entering the text insert a predetermined symbol such as a space or "/".
(6) The semantic similarity need not be computed by set intersection. Other determined words in the same text, even if not adjacent, may also be used directly as objects of comparison; in that case the number of syllables between the two words, or the number of extracted syllable strings, is taken into account.
Claims (3)
1. A Chinese character conversion learning device having a basic dictionary unit that stores reading codes, all homophones corresponding to each reading code, and the semantic codes of all those homophones, characterized by further comprising: a semantic learning dictionary unit that records learned combinations of the semantic code of an adjacent following word and the semantic code of a preceding word; a candidate word detection unit that, after the syllable string to be converted into Chinese characters has been extracted from the input reading code by a predetermined procedure, detects from the basic dictionary unit all homophones corresponding to that code and the semantic code of each homophone; a semantic similarity calculation unit that, with reference to the semantic learning dictionary unit, calculates the semantic similarity between the candidate words of two syllables by a predetermined formula; a character conversion unit that, based on the semantic similarity calculated by the semantic similarity calculation unit, finds the optimal conversion path through the candidate words detected by the candidate word detection unit and takes the words on that path as a preliminary conversion result; a selection unit that, according to an instruction given by the user with reference to the preliminary conversion result of the character conversion unit, selects the desired word from the homophones detected by the character conversion unit and obtains a final conversion result; and an update unit for the semantic learning dictionary unit that updates the combinations in the semantic learning dictionary unit according to the final conversion result of the selection unit.
2. The Chinese character conversion learning device according to claim 1, characterized in that it further comprises a homophone learning dictionary unit that records reading codes, all homophones corresponding to each reading code, and the past usage frequency of each of those homophones, and an update unit for the homophone learning dictionary unit that updates the usage frequency records in the homophone learning dictionary unit according to the final conversion result from the selection unit, and in that the candidate word detection unit detects all the corresponding homophones and their semantic codes by consulting the homophone learning dictionary unit and the basic dictionary unit in that order.
3. The Chinese character conversion learning device according to claim 2, characterized in that it further comprises a frequency weight calculation unit that, with reference to the homophone learning dictionary unit, calculates by a predetermined procedure a usage frequency weighting coefficient for each candidate word corresponding to the same reading code, and in that the character conversion unit, based on the frequency weighting coefficient calculated by the frequency weight calculation unit and the semantic similarity calculated by the semantic similarity calculation unit, finds the optimal conversion path through the candidate words for the input reading code string and takes the words on that path as the conversion result.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP5283361A JPH07141354A (en) | 1993-11-12 | 1993-11-12 | Kanji conversion learning device |
JP283361/93 | 1993-11-12 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1103494A CN1103494A (en) | 1995-06-07 |
CN1085859C true CN1085859C (en) | 2002-05-29 |
Family
ID=17664499
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN94106045A Expired - Fee Related CN1085859C (en) | 1993-11-12 | 1994-05-07 | Chinese characters shift learning device |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPH07141354A (en) |
CN (1) | CN1085859C (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104750831A (en) * | 2015-04-01 | 2015-07-01 | 广东小天才科技有限公司 | Intelligent Chinese character learning method and system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN86102518A (en) * | 1986-09-10 | 1988-03-23 | 施国梁 | Blurred words key board entry technology |
Also Published As
Publication number | Publication date |
---|---|
CN1103494A (en) | 1995-06-07 |
JPH07141354A (en) | 1995-06-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C19 | Lapse of patent right due to non-payment of the annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |