CN1068127C - Text data processing method and device - Google Patents


Info

Publication number
CN1068127C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN96115997A
Other languages
Chinese (zh)
Other versions
CN1182234A (en)
Inventor
吴胜远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN96115997A priority Critical patent/CN1068127C/en
Publication of CN1182234A publication Critical patent/CN1182234A/en
Application granted granted Critical
Publication of CN1068127C publication Critical patent/CN1068127C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention provides a method and a device for processing text information. The method uses multi-level internal codes to solve problems in the storage, transmission, segmentation and text-to-speech conversion of text information. It can be widely used in computer networks, text information processing devices of all kinds, multimedia, language engineering and similar fields.

Description

Text information processing method and device
The present invention relates to a method and apparatus for processing text information, and in particular to a method and apparatus that process text in units of text components. The method and apparatus can directly process text information that contains multi-level internal codes.
In existing text processing methods and apparatus, each character is represented by an internal code, for example an ASCII code or a Chinese internal code. This specification calls such a code a single-level internal code, or first-level internal code. Existing text information processing systems handle text in the form of single-level internal codes, so the text occupies a large amount of storage, the volume to be transmitted is large, and processing is slow.
Existing text compression techniques can increase the amount of text held on auxiliary storage and improve transmission efficiency, but they cannot speed up text processing and cannot increase the amount of text held in main memory. In language engineering (natural language understanding, text-to-speech conversion and machine translation), text must be processed at the level of words, word groups or phrases. Existing text processing devices are therefore inefficient and face difficulties that are hard to overcome, such as the speed and correctness of word segmentation and the correctness of pronunciation in text-to-speech conversion. Many of these difficulties stem from the single-level internal code used in current text processing. For example, some Chinese input methods accept whole words, so many segmentation problems are already solved during input. However, existing input methods only convert input codes into single-level internal codes, which cannot readily retain segmentation information, so the segmentation information available during input is discarded. As another example, Chinese has polyphonic characters: a character has one written form and one internal code but several pronunciations, so the pronunciations cannot be distinguished.
A study of the four documents listed in the international search report shows that all four concern text input and that none proposes the concept of a multi-level internal code. According to CN-A-1053960, a composite symbol of Chinese character patterns (that is, the input code of a character or word) is entered, the input code is looked up in a character dictionary and a word dictionary, the input is converted into the corresponding characters, and the converted characters are output. CN-A-86107235 discloses a dual input method based mainly on phrase input codes, with single-character codes as a supplement. These documents only address the conversion of input codes for Chinese characters or ideographs into the corresponding single-level internal codes. Because they do not propose the concept of a multi-level internal code, text is still stored and processed inside the machine as single-level internal codes. They therefore cannot increase the amount of text that can be stored, cannot improve transmission efficiency, cannot improve processing speed (although some improve input speed), and bring nothing new to word segmentation or text-to-speech conversion.
The present invention applies not only to ideographic scripts but also to alphabetic scripts; that is, it applies to text information of all kinds. The invention also covers Chinese character input, but mainly the method and apparatus for converting input codes into text containing multi-level internal codes; converting input codes into single-level internal codes is treated as one use of the multi-level internal code technique.
Objects of the invention
A first object of the invention is to provide a text information processing method that uses multi-level internal codes, and to provide the application of this method in a first class of text processing apparatus.
A second object of the invention is to provide an apparatus for processing text information that contains multi-level internal codes, and to provide the apparatus for the first class of text information processing that relates to a second class of text information processing apparatus.
To achieve these objects, the inventor proposes the concept of the multi-level internal code.
The meaning of the multi-level internal code is explained first.
An internal code is the representation of text information inside the machine.
A single-level internal code is the internal code of a character or basic unit, for example an ASCII code or a Chinese internal code. It may also be called a first-level internal code.
A text component is a unit of text such as a character, word, word group or phrase.
A multi-level internal code is the internal code of a text component; that is, it is the machine-internal representation of a character, word, word group or phrase. Multi-level internal codes are used not only to store and transmit text information but also to compute on and process it. A single-level internal code can be regarded as a multi-level internal code of level one, so a system that can process multi-level internal codes can naturally process text that contains only single-level internal codes.
Because a multi-level internal code can correspond to a word, text that contains multi-level internal codes needs no word segmentation, which removes the problems of segmentation correctness and segmentation speed. Multi-level internal codes also solve the polyphone problem for characters and words. For a character with two or more pronunciations, one pronunciation is represented by the single-level internal code and the others by multi-level internal codes; for example, 重 can be read "zhong" or "chong". Chinese also has polyphonic words: 一行 can be read "yi xing" or "yi hang", and 出差 can be read "chu chai" or "chu cha". English has the same problem: "record" is pronounced differently as a verb and as a noun, and the differently pronounced forms can be given different multi-level internal codes. Multi-level internal codes can therefore solve the polyphone problem in text information.
The first aspect of the first object of the invention concerns a text information processing method in which characters are represented by internal codes, here called single-level internal codes, and text components (characters, words, word groups and phrases) are represented by the set of single-level internal codes of the one or more characters they contain, so that text information is processed by processing single-level internal codes. The method is characterized in that:
A text component can also be represented by another kind of internal code, called a multi-level internal code, so that the set of single-level internal codes of the one or more characters of a text component can be represented by a single multi-level internal code. Text information is then also processed by operating on multi-level internal codes. This markedly increases the amount of text that a storage medium can hold, raises processing speed, improves transmission efficiency, and solves problems such as the correctness of text segmentation and the correctness of pronunciation in text-to-speech conversion.
A multi-level internal code is a multi-byte code. It can use bit marking, byte marking, string marking or no marking, and it should be easy to distinguish from a single-level internal code.
Suppose the multi-level internal code is a two-byte, bit-marked code. The phrase 中国人民解放军 ("Chinese People's Liberation Army") consists of 7 Chinese internal codes and occupies 14 bytes, whereas a single multi-level internal code for the phrase needs only 2 bytes. Multi-level internal codes therefore increase the amount of text that auxiliary storage can hold and improve transmission efficiency. Moreover, text that contains multi-level internal codes can take part in computation and processing directly, without first being converted to single-level internal codes. Because such text is shorter than text that contains only single-level internal codes, processing is faster and main memory can hold more text.
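The storage arithmetic above can be checked with a small sketch. The bit layout used here is an assumption made for illustration (the patent only says the code is two bytes and bit-marked), GB2312 stands in for the "Chinese internal code", and the names `make_multilevel_code` and `is_multilevel_code` are hypothetical.

```python
# Hypothetical 2-byte, bit-marked multi-level code (layout is an assumption):
#   byte 0: high bit set,   low 7 bits = upper part of the item index
#   byte 1: high bit clear, low 7 bits = lower part of the item index
# GB2312 trail bytes always have the high bit set, so the second byte
# tells a multi-level code apart from a two-byte Chinese internal code.

def make_multilevel_code(index: int) -> bytes:
    """Encode a component-library index (0..16383) as a 2-byte code."""
    if not 0 <= index < (1 << 14):
        raise ValueError("index must fit in 14 bits")
    return bytes([0x80 | (index >> 7), index & 0x7F])

def is_multilevel_code(pair: bytes) -> bool:
    """True if the two bytes follow the hypothetical multi-level layout."""
    return len(pair) == 2 and pair[0] >= 0x80 and pair[1] < 0x80

phrase = "中国人民解放军"                 # "Chinese People's Liberation Army"
single_level = phrase.encode("gb2312")   # one 2-byte internal code per character
multi_level = make_multilevel_code(5)    # index 5 is an arbitrary example

print(len(single_level), len(multi_level))
```

With this layout a scanner can classify each code from its first two bytes alone, which is one way to meet the requirement that multi-level codes be easy to distinguish from single-level ones.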
Unidirectional conversion of internal codes is conversion from a higher-level internal code to a lower-level internal code, in general to single-level internal codes. Multidirectional conversion is conversion from a lower-level internal code to a higher-level internal code, in general from single-level internal codes to a multi-level internal code.
Multidirectional conversion can be one of the following:
(1) conversion of a word or phrase into the corresponding multi-level internal code;
(2) conversion of the single-level internal code of a character into the corresponding multi-level internal code, generally used for polyphonic characters;
(3) conversion of the single-level internal codes of a polyphonic word into the corresponding multi-level internal code.
The phonetic notation of text generally has two parts: the basic syllable, and the stress or tone. For example, the phonetic notation of a Chinese character can consist of its pinyin and its tone; besides the four tones there is also the neutral tone, five tones in all. A phonetic notation that includes at least these two parts is called a full pronunciation. English phonetic notation can consist of phonetic symbols and stress.
Full-pronunciation unidirectional conversion device: converts text information containing internal codes into the corresponding full pronunciation.
The encoding of multi-level internal codes depends on the structure of the component library device.
A component item is the piece of text information that corresponds to a multi-level internal code. It can contain single-level internal codes, multi-level internal codes, or both. The total length of the single-level internal codes that a component item corresponds to is called its actual length, and the total length of the internal codes it contains is called its entry length. Either the entry length or the actual length may be called the component item length.
The component library device is a device in which component items are arranged according to a certain rule. For example, the component items can be divided into sections by entry length, with each section arranged according to a certain rule; the encoding of a multi-level internal code is then tied to the address of the corresponding component item in the component library. A multi-level internal code is a code for a text component and, like a single-level internal code, is an internal code, so text information containing multi-level internal codes can be processed directly.
The character sound library device holds the correspondence between characters and their full pronunciations. The full pronunciation of a Chinese character can consist of two pinyin symbols and the tone, and is implemented here in two bytes: of the 16 bits, each pinyin symbol takes 5 bits, the 5 tones take 3 bits, and the remaining 3 bits store other information. Full pronunciations in the character sound library are arranged in the order of the corresponding internal codes. For a polyphonic character, the character sound library holds only its primary pronunciation, that is, the full pronunciation commonly used for the character; a less common full pronunciation is called a secondary pronunciation, and it is also a full pronunciation. In what follows, the full pronunciation of a Chinese character always means this 2-byte form. Because the character sound library is arranged by the internal codes of the Chinese characters, the full pronunciation can be obtained directly from the internal code. The word sound library device stores the full pronunciations of words of two or more characters, arranged in the order of the corresponding multi-level internal codes.
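The 16-bit layout described above (5 + 5 + 3 + 3 bits) can be sketched as follows. The order of the fields within the two bytes and the mapping of each pinyin symbol to a letter a to z are assumptions; the text only gives the field widths. The function names are hypothetical.

```python
from typing import Tuple

def pack_pronunciation(first: str, second: str, tone: int, extra: int = 0) -> int:
    """Pack two pinyin symbols, a tone (1..5) and 3 spare bits into 16 bits.

    Assumed layout, high to low: 5 bits first symbol, 5 bits second symbol,
    3 bits tone, 3 bits other information.
    """
    for symbol in (first, second):
        if len(symbol) != 1 or not ("a" <= symbol <= "z"):
            raise ValueError("each pinyin symbol must be one letter a-z")
    if not 1 <= tone <= 5:
        raise ValueError("tone must be 1..5 (four tones plus neutral tone)")
    if not 0 <= extra <= 7:
        raise ValueError("extra must fit in 3 bits")
    a = ord(first) - ord("a")
    b = ord(second) - ord("a")
    return (a << 11) | (b << 6) | (tone << 3) | extra

def unpack_pronunciation(value: int) -> Tuple[str, str, int, int]:
    """Inverse of pack_pronunciation."""
    first = chr(((value >> 11) & 0x1F) + ord("a"))
    second = chr(((value >> 6) & 0x1F) + ord("a"))
    tone = (value >> 3) & 0x07
    extra = value & 0x07
    return first, second, tone, extra

packed = pack_pronunciation("v", "s", 4)
print(packed.to_bytes(2, "big").hex(), unpack_pronunciation(packed))
```

Because every entry has the same 2-byte size, a library ordered by internal code can be read by computing an offset directly from the code, which matches the direct lookup the text describes.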
The polyphonic character sound library is a device that stores the secondary pronunciations of polyphonic characters, arranged in the order of the multi-level internal codes of those characters.
The polyphonic word sound library and the variant-pronunciation word sound library are also word sound libraries.
The polyphonic word sound library device stores the full pronunciations of polyphonic words. A polyphonic word is a word in which at least one character is read with a secondary pronunciation.
A variant-pronunciation word is a word with two or more pronunciations, such as 出差 read "chu chai" or "chu cha". A variant-pronunciation word can be represented by its multi-level internal code.
The second aspect of the first object of the invention is characterized in that:
The method includes a unidirectional conversion operation, which can be
(1) unidirectional conversion of internal codes: a multi-level internal code is converted into the single-level internal codes that correspond to it; or
(2) unidirectional conversion to full pronunciation: internal codes are converted into the corresponding full pronunciation.
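As an illustration of conversion (2), the sketch below looks up pronunciations for a mixed sequence of codes. Everything about the modelling is an assumption: single-level codes are represented as one-character strings, multi-level codes as integers, the three libraries as dictionaries, and the numeric code values are arbitrary.

```python
from typing import Dict, List, Union

Code = Union[str, int]  # str: single-level code (one character); int: multi-level code

# Character sound library: primary pronunciation only, keyed by internal code.
CHAR_SOUNDS: Dict[str, str] = {"重": "zhong4", "要": "yao4", "复": "fu4"}
# Polyphonic character sound library: secondary pronunciations,
# keyed by the multi-level code assigned to that reading.
POLY_CHAR_SOUNDS: Dict[int, str] = {0x8001: "chong2"}            # 重 read "chong"
# Word sound library: full pronunciations keyed by the word's multi-level code.
WORD_SOUNDS: Dict[int, List[str]] = {0x8002: ["chu1", "chai1"]}  # 出差

def to_full_pronunciation(codes: List[Code]) -> List[str]:
    """Convert a sequence of internal codes to full pronunciations."""
    sounds: List[str] = []
    for code in codes:
        if isinstance(code, str):          # single-level code: primary reading
            sounds.append(CHAR_SOUNDS[code])
        elif code in POLY_CHAR_SOUNDS:     # multi-level code of a polyphonic character
            sounds.append(POLY_CHAR_SOUNDS[code])
        else:                              # multi-level code of a word
            sounds.extend(WORD_SOUNDS[code])
    return sounds

print(to_full_pronunciation(["重", "要"]))      # 重要, primary reading of 重
print(to_full_pronunciation([0x8001, "复"]))   # 重复, secondary reading of 重
```

The reading is chosen by which code was stored, so no context analysis is needed at conversion time.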
The third aspect of the first object of the invention is characterized in that:
The method uses unidirectional conversion of internal codes to convert a multi-level internal code into the corresponding single-level internal codes, which solves the problem of compatibility with existing text processing apparatus. The unidirectional conversion operation comprises the following steps:
Calculation step: from the multi-level internal code, calculate the position of the corresponding component item in the component library device;
Conversion step: replace the multi-level internal code with the corresponding component item;
Identification step: using the coding features of multi-level internal codes, determine whether the component item contains a multi-level internal code. If it does, continue the unidirectional conversion; otherwise the conversion ends. The identification step is needed only when component items in the component library device contain multi-level internal codes.
The purpose of unidirectional conversion of internal codes is to make a system that uses multi-level internal codes compatible with systems that use only single-level internal codes. For example, to display or print text that contains multi-level internal codes, the multi-level internal codes in it must be converted into single-level internal codes.
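The calculation, conversion and identification steps can be sketched as a recursive expansion. The modelling is an assumption: multi-level codes are integers starting at a hypothetical `BASE`, single-level codes are one-character strings, and the component library is a plain list indexed by `code - BASE`.

```python
from typing import List, Union

Code = Union[str, int]  # str: single-level code; int: multi-level code

BASE = 0x8000  # hypothetical value of the first multi-level code

# Component library: item i belongs to multi-level code BASE + i.
# Item 3 itself contains multi-level codes, so expansion must repeat.
COMPONENT_LIBRARY: List[List[Code]] = [
    ["中", "国"],                       # BASE + 0
    ["人", "民"],                       # BASE + 1
    ["解", "放", "军"],                  # BASE + 2
    [BASE + 0, BASE + 1, BASE + 2],    # BASE + 3
]

def expand(codes: List[Code]) -> List[str]:
    """Convert every multi-level code to its single-level codes."""
    out: List[str] = []
    for code in codes:
        if isinstance(code, int):                    # identification step
            item = COMPONENT_LIBRARY[code - BASE]    # calculation step
            out.extend(expand(item))                 # conversion step, repeated if needed
        else:
            out.append(code)
    return out

print("".join(expand([BASE + 3, "万", "岁"])))
```

If no component item ever contained a multi-level code, the recursive call could be dropped and each code replaced in a single pass, as the text notes for the identification step.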
The fourth aspect of the first object of the invention is characterized in that the method also includes a multidirectional conversion step:
The internal codes that correspond to a multi-level internal code are converted into that multi-level internal code.
The implementation of multidirectional conversion depends on the mapping component library device and the index device. The mapping component library device consists of mapping items.
One kind of mapping item consists of a component item, the component item length and the corresponding multi-level internal code.
The mapping component library device is a device in which mapping items are arranged according to a certain rule.
The mapping items are generally arranged in ascending (or descending) order of their corresponding single-level internal codes.
The index device consists of index entries. An index entry consists mainly of an address entry that gives the address at which the first internal code (or first several internal codes) of a mapping item first occurs in the mapping component library device.
"Size order" in the following description means descending or ascending order.
The fifth aspect of the first object of the invention provides a method for the multidirectional conversion of words. The text information processing method of the fifth aspect is characterized in that:
The method uses a multidirectional conversion operation to convert the internal codes that correspond to a multi-level internal code into that multi-level internal code. Multidirectional conversion uses a mapping component library device in which the mapping items are arranged in size order of the single-level internal codes of their component items. The multidirectional conversion operation comprises the following steps:
Index lookup step: look up the index device using the corresponding internal code. If a corresponding mapping item is found in the mapping component library device, go to the compare-and-match step; otherwise return.
Compare-and-match step: compare the mapping item with the corresponding part of the text information being converted and act on the result: if the exit condition is satisfied, go to the final processing step; if they are equal, perform the match operation; then go to the move step.
Move step: advance one mapping item in the forward direction and go to the compare-and-match step.
Final processing step: if a match was found, return the multi-level internal code of the last matched mapping item.
The "exit condition" above means: when the mapping items are in ascending order of the single-level internal codes of their component items, the component item of the mapping item is greater than the text information being converted; when they are in descending order, the component item of the mapping item is less than the text information being converted.
The forward direction above means: when the mapping items are in ascending order, the ascending direction of the mapping component library; when they are in descending order, the descending direction of the mapping component library.
The match operation above generally sets a match flag and points the match pointer at the current mapping item. Alternatively, it can replace the corresponding part of the text being compared with the multi-level internal code of the current mapping item. The purpose of the match operation is to return the correct multi-level internal code at the end.
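A minimal sketch of these steps for the ascending-order case follows. It assumes that Python string order (by Unicode code point) stands in for the order of single-level internal codes, that the index is keyed on the first character only, and that the mapping items and code values shown are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

# Mapping items: (component item, multi-level code), kept in ascending order.
MAPPING_LIBRARY: List[Tuple[str, int]] = sorted([
    ("中国", 0x8000), ("中国人", 0x8001), ("中国人民", 0x8002),
    ("中间", 0x8003), ("人民", 0x8004),
])

# Index device: first character -> address of its first mapping item.
INDEX: Dict[str, int] = {}
for address, (item, _) in enumerate(MAPPING_LIBRARY):
    INDEX.setdefault(item[0], address)

def to_multilevel(text: str, start: int = 0) -> Optional[Tuple[int, int]]:
    """Return (multi-level code, matched length) for text[start:], or None."""
    if start >= len(text):
        return None
    address = INDEX.get(text[start])              # index lookup step
    if address is None:
        return None
    rest = text[start:]
    matched: Optional[Tuple[int, int]] = None
    while address < len(MAPPING_LIBRARY):
        item, code = MAPPING_LIBRARY[address]
        if rest.startswith(item):                 # equal: match operation
            matched = (code, len(item))
        elif item > rest:                         # exit condition (ascending order)
            break
        address += 1                              # move step
    return matched                                # final processing step

print(to_multilevel("中国人民银行"))
```

In ascending order a longer matching item always comes after a shorter one, so the last recorded match is also the longest, and the scan can stop at the first item greater than the text.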
Multidirectional conversion can convert text information that contains only single-level internal codes into text information that contains multi-level internal codes. The text can come from a storage medium, a communication device, an input device and so on.
Multidirectional and unidirectional conversion can also use a pipeline conversion device, which contains a pipeline component library device.
Pipeline component library devices can be divided into pipeline single-section component library devices and pipeline multi-section component library devices.
The sixth aspect of the first object of the invention concerns the conversion of text input codes into multi-level internal codes, and is characterized in that:
During text input, the input code of text information associated with a multi-level internal code is converted into the corresponding multi-level internal code.
The seventh aspect of the first object of the invention applies the multidirectional conversion method to maximum matching of text information. The text information processing method of the fifth aspect is characterized in that:
The multidirectional conversion method can be used for maximum matching on a text sequence that contains only single-level internal codes, and also on a text sequence that contains multi-level internal codes. Here the mapping component library device is called a dictionary. The dictionary consists of dictionary items, each containing a component item used for matching, and the dictionary items are arranged in size order of the single-level internal codes of their component items;
The text sequence can come from an input device, a storage device or a communication device;
The steps of maximum matching on a text sequence are as follows:
(1) scan the text sequence and look up the index device using the corresponding internal code; if an address entry is found, continue; otherwise return;
(2) use the comparison device to compare the component item with the corresponding part of the text sequence being matched, and act on the result: if the exit condition is satisfied, go to step (4); if they are equal, perform the match operation;
(3) advance one dictionary item in the forward direction and go to step (2);
(4) if a match was found, return the text of the longest match;
When used for text segmentation, this method can markedly reduce time complexity and space complexity;
The match operation in the steps above generally sets a match flag and points the match pointer at the current component item. Its purpose is to return the longest matching component item at the end;
The "exit condition" above means: when the component items are in ascending order of their single-level internal codes, the component item is greater than the text being segmented; when they are in descending order, the component item is less than the text being segmented;
The forward direction above means: when the component items are in ascending order, the ascending direction of the component items; when they are in descending order, the descending direction of the component items.
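Applied repeatedly from left to right, steps (1) to (4) segment a whole text sequence. The sketch below assumes an ascending dictionary ordered by Unicode code point, a first-character index, and a fallback (not stated in the text) that emits a single character when nothing in the dictionary matches. The dictionary contents are invented.

```python
from typing import Dict, List

DICTIONARY: List[str] = sorted(["北京", "北京大学", "大学"])

# Index device: first character -> position of its first dictionary item.
FIRST: Dict[str, int] = {}
for position, word in enumerate(DICTIONARY):
    FIRST.setdefault(word[0], position)

def longest_match(text: str, start: int) -> int:
    """Length of the longest dictionary item at text[start:], or 1 if none."""
    position = FIRST.get(text[start])          # step (1): index lookup
    if position is None:
        return 1                               # assumed fallback: single character
    rest = text[start:]
    best = 0
    while position < len(DICTIONARY):
        item = DICTIONARY[position]
        if rest.startswith(item):              # step (2): equal, record the match
            best = len(item)
        elif item > rest:                      # step (2): exit condition
            break
        position += 1                          # step (3): advance one item
    return best or 1                           # step (4): longest match

def segment(text: str) -> List[str]:
    """Segment text by repeated forward maximum matching."""
    pieces: List[str] = []
    start = 0
    while start < len(text):
        length = longest_match(text, start)
        pieces.append(text[start:start + length])
        start += length
    return pieces

print(segment("我爱北京大学"))
```

Each scan visits only the dictionary items that share the first character and are not greater than the text, rather than trying every possible substring length against the whole dictionary.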
The eighth aspect of the first object of the invention applies the multidirectional conversion method to maximum matching of Chinese input codes. The text information processing method of the fifth aspect is characterized in that:
The multidirectional conversion method is used for maximum-match segmentation of a Chinese input code sequence, and the segmentation converts the input code sequence into a Chinese character sequence. Here the mapping component library device is called the character-word library device. It contains character items and word items, together called character-word items. A character item contains the internal codes of the Chinese characters that share an input code. A word item contains an input code item and the multi-level or single-level internal codes of the words that correspond to that input code item. Characters or words with the same input code are arranged according to a certain rule, and the character items and word items in the character-word library device are arranged in size order of the single-level internal codes of their input code characters. The conversion steps are as follows:
(1) look up the index device with the current input code to obtain an address in the character-word library;
(2) compare the corresponding part of the sequence with the input code item at that address in the character-word library; if the exit condition is satisfied, go to (5);
(3) if they are equal, perform the match operation;
(4) advance one character-word item in the forward direction and go to (2);
(5) if a word matched, return the single-level or multi-level internal codes of either the word chosen by the user or the word selected by a certain priority rule; if no word matched, return the single-level internal code of either the character chosen by the user or the character selected by a certain priority rule;
A switch can be added to this input method so that the user can choose whether input codes are converted into text information that contains multi-level internal codes or into text information that contains only single-level internal codes;
The "exit condition" above means: when the input code items are in ascending order of the single-level internal codes of their input code characters, the input code item is greater than the input code information being segmented; when they are in descending order, the input code item is less than the input code information being segmented;
The forward direction above means: when the input code items are in ascending order, the ascending direction of the input code items; when they are in descending order, the descending direction of the input code items;
The match operation in the steps above generally sets a match flag and points the match pointer at the current character-word item. In short, the purpose of the match operation is to return the longest word at the end.
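The conversion steps and the multi-level switch can be sketched for pinyin input codes. Several things here are assumptions: the library contents are invented, the "priority rule" is simply "take the first candidate", a multi-level code is modelled as one whole-word token, and unmatched input raises an error instead of asking the user.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical character-word library: (input code item, candidates in priority order).
LIBRARY: List[Tuple[str, List[str]]] = sorted([
    ("guo", ["国", "果"]),
    ("ren", ["人", "任"]),
    ("zhong", ["中", "重"]),
    ("zhongguo", ["中国"]),
])

# Index device: first input code character -> address in the library.
INDEX: Dict[str, int] = {}
for address, (input_code, _) in enumerate(LIBRARY):
    INDEX.setdefault(input_code[0], address)

def convert(keys: str, multilevel: bool = True) -> List[str]:
    """Convert an input code sequence to tokens by maximum matching."""
    tokens: List[str] = []
    start = 0
    while start < len(keys):
        rest = keys[start:]
        address = INDEX.get(rest[0])                        # step (1)
        best: Optional[Tuple[str, List[str]]] = None
        while address is not None and address < len(LIBRARY):
            input_code, candidates = LIBRARY[address]
            if rest.startswith(input_code):                 # step (3): match
                best = (input_code, candidates)
            elif input_code > rest:                         # step (2): exit condition
                break
            address += 1                                    # step (4)
        if best is None:
            raise ValueError("no entry for input at position %d" % start)
        word = best[1][0]                                   # step (5): first candidate
        # Switch: one token per word (multi-level) or one per character (single-level).
        tokens.extend([word] if multilevel else list(word))
        start += len(best[0])
    return tokens

print(convert("zhongguoren"))
print(convert("zhongguoren", multilevel=False))
```

With the switch on, the word boundary found during input is kept in the output; with it off, the same characters are produced but the boundary is lost, which is the loss of segmentation information described earlier for single-level codes.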
In Chinese input, auxiliary input with a two-shape phonetic code can be used to reduce the rate of duplicate codes. A two-shape phonetic code represents a Chinese character by the first pinyin letters of the two shapes that form it. The following scheme is adopted here. Chinese characters fall into two classes: those that can be decomposed into parts and those that cannot, such as "taking advantage of", "holding" and "interior". Characters that cannot be decomposed should not be decomposed by force. A decomposable character is divided into two parts, and only two, which gives the following cases:
(1) Both parts are Chinese characters. For example, 张 decomposes into 弓 (gong) and 长 (chang), and the first letter of each part gives the shape phonetic code, here "g" and "i". (The initials "zh", "ch" and "sh" are represented by three letters that are otherwise rarely used as initials, for example "v", "i" and "u" respectively.)
(2) One part is a radical. Because Chinese characters have many radicals, only those with an obvious corresponding Chinese character are used, each represented by the first letter of the syllable of that character; the remaining radicals are treated as non-characters.
(3) One of the two parts is a non-character, or both are. A non-character is represented by a rarely used initial such as "a".
(4) A character that cannot be decomposed is represented by two rarely used initials, such as "oo".
In case (2), the most common radicals that have a generally accepted name can also be treated as a decomposed part, with the shape phonetic code taken from the first pinyin letter of the radical's name. For example, the three-dot water radical 氵 takes "s" and the grass-head radical 艹 takes "c". Not too many such radicals should be used, because this is only an auxiliary way to reduce duplicate codes and must not add to what the user has to memorize. In addition, radicals simplified from traditional forms, such as "Yan" and "Cannibals", take the pronunciation of the Chinese character before simplification.
The two-shape phonetic code is only an auxiliary input method and is particularly suited to pinyin input. When a character or word has duplicate codes, the user can enter the two-shape phonetic code string of the character, or of the characters of the word, or only part of that string, to reduce the duplicate rate and raise input speed.
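The coding rules in cases (1) to (4) can be sketched as a small function. The decomposition of a character into parts is supplied by the caller here, since the text does not give a decomposition table; the function names and the use of `None` to mark a non-character part or an indecomposable character are assumptions.

```python
from typing import Optional, Tuple

# "zh", "ch", "sh" are replaced by single letters, as in the example in the text.
DIGRAPH_LETTERS = {"zh": "v", "ch": "i", "sh": "u"}
NON_CHARACTER_LETTER = "a"      # case (3): part that is not a character
INDECOMPOSABLE_CODE = "oo"      # case (4): character that cannot be decomposed

def shape_letter(pinyin: str) -> str:
    """First letter of a part's pinyin, with zh/ch/sh mapped to v/i/u."""
    for digraph, letter in DIGRAPH_LETTERS.items():
        if pinyin.startswith(digraph):
            return letter
    return pinyin[0]

def two_shape_code(parts: Optional[Tuple[Optional[str], Optional[str]]]) -> str:
    """Two-shape phonetic code from the pinyin of a character's two parts.

    parts is None for an indecomposable character; a part is None when it
    is a non-character.
    """
    if parts is None:
        return INDECOMPOSABLE_CODE
    return "".join(
        NON_CHARACTER_LETTER if pinyin is None else shape_letter(pinyin)
        for pinyin in parts
    )

# 张 = 弓 (gong) + 长 (chang)
print(two_shape_code(("gong", "chang")))
```

A radical that is coded by its name, such as the three-dot water radical, would be passed in by the pinyin of that name (for example "san"), which yields "s" as in the text.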
In a ninth aspect of the first object of the present invention, the text information processing method is characterized in that:
The Chinese input-code entry process includes a dual shape-sound code input method, whose operation steps are as follows:
(1) Enter the input code string of the Chinese character or word; if no dual shape-sound code is to be entered, go to (4);
(2) Enter part or all of the dual shape-sound code string of the Chinese character or word;
(3) The machine searches the dual shape-sound code storehouse device, extracts the Chinese characters or words that match both the original input code and the entered dual shape-sound code, and lets the user choose;
(4) Return the Chinese character or word selected by the user.
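The four steps above can be sketched as a two-stage filter. This is only an illustration: the candidate table, the placeholder names such as cand_A, and the choice to treat a partial dual shape-sound code as a prefix are all assumptions, not details given in the text.

```python
# Hypothetical candidate table: (candidate, input code, dual shape-sound code string).
CANDIDATES = [
    ("cand_A", "li", "mz"),
    ("cand_B", "li", "ll"),
    ("cand_C", "li", "tl"),
    ("cand_D", "lin", "mm"),
]


def lookup(input_code, shape_sound=""):
    """Return the candidates for an input code, optionally narrowed by a
    full or partial (prefix) dual shape-sound code string."""
    hits = [c for c in CANDIDATES if c[1] == input_code]          # step (1)
    if shape_sound:                                               # steps (2)-(3)
        hits = [c for c in hits if c[2].startswith(shape_sound)]
    return [c[0] for c in hits]                                   # offered to the user, step (4)


print(lookup("li"))        # three duplicates share the same input code
print(lookup("li", "m"))   # one shape-sound letter removes the ambiguity
```

With no shape-sound code the user must pick among three duplicates; one extra letter reduces the list to a single candidate.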
In some input methods the user may not know the input code; in a pinyin input method, for example, the pinyin of a character may be unknown. In that case the stroke-radical input method can be used. It consists of three devices: a stroke index device, a radical index device and a radical character library device. The stroke index device has 10 entries, "0" to "9". Entry "0" holds the start address, in the radical index device, of the radicals with 10 or more strokes; entries "1" to "9" hold the start addresses, in the radical index device, of the radicals with 1 to 9 strokes respectively. Each entry of the radical index device holds the start address, in the radical character library device, of the Chinese characters under the corresponding radical. The radical character library device is divided into sections by radical, each section holding the characters of one radical. In the Chinese character set the level-1 characters are not stored by radical, whereas the level-2 characters are. The character library device therefore mainly stores level-1 characters; for level-2 characters it need store only the first and last characters of the radical in the level-2 set, or the first character and the number of characters under that radical in the level-2 set.
In a tenth aspect of the first object of the present invention, the text information processing method is characterized in that the operation steps of stroke-radical input are as follows:
(1) Enter the stroke count of the radical of the required character; if the count is 10 or more, enter "0";
(2) The machine looks up the stroke index device by the entered stroke count and finds the start address, in the radical index device, of the radicals with that stroke count;
(3) The machine displays the radicals with that stroke count on the screen;
(4) The user selects the required radical;
(5) The machine searches the radical character library device and displays the Chinese characters under this radical;
(6) Return the Chinese character required by the user.
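A minimal sketch of the three-table lookup behind these steps follows. The table contents and names (r2a, c3 and so on) are placeholders, and storing explicit (start, end) pairs rather than start addresses alone is a simplifying assumption.

```python
# Radical character library: characters stored in sections, one section per radical.
CHAR_LIB = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]

# Radical index: (radical, start, end) of its section in CHAR_LIB,
# with radicals grouped by stroke count.
RADICAL_INDEX = [
    ("r1a", 0, 2),    # a 1-stroke radical
    ("r2a", 2, 4),    # 2-stroke radicals
    ("r2b", 4, 5),
    ("r10a", 5, 7),   # a radical with 10 or more strokes
]

# Stroke index: key "0".."9" -> (start, end) slice of RADICAL_INDEX.
STROKE_INDEX = {"1": (0, 1), "2": (1, 3), "0": (3, 4)}


def stroke_key(stroke_count):
    """Step (1): 10 or more strokes are entered as "0"."""
    return "0" if stroke_count >= 10 else str(stroke_count)


def radicals_for(stroke_count):
    """Steps (2)-(3): radicals to display for the entered stroke count."""
    start, end = STROKE_INDEX.get(stroke_key(stroke_count), (0, 0))
    return [name for name, _, _ in RADICAL_INDEX[start:end]]


def chars_for(radical):
    """Step (5): characters stored under the selected radical."""
    for name, start, end in RADICAL_INDEX:
        if name == radical:
            return CHAR_LIB[start:end]
    return []


print(radicals_for(2))     # radicals offered for 2 strokes
print(chars_for("r2a"))    # characters offered after the user picks r2a
```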
We call a character processor that handles single-stage ISN text information but cannot handle multilevel machine code text information a first-class character processor, and a character processor that can handle both single-stage ISN text information and multilevel machine code text information a second-class character processor.
In an eleventh aspect of the first object of the present invention, the text information processing method of the preceding fifth, sixth, seventh, eighth, ninth and tenth aspects is characterized in that:
The method is used in a first-class text information processing system.
A first aspect of the second object of the present invention concerns a character information processor in which text characters are represented by internal codes, also called single-stage ISNs; text components, namely characters, words, word groups and phrases, are represented by the set of single-stage ISNs of the at least one character they contain; and text information is processed by processing the single-stage ISNs. The processor is characterized in that:
In the character processor, a text component can also be expressed as another kind of internal code, called a multilevel machine code. The set of single-stage ISNs of the at least one character corresponding to the text component can thus be represented by a multilevel machine code, and text information is also processed by processing multilevel machine codes. This markedly increases the amount of text information that a storage medium can hold, raises processing speed and transmission efficiency, and addresses the correctness of text segmentation and of pronunciation in text-to-speech conversion. The character processor comprises:
(1) an input device, from which text information is entered; the input device may be a storage device, a device that converts text information into internal codes, a receiving device through which text information arrives over a transmission medium, and so on;
(2) a processing device, which performs operations involving multilevel machine codes on the text information;
(3) an output device, which outputs the text information; the output device may be a display device, a sound-producing device, a printing device, a sending device that sends text information to other character processors over a transmission medium, and so on.
A second aspect of the second object of the present invention, the character information processor as described in the first aspect, is characterized in that:
In the device, an ISN unidirectional conversion device converts a multilevel machine code into the single-stage ISNs corresponding to it, thereby solving the problem of compatibility with existing text processors; the unidirectional conversion device comprises the following devices:
(1) a composition storehouse device, which stores the composition subitems corresponding to the multilevel machine codes;
(2) a calculation device, which calculates from a multilevel machine code the position of the corresponding composition subitem in the composition storehouse device;
(3) a conversion device, which replaces a multilevel machine code with its corresponding composition subitem;
(4) a recognition device, which identifies whether the composition subitem contains a multilevel machine code and selects an action according to the result: if it does, the unidirectional conversion continues; otherwise the unidirectional conversion ends.
The identification step above is used only when composition subitems in the composition storehouse device themselves contain multilevel machine codes.
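The interplay of the four devices can be sketched as a repeated expansion. As an illustrative convention only, lowercase letters stand for single-stage ISNs and uppercase letters for multilevel machine codes; the storehouse contents and the way the position is computed are assumptions.

```python
# Composition storehouse: the composition subitems for codes A, B, C, D, E.
# Subitems may themselves contain multilevel codes (D = "Ac", E = "BC").
STOREHOUSE = ["ab", "de", "fg", "Ac", "BC"]


def is_multilevel(code):
    """Recognition device: uppercase marks a multilevel machine code here."""
    return code.isupper()


def position(code):
    """Calculation device: position of the composition subitem for a code."""
    return ord(code) - ord("A")


def to_single_stage(text):
    """Replace every multilevel machine code by its composition subitem and
    keep converting until only single-stage ISNs remain."""
    out = []
    stack = list(reversed(text))
    while stack:
        code = stack.pop()
        if is_multilevel(code):
            item = STOREHOUSE[position(code)]   # conversion device
            stack.extend(reversed(item))        # the subitem may need further conversion
        else:
            out.append(code)
    return "".join(out)


print(to_single_stage("ED"))   # defgabc
```

Two codes expand to seven single-stage ISNs here, because E and D are built from lower-level codes that are expanded in turn.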
A third aspect of the second object of the present invention, the character information processor as described in the first aspect, is characterized in that:
In the device, a whole-tone unidirectional conversion device converts a multilevel machine code into the whole tone corresponding to it; the whole-tone unidirectional conversion device comprises the following devices:
(1) a character sound storehouse device: looks up the whole tone corresponding to a single-stage ISN and judges whether the Chinese character of that single-stage ISN is a stress word;
(2) a word sound storehouse device: looks up the whole tone corresponding to the multilevel machine code of a word;
(3) a stress word sound storehouse device: looks up the whole tone corresponding to a stress word.
The steps of whole-tone unidirectional conversion are as follows:
(1) scan the internal codes in the text information containing multilevel machine codes;
(2) for a single-stage ISN, the whole tone is the corresponding whole tone in the character sound storehouse device;
(3) for a multilevel machine code: if it is the multilevel machine code of a word, its whole tone is the corresponding whole tone in the word sound storehouse device; if it is the multilevel machine code of a secondary pronunciation of a stress word, its whole tone is the corresponding whole tone in the stress word sound storehouse device.
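The scan in steps (1) to (3) amounts to a per-code dispatch over three tables. The sketch below uses placeholder codes and tone names; the table layout is an assumption made for illustration.

```python
# Hypothetical sound storehouses keyed by internal code.
CHAR_SOUND = {"a": "tone_a", "b": "tone_b", "x": "tone_x_primary"}   # single-stage ISNs
WORD_SOUND = {"W1": ["tone_a", "tone_b"]}                            # word-level multilevel codes
STRESS_SOUND = {"S1": "tone_x_secondary"}                            # secondary-pronunciation codes


def whole_tones(codes):
    """Scan the internal codes and look each one up in the matching storehouse."""
    tones = []
    for code in codes:
        if code in CHAR_SOUND:            # step (2): single-stage ISN
            tones.append(CHAR_SOUND[code])
        elif code in WORD_SOUND:          # step (3): multilevel code of a word
            tones.extend(WORD_SOUND[code])
        elif code in STRESS_SOUND:        # step (3): secondary pronunciation of a stress word
            tones.append(STRESS_SOUND[code])
        else:
            raise KeyError(code)
    return tones


print(whole_tones(["x", "W1", "S1"]))
```

The same character yields a different whole tone depending on whether it appears as its single-stage ISN ("x") or as the multilevel code of its secondary pronunciation ("S1").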
A fourth aspect of the second object of the present invention, the character information processor as described in the first aspect, is characterized in that:
In the device, a whole-tone voice conversion device converts the whole-tone information of the text into the corresponding speech; the whole-tone voice conversion device comprises the following devices:
(1) a sound storehouse device: stores the waveforms or synthesis parameters of whole-tone syllables;
(2) a whole-tone index device: stores the position, in the sound storehouse device, of the waveform or parameters corresponding to each whole tone;
(3) a conversion device: from the whole tone, calculates the position of the corresponding index entry in the whole-tone index device, takes the address item from that index entry to obtain the position in the sound storehouse device, and converts the whole tone into the corresponding waveform or parameters.
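The two-step lookup (whole tone to index entry, index entry to sound data) might look as follows. The tone names, the (address, length) layout of an index entry and the use of binary search to compute the index position are all assumptions; the sound data is a stand-in for real waveforms.

```python
import bisect

TONES = ["ba1", "ba2", "ma1"]            # hypothetical whole tones, kept sorted
TONE_INDEX = [(0, 4), (4, 2), (6, 3)]    # index entry per tone: (address, length)
SOUND_STORE = bytes(range(9))            # stand-in for stored waveform data


def index_position(tone):
    """Calculate the position of the tone's index entry."""
    pos = bisect.bisect_left(TONES, tone)
    if pos == len(TONES) or TONES[pos] != tone:
        raise KeyError(tone)
    return pos


def tone_to_waveform(tone):
    """Follow the address item of the index entry into the sound storehouse."""
    address, length = TONE_INDEX[index_position(tone)]
    return SOUND_STORE[address:address + length]


print(list(tone_to_waveform("ba2")))   # [4, 5]
```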
A fifth aspect of the second object of the present invention, the character information processor as described in the first aspect, is characterized in that:
In the device, a whole-tone voice conversion device converts the whole-tone information of the text into the corresponding speech and sends it to a telephone user.
A sixth aspect of the second object of the present invention, the character information processor as described in the first aspect, is characterized in that:
In the device, a multidirectional conversion device converts the single-stage ISNs corresponding to a word into the corresponding multilevel machine code; the multidirectional conversion device comprises:
(1) an index device, which is used to judge whether the mapping composition storehouse contains a composition subitem beginning with the given internal code and, if so, gives its address in the mapping composition storehouse;
(2) a mapping composition storehouse device, which is arranged in order of the single-stage ISN values of the corresponding composition subitems;
(3) a comparison and matching device, which comprises:
a judgment device: compares the text information being converted with the composition subitem of a mapped subitem and selects an action according to the result: if the jump-out condition is satisfied, it leaves the loop and, if there has been a match, performs the final matching operation; if the result is "equal", control passes to the matching device, otherwise to the moving device;
a matching device: performs the matching operation and passes control to the moving device;
a moving device: moves to the next mapped subitem in the direction of advance and passes control to the judgment device.
The "jump-out condition" above means: when the mapped subitems of the mapping composition storehouse are arranged in ascending order of the single-stage ISNs of their composition subitems, the composition subitem of the mapped subitem is greater than the text information being converted; when they are arranged in descending order, the composition subitem of the mapped subitem is less than the text information being converted.
The direction of advance above means: the ascending direction of the mapping composition storehouse when the mapped subitems are arranged in ascending order of the single-stage ISNs of their composition subitems, and the descending direction of the mapping composition storehouse when they are arranged in descending order.
The matching operation above is generally "set the match flag and point the match pointer at this mapped subitem"; it may also be "replace the corresponding part of the compared text with the multilevel machine code of the current mapped subitem". The purpose of the matching operation is to return the correct multilevel machine code in the end.
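The judge, match and move loop can be sketched for the ascending-order case as below. Lowercase letters again stand for single-stage ISNs and uppercase letters for multilevel machine codes; the storehouse contents, the use of binary search as the index device, and the helper names are assumptions for illustration.

```python
import bisect

# Mapping composition storehouse, ascending by composition subitem:
# (composition subitem in single-stage ISNs, multilevel machine code).
MAPPING = sorted([("ab", "A"), ("abc", "D"), ("de", "B"), ("fg", "C")])
KEYS = [item for item, _ in MAPPING]


def index_lookup(first_code):
    """Index device: address of the first mapped subitem starting with first_code."""
    pos = bisect.bisect_left(KEYS, first_code)
    if pos < len(KEYS) and KEYS[pos].startswith(first_code):
        return pos
    return None


def match_at(text, start):
    """Comparison and matching device: return the last (longest) matching subitem."""
    rest = text[start:]
    addr = index_lookup(rest[0])
    matched = None
    while addr is not None and addr < len(MAPPING):
        item, code = MAPPING[addr]
        if rest.startswith(item):      # "equal": matching operation, remember this subitem
            matched = (item, code)
        elif item > rest:              # jump-out condition for ascending order
            break
        addr += 1                      # moving device: advance in ascending direction
    return matched


def to_multilevel(text):
    """Convert runs of single-stage ISNs into multilevel machine codes."""
    out, i = [], 0
    while i < len(text):
        m = match_at(text, i)
        if m:
            out.append(m[1])
            i += len(m[0])
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_multilevel("xabcde"))   # xDB
```

Because the storehouse is sorted, every subitem that is a prefix of the remaining text comes before the first subitem that is greater than it, so the last match recorded before the jump-out is the longest one ("abc" rather than "ab").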
A seventh aspect of the second object of the present invention, the character information processor as described in the first aspect, is characterized in that:
In the device, a multidirectional conversion device converts the single-stage ISN corresponding to a character into the corresponding multilevel machine code; the multidirectional conversion device comprises:
(1) a stress word composition storehouse device: stores the single-stage ISNs of the stress words that have a secondary pronunciation;
(2) a character sound storehouse device: looks up the whole tone corresponding to a single-stage ISN and judges whether the Chinese character of that single-stage ISN is a stress word;
(3) a stress word sound storehouse device: looks up the whole tone corresponding to a stress word.
The multidirectional conversion steps for a character are:
(1) the input parameter is the single-stage ISN of the character to be converted;
(2) is this character a stress word? If not, it remains a single-stage ISN; go to (5);
(3) does this stress word take its primary pronunciation (keynote)? If so, it remains a single-stage ISN; go to (5);
(4) look up the stress word composition storehouse device and, according to the stress word sound storehouse device, replace the single-stage ISN of the character with the multilevel machine code of its corresponding secondary pronunciation;
(5) end of conversion.
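A minimal sketch of steps (1) to (5) follows. The text does not say how the intended pronunciation is known at step (3), so passing it in as a parameter is an assumption, as are the table contents and code names.

```python
# Character sound storehouse: single-stage ISN -> (primary whole tone, is it a stress word?)
CHAR_SOUND = {"x": ("tone_x1", True), "y": ("tone_y", False)}

# Stress word storehouses combined: (single-stage ISN, secondary tone) -> multilevel code.
STRESS_CODES = {("x", "tone_x2"): "S1"}


def convert_char(isn, intended_tone):
    """Return the code that should represent the character in the text."""
    primary, is_stress_word = CHAR_SOUND[isn]
    if not is_stress_word:                      # step (2): stays a single-stage ISN
        return isn
    if intended_tone == primary:                # step (3): primary pronunciation, unchanged
        return isn
    return STRESS_CODES[(isn, intended_tone)]   # step (4): code of the secondary pronunciation


print(convert_char("x", "tone_x2"))   # S1
```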
Beneficial effects
Using multilevel machine codes increases the amount of text that can be held not only in auxiliary storage but also in the main memory of the processing system. Because text information containing multilevel machine codes can be processed directly, the processing speed of text information also rises. There are three main reasons for this. First, text information containing multilevel machine codes is shorter than text information containing only single-stage ISNs. Second, the text information on which operations are performed is therefore shorter. Third, no compression or decompression is needed when writing to or reading from disk, and because the text information containing multilevel machine codes is short, I/O operations are faster. For reasons similar to those for processing speed, using multilevel machine codes also raises transmission efficiency.
The method and apparatus provided by the invention can be widely used in first-class character processors, for example as follows. The input-code input conversion device can be used in a first-class character processor to convert input codes into the corresponding single-stage ISNs; the unidirectional and multidirectional conversion devices can be used for compressed storage and communication of text information in a first-class character processor. For example, adding unidirectional and multidirectional conversion devices to file operations or to a disk operating system (DOS) lets text information be stored automatically in compressed form.
Within or between second-class character processors, text information containing multilevel machine codes can be stored, transmitted and processed. Compared with text information containing only single-stage ISNs, it is processed faster, and it increases the capacity of storage media and the efficiency of transmission media, thereby improving performance inside the system and the machine and the efficiency of the character processor.
The second-class character processor also simplifies processing in language engineering; for example, word segmentation can be partly or wholly dispensed with, which is applicable to natural language understanding, machine translation and text-to-speech conversion. In Chinese text-to-speech conversion, for instance, the word segmentation problem, the stress problem of Chinese characters and the prosody problem must all be solved. Chinese word segmentation is currently a hard problem; using text information containing multilevel machine codes solves the segmentation problem, and the stress problem is also easily solved.
The present invention can also be used in word segmentation, particularly the segmentation of Chinese and the pinyin input of Chinese characters and words.
The method of the present invention guides the writing of computer programs that generate computer instructions and control a computer to carry out the corresponding operations.
The technical scheme of the present invention can be widely used in every field of text information processing, and can also guide the design and manufacture of related software, semi-software, firmware and integrated circuits, with great economic and social benefit.
For example:
A TTS (text-to-speech) system based on multilevel machine codes: because multidirectional conversion solves the segmentation and stress problems, and thus the problem of correct pronunciation, TTS can truly enter the application stage, for example in reading text aloud, in proofreading, and in pagers and paging systems based on multilevel machine codes.
A telephone voice system based on multilevel machine codes can send text information from a computer or computer network to telephone users, extending the reach of networked microcomputer users to telephone users and greatly increasing social and economic benefit.
Current computer operating systems, whether Western-language or Chinese, are all based on single-stage ISNs; that is, they are character-level operating systems, whereas an operating system based on multilevel machine codes is a word-level operating system. Because the single-stage ISN is only a special case of the multilevel machine code, everything a single-stage ISN operating system can do, a multilevel machine code operating system can also do; but some things the latter can do, such as handling the segmentation and stress problems, the former cannot. An operating system based on multilevel machine codes is therefore highly competitive.
An operating system based on multilevel machine codes handles text problems such as segmentation and stress at the operating-system level, which removes bottlenecks in machine translation, natural language understanding, full-text search and the like.
A typesetting system based on multilevel machine codes gives the electronic publications it produces the advantage of correct pronunciation.
In a machine translation system based on multilevel machine codes, at least one of the languages uses text information containing multilevel machine codes; because segmentation is correct, translation accuracy improves.
The multilevel machine code technique can be applied in almost every field of computing, bringing advantages in each.
Description of the drawings
Fig. 1 is a schematic flow diagram of ISN unidirectional conversion;
Fig. 2 is a schematic flow diagram of the comparison and matching part of the word multidirectional conversion device;
Fig. 3 is a schematic diagram of the word multidirectional conversion device;
Fig. 4 is a schematic diagram of the input-code input conversion device;
Fig. 5 is a schematic diagram of the pipeline multidirectional conversion device;
Fig. 6 is a schematic diagram of the character information processor;
Fig. 7 is a schematic diagram of the ISN unidirectional conversion device;
Fig. 8 is a schematic diagram of the comparison and matching device in the word multidirectional conversion device.
The drawings are described below:
Fig. 1 is a schematic flow diagram of ISN unidirectional conversion. 1 is the multilevel machine code being converted; 2 is the conversion part; 3 is the composition storehouse device; 4 is the judgment part; 5 is the end of conversion.
Fig. 2 is a schematic flow diagram of the comparison and matching part of the word multidirectional conversion device. The internal code in the text information is identified and used to look up the index device; if an address is found, step 1 is entered through entry 5;
Step 1: compare whether the text information being converted is greater than or equal to (or less than or equal to) the composition subitem of the mapped subitem; if the condition is not satisfied, leave the comparison process through exit 6;
Step 2: if the result of step 1 is "greater than", go to step 4;
Step 3: perform the matching operation;
Step 4: move to the next mapped subitem in the ascending (or descending) direction.
The matching operation of step 3 is generally "set the match flag and point the match pointer at this mapped subitem"; if the sizes of multilevel machine codes can be compared correctly, it may also be "replace the corresponding part of the compared text with the multilevel machine code of the current mapped subitem". In short, the purpose of the matching operation is to return the correct multilevel machine code in the end.
Fig. 3 is a schematic diagram of the word multidirectional conversion device. 1 is the recognition device, which identifies the internal code in the text information and looks up the index device accordingly; 2 is the comparison and matching device; 3 is the index device; 4 is the mapping composition storehouse device; 5 is the composition storehouse device.
Where the connection in the figure is drawn with a dashed line, the composition storehouse device is needed when the comparison operation requires unidirectional conversion.
The input-code input conversion device is the device that performs the input-code conversion operation.
Fig. 4 is a schematic diagram of the input-code input conversion device. 1 is the input device; 2 is the conversion part; 3 is the input code table device; 4 is the composition storehouse device.
In the text information processing method containing multilevel machine codes, input-code input multidirectional conversion can be described as follows.
The operation steps of input-code input multidirectional conversion are:
(1) The input code of the text component to be entered is entered from the input device;
(2) The input code table device is looked up by the input code;
(3) According to the mapping between the input code table device and the composition storehouse device, the input code is converted into the single-stage ISNs of the corresponding component; if the input code has duplicates in the input code table device, the components in the composition storehouse device corresponding to the duplicates are displayed, and the user selects the required text component, whereupon the corresponding multilevel machine code is returned.
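The lookup with duplicate handling can be sketched as follows. The input codes k1 and k2, the table layout and the `choose` callback standing in for the user's on-screen selection are hypothetical.

```python
# Composition storehouse: (component in single-stage ISNs, multilevel machine code).
STOREHOUSE = [("de", "B"), ("abc", "D"), ("agh", "F")]

# Input code table: input code -> positions of matching components in STOREHOUSE.
INPUT_TABLE = {"k1": [0], "k2": [1, 2]}     # "k2" has duplicates


def convert(input_code, choose=lambda options: 0):
    """Return the multilevel machine code for an input code, or None if unknown.
    `choose` receives the displayed components and returns the selected index."""
    positions = INPUT_TABLE.get(input_code, [])
    if not positions:
        return None
    if len(positions) == 1:
        pick = positions[0]
    else:                                    # duplicates: show the components, let the user pick
        options = [STOREHOUSE[p][0] for p in positions]
        pick = positions[choose(options)]
    return STOREHOUSE[pick][1]


print(convert("k1"))                                       # B
print(convert("k2", choose=lambda opts: opts.index("agh")))  # F
```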
Fig. 5 is a schematic diagram of the pipeline multidirectional conversion device. 1 is the text information to be multidirectionally converted; 2 is the text information after multidirectional conversion; 3 is the pipeline composition storehouse device; 4 is the top; 5 is the end; 6 is the text information to be unidirectionally converted; 7 is the text information after unidirectional conversion.
Fig. 6 is a schematic diagram of the character information processor. 1 is the input device; 2 is the processing device; 3 is the output device. This character processor comprises:
the input device 1, which inputs text information;
the processing device 2, which processes text information in single-stage ISNs;
the output device 3, which outputs text information.
The principal features of the character processor are:
(1) The input device may be a text input device such as a keyboard, an automatic character recognition device, a storage device for text information, a receiving device for text information, and so on; the text information includes single-stage internal code information, whole-tone information of the text, text information containing multilevel machine codes, and so on.
(2) This character information processor can perform the operations containing multilevel machine codes.
(3) The output device can output text information containing multilevel machine codes, text information in single-stage ISNs, whole-tone information of the text, and so on; it may be a display device, a sound-producing device, a printing device, a storage device and so on, or a sending device for text information, which sends text information over a transmission medium.
Fig. 7 is a schematic diagram of the ISN unidirectional conversion device. 1 is the composition storehouse device; 2 is the calculation device; 3 is the conversion device; 4 is the recognition device; 5 is the entry; 6 is the exit.
Fig. 8 is a schematic diagram of the comparison and matching device in the word multidirectional conversion device. 1 is the judgment device; 2 is the matching device; 3 is the moving device; 4 is the entry; 5 is the exit.
Best mode for carrying out the invention
The meanings of the relevant devices and operations are explained first.
The unidirectional conversion device is a device that performs unidirectional conversion.
The multidirectional conversion device is a device that performs multidirectional conversion.
The pipeline conversion device contains a pipeline made up of pipeline composition subitems, and can perform both multidirectional and unidirectional conversion.
The unidirectional conversion operation is the operation of performing unidirectional conversion.
The multidirectional conversion operation is the operation of performing multidirectional conversion.
The input-code input multidirectional conversion operation is the operation of converting an input code into the corresponding multilevel machine code.
The transmission operation for text information containing multilevel machine codes means transmitting such text information. Within or between computer networks, and between text communication devices, text information is communicated by this transmission operation.
The input-code input conversion operation is the operation of converting an input code into the corresponding multilevel machine code or single-stage ISN.
The comparison operation compares internal codes. A comparison operation involving multilevel machine codes compares a single-stage ISN with a multilevel machine code, or one multilevel machine code with another. To compare a multilevel machine code with single-stage ISNs, the multilevel machine code can first be converted into single-stage ISNs. Multilevel machine codes can be compared with each other for equality. If the order of the multilevel machine codes agrees completely with the order of their corresponding single-stage ISNs, multilevel machine codes can also be compared for size, so single-stage ISNs and a multilevel machine code can also be compared by converting the single-stage ISNs into a multilevel machine code.
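One way to realize the first comparison strategy (convert to single-stage ISNs, then compare) is sketched below. Lowercase letters stand for single-stage ISNs and uppercase letters for multilevel machine codes; the storehouse contents are illustrative assumptions.

```python
# Composition storehouse: multilevel machine code -> composition subitem.
STOREHOUSE = {"A": "ab", "D": "Ac"}


def expand(text):
    """Unidirectional conversion: expand every multilevel code to single-stage ISNs."""
    return "".join(expand(STOREHOUSE[c]) if c in STOREHOUSE else c for c in text)


def compare(x, y):
    """Return -1, 0 or 1 by comparing the single-stage forms of x and y."""
    ex, ey = expand(x), expand(y)
    return (ex > ey) - (ex < ey)


print(compare("D", "abc"))   # 0: the multilevel code D equals the ISN string "abc"
print(compare("A", "b"))     # -1: "ab" sorts before "b"
```

Expanding both sides is always valid; comparing the multilevel codes directly is cheaper but, as the paragraph above notes, only gives the right ordering when the codes were assigned in the same order as their single-stage ISNs.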
The pipeline conversion operation performs multidirectional and unidirectional conversion by means of the pipeline conversion device.
Operations containing multilevel machine codes are the search, replace, insert and delete operations on text information containing multilevel machine codes, together with the unidirectional conversion operation, the multidirectional conversion operation, the input-code input multidirectional conversion operation, the transmission operation for text containing multilevel machine codes, the pipeline conversion operation, the comparison operation and so on. Processing text information containing multilevel machine codes includes these operations.
To explain the first and second objects of the invention, we need to explain how multilevel machine codes are encoded.
The encoding of multilevel machine codes depends on the composition storehouse device: different composition storehouse devices give different encodings.
Taking Chinese as the example, we describe the encoding of the multilevel machine codes of words of two or more characters. The composition storehouse device for words may be a basic storehouse device, an equal-length composition storehouse device, a half-index composition storehouse device, or a full-index composition storehouse device. In an equal-length composition storehouse device, the list item lengths of all composition subitems are equal. When a small portion of the composition subitems have a different list item length, an index can be placed in each such subitem, which is then called an index composition subitem; a separate auxiliary composition storehouse device is built, and the index composition subitem holds information such as the position of the real composition subitem in the auxiliary composition storehouse and its list item length. The length of an index composition subitem equals the list item length of the other composition subitems. Such a device is called a half-index composition storehouse device. When the list item lengths differ greatly, all composition subitems can be made index composition subitems, with the real composition subitems all held in the auxiliary composition storehouse device; this is called a full-index composition storehouse device. The content of an index composition subitem in a half-index composition storehouse device must be distinguishable from single-stage ISNs and multilevel machine codes. The unidirectional conversion device for a full-index or half-index composition storehouse device must be modified slightly: for a half-index device it must judge whether an entry in the composition storehouse device is a composition subitem or an index composition subitem, and for the index composition subitems of full-index and half-index devices it must add a step that accesses the auxiliary composition storehouse device. In equal-length, half-index and full-index storehouse devices all list item lengths are equal, so all items can be arranged in the order of their corresponding single-stage ISNs; the order of the multilevel machine codes then agrees completely with that of the single-stage ISNs, and multilevel machine codes can be compared for size. At the same time, the order of the composition storehouse device agrees completely with that of the mapping composition storehouse device, and the two can be merged into one when a mapped subitem consists only of a composition subitem.
The mapping component library device is arranged in ascending (or descending) order of the single-level internal codes corresponding to its mapped component items. A mapped component item consists of a component item length, a component item, and a multilevel internal code; or of a component item length and a component item; or of a component item and a multilevel internal code; or of a component item length and a multilevel internal code; or of a component item alone; or of a multilevel internal code alone.
An index entry of the index device consists of an address item or a flag item; or of an address item and a multilevel internal code, or a flag item; or of an internal code item and an address item; or of an internal code item, an address item, and a multilevel internal code.
The encoding of multilevel internal codes is illustrated below using two-byte bit-flag encoding and an equal-length component library as an example. Most Chinese words have two characters. The internal code of one Chinese character is 2 bytes, so a two-character word is 4 bytes long. The component item of a three-character word consists of one second-level internal code and one single-level internal code, so its entry length is also 4 bytes. For example, for "PLA" (解放军), let the single-level internal codes of its three characters be "a", "b", and "c"; if the second-level internal code corresponding to "ab" is A, then the component item "Ac" of the three-character word also has an entry length of 4 bytes. Likewise, if the second-level internal code of "China" (中国) is B and that of "people" (人民) is C, then the component item "BC" of the four-character word formed from these two second-level internal codes also has an entry length of 4 bytes. If the third-level internal code corresponding to "Ac" is D and that corresponding to "BC" is E, then the component item "ED" of the seven-character word "Chinese People's Liberation Army" (中国人民解放军) is also 4 bytes long, and its multilevel internal code is at the fourth level. Tests on the word list "Information Processing Modern Chinese 5,000 Vocabulary" show that the component items of about 90% of the words can be reduced to an entry length of 4 bytes. Arranging the component items in ascending order of their corresponding single-level internal codes yields the component library device. If each zone holds 94 component items, and the high bit of the first byte of a multilevel internal code is 1 while the high bit of the second byte is 0, then the first byte of the multilevel internal code is the zone number plus A0H and the second byte is the position number plus 20H. In two-byte bit-flag encoding, the high bits of the two bytes can take various combinations of 0 and 1. (Note: the symbols a, b, c... and A, B, C... used here to denote internal codes are for illustration only and are not real internal code values.)
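The byte layout just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, zone and position numbers are assumed to start at 1, and the number of zones is assumed to be at most 94 (the text fixes only the 94 items per zone).

```python
ITEMS_PER_ZONE = 94  # component items per zone, as stated in the text


def encode_multilevel(zone, pos):
    """Pack a zone number and a position number into a two-byte multilevel code."""
    # Assumption: both numbers run from 1 to 94.
    if not (1 <= zone <= 94 and 1 <= pos <= ITEMS_PER_ZONE):
        raise ValueError("zone and position must be in 1..94 (assumed range)")
    # First byte: zone + A0H (high bit 1); second byte: position + 20H (high bit 0).
    return bytes([zone + 0xA0, pos + 0x20])


def decode_multilevel(code):
    """Recover (zone, position) from a two-byte multilevel code."""
    return code[0] - 0xA0, code[1] - 0x20


def is_multilevel(code):
    """High bit of the first byte is 1 and high bit of the second byte is 0."""
    return bool(code[0] & 0x80) and not (code[1] & 0x80)


def is_single_level(code):
    """A single-level Chinese character code has both high bits set."""
    return bool(code[0] & 0x80) and bool(code[1] & 0x80)


def library_index(code):
    """Zero-based position of the component item in the component library."""
    zone, pos = decode_multilevel(code)
    return (zone - 1) * ITEMS_PER_ZONE + (pos - 1)
```

Because the zone and position are carried in the code itself, locating a component item needs only arithmetic, with no search.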
For the equal-length component library device of Chinese words above, suppose the mapped component items contain only component items; the mapping component library device is then identical to the component library device. Suppose further that the index entries of the index device consist of an address item or a flag item. The 6768 Chinese characters yield an index device with 6768 index entries, sorted in ascending order of the characters' internal codes. If an index entry is a flag item, it indicates that the mapping component library contains no word beginning with that character; if it is an address item, it gives the address in the mapping component library of the mapped component item of the first word that begins with that character. This address can of course also be represented by the multilevel internal code of the corresponding word.
The one-way conversion process is described below. Suppose the third-level internal code corresponding to "PLA" is D, and a text sequence contains D. First, D is identified as a multilevel internal code: the single-level internal code of a Chinese character is also a two-byte code, but the high bits of both of its bytes are 1, whereas in a multilevel internal code the high bit of one byte is 1 and that of the other is 0, so D is judged to be a multilevel internal code. Because D contains the zone number and position number of its component item in the component library, that is, the relevant address information, its component item "Ac" can be obtained. Since this component item contains the multilevel internal code A, the process is repeated and A is replaced by its component item "ab". The component item "ab" contains no multilevel internal code, so the one-way conversion ends. The conversion result for D is "abc", that is, "PLA". (Note: the symbols a, b, c... and A, B, C... used here to denote internal codes are for illustration only and are not real internal code values.)
The multi-way conversion of words is described below, taking "The PLA is advancing" as an example. From the single-level internal code of 解, the first character of "PLA" (解放军), the position of the corresponding index entry in the index device is calculated, and the entry is found to be an address item. This address leads to the mapped component item of the first word whose leading character is 解. Suppose the mapped component items are arranged in the order ..., "ab", "Ac", .... The single-level internal codes "ab" of "liberation" (解放) are compared with the mapped component item "ab"; because they are equal, the match flag is set and the match pointer is pointed at this mapped component item. The mapped component item is then advanced by one in the ascending direction, and the text being converted, "abc", is compared with the current mapped component item "Ac"; during the comparison, A is expanded to "ab" by one-way conversion. Because they are equal, the match flag is set and the match pointer is pointed at the mapped component item "Ac". The mapped component item is advanced by one again. Now "abcd" (where "d" is the single-level internal code of the next character) is compared with the mapped component item; the current mapped component item is found to be greater than "abcd", so the comparison loop is exited, and the single-level internal codes of "PLA" are converted into the corresponding multilevel internal code "D". (Note: the symbols a, b, c... and A, B, C... used here to denote internal codes are for illustration only and are not real internal code values.)
The encoding of multilevel internal codes may or may not use nested internal codes.
An English component library device can be a basic component library device or a full-index component library device, and the index device can use hash lookup. For example, the first letter is one of 26 letters and the second is one of 26 letters or a space, giving 26*27=702 index entries. When there are many words and phrases, several two-byte bit-flag encodings can be used, for example with the high bit of the first byte being 0 or 1 and the high bit of the second byte being 0.
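As a rough illustration of the 26*27=702 index entries, the first two letters of an English word can be mapped to a slot number as follows. The function name, the slot ordering, and the treatment of one-letter words (second position counted as a space) are assumptions made for this sketch, not details given in the text.

```python
def english_index_slot(word):
    """Map the first two letters of a word to one of 26 * 27 = 702 index slots."""
    w = word.lower()
    if not w or not ("a" <= w[0] <= "z"):
        raise ValueError("word must start with a letter a-z")
    first = ord(w[0]) - ord("a")                 # 0..25
    second = w[1] if len(w) > 1 else " "         # assumed: one-letter word -> space
    if second == " ":
        second_slot = 0
    elif "a" <= second <= "z":
        second_slot = ord(second) - ord("a") + 1  # 1..26
    else:
        raise ValueError("second character must be a letter or a space")
    return first * 27 + second_slot
```

Each slot would then hold the address of the first word or phrase in the component library that begins with that letter pair.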
The encoding of multilevel internal codes for characters is described below.
For polyphonic characters, the corresponding component library is formed by arranging the polyphonic characters in the order of the single-level internal codes of the characters that have secondary pronunciations. The multilevel internal code in effect reflects the position of the corresponding secondary-pronunciation character in the component library, so the internal code of the corresponding Chinese character is easily obtained from the multilevel internal code. This is the one-way conversion of the multilevel internal code of a character.
As can be seen from the above, the one-way conversion steps for the internal codes of words and of characters are similar.
An example of the one-way internal code conversion method follows.
One-way conversion can convert a multilevel internal code into single-level internal codes or into lower-level multilevel internal codes. One-way and multi-way conversion operations are described more specifically below; for one-way conversion, conversion to single-level internal codes is used as the example.
The steps of the one-way conversion operation are:
(1) identify the multilevel internal codes in the text information according to the encoding characteristics of multilevel internal codes;
(2) from the multilevel internal code, calculate the position of its component item in the component library device;
(3) replace the multilevel internal code with its component item;
(4) if the component item contains a multilevel internal code, return to step (2);
(5) return the converted single-level internal codes.
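The five steps above can be sketched with the symbolic codes used in the "PLA" example. In this toy version lowercase letters stand for single-level internal codes and uppercase letters for multilevel internal codes; the dictionary lookup stands in for the address calculation of step (2), and all names are hypothetical.

```python
# Toy component library: multilevel code -> component item.
COMPONENT_LIBRARY = {
    "A": "ab",  # second-level code -> two single-level codes
    "D": "Ac",  # third-level code -> a second-level code plus a single-level code
}


def is_multilevel(code):
    # Stand-in for the high-bit test applied to real two-byte codes.
    return code.isupper()


def one_way_convert(text):
    """Expand every multilevel code in text into single-level codes."""
    codes = list(text)
    i = 0
    while i < len(codes):
        if is_multilevel(codes[i]):                      # steps (1) and (4)
            # Steps (2) and (3): find the component item and substitute it.
            codes[i:i + 1] = COMPONENT_LIBRARY[codes[i]]
        else:
            i += 1                                       # already single-level
    return "".join(codes)                                # step (5)
```

For instance, `one_way_convert("D")` first yields "Ac" and then, because "A" is itself a multilevel code, "abc".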
The fifth aspect of the first object of the invention is multi-way conversion, which comprises multi-way conversion of words and multi-way conversion of characters.
For text information entered from an input device, multi-way conversion can be performed by adding a multi-way conversion device to the input device, or by an input-code conversion device. Input codes for text components can use a variety of encoding schemes; in fact, the zone-position code of a text component is itself a kind of input code. The zone-position code of a component item corresponds to its multilevel internal code, so this input method is very simple.
Multi-way conversion and one-way conversion can also use a pipeline conversion device.
Pipeline component library devices are divided into single-segment and multi-segment pipeline component library devices.
A single-segment pipeline component library device is arranged in ascending (or descending) order of the single-level internal codes corresponding to its pipeline component items; the lowest end is called the start and the highest end the terminal. A pipeline component item consists of a component item length, a component item, and a multilevel internal code; or of a component item and a multilevel internal code; or of a component item alone. If the component item at position L of a single-segment pipeline component library device contains a component item A, and A is a component item located between L and the start, then A is represented by its multilevel internal code; otherwise it is represented by single-level internal codes. This makes comparison operations convenient. The pipeline of a multi-segment pipeline component library device is segmented by internal code level, and the segments are then joined into one pipeline in order from the lowest internal code level to the highest; the outer end of the lowest-level segment is called the start and the outer end of the highest-level segment the terminal. A component item of a multi-segment pipeline component library device consists of a component item length, a component item, and a multilevel internal code, or of a component item and a multilevel internal code.
Adding a one-way conversion device to a first-type text output device turns it into a second-type text output device. For example, adding software or hardware containing a one-way conversion device to a printer enables the printer to print text information that contains multilevel internal codes. Adding an input-code conversion device, a multi-way conversion device, or a combination of these to a first-type input device turns it into a second-type input device.
Multi-way conversion of characters relies on the character pronunciation library device and the word pronunciation library device; during conversion, the multilevel internal code to convert to is selected according to the full pronunciation of the character. Multi-way conversion of words with variant pronunciations is similar.
The character pronunciation library device for Chinese characters is the set of full pronunciations of the GB Chinese characters, arranged by the characters' internal codes and occupying 6768*2 bytes in total, so the full pronunciation of a character is easily found from its single-level internal code. Likewise, the polyphonic-character pronunciation library device is arranged in the order of the multilevel internal codes, so the corresponding full pronunciation is easily found from a multilevel internal code. The multilevel internal code of a word and its full pronunciation have the same correspondence.
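A minimal sketch of such a lookup is given below. It assumes that the 6768 entries correspond to the 72 Chinese character zones of GB 2312 (zones 16 to 87, 94 positions each, 72*94=6768 slots) and that each entry is a 2-byte value whose internal format is left opaque; the function names are hypothetical.

```python
FIRST_ZONE_BYTE = 0xB0      # zone 16, where GB 2312 Chinese characters begin
FIRST_POS_BYTE = 0xA1       # position 1 within a zone
POSITIONS_PER_ZONE = 94
ZONES = 72                  # zones 16..87 (assumed coverage)
SLOTS = ZONES * POSITIONS_PER_ZONE   # 6768 two-byte entries


def pronunciation_offset(code):
    """Byte offset of a character's entry, computed from its two-byte internal code."""
    zone = code[0] - FIRST_ZONE_BYTE
    pos = code[1] - FIRST_POS_BYTE
    if not (0 <= zone < ZONES and 0 <= pos < POSITIONS_PER_ZONE):
        raise ValueError("not a Chinese character code in zones 16..87")
    return 2 * (zone * POSITIONS_PER_ZONE + pos)


def lookup_pronunciation(library, code):
    """Return the 2-byte pronunciation entry for the given internal code."""
    offset = pronunciation_offset(code)
    return library[offset:offset + 2]
```

Because the library is ordered by internal code, the lookup is a direct offset calculation rather than a search.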
In the sixth aspect of the first object of the invention, the steps of the multi-way conversion of a character are:
(1) the input parameter is the single-level internal code of the character to be converted;
(2) determine whether the character is a polyphonic character; if not, it remains a single-level internal code; go to (5);
(3) determine whether the pronunciation of this polyphonic character is the primary pronunciation; if so, it remains a single-level internal code; go to (5);
(4) look up the polyphonic-character component library device and, according to the polyphonic-character pronunciation library device, replace the single-level internal code of the character with the multilevel internal code of the corresponding secondary pronunciation;
(5) end of conversion.
In step (2), whether a Chinese character is polyphonic can be determined through the character pronunciation library device: a flag is placed in a spare bit of the 2 bytes holding the full pronunciation, for example "1" for a polyphonic character and "0" otherwise, so checking the flag bit easily decides whether the character is polyphonic. In steps (3) and (4), for a polyphonic character, the primary and secondary pronunciations can be displayed, or the corresponding sounds played through a sound device, by means of the character pronunciation library device and the polyphonic-character library device, and with manual assistance the character is converted to the multilevel internal code of the correct pronunciation.
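The character conversion steps and the flag-bit test can be sketched as follows. Everything concrete here is an assumption made for illustration: the flag is placed in the top bit of a 16-bit entry, the codes and pronunciation values are invented symbols, and the `chosen_pronunciation` argument stands in for the manually assisted choice described above.

```python
POLYPHONIC_FLAG = 0x8000  # assumed position of the spare flag bit

# Hypothetical character pronunciation library: single-level code -> 16-bit entry.
CHAR_PRONUNCIATIONS = {
    "x": 0x0123,                     # not polyphonic
    "y": 0x0456 | POLYPHONIC_FLAG,   # polyphonic, primary pronunciation 0x0456
}

# Hypothetical polyphonic libraries: single-level code ->
# list of (secondary pronunciation, multilevel code).
SECONDARY_PRONUNCIATIONS = {
    "y": [(0x0789, "Y1"), (0x0ABC, "Y2")],
}


def convert_character(code, chosen_pronunciation):
    """Return the code that represents the character with the chosen pronunciation."""
    entry = CHAR_PRONUNCIATIONS[code]                       # step (1)
    if not entry & POLYPHONIC_FLAG:                         # step (2)
        return code
    primary = entry & ~POLYPHONIC_FLAG
    if chosen_pronunciation == primary:                     # step (3)
        return code
    for pronunciation, multilevel in SECONDARY_PRONUNCIATIONS[code]:  # step (4)
        if pronunciation == chosen_pronunciation:
            return multilevel
    raise ValueError("pronunciation not listed for this character")
```

Only characters read with a secondary pronunciation change representation; everything else keeps its single-level internal code.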
The fifth aspect of the first object of the invention is the multi-way conversion of words.
An example of the multi-way conversion method for words follows.
Multi-way conversion can convert text information containing single-level internal codes into the corresponding multilevel internal codes, and can also convert lower-level multilevel internal codes into higher-level ones. Its features are described below.
The steps of the multi-way conversion operation are:
(1) identify the internal codes in the text information according to the encoding characteristics of internal codes;
(2) look up the index device with the corresponding internal code; if the index entry contains no address item, go to (8);
(3) using the address item, find the mapped component item at that address in the mapping component library device;
(4) compare the mapped component item with the corresponding text information being converted; if the result satisfies the exit condition, go to (7);
(5) if they are equal, perform the matching operation;
(6) advance the mapped component item by one in the forward direction and go to step (4);
(7) if there was a match, return the multilevel internal code corresponding to the last matched mapped component item; otherwise go to (8);
(8) return.
The "exit condition" above means: when the mapped component items of the mapping component library are arranged in ascending order of the single-level internal codes of their component items, the component item of the mapped component item is greater than the text information being converted; when they are arranged in descending order, the component item of the mapped component item is less than the text information being converted.
The forward direction above means: when the mapped component items of the mapping component library are arranged in ascending order of the single-level internal codes of their component items, the ascending direction of the mapping component library;
when they are arranged in descending order, the descending direction of the mapping component library.
The matching operation of step (5) generally consists of setting the match flag and pointing the match pointer at this mapped component item. If the magnitudes of multilevel internal codes can be compared correctly, it may instead consist of replacing the corresponding part of the compared text with the multilevel internal code of the current mapped component item. In either case, the purpose of the matching operation is to return the correct multilevel internal code at the end.
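The lookup, comparison, matching, and exit steps can be sketched for the ascending-order case with the symbolic codes of the "PLA" example. As a simplification, each mapped component item below stores its fully expanded single-level codes, whereas the text stores items such as "Ac" and expands them by one-way conversion during comparison; the extra entry and all names are hypothetical.

```python
# Mapping component library, sorted ascending by expanded single-level codes.
# Each entry: (single-level codes of the word, multilevel code).
MAPPING_LIBRARY = [
    ("ab", "A"),    # "liberation"
    ("abc", "D"),   # "PLA"
    ("xy", "X"),    # hypothetical extra entry
]

# Index device: leading single-level code -> address of the first entry with it.
INDEX = {}
for address, (codes, _) in enumerate(MAPPING_LIBRARY):
    INDEX.setdefault(codes[0], address)


def multi_way_convert_word(text, start=0):
    """Return (multilevel code, matched length) for the longest word at start, or None."""
    address = INDEX.get(text[start])             # step (2)
    if address is None:
        return None                              # step (8)
    match = None
    while address < len(MAPPING_LIBRARY):        # step (3)
        codes, multilevel = MAPPING_LIBRARY[address]
        part = text[start:start + len(codes)]    # corresponding text being converted
        if codes > part:                         # step (4): exit condition
            break
        if codes == part:                        # step (5): matching operation
            match = (multilevel, len(codes))
        address += 1                             # step (6): advance by one
    return match                                 # step (7)
```

With the text "abcd", the entries "ab" and "abc" both match in turn, "xy" triggers the exit condition, and the last match gives the multilevel code "D" for the three leading codes.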
The eighth aspect of the first object of the invention concerns a method of converting text input codes into multilevel internal codes, characterized in that:
during text input, the input code of text information that has an associated multilevel internal code is converted into that multilevel internal code.
Existing Chinese character input methods all convert the input code of Chinese characters into the single-level internal codes of the corresponding characters. Although some of them also use word-based input, and thus do the work of word segmentation during input, the segmentation information is lost when converting to single-level internal codes. If input codes are instead converted into multilevel internal codes during input, the segmentation information is retained. For polyphonic characters, with pronunciation prompts or phonetic annotation displayed during input, the character can be converted into the multilevel internal code of the correct pronunciation, so the correct pronunciation information is retained. Other information, such as the morphological information of words, is likewise retained in text containing multilevel internal codes. Existing input methods cannot do this.
In the eleventh aspect of the first object of the invention, the key device is the double-pinyin code library device. For the GB Chinese characters this device occupies 6768*2 bytes: the double-pinyin code of each Chinese character is 2 bytes, holding two pinyin letters. In the double-pinyin code library device, the double-pinyin codes are arranged in the order of the characters' internal codes.
The thirteenth aspect of the first object of the invention is the text information processing method of the fifth, seventh, eighth, ninth, and tenth aspects of the first object, characterized in that:
the method is used in a system for first-type text information processing.
The fifth aspect of the first object of the invention is the method of multi-way conversion of words. The core device of this method is the mapping component library device, whose principal feature is that words of different lengths are stored mixed together according to a certain rule. The maximum matching method for character strings uses exactly this feature. The segmentation of Chinese character input codes in the tenth aspect also uses this principal feature: single characters of different syllables are likewise mixed in among words of different lengths, and the basic feature is the same. In existing segmentation methods, words of different lengths are stored in separate sections by word length, so the time complexity of segmentation is high, about 12.32; with the method of the ninth aspect, the time complexity drops to 2.89. That is, segmentation speed rises to about 4.3 times that of existing methods. The tenth aspect has the same advantage in segmenting input codes.
The following is an example of converting a pinyin stream into the single-level internal codes of Chinese characters when the input code is pinyin.
When the text information sequence is a Chinese pinyin sequence, the maximum matching segmentation method for text information sequences can be applied to the pinyin sequence, and through segmentation the pinyin sequence is converted into a Chinese character sequence. Here the dictionary device is called the character-word library device. It contains character items and word items: a character item holds the internal codes of the Chinese characters of one syllable, and a word item holds a pinyin item together with the multilevel or single-level internal code of the word corresponding to that pinyin item. Characters or words with the same pinyin are arranged according to a certain rule, and the character items and word items in the character-word library device are arranged by the magnitude of the internal codes of the corresponding pinyin characters.
The conversion steps are as follows:
(1) scan the pinyin sequence, look up the index device with the current syllable to obtain the address A in the character-word library, and find the pinyin item of the first word item in the forward direction from address A;
(2) compare the corresponding part of the sequence with the pinyin item of the word; if the result satisfies the exit condition, go to (5);
(3) if they are equal, perform the matching operation;
(4) advance to the pinyin item of the next word in the forward direction and go to (2);
(5) if there was a match, return either the single-level internal codes of the word chosen by the person entering the text or the single-level internal codes of the word chosen by a certain priority rule; if there was no match, return either the single-level internal code of the character chosen by the person entering the text or the single-level internal code of the character chosen by a certain priority rule.
The "exit condition" above means: when the pinyin items of the words are arranged in ascending order of the single-level internal codes of the corresponding pinyin characters, the pinyin item of the word is greater than the pinyin information being segmented;
when they are arranged in descending order, the pinyin item of the word is less than the pinyin information being segmented.
The forward direction above means: when the pinyin items of the words are arranged in ascending order of the single-level internal codes of the corresponding pinyin characters, the ascending direction of the pinyin items;
when they are arranged in descending order, the descending direction of the pinyin items.
The matching operation of step (3) generally consists of setting the match flag and pointing the match pointer at this word item; the purpose of the matching operation is to return the longest word at the end.
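The pinyin conversion steps can be sketched for the ascending-order case as follows. The sketch makes several assumptions that the text leaves open: the pinyin stream is already split into syllables, the priority rule is "take the first candidate", no two word items share the same pinyin, and characters are returned directly instead of their internal codes. The tiny library contents are invented for illustration.

```python
# Hypothetical word items of a character-word library, sorted by pinyin item.
WORD_ITEMS = sorted({
    ("jie", "fang"): "解放",
    ("jie", "fang", "jun"): "解放军",
    ("qian", "jin"): "前进",
}.items())

# Hypothetical character items: syllable -> characters, highest priority first.
CHAR_ITEMS = {
    "jie": ["解"], "fang": ["放"], "jun": ["军"],
    "zai": ["在", "再"], "qian": ["前"], "jin": ["进"],
}

# Index device: leading syllable -> address of the first word item with it.
INDEX = {}
for address, (pinyin, _) in enumerate(WORD_ITEMS):
    INDEX.setdefault(pinyin[0], address)


def pinyin_to_text(syllables):
    """Convert a sequence of pinyin syllables to text by longest word match."""
    syllables = tuple(syllables)
    out, i = [], 0
    while i < len(syllables):
        match = None
        address = INDEX.get(syllables[i])             # step (1)
        while address is not None and address < len(WORD_ITEMS):
            pinyin, word = WORD_ITEMS[address]
            part = syllables[i:i + len(pinyin)]
            if pinyin > part:                         # step (2): exit condition
                break
            if pinyin == part:                        # step (3): matching operation
                match = (word, len(pinyin))
            address += 1                              # step (4)
        if match:                                     # step (5): longest word
            out.append(match[0])
            i += match[1]
        else:                                         # step (5): single character
            out.append(CHAR_ITEMS[syllables[i]][0])
            i += 1
    return "".join(out)
```

Calling `pinyin_to_text(["jie", "fang", "jun", "zai", "qian", "jin"])` selects the three-syllable word over its two-syllable prefix, falls back to a single character for "zai", and then matches the final two-syllable word.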
The multi-way conversion method can be used for maximum matching of character strings that contain only single-level internal codes. The dictionary consists of dictionary items; each dictionary item contains a component item used for segmentation, and the dictionary items are arranged by the magnitude of the internal codes of their component items. The dictionary here is equivalent to the mapping component library device of the multi-way conversion method.
The text information sequence can be supplied by an input device, a storage device, or a communication device.
The steps of maximum matching segmentation of a text information sequence are as follows:
(1) supply the internal code string to be matched;
(2) look up the index device with the corresponding internal code; if the index entry contains no address item, go to (8);
(3) using the address item, find the mapped component item at that address in the mapping component library device;
(4) compare the mapped component item with the corresponding text information being converted; if the mapped component item is greater (or, for descending order, less), go to (7);
(5) if they are equal, perform the matching operation;
(6) advance the mapped component item by one in the ascending (or descending) direction and go to step (4);
(7) if there was a match, return the last matched mapped component item as the maximally matched text information; if there was no match, go to (8);
(8) return.
The matching operation of step (5) generally consists of setting the match flag and pointing the match pointer at this mapped component item; the purpose of the matching operation is to return the maximally matched character string at the end.
Using the above maximum matching method for Chinese word segmentation reduces both time complexity and space complexity. According to theoretical analysis and actual tests, the time complexity of existing word segmentation methods is 12.32, whereas that of the above method is 2.89.
The single-scan segmentation method is faster for the following reason:
The segmentation dictionary uses the structure of a mapping component library, that is, words of different lengths are arranged in the segmentation dictionary by the magnitude of their single-level internal codes. During comparison and matching, the whole word in the dictionary is compared with the part of the text being segmented. The condition for exiting the comparison loop is as follows:
when arranged in ascending order of the corresponding single-level internal codes: the component item is greater than the text information being segmented;
when arranged in descending order: the component item is less than the text information being segmented.
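A single-scan maximum matching segmenter of this kind can be sketched as follows. The sketch is an illustration only: it orders the dictionary by Python string comparison (Unicode code points) rather than by GB internal codes, uses a small invented dictionary, and emits an unmatched character as a segment of its own.

```python
def build_dictionary(words):
    """Sort words of all lengths together and index them by leading character."""
    dictionary = sorted(words)
    index = {}
    for address, word in enumerate(dictionary):
        index.setdefault(word[0], address)
    return dictionary, index


def segment(text, dictionary, index):
    """Split text into longest dictionary words, scanning each start position once."""
    result, i = [], 0
    while i < len(text):
        best = 1                                  # fall back to a single character
        address = index.get(text[i])
        while address is not None and address < len(dictionary):
            entry = dictionary[address]
            part = text[i:i + len(entry)]
            if entry > part:                      # exit condition (ascending order)
                break
            if entry == part:                     # matching operation
                best = len(entry)
            address += 1                          # advance by one dictionary item
        result.append(text[i:i + best])
        i += best
    return result
```

For example, with the dictionary `["解放", "解放军", "前进"]`, the text "解放军在前进" is split into "解放军", "在", and "前进". Because words of every length sit in one sorted run, the scan stops at the first entry that compares greater than the text, instead of probing a separate table for each word length.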
The text processing device of the first aspect of the second object of the invention can be configured as different devices according to different needs.
For example:
(1) When the output device sends text information containing multilevel internal codes, it constitutes a transmitting device. This can be used in many fields, for example in the transmitter of a radio paging system.
(2) When the input device is a receiver of text information containing multilevel internal codes, and the received text is passed through one-way full-pronunciation conversion and pronunciation-to-speech conversion and output as speech to a sound device, it constitutes a text reading device; if the receiver is a radio receiver, it can constitute a pager.
(3) When the transmitting device sends the full-pronunciation information of text information, it also constitutes a transmitting device. This too can be used in many fields, for example in the transmitter of a radio paging system.
(4) When the input device is a receiver of the full-pronunciation information of text information, and the received pronunciation information is converted to speech and output to a sound device, it constitutes a text reading device; if the receiver is a radio receiver, it can constitute a pager.
(5) A Chinese TTS system mainly comprises three steps: first, word segmentation; second, polyphonic character handling; third, prosody processing. Multi-way conversion of text information clearly accomplishes the first and second steps, that is, it solves the segmentation problem and the polyphonic character problem. The text information in multilevel internal codes is converted to full pronunciation and then to speech, which is sent to a sound device; this text information processing device is thus a TTS device based on multilevel internal codes.
(6) In the processing device, if the input device is a storage medium, such as a magnetic disk or optical disc, that stores text information containing multilevel internal codes or the full-pronunciation information of text, the stored information is passed through the relevant multilevel internal code conversion operations and then converted to speech; this is also a kind of text reading device.
(7) The text processing device converts the full-pronunciation information of text into speech information and outputs it to a telephone device over a transmission medium; this is a telephone text-to-speech system.
This device can convert full-pronunciation information of text, whether held on a storage medium or sent by other equipment, into speech information and send it to telephone users over a transmission medium. The device may itself also contain devices for multilevel internal code operations and so perform those operations.

Claims (9)

1. A text information processing method, in which a Chinese character or other character is represented by one internal code and a word composed of two or more characters is represented by the corresponding single-level internal codes, the method being characterized in that:
(1) a word composed of two or more characters is represented not only by the set of corresponding single-level internal codes but also by a corresponding multilevel internal code;
(2) characters or words with different pronunciations are represented by different internal codes: the internal code of one pronunciation is a single-level internal code, and the internal codes of the remaining pronunciations are multilevel internal codes;
the text information processing method comprising the steps of:
(1) converting the multilevel internal code of a word composed of two or more characters into the single-level internal codes of the corresponding characters;
(2) converting the multilevel internal code of a polyphonic character or word into the corresponding single-level internal code.
2. The text information processing method of claim 1, characterized in that:
the conversion of a multilevel internal code into single-level internal codes uses a component library device and a polyphonic-character component library device, with the following steps:
(1) calculation step: from the multilevel internal code, calculate the position of its component item in the component library device or the polyphonic-character component library device;
(2) conversion step: replace the multilevel internal code with the corresponding component item; if the component item contains no multilevel internal code, go to (4);
(3) identification step: according to the encoding characteristics of multilevel internal codes, identify whether the corresponding component item contains a multilevel internal code; if it does, go to (1);
(4) end of conversion.
3. literal information processing method as claimed in claim 1 is characterized in that:
Adopt multidirectional conversion that the single-stage ISN is converted to multilevel machine code, multidirectional conversion is divided into the multidirectional conversion of speech and the multidirectional conversion of word, composition storehouse device is used in the multidirectional conversion of speech, mapping composition storehouse device and its indexing unit, the subitem that is mapped in the mapping composition storehouse is arranged by the size order of the single-stage ISN of corresponding one-tenth subitem, and the many of word use word sound storehouse device, stress word sound storehouse device and stress word composition storehouse device with conversion;
The operation steps of the multidirectional conversion of institute's predicate is as follows;
(1) looks into the index step, look into indexing unit,, continue, otherwise return if find this address according to corresponding ISN;
(2) comparison match step is compared with the corresponding Word message that is converted being mapped to subitem, does following selection according to the result;
If satisfy and to jump out condition and change last treatment step,
If equate, then carry out matching operation;
(3) mobile step will be mapped to subitem and move one by working direction, change the comparison match step;
(4) last treatment step if mate, returns the corresponding multilevel machine code of itemizing that is mapped to of last coupling;
The operation steps of the said multidirectional conversion of characters are as follows:
(1) Look up the character pronunciation library according to the single-level internal code of the character being converted; if the character is not a polyphonic character, go to (4);
(2) Determine whether the polyphonic character takes its primary pronunciation; if it does, go to (4);
(3) Look up the polyphonic-character component library device and, according to the polyphonic-character pronunciation library device, replace the single-level internal code of the character with the multilevel internal code corresponding to its secondary pronunciation;
(4) End of conversion.
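As a rough illustration of the character conversion steps above, the sketch below replaces a character's code only when a secondary pronunciation is intended. The three tables, the pinyin strings, the value 0x20000 and the idea of passing the intended reading as a parameter are all assumptions; the claim does not say how the intended reading is determined.

```python
# Hypothetical character pronunciation library:
# single-level code -> (primary reading, whether the character is polyphonic).
CHAR_PRONUNCIATIONS = {
    ord("人"): ("ren2", False),
    ord("行"): ("xing2", True),
}

# Hypothetical polyphonic-character pronunciation library: secondary readings.
POLYPHONIC_PRONUNCIATIONS = {ord("行"): ["hang2"]}

# Hypothetical polyphonic-character component library:
# (single-level code, secondary reading) -> multilevel code.
POLYPHONIC_COMPONENTS = {(ord("行"), "hang2"): 0x20000}


def convert_character(code, reading):
    """Return the code to store for a character read as `reading`."""
    primary, is_polyphonic = CHAR_PRONUNCIATIONS[code]   # step (1)
    if not is_polyphonic:
        return code
    if reading == primary:                               # step (2)
        return code
    if reading in POLYPHONIC_PRONUNCIATIONS.get(code, []):
        return POLYPHONIC_COMPONENTS[(code, reading)]    # step (3)
    return code                                          # unknown reading: leave unchanged
```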
4. A text information processing method, in which a character or symbol is represented by one internal code and a word composed of two or more characters or symbols is represented by the corresponding single-level internal codes, the method being characterized in that:
The multidirectional conversion method is used for maximum matching of text information represented by single-level or multilevel internal codes; the dictionary is composed of dictionary items, each dictionary item contains a component item used for matching, and the dictionary items are arranged in order of the size of the internal codes corresponding to their component items;
The operation steps of maximum matching on a text information sequence are as follows:
(1) Scan the text information sequence and look up the indexing device according to the corresponding internal code; if an address entry is found, continue; otherwise return;
(2) Use the comparison device to compare the component item with the corresponding part of the text information sequence being matched;
Select an action as follows according to the comparison result:
if the result satisfies the exit condition, go to step (4);
if they are equal, perform the matching operation;
(3) Advance the dictionary item by one in the forward direction, then go to step (2);
(4) Finally, if a match was made, return the maximally matched text information;
The matching operation in the above steps is generally "set the match flag and point the match pointer to this component item"; its purpose is to allow the maximally matched component item to be returned at the end;
The above "exit condition" means: when the component items are arranged in ascending order of their corresponding single-level internal codes, the component item is greater than the text information being segmented;
when they are arranged in descending order, the component item is less than the text information being segmented;
The above forward direction means: when the component items are arranged in ascending order of their corresponding single-level internal codes, the ascending direction of the component items;
when they are arranged in descending order, the descending direction of the component items.
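The maximum matching procedure above can be sketched as forward word segmentation over a sorted dictionary. The sketch assumes ascending order, uses Python strings so that Unicode code points stand in for single-level internal codes, and falls back to a single character when nothing matches; DICTIONARY, INDEX and the fallback rule are illustrative assumptions.

```python
# Hypothetical dictionary, sorted in ascending order of character codes.
DICTIONARY = sorted(["中国", "中国人", "中心", "人民"])

# Hypothetical indexing device: first character -> address of the first
# dictionary item that starts with it.
INDEX = {}
for address, word in enumerate(DICTIONARY):
    INDEX.setdefault(word[0], address)


def longest_match(text, pos):
    """Return the longest dictionary word starting at pos, or None."""
    address = INDEX.get(text[pos])           # step (1): index lookup
    if address is None:
        return None
    matched = None                           # match flag and match pointer in one
    while address < len(DICTIONARY):
        item = DICTIONARY[address]
        part = text[pos:pos + len(item)]     # step (2): compare with the text
        if item > part:                      # exit condition (ascending order)
            break
        if item == part:
            matched = item                   # matching operation
        address += 1                         # step (3): advance by one
    return matched                           # step (4)


def segment(text):
    """Split text into maximally matched words, single characters otherwise."""
    pos, words = 0, []
    while pos < len(text):
        word = longest_match(text, pos) or text[pos]
        words.append(word)
        pos += len(word)
    return words
```

Because a word sorts before any longer word that extends it, the last match recorded before the exit condition fires is the maximum match.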
5. The text information processing method as claimed in claim 4, characterized in that:
In the text input process, the input code of text information is converted into a single-level or multilevel internal code. The method uses a character-and-word library device, which contains character items and word items, collectively called entry items. A character item contains the internal codes of the Chinese characters that share one input code; a word item contains an input code item and the multilevel or single-level internal code of the word corresponding to that input code item. Characters or words with the same input code are arranged according to a fixed rule, and the entry items in the character-and-word library device are arranged in order of the size of the single-level internal codes of the corresponding input code characters. The indexing device of the character-and-word library contains the addresses of the input codes in the library device. The conversion steps are as follows:
(1) Look up the indexing device with the current input code to obtain the address in the character-and-word library;
(2) Compare the corresponding part of the sequence with the input code item at that address in the library; if the exit condition is satisfied, go to (5);
(3) If they are equal, perform the matching operation;
(4) Advance the entry item by one in the forward direction, then go to (2);
(5) Return either the single-level or multilevel internal code of the text component chosen by the person typing, or the single-level or multilevel internal code chosen by a fixed priority rule.
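The input conversion steps above follow the same lookup-and-scan pattern, this time keyed by input code. In this sketch the pinyin-style input codes, the table LEXICON, the value 0x10003 and the "first entry wins" priority rule are all assumptions made for illustration.

```python
# Hypothetical character-and-word library: (input code, internal code),
# sorted by input code; entries sharing an input code keep a fixed order.
LEXICON = [
    ("ren", ord("人")),
    ("ren", ord("仁")),
    ("renmin", 0x10003),   # hypothetical multilevel code of a two-character word
    ("ri", ord("日")),
]

# Hypothetical indexing device: first letter of the input code -> address.
INDEX = {}
for address, (input_code, _) in enumerate(LEXICON):
    INDEX.setdefault(input_code[0], address)


def candidates(typed):
    """Collect the internal codes of all entries whose input code equals typed."""
    address = INDEX.get(typed[0])            # step (1): index lookup
    found = []
    if address is None:
        return found
    while address < len(LEXICON):
        input_code, code = LEXICON[address]
        if input_code > typed:               # step (2): exit condition
            break
        if input_code == typed:              # step (3): matching operation
            found.append(code)
        address += 1                         # step (4): advance by one
    return found


def convert_input(typed, choice=0):
    """Step (5): return the user's choice, or the first entry by default."""
    found = candidates(typed)
    return found[choice] if choice < len(found) else None
```

Passing `choice` models the person typing picking a candidate; leaving it at 0 models a priority rule that prefers the first entry in library order.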
6. A text information processing method, in which a character or symbol is represented by one internal code and a word composed of two or more characters or symbols is represented by the corresponding single-level internal codes, the method being characterized in that:
In the text input process, the input code of text information is converted into a single-level or multilevel internal code. The method uses a stroke indexing device, a radical indexing device and a radical character library device. The operation steps are as follows:
(1) Input the stroke count of the radical of the character; if the radical has 10 or more strokes, input "0";
(2) The machine looks up the stroke indexing device according to the input stroke count and finds the start address, in the radical indexing device, of the radicals with that stroke count;
(3) The machine displays the radicals corresponding to that stroke count;
(4) The user selects the required radical;
(5) The machine searches the radical character library device and displays the Chinese characters corresponding to that radical;
(6) Return the single-level or multilevel internal code of the Chinese character selected by the user.
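The radical-based input steps above can be sketched with two small lookup tables. The table contents are a tiny illustrative sample, the names RADICALS, STROKE_INDEX and CHARACTER_LIBRARY are hypothetical, and Unicode code points stand in for single-level internal codes.

```python
# Hypothetical radical indexing device: (stroke count, radical), sorted by stroke count.
RADICALS = [
    (1, "一"), (1, "丨"),
    (2, "人"), (2, "刀"),
    (3, "口"), (3, "土"),
    (10, "馬"), (11, "魚"),
]


def stroke_key(strokes):
    """Step (1): one key per stroke count, "0" for 10 strokes or more."""
    return "0" if strokes >= 10 else str(strokes)


# Hypothetical stroke indexing device: key -> start address in RADICALS.
STROKE_INDEX = {}
for address, (strokes, _) in enumerate(RADICALS):
    STROKE_INDEX.setdefault(stroke_key(strokes), address)

# Hypothetical radical character library: radical -> characters under it.
CHARACTER_LIBRARY = {"人": "他们", "口": "叫吃"}


def radicals_for(key):
    """Steps (2)-(3): list the radicals to display for a typed key."""
    start = STROKE_INDEX.get(key)
    if start is None:
        return []
    shown = []
    for strokes, radical in RADICALS[start:]:
        if stroke_key(strokes) != key:
            break
        shown.append(radical)
    return shown


def characters_for(radical):
    """Step (5): list (character, internal code) pairs for the chosen radical."""
    return [(ch, ord(ch)) for ch in CHARACTER_LIBRARY.get(radical, "")]
```

Step (6) then returns the code of whichever pair the user picks from `characters_for`.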
7. A text information processing method, in which a character or symbol is represented by one internal code and a word composed of two or more characters or symbols is represented by the corresponding single-level internal codes, the method being characterized in that: a unidirectional full-pronunciation conversion device is used to convert single-level or multilevel internal codes into the corresponding full pronunciations; the unidirectional full-pronunciation conversion device comprises the following devices:
(1) Character pronunciation library device: used to look up the full pronunciation corresponding to a single-level internal code and to judge whether the Chinese character corresponding to that code is a polyphonic character;
(2) Word pronunciation library device: used to look up the full pronunciation corresponding to the multilevel internal code of a word;
(3) Polyphonic-character pronunciation library device: used to look up the full pronunciation of a polyphonic character;
The steps of the unidirectional full-pronunciation conversion are as follows:
(1) Scan the internal codes in the text information;
(2) If the code is a single-level internal code, its full pronunciation is the corresponding full pronunciation in the character pronunciation library device;
(3) If the code is a multilevel internal code corresponding to a word, its full pronunciation is the corresponding full pronunciation in the word pronunciation library device; if it is a multilevel internal code corresponding to the secondary pronunciation of a character, its full pronunciation is the one in the polyphonic-character pronunciation library device.
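The scan described above is a three-way dispatch on the kind of code. The sketch below assumes hypothetical code ranges (below 0x10000 single-level, 0x10000 to 0x1FFFF word codes, 0x20000 and above secondary-pronunciation codes) and represents a full pronunciation as a pinyin string with a tone digit; neither choice comes from the claim.

```python
WORD_BASE = 0x10000        # assumption: first multilevel code of a word
POLYPHONIC_BASE = 0x20000  # assumption: first multilevel code of a secondary pronunciation

# Hypothetical character pronunciation library (primary readings).
CHAR_PRONUNCIATIONS = {ord("人"): "ren2", ord("行"): "xing2", ord("银"): "yin2"}

# Hypothetical word pronunciation library: one reading per character of the word.
WORD_PRONUNCIATIONS = {0x10000: ["yin2", "hang2"]}

# Hypothetical polyphonic-character pronunciation library (secondary readings).
POLYPHONIC_PRONUNCIATIONS = {0x20000: "hang2"}


def to_pronunciations(codes):
    """Convert a sequence of internal codes into a list of full pronunciations."""
    result = []
    for code in codes:                                   # step (1): scan
        if code < WORD_BASE:                             # step (2): single-level code
            result.append(CHAR_PRONUNCIATIONS[code])
        elif code < POLYPHONIC_BASE:                     # step (3): word code
            result.extend(WORD_PRONUNCIATIONS[code])
        else:                                            # step (3): secondary pronunciation
            result.append(POLYPHONIC_PRONUNCIATIONS[code])
    return result
```

No context analysis is needed at this stage, because the choice of reading was already recorded in the code itself when the text was converted to multilevel codes.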
8. A text information processing method, in which a character or symbol is represented by one internal code and a word composed of two or more characters or symbols is represented by the corresponding single-level internal codes, the method being characterized in that:
A full-pronunciation speech conversion device is used to convert the full-pronunciation information of text into the corresponding speech; the full-pronunciation speech conversion device comprises the following devices:
(1) Sound library device: a device that stores the syllable waveform or synthesis parameters corresponding to each full pronunciation;
(2) Full-pronunciation indexing device: stores the position, in the sound library device, of the waveform or parameters corresponding to each full pronunciation;
The conversion operation comprises the following steps:
(1) From the full pronunciation, calculate the position of its index entry in the full-pronunciation indexing device;
(2) Take the address entry out of that index entry to obtain the position in the sound library device;
(3) Convert the full pronunciation into the corresponding waveform or parameters.
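The three steps above describe a fixed-size index that maps a full pronunciation to a slice of the sound library. The sketch below assumes that a full pronunciation is a small integer, that each index entry is an (offset, length) pair of two 32-bit integers, and that the sound library is one byte string; the sample bytes are placeholders, not real waveforms.

```python
import struct

# Assumed layout of one index entry: little-endian (offset, length).
ENTRY = struct.Struct("<II")

# Hypothetical sound library: waveform data for two syllables, back to back.
SOUND_LIBRARY = b"\x01\x02\x03" + b"\x0a\x0b"

# Hypothetical full-pronunciation index: entry n describes pronunciation n.
PRONUNCIATION_INDEX = ENTRY.pack(0, 3) + ENTRY.pack(3, 2)


def to_waveform(pronunciation):
    """Return the stored waveform bytes for one full pronunciation."""
    position = pronunciation * ENTRY.size                   # step (1): index entry position
    offset, length = ENTRY.unpack_from(PRONUNCIATION_INDEX, position)  # step (2): address entry
    return SOUND_LIBRARY[offset:offset + length]            # step (3): waveform or parameters
```

Because the index position is computed directly from the pronunciation value, the lookup needs no search.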
9. A text information processing device, in which a character or symbol is represented by one internal code and a word composed of two or more characters or symbols is represented by the corresponding single-level internal codes, the device comprising the following devices:
(1) Full-pronunciation device: a full pronunciation consists of two bytes and contains pinyin and tone information; or
(2) Full-pronunciation speech conversion device: used to convert the full pronunciations of text information into the corresponding speech; it consists of a sound library device and a full-pronunciation indexing device; the sound library device stores the syllable waveform or synthesis parameters corresponding to each full pronunciation, and the full-pronunciation indexing device stores the position, in the sound library device, of the waveform or parameters corresponding to each full pronunciation; or
(3) Unidirectional full-pronunciation conversion device: used to convert the single-level or multilevel internal codes of text information into the corresponding full pronunciations, wherein the character pronunciation library device is used to look up the full pronunciation corresponding to a single-level internal code and to judge whether the Chinese character corresponding to that code is a polyphonic character, the word pronunciation library device stores the full pronunciations corresponding to words, and the polyphonic-character pronunciation library device stores the full pronunciations of polyphonic characters; or
(4) Pipeline conversion device: the pipeline conversion device contains a pipeline component library device, which may be either a pipeline single-segment component library device or a pipeline multi-segment component library device. In the pipeline single-segment component library device, the pipeline component items are arranged in ascending or descending order of their corresponding single-level internal codes; the lowest end is called the start and the highest end is called the end; each pipeline component item consists of a component item length, a component item and a multilevel internal code, or of a component item and a multilevel internal code, or of a component item alone. In the pipeline multi-segment component library device, the pipeline is divided into segments by internal code level, and the segments are then joined into one pipeline in order from the low internal code segment to the high internal code segment; the outer end of the lowest internal code segment is called the start and the outer end of the highest internal code segment is called the end; each component item of the pipeline multi-segment component library device consists of a component item length, a component item and a multilevel internal code, or of a component item and a multilevel internal code.
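Item (1) above states only that a full pronunciation occupies two bytes and carries pinyin and tone information. One possible layout, shown purely as an assumption, puts a syllable number in the upper 13 bits and the tone in the lower 3 bits; the bit split, the big-endian byte order and the function names are hypothetical.

```python
TONE_BITS = 3                          # assumption: tone stored in the low 3 bits
SYLLABLE_LIMIT = 1 << (16 - TONE_BITS) # syllable numbers must fit in the remaining 13 bits


def pack_pronunciation(syllable_id, tone):
    """Pack a syllable number and a tone into a two-byte full pronunciation."""
    if not 0 <= syllable_id < SYLLABLE_LIMIT:
        raise ValueError("syllable_id does not fit in 13 bits")
    if not 0 <= tone < (1 << TONE_BITS):
        raise ValueError("tone does not fit in 3 bits")
    return ((syllable_id << TONE_BITS) | tone).to_bytes(2, "big")


def unpack_pronunciation(data):
    """Recover (syllable number, tone) from a two-byte full pronunciation."""
    value = int.from_bytes(data, "big")
    return value >> TONE_BITS, value & ((1 << TONE_BITS) - 1)
```

Thirteen bits are far more than the number of distinct Mandarin syllables requires, so other splits would work equally well.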
CN96115997A 1996-10-04 1996-10-04 Text data processing method and device Expired - Fee Related CN1068127C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN96115997A CN1068127C (en) 1996-10-04 1996-10-04 Text data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN96115997A CN1068127C (en) 1996-10-04 1996-10-04 Text data processing method and device

Publications (2)

Publication Number Publication Date
CN1182234A CN1182234A (en) 1998-05-20
CN1068127C true CN1068127C (en) 2001-07-04

Family

ID=5123194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN96115997A Expired - Fee Related CN1068127C (en) 1996-10-04 1996-10-04 Text data processing method and device

Country Status (1)

Country Link
CN (1) CN1068127C (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1567174A (en) 2003-06-09 2005-01-19 吴胜远 Method for expressing and processing object and apparatus thereof
CN102567296B (en) * 2011-01-04 2016-03-30 中国移动通信有限公司 A kind of disposal route of Chinese character information and the treating apparatus of Chinese character information

Also Published As

Publication number Publication date
CN1182234A (en) 1998-05-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee