CN100390711C - Computer processing and keyboard inputting method for Chinese word - Google Patents

Info

Publication number
CN100390711C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2005101354752A
Other languages
Chinese (zh)
Other versions
CN1790238A (en)
Inventor
贾惠波
焦慧
刘迁
熊剑平
马骋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University

Abstract

The invention belongs to the technical field of Chinese information processing. It is characterized by a method for computer processing and keyboard input of Chinese words comprising the following steps in order: the vocabulary in the "Contemporary Chinese Common Word List for Information Processing" is classified by part of speech to form dictionary words; the dictionary words are encoded in four bytes to form dictionary codes; the dictionary codes are mapped to the internal machine codes that represent Chinese characters to form a dictionary code table; pinyin codes are entered from the keyboard in pinyin mode with the word as the basic unit; dictionary codes are obtained from a lookup table of pinyin codes and dictionary codes; and the dictionary codes are mapped to internal machine codes through the dictionary code table to complete the input and to form a dictionary code file with the word as the basic unit. The invention avoids the difficulty that word segmentation causes in Chinese information processing and achieves automatic part-of-speech classification through the first byte of the dictionary code. File representation with the word as the basic unit is achieved without changing any internal structure or configuration of existing computers.

Description

A method for computer processing and keyboard input of Chinese words
Technical field
The invention belongs to the field of Chinese information processing, and relates in particular to the representation and storage of Chinese in computers.
Background art
With the rapid development of computer technology and artificial intelligence theory, researchers began to study natural language with formal methods, giving rise to a branch of artificial intelligence: natural language processing (NLP). Nearly all natural language processing systems treat the word as the primary information carrier and basic unit of operation. In Chinese linguistics, however, there is still no settled definition of "word"; here, a word is taken to be the smallest meaningful unit of Chinese that can be used independently. Western languages such as English separate words with spaces in writing, whereas written Chinese is a continuous string of characters. Chinese text files in computers are likewise encoded character by character, and the internal code is a continuous sequence of code words with no explicit marker between words. The first task in understanding Chinese is therefore to divide the continuous character string into a sequence of words, which is known as word segmentation.
To facilitate the use of Chinese in computers, researchers began to study automatic word segmentation, that is, word segmentation performed by computer. As research on Chinese information processing has deepened, the importance of automatic word segmentation has become increasingly prominent. It underlies all Chinese information processing: applications such as Chinese analysis and understanding, machine translation between Chinese and foreign languages, automatic indexing and full-text retrieval of Chinese documents, Chinese character recognition, Chinese speech recognition and synthesis, conversion between simplified and traditional Chinese, and automatic proofreading of Chinese manuscripts all require the text to be segmented into words first.
Automatic word segmentation has been studied at home and abroad for about ten years, with great effort invested and many results obtained (more than 20 word segmentation systems have been built in mainland China, Taiwan, Hong Kong, and Singapore combined). Nevertheless, no truly mature, practical system has yet emerged, and this has become one of the bottlenecks seriously restricting the development of Chinese information processing.
Summary of the invention
The need for word segmentation arises from the character-based internal code representation of Chinese. It is peculiar to Chinese and has created a bottleneck in Chinese information processing. The object of the invention is to solve, at its root, the automatic word segmentation problem that has long constrained the development of Chinese information processing. The invention proposes a computer encoding method for Chinese that yields a new, word-based Chinese document format; the document is built on words rather than on characters, as current Chinese documents are. Because Chinese linguistics has no settled definition of "word", the term here means the smallest meaningful unit in a Chinese sentence that can be used independently, that is, what are commonly called words, word groups, phrases, idioms, and the like. The invention encodes each word in the "Contemporary Chinese Common Word List for Information Processing" (hereinafter the "Common Word List"; this list fully conforms to GB13715, the "Contemporary Chinese Word Segmentation Specification for Information Processing", and is the word list in general use at present). A text stored in this word-based encoding makes the word the smallest information carrier in computer processing of Chinese, so no further word segmentation is needed. Computer processing of Chinese then starts from the same level as that of Western languages, and results obtained in Western language processing can be applied to Chinese.
The invention proposes a word-based computer encoding method for Chinese, comprising a new Chinese document encoding format, a dictionary code table (a database) that maps the new code of each word in the Common Word List to its internal code, and a Chinese keyboard input method that produces the new document format. The Chinese document encoding format comprises the ASCII control codes conforming to the international standard, the standard ASCII character set for Western letters and symbols, and four-byte codes for Chinese entries and Chinese punctuation marks. The dictionary code table is organized by part of speech, with the words in each class arranged in pinyin alphabetical order; each word is assigned a four-byte code and paired with the internal code of the characters that make up the word, forming the dictionary code table database. The keyboard input method for the new document format is based on current pinyin input methods: the key codes produced under the pinyin scheme are decoded and looked up in a pinyin code to dictionary code lookup table to find the corresponding dictionary code, and the dictionary code table, which records the correspondence between dictionary codes and internal codes, is then used to find the corresponding internal code for display and input.
The invention is characterized in that the method is a method for computer processing and keyboard input of Chinese words that takes pinyin as the input mode and the word as the basic input unit, and that proceeds by successive lookup from pinyin code to dictionary code and from dictionary code to internal code. Here a word means the smallest meaningful unit in a Chinese sentence that can be used independently, including words, word groups, phrases, and idioms. The method comprises the following steps in order:
Step 1: The Chinese vocabulary in the Common Word List is classified by most frequent part of speech into nouns, verbs (including verb phrases), adjectives, adverbs, pronouns, numerals, measure words, onomatopoeia, interjections, prepositions, conjunctions, auxiliary words, and modal particles, together with idioms, to form dictionary words. Each dictionary word consists of 1 to 7 Chinese characters;
Step 2: Each dictionary word from step 1 is encoded as follows to form a dictionary code. Each dictionary code consists of 4 bytes, written in hexadecimal as:
[AxH xxH xxH xxH] (H marks hexadecimal notation; all dictionary codes below are written in hexadecimal). The high four bits of the first byte must be AH, binary 1010. The low four bits x of the first byte range from 1H to FH and indicate the part of speech of the word: 1 to 9 denote, in order, nouns, adjectives, adverbs, pronouns, numerals, measure words, idioms, prepositions and conjunctions, and minor words (onomatopoeia, interjections, auxiliary words, and modal particles); A denotes punctuation marks; and B to F denote the verbs and verb phrases in the Common Word List. Each class is arranged in the order of the list. The high four bits of the second byte are reserved, and the low four bits of the second byte give the number of Chinese characters in the word. The third and fourth bytes form a sequence code ranging from 1H to FFFFH (65535), used to order and number the vocabulary by pinyin alphabetical order. This encoding can hold at least 14 × 65535 entries;
Step 3: A pinyin code to dictionary code lookup table is built from the results of steps 1 and 2 and entered into the computer;
Step 4: A lookup table between the dictionary code of each word in the Common Word List and the internal code of that word in the computer is established, called the dictionary code table. The dictionary code table contains 12 tables distinguished by part of speech: the non-dictionary word table, common noun table, adjective table, verb table, adverb table, pronoun table, numeral table, measure word table, preposition/conjunction table, onomatopoeia/interjection/auxiliary word/modal particle table, idiom table, and punctuation mark table. Each table records the dictionary code of every word and the internal code corresponding to that dictionary code. Non-dictionary words are proper nouns such as personal names, place names, and trade names; non-dictionary words and punctuation marks are likewise encoded with the four-byte method;
Step 5: The dictionary code table obtained in step 4 is entered into the computer, and during Chinese input a file in dictionary code format and a file in internal code format are formed respectively;
Step 6: Chinese words are input into the computer in pinyin mode, with the word as the basic unit.
When the four-byte method encodes a punctuation mark, the first byte is always AAH and the second byte is always 00H; the third and fourth bytes are the internal code of the punctuation mark.
When the four-byte method encodes a non-dictionary word, the first byte is always A0H; the high four bits of the second byte are reserved, and the low four bits give the number of Chinese characters in the non-dictionary word; the third and fourth bytes give the serial number of the non-dictionary word in the non-dictionary word table of the dictionary code table.
Functions and features of the invention:
(1) It realizes a document format in which the word is the smallest information carrier, thereby entirely avoiding the obstacle that word segmentation poses to Chinese information processing and putting Chinese language processing on the same footing as Western language processing;
(2) It realizes automatic classification by part of speech: the dictionary code states the part of speech of each word explicitly, so no other tagging method is needed;
(3) It changes no internal structure or configuration of existing computers and continues to use the international standard for Chinese character internal codes; it merely builds a system on top of that standard to construct a word-based document format.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure.
Fig. 2 is a schematic diagram of the processing procedure.
Embodiment
The encoding principle of the invention is as follows. The entire document file consists of a series of code words. Control codes are represented by the international standard ASCII codes, Western characters and symbols are represented by the international standard ASCII codes for Western characters and symbols, and a separate encoding scheme is established for the Chinese vocabulary in the Common Word List.
The encoding method proposed by the invention classifies the Chinese vocabulary in the Common Word List by most frequent part of speech into nouns, verbs, adjectives, adverbs, pronouns, numerals, measure words, onomatopoeia, interjections, prepositions, conjunctions, auxiliary words, and modal particles. Chinese also has a large number of idioms, which are treated as a class of their own. Words classified in this way are called dictionary words. The entries in the Common Word List contain at least 1 and at most 7 Chinese characters, so every dictionary word in the invention consists of 1 to 7 Chinese characters. Each dictionary word is assigned a code, called the dictionary code. Every dictionary code consists of 4 bytes, written in hexadecimal as:
[AxH xxH xxH xxH] (H marks hexadecimal notation; all dictionary codes below are written in hexadecimal)
The high four bits of the first byte must be AH (binary 1010). The low four bits x of the first byte range from 1H to FH and indicate the part of speech of the word. Major parts of speech such as nouns and adjectives each form a group of their own, while minor parts of speech such as auxiliary words, interjections, and modal particles are combined into one group. Verbs and verb phrases are more complicated (tense, number, and so on): B denotes single-character notional verbs, C denotes multi-character notional verbs, D denotes verb phrases, and E and F are reserved for future extension. The correspondence is as follows:
Code    Part of speech
1       Noun
2       Adjective
3       Adverb
4       Pronoun
5       Numeral
6       Measure word
7       Idiom
8       Preposition, conjunction
9       Minor words (auxiliary word, modal particle, onomatopoeia, interjection)
A       Punctuation mark
B-F     Verb and verb phrase
The high four bits of the second byte are reserved, and the low four bits of the second byte give the number of characters (1-7) in the word. The remaining third and fourth bytes form a sequence code ranging from 1 to FFFFH (65535), used to order the vocabulary by pinyin. This encoding can hold at least 14 × 65535 = 917490 entries. The Common Word List contains 39016 first-level common words, second-level common words, and monosyllabic words in total, plus appendices of some proper nouns, so the dictionary code space is sufficient.
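The capacity figure can be checked with a short calculation. The sketch below assumes that the factor 14 counts the part-of-speech values 1 to 9 and B to F (leaving out 0, used for non-dictionary words, and A, used for punctuation); the text gives the factor but does not spell out this breakdown, and the variable names are illustrative only.

```python
# Assumption: 14 = nine classes (1-9) plus five verb classes (B-F).
POS_CLASSES = len(range(0x1, 0xA)) + len(range(0xB, 0x10))

# Sequence codes run from 0001H to FFFFH, so each class has 65535 slots.
CODES_PER_CLASS = 0xFFFF

capacity = POS_CLASSES * CODES_PER_CLASS
print(POS_CLASSES, capacity)  # 14 917490
```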
For example, the Common Word List contains the word "阿弟" (adi, "younger brother"), for which the dictionary code is:
[A1020002]
Here A1 indicates that the word is a common noun, 02 indicates that the word contains two Chinese characters, and 0002 is the serial number of the word in the noun table of the dictionary code table.
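The four-byte layout described above can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the names `DictCode`, `encode_dict_code`, `decode_dict_code`, and `POS_BY_NIBBLE` are hypothetical, and the byte order of the sequence code (high byte first) is inferred from the written form A1020002.

```python
from dataclasses import dataclass

# Part-of-speech labels keyed by the low nibble of the first byte,
# following the table in the description (0 marks a non-dictionary word).
POS_BY_NIBBLE = {
    0x0: "non-dictionary word", 0x1: "noun", 0x2: "adjective",
    0x3: "adverb", 0x4: "pronoun", 0x5: "numeral", 0x6: "measure word",
    0x7: "idiom", 0x8: "preposition/conjunction", 0x9: "minor word",
    0xA: "punctuation", 0xB: "single-character verb",
    0xC: "multi-character verb", 0xD: "verb phrase",
    0xE: "reserved", 0xF: "reserved",
}


@dataclass(frozen=True)
class DictCode:
    pos: int     # low nibble of byte 1: part of speech
    length: int  # low nibble of byte 2: number of Chinese characters
    seq: int     # bytes 3-4: serial number within the part-of-speech table


def encode_dict_code(pos: int, length: int, seq: int) -> bytes:
    """Pack a word's attributes into the layout [Ax 0n ss ss]."""
    if not 0 <= pos <= 0xF:
        raise ValueError("pos must fit in four bits")
    if not 1 <= length <= 7:
        raise ValueError("a word has 1 to 7 Chinese characters")
    if not 1 <= seq <= 0xFFFF:
        raise ValueError("sequence code must lie in 1..FFFFH")
    return bytes([0xA0 | pos, length]) + seq.to_bytes(2, "big")


def decode_dict_code(code: bytes) -> DictCode:
    """Unpack a four-byte code; the high nibble of byte 1 must be A."""
    if len(code) != 4 or code[0] >> 4 != 0xA:
        raise ValueError("not a dictionary code")
    return DictCode(code[0] & 0x0F, code[1] & 0x0F,
                    int.from_bytes(code[2:], "big"))


code = encode_dict_code(pos=0x1, length=2, seq=2)
print(code.hex().upper())                          # A1020002
print(POS_BY_NIBBLE[decode_dict_code(code).pos])   # noun
```

Punctuation codes carry 00H in the second byte, so they fall outside the length check in `encode_dict_code` and would need their own helper.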
The invention calls one class of words non-dictionary words: proper nouns such as personal names, place names, and trade names. These words are also encoded with four bytes,
[AxH xxH xxH xxH]
except that x in the first byte AxH is always zero, so the first byte of a non-dictionary word is always A0H; the rest is the same as the encoding above. For example, the dictionary code of the proper noun "阿拉伯" ("Arab") is [A0030001], where A0 indicates that the word is a non-dictionary word, 03 indicates that it contains three Chinese characters, and 0001 is its serial number in the non-dictionary word table of the dictionary code table.
The invention likewise uses a four-byte code for Chinese punctuation marks and other full-width symbols:
[AAH xxH xxH xxH]
Here the first byte is always AAH, the second byte is always 00H, and the last two bytes are the internal code of the punctuation mark. The codes of common punctuation marks are as follows:
，    AA00A3AC
。    AA00A1A3
：    AA00A3BA
；    AA00A3BB
？    AA00A3BF
、    AA00A1A2
‘    AA00A1AE
’    AA00A1AF
《    AA00A1B6
》    AA00A1B7
“    AA00A1B0
”    AA00A1B1
（    AA00A3A8
）    AA00A3A9
！    AA00A3A1
…    AA00A1AD
—    AA00A1AA
－    AA00A3AD
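The punctuation rule can be illustrated as follows. The sketch assumes that the internal code is the two-byte GB2312 encoding, which matches the byte values listed above but is not named in the text; `encode_punctuation` and `decode_punctuation` are hypothetical helper names.

```python
def encode_punctuation(mark: str) -> bytes:
    """Return AA 00 followed by the two-byte internal code of the mark."""
    internal = mark.encode("gb2312")  # assumption: internal code is GB2312
    if len(internal) != 2:
        raise ValueError("expected one full-width punctuation mark")
    return bytes([0xAA, 0x00]) + internal


def decode_punctuation(code: bytes) -> str:
    """Recover the mark from a four-byte code of the form AA 00 xx xx."""
    if len(code) != 4 or code[0] != 0xAA or code[1] != 0x00:
        raise ValueError("not a punctuation code")
    return code[2:].decode("gb2312")


print(encode_punctuation("，").hex().upper())  # AA00A3AC
print(encode_punctuation("。").hex().upper())  # AA00A1A3
```

Because the last two bytes are the unchanged internal code, a punctuation mark needs no entry in a lookup table to be displayed.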
To implement the proposed encoding on a computer, the dictionary codes of the invention must be linked to the internal codes that currently represent Chinese characters in computers. The invention establishes the correspondence between dictionary codes and internal codes through a database called the dictionary code table. The dictionary code table contains 12 tables distinguished by part of speech: the non-dictionary word table, common noun table, adjective table, verb table, adverb table, pronoun table, numeral table, measure word table, preposition/conjunction table, onomatopoeia/interjection/auxiliary word/modal particle table, idiom table, and punctuation mark table. Each table records every dictionary code and the internal code string corresponding to it. The structure of the database is illustrated below (ISN, internal statement number, denotes the internal code):
Table 0: non-dictionary words
No.       dictionarynumber    ISN
1         A0030001            b0a2c0adb2ae
2         A0040002            b0c2c1d6c6a5bfcb
......    ......              ......
Table 1: common nouns
No.       dictionarynumber    ISN
1         A1020001            b0a2b0d6
2         A1020002            b0a2b5dc
3         A1020003            b0a2b8e7
4         A1020004            b0a2c2e8
5         A1020005            b0a2c3c3
6         A1020006            b0a2c6c5
......    ......              ......
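A lookup against the dictionary code table might be sketched as below. The rows are copied from Table 0 and Table 1; the in-memory dictionary stands in for the database, the function names are hypothetical, and decoding the internal code as GB2312 is an assumption based on the byte values shown.

```python
# A few rows from Table 0 and Table 1 (dictionarynumber -> ISN).
DICT_CODE_TABLE = {
    "A0030001": "b0a2c0adb2ae",
    "A0040002": "b0c2c1d6c6a5bfcb",
    "A1020001": "b0a2b0d6",
    "A1020002": "b0a2b5dc",
    "A1020003": "b0a2b8e7",
}


def internal_code(dict_code: str) -> bytes:
    """Look up the internal code string for a dictionary code."""
    return bytes.fromhex(DICT_CODE_TABLE[dict_code.upper()])


def to_text(dict_code: str) -> str:
    """Decode the internal code for display (assumes GB2312)."""
    return internal_code(dict_code).decode("gb2312")


def length_matches(dict_code: str) -> bool:
    """Check the character count in byte 2 against the internal code,
    which uses two bytes per Chinese character."""
    declared = int(dict_code[2:4], 16) & 0x0F
    return declared == len(internal_code(dict_code)) // 2


print(internal_code("A1020002").hex())                   # b0a2b5dc
print(all(length_matches(c) for c in DICT_CODE_TABLE))   # True
print(to_text("A0030001"))
```

The length check is a cheap consistency test that a table loader could run on every row.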
The Chinese keyboard input method proposed by the invention uses pinyin as the main input means and the word as the basic input unit. A lookup table of pinyin codes and dictionary codes is established. According to the pinyin of the word being entered, the matching dictionary code is looked up in the lookup table, and the corresponding internal code is then found through the dictionary code table for display and input. Two files are finally formed: one holds the input content in dictionary code format, and the other is a file in ordinary internal code format. Display and printing still use the corresponding internal code from the dictionary code table. The lookup table of pinyin codes and dictionary codes pairs the pinyin of each word with its dictionary code, in the order of the Common Word List. Although homophonous characters are widespread in modern Chinese, homophonous words are far fewer, so in ordinary language use, word-level input achieves an essentially one-to-one mapping. Duplicate codes can still occur in some cases, where one pinyin code corresponds to several dictionary codes; the system then finds all words that match the pinyin code and lets the user select among them to confirm the word to be entered. The structure of the pinyin code and dictionary code lookup table is as follows:
No.       pinyin    dictionarynumber
1         a         A9010001
2         aha       A9020002
3         aba       A1020001
4         adi       A1020002
......    ......    ......
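The pinyin lookup, including the case where one pinyin code matches several words, can be sketched as follows. The rows come from the table above; `build_index`, `candidates`, and `select` are hypothetical names, and the `choice` argument stands in for the user's selection on screen.

```python
from collections import defaultdict

# Rows from the pinyin code and dictionary code lookup table.
PINYIN_ROWS = [
    ("a", "A9010001"),
    ("aha", "A9020002"),
    ("aba", "A1020001"),
    ("adi", "A1020002"),
]


def build_index(rows):
    """Group dictionary codes by pinyin so that duplicate codes are kept."""
    index = defaultdict(list)
    for pinyin, dict_code in rows:
        index[pinyin].append(dict_code)
    return dict(index)


def candidates(index, pinyin):
    """Return every dictionary code whose pinyin matches the typed keys."""
    return index.get(pinyin.lower(), [])


def select(index, pinyin, choice=0):
    """Return the candidate the user picked (the first one by default)."""
    found = candidates(index, pinyin)
    if not found:
        raise KeyError(f"no word for pinyin {pinyin!r}")
    return found[choice]


index = build_index(PINYIN_ROWS)
print(candidates(index, "adi"))  # ['A1020002']
print(select(index, "aha"))      # A9020002
```

Keeping a list per pinyin key means a duplicate code simply yields a longer candidate list, which is what the user would choose from.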
For example, to enter the word "春天" ("spring"), the user types the eight letters of the pinyin key code "chuntian" on the keyboard. The translation program looks up the dictionary code corresponding to "chuntian" in the lookup table, finds the internal code of the word through the dictionary code, and displays it on the screen for the user to confirm. After confirmation, the dictionary code and the internal code of the word are both saved, each in its own file. Once every word has been entered in this way, a complete text is formed whose internal format is expressed in dictionary codes. Information processing such as text classification and automatic summarization can then be carried out directly on this dictionary code platform: the processing reads four bytes at a time, and each group of four bytes is a word that can be handled directly, which bypasses the problem of word segmentation.
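Reading a dictionary code file word by word can be sketched as a small tokenizer. This is an illustration under stated assumptions rather than the format's definitive parser: it assumes ASCII bytes (below 80H) occupy one byte each and that any byte whose high four bits are A begins a four-byte code, as the document format describes; `tokenize` and `pos_histogram` are hypothetical names.

```python
from collections import Counter
from typing import Iterator, Tuple


def tokenize(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield ("ascii", 1 byte) or ("code", 4 bytes) tokens in file order."""
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:            # ASCII control code, letter, or symbol
            yield "ascii", data[i:i + 1]
            i += 1
        elif first >> 4 == 0xA:     # high nibble A starts a four-byte code
            if i + 4 > len(data):
                raise ValueError(f"truncated code at offset {i}")
            yield "code", data[i:i + 4]
            i += 4
        else:
            raise ValueError(f"unexpected byte {first:#04x} at offset {i}")


def pos_histogram(data: bytes) -> Counter:
    """Count four-byte codes by the part-of-speech nibble of byte 1."""
    return Counter(tok[0] & 0x0F
                   for kind, tok in tokenize(data) if kind == "code")


# A noun code and the full-stop code from the description, then a newline.
sample = bytes.fromhex("A1020002" "AA00A1A3") + b"\n"
print([(kind, tok.hex().upper()) for kind, tok in tokenize(sample)])
print(pos_histogram(sample))
```

Since the part of speech sits in the first byte of every code, a pass such as `pos_histogram` needs neither a segmenter nor a tagger.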

Claims (1)

  1. A method for computer processing and keyboard input of Chinese words, characterized in that the method takes pinyin as the input mode and the word as the basic input unit, and proceeds by successive lookup from pinyin code to dictionary code and from dictionary code to internal code, wherein a word means the smallest meaningful unit in a Chinese sentence that can be used independently, including words, word groups, phrases, and idioms, the method comprising the following steps in order:
    Step 1: The Chinese vocabulary in the "Contemporary Chinese Common Word List for Information Processing" is classified by most frequent part of speech into nouns, verbs, verb phrases, adjectives, adverbs, pronouns, numerals, measure words, onomatopoeia, interjections, prepositions, conjunctions, auxiliary words, and modal particles, together with idioms, to form dictionary words, each dictionary word consisting of 1 to 7 Chinese characters;
    Step 2: Each dictionary word from step 1 is encoded as follows to form a dictionary code, each dictionary code consisting of 4 bytes, written in hexadecimal as:
    [AxH xxH xxH xxH], where H marks hexadecimal notation and all dictionary codes below are written in hexadecimal; the high four bits of the first byte must be AH, binary 1010; the low four bits x of the first byte range from 1H to FH and indicate the part of speech of the word, wherein 1 to 9 denote, in order, nouns, adjectives, adverbs, pronouns, numerals, measure words, idioms, prepositions and conjunctions, and minor words comprising onomatopoeia, interjections, auxiliary words, and modal particles, A denotes punctuation marks, and B to F denote the verbs and verb phrases in the "Contemporary Chinese Common Word List for Information Processing", each class being arranged in the order of the list; the high four bits of the second byte are reserved, and the low four bits of the second byte give the number of Chinese characters in the word; the third and fourth bytes form a sequence code ranging from 1H to FFFFH, that is, 65535, used to order and number the vocabulary by pinyin alphabetical order; this encoding can hold at least 14 × 65535 entries;
    Step 3: A pinyin code to dictionary code lookup table is built from the results of steps 1 and 2 and stored in the computer;
    Step 4: A lookup table between the dictionary code of each word in the "Contemporary Chinese Common Word List for Information Processing" and the internal code of that word in the computer is established, called the dictionary code table; the dictionary code table contains 12 tables distinguished by part of speech, namely the non-dictionary word table, common noun table, adjective table, verb table, adverb table, pronoun table, numeral table, measure word table, preposition/conjunction table, onomatopoeia/interjection/auxiliary word/modal particle table, idiom table, and punctuation mark table; each table records the dictionary code of every word and the internal code corresponding to that dictionary code; non-dictionary words are proper nouns including personal names, place names, and trade names; non-dictionary words and punctuation marks are likewise encoded with the four-byte method;
    Step 5: The dictionary code table obtained in step 4 is stored in the computer, and during Chinese input a file in dictionary code format and a file in internal code format are formed respectively;
    Step 6: Chinese words are input into the computer in pinyin mode, with the word as the basic unit;
    When the four-byte method encodes a punctuation mark, the first byte is always AAH and the second byte is always 00H; the third and fourth bytes are the internal code of the punctuation mark;
    When the four-byte method encodes a non-dictionary word, the first byte is always A0H; the high four bits of the second byte are reserved, and the low four bits of the second byte give the number of Chinese characters in the non-dictionary word; the third and fourth bytes give the serial number of the non-dictionary word in the non-dictionary word table of the dictionary code table.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005101354752A CN100390711C (en) 2005-12-31 2005-12-31 Computer processing and keyboard inputting method for Chinese word

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005101354752A CN100390711C (en) 2005-12-31 2005-12-31 Computer processing and keyboard inputting method for Chinese word

Publications (2)

Publication Number Publication Date
CN1790238A CN1790238A (en) 2006-06-21
CN100390711C true CN100390711C (en) 2008-05-28

Family

ID=36788141

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005101354752A Expired - Fee Related CN100390711C (en) 2005-12-31 2005-12-31 Computer processing and keyboard inputting method for Chinese word

Country Status (1)

Country Link
CN (1) CN100390711C (en)

Also Published As

Publication number Publication date
CN1790238A (en) 2006-06-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080528

Termination date: 20100201