CN105824793A - Processing system, method and device for transforming Chinese characters into numbers and Latin letters - Google Patents


Info

Publication number
CN105824793A
Authority
CN
China
Legal status
Pending
Application number
CN201610351991.7A
Other languages
Chinese (zh)
Inventor
潘昌仁
Current Assignee
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN201610351991.7A
Publication of CN105824793A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Abstract

The invention provides a processing system, method and device for converting Chinese characters into numbers and Latin letters. The system comprises a preprocessing module for dividing the strokes of Chinese characters into multiple classes of character-formation elements, a storage module, an element decomposition module for splitting a target Chinese character, a code classifying module for digitizing the split target character, and a Latin letter forming module for converting the split character into Latin letters. With the system, method and device, Chinese characters are automatically converted into Arabic numerals and Latin letters, and the numerals or Latin letters are in turn converted into binary digits. This helps machines recognize, read and display Chinese characters and opens a new way for programming in Chinese characters; it also offers a new approach to the digital search and teaching of Chinese characters, making it easier for people worldwide to look up and learn Chinese characters by machine.

Description

Processing system, method and device for converting Chinese characters into numbers and Latin letters
Technical field
The invention relates mainly to the field of computer processing of Chinese characters, and in particular to a processing system, method and device for converting Chinese characters into numbers and Latin letters.
Background art
In the 21st century, humanity has entered the big-digital age and is advancing from it toward the ultimate goal of automation. Digitization is the prelude to automation, and the digitization of spoken and written language is the bottleneck technology of automation.
The widespread application of computers and IT networks has brought human society into a stage of rapid development: science and technology flourish, information flows, data is massive, and information highways and network delivery bring people ever closer and make life ever more convenient. However, throughout the development of information technology, machine instructions, assembly instructions, programming languages, development tools, operating systems and application programs have all been developed in English on the basis of ASCII (American Standard Code for Information Interchange), published by the American National Standards Institute (ANSI) in 1967. ASCII is a computer encoding system based on the Latin alphabet, used mainly to display modern English and other Western European languages and to program computers in a standard form. It is the most widely used single-byte encoding system and is equivalent to the international standard ISO/IEC 646. Because ASCII is designed mainly for displaying English and other Western European languages on machines, it excludes Chinese characters and the world's other non-Latin scripts from the computer encoding system.
On the other hand, China's rise in the new century places new demands on the international spread of Chinese characters. In 2004 the first Confucius Institute, based on Chinese-character teaching, was founded in Seoul, South Korea; by 7 December 2014, 475 Confucius Institutes and 851 Confucius Classrooms had been set up in 126 countries (regions) worldwide. However, because the tens of thousands of Chinese characters are assembled from shapes and cannot be spelled out from sounds, teaching Chinese characters in Chinese as a foreign language is very difficult. The Danish fairy-tale writer Hans Christian Andersen once wrote in one of his collections that "the book from heaven is written in Chinese characters, the most difficult in the world". Most people worldwide have long regarded Chinese characters as the "Qomolangma" of the world of writing, extremely hard to surmount.
As early as 25 April 2001, the main lines of the action plan of UNESCO's Universal Declaration on Cultural Diversity called for "promoting 'digital literacy', treating new information and communication technologies both as a subject in teaching programmes and as a teaching means that can improve teaching effectiveness, and improving the ability to master these new technologies" (point 9).
Another difficulty of Chinese characters lies in lookup, which has always been an important step in studying and using them. Nearly 2,000 years have passed since Xu Shen of the Eastern Han Dynasty summarized the "six scripts" principles of character formation and invented the radical-based index method. In Shuowen Jiezi, Xu Shen first classified characters by form into 540 radicals. These were later simplified, but the Table of Chinese Character Radicals announced by the Ministry of Education on 1 May 2009 still specifies 201 main radicals and 99 subsidiary radicals, 300 in total. Compared with alphabetic scripts, indexing Chinese characters therefore remains very inconvenient.
In short, the times call for new technologies that digitize and automate Chinese characters. Only by developing the digitization of Chinese characters and inventing a new technology for converting them into the Latin alphabet can the Chinese nation achieve automation, and can the teaching of Chinese as a foreign language, including character lookup, reach UNESCO's goal of treating new information and communication technologies "as a subject in teaching programmes and as a teaching means that can improve teaching effectiveness". Such a technology would also let machines display Chinese characters alongside the "modern English and other Western European languages" displayed through ASCII, and would allow direct programming in Chinese characters, so that the Chinese nation can grasp the overall trend of the digital age and of automation, gain the initiative, and secure an unassailable position.
Summary of the invention
The technical problem to be solved by the invention is to provide a processing system, method and device for converting Chinese characters into numbers and Latin letters, so that Chinese characters are automatically converted into Arabic numerals and Latin letters, and the numerals or Latin letters into binary digits, thereby helping machines recognize and display Chinese characters.
The technical solution of the invention is a system for converting Chinese characters into numbers and Latin letters, comprising a preprocessing module, a storage module, an element decomposition module, a code classifying module and a Latin letter forming module.
The preprocessing module divides Chinese-character strokes into stroke forms of one, two or more strokes according to a preset Chinese-character splitting rule, thereby obtaining character-formation elements of multiple classes; it also encodes each element with a set digit and then represents each digit with a set Latin letter. A character-formation element is the most basic stroke form that makes up a square (block) character; it can be a symbol formed by one stroke, two strokes or several strokes.
The storage module stores each element and the correspondence between each element, its digit and its Latin letter.
The element decomposition module splits an entered target character according to the splitting rule to obtain its stroke forms, and matches each stroke form to the element of the corresponding class, thereby obtaining the elements of the target character.
The code classifying module matches each element of the target character to its digit, thereby obtaining the digit form of the target character.
The Latin letter forming module matches each digit in the digit form of the target character to its Latin letter, thereby obtaining the Latin form of the target character.
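The five modules can be read as a simple pipeline. The following is a minimal Python sketch under stated assumptions, not the applicant's implementation: the description does not specify how strokes are analysed, so the element decomposition step is stubbed with a hypothetical lookup table (`DECOMPOSITION_TABLE`) holding element sequences copied from the splitting examples in this description. The class, function and table names are all hypothetical.

```python
from dataclasses import dataclass

# Element names in digit order: index 0 is the class-0 element, 1 horizontal, ...
ELEMENTS = ("class 0", "horizontal", "left-falling", "right-falling", "vertical",
            "angle", "bend", "class 7", "class 8", "class 9")

# Hypothetical stand-in for real stroke analysis: element sequences copied from
# the splitting examples in this description (人, 中, 由, 天).
DECOMPOSITION_TABLE = {
    "人": ["left-falling", "right-falling"],
    "中": ["class 0", "vertical"],
    "由": ["class 0", "class 8"],
    "天": ["horizontal", "class 8", "right-falling"],
}


@dataclass
class Storage:
    """Storage module: element -> digit and digit -> Latin letter tables."""
    name_to_digit: dict
    digit_to_latin: dict


def preprocess() -> Storage:
    """Preprocessing module: code each element as a digit, each digit as a letter."""
    name_to_digit = {name: str(d) for d, name in enumerate(ELEMENTS)}
    digit_to_latin = dict(zip("1234567890", "ABCDEFGHIO"))  # 1234567890 = ABCDEFGHIO
    return Storage(name_to_digit, digit_to_latin)


class HanziConverter:
    def __init__(self, storage: Storage):
        self.storage = storage

    def decompose(self, char: str) -> list:
        """Element decomposition module (stubbed with a lookup table)."""
        try:
            return DECOMPOSITION_TABLE[char]
        except KeyError:
            raise KeyError(f"no decomposition known for {char!r}") from None

    def classify_codes(self, elements: list) -> str:
        """Code classifying module: element names -> digit form."""
        return "".join(self.storage.name_to_digit[e] for e in elements)

    def form_latin(self, digits: str) -> str:
        """Latin letter forming module: digit form -> Latin form."""
        return "".join(self.storage.digit_to_latin[d] for d in digits)

    def convert(self, char: str) -> tuple:
        digits = self.classify_codes(self.decompose(char))
        return digits, self.form_latin(digits)


converter = HanziConverter(preprocess())
print(converter.convert("天"))  # ('183', 'AHC')
```

Keeping the correspondence tables in a separate storage object mirrors the split between the preprocessing and storage modules; a real system would replace the lookup table with an actual stroke-splitting step.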
The beneficial effects of the invention are: 1) Chinese characters are automatically converted into Arabic numerals and Latin letters, which helps machines recognize and display them and opens a new way for programming in Chinese characters; 2) it provides new ideas and methods for the digital search and teaching of Chinese characters, making it easier for people worldwide to look up and learn Chinese characters by machine.
On the basis of the above technical solution, the invention can be further improved as follows.
Further, in the preprocessing module, Chinese-character strokes are divided according to the splitting rule into ten elements: the class-0 element, horizontal element, left-falling element, right-falling element, vertical element, angle element, bend element, class-7 element, class-8 element and class-9 element.
The benefit of this further solution is that it breaks with the traditional classification into dozens of stroke types: those dozens of strokes are abstracted into ten elements, which makes it quick to distinguish and disassemble character structures and lets Chinese characters be represented equivalently with few digits and Latin letters.
Further, in the preprocessing module, the ten elements are represented by the digits 0 to 9: class-0 element = 0, horizontal = 1, left-falling = 2, right-falling = 3, vertical = 4, angle = 5, bend = 6, class-7 = 7, class-8 = 8 and class-9 = 9; the digits are in turn represented by Latin letters as 1234567890 = ABCDEFGHIO.
The benefit of this further solution is that it establishes equivalent-code chains between Chinese characters and numerals, between Chinese characters and Latin letters, and among all three.
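The equivalent-code chain amounts to two fixed one-to-one tables: element to digit, and digit to letter (1 to 9 become A to I, and 0 becomes O). A minimal Python sketch of the chain in both directions, assuming elements are identified by the English names used in this description; the function names are hypothetical.

```python
# Ten elements in digit order 0-9 (names as used in this description).
ELEMENTS = ("class 0", "horizontal", "left-falling", "right-falling", "vertical",
            "angle", "bend", "class 7", "class 8", "class 9")
DIGITS = "0123456789"
LATIN = "OABCDEFGHI"  # aligned with DIGITS: 0->O, 1->A, ..., 9->I

TO_LATIN = str.maketrans(DIGITS, LATIN)
TO_DIGITS = str.maketrans(LATIN, DIGITS)


def _check(text: str, alphabet: str) -> str:
    """Reject symbols outside the expected alphabet (translate would pass them through)."""
    bad = set(text) - set(alphabet)
    if bad:
        raise ValueError(f"unexpected symbols: {sorted(bad)}")
    return text


def elements_to_digits(names: list) -> str:
    return "".join(str(ELEMENTS.index(n)) for n in names)


def digits_to_elements(digits: str) -> list:
    return [ELEMENTS[int(d)] for d in _check(digits, DIGITS)]


def digits_to_latin(digits: str) -> str:
    return _check(digits, DIGITS).translate(TO_LATIN)


def latin_to_digits(latin: str) -> str:
    return _check(latin, LATIN).translate(TO_DIGITS)


print(digits_to_latin("1234567890"))  # ABCDEFGHIO
print(latin_to_digits("HG"))          # 87
```

Because every table is a bijection, any form can be recovered from any other, which is what makes the three codes interchangeable.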
Further, the ten elements are defined as follows:
The class-0 element is a multi-stroke form whose outline is closed;
The horizontal element is a one-stroke form written as a bar from left to right;
The left-falling element is a one-stroke form written from the top down toward the lower left;
The right-falling element is a one-stroke form written from the top down toward the lower right, or angled from left to right;
The vertical element is a one-stroke form written as a vertical line from top to bottom;
The angle element is a stroke form with one turning angle, formed by one or two strokes;
The bend element is a stroke form with two turning angles, formed by one, two or more strokes;
The class-7 element is a stroke form in which one stroke crosses an angle element or a bend element;
The class-8 element is a stroke form in which two strokes cross each other;
The class-9 element is a stroke form in which several strokes form a "9" shape (a forward or reversed "9").
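The definitions above are geometric. One way to read them as a rule-based classifier is sketched below in Python, assuming a hypothetical upstream stroke analyser has already produced a few features per stroke form (`strokes`, `turns`, `closed`, `crossing`, `nine_shape`, `direction`). These features, their names and the order in which the rules are tested are assumptions; the description does not define them.

```python
from dataclasses import dataclass
from typing import Optional

# Digits for the four single straight strokes, keyed by a hypothetical direction code.
SINGLE_STROKE = {"h": 1, "ld": 2, "rd": 3, "v": 4}


@dataclass
class StrokeForm:
    """Hypothetical features of one stroke form (not defined in the source)."""
    strokes: int                      # number of strokes in the form
    turns: int = 0                    # number of turning angles ("knuckles")
    closed: bool = False              # the outline is closed
    crossing: bool = False            # strokes cross each other
    nine_shape: bool = False          # forward or reversed "9" outline
    direction: Optional[str] = None   # single straight stroke: "h", "ld", "rd", "v"


def classify(form: StrokeForm) -> int:
    """Return the element digit 0-9; the order of the tests is an assumption."""
    if form.nine_shape and form.strokes > 1:
        return 9                      # class 9: several strokes forming a "9"
    if form.closed and form.strokes > 1:
        return 0                      # class 0: closed multi-stroke outline
    if form.crossing:
        # class 7: a stroke crossing an angle/bend form; class 8: two plain strokes crossing
        return 7 if form.turns > 0 else 8
    if form.turns == 2:
        return 6                      # bend: two turning angles
    if form.turns == 1:
        return 5                      # angle: one turning angle
    if form.strokes == 1 and form.direction in SINGLE_STROKE:
        return SINGLE_STROKE[form.direction]
    raise ValueError(f"no element definition matches {form}")


print(classify(StrokeForm(strokes=2, crossing=True)))           # 8, e.g. 十
print(classify(StrokeForm(strokes=2, turns=1, crossing=True)))  # 7, e.g. 七
```

Testing the class-9 and class-0 shapes first is a design choice: several "9"-shaped forms also enclose space, so checking closure first would misfile them.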
The benefit of this further solution is that Chinese characters are coded through the ten elements and thus automatically converted into Arabic numerals and Latin letters, which helps machines distinguish Chinese characters quickly and establishes equivalent-code chains between Chinese characters and numerals, between Chinese characters and Latin letters, and among all three.
Another technical solution of the invention to the above technical problem is a processing method for converting Chinese characters into numbers and Latin letters, comprising: a step of dividing Chinese-character strokes into stroke forms of one, two or more strokes according to a preset splitting rule, thereby obtaining elements of multiple classes;
a step of encoding each element with a set digit and then representing each digit with a set Latin letter;
a step of encoding the multiple elements with the set digits and representing the digits with the set Latin letters;
a step of storing each element and the correspondence between each element, its digit and its Latin letter;
a step of splitting an entered target character according to the splitting rule to obtain its stroke forms, and matching each stroke form to the element of the corresponding class, thereby obtaining the elements of the target character;
a step of matching each element of the target character to its digit, thereby obtaining the digit form of the target character; and
a step of matching each digit of the digit form to its Latin letter, thereby obtaining the Latin form of the target character.
Further, Chinese characters are divided according to the splitting rule into ten elements: the class-0 element, horizontal element, left-falling element, right-falling element, vertical element, angle element, bend element, class-7 element, class-8 element and class-9 element.
Further, the ten elements are represented in order by the digits 0 to 9: class-0 element = 0, horizontal = 1, left-falling = 2, right-falling = 3, vertical = 4, angle = 5, bend = 6, class-7 = 7, class-8 = 8 and class-9 = 9; the digits are represented by Latin letters as 1234567890 = ABCDEFGHIO.
Further, the ten elements are defined as follows:
The class-0 element is a multi-stroke form whose outline is closed;
The horizontal element is a one-stroke form written as a bar from left to right;
The left-falling element is a one-stroke form written from the top down toward the lower left;
The right-falling element is a one-stroke form written from the top down toward the lower right, or angled from left to right;
The vertical element is a one-stroke form written as a vertical line from top to bottom;
The angle element is a stroke form with one turning angle, formed by one or two strokes;
The bend element is a stroke form with two turning angles, formed by one, two or more strokes;
The class-7 element is a stroke form in which one stroke crosses an angle element or a bend element;
The class-8 element is a stroke form in which two strokes cross each other;
The class-9 element is a stroke form in which several strokes form a "9" shape (a forward or reversed "9").
Another technical solution of the invention to the above technical problem is a processing device for converting Chinese characters into numbers and Latin letters. It comprises the processing system described above, together with a collecting device, a database and an output device. The collecting device receives an entered target character and transmits it to the processing system; the processing system converts the target character into its digit form and Latin form and calls the output device to display them; and the database stores each target character together with its digit form and Latin form.
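A minimal Python sketch of the device's data flow, assuming the collecting device is a text input, the output device is a printed line, and the database is an in-memory SQLite table. The processing system is stubbed with a hypothetical table of digit forms taken from the worked examples in this description; the table schema and function names are likewise hypothetical.

```python
import sqlite3

TO_LATIN = str.maketrans("0123456789", "OABCDEFGHI")  # 1234567890 = ABCDEFGHIO

# Hypothetical stand-in for the processing system: digit forms of a few
# characters taken from the worked examples in this description.
DIGIT_FORMS = {"我": "277773", "你": "2425523", "他": "2476"}


def collect(text: str):
    """Collecting device: yield each entered character, skipping whitespace."""
    for ch in text:
        if not ch.isspace():
            yield ch


def process(ch: str) -> tuple:
    """Processing system (stubbed): character -> (digit form, Latin form)."""
    digits = DIGIT_FORMS[ch]
    return digits, digits.translate(TO_LATIN)


def run_device(text: str, conn: sqlite3.Connection) -> str:
    conn.execute("CREATE TABLE IF NOT EXISTS hanzi "
                 "(glyph TEXT PRIMARY KEY, digits TEXT, latin TEXT)")
    lines = []
    for ch in collect(text):
        digits, latin = process(ch)
        # Database: keep each target character with its two forms.
        conn.execute("INSERT OR REPLACE INTO hanzi VALUES (?, ?, ?)", (ch, digits, latin))
        lines.append(f"{ch} {digits} {latin}")  # output device: one line per character
    conn.commit()
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
print(run_device("我 你 他", conn))
```

Using the character as the primary key means re-entering a character updates its stored forms instead of duplicating the row.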
Further, the device also comprises a binary translator, which converts the digit form and the Latin form obtained for the target character into binary numerals respectively, thereby obtaining two binary representations of the target character.
The benefit of this further solution is that Chinese characters are converted into binary digits via numerals or Latin letters, making it easier for machines to process them.
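The description does not say how the binary translator encodes its two inputs. The Python sketch below assumes one plausible choice for each: every decimal digit of the digit form as a 4-bit group (binary-coded decimal), and every letter of the Latin form as its 8-bit ASCII code. Both encodings and the function names are assumptions.

```python
DIGITS = "0123456789"


def digits_to_bcd(digits: str) -> str:
    """Binary form 1: each decimal digit as a 4-bit group (BCD), so a leading 0 survives."""
    if not digits or set(digits) - set(DIGITS):
        raise ValueError("digit form must be a non-empty string of 0-9")
    return " ".join(format(int(d), "04b") for d in digits)


def latin_to_ascii_bits(latin: str) -> str:
    """Binary form 2: each Latin letter as its 8-bit ASCII code."""
    return " ".join(format(b, "08b") for b in latin.encode("ascii"))


# 由 has digit form 08 and Latin form OH in this description.
print(digits_to_bcd("08"))        # 0000 1000
print(latin_to_ascii_bits("OH"))  # 01001111 01001000
```

Encoding digit by digit matters because several digit forms begin with 0: reading "08" as the integer 8 would give binary 1000 and lose the leading class-0 element.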
Description of the drawings
Fig. 1 is a module diagram of the system of the invention;
Fig. 2 is a structural diagram of the device of the invention;
Fig. 3 is a schematic diagram of an embodiment of the invention;
Fig. 4 is a schematic diagram of the elements in an embodiment of the invention;
Fig. 5 is an operating diagram of the Chinese-character recognition stamper of the invention.
Detailed description of the embodiments
The principles and features of the invention are described below with reference to the drawings. The examples serve only to explain the invention and are not intended to limit its scope.
The invention proposes a processing system, method and device for converting Chinese characters into numbers and Latin letters, by which a computer automatically converts the square form of a Chinese character into its decomposed form, its digit form and its Latin form. Machines can then recognize and read Chinese characters automatically, which enables digital literacy, digital search and direct programming in Chinese characters, and in turn upgrades ASCII to CHINA-ASCII (C-ASCII for short). Once upgraded to C-ASCII, the code can display not only modern English and other Western European languages but also Chinese characters, greatly strengthening the competitiveness of Chinese writing in the digital age.
As shown in Fig. 1, a processing system for converting Chinese characters into numbers and Latin letters comprises a preprocessing module, a storage module, an element decomposition module, a code classifying module and a Latin letter forming module.
The preprocessing module divides Chinese-character strokes into stroke forms of one, two or more strokes according to a preset Chinese-character splitting rule, thereby obtaining character-formation elements of multiple classes; it also encodes each element with a set digit and then represents each digit with a set Latin letter. A character-formation element is the most basic stroke form that makes up a square (block) character; it can be a symbol formed by one stroke, two strokes or several strokes.
The storage module stores each element and the correspondence between each element, its digit and its Latin letter.
The element decomposition module splits an entered target character according to the splitting rule to obtain its stroke forms, and matches each stroke form to the element of the corresponding class, thereby obtaining the elements of the target character.
The code classifying module matches each element of the target character to its digit, thereby obtaining the digit form of the target character.
The Latin letter forming module matches each digit in the digit form of the target character to its Latin letter, thereby obtaining the Latin form of the target character.
In the preprocessing module, Chinese-character strokes are divided according to the splitting rule into ten elements (the "ten Chinese-character elements"): the class-0, horizontal, left-falling, right-falling, vertical, angle, bend, class-7, class-8 and class-9 elements.
As shown in Fig. 3, in the preprocessing module the ten elements are represented by the digits 0 to 9: class-0 = 0, horizontal = 1, left-falling = 2, right-falling = 3, vertical = 4, angle = 5, bend = 6, class-7 = 7, class-8 = 8 and class-9 = 9; the digits are represented by Latin letters as 1234567890 = ABCDEFGHIO.
Accordingly, the storage module stores the correspondence among the ten elements, the digits and the Latin letters: horizontal (一), left-falling (ノ), right-falling, vertical (丨), angle, bend, class-7 (七), class-8, class-9, class-0 (口) = 1234567890 = ABCDEFGHIO.
When a target character is split, its elements are taken in stroke-writing order following the rules left before right, top before bottom, and outside before inside:
Left before right, e.g. 人 ('person') (2,3), 中 ('middle') (0,4), 大 ('big') (8,3), 由 ('by') (0,8); 小 ('small') is split as (2,5,3), not (5,2,3), and the character glossed as 'these' as (26318), not (31836);
Top before bottom, e.g. 天 ('sky') (1,8,3);
Outside before inside, e.g. 囚 ('prisoner') (0,2,3), 国 ('country') (01813).
As shown in Figs. 3 and 4, the ten elements are as follows:
The class-0 element is a multi-stroke form whose outline is closed; normal form 口 and its variants;
The horizontal element is a one-stroke form written as a bar from left to right; normal form 一;
The left-falling element is a one-stroke form written from the top down toward the lower left; normal form 丿 and its variants;
The right-falling element is a one-stroke form written from the top down toward the lower right, or angled from left to right; its variants include the dot 丶;
The vertical element is a one-stroke form written as a vertical line from top to bottom; normal form 丨;
The angle element is a stroke form with one turning angle, formed by one or two strokes; its normal form and variants cover the various stroke forms turning at different angles in the plane;
The bend element is a stroke form with two turning angles, formed by one, two or more strokes; its variants include 冂, 冖 and 乙, i.e. the various stroke forms containing two connected angles (a "bend");
The class-7 element is a stroke form in which one stroke crosses an angle element or a bend element; normal form 七 and its variants, including 力 and 又;
The class-8 element is a stroke form in which two strokes cross each other; its variants include メ and ナ;
The class-9 element is a stroke form in which several strokes form a shape similar to "9" (a forward or reversed "9"); normal form 卩 and its variants, including 尸 and 阝.
A variant, as used above, is a symbol that follows the same principle as the standard form of its element but differs in shape because of a change in stroke form, angle or length.
The ten elements above are the "ten Chinese-character elements". Because they abstract and generalize all the shape-assembling components of Chinese characters into just ten elements, they can also be called the "China-10 code" for short. They correspond exactly to the ten Arabic numerals, so each serves as the base of the other, forming "character-base numerals / numeral-base characters": a "character-base numeral" is the Arabic digit array corresponding to the elements of a specific character, and a "numeral-base character" is the specific character, or its decomposed form, to which a specific digit array uniquely corresponds.
All 8,105 level-one to level-three characters in the Table of General Standard Chinese Characters issued by the State in 2013 are composed of these ten elements.
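The "numeral-base character" relation is a reverse lookup from digit arrays to characters, and the description asserts that it is unique. A table-driven implementation would need to check that claim across its whole character set. The Python sketch below builds the reverse index over the ten digit forms given in the worked examples of this description and reports any digit array shared by more than one character; it does not establish uniqueness for the full 8,105-character table. Function names are hypothetical.

```python
from collections import defaultdict

# Digit forms taken from the worked examples in this description.
DIGIT_FORMS = {"富": "361008", "强": "6600433", "民": "97", "主": "3181", "文": "318",
               "明": "01611", "和": "28230", "谐": "365162201", "自": "2011", "由": "08"}


def build_reverse_index(forms: dict) -> dict:
    """Map each digit array back to every character that produces it."""
    index = defaultdict(list)
    for char, digits in forms.items():
        index[digits].append(char)
    return dict(index)


def collisions(index: dict) -> dict:
    """Digit arrays shared by more than one character (these would break uniqueness)."""
    return {d: chars for d, chars in index.items() if len(chars) > 1}


index = build_reverse_index(DIGIT_FORMS)
print(index["97"])        # ['民']
print(collisions(index))  # {}
```

Keeping a list per digit array, rather than a single character, is what lets the collision check detect ambiguous codes instead of silently overwriting them.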
To show fully how the ten elements and the numerals serve as each other's base, take the ten characters 富, 强, 民, 主, 文, 明, 和, 谐, 自 and 由 as examples:
Example 1: 富 ('rich') = 361008
right-falling (丶) + bend (冖) + horizontal (一) + class 0 + class 0 + class 8
Example 2: 强 ('strong') = 6600433
bend + bend + class 0 + class 0 + vertical (丨) + right-falling + right-falling (丶)
Example 3: 民 ('people') = 97
class 9 + class 7
Example 4: 主 ('master') = 3181
right-falling (丶) + horizontal (一) + class 8 (十) + horizontal (一)
Example 5: 文 ('writing') = 318
right-falling (丶) + horizontal (一) + class 8
Example 6: 明 ('bright') = 01611
class 0 + horizontal (一) + bend + horizontal (一) + horizontal (一)
Example 7: 和 ('harmony') = 28230
left-falling + class 8 + left-falling (丿) + right-falling (丶) + class 0
Example 8: 谐 ('harmonious') = 365162201
right-falling (丶) + bend + angle + horizontal (一) + bend + left-falling + left-falling + class 0 (口) + horizontal (一)
Example 9: 自 ('self') = 2011
left-falling + class 0 + horizontal (一) + horizontal (一)
Example 10: 由 ('by') = 08
class 0 (口) + class 8
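Code classifying, as in these examples, reads an element sequence left to right and writes down one digit per element. A minimal Python sketch, assuming the element sequence is written as in the examples ("name (glyph) + name + ..."); the parsing convention and the function name are assumptions made for illustration.

```python
ELEMENT_DIGITS = {
    "class 0": "0", "horizontal": "1", "left-falling": "2", "right-falling": "3",
    "vertical": "4", "angle": "5", "bend": "6", "class 7": "7",
    "class 8": "8", "class 9": "9",
}


def code_form(decomposition: str) -> str:
    """Turn an element list written as 'name (glyph) + name + ...' into a digit form."""
    # Drop any parenthesised glyph and surrounding spaces, keep the element name.
    names = [part.split("(")[0].strip() for part in decomposition.split("+")]
    return "".join(ELEMENT_DIGITS[name] for name in names)


print(code_form("right-falling (丶) + bend (冖) + horizontal (一) + class 0 + class 0 + class 8"))
```

Run on Example 1, this yields 361008, the digit form given for 富. An unknown element name raises `KeyError`, which flags a decomposition that uses a form outside the ten elements.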
The square form, decomposed form, digit form and Latin form of a Chinese character are equivalent codes that can be converted into one another. The Latin form is called "Chinese Latin" for short: a form in which Chinese characters are integrated with Latin-script writing such as English. For example: 好 ('good') = GHEG (7857), 朋 ('friend') = FAAFAA (611611), 友 ('friend') = HG (87).
A single Chinese character therefore takes four forms: the square form, the decomposed form, the digit form and the Latin form. The square form is the basic form of the character, i.e. the form in everyday use and application;
The decomposed form (解元式) arranges an individual character, taken from its square form, as a sequence of its elements (or element variants); this arrangement is also the linear form of the ten elements;
The digit form converts the arrangement of the decomposed form into numbers. Because the ten elements and the ten Arabic numerals are equivalent codes, each can serve as the base of the other. Decomposing the square form of a character into an arrangement of the ten elements is called "decomposition" (解元); converting that arrangement into an array of Arabic digits is called "code classifying" (归码); and the two processes together are called the "decomposition and code classifying" of the character;
The Latin form assigns the first nine Latin letters in order, followed by O, as codes equivalent to the ten elements and the Arabic numerals; this is called "Latinization" (化拉). Linking Latinization with code classifying is called the "code classifying and Latinization" of the character.
Through decomposition, code classifying and Latinization, each Chinese character acquires all four forms. Examples at the optical (visual) level:
I
Square-shaped: I
Scholar who won the first place in provincial imperial examinations formula (linear formula):Dian
Digital-code type: 277773
Latin type: BGGGGC
You
Square-shaped: you
Scholar who won the first place in provincial imperial examinations formula (linear formula):Shu?Dian
Digital-code type: 2425523
Latin type: BDBEEBC
He
Square-shaped: he
Scholar who won the first place in provincial imperial examinations formula (linear formula): Pie Shu?
Digital-code type: 2476
Latin type: BDGF
China
Square-shaped: China
Scholar who won the first place in provincial imperial examinations formula (linear formula):Shu seven
Digital-code type: 2478
Latin type: BDGH
Change
Square-shaped: change
Scholar who won the first place in provincial imperial examinations formula (linear formula):Shu seven
Digital-code type: 247
Latin type: CDG
Flower
Square-shaped: flower
Scholar who won the first place in provincial imperial examinations formula (linear formula):Shu seven
Digital-code type: 88247
Latin type: HHBDG
画 (draw)
Square form: 画
Jieyuan form (linear form): - mouth
Digital form: 1086
Latin form: AOHF
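The digital and Latin forms in these examples can be checked against each other mechanically. A sketch, assuming the letter table 1234567890 = ABCDEFGHIO given in this document; `from_latin` and `EXAMPLES` are hypothetical names, and the Latin form of "change" is taken as BDG, which is what the table gives for 247:

```python
# Inverse of the huala table: Latin letter -> digit (A-I -> 1-9, O -> 0).
LETTER_TO_DIGIT = {letter: digit for digit, letter in zip("1234567890", "ABCDEFGHIO")}

# (gloss, digital form, Latin form) copied from the examples above.
EXAMPLES = [
    ("I", "277773", "BGGGGC"),
    ("you", "2425523", "BDBEEBC"),
    ("he", "2476", "BDGF"),
    ("China", "2478", "BDGH"),
    ("change", "247", "BDG"),
    ("flower", "88247", "HHBDG"),
    ("draw", "1086", "AOHF"),
]


def from_latin(latin_form: str) -> str:
    """Recover the digital form from a Latin form such as "AOHF"."""
    try:
        return "".join(LETTER_TO_DIGIT[ch] for ch in latin_form)
    except KeyError as err:
        raise ValueError(f"not a Chinese-Latin letter: {err.args[0]!r}") from None


for gloss, digits, latin in EXAMPLES:
    assert from_latin(latin) == digits, gloss
print("all examples round-trip")
```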
From the examples above it can be seen that:
1) Some Chinese characters may sound the same as other characters yet look partly or entirely different (华 "China", 化 "change", 花 "flower" and 画 "draw" above are all read hua, with different tones). This is what is meant by "same sound, different character", and the difference between such characters ultimately lies in their different structural elements, i.e. their different Pinxing (shape-spelling) letters;
2) The Latin form of a Chinese character, Chinese Latin for short, cannot be sounded out, but it is legible to both people and machines. Chinese characters can therefore be presented not only to people but also to machines, and because Chinese Latin can be displayed directly in ASCII, it creates the conditions for computers to be compatible with, and to display, Chinese characters.
A processing method for converting Chinese characters into numerals and Latin letters comprises the following steps:
dividing Chinese-character strokes, according to a preset character-splitting rule, into stroke forms of one, two or more strokes, thereby obtaining structural elements of several classes;
encoding each structural element with an assigned numeral, and then representing each numeral with an assigned Latin letter;
encoding the multiple structural elements with the assigned numerals, and representing the numerals with the assigned Latin letters;
storing each structural element, and storing the correspondence among the structural elements, numerals and Latin letters;
splitting an entered target character according to the character-splitting rule to obtain its stroke forms, and matching each stroke form to the structural element of the corresponding class, thereby obtaining the structural elements of the target character;
matching the structural elements of the target character to their corresponding numerals, thereby obtaining the digital form of the target character; and
matching the digital form of the target character to the corresponding Latin letters, thereby obtaining the Latin-letter form of the target character.
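The steps above can be sketched as a small pipeline. The splitting rule is described only in prose, so this sketch assumes a hypothetical lookup table `DECOMPOSITIONS` that already holds each character's element classes (copied from the examples); the later steps, matching elements to numerals and numerals to letters, use the stored correspondences. All names here are illustrative, not part of the patent.

```python
from dataclasses import dataclass

# Stored correspondence: element class (= numeral) -> element name and Latin letter.
ELEMENT_NAMES = ["class-0", "horizontal", "left-falling", "right-falling", "vertical",
                 "angle", "curve", "class-7", "class-8", "class-9"]
LATIN_LETTERS = "OABCDEFGHI"  # index = numeral: 0 -> O, 1 -> A, ..., 9 -> I

# Hypothetical stand-in for the splitting step: element classes per character.
DECOMPOSITIONS = {
    "我": [2, 7, 7, 7, 7, 3],  # I
    "他": [2, 4, 7, 6],        # he
    "画": [1, 0, 8, 6],        # draw
}


@dataclass
class Encoded:
    character: str
    elements: list[str]
    digital_form: str
    latin_form: str


def encode(character: str) -> Encoded:
    classes = DECOMPOSITIONS[character]                      # split into elements
    digital = "".join(str(c) for c in classes)               # match elements to numerals
    latin = "".join(LATIN_LETTERS[int(d)] for d in digital)  # match numerals to letters
    return Encoded(character, [ELEMENT_NAMES[c] for c in classes], digital, latin)


print(encode("他"))
```

In a real system the lookup table would be replaced by whatever recognizer applies the splitting rule; the remaining steps would not change.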
According to the character-splitting rule, Chinese characters are divided into ten structural elements: the class-0 element, the horizontal element, the left-falling element, the right-falling element, the vertical element, the angle element, the curve element, the class-7 element, the class-8 element and the class-9 element.
The ten structural elements are represented in order by the numerals 0 to 9: the class-0 element is 0, horizontal 1, left-falling 2, right-falling 3, vertical 4, angle 5, curve 6, class-7 7, class-8 8 and class-9 9. The numerals are in turn written with Latin letters as 1234567890 = ABCDEFGHIO.
Among the ten structural elements:
the class-0 element is a stroke form of several strokes whose outline is closed;
the horizontal element is a stroke form of one stroke drawn from left to right;
the left-falling element is a stroke form of one stroke drawn from the top down to the lower left;
the right-falling element is a stroke form of one stroke drawn from the top down to the lower right, or slanting from left to right;
the vertical element is a stroke form of one vertical stroke drawn from top to bottom;
the angle element is a stroke form of one or two strokes forming one bend;
the curve element is a stroke form of one, two or more strokes forming two bends;
the class-7 element is a stroke form in which an angle or curve element is crossed by one stroke;
the class-8 element is a stroke form of two strokes crossing each other;
the class-9 element is a stroke form of several strokes forming a "9" shape (a forward or reversed "9").
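The ten elements, their numerals and their letters fit naturally into one enumeration. This is a sketch: the member names are English renderings chosen for illustration, and the comments summarize the shape definitions above rather than encode them as machine-checkable rules.

```python
from enum import IntEnum


class Element(IntEnum):
    """The ten structural elements; each member's value is its numeral."""
    CLASS_0 = 0        # several strokes enclosing a closed outline
    HORIZONTAL = 1     # one stroke, left to right
    LEFT_FALLING = 2   # one stroke, top down to the lower left
    RIGHT_FALLING = 3  # one stroke to the lower right, or slanting left to right
    VERTICAL = 4       # one stroke, top to bottom
    ANGLE = 5          # one or two strokes forming one bend
    CURVE = 6          # one or more strokes forming two bends
    CLASS_7 = 7        # an angle or curve element crossed by one stroke
    CLASS_8 = 8        # two strokes crossing each other
    CLASS_9 = 9        # several strokes forming a forward or reversed "9"

    @property
    def letter(self) -> str:
        # 1234567890 = ABCDEFGHIO, indexed here by numeral.
        return "OABCDEFGHI"[self.value]


def elements_of(digital_form: str) -> list[Element]:
    """Expand a digital form such as "277773" into its element sequence."""
    return [Element(int(d)) for d in digital_form]


print("".join(e.letter for e in elements_of("277773")))  # BGGGGC
```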
As shown in Figure 2, a processing device for converting Chinese characters into numerals and Latin letters comprises the processing system described above, together with an acquisition device, a database and an output device. The acquisition device takes in the target character and passes it to the processing system; the processing system converts the target character into its digital form and Latin-letter form and calls the output device to display them; and the database stores each target character together with its corresponding digital form and Latin-letter form.
Specifically, the acquisition device includes a scanner, a voice recorder and a keyboard, and the user (operator) can enter the target character with any one of them. The storage module of the processing system includes a Chinese-character recognition master template, which stores the information on each structural element obtained by the preprocessing module and its variant forms, together with the correspondence among structural elements, numerals and Latin letters. The element-decomposition (jieyuan) module includes a jieyuan tool, which identifies and splits the entered target character against the preprocessed information in the recognition master template, thereby obtaining the relevant structural elements of the character. The code-mapping (guima) module includes a guima tool, which encodes the structural elements of the split target character with their corresponding numerals, thereby obtaining the digital form of the character, and stores it in the database. The Latinization (huala) module includes a huala tool, which converts the digital form of the target character into the corresponding Latin letters, thereby obtaining its Latin-letter form, and stores it in the database. The output device includes a display screen, which shows the digital form and Latin-letter form of the target character. Figure 5 is a schematic diagram of the guima tool and the huala tool matching numerals and Latin letters in the recognition master template.
The database is ordered as follows: 1) into ten sections by the first element; 2) by the number of Pinxing letters in each character, from fewest to most; 3) by code value, from smallest to largest.
1) Ten sections by the first element
Characters are arranged by the ten elements of Chinese characters, i.e. in sections 0 to 9; in other words, each character is assigned to a section according to its first element, as follows:
0 (class-0 element), e.g.: product 000-OOO, Lü 00-OO, electricity 07-OG, is 0181-OAHA, scold 00561-OOEFA
1 (horizontal element), e.g.: two 11-AA, fourth 15-AE, beggar 1416-ADAF, no 12430-ABDCO, elder brother 104105-AODAOE
2 (left-falling element), e.g.: people 23-BC, thousand 28-BH, male 2353-BCEC, and 28230-BHBCO, axe 238514-BCHEAD
3 (right-falling element), e.g.: wide 35-CE, family 39-CI, profound 31553-CAEEC, flood 33368-CCCFH, guest 36270-CFBGO
4 (vertical element), e.g.: 46-DF, 401-DOA and 419-DAI (glossed as mountain, Lu and in), walk 4411242-DDAABDB, tooth 4411623-DDAAFBC
5 (angle element), e.g.: child 57-EG, li 50881-EOHHA, 55-EE, tame 563244-EFCBDD, line 5537773-EECGGGC
6 (curve element), e.g.: knife 62-FB, craftsman 6514-FEAD, belly 61181-FAAHA, change 65218-FEBAH, Europe 682523-FHBEBC
7 (class-7 element), e.g.: generation 76-GF, female 78-GH, beat 7315-GCAE, turn 78773-GHGGC, pause 7612623-GFABFBC
8 (class-8 element), e.g.: ancient 80-HO, rotten 82316-HBCAF, dam 83623-HCFBC, left 8141-HADA, too 833-HCC
9 (class-9 element), e.g.: people 97-IG, 908-IOH, chi 93-IC, brush 9745-IGDE, rank 92324-IBCBD
2) By number of Pinxing letters, from fewest to most
Arranging by the number of Pinxing letters means that characters in the same section have different numbers of letters, and therefore codes of different lengths; within a section they are arranged from fewest letters to most, for example:
1 (horizontal element)
1 element (1 character): one 1-A
2 elements (3 characters): two 11-AA, fourth 15-AE, dry 18-AH
3 elements (12 characters): can 105-AOE, three 111-AAA, 116-AAF ...
4 elements (21 characters): west 1025-AOBE, draw 1086-AOHF, more 1088-AOHH, hundred 1201-ABOA ...
……
13 elements (2 characters): cover 1044224210127-AODDBBDBAOABG, tyrant 1711111608611-AGAAAAAFOHFAA
14 elements (1 character): dew 17111104143270-AGAAAAODADCBGO
3) By code value, from smallest to largest
Arranging by code value from smallest to largest means that characters in the same section differ in their structural elements, in number or in kind, so their codes necessarily differ, for example:
9 (class-9 element): the first ten characters, arranged by code value from smallest to largest, are 91-IA, 93-IC, 97-IG, 901-IOA, 908-IOH, 923-IBC, 960-IFO, 962-IFB, 976-IGF and 978-IGH (among them the characters for chi, people, corpse, office, team, Buddhist nun, sun, bend and battle array).
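Taken together, the three ordering rules amount to one composite sort key: section (first element), then element count, then code value. A sketch, assuming codes are stored as digit strings and using a hypothetical function name; two digit strings of equal length compare in the same order as their numeric values, so the code can be compared as a string, which also keeps leading zeros such as the one in 011 intact.

```python
def dictionary_key(digital_form: str) -> tuple[str, int, str]:
    # 1) section by first element, 2) fewer elements first, 3) smaller code value first.
    # Equal-length digit strings compare in numeric order, so no int() is needed.
    return (digital_form[0], len(digital_form), digital_form)


codes = ["901", "923", "908", "960", "962", "91", "93", "97", "976", "978",
         "1086", "11", "111", "1", "18", "15"]
print(sorted(codes, key=dictionary_key))
# ['1', '11', '15', '18', '111', '1086', '91', '93', '97', '901', '908', '923', '960', '962', '976', '978']
```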
The device also includes a query program: when the query program receives a character query from the keyboard, it retrieves that character's digital form and Latin-letter form from the database and shows them on the display screen.
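A minimal sketch of the database and the query program, using Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical, and the rows are copied from the examples; in the device the rows would be written by the guima and huala tools rather than inserted by hand.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE characters ("
    " hanzi TEXT PRIMARY KEY, digital_form TEXT NOT NULL, latin_form TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO characters VALUES (?, ?, ?)",
    [("我", "277773", "BGGGGC"), ("你", "2425523", "BDBEEBC"), ("画", "1086", "AOHF")],
)


def query(hanzi: str) -> Optional[tuple[str, str]]:
    """Return (digital form, Latin form) for a stored character, or None."""
    return conn.execute(
        "SELECT digital_form, latin_form FROM characters WHERE hanzi = ?", (hanzi,)
    ).fetchone()


print(query("你"))  # ('2425523', 'BDBEEBC')
```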
The device also includes a binary converter, which converts the digital form and the Latin-letter form of the target character into binary, yielding two binary representations of the character; either or both can be shown on the display screen as required.
For example, when the square-form characters 我 ("I") and 们 (the plural suffix) are entered, the processing system uses the transmitted jieyuan information to convert each automatically into its decimal code and Latin form, 277773-BGGGGC (我) and 243416-BDCDAF (们), which are then passed to the binary converter:
我 = 277773 = 1000011110100001101
们 = 243416 = 111011011011011000
我们 ("we") = 277773 followed by 243416 = 1000011110100001101111011011011011000 (the two bit strings above, concatenated);
Or
我 = BGGGGC = 010000100100011101000111010001110100011101000011
们 = BDCDAF = 010000100100010001000011010001000100000101000110
我们 = BGGGGCBDCDAF = 010000100100011101000111010001110100011101000011010000100100010001000011010001000100000101000110.
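Both binary forms in this example can be reproduced in a few lines. A sketch, assuming (as the figures above bear out) that the decimal code is converted as one unsigned integer and that each Latin letter is written as its 8-bit ASCII code; the function names are hypothetical.

```python
def decimal_code_to_binary(digital_form: str) -> str:
    """Binary of the digital form read as one decimal integer."""
    return format(int(digital_form), "b")


def latin_to_binary(latin_form: str) -> str:
    """8-bit ASCII code of each Latin letter, concatenated."""
    return "".join(format(ord(ch), "08b") for ch in latin_form)


wo, men = ("277773", "BGGGGC"), ("243416", "BDCDAF")  # 我 ("I") and 们 (plural suffix)
print(decimal_code_to_binary(wo[0]))   # 1000011110100001101
print(decimal_code_to_binary(men[0]))  # 111011011011011000
# 我们 ("we"): the example joins the per-character bit strings end to end.
print(decimal_code_to_binary(wo[0]) + decimal_code_to_binary(men[0]))
print(latin_to_binary(wo[1] + men[1]))
```

One design consequence worth noting: the decimal route gives bit strings of varying length with no boundary marker, so a concatenation like the one for 我们 cannot be split back into characters without extra information, and leading zeros (as in 011) are lost. The ASCII route avoids both problems because every letter occupies exactly 8 bits.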
The decimal code of a Chinese character can also be converted directly into weights. Here "weight" means positional weight: each digit is multiplied by the power of ten for its position (the term is also commonly used for the probability with which the binary code of a character occurs). For English, with its 26 letters, conversion to weights or binary usually means first looking up each letter's decimal value in the ASCII table and then converting that value to binary or computing its weights. Because the ten elements of Chinese characters correspond one to one with the Arabic numerals, a Chinese character's code can instead be converted to binary or to weights directly. Taking the four characters 元 (yuán, "first"), 旦 (dàn, "dawn"), 佳 (jiā, "good") and 节 (jié, "festival") as examples, their decimal codes can be read off directly:
元 = 1126, 旦 = 011, 佳 = 248181, 节 = 8864.
Applying the decimal positional-weight formula gives:
元 = 1126 = 1*10^3+1*10^2+2*10^1+6*10^0
旦 = 011 = 0*10^2+1*10^1+1*10^0
佳 = 248181 = 2*10^5+4*10^4+8*10^3+1*10^2+8*10^1+1*10^0
节 = 8864 = 8*10^3+8*10^2+6*10^1+4*10^0
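These positional-weight expansions follow mechanically from the digit string. A short sketch (function names hypothetical) that produces the same notation and evaluates it; a leading zero, as in 011, contributes a 0*10^2 term but does not change the value.

```python
def place_value_expansion(digital_form: str) -> str:
    """Write a decimal code as digit*10^position terms, e.g. "011" -> "0*10^2+1*10^1+1*10^0"."""
    top = len(digital_form) - 1
    return "+".join(f"{d}*10^{top - i}" for i, d in enumerate(digital_form))


def expansion_value(digital_form: str) -> int:
    """Evaluate the same expansion numerically."""
    top = len(digital_form) - 1
    return sum(int(d) * 10 ** (top - i) for i, d in enumerate(digital_form))


for code in ["1126", "011", "248181", "8864"]:
    print(place_value_expansion(code), "=", expansion_value(code))
```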
These examples show that once a Chinese character is abstracted into a spelling of the ten elements, its digital code can be converted directly into weights or binary form, which makes it easier for computers to handle Chinese characters.
By having a computer automatically convert a Chinese character from its square form into its jieyuan form, digital form and Latin form, machines can automatically recognize and read Chinese characters. This enables digital literacy teaching, digital search and direct Chinese-character programming, and in turn allows ASCII to be upgraded to CHINA-ASCII (C-ASCII for short). Table 1 shows an example of C-ASCII used for both English and Chinese, i.e. displaying essential terms in Chinese Latin:
Once ASCII is upgraded to C-ASCII, it can display not only English and other Western European languages but also Chinese characters, so that Chinese characters, like English, can enter networks and every international technical field through machines, giving the Chinese script strong competitiveness in the digital age.
The invention automatically converts Chinese characters into Arabic numerals and Latin letters, and converts the numerals or Latin letters into binary digits, helping machines recognize and display Chinese characters and opening a new path for Chinese-character programming. It also offers a new approach and method for digital search and teaching of Chinese characters, making it easier for people worldwide to look up and learn Chinese characters by machine.
The above are only preferred embodiments of the invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A processing system for converting Chinese characters into numerals and Latin letters, characterized in that it comprises a preprocessing module, a storage module, an element-decomposition (jieyuan) module, a code-mapping (guima) module and a Latinization (huala) module, wherein:
the preprocessing module is configured to divide Chinese-character strokes, according to a preset character-splitting rule, into stroke forms of one, two or more strokes, thereby obtaining structural elements of several classes, and is further configured to encode each structural element with an assigned numeral and then to represent each numeral with an assigned Latin letter;
the storage module is configured to store each structural element and the correspondence among the structural elements, numerals and Latin letters;
the element-decomposition module is configured to split an entered target character according to the character-splitting rule to obtain its stroke forms, and to match each stroke form to the structural element of the corresponding class, thereby obtaining the structural elements of the target character;
the code-mapping module is configured to match the structural elements of the target character to their corresponding numerals, thereby obtaining the digital form of the target character;
the Latinization module is configured to match the digital form of the target character to the corresponding Latin letters, thereby obtaining the Latin-letter form of the target character.
2. The processing system for converting Chinese characters into numerals and Latin letters according to claim 1, characterized in that the preprocessing module divides Chinese-character strokes, according to the character-splitting rule, into ten structural elements: the class-0, horizontal, left-falling, right-falling, vertical, angle, curve, class-7, class-8 and class-9 elements.
3. The system for converting Chinese characters into numerals and Latin letters according to claim 2, characterized in that, in the preprocessing module, the ten structural elements are represented by the numerals 0 to 9 respectively: the class-0 element is 0, horizontal 1, left-falling 2, right-falling 3, vertical 4, angle 5, curve 6, class-7 7, class-8 8 and class-9 9; and the numerals 0 to 9 are written with Latin letters as 1234567890 = ABCDEFGHIO.
4. The processing system for converting Chinese characters into numerals and Latin letters according to claim 2 or 3, characterized in that, among the ten structural elements:
the class-0 element is a stroke form of several strokes whose outline is closed;
the horizontal element is a stroke form of one stroke drawn from left to right;
the left-falling element is a stroke form of one stroke drawn from the top down to the lower left;
the right-falling element is a stroke form of one stroke drawn from the top down to the lower right, or slanting from left to right;
the vertical element is a stroke form of one vertical stroke drawn from top to bottom;
the angle element is a stroke form of one or two strokes forming one bend;
the curve element is a stroke form of one, two or more strokes forming two bends;
the class-7 element is a stroke form in which an angle or curve element is crossed by one stroke;
the class-8 element is a stroke form of two strokes crossing each other;
the class-9 element is a stroke form of several strokes forming a "9" shape.
5. A processing method for converting Chinese characters into numerals and Latin letters, characterized in that it comprises a step of dividing Chinese-character strokes, according to a preset character-splitting rule, into stroke forms of one, two or more strokes, thereby obtaining structural elements of several classes;
a step of encoding each structural element with an assigned numeral, and then representing each numeral with an assigned Latin letter;
a step of encoding the multiple structural elements with the assigned numerals, and representing the numerals with the assigned Latin letters;
a step of storing each structural element, and storing the correspondence among the structural elements, numerals and Latin letters;
a step of splitting an entered target character according to the character-splitting rule to obtain its stroke forms, and matching each stroke form to the structural element of the corresponding class, thereby obtaining the structural elements of the target character;
a step of matching the structural elements of the target character to their corresponding numerals, thereby obtaining the digital form of the target character; and
a step of matching the digital form of the target character to the corresponding Latin letters, thereby obtaining the Latin-letter form of the target character.
6. The processing method for converting Chinese characters into numerals and Latin letters according to claim 5, characterized in that Chinese characters are divided, according to the character-splitting rule, into ten structural elements: the class-0, horizontal, left-falling, right-falling, vertical, angle, curve, class-7, class-8 and class-9 elements.
7. The processing method for converting Chinese characters into numerals and Latin letters according to claim 6, characterized in that the ten structural elements are represented in order by the numerals 0 to 9: the class-0 element is 0, horizontal 1, left-falling 2, right-falling 3, vertical 4, angle 5, curve 6, class-7 7, class-8 8 and class-9 9; and the numerals 0 to 9 are written with Latin letters as 1234567890 = ABCDEFGHIO.
8. The processing method for converting Chinese characters into numerals and Latin letters according to claim 5 or 6, characterized in that, among the ten structural elements:
the class-0 element is a stroke form of several strokes whose outline is closed;
the horizontal element is a stroke form of one stroke drawn from left to right;
the left-falling element is a stroke form of one stroke drawn from the top down to the lower left;
the right-falling element is a stroke form of one stroke drawn from the top down to the lower right, or slanting from left to right;
the vertical element is a stroke form of one vertical stroke drawn from top to bottom;
the angle element is a stroke form of one or two strokes forming one bend;
the curve element is a stroke form of one, two or more strokes forming two bends;
the class-7 element is a stroke form in which an angle or curve element is crossed by one stroke;
the class-8 element is a stroke form of two strokes crossing each other;
the class-9 element is a stroke form of several strokes forming a shape similar to "9".
9. A processing device for converting Chinese characters into numerals and Latin letters, characterized in that it comprises the processing system for converting Chinese characters into numerals and Latin letters according to any one of claims 1 to 4, and further comprises an acquisition device, a database and an output device; the acquisition device is used to enter a target character and transmit it to the processing system, the processing system converts the target character into its digital form and Latin-letter form and calls the output device to display them, and the database stores each target character together with its corresponding digital form and Latin-letter form.
10. The processing device for converting Chinese characters into numerals and Latin letters according to claim 9, characterized in that it further comprises a binary converter, which converts the digital form and the Latin-letter form of the target character into binary respectively, thereby obtaining two binary representations of the target character.
CN201610351991.7A 2016-05-25 2016-05-25 Processing system, method and device for transforming Chinese characters into numbers and Latin letters Pending CN105824793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610351991.7A CN105824793A (en) 2016-05-25 2016-05-25 Processing system, method and device for transforming Chinese characters into numbers and Latin letters

Publications (1)

Publication Number Publication Date
CN105824793A true CN105824793A (en) 2016-08-03

Family

ID=56531310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610351991.7A Pending CN105824793A (en) 2016-05-25 2016-05-25 Processing system, method and device for transforming Chinese characters into numbers and Latin letters

Country Status (1)

Country Link
CN (1) CN105824793A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354701A (en) * 2016-08-30 2017-01-25 腾讯科技(深圳)有限公司 Chinese character processing method and device
CN109271610A (en) * 2018-07-27 2019-01-25 昆明理工大学 A kind of vector expression of Chinese character
CN110413965A (en) * 2019-07-23 2019-11-05 广州国音智能科技有限公司 A kind of method, apparatus, equipment and the computer readable storage medium of Chinese character revolution word

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1612091A (en) * 2003-10-30 2005-05-04 陶振维 Letter-shape-matching chinese-character inputting method
CN101169690A (en) * 2007-11-30 2008-04-30 徐海中 Latin type five-stroke input method for Chinese character
CN101196782A (en) * 2007-12-29 2008-06-11 徐贤笃 Chinese character inputting method
CN102262683A (en) * 2011-08-18 2011-11-30 何瑞芳 Method for processing Chinese character information and method for separating and storing Chinese characters
CN103076890A (en) * 2012-07-01 2013-05-01 潘昌仁 Character digitalized encoding and numeral international general reading method
CN103838393A (en) * 2014-03-03 2014-06-04 万仁芳 Chinese character structure digital literacy input method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354701A (en) * 2016-08-30 2017-01-25 腾讯科技(深圳)有限公司 Chinese character processing method and device
CN106354701B (en) * 2016-08-30 2019-06-21 腾讯科技(深圳)有限公司 Chinese character processing method and device
CN109271610A (en) * 2018-07-27 2019-01-25 昆明理工大学 A kind of vector expression of Chinese character
CN110413965A (en) * 2019-07-23 2019-11-05 广州国音智能科技有限公司 A kind of method, apparatus, equipment and the computer readable storage medium of Chinese character revolution word

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160803