CN1303064A - Intelligent multi-language character processing system - Google Patents
- Publication number
- CN1303064A
- Authority
- CN
- China
- Prior art keywords
- coding
- application program
- literary composition
- encoding
- processing system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Document Processing Apparatus (AREA)
Abstract
An intelligent multi-language character processing system for processing two or more languages on the same screen. It is characterized by the encoding technique used in a computer or printer system, which comprises means for setting the language encoding of an application program, a language encoding storage area, and encoding execution means that executes the specific encoding held in that storage area.
Description
The present invention relates to computer text processing systems, and in particular to a multi-language character processing system that processes two or more written languages on the same screen.
As international exchange grows, the demand for processing different languages simultaneously on the same computer and printer is becoming ever stronger. For example, a Chinese Windows environment is expected to support editing or printing not only in Chinese but also in English, Japanese, Russian and other languages, and furthermore to run foreign-language applications as well as Chinese ones.
Because each country designed the computer encoding of its own script without considering compatibility with other countries' character encodings (that is, the need to process them simultaneously), encoding conflicts have arisen. This makes it difficult to process different languages simultaneously on the same computer or printer.
Fig. 3 is a schematic diagram of the code distribution of the existing Chinese GB2312-80 and Japanese SJIS encodings (the code range in the figure does not include 0x7F). In the figure, region W denotes the single-byte code area, regions J1 and J2 denote the Japanese code areas, region Z denotes the Chinese code area, and region ZJ denotes the area where the Chinese and Japanese encodings overlap. W, J1, J2 and Z can also be called non-overlapping code areas. As the figure shows, both the Chinese and the Japanese encoding use region ZJ, so the two conflict; in fact, such conflicts exist among the existing character encodings of many countries.
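The overlap described above can be illustrated by enumerating the two-byte code space of each encoding. This is only a minimal sketch: the byte ranges are the commonly documented envelopes for GB2312 in its EUC-CN form and for standard Shift JIS (an assumption, since the patent gives no numbers), not every cell in an envelope is assigned a character, and the variable names are illustrative rather than the region labels of Fig. 3.

```python
from itertools import product

def cells(lead_ranges, trail_ranges):
    """Return the set of (lead, trail) byte pairs covered by inclusive ranges."""
    leads = [b for lo, hi in lead_ranges for b in range(lo, hi + 1)]
    trails = [b for lo, hi in trail_ranges for b in range(lo, hi + 1)]
    return set(product(leads, trails))

# Assumed envelopes (not taken from the patent):
# GB2312 as EUC-CN: lead 0xA1-0xF7, trail 0xA1-0xFE.
GB2312 = cells([(0xA1, 0xF7)], [(0xA1, 0xFE)])
# Shift JIS: lead 0x81-0x9F and 0xE0-0xEF, trail 0x40-0x7E and 0x80-0xFC
# (0x7F is excluded, matching the note on the figure).
SJIS = cells([(0x81, 0x9F), (0xE0, 0xEF)], [(0x40, 0x7E), (0x80, 0xFC)])

overlap = GB2312 & SJIS          # two-byte values claimed by both encodings
chinese_only = GB2312 - overlap  # values that can only be Chinese
japanese_only = SJIS - overlap   # values that can only be Japanese

print(len(chinese_only), len(japanese_only), len(overlap))
```

Any two-byte value in `overlap` is ambiguous on its own; values in the other two sets identify the encoding unambiguously.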
Unicode is a worldwide character encoding standard, and Windows uses it for system-level character and string processing. Unicode can be said to have improved multi-language text handling to some extent. However, this encoding scheme rearranges the original Chinese, Japanese and Korean character encodings; that incompatibility is hard to resolve and has hindered adoption of the scheme.
Other existing so-called multi-language character processing systems also use a fixed encoding scheme and suffer from the same hard-to-overcome incompatibility: at any given moment they can guarantee that only one country's software runs normally, while applications from other countries running under them produce garbled text.
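The garbling under a fixed code page is easy to reproduce. The sketch below writes Japanese text as Shift JIS and reads the same bytes under a Chinese code page; `gbk` stands in here for the code page of a Chinese Windows system, which is an assumption and not something the patent specifies, and `show_mojibake` is a hypothetical helper name.

```python
def show_mojibake(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one code page and decode the bytes under another."""
    raw = text.encode(written_as)
    # errors="replace" keeps the sketch from raising on byte sequences
    # that are unassigned in the reader's code page.
    return raw.decode(read_as, errors="replace")

original = "日本語"
garbled = show_mojibake(original, "shift_jis", "gbk")           # wrong code page
round_trip = show_mojibake(original, "shift_jis", "shift_jis")  # right code page

print(repr(garbled), repr(round_trip))
```

Only the decoder changes between the two calls; the bytes are identical, which is why a system with a single fixed code page cannot serve both applications.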
In summary, a fixed multi-language encoding scheme has a serious incompatibility problem that cannot be ignored: it cannot let application software from different countries run simultaneously under the same operating system. In other words, it is difficult to process different languages simultaneously on the same computer and printer.
The object of the present invention is to provide a new multi-language character processing system that processes two or more written languages on the same screen, does not cause encoding confusion, and can run applications of different languages simultaneously under the same system.
The multi-language character processing system of the present invention comprises the encoding means used in a computer or printer system, characterized in that the encoding means comprises the following technical means:
(1) means for setting the language encoding of an application program,
(2) an encoding storage area, divided into partitions, that holds the encodings of several languages,
(3) encoding execution means which, according to the flag of the application's language encoding set by the setting means, causes the processing system to execute, for that application, the specific encoding held in the partition of the encoding storage area that the flag maps to.
The means for setting the language encoding of an application program may comprise: (1) means for obtaining the application's string resources, (2) encoding comparison means for comparing the code of each character in the string resources so obtained with the codes in the non-overlapping code areas of each language, and (3) encoding identification and flagging means for identifying the application's language encoding from the result of the comparison and flagging it.
The setting means may also comprise manual setting means with which the user can set the encoding by hand.
As the above technical solution shows, the multi-language character processing system of the present invention adopts an object-oriented dynamic character encoding scheme based on intelligent identification. It departs from the conventional approach by extending the planar code space into a three-dimensional code space in order to solve the problems of code allocation and code conflict. For the user, this means that encoding confusion does not arise and that applications of different languages can run simultaneously under the same system.
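One way to picture the extension from a planar to a three-dimensional code space is to treat the language flag as a third coordinate next to the two code bytes. The sketch below is an interpretation, not the patent's data structure: the tags `"zh"` and `"ja"`, the `CODECS` table and the `resolve` function are hypothetical, and Python's built-in codecs stand in for the stored encodings.

```python
# Hypothetical language tags mapped to the codec that plays the role of
# one "layer" of the three-dimensional code space.
CODECS = {"zh": "gb2312", "ja": "shift_jis"}

def resolve(language: str, plane_code: bytes) -> str:
    """Look up a two-byte plane code on the layer selected by the language tag."""
    return plane_code.decode(CODECS[language])

shared_code = b"\xe0\xa1"  # a two-byte value that lies in the overlap of both encodings
as_chinese = resolve("zh", shared_code)
as_japanese = resolve("ja", shared_code)

print(as_chinese, as_japanese)
```

The same plane code no longer collides, because the pair (language, code) rather than the code alone identifies the character.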
The multi-language character processing system of the present invention is described in detail below with reference to an embodiment.
Fig. 1 is a schematic diagram of one concrete structure of the multi-language character processing system of the present invention.
Fig. 2 is a schematic diagram of the multi-language dynamic encoding based on intelligent identification provided by the present invention.
Fig. 3 is a schematic diagram of the code distribution of Chinese GB2312-80 and Japanese SJIS.
Fig. 4 is a schematic diagram of the character code distribution of two countries.
As shown in Fig. 1, the multi-language character processing system of the present invention comprises the encoding means used in a computer or printer system, characterized in that the encoding means comprises the following technical means: (1) means for setting the language encoding of an application program, (2) an encoding storage area 4 (see Fig. 3), divided into partitions, that holds the encodings of several languages, and (3) encoding execution means 5 which, according to the flag of the application's language encoding set by the setting means, causes the processing system to execute, for that application, the specific encoding held in the partition of the encoding storage area that the flag maps to.
In this embodiment, the means for setting the language encoding of an application program comprises: (1) string resource obtaining means 1, (2) encoding comparison means 2 for comparing the code of each character in the string resources so obtained with the codes in the non-overlapping code areas of each language, and (3) encoding identification and flagging means 3 for identifying the application's language encoding from the comparison result of means 2 and flagging it. In addition, the setting means may also comprise manual setting means 6 with which the user can set the encoding by hand.
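The relationship between the numbered means can be sketched as a small object model. Everything here is hypothetical scaffolding for illustration: the class and method names are not from the patent, the detector is passed in as a plain function, and Python codecs stand in for the encodings held in the storage area.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class EncodingStore:
    """Storage area 4: one partition (here, a codec name) per language tag."""
    partitions: Dict[str, str] = field(default_factory=dict)

@dataclass
class MultiLanguageSystem:
    store: EncodingStore
    detect: Callable[[bytes], Optional[str]]  # stands in for means 1 to 3
    flags: Dict[str, str] = field(default_factory=dict)  # app name -> language tag

    def register(self, app: str, string_resources: bytes) -> None:
        """Automatic setting: flag the app if the detector reaches a decision."""
        language = self.detect(string_resources)
        if language is not None:
            self.flags[app] = language

    def set_manually(self, app: str, language: str) -> None:
        """Manual setting means 6: the user chooses the language."""
        self.flags[app] = language

    def render(self, app: str, data: bytes) -> str:
        """Execution means 5: decode with the partition the app's flag maps to."""
        return data.decode(self.store.partitions[self.flags[app]])

system = MultiLanguageSystem(
    store=EncodingStore({"zh": "gb2312", "ja": "shift_jis"}),
    detect=lambda raw: None,  # placeholder detector that never decides
)
system.set_manually("ja_app", "ja")
system.set_manually("zh_app", "zh")
ja_text = system.render("ja_app", "日本語".encode("shift_jis"))
zh_text = system.render("zh_app", "中文".encode("gb2312"))
```

Because the flag is stored per application, two applications with different encodings can be served by the same system at the same time.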
To run localized Windows applications from different countries under the same Windows system, the nationality information of each application must first be obtained, that is, it must be determined which country's localized software the application is; only then can it be controlled correctly. Once an application has been localized, however, nothing inside the (16-bit) application states its country or language. Its only distinctive feature is that its text has been translated into a new language. We therefore rely on the non-overlapping parts of the different countries' character code distributions and determine the application's country and language encoding by identifying its string resources.
This identification is transparent to the user: when an application starts, the means for setting its language encoding automatically obtains its string resources, identifies its language encoding and registers the result.
First, the string resource obtaining means 1 obtains the application's string resources. The encoding comparison means 2 then carries out the comparison, and the encoding identification and flagging means 3 identifies the application's language encoding from the comparison result of means 2 and flags it. The situations that may arise when identifying the character encoding of an application's string resources are as follows:
Let S = C0 C1 ^ C2 C3 C4 ^ C5 C6 C11 be all the string resources in application AppS, where Cx denotes the (x+1)-th character of the string and '^' denotes a word delimiter or phrase delimiter of the string.
Suppose AppS uses one of the two countries' character encodings (LanA, LanB) shown in Fig. 4, where A denotes the code set of LanA characters and B denotes the code set of LanB characters.
All possible identification results are then as follows:
1) If there exists a Cx (Cx ∈ S) such that Cx ∈ A - A ∩ B,
the conclusion is that the local-language code page of AppS is LanA.
2) If there exists a Cx (Cx ∈ S) such that Cx ∈ B - A ∩ B,
the conclusion is that the local-language code page of AppS is LanB.
3) If every Cx (Cx ∈ S) satisfies Cx ∈ A ∩ B,
the conclusion is that the local-language code page of AppS is LanA or LanB.
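The three cases above can be written down directly with set operations. The sketch below is a minimal illustration under a few assumptions: characters are represented as integer codes, codes outside both sets are ignored, and if a string contained characters from both exclusive regions the first one encountered would win (a situation the patent does not discuss). The function name and return strings are hypothetical.

```python
from typing import Iterable, Set

def classify(chars: Iterable[int], a: Set[int], b: Set[int]) -> str:
    """Apply cases 1) to 3) to the character codes of a string resource."""
    a_only = a - (a & b)  # A - A ∩ B
    b_only = b - (a & b)  # B - A ∩ B
    for c in chars:
        if c in a_only:
            return "LanA"          # case 1)
        if c in b_only:
            return "LanB"          # case 2)
    return "LanA or LanB"          # case 3): every character lies in the overlap

# Toy code sets: code 3 is shared, 1 and 2 are LanA-only, 4 is LanB-only.
A = {1, 2, 3}
B = {3, 4}
print(classify([3, 1], A, B), classify([3, 4], A, B), classify([3, 3], A, B))
```

A single character outside the overlap is enough to settle the question, which is why the loop can return as soon as it finds one.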
Case 3) requires further processing: the semantic plausibility of string S is examined under the assumption that it belongs to LanA and then to LanB. In other words, intelligent identification of the character encodings judges not only by code distribution but also checks the semantic validity of strings whose codes fall in the overlap. For this purpose we provide a knowledge base of the strings commonly found in each country's localized Windows applications, and it works well in practice.
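A semantic check of this kind can be approximated by decoding the string under each candidate encoding and counting how many segments appear in a list of common interface strings. The sketch below is an assumption-laden illustration: the patent does not describe the contents or format of its knowledge base, so the tiny `KNOWLEDGE_BASE`, the scoring rule and the function name are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical knowledge base: common menu strings per candidate codec.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "gb2312": {"文件", "编辑", "帮助"},
    "shift_jis": {"ファイル", "編集", "ヘルプ"},
}

def pick_by_semantics(raw: bytes, candidates: List[str], delimiter: str = "^") -> str:
    """Return the candidate codec under which most segments are known strings."""
    def score(codec: str) -> int:
        # errors="replace" avoids exceptions when the bytes are invalid
        # under the codec being tried.
        text = raw.decode(codec, errors="replace")
        return sum(1 for seg in text.split(delimiter) if seg in KNOWLEDGE_BASE[codec])
    # On a tie, max() keeps the first candidate in the list.
    return max(candidates, key=score)

resources = "文件^编辑".encode("gb2312")
best = pick_by_semantics(resources, ["shift_jis", "gb2312"])
print(best)
```

A real knowledge base would need far more entries and probably a softer match than exact equality; the point here is only that the decision moves from code ranges to meaning.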
In addition, for applications that have no string resources, either the system default language code page can be used or the encoding can be set by hand with the manual setting means 6.
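The fallback just described amounts to a short precedence rule. In the sketch below, giving a manual setting priority over an automatically detected one is an assumption (the patent only says both options exist), and the function and parameter names are hypothetical.

```python
from typing import Optional

def resolve_code_page(detected: Optional[str],
                      manual: Optional[str],
                      system_default: str) -> str:
    """Choose the code page for an application.

    Assumed precedence: manual setting, then automatic detection,
    then the system default code page.
    """
    if manual is not None:
        return manual
    if detected is not None:
        return detected
    return system_default

# An application without string resources yields no detection result.
print(resolve_code_page(detected=None, manual=None, system_default="gb2312"))
```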
Referring to Fig. 3, once the setting means has set (flagged) the application's language encoding, the encoding execution means 5 causes the processing system, according to that setting, to execute the specific encoding held in the partition of the encoding storage area 4 that the setting maps to (for example, task one executes the Japanese encoding, task two the Chinese encoding, and task three the English single-byte encoding).
With the dynamic encoding scheme based on intelligent identification, application software written for any language version of Windows can run simultaneously, without modification, under the same Windows OS. The encoding in each application (whether a foreground or background process) is guaranteed not to be garbled, and multi-language operations can be carried out in every application. Text in different languages can also be used under this system after the user converts its encoding in advance, and mixed multi-language editing is possible as well.
Claims (3)
1. An intelligent multi-language character processing system comprising the encoding means used in a computer or printer system, characterized in that the encoding means comprises the following technical means:
(1) means for setting the language encoding of an application program,
(2) an encoding storage area, divided into partitions, that holds the encodings of several languages,
(3) encoding execution means which, according to the flag of the application's language encoding set by the setting means, causes the processing system to execute, for that application, the specific encoding held in the partition of the encoding storage area that the flag maps to.
2. The intelligent multi-language character processing system according to claim 1, characterized in that the means for setting the language encoding of an application program may comprise: (1) means for obtaining the application's string resources, (2) encoding comparison means for comparing the code of each character in the string resources so obtained with the codes in the non-overlapping code areas of each language, and (3) encoding identification and flagging means for identifying the application's language encoding from the result of the comparison and flagging it.
3. The intelligent multi-language character processing system according to claim 2, characterized in that the means for setting the language encoding of an application program may also comprise manual setting means with which the user can set the encoding by hand.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 00110015 CN1303064A (en) | 2000-01-05 | 2000-01-05 | Intelligent multi-language character processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1303064A (en) | 2001-07-11 |
Family
ID=4580045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 00110015 Pending CN1303064A (en) | 2000-01-05 | 2000-01-05 | Intelligent multi-language character processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1303064A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1758218B (en) * | 2004-09-29 | 2010-09-01 | 微软公司 | Method and computer-readable medium for consistent configuration of language support across operating system and application programs |
CN102194004A (en) * | 2011-05-25 | 2011-09-21 | 福州瑞芯微电子有限公司 | Method for processing complex text by using Android browser |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE10162156B4 (en) | The user navigation through multimedia file content supporting system and method | |
US5748953A (en) | Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols | |
CN104881406B (en) | Web page translation method and system | |
US20030023425A1 (en) | Tokenizer for a natural language processing system | |
CN101079031A (en) | Web page subject extraction system and method | |
EA200000321A1 (en) | LANGUAGE AUTOMATIC IDENTIFICATION SYSTEM FOR MULTI-LANGUAGE OPTICAL SYMBOL RECOGNITION | |
US6055365A (en) | Code point translation for computer text, using state tables | |
MXPA04003187A (en) | Idiom recognizing document splitter. | |
CN111178061A (en) | Multi-lingual word segmentation method based on code conversion | |
US7051278B1 (en) | Method of, system for, and computer program product for scoping the conversion of unicode data from single byte character sets, double byte character sets, or mixed character sets comprising both single byte and double byte character sets | |
CN101008940A (en) | Method and device for automatic processing font missing | |
Nayak et al. | Odia characters recognition by training tesseract OCR engine | |
CN1303064A (en) | Intelligent multi-language character processing system | |
CN100347706C (en) | Method for converting PDF file to XML file | |
US20030200535A1 (en) | System for program source code conversion | |
CN105573981A (en) | Method and device for extracting Chinese names of people and places | |
CN1464430A (en) | System for distinguishing organization names in Asian language writing system | |
CN1858701A (en) | Realizing method for on-screen aid | |
Belaïd et al. | Morphological tagging approach in document analysis of invoices | |
Carbonell et al. | New approaches to machine translation | |
CN1290886A (en) | Method system and computer program products for optimum byte and character processing | |
CN1369833A (en) | Lexial system and method for conversion between unsimplified and simplified Chinese characters | |
US20050216495A1 (en) | Conversion method for multi-language multi-code databases | |
CN117076806A (en) | QML and WEB interface interconversion method | |
EP0175928A2 (en) | Image understanding system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |