CN1303064A - Intelligent multi-language character processing system - Google Patents

Info

Publication number
CN1303064A
Authority
CN
China
Prior art keywords
coding
application program
script type
encoding
processing system
Prior art date
2000-01-05
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 00110015
Other languages
Chinese (zh)
Inventor
张桂平 (Zhang Guiping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GEWEI SOFTWARE CO Ltd SHENYANG
Original Assignee
GEWEI SOFTWARE CO Ltd SHENYANG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-01-05
Filing date
2000-01-05
Publication date
2001-07-11
Application filed by GEWEI SOFTWARE CO Ltd SHENYANG
Priority to CN 00110015
Publication of CN1303064A
Legal status: Pending (current)

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

An intelligent multi-language character processing system handles two or more languages on the same screen. Its encoding mechanism, used in a computer or printer system, comprises: a means for setting the script encoding of each application program; a partitioned storage area that stores the encodings of several scripts; and a code execution means that causes the system to apply, for each application program, the specific encoding stored in the partition mapped to it.

Description

Intelligent multi-language character processing system
The present invention relates to computer text-processing systems, and in particular to a multi-language character processing system that handles two or more languages on the same screen.
As international exchange grows, there is an increasingly strong demand to process different languages simultaneously on the same computer and printer. For example, in a Chinese Windows environment, users want to edit and print not only Chinese but also English, Japanese, Russian and other languages; beyond that, they want to run not only Chinese application programs but also foreign-language application programs.
When each country designed computer encodings for its national script, it did not consider compatibility with other countries' character encodings (that is, the need to process them simultaneously), so encoding conflicts arise. This makes it difficult to process different languages simultaneously on the same computer or printer.
Fig. 3 shows the code distributions of the existing Chinese GB2312-80 and Japanese Shift-JIS (SJIS) encodings (the code ranges shown exclude 0x7F). In the figure, region W denotes the single-byte code area, regions J1 and J2 denote the Japanese-only code areas, region Z denotes the Chinese-only code area, and region ZJ denotes the area where Chinese and Japanese codes overlap. W, J1, J2 and Z can therefore also be called the non-overlapping code regions. As the figure shows, both the Chinese and the Japanese encodings use region ZJ, so the two conflict; in fact, such conflicts exist among the existing character encodings of many countries.
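As an illustration of the overlap in Fig. 3, the following Python sketch classifies a double-byte code (lead byte, trail byte) by testing it against the conventional ranges of GB2312 in its EUC-CN form and of Shift-JIS (JIS X 0208 area). The exact boundaries drawn in Fig. 3 are not given in the text, so these ranges are an assumption based on the standard encodings; the single-byte region W is not modeled, and the Japanese-only regions J1 and J2 are merged into one label "J".

```python
from typing import Optional


def in_gb2312(lead: int, trail: int) -> bool:
    """Double-byte GB2312 (EUC-CN): lead 0xA1-0xF7, trail 0xA1-0xFE."""
    return 0xA1 <= lead <= 0xF7 and 0xA1 <= trail <= 0xFE


def in_sjis(lead: int, trail: int) -> bool:
    """Double-byte Shift-JIS: lead 0x81-0x9F or 0xE0-0xEF,
    trail 0x40-0xFC excluding 0x7F (assumed ranges)."""
    lead_ok = 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xEF
    trail_ok = 0x40 <= trail <= 0xFC and trail != 0x7F
    return lead_ok and trail_ok


def region(lead: int, trail: int) -> Optional[str]:
    """Return 'ZJ' (overlap), 'Z' (Chinese only), 'J' (Japanese only) or None."""
    gb = in_gb2312(lead, trail)
    sj = in_sjis(lead, trail)
    if gb and sj:
        return "ZJ"
    if gb:
        return "Z"
    if sj:
        return "J"
    return None


print(region(0xD6, 0xD0))  # Z  (GB2312 code of 中)
print(region(0x88, 0x9F))  # J  (Shift-JIS code of 亜)
print(region(0xE0, 0xA1))  # ZJ (valid pair in both encodings)
```

A code in Z or J identifies the script on its own; a code in ZJ does not, which is why the identification described later falls back on further checks for such strings.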
Unicode is a worldwide character encoding standard, and Windows uses it for system-level character and string processing. Unicode has, to some extent, improved the ability to process the languages and scripts of many countries. However, this scheme disrupted the original encodings of Chinese, Japanese and Korean text; this incompatibility is a problem that is hard to solve, and it hinders the adoption of the scheme.
Other existing so-called multi-language character processing systems also adopt fixed encoding schemes and suffer from incompatibility problems that are hard to overcome: at any given moment they can only guarantee the normal operation of one country's software, while application programs from other countries produce garbled text when run under them.
In summary, any fixed multi-language encoding scheme has serious incompatibility problems that cannot be ignored: it cannot let application software from different countries run simultaneously under the same operating system. In other words, it is difficult to process different languages simultaneously on the same computer and printer.
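The garbled-text effect can be reproduced with Python's standard codecs: the same bytes yield different text depending on which national encoding is assumed. The helper name `misread` is hypothetical and used only for illustration.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the bytes with another,
    as happens when software assumes the wrong national encoding."""
    raw = text.encode(written_as)
    # errors="replace" substitutes U+FFFD for byte sequences the reader rejects
    return raw.decode(read_as, errors="replace")


original = "中文"
print(original.encode("gb2312"))               # b'\xd6\xd0\xce\xc4'
print(misread(original, "gb2312", "shift_jis"))  # ﾖﾐﾎﾄ
```

Under Shift-JIS the bytes 0xA1-0xDF are single-byte half-width katakana, so the two GB2312 characters are read as four unrelated katakana.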
The purpose of the present invention is to provide a new multi-language character processing system that handles two or more languages on the same screen without causing encoding confusion, and that can run application programs of different scripts simultaneously under the same system.
The multi-language character processing system of the present invention comprises an encoding means used in a computer or printer system, characterized in that the encoding means comprises the following technical means:
(1) a means for setting the script encoding to which an application program belongs;
(2) a partitioned encoding storage area that stores the encodings of several scripts;
(3) a code execution means that, according to the script-encoding flag set for an application program by the encoding setting means, causes the processing system to apply, for that application program, the specific encoding stored in the mapped partition of the encoding storage area.
The means for setting the script encoding of an application program may comprise: (1) a string resource obtaining means that obtains the application program's string resources; (2) an encoding comparison means that compares each character code in the obtained string resources against the non-overlapping code region of each script's encoding; and (3) a code identification and flagging means that identifies the application program's script encoding from the comparison result and flags it.
The means for setting the script encoding of an application program may further comprise a manual encoding setting means that the user can set manually.
As the above technical solution shows, the multi-language character processing system of the present invention adopts an object-oriented, dynamic character-encoding scheme based on intelligent identification technology. It breaks with the conventional way of approaching the problem: code allocation and code conflicts are resolved by extending the planar code space into a three-dimensional code space. As a result, users do not experience encoding confusion, and application programs of different scripts can run simultaneously under the same system.
The multi-language character processing system of the present invention is described in detail below with reference to an embodiment.
Fig. 1 is a schematic diagram of a specific structure of the multi-language character processing system of the present invention.
Fig. 2 is a schematic diagram of the multi-language dynamic encoding based on intelligent identification technology provided by the present invention.
Fig. 3 is a schematic diagram of the code distributions of Chinese GB2312-80 and Japanese SJIS.
Fig. 4 is a schematic diagram of the character code distributions of two countries.
As shown in Fig. 1, the multi-language character processing system of the present invention comprises an encoding means used in a computer or printer system, characterized in that the encoding means comprises: (1) a means for setting the script encoding of an application program; (2) a partitioned encoding storage area 4 (see Fig. 3) that stores the encodings of several scripts; and (3) a code execution means 5 that, according to the script-encoding flag set for an application program by the encoding setting means, causes the processing system to apply, for that application program, the specific encoding stored in the mapped partition of the encoding storage area.
In this embodiment, the means for setting the script encoding of an application program comprises: (1) a string resource obtaining means 1 that obtains the application program's string resources; (2) an encoding comparison means 2 that compares each character code in the string resources obtained by means 1 against the non-overlapping code region of each script's encoding; and (3) a code identification and flagging means 3 that identifies the application program's script encoding from the comparison result of means 2 and flags it. The means for setting the script encoding may additionally comprise a manual encoding setting means 6 that the user can set manually.
To run localized Windows application programs from different countries under the same Windows, the system must first obtain each application program's country of origin, that is, determine which country's localized software it is; only then can it handle the program correctly. However, once an application program has been localized, it carries no internal information about its country of origin or language (for 16-bit applications); its only distinguishing feature is that its text has been translated into a new language. We therefore exploit the non-overlapping parts of the different countries' code distributions: by identifying the application program's string resources, we determine its country of origin and the language encoding it uses.
This identification is transparent to the user: when an application program starts, the script-encoding setting means of the system automatically obtains its string resources, performs language-encoding identification, and registers the result.
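The launch-time flow (obtain string resources, compare codes, flag the result) can be sketched as below. This is an illustration under assumptions, not the patented implementation: real string resources would be read from the program's resource section, whereas here they come from a dictionary, and the comparison step is simplified to "which codecs can decode this resource without error", since a code outside an encoding's valid region makes strict decoding fail. The names `strict_candidates`, `EncodingRegistry` and `on_launch` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

CODECS = ("gb2312", "shift_jis")


def strict_candidates(resource: bytes) -> Set[str]:
    """Simplified stand-in for comparison means 2: the codecs that decode
    the resource without error."""
    accepted = set()
    for codec in CODECS:
        try:
            resource.decode(codec)
        except UnicodeDecodeError:
            continue  # some code lies outside this codec's region
        accepted.add(codec)
    return accepted


class EncodingRegistry:
    """Means 1 -> 2 -> 3, run automatically when an application starts."""

    def __init__(self,
                 get_strings: Callable[[str], List[bytes]],
                 compare: Callable[[bytes], Set[str]]) -> None:
        self.get_strings = get_strings              # stand-in for means 1
        self.compare = compare                      # stand-in for means 2
        self.flags: Dict[str, Optional[str]] = {}   # results recorded by means 3

    def on_launch(self, app: str) -> Optional[str]:
        candidates: Optional[Set[str]] = None
        for resource in self.get_strings(app):
            accepted = self.compare(resource)
            # every resource must be valid in the application's encoding
            candidates = accepted if candidates is None else candidates & accepted
        # flag only an unambiguous result; None means "undecided"
        script = next(iter(candidates)) if candidates and len(candidates) == 1 else None
        self.flags[app] = script
        return script


# Hypothetical string resources of two applications
RESOURCES = {
    "jp_app": ["ファイル".encode("shift_jis")],
    "cn_app": ["中文".encode("gb2312")],
}
registry = EncodingRegistry(lambda app: RESOURCES.get(app, []), strict_candidates)
print(registry.on_launch("jp_app"))  # shift_jis
print(registry.on_launch("cn_app"))  # None: the bytes are valid in both codecs
```

The second result is the ambiguous case: the GB2312 bytes of "中文" also happen to be valid Shift-JIS, so code ranges alone cannot decide.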
First, the string resource obtaining means 1 obtains the application program's string resources; the encoding comparison means 2 then performs code identification; and the code identification and flagging means 3 identifies the application program's script encoding from the comparison result of means 2 and flags it. The cases that may arise when identifying the character encoding of an application program's string resources are as follows:
Let S = C₀C₁^C₂C₃C₄^C₅C₆…C₁₁ be the complete string resources of application AppS, where Cₓ denotes the (x+1)-th character of the string and '^' marks a word or short-phrase boundary within the string.
Suppose AppS uses one of the two national character encodings (LanA, LanB) shown in Fig. 4, where A denotes the code set of LanA characters and B denotes the code set of LanB characters.
Then all possible identification results are as follows:
1) If some Cₓ ∈ S satisfies Cₓ ∈ A − (A ∩ B),
conclude that the native-language code page of AppS is LanA.
2) If some Cₓ ∈ S satisfies Cₓ ∈ B − (A ∩ B),
conclude that the native-language code page of AppS is LanB.
3) If every Cₓ ∈ S satisfies Cₓ ∈ A ∩ B,
conclude that the native-language code page of AppS is LanA or LanB.
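The three cases above translate directly into a decision function over character codes. In this sketch the code sets A and B are passed in as membership predicates; the integer ranges used in the example are toy stand-ins for Fig. 4, and the behaviour for empty input or contradictory evidence (returning `None`) is an assumption, since the text does not cover it.

```python
from typing import Callable, Iterable, Optional

Code = int  # one character code, e.g. (lead << 8) | trail for a double-byte code


def identify(chars: Iterable[Code],
             in_a: Callable[[Code], bool],
             in_b: Callable[[Code], bool]) -> Optional[str]:
    codes = list(chars)
    if not codes:
        return None  # no string resources: fall back to default or manual setting
    only_a = any(in_a(c) and not in_b(c) for c in codes)  # case 1: C_x in A − (A ∩ B)
    only_b = any(in_b(c) and not in_a(c) for c in codes)  # case 2: C_x in B − (A ∩ B)
    if only_a and not only_b:
        return "LanA"
    if only_b and not only_a:
        return "LanB"
    if all(in_a(c) and in_b(c) for c in codes):            # case 3: all in A ∩ B
        return "LanA or LanB"
    return None  # contradictory evidence or codes outside A ∪ B


# Toy code sets: A = 0..59, B = 40..99, so A ∩ B = 40..59
A, B = set(range(0, 60)), set(range(40, 100))
in_a, in_b = (lambda c: c in A), (lambda c: c in B)
print(identify([10, 50], in_a, in_b))  # LanA
print(identify([50, 95], in_a, in_b))  # LanB
print(identify([45, 55], in_a, in_b))  # LanA or LanB
```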
Case 3) requires further processing: the semantic plausibility of string S is examined both under the assumption that it belongs to LanA and under the assumption that it belongs to LanB. In other words, intelligent recognition of national character encodings relies not only on each country's code distribution but also on a semantic validity check of strings whose codes fall in the overlap region. For this purpose we provide a knowledge base of the strings commonly found in localized Windows application programs of each country; in practice this works well.
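One way to picture the semantic check for case 3) is to decode the string resources under each candidate encoding and count how many results appear in a knowledge base of common UI strings. The knowledge-base entries below are hypothetical examples (the patent's knowledge base is not published), and scoring by exact-match counts is an assumed rule.

```python
from typing import Dict, List, Optional, Set

# Hypothetical knowledge base of common menu strings per encoding
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "gb2312": {"文件", "编辑", "帮助"},
    "shift_jis": {"ファイル", "編集", "ヘルプ"},
}


def semantic_check(resources: List[bytes], candidates: List[str]) -> Optional[str]:
    """Pick the candidate codec whose decoding yields the most known strings."""
    scores: Dict[str, int] = {}
    for codec in candidates:
        decoded = [r.decode(codec, errors="replace") for r in resources]
        known = KNOWLEDGE_BASE.get(codec, set())
        scores[codec] = sum(1 for d in decoded if d in known)
    best = max(scores.values(), default=0)
    winners = [c for c, s in scores.items() if s == best]
    # no hits, or a tie, leaves the encoding undecided
    return winners[0] if best > 0 and len(winners) == 1 else None


chinese_menu = ["文件".encode("gb2312"), "帮助".encode("gb2312")]
print(semantic_check(chinese_menu, ["gb2312", "shift_jis"]))  # gb2312
```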
In addition, for application programs that have no string resources, either the system default language code page can be used, or the encoding can be set manually via the manual encoding setting means 6.
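The fallback choice can be expressed as a small resolution function. The text does not state a priority order, so the order used here (manual setting first, then the identified encoding, then the system default) is an assumption, and `resolve_code_page` is a hypothetical name.

```python
from typing import Dict, Optional


def resolve_code_page(app: str,
                      identified: Optional[str],
                      manual: Dict[str, str],
                      system_default: str) -> str:
    """Choose the code page for an application (assumed priority order)."""
    if app in manual:           # set by the user through manual setting means 6
        return manual[app]
    if identified is not None:  # result of automatic identification
        return identified
    return system_default       # e.g. no string resources to identify


print(resolve_code_page("no_strings_app", None, {}, "gb2312"))  # gb2312
```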
Referring to Fig. 3, after the encoding setting means has set (flagged) the script encoding of an application program, the code execution means 5 causes the processing system, according to this setting, to apply the specific encoding stored in the mapped partition of the encoding storage area 4 (for example, running task 1 under the Japanese encoding, task 2 under the Chinese encoding, task 3 under the English single-byte encoding, and so on).
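The per-task behaviour of the code execution means can be sketched as a table of partitions plus a binding from each task to one partition. Python codec names stand in for the stored encodings, and `PARTITIONS` and `CodeExecutor` are hypothetical names; the point is that identical bytes are interpreted according to the partition of the task that owns them.

```python
from typing import Dict

# Hypothetical partitions of encoding storage area 4: one codec per script
PARTITIONS: Dict[str, str] = {
    "Japanese": "shift_jis",
    "Chinese": "gb2312",
    "English": "ascii",
}


class CodeExecutor:
    """Sketch of code execution means 5: each task is bound to one partition."""

    def __init__(self, partitions: Dict[str, str]) -> None:
        self.partitions = partitions
        self.task_script: Dict[str, str] = {}

    def bind(self, task: str, script: str) -> None:
        if script not in self.partitions:
            raise KeyError(f"no partition for script {script!r}")
        self.task_script[task] = script

    def render(self, task: str, raw: bytes) -> str:
        """Decode a task's bytes with the encoding of its own partition."""
        codec = self.partitions[self.task_script[task]]
        return raw.decode(codec)


executor = CodeExecutor(PARTITIONS)
executor.bind("task1", "Japanese")
executor.bind("task2", "Chinese")
executor.bind("task3", "English")
print(executor.render("task2", b"\xd6\xd0"))  # 中
print(executor.render("task1", b"\xd6\xd0"))  # ﾖﾐ
```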
The dynamic encoding scheme based on intelligent identification technology allows application software built for any language version of Windows to run simultaneously, without modification, under the same Windows OS. It ensures that the encoding in each application (whether the current process or a background process) is not garbled, and multi-language operation is available in every application. In addition, text files in different languages can also be used under this system once the user has converted their encoding in advance, and multi-language mixed editing is possible.

Claims (3)

1. An intelligent multi-language character processing system, comprising an encoding means used in a computer or printer system, characterized in that the encoding means comprises the following technical means:
(1) a means for setting the script encoding to which an application program belongs;
(2) a partitioned encoding storage area that stores the encodings of several scripts;
(3) a code execution means that, according to the script-encoding flag set for an application program by the encoding setting means, causes the processing system to apply, for that application program, the specific encoding stored in the mapped partition of the encoding storage area.
2. The intelligent multi-language character processing system according to claim 1, characterized in that the means for setting the script encoding of an application program comprises: (1) a string resource obtaining means that obtains the application program's string resources; (2) an encoding comparison means that compares each character code in the string resources obtained by the string resource obtaining means against the non-overlapping code region of each script's encoding; and (3) a code identification and flagging means that identifies the application program's script encoding from the comparison result of the encoding comparison means and flags it.
3. The intelligent multi-language character processing system according to claim 2, characterized in that the means for setting the script encoding of an application program further comprises a manual encoding setting means that the user can set manually.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 00110015 CN1303064A (en) 2000-01-05 2000-01-05 Intelligent multi-language character processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 00110015 CN1303064A (en) 2000-01-05 2000-01-05 Intelligent multi-language character processing system

Publications (1)

Publication Number Publication Date
CN1303064A true CN1303064A (en) 2001-07-11

Family

ID=4580045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 00110015 Pending CN1303064A (en) 2000-01-05 2000-01-05 Intelligent multi-language character processing system

Country Status (1)

Country Link
CN (1) CN1303064A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1758218B (en) * 2004-09-29 2010-09-01 微软公司 (Microsoft Corporation) Method and computer-readable medium for consistent configuration of language support across operating system and application programs
CN102194004A (en) * 2011-05-25 2011-09-21 福州瑞芯微电子有限公司 (Fuzhou Rockchip Electronics Co., Ltd.) Method for processing complex text by using Android browser


Similar Documents

Publication Publication Date Title
DE10162156B4 (en) The user navigation through multimedia file content supporting system and method
US5748953A (en) Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols
CN104881406B (en) Web page translation method and system
US20030023425A1 (en) Tokenizer for a natural language processing system
CN101079031A (en) Web page subject extraction system and method
EA200000321A1 (en) LANGUAGE AUTOMATIC IDENTIFICATION SYSTEM FOR MULTI-LANGUAGE OPTICAL SYMBOL RECOGNITION
US6055365A (en) Code point translation for computer text, using state tables
MXPA04003187A (en) Idiom recognizing document splitter.
CN111178061A (en) Multi-lingual word segmentation method based on code conversion
US7051278B1 (en) Method of, system for, and computer program product for scoping the conversion of unicode data from single byte character sets, double byte character sets, or mixed character sets comprising both single byte and double byte character sets
CN101008940A (en) Method and device for automatic processing font missing
Nayak et al. Odia characters recognition by training tesseract OCR engine
CN1303064A (en) Intelligent multi-language character processing system
CN100347706C (en) Method for converting PDF file to XML file
US20030200535A1 (en) System for program source code conversion
CN105573981A (en) Method and device for extracting Chinese names of people and places
CN1464430A (en) System for distinguishing organization names in Asian language writing system
CN1858701A (en) Realizing method for on-screen aid
Belaïd et al. Morphological tagging approach in document analysis of invoices
Carbonell et al. New approaches to machine translation
CN1290886A (en) Method system and computer program products for optimum byte and character processing
CN1369833A (en) Lexial system and method for conversion between unsimplified and simplified Chinese characters
US20050216495A1 (en) Conversion method for multi-language multi-code databases
CN117076806A (en) QML and WEB interface interconversion method
EP0175928A2 (en) Image understanding system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication