TW523681B - Vocabulary conversion system and conversion method between traditional Chinese and simplified Chinese

Info

Publication number
TW523681B
Authority
TW
Taiwan
Prior art keywords
chinese
simplified
converted
word
traditional chinese
Prior art date
Application number
TW90101023A
Other languages
Chinese (zh)
Inventor
Li-Wei Yang
Original Assignee
Eland Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eland Technologies Co Ltd
Priority to TW90101023A
Application granted
Publication of TW523681B

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a lexical (vocabulary-based) conversion system between traditional Chinese and simplified Chinese. The system includes a data receiving module, a word segmentation processing module, a simplified/traditional Chinese character conversion module, and an output module. The word segmentation processing module segments the data to be converted, received from the data receiving module, according to a long-word-first word segmentation method, forming compound words to be converted and single characters to be converted. The invention also includes a conversion method that uses this system to convert accurately between simplified and traditional Chinese characters.

Description

V. Description of the Invention

[Field of the Invention]
The present invention relates to a lexical conversion system between traditional Chinese and simplified Chinese, and in particular to one that first segments text into words using a long-word-first word segmentation method and then converts the Chinese characters. The invention also covers a lexical conversion method that uses this system to convert between simplified and traditional Chinese characters.

[Prior Art]
Chinese computer systems in use around the world today employ two prevailing Chinese character systems: the traditional Chinese system, used in Taiwan and Hong Kong, and the simplified Chinese system, used in mainland China and Singapore. Text edited under one character system cannot be processed directly by the other: a traditional Chinese system cannot process simplified Chinese text, and a simplified Chinese system cannot process traditional Chinese text.

For a computer to handle both kinds of Chinese data, a simplified/traditional Chinese character conversion system must first convert the characters the computer cannot recognize into characters it can. A computer running either Chinese system can then process simplified and traditional characters at the same time, without being limited by the system it uses.

Conventional simplified/traditional conversion systems all convert character by character. That is, they take each single Chinese character as the processing unit, look it up in a simplified/traditional single-character correspondence table, and convert the text one character at a time.

Each character is thereby converted into one that the computer system can recognize.

For example, in the conventional conversion method of Fig. 3(a), a data receiving step 701 first takes one Chinese character from the Chinese document 70 to be converted. Next, in a simplified/traditional character conversion step 702, the extracted character is looked up in a built-in single-character correspondence table 80 to find its counterpart. If the table contains a counterpart, the character is replaced by it. This lookup and replacement is repeated for every character of document 70, so that every character with an entry in table 80 is replaced by the desired counterpart. Finally, the result is output in an output step 703. Document 70 is thus converted into a Chinese document 90 compatible with the computer system, achieving the goal of converting a simplified Chinese document into traditional Chinese or a traditional Chinese document into simplified Chinese.
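The conventional per-character procedure of Fig. 3(a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any actual prior-art system: the function name `convert_per_character` and the tiny `CHAR_TABLE`, which stands in for the single-character correspondence table 80, are invented for the example.

```python
# Hypothetical stand-in for the single-character correspondence table 80
# (traditional -> simplified); a real table would hold thousands of entries.
CHAR_TABLE = {"馬": "马", "鈴": "铃", "種": "种"}


def convert_per_character(document: str, table: dict[str, str]) -> str:
    """Convert a document one character at a time (steps 701 to 703)."""
    converted = []
    for char in document:                        # step 701: take one character
        converted.append(table.get(char, char))  # step 702: replace it if the table has an entry
    return "".join(converted)                    # step 703: output the converted document


if __name__ == "__main__":
    print(convert_per_character("馬鈴薯", CHAR_TABLE))
```

Because each character is looked up in isolation, the term 「馬鈴薯」 comes out as 「马铃薯」: the script changes, but the wording does not.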

For example, when the input Chinese document 70 to be converted is the sentence 「…物」, each of its Chinese characters 71 (Fig. 3(b)) is converted one by one into the corresponding translated character 711 (Fig. 3(c)).

However, the conventional conversion system described above has many shortcomings in practical text conversion, because some synonymous terms differ between the communities that use simplified and traditional characters, sometimes even in the number of characters. Conventional character-for-character conversion therefore cannot turn such a term into the desired term of the other character system. Taking the foregoing example, the English word "potato" is called 「馬鈴薯」 or 「洋芋」 in Taiwan.

In mainland China, however, people call it 「土豆」. When a traditional Chinese document is converted character by character, the traditional term 「馬鈴薯」 therefore cannot be converted accurately into the simplified term 「土豆」, which makes the converted document hard for readers to understand. The reverse also holds: when simplified text is converted character by character, the term 「土豆」 cannot be converted into the traditional term 「馬鈴薯」 or 「洋芋」. A solution to this shortcoming is therefore needed.

[Summary of the Invention]
In view of the above problem, one object of the invention is to provide a lexical conversion system between traditional Chinese and simplified Chinese that converts simplified and traditional characters reliably.

To achieve this object, the lexical conversion system of the invention comprises a data receiving module, a word segmentation processing module, a simplified/traditional Chinese character conversion module, and an output module, wherein the word segmentation processing module segments the data into words according to a long-word-first word segmentation method. Under this method, the word segmentation processing module performs a lexical correspondence operation on all the text in the data to be converted and preferentially splits off the compound word that matches the longest compound word held in the module, which becomes a compound word to be converted. Because the system can convert whole compound words, conversion between traditional and simplified characters becomes more reliable.

Another object of the invention is to provide a lexical conversion method between traditional Chinese and simplified Chinese that uses the above system to convert simplified and traditional characters.

[Detailed Description of the Preferred Embodiment]
The lexical conversion system between traditional Chinese and simplified Chinese according to a preferred embodiment of the invention is described below with reference to the drawings, in which identical elements carry identical reference symbols.

Referring to Fig. 1 and Figs. 2(a) to 2(e), the lexical conversion system 1 of the invention includes a data receiving module 10, a word segmentation processing module 20, a simplified/traditional Chinese character conversion module 30, and an output module 40.

The data receiving module 10 receives the data to be converted 50 and brings it into the conversion system 1. One example of the data receiving module 10 is a combination of software and hardware that receives any simplified or traditional Chinese character data over the Internet. Examples of the data to be converted 50 include any simplified or traditional Chinese data, information, documents, and the like shown on a display.

The word segmentation processing module 20 segments the data to be converted 50 from the data receiving module 10 into words according to the long-word-first word segmentation method, forming compound words to be converted 51 and single characters to be converted 52. It includes a main electronic dictionary 22 and a correspondence operation sub-module 24.

The main electronic dictionary 22 stores a large number of simplified and traditional Chinese compound words and single characters. The correspondence operation sub-module 24 performs a lexical correspondence operation on the received data to be converted 50, based on the contents of the main electronic dictionary 22, and segments the data accordingly. This correspondence operation proceeds long word first.
The long-word-first word segmentation method means that the word segmentation processing module 20 performs a lexical correspondence operation on all the text in the data to be converted 50, based on the main electronic dictionary 22, and preferentially splits off from the data the compound word that matches the longest compound word in the module 20, which becomes a compound word to be converted 51.

For example, in the traditional Chinese sentence 「馬鈴薯是一種植物」 ("A potato is a kind of plant") of the data to be converted 50 (Fig. 2(b)), the term 「馬鈴薯」 is split off first as a compound word to be converted 51; it is not split into 「馬」 and 「鈴薯」, or into 「馬鈴」 and 「薯」. The rest of the data to be converted 50 is then segmented, forming the compound words to be converted 51 「一種」 and 「植物」 and finally leaving the single character to be converted 52 「是」 (Fig. 2(c)).

The simplified/traditional Chinese character conversion module 30 includes a simplified/traditional Chinese character dictionary 32, a simplified/traditional correspondence operation sub-module 34, and a conversion operation sub-module 36. Following the traditional/simplified correspondences for compound words and for single characters respectively, it converts the compound words to be converted 51 and the single characters to be converted 52 from the word segmentation processing module 20 into the desired script.

The simplified/traditional Chinese character dictionary 32 includes a simplified/traditional term dictionary 322, which stores a large number of correspondences between simplified and traditional terms, and a simplified/traditional character correspondence table 324, which stores the correspondences between simplified and traditional single characters.

The simplified/traditional correspondence operation sub-module 34 uses the contents of the dictionary 32 to look up the compound words to be converted 51 and the single characters to be converted 52 from the word segmentation processing module 20. For example, in Fig. 2(c), the compound word to be converted 51 「馬鈴薯」 maps to the corresponding compound word 511 「土豆」, the compound word to be converted 51 「一種」 maps to the corresponding compound word 511 「一种」, and the single character to be converted 52 「是」 maps to the corresponding single character 521 「是」.
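The long-word-first segmentation performed by the word segmentation processing module 20 can be sketched with forward maximum matching, which tries the longest dictionary entry first at each position. That choice is an assumption: the patent does not spell out the matching procedure at this level of detail. `MAIN_DICTIONARY` is a hypothetical miniature of the main electronic dictionary 22, and `segment_long_word_first` is an illustrative name.

```python
# Hypothetical miniature of the main electronic dictionary 22 (compound words only).
# The shorter entries "馬鈴" and "鈴薯" are included to show that the longest match wins.
MAIN_DICTIONARY = {"馬鈴薯", "馬鈴", "鈴薯", "一種", "植物"}


def segment_long_word_first(text: str, dictionary: set[str]) -> list[str]:
    """Split text into compound words and single characters, longest match first."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i, then shorter ones.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)  # compound word to be converted (51)
                i += length
                break
        else:
            tokens.append(text[i])        # no compound matched: single character (52)
            i += 1
    return tokens


if __name__ == "__main__":
    print(segment_long_word_first("馬鈴薯是一種植物", MAIN_DICTIONARY))
```

With this toy dictionary the sentence is split into 「馬鈴薯」, 「是」, 「一種」, and 「植物」, matching Fig. 2(c) as described.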
The conversion operation sub-module 36 uses the results of the simplified/traditional correspondence operation sub-module 34 and, following the traditional/simplified correspondences for compound words and single characters respectively, replaces the compound words to be converted 51 and the single characters to be converted 52 with the corresponding compound words 511 and corresponding single characters 521, turning the data to be converted 50 into the desired converted data 60. For example, according to the correspondences of Fig. 2(d), the compound word to be converted 51 「馬鈴薯」 is converted into the corresponding compound word 511 「土豆」, the compound word to be converted 51 「一種」 is converted into the corresponding compound word 511 「一种」, and the single character to be converted 52 「是」 is converted into the corresponding single character 521 「是」.

Finally, any other characters in the segmented data to be converted 50 that do not appear in the results of sub-module 34 are output directly, without passing through the conversion of sub-module 36. Punctuation marks (not shown) are one example.

The output module 40 may be any software or hardware capable of outputting the conversion result, such as a display screen. In the foregoing example, the data to be converted 50 「馬鈴薯是一種植物」, processed by the conversion system 1 of the invention, is converted into the converted data 60 「土豆是一种植物」 (Fig. 2(e)).
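The two-level lookup performed by the conversion module 30 might look like the sketch below. It assumes the text has already been segmented; `TERM_DICT` and `CHAR_TABLE` are hypothetical miniatures of the term dictionary 322 and the character correspondence table 324, and `convert_tokens` is an illustrative name rather than anything defined in the patent.

```python
# Hypothetical miniature of the term dictionary 322 (traditional -> simplified terms).
TERM_DICT = {"馬鈴薯": "土豆", "一種": "一种", "植物": "植物"}
# Hypothetical miniature of the character correspondence table 324.
CHAR_TABLE = {"是": "是", "馬": "马", "種": "种"}


def convert_tokens(tokens: list[str]) -> str:
    """Convert segmented tokens: compound words via the term dictionary,
    single characters via the character table, anything else unchanged."""
    output = []
    for token in tokens:
        if token in TERM_DICT:      # compound word 51 -> corresponding compound word 511
            output.append(TERM_DICT[token])
        elif token in CHAR_TABLE:   # single character 52 -> corresponding character 521
            output.append(CHAR_TABLE[token])
        else:                       # no entry (e.g. punctuation): passed straight to the output
            output.append(token)
    return "".join(output)


if __name__ == "__main__":
    print(convert_tokens(["馬鈴薯", "是", "一種", "植物", "。"]))
```

Checking the term dictionary before the character table is what lets 「馬鈴薯」 become 「土豆」 instead of being converted character by character.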

In the conversion system 1 of the invention, the lexical correspondence operation proceeds long word first, so that the compound word matching the longest compound word in the word segmentation processing module 20 becomes a compound word to be converted 51. The simplified/traditional Chinese character conversion module 30 then converts the compound word to be converted 51 into the appropriate term.

As in the foregoing example, in the traditional Chinese sentence 「馬鈴薯是一種植物」 the term 「馬鈴薯」 is selected first, and the conversion module 30 converts it into the appropriate simplified Chinese term 「土豆」. Next, 「一種」 and 「植物」 are selected and converted separately. Single characters are then converted character for character, finally producing the desired converted data 60 「土豆是一种植物」. Every term and single character of the sentence is thus converted reliably, and the poor term conversion of conventional character-for-character systems does not occur in the conversion system 1 of the invention.
The lexical conversion method between traditional Chinese and simplified Chinese of the invention uses the conversion system 1 described above to convert simplified and traditional characters. The method is described in detail below with reference to Fig. 1 and Figs. 2(a) to 2(e).

The method mainly includes the following steps: a data receiving step 101, a word segmentation processing step 102, a simplified/traditional Chinese character conversion step 103, and an output step 104.

In the data receiving step 101, a data receiving module 10 reads the data to be converted 50. As noted above, the data receiving module 10 may be a combination of software and hardware that receives any simplified or traditional Chinese character data over the Internet. The received data to be converted 50 is passed to the next step for processing.

Next comes the word segmentation processing step 102, in which a word segmentation processing module 20 segments the data to be converted 50 from the data receiving module 10 into words according to the long-word-first word segmentation method, forming compound words to be converted 51 and single characters to be converted 52. The long-word-first word segmentation method is as described above and is not explained again here.

As in the foregoing example, in the traditional Chinese sentence 「馬鈴薯是一種植物」 of the data to be converted 50 (Fig. 2(b)), the term 「馬鈴薯」 is split off first as a compound word to be converted 51; it is not split into 「馬」 and 「鈴薯」, or into 「馬鈴」 and 「薯」. The rest of the data is then segmented, forming the compound words to be converted 51 「一種」 and 「植物」 and finally leaving the single character to be converted 52 「是」 (Fig. 2(c)).

Next, in the simplified/traditional Chinese character conversion step 103, a simplified/traditional Chinese character conversion module 30 follows the traditional/simplified correspondences for compound words and for single characters respectively (using, as described above, the simplified/traditional term dictionary 322 and the simplified/traditional character correspondence table 324 of the dictionary 32) and converts the compound words to be converted 51 and the single characters to be converted 52 from the word segmentation processing module 20 into the desired script (a conversion performed, as described above, by the conversion operation sub-module 36).

For example, in Fig. 2(d), the compound word to be converted 51 「馬鈴薯」 maps to the corresponding compound word 511 「土豆」, the compound word to be converted 51 「一種」 maps to the corresponding compound word 511 「一种」, and the single character to be converted 52 「是」 maps to the corresponding single character 521 「是」. Then, according to these correspondences, 「馬鈴薯」 is converted into 「土豆」, 「一種」 into 「一种」, and 「是」 into 「是」 (Fig. 2(d)).

Finally, in the output step 104, an output module 40 outputs the conversion result of the simplified/traditional Chinese character conversion module 30 and outputs directly the text in the data to be converted 50 that was not converted. This completes the lexical conversion method of the invention. After processing by this method, the data to be converted 50 「馬鈴薯是一種植物」 has been converted into the converted data 60 「土豆是一种植物」 (Fig. 2(e)).

As described above, because the conversion method of the invention uses the long-word-first word segmentation method, the poor term conversion of conventional simplified/traditional conversion methods does not occur in the method of the invention.

The foregoing is illustrative only and not restrictive. Any equivalent modification or change that does not depart from the spirit and scope of the invention shall fall within the appended claims.

[Brief Description of the Drawings]
Fig. 1 is a schematic diagram showing the lexical conversion system between traditional Chinese and simplified Chinese according to a preferred embodiment of the invention.
Fig. 2(a) is a schematic diagram showing the steps of the lexical conversion method between traditional Chinese and simplified Chinese according to a preferred embodiment of the invention.
Fig. 2(b) is a schematic diagram showing the data to be converted that is to be processed by the steps of the conversion method of Fig. 2(a).
Fig. 2(c) is a schematic diagram showing the data to be converted after the word segmentation processing step of the conversion method of Fig. 2(a), cut into compound words to be converted and single characters to be converted.
Fig. 2(d) is a schematic diagram showing, for the conversion method of Fig. 2(a), the correspondences between the compound words and single characters to be converted and their counterparts.
Fig. 2(e) is a schematic diagram showing the converted data produced by the conversion method of Fig. 2(a).
Fig. 3(a) is a schematic diagram showing the steps of the conventional simplified/traditional Chinese character conversion method.
Fig. 3(b) is a schematic diagram showing the Chinese document to be converted by the steps of the conventional method of Fig. 3(a).
Fig. 3(c) is a schematic diagram showing the Chinese document already converted by the steps of the conventional method of Fig. 3(a).

[Description of Reference Symbols]
1 Lexical conversion system between traditional Chinese and simplified Chinese
101 Data receiving step
102 Word segmentation processing step
103 Simplified/traditional Chinese character conversion step
104 Output step
10 Data receiving module
20 Word segmentation processing module
22 Main electronic dictionary
24 Correspondence operation sub-module
30 Simplified/traditional Chinese character conversion module
32 Simplified/traditional Chinese character dictionary
322 Simplified/traditional term dictionary
324 Simplified/traditional character correspondence table

34 Simplified/traditional correspondence operation sub-module
36 Conversion operation sub-module
40 Output module
50 Data to be converted
51 Compound word to be converted
511 Corresponding compound word
52 Single character to be converted
521 Corresponding single character
60 Converted data
70 Chinese document to be converted
71 Chinese character
711 Translated character
80 Single-character correspondence table
90 Converted Chinese document


Claims (1)

VI. Scope of Patent Application

1. A vocabulary-based conversion system between Traditional Chinese and Simplified Chinese, comprising:
a data receiving module for receiving data to be converted;
a word segmentation processing module that, according to a word segmentation method, performs vocabulary segmentation on the data to be converted from the data receiving module, forming compound words to be converted and single characters to be converted, respectively;
a Simplified/Traditional Chinese character conversion module that, according to the Traditional/Simplified correspondence of compound words and of single characters respectively, converts the compound words to be converted and the single characters to be converted from the word segmentation processing module into the desired character form; and
an output module for outputting the conversion result of the Simplified/Traditional Chinese character conversion module and for directly outputting any text in the data to be converted that has not been converted.

2. The vocabulary-based conversion system between Traditional Chinese and Simplified Chinese according to claim 1, wherein the word segmentation method is a long-word-first segmentation method, in which the word segmentation processing module performs a vocabulary matching operation on all text in the data to be converted and preferentially segments out of the data to be converted the compound words that match the longest compound words in the word segmentation processing module, these becoming the compound words to be converted.

3. The vocabulary-based conversion system between Traditional Chinese and Simplified Chinese according to claim 1, wherein the word segmentation processing module comprises a master electronic dictionary storing Simplified and Traditional Chinese compound words and single characters, and a matching sub-module that performs the vocabulary matching operation using the data in the master electronic dictionary.

4. The vocabulary-based conversion system between Traditional Chinese and Simplified Chinese according to claim 1, wherein the Simplified/Traditional Chinese character conversion module comprises a Simplified/Traditional Chinese character dictionary; a Simplified/Traditional correspondence sub-module that uses the data in the Simplified/Traditional Chinese character dictionary to perform correspondence operations on the compound words to be converted and the single characters to be converted from the word segmentation processing module, respectively; and a conversion sub-module that performs the conversion operation using the results of the Simplified/Traditional correspondence sub-module.

5. The vocabulary-based conversion system between Traditional Chinese and Simplified Chinese according to claim 4, wherein the Simplified/Traditional Chinese character dictionary comprises a Simplified/Traditional terminology dictionary and a Simplified/Traditional Chinese character correspondence table.

6. The vocabulary-based conversion system between Traditional Chinese and Simplified Chinese according to claim 1, wherein the data receiving module receives data via the Internet.

7. The vocabulary-based conversion system between Traditional Chinese and Simplified Chinese according to claim 1, wherein the output module is a display screen.

8. A vocabulary-based conversion method between Traditional Chinese and Simplified Chinese, comprising the following steps:
a data receiving step, in which a data receiving module reads data to be converted;
a word segmentation processing step, in which a word segmentation processing module, according to a word segmentation method, performs vocabulary segmentation on the data to be converted from the data receiving module, forming compound words to be converted and single characters to be converted, respectively;
a Simplified/Traditional Chinese character conversion step, in which a Simplified/Traditional Chinese character conversion module, according to the Traditional/Simplified correspondence of compound words and of single characters respectively, converts the compound words to be converted and the single characters to be converted from the word segmentation processing module into the desired character form; and
an output step, in which an output module outputs the conversion result of the Simplified/Traditional Chinese character conversion module and directly outputs any text in the data to be converted that has not been converted.

9. The vocabulary-based conversion method between Traditional Chinese and Simplified Chinese according to claim 8, wherein the word segmentation method is a long-word-first segmentation method, in which the word segmentation processing module performs a vocabulary matching operation on all text in the data to be converted and preferentially segments out of the data to be converted the compound words that match the longest compound words in the word segmentation processing module, these becoming the compound words to be converted.
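
The segmentation and conversion steps recited in the claims can be illustrated with a minimal sketch. It assumes that "long-word-first" segmentation behaves like greedy forward longest-match lookup against a dictionary, which is one common reading and is not confirmed by the claims as the exact procedure. The table contents and the names `PHRASE_TABLE`, `CHAR_TABLE`, `segment_longest_first`, and `convert` are hypothetical and chosen only for illustration; the example converts Simplified to Traditional Chinese.

```python
# Hypothetical toy tables; the patent does not list its dictionary contents.
# Simplified -> Traditional compound words (playing the role of the terminology dictionary).
PHRASE_TABLE = {
    "头发": "頭髮",
    "理发": "理髮",
    "理发师": "理髮師",
    "发展": "發展",
    "软件": "軟體",
}
# Simplified -> Traditional single characters (playing the role of the character table).
CHAR_TABLE = {"头": "頭", "发": "發", "师": "師", "软": "軟"}


def segment_longest_first(text, phrases):
    """Split text into ("compound", word) and ("single", char) tokens.

    Assumption: greedy forward matching that always tries the longest
    dictionary entry first at each position.
    """
    max_len = max((len(p) for p in phrases), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try candidate lengths from the longest down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in phrases:
                tokens.append(("compound", candidate))
                i += size
                break
        else:
            # No compound word starts here, so emit a single character.
            tokens.append(("single", text[i]))
            i += 1
    return tokens


def convert(text, phrase_table, char_table):
    """Convert compounds by phrase table and single characters by character table."""
    out = []
    for kind, token in segment_longest_first(text, phrase_table):
        if kind == "compound":
            out.append(phrase_table[token])
        else:
            # Characters without a table entry are output unchanged.
            out.append(char_table.get(token, token))
    return "".join(out)


print(convert("我的头发", PHRASE_TABLE, CHAR_TABLE))  # 我的頭髮
```

With these toy tables, a purely character-by-character lookup would map 发 to 發 and produce 頭發, whereas matching the compound 头发 first yields 頭髮. Text with no entry in either table, such as Latin letters or punctuation, passes through unchanged, mirroring the output module's direct output of unconverted text.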
TW90101023A 2001-01-17 2001-01-17 Vocabulary conversion system and conversion method between traditional Chinese and simplified Chinese TW523681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW90101023A TW523681B (en) 2001-01-17 2001-01-17 Vocabulary conversion system and conversion method between traditional Chinese and simplified Chinese

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW90101023A TW523681B (en) 2001-01-17 2001-01-17 Vocabulary conversion system and conversion method between traditional Chinese and simplified Chinese

Publications (1)

Publication Number Publication Date
TW523681B (en) 2003-03-11

Family

ID=28037013

Family Applications (1)

Application Number Title Priority Date Filing Date
TW90101023A TW523681B (en) 2001-01-17 2001-01-17 Vocabulary conversion system and conversion method between traditional Chinese and simplified Chinese

Country Status (1)

Country Link
TW (1) TW523681B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8600729B2 (en) 2010-11-03 2013-12-03 Institute For Information Industry Method and system for co-occurrence-based text conversion
CN103885941A (en) * 2012-12-24 2014-06-25 鸿富锦精密工业(深圳)有限公司 Patent application document conversion system and method

Similar Documents

Publication Publication Date Title
US9805024B2 (en) Anaphora resolution for semantic tagging
CN104485105B (en) A kind of electronic health record generation method and electronic medical record system
US11158349B2 (en) Methods and systems of automatically generating video content from scripts/text
US20200067860A1 (en) File sending in instant messaging application
WO2021034376A1 (en) Example based entity extraction, slot filling and value recommendation
CN110096701A (en) Message conversion processing method and device, storage medium and electronic equipment
KR20220130863A (en) Apparatus for Providing Multimedia Conversion Content Creation Service Based on Voice-Text Conversion Video Resource Matching
JP2004318510A (en) Original and translation information creating device, its program and its method, original and translation information retrieval device, its program and its method
TW523681B (en) Vocabulary conversion system and conversion method between traditional Chinese and simplified Chinese
Kirmani et al. ShortMail: an email summarizer system
CN117688220A (en) Multi-mode information retrieval method and system based on large language model
US11645472B2 (en) Conversion of result processing to annotated text for non-rich text exchange
JP2002251412A (en) Document retrieving device, method, and storage medium
KR20220130864A (en) A system for providing a service that produces voice data into multimedia converted contents
JP3465615B2 (en) Search method and apparatus and recording medium on which the method is programmed and recorded
JP2010191851A (en) Article feature word extraction device, article feature word extraction method and program
Keerthana et al. Transfiguring Handwritten Text and Typewritten Text
KR102435244B1 (en) An apparatus for providing a producing service of transformed multimedia contents using matching of video resources
Banari Applications of artificial intelligence for the resource-scarce cultural heritage domain: from language and image processing to multi-modality
WO2024166155A1 (en) Information processing device, information processing method, and program
Magomedov et al. Bazur–Linking the Languages of the Caucasus
JP4617015B2 (en) Document display device, document display method, and program
KR20220130861A (en) Method of providing production service that converts audio into multimedia content based on video resource matching
KR20220130859A (en) A method of providing a service that converts voice information into multimedia video contents
KR20220130862A (en) A an apparatus for providing a producing service of transformed multimedia contents

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees