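WO2005116865A2 - Language discrimination device, translation device, translation server, language discrimination method, and translation processing method - Google Patents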
- Publication number
- WO2005116865A2 (PCT/JP2005/009890)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- language
- character
- character code
- undefined
- character string
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/263—Language identification
Definitions
- Language determination device, translation device, translation server, language determination method, and translation processing method
- the present invention relates to a language discriminating apparatus, a translation apparatus, a translation server, a language discrimination method, and a translation processing method used for automatically discriminating the language of a WEB (World Wide Web) page accessed by a user via the Internet and translating it into the language used by the user.
- the language used for the WEB page is automatically determined (see, for example, JP-A-2000-330992).
- the present invention has been made to solve such a problem, and provides a language discriminating apparatus and a language discrimination method capable of performing language discrimination automatically and reliably.
- it is also an object of the present invention to provide a translation device, a translation server, and a translation processing method that use this language discrimination.
- the present invention provides the following means.
- a language discriminating device comprising: storage means for storing, for each language, an undefined character code list of the codes to which no character is assigned in that language's character code table; collating means for collating the character code of each character included in a character string subject to language determination with the undefined character code list of each language stored in the storage means; and discriminating means for discriminating, as the language of the character string, a language for which, as a result of the collation by the collating means, no corresponding character is included in the character string.
- the character code of each character included in the character string to be subjected to language discrimination is collated with the undefined character code list of each language stored in the storage means.
- as a result of the collation, a language for which no character corresponding to one of its undefined character codes is included in the character string is determined to be the language of the character string.
- since the discrimination uses the undefined character codes unique to each language, language determination can be performed reliably and automatically, without the risk, present when relying on the character encoding declared on a homepage (WEB page), that determination becomes difficult because several languages share a common encoding.
- the collating means collates the character code of each character included in the character string with the undefined character code list of each language, character by character, so the candidate languages can be narrowed down quickly and reliably.
- a translation device comprising: storage means for storing, for each language, an undefined character code list of the codes to which no character is assigned in that language's character code table; collating means for collating the character code of each character included in a character string subject to language determination with the undefined character code list of each language stored in the storage means; discriminating means for discriminating, as the language of the character string, a language for which, as a result of the collation by the collating means, no corresponding character is included in the character string; and translation means for translating the character string whose language has been determined by the discriminating means into another language.
- the character string whose language has been determined is translated into another language, so reliable language determination allows translation into an appropriate language.
- the character code is collated with the undefined character code list of each language, so that the collation processing and the narrowing down of the language are quickly performed.
- appropriate translation can be promptly performed.
- a translation server comprising: storage means for storing, for each language, an undefined character code list of the codes to which no character is assigned in that language's character code table; collating means for collating the character code of each character included in a character string displayed on a WEB page accessed by a user via a user terminal with the undefined character code list of each language stored in the storage means; determining means for determining, as the language of the character string, a language for which no character corresponding to an undefined character code is included in the character string; translation means for translating the character string of the WEB page into another language; and display control means for displaying, on the user terminal, a WEB page reflecting the translation result.
- the language of the WEB page accessed by the user is automatically determined and a WEB page reflecting the translation result is displayed on the user terminal, so the user can keep surfing the web without being aware of language differences.
- the collation processing and the narrowing down of languages can be performed quickly and reliably, and appropriate translation can be performed promptly, so a WEB page reflecting the translation result can be displayed quickly.
- a language discrimination method comprising a language discriminating step of collating the character code of each character included in a character string subject to language determination with, for each of a plurality of languages, an undefined character code list of the codes to which no character is assigned in that language's character code table, and, as a result of the collation, identifying as the language of the character string a language for which no character corresponding to an undefined character code is included in the character string.
- unlike referring to the character encoding declared on a homepage (WEB page), where several languages may share a common encoding, the language can be determined reliably and automatically, without the risk that a shared encoding makes the language difficult to determine.
- a translation processing method comprising: a step of collating the character code of each character included in a character string subject to language discrimination with, for each of a plurality of languages, an undefined character code list of the codes to which no character is assigned in that language's character code table, and, as a result of the collation, determining as the language of the character string a language among the plurality of languages for which no character corresponding to an undefined character code is included in the character string; and a step of translating the character string whose language has been determined into another language.
- the character string whose language has been determined is translated into another language, so reliable language determination allows translation into an appropriate language.
- a translation processing method comprising: translating the character string of the web page into another language; and displaying a web page on which the translation result is reflected on the user terminal.
- the language of the WEB page accessed by the user is automatically determined, and a WEB page reflecting the translation result is displayed on the user terminal, so the user can keep surfing the web without being aware of language differences.
- FIG. 1 is a block diagram showing a schematic configuration of a web page translation system according to an embodiment of the present invention.
- FIG. 2 is a flowchart showing the operation of a translation server used in the web page translation system of FIG. 1.
- FIG. 3 is a flowchart showing the content of the language determination process in S4 of the flowchart of FIG. 2.
- FIGS. 4(a) and 4(b) are diagrams showing examples of character code tables for explaining the basic concept.
- FIG. 1 is a block diagram showing a schematic configuration of a web page translation system according to one embodiment of the present invention.
- reference numeral 1 denotes a user terminal such as a personal computer, which can connect to the translation server 3 via the Internet 2 using a Web browser 11.
- the translation server 3 includes a network interface unit 31, an undefined character code list storage unit 32, a web page storage unit 33, a language determination unit 34, a translation unit 35, a translation file storage unit 36, a web page restructuring unit 37, and a control unit 38.
- the network interface unit 31 functions as an input/output unit that connects the Internet 2 and the translation server 3.
- the undefined character code list storage unit 32 stores in advance, for each of a plurality of languages, a list of undefined character codes, that is, codes in that language's character code table to which no character is assigned.
- for example, for language A shown in FIG. 4(a), the undefined character codes A1 to A16 in its character code table are stored as its undefined character code list.
- likewise, for language B shown in FIG. 4(b), the undefined character codes B1 to B6 in its character code table are stored as its undefined character code list.
- for the other languages as well, an undefined character code list is stored in advance.
- preferably, undefined character code lists are stored for all languages used on the Internet, or at least for the main languages; however, the present invention is not limited to this, and it suffices to store undefined character code lists for a plurality of languages.
- the WEB page storage unit 33 stores the contents of the WEB page at the address specified by the user on the user terminal 1 by using a URL (Uniform Resource Locator).
- the language discriminating section 34 is for automatically discriminating the language of the character string displayed on the web page stored in the web page storage section 33. The specific contents of the determination processing will be described later.
- the translation unit 35 includes a plurality of translation engines, one for each language, and translates the character string of the WEB page whose language has been determined by the language determination unit 34 into the language used by the user. For example, if a web page accessed by a Japanese user is determined to be a page on an English site, its content is translated into Japanese; likewise, a web page displayed in Chinese is translated into Japanese.
- the translation file storage unit 36 stores the translation result by the translation unit 35, and the WEB page reconstruction unit 37 reconstructs a WEB page reflecting the translation result.
- the control unit 38 controls the translation server 3 as a whole. For example, it fetches the web page at the URL specified by the user and stores it in the web page storage unit 33, has language determination and translation performed, stores the translated file in the translation file storage unit 36, has the web page restructuring unit 37 reconstruct a web page reflecting the translation result, and transmits the reconstructed web page to the user terminal 1 for display.
- in S1, the control unit 38 of the translation server 3 determines whether a URL has been designated; if not (NO in S1), it terminates the processing. If a URL has been designated (YES in S1), in S2 the control unit 38 acquires the contents of the WEB page specified by the URL via the Internet 2 and the network interface unit 31, and in S3 stores the acquired contents in the WEB page storage unit 33.
- in S4, the language discriminating section 34 determines the language of the character string displayed on the WEB page stored in the WEB page storage section 33. This language discrimination processing will be described later.
- when the language has been determined, in S5 the translation unit 35 translates the character string of the WEB page into the user's language (for example, Japanese) using the translation engine for the determined language, and then, in S6, saves the translation file in the translation file storage unit 36.
- in S7, the web page restructuring unit 37 reconstructs the web page, replacing its character strings with the translated versions, from the contents of the web page stored in the web page storage unit 33 and the translation file stored in the translation file storage unit 36. Then, in S8, the control unit 38 transmits the reconstructed contents of the WEB page to the user terminal 1 via the network interface unit 31, and the processing on the translation server 3 side ends.
- the translated WEB page transmitted to the user terminal 1 is displayed on a display device (not shown) of the user terminal 1 so that the user can view the accessed WEB page in the user's language.
- FIG. 3 is a flowchart showing the content of the language discrimination process of S4 in the flowchart of FIG. 2.
- the language determination unit 34 extracts the character code of the first character of the character string and determines whether it corresponds to an undefined character code of one of the languages stored in the undefined character code list storage unit 32, for example language A shown in FIG. 4(a) (A1 to A16 in FIG. 4(a)).
- it then determines whether the first character of the character string corresponds to an undefined character code of another language, for example language B shown in FIG. 4(b) (B1 to B6 in FIG. 4(b)).
- in this way, the first character of the character string is collated with the undefined character code lists of all languages stored in the undefined character code list storage unit 32, and any language whose undefined character codes include that character is excluded from the language candidates.
- in S46, it is determined whether the collation of the first character with all languages has been completed. If not (NO in S46), the process returns to S42 and collation continues until it has been completed for all languages. If the first character has been collated with all languages (YES in S46), it is determined in S47 whether the language candidates have been narrowed down to one.
- if more than one candidate remains (NO in S47), the process returns to S42, and the language candidates are further narrowed down by the collation processing of S42 to S46 using the second character of the character string. The collation is repeated for the third, fourth, and subsequent characters until the language candidates have been narrowed down to one.
- the collation processing and the narrowing down of the language can be performed quickly and reliably.
- once the candidates have been narrowed down to one (YES in S47), the remaining language candidate is determined to be the language used on the WEB page. Alternatively, the language may be determined only after every character of the character string has been collated with the undefined character code lists of all languages.
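The discrimination rule shared by the language discriminating device and the language discrimination method described above can be written compactly as a set condition. The notation is introduced here for illustration only and does not appear in the patent: $\Lambda$ is the set of languages whose lists are stored, $U_\ell$ is the undefined character code list of language $\ell$, and $c(x_i)$ is the character code of the $i$-th character of a string $s = x_1 x_2 \dots x_n$.

$$\hat{L}(s) = \{\, \ell \in \Lambda \;:\; \{c(x_1), \dots, c(x_n)\} \cap U_\ell = \emptyset \,\}$$

When $\hat{L}(s)$ contains exactly one language, that language is taken as the language of $s$.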
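The undefined character code list storage unit 32 needs one list per language, such as A1 to A16 for language A in FIG. 4(a). One concrete way to build such lists, assuming single-byte Windows code pages stand in for the per-language character code tables (an assumption; the patent names no encoding), is to ask Python's built-in codecs which byte values have no character assigned. The helper names `undefined_codes` and `build_undefined_lists` are hypothetical.

```python
from typing import Dict, Set


def undefined_codes(codec: str) -> Set[int]:
    """Collect the single-byte codes the codec cannot decode (no character assigned)."""
    undefined = set()
    for code in range(256):
        try:
            bytes([code]).decode(codec)  # strict mode raises on unassigned codes
        except UnicodeDecodeError:
            undefined.add(code)
    return undefined


def build_undefined_lists(codec_by_language: Dict[str, str]) -> Dict[str, Set[int]]:
    """Precompute one undefined character code list per language (cf. storage unit 32)."""
    return {lang: undefined_codes(codec) for lang, codec in codec_by_language.items()}


# Code pages used as stand-ins for language-specific code tables (assumption).
lists = build_undefined_lists({"Western European": "cp1252", "Greek": "cp1253"})
print(sorted(hex(code) for code in lists["Western European"]))
# ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Byte 0xD2, for instance, decodes to "Ò" in cp1252 but is unassigned in cp1253, so a single occurrence of it rules out the Greek table. This is the kind of distinction a shared encoding declaration alone may not provide.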
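The character-by-character narrowing of FIG. 3 (S42 to S47) can be sketched as an elimination loop. This is a minimal sketch assuming integer character codes and one Python set of undefined codes per language; the function name `narrow_candidates`, the `stop_at_one` flag, and the toy code values are hypothetical. The flag selects between stopping once one candidate remains and the alternative of collating every character before deciding.

```python
from typing import Dict, List, Sequence, Set


def narrow_candidates(codes: Sequence[int],
                      undefined_lists: Dict[str, Set[int]],
                      stop_at_one: bool = True) -> List[str]:
    """Eliminate languages character by character, following the FIG. 3 loop.

    A language is dropped as soon as one character code of the string appears
    in its undefined character code list. With stop_at_one=True the scan ends
    once at most one candidate remains (cf. YES in S47); with stop_at_one=False
    every character is collated, the alternative described last.
    """
    candidates = list(undefined_lists)
    for code in codes:  # first, second, third, ... character
        # S42-S46: collate this character with the lists of the remaining languages
        candidates = [lang for lang in candidates
                      if code not in undefined_lists[lang]]
        if stop_at_one and len(candidates) <= 1:  # S47
            break
    return candidates


# Toy lists (hypothetical codes): 0x90 is undefined for A, 0x41 for B, 0x42 for C.
code_lists = {"A": {0x81, 0x90}, "B": {0x41}, "C": {0x42}}
print(narrow_candidates([0x41, 0x42, 0x90], code_lists))                    # ['A']
print(narrow_candidates([0x41, 0x42, 0x90], code_lists, stop_at_one=False))  # []
```

Collating only the languages still in play gives the same result as collating all of them, since an eliminated language can never return. Stopping early is faster but never examines later characters: in the example, the third code 0x90 is undefined for language A, which only the full scan notices. Also stopping when no candidate is left is an added assumption; the patent's S47 speaks only of narrowing down to one.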
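The translation server's flow of S1 to S8 can be summarized in a short sketch. It is an illustration under stated assumptions, not the patented implementation: page fetching, language discrimination, the per-language translation engines, and page reconstruction are injected as hypothetical callables, and plain dictionaries stand in for the WEB page storage unit 33 and the translation file storage unit 36. The name `handle_request` and all stand-in values are invented for illustration.

```python
from typing import Callable, Dict, Optional


def handle_request(url: Optional[str],
                   fetch: Callable[[str], bytes],
                   discriminate: Callable[[bytes], Optional[str]],
                   engines: Dict[str, Callable[[bytes], str]],
                   rebuild: Callable[[bytes, str], str],
                   page_store: Dict[str, bytes],
                   translation_store: Dict[str, str]) -> Optional[str]:
    """Return the reconstructed page to send to the user terminal, or None."""
    if not url:                            # S1: no URL designated, end processing
        return None
    page = fetch(url)                      # S2: acquire the WEB page
    page_store[url] = page                 # S3: store it (cf. WEB page storage unit 33)
    lang = discriminate(page)              # S4: language discrimination (cf. unit 34)
    if lang is None:                       # not narrowed down to one language
        return None
    translated = engines[lang](page)       # S5: engine for the determined language
    translation_store[url] = translated    # S6: save the translation file (cf. unit 36)
    return rebuild(page, translated)       # S7: reconstruct; the caller sends it (S8)


# Usage with stand-in callables (all hypothetical).
pages = {"http://example.com/": b"Hello"}
result = handle_request(
    "http://example.com/",
    fetch=pages.__getitem__,
    discriminate=lambda page: "English",
    engines={"English": lambda page: "[ja] " + page.decode("ascii")},
    rebuild=lambda page, text: f"<html><body>{text}</body></html>",
    page_store={},
    translation_store={},
)
print(result)  # <html><body>[ja] Hello</body></html>
```

The patent does not say how a page whose language cannot be narrowed down to one is handled; returning `None` here is an arbitrary choice for the sketch.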
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
- Information Transfer Between Computers (AREA)
- Document Processing Apparatus (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05745036A EP1760608A2 (en) | 2004-05-31 | 2005-05-30 | Language identification equipment, translation equipment, translation server, language identification method, and translation processing method |
US11/597,913 US20080281577A1 (en) | 2004-05-31 | 2005-05-30 | Language Identification Equipment, Translation Equipment, Translation Server, Language Identification Method, and Translation Processing Method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-161801 | 2004-05-31 | ||
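JP2004161801A JP4384939B2 (ja) | 2004-05-31 | 2004-05-31 | Language discrimination device, translation device, translation server, language discrimination method, and translation processing method |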
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005116865A2 (ja) | 2005-12-08 |
Family
ID=35451530
Family Applications (1)
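Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2005/009890 WO2005116865A2 (ja) | 2004-05-31 | 2005-05-30 | Language discrimination device, translation device, translation server, language discrimination method, and translation processing method |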
Country Status (7)
Country | Link |
---|---|
US (1) | US20080281577A1 (ja) |
EP (1) | EP1760608A2 (ja) |
JP (1) | JP4384939B2 (ja) |
KR (1) | KR20070049606A (ja) |
CN (1) | CN101027665A (ja) |
TW (1) | TW200606664A (ja) |
WO (1) | WO2005116865A2 (ja) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4812421B2 (ja) * | 2005-12-22 | 2011-11-09 | Olympus Imaging Corp. | Character processing device, character processing program, character processing method |
US7849144B2 (en) * | 2006-01-13 | 2010-12-07 | Cisco Technology, Inc. | Server-initiated language translation of an instant message based on identifying language attributes of sending and receiving users |
US20090287471A1 (en) * | 2008-05-16 | 2009-11-19 | Bennett James D | Support for international search terms - translate as you search |
US8374873B2 (en) * | 2008-08-12 | 2013-02-12 | Morphism, Llc | Training and applying prosody models |
US8548797B2 (en) * | 2008-10-30 | 2013-10-01 | Yahoo! Inc. | Short text language detection using geographic information |
EP2680162A1 (en) | 2010-07-13 | 2014-01-01 | Motionpoint Corporation | Localisation of website content |
US8635061B2 (en) * | 2010-10-14 | 2014-01-21 | Microsoft Corporation | Language identification in multilingual text |
US9164988B2 (en) * | 2011-01-14 | 2015-10-20 | Lionbridge Technologies, Inc. | Methods and systems for the dynamic creation of a translated website |
US20120215520A1 (en) * | 2011-02-23 | 2012-08-23 | Davis Janel R | Translation System |
US8942974B1 (en) * | 2011-03-04 | 2015-01-27 | Amazon Technologies, Inc. | Method and system for determining device settings at device initialization |
US9031829B2 (en) | 2013-02-08 | 2015-05-12 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US8996355B2 (en) | 2013-02-08 | 2015-03-31 | Machine Zone, Inc. | Systems and methods for reviewing histories of text messages from multi-user multi-lingual communications |
US9600473B2 (en) | 2013-02-08 | 2017-03-21 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US9298703B2 (en) | 2013-02-08 | 2016-03-29 | Machine Zone, Inc. | Systems and methods for incentivizing user feedback for translation processing |
US8990068B2 (en) | 2013-02-08 | 2015-03-24 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US8996352B2 (en) | 2013-02-08 | 2015-03-31 | Machine Zone, Inc. | Systems and methods for correcting translations in multi-user multi-lingual communications |
US9231898B2 (en) | 2013-02-08 | 2016-01-05 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US8996353B2 (en) | 2013-02-08 | 2015-03-31 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US10650103B2 (en) | 2013-02-08 | 2020-05-12 | Mz Ip Holdings, Llc | Systems and methods for incentivizing user feedback for translation processing |
US9280537B2 (en) * | 2013-10-30 | 2016-03-08 | Google Inc. | Techniques for automatically selecting a natural language for configuring an input method editor at a computing device |
US9128930B2 (en) * | 2013-10-31 | 2015-09-08 | Tencent Technology (Shenzhen) Company Limited | Method, device and system for providing language service |
US10162811B2 (en) | 2014-10-17 | 2018-12-25 | Mz Ip Holdings, Llc | Systems and methods for language detection |
US9372848B2 (en) | 2014-10-17 | 2016-06-21 | Machine Zone, Inc. | Systems and methods for language detection |
CN104794625A (zh) * | 2015-04-28 | 2015-07-22 | 酷悠悠科技(深圳)有限公司 | Method and system for operating a cross-border e-commerce website |
US10765956B2 (en) | 2016-01-07 | 2020-09-08 | Machine Zone Inc. | Named entity recognition on chat data |
WO2019060353A1 (en) | 2017-09-21 | 2019-03-28 | Mz Ip Holdings, Llc | SYSTEM AND METHOD FOR TRANSLATION OF KEYBOARD MESSAGES |
CN111274458B (zh) * | 2020-01-17 | 2023-12-01 | Industrial and Commercial Bank of China Limited | Multilingual checking method and system for application software |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020002452A1 (en) * | 2000-03-28 | 2002-01-03 | Christy Samuel T. | Network-based text composition, translation, and document searching |
US20030115040A1 (en) * | 2001-02-09 | 2003-06-19 | Yue Xing | International (multiple language/non-english) domain name and email user account ID services system |
US7013289B2 (en) * | 2001-02-21 | 2006-03-14 | Michel Horn | Global electronic commerce system |
FI20010792A (fi) * | 2001-04-17 | 2002-10-18 | Nokia Corp | Arrangement of user-independent speech recognition |
US7225222B1 (en) * | 2002-01-18 | 2007-05-29 | Novell, Inc. | Methods, data structures, and systems to access data in cross-languages from cross-computing environments |
2004
- 2004-05-31 JP JP2004161801A patent/JP4384939B2/ja not_active Expired - Lifetime
2005
- 2005-05-30 EP EP05745036A patent/EP1760608A2/en not_active Withdrawn
- 2005-05-30 WO PCT/JP2005/009890 patent/WO2005116865A2/ja active Application Filing
- 2005-05-30 US US11/597,913 patent/US20080281577A1/en not_active Abandoned
- 2005-05-30 KR KR1020067027921A patent/KR20070049606A/ko not_active Application Discontinuation
- 2005-05-30 CN CNA2005800256074A patent/CN101027665A/zh active Pending
- 2005-05-31 TW TW094117838A patent/TW200606664A/zh unknown
Also Published As
Publication number | Publication date |
---|---|
JP2005346166A (ja) | 2005-12-15 |
CN101027665A (zh) | 2007-08-29 |
JP4384939B2 (ja) | 2009-12-16 |
EP1760608A2 (en) | 2007-03-07 |
KR20070049606A (ko) | 2007-05-11 |
TW200606664A (en) | 2006-02-16 |
US20080281577A1 (en) | 2008-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
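WO2005116865A2 (ja) | | Language discrimination device, translation device, translation server, language discrimination method, and translation processing method |
CN108399150B (zh) | | Text processing method and apparatus, computer device, and storage medium |
US20080172218A1 (en) | | Web Page Translation Device and Web Page Translation Method |
JP3959180B2 (ja) | | Communication translation device |
CN1494695B (zh) | | Omission-free translation system |
US20090313536A1 (en) | | Dynamically Providing Relevant Browser Content |
US8468494B2 (en) | | In-line editor |
JPH1063597A (ja) | | URL spell checking executed on the client side, the server side, and a cooperating unit |
JPH11306171A (ja) | | Item information input method and recording medium |
CN111079043A (zh) | | Key content locating method |
CN112052364B (zh) | | Sensitive information detection method, apparatus, device, and computer-readable storage medium |
CN108062468A (zh) | | Web crawler method based on image CAPTCHA recognition |
JPH1153392A (ja) | | Information filtering device and related-information providing method applied to the device |
CN111428230A (zh) | | Information verification method, apparatus, server, and storage medium |
CN105787032B (zh) | | Method and apparatus for generating web page snapshots |
JP4756764B2 (ja) | | Program, information processing device, and information processing method |
JP2009259248A (ja) | | Method, device, and computer-readable recording medium for tagging images included in web pages and providing a web search service using the results |
US10789245B2 (en) | | Semiconductor parts search method using last alphabet deletion algorithm |
JP2010003159A (ja) | | Web user support system, Web user support method, and Web user support program |
US20130311489A1 (en) | | Systems and Methods for Extracting Names From Documents |
KR100953627B1 (ko) | | Method, device, and computer-readable recording medium for reading text on images included in web pages and providing a translation service for it |
JP7116940B2 (ja) | | Method and program for efficiently structuring and correcting open data |
EP1729284A1 (en) | | Method and systems for a accessing data by spelling discrimination letters of link names |
KR100811290B1 (ko) | | Shopping mall management system implementing automatic functions using natural language processing |
JPH11259477A (ja) | | Document processing system and recording medium |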
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
NENP | Non-entry into the national phase | Ref country code: DE |
WWW | WIPO information: withdrawn in national office | Country of ref document: DE |
WWE | WIPO information: entry into national phase | Ref document number: 2005745036; country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 1020067027921; country of ref document: KR |
WWE | WIPO information: entry into national phase | Ref document number: 200580025607.4; country of ref document: CN |
WWP | WIPO information: published in national office | Ref document number: 2005745036; country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 11597913; country of ref document: US |