SG155069A1 - Method of language coding identification and data format therefor

Method of language coding identification and data format therefor

Info

Publication number
SG155069A1
Authority
SG
Singapore
Prior art keywords
language
coding
text
coding identification
identification method
Application number
SG200801249-4A
Inventor
Yu Cheng
Tan Tze Kian
Du Fei
Original Assignee
Victor Company Of Japan
Priority date
2008-02-14
Filing date
2008-02-14
Publication date
2009-09-30
Application filed by Victor Company Of Japan
Priority to SG200801249-4A
Publication of SG155069A1

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method of language coding identification for identifying different language codings comprises the steps of: checking whether or not a text of a language to be identified is ASCII coded; and applying a double-byte coding identification method, a Latin-based coding identification method, and a single-byte coding identification method sequentially to the text, each in case the previous step is not successful. The single-byte coding identification method further comprises the steps of: generating a language profile of each supported language coding by analyzing a sample text and extracting the occurrence possibility of all two-character sequence combinations in the sample text; calculating a confidence value of each language coding by checking the occurrence possibility of all the two-character sequence combinations in a text string of the sample text against the respective language profile; and selecting the language coding with the highest confidence value.
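
The cascade and the two-character profile step described in the abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the patented implementation: the function names (`build_profile`, `confidence`, `identify`) are hypothetical, the "occurrence possibility" is assumed to be a relative frequency of byte pairs, the confidence value is assumed to be the mean profile frequency over the text's byte pairs, and the double-byte and Latin-based detectors are placeholder callables because the abstract does not describe how they work.

```python
from collections import Counter
from typing import Callable, Dict, Optional

Profile = Dict[bytes, float]
Detector = Callable[[bytes], Optional[str]]


def build_profile(sample: bytes) -> Profile:
    """Relative frequency of every two-byte sequence in a sample text.

    Assumption: "occurrence possibility" is modelled as relative frequency.
    """
    pairs = Counter(sample[i:i + 2] for i in range(len(sample) - 1))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


def confidence(text: bytes, profile: Profile) -> float:
    """Mean profile frequency over the two-byte sequences of the text.

    Assumption: the abstract does not define the confidence formula.
    """
    n = len(text) - 1
    if n <= 0:
        return 0.0
    return sum(profile.get(text[i:i + 2], 0.0) for i in range(n)) / n


def identify_single_byte(text: bytes, profiles: Dict[str, Profile]) -> Optional[str]:
    """Pick the coding whose profile gives the highest confidence value."""
    if not profiles:
        return None
    return max(profiles, key=lambda name: confidence(text, profiles[name]))


def no_result(text: bytes) -> Optional[str]:
    """Placeholder detector that never identifies anything (hypothetical)."""
    return None


def identify(text: bytes,
             profiles: Dict[str, Profile],
             double_byte: Detector = no_result,
             latin_based: Detector = no_result) -> Optional[str]:
    """ASCII check first, then each method in turn if the previous one fails."""
    if all(b < 0x80 for b in text):
        return "ASCII"
    for detector in (double_byte, latin_based):
        result = detector(text)
        if result is not None:
            return result
    return identify_single_byte(text, profiles)


if __name__ == "__main__":
    sample = "привет мир это пример текста"
    profiles = {
        "KOI8-R": build_profile(sample.encode("koi8_r")),
        "windows-1251": build_profile(sample.encode("cp1251")),
    }
    print(identify("это текст".encode("koi8_r"), profiles))
```

A real profile would be built from a much larger sample per coding; the tiny sample here only shows the flow from profile generation to selecting the highest-confidence coding.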

Priority Applications (1)

Application Number Priority Date Filing Date Title
SG200801249-4A SG155069A1 (en) 2008-02-14 2008-02-14 Method of language coding identification and data format therefor

Publications (1)

Publication Number Publication Date
SG155069A1 (en) 2009-09-30

Family

ID=41212346

Country Status (1)

Country Link
SG (1) SG155069A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020059448A1 (en) * 2000-11-13 2002-05-16 Square Co., Ltd. Message processing for handling unsupported character codes
US20030182128A1 (en) * 2002-03-08 2003-09-25 Katie Kuwata Method of encoding and decoding for multi-language applications
US20040138869A1 (en) * 2002-12-17 2004-07-15 Johannes Heinecke Text language identification
CN1916888A (en) * 2005-08-15 2007-02-21 International Business Machines Corporation Method and system of identifying language of double-byte character set character data
US20070104365A1 (en) * 2004-03-31 2007-05-10 Yuki Yoshimura Automatic Character Code recognition/Display System, Method, and Program Using Mobile Telephone
CN101055593A (en) * 2007-06-15 2007-10-17 Institute of Software, Chinese Academy of Sciences Tibetan web page and its code identification method

Similar Documents

Publication Publication Date Title
CN109597886B (en) Extraction generation mixed abstract generation method
WO2005010727A3 (en) Extracting data from semi-structured text documents
CN102722479B (en) A kind of method of implementation language translation and device
WO2010117424A3 (en) Computer-assisted abstraction of data and document coding
AU2018388932A1 (en) Method and device using wikipedia link structure to generate chinese language concept vector
HK1100586A1 (en) Apparatus and method for handwriting recognition
WO2010105265A3 (en) Text creation system and method
WO2010050675A3 (en) Method for automatically extracting relation triplets through a dependency grammar parse tree
CN103544408A (en) Method for embedment and extraction of PDF document hidden information according to composite font
CN105426379A (en) Keyword weight calculation method based on position of word
MY156899A (en) Word recognition apparatus, word recognition method, non-transitory computer readable medium storing word recognition program, and delivery item sorting apparatus
CN111507083A (en) Text analysis method, device, equipment and storage medium
CN112749283A (en) Entity relationship joint extraction method for legal field
CN113763937A (en) Method, device and equipment for generating voice processing model and storage medium
EP4152280A3 (en) Method and apparatus for recognizing text, and method and apparatus for training text recognition model
CN103530574B (en) A kind of hide Info embedding and extracting method based on English PDF document
EP3193260A3 (en) Encoding program, encoding method, encoding device, decoding program, decoding method, and decoding device
CN104484323A (en) Translation processing method based on document segment
SG155069A1 (en) Method of language coding identification and data format therefor
KR20110139959A (en) System for applying to format for sql for accessing database
WO2010038997A3 (en) Method and apparatus for encoding and decoding xml documents using path code
CN103678284A (en) Method and device for translating page characters
MY166591A (en) Automatic transformation of non-relational database into relational database
DE60217313D1 (en) METHOD FOR PERFORMING LANGUAGE RECOGNITION OF DYNAMIC REPORTS
US20130311489A1 (en) Systems and Methods for Extracting Names From Documents