SG155069A1 - Method of language coding identification and data format therefor - Google Patents
- Publication number
- SG155069A1
- Authority
- SG
- Singapore
- Prior art keywords
- language
- coding
- text
- coding identification
- identification method
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
A method of language coding identification for distinguishing between different language codings comprises the steps of: checking whether a text to be identified is ASCII coded; and, each time the previous step is not successful, sequentially applying to the text a double-byte coding identification method, a Latin-based coding identification method, and a single-byte coding identification method. The single-byte coding identification method further comprises the steps of: generating a language profile for each supported language coding by analyzing a sample text and extracting the occurrence probability of every two-character sequence combination in the sample text; calculating a confidence value for each language coding by checking the occurrence probability of every two-character sequence combination in a text string of the sample text against the respective language profile; and selecting the language coding with the highest confidence value.
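The single-byte step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`is_ascii`, `build_profile`, `confidence`, `identify_single_byte`), the sample sentence, and the two example codings are hypothetical, and averaging the profile probabilities to obtain a confidence value is an assumption, since the abstract does not say how the probabilities are combined.

```python
from collections import Counter
from typing import Dict, Tuple

# A profile maps each two-byte sequence to its relative frequency in a sample.
Profile = Dict[bytes, float]


def is_ascii(data: bytes) -> bool:
    """First check of the cascade: every byte is below 0x80."""
    return all(b < 0x80 for b in data)


def byte_pairs(data: bytes):
    """All overlapping two-byte sequences in the data."""
    return [data[i:i + 2] for i in range(len(data) - 1)]


def build_profile(sample: bytes) -> Profile:
    """Relative frequency of every two-byte sequence in the sample text."""
    pairs = byte_pairs(sample)
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}


def confidence(data: bytes, profile: Profile) -> float:
    """Assumed scoring rule: mean profile probability over the text's pairs."""
    pairs = byte_pairs(data)
    if not pairs:
        return 0.0
    return sum(profile.get(p, 0.0) for p in pairs) / len(pairs)


def identify_single_byte(data: bytes, profiles: Dict[str, Profile]) -> Tuple[str, float]:
    """Return the coding with the highest confidence (profiles must be non-empty)."""
    scores = {name: confidence(data, prof) for name, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical training data: one Russian sentence encoded in two single-byte codings.
sample = "съешь же ещё этих мягких французских булок, да выпей чаю"
profiles = {
    "windows-1251": build_profile(sample.encode("cp1251")),
    "koi8-r": build_profile(sample.encode("koi8_r")),
}
```

Calling `identify_single_byte(text, profiles)` on a byte string that failed the earlier checks returns the name of the best-scoring coding together with its confidence value. A practical system would build each profile from a much larger sample per supported coding.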
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG200801249-4A SG155069A1 (en) | 2008-02-14 | 2008-02-14 | Method of language coding identification and data format therefor |
Publications (1)
Publication Number | Publication Date |
---|---|
SG155069A1 (en) | 2009-09-30 |
Family
ID=41212346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
SG200801249-4A SG155069A1 (en) | 2008-02-14 | 2008-02-14 | Method of language coding identification and data format therefor |
Country Status (1)
Country | Link |
---|---|
SG (1) | SG155069A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020059448A1 (en) * | 2000-11-13 | 2002-05-16 | Square Co., Ltd. | Message processing for handling unsupported character codes |
US20030182128A1 (en) * | 2002-03-08 | 2003-09-25 | Katie Kuwata | Method of encoding and decoding for multi-language applications |
US20040138869A1 (en) * | 2002-12-17 | 2004-07-15 | Johannes Heinecke | Text language identification |
CN1916888A (en) * | 2005-08-15 | 2007-02-21 | 国际商业机器公司 | Method and system of identifying language of double-byte character set character data |
US20070104365A1 (en) * | 2004-03-31 | 2007-05-10 | Yuki Yoshimura | Automatic Character Code recognition/Display System, Method, and Program Using Mobile Telephone |
CN101055593A (en) * | 2007-06-15 | 2007-10-17 | 中国科学院软件研究所 | Tibetan web page and its code identification method |
Similar Documents
Publication | Title |
---|---|
CN109597886B (en) | Extraction generation mixed abstract generation method |
WO2005010727A3 (en) | Extracting data from semi-structured text documents |
CN102722479B (en) | A kind of method of implementation language translation and device |
WO2010117424A3 (en) | Computer-assisted abstraction of data and document coding |
AU2018388932A1 (en) | Method and device using wikipedia link structure to generate chinese language concept vector |
HK1100586A1 (en) | Apparatus and method for handwriting recognition |
WO2010105265A3 (en) | Text creation system and method |
WO2010050675A3 (en) | Method for automatically extracting relation triplets through a dependency grammar parse tree |
CN103544408A (en) | Method for embedment and extraction of PDF document hidden information according to composite font |
CN105426379A (en) | Keyword weight calculation method based on position of word |
MY156899A (en) | Word recognition apparatus, word recognition method, non-transitory computer readable medium storing word recognition program, and delivery item sorting apparatus |
CN111507083A (en) | Text analysis method, device, equipment and storage medium |
CN112749283A (en) | Entity relationship joint extraction method for legal field |
CN113763937A (en) | Method, device and equipment for generating voice processing model and storage medium |
EP4152280A3 (en) | Method and apparatus for recognizing text, and method and apparatus for training text recognition model |
CN103530574B (en) | A kind of hide Info embedding and extracting method based on English PDF document |
EP3193260A3 (en) | Encoding program, encoding method, encoding device, decoding program, decoding method, and decoding device |
CN104484323A (en) | Translation processing method based on document segment |
SG155069A1 (en) | Method of language coding identification and data format therefor |
KR20110139959A (en) | System for applying to format for sql for accessing database |
WO2010038997A3 (en) | Method and apparatus for encoding and decoding xml documents using path code |
CN103678284A (en) | Method and device for translating page characters |
MY166591A (en) | Automatic transformation of non-relational database into relational database |
DE60217313D1 (en) | METHOD FOR PERFORMING LANGUAGE RECOGNITION OF DYNAMIC REPORTS |
US20130311489A1 (en) | Systems and Methods for Extracting Names From Documents |