CN112199576A - Method and system for realizing Chinese pinyin search - Google Patents

Method and system for realizing Chinese pinyin search

Info

Publication number
CN112199576A
Authority
CN
China
Prior art keywords
chinese
character
pinyin
search
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011125475.5A
Other languages
Chinese (zh)
Inventor
张亚运
牛玉山
林帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Business System Co Ltd
Original Assignee
Shandong Inspur Business System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Business System Co Ltd
Priority to CN202011125475.5A
Publication of CN112199576A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a method and a system for realizing Chinese pinyin search, belonging to the technical field of character indexing. A mapping library of Chinese characters and pinyin and a mapping table of simplified Chinese and traditional Chinese are constructed, so that Chinese characters are mapped to pinyin and simplified Chinese characters are mapped to traditional Chinese characters; format conversion is carried out on the input keywords to unify the characters; a polyfill supporting Internet Explorer 8 is written; the input is then checked: if the browser is IE8, IE8-compatible code processing is carried out; if the input characters contain pinyin, a pinyin search engine is started, the Chinese character and pinyin mapping library and the simplified Chinese and traditional Chinese mapping table are called to perform character processing and target search, and the search result is displayed. The method supports pinyin search of Chinese characters and pinyin search of traditional Chinese, improves search efficiency, and has strong applicability and a wide application range.

Description

Method and system for realizing Chinese pinyin search
Technical Field
The invention relates to the technical field of character indexing, in particular to a method and a system for realizing Chinese pinyin search.
Background
At present, in most software applications or retrieval programs, the existing Chinese search, especially Chinese character search, mostly lacks the advanced function of matching by Chinese pinyin. The few that do support pinyin search for Chinese characters lack pinyin search support for traditional Chinese, and do not support Internet Explorer 8, an older browser version that still has a relatively high market share.
Disclosure of Invention
The technical task of the invention is to provide a method and a system for realizing Chinese pinyin search, which can support Chinese pinyin search and traditional Chinese pinyin search, improve search efficiency and improve applicability.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a Chinese phonetic alphabet searching method comprises the steps of constructing a Chinese character and phonetic alphabet mapping library, constructing a simplified Chinese and traditional Chinese mapping table, and mapping Chinese characters and phonetic alphabets and simplified and traditional Chinese;
carrying out format conversion on the input keywords to realize character unification;
compiling Polyfill supporting Internet Explorer 8;
judging the input characters: if the browser is IE8, carrying out IE8-compatible code processing; if the input characters contain pinyin, starting a pinyin search engine, calling the Chinese character and pinyin mapping library and the simplified Chinese and traditional Chinese mapping table to perform character processing and target search, and displaying a search result.
The method supports search by pinyin initials or by complete pinyin, can remarkably improve the convenience and efficiency of searching a target function menu or general text, and is suitable for regions that use traditional Chinese, such as the Hong Kong Special Administrative Region, the Macao Special Administrative Region and Taiwan Province, so that the general applicability and portability of the software system are significantly enhanced without changing the related source code. Meanwhile, the method is compatible with Internet Explorer 8, and therefore has strong applicability and a wide application range.
Preferably, the method performs word segmentation on the input characters, splitting them into words, in preparation for character pattern matching.
Preferably, the method also comprises polyphone processing, a plurality of mapping lines of Chinese characters and pinyin are constructed, and the mapping lines are called to carry out character processing and target search when searching is carried out. Some Chinese characters have multiple pronunciations, and the situation of polyphone characters can be dealt with by constructing multiple mapping lines of the Chinese characters and pinyin.
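As a minimal illustrative sketch (not part of the original disclosure), the "plurality of mapping lines" can be read as a table in which each Chinese character maps to a list of readings, with one entry per pronunciation. The names `PINYIN_MAP`, `readings` and `word_readings`, and the four sample entries, are assumptions introduced here for illustration only.

```python
from itertools import product

# Hypothetical sample of a character-to-pinyin mapping library.
# Each character maps to a list of readings; polyphones have more than one.
PINYIN_MAP = {
    "重": ["zhong", "chong"],
    "庆": ["qing"],
    "银": ["yin"],
    "行": ["xing", "hang"],
}


def readings(ch: str) -> list[str]:
    """Return all known readings of a character, or the character itself if unmapped."""
    return PINYIN_MAP.get(ch, [ch])


def word_readings(word: str) -> list[tuple[str, ...]]:
    """Enumerate every combination of per-character readings for a word."""
    return list(product(*(readings(ch) for ch in word)))
```

With this layout, `word_readings("重庆")` yields both `("zhong", "qing")` and `("chong", "qing")`, so a search can accept whichever reading the user typed. Enumerating all combinations grows quickly for long words; a real mapping library would likely restrict or rank them.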
Specifically, the starting of the pinyin search engine includes the following operations:
processing traditional Chinese;
matching word first letters;
processing polyphone characters;
and splitting words.
Further, the search result of the application system is subjected to target character highlighting processing through highlight matching character color processing.
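The highlighting step can be sketched as follows, assuming that the matching stage returns the set of character positions that matched. The function name `highlight` and the `<mark>` tags are assumptions chosen for illustration; the source does not specify how the coloring is rendered.

```python
def highlight(text: str, matched: set[int],
              open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap each run of matched character positions in a pair of tags."""
    out = []
    inside = False  # True while we are inside an open highlighted run
    for i, ch in enumerate(text):
        if i in matched and not inside:
            out.append(open_tag)
            inside = True
        elif i not in matched and inside:
            out.append(close_tag)
            inside = False
        out.append(ch)
    if inside:
        out.append(close_tag)
    return "".join(out)
```

For example, `highlight("银行管理", {0, 1})` produces `<mark>银行</mark>管理`. This sketch does not escape HTML special characters in `text`, which a real page would need to do before inserting the result into the document.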
Preferably, the method is implemented as follows:
1) constructing a mapping library of Chinese characters and pinyin,
firstly, basic data needs to be constructed, and Chinese characters and pinyin are accurately mapped to form a basic mapping library;
2) compiling a mapping table of simplified Chinese and traditional Chinese,
in order to support the pattern matching of traditional Chinese, a comparison mapping table of a simplified form and a traditional form needs to be compiled;
3) converting the character case format: English characters or Chinese pinyin characters in the input keywords are uniformly converted into uppercase or lowercase, which facilitates character pattern matching;
4) processing browser compatibility, and writing a polyfill supporting Internet Explorer 8;
a polyfill is a piece of code, usually a JavaScript code block on the Web, that provides older browser versions with newer functionality that they do not support natively.
5) Processing polyphone characters;
some Chinese characters have a plurality of pronunciations, and in this case, a plurality of mapping lines of the Chinese characters and the pinyin are required to be constructed to process polyphonic characters;
6) performing word segmentation processing on Chinese and other input characters,
splitting words and phrases to prepare for character mode matching;
7) performing character pattern matching,
executing a search process using a character pattern matching API;
8) and displaying the search result of the application system.
Preferably, the Chinese character and pinyin mapping library at least comprises the Chinese characters contained in the GBK standard. GBK (short for "national standard extension") is the character set defined by the Chinese Internal Code Extension Specification (CICES).
Preferably, the simplified Chinese to traditional Chinese mapping table at least contains Chinese characters in the GBK character set range.
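A rough sketch of these two requirements follows, under stated assumptions: Python's built-in `gbk` codec is used as a proxy for "within the GBK character set range", and `TRAD_TO_SIMP` is a hypothetical four-entry excerpt of a traditional-to-simplified mapping table. Neither the helper names nor the sample entries come from the source.

```python
# Hypothetical excerpt of a traditional-to-simplified mapping table.
TRAD_TO_SIMP = {"銀": "银", "庫": "库", "漢": "汉", "張": "张"}


def is_gbk_double_byte(ch: str) -> bool:
    """True if the character is encodable as a two-byte GBK code."""
    try:
        return len(ch.encode("gbk")) == 2
    except UnicodeEncodeError:
        return False


def to_simplified(text: str) -> str:
    """Replace each traditional character with its simplified form, if one is listed."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)
```

Normalizing both the keyword and the candidate text with `to_simplified` lets a query such as "銀行" match an entry stored as "银行". Note that two-byte GBK codes also cover punctuation and symbols, so `is_gbk_double_byte` is only an approximate coverage check for a mapping table.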
The invention also claims a system for implementing Chinese pinyin search, which comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is used for calling the machine readable program and executing the method.
The invention also claims a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the above-described method.
Compared with the prior art, the method and the system for realizing Chinese pinyin search have the following beneficial effects:
the method and the system can support pinyin search of Chinese characters and pinyin search of traditional Chinese; not only can support complete pinyin matching of the words, but also can support pinyin first letter matching of the words; meanwhile, polyphone characters are supported, 6763 simplified Chinese characters and corresponding traditional Chinese characters are included, and the applicability is strong;
matched characters are highlighted in color in the search result, so that the search is more intuitive;
not only modern browsers such as Internet Explorer 9 and above, Chrome, Firefox, Safari and Opera are supported, but also the legacy browser Internet Explorer 8, so the application range is wide.
Drawings
FIG. 1 is a flow chart of an implementation method of Chinese pinyin search according to an embodiment of the present invention;
FIG. 2 is an exemplary diagram of an initial state interface for Chinese pinyin search, according to an embodiment of the invention;
FIG. 3 is an exemplary illustration of an interface display during a Chinese vocabulary search of the Chinese pinyin search, according to an embodiment of the invention;
FIG. 4 is an exemplary illustration of an interface display for a Pinyin full form search of the Chinese Pinyin search provided by an embodiment of the present invention;
FIG. 5 is an exemplary diagram of an interface display for a traditional Chinese search in the embodiment of the present invention;
FIG. 6 is an exemplary illustration of an interface display for a partial pinyin initial search for a Chinese pinyin search, as provided by one embodiment of the present invention;
FIG. 7 is an exemplary illustration of an interface display for a complete pinyin initial search for a Chinese pinyin search, in accordance with an embodiment of the present invention;
FIG. 8 is an exemplary illustration of an interface display during a non-continuous pinyin initial search of the Chinese pinyin search, according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
A Chinese phonetic alphabet searching method comprises the steps of constructing a Chinese character and phonetic alphabet mapping library, constructing a simplified Chinese and traditional Chinese mapping table, and mapping Chinese characters and phonetic alphabets and simplified and traditional Chinese;
carrying out format conversion on the input keywords to realize character unification;
compiling Polyfill supporting Internet Explorer 8;
judging the input characters: if the browser is IE8, carrying out IE8-compatible code processing; if the input characters contain pinyin, starting a pinyin search engine, calling the Chinese character and pinyin mapping library and the simplified Chinese and traditional Chinese mapping table to perform character processing and target search, and displaying a search result.
The method also comprises polyphone processing, constructing a plurality of mapping lines of Chinese characters and pinyin, and calling the mapping lines to perform character processing and target search when searching. Some Chinese characters have multiple pronunciations, and the situation of polyphone characters can be dealt with by constructing multiple mapping lines of the Chinese characters and pinyin.
In the method, the word segmentation processing is carried out on the input characters, and the words are split, so that the character pattern matching is carried out.
The starting of the pinyin search engine comprises the following operations:
processing traditional Chinese;
matching word first letters;
processing polyphone characters;
splitting words;
and highlighting the matched characters in color, whereby target characters in the search result of the application system are highlighted by coloring the matched characters.
The method comprises the following concrete implementation processes:
1. constructing a Chinese character and pinyin mapping library:
firstly, basic data is required to be constructed, Chinese characters and pinyin are accurately mapped to form a basic mapping library, and at least the Chinese characters contained in the GBK standard are included.
2. Compiling a simplified Chinese and traditional Chinese mapping table:
in order to support the pattern matching of traditional Chinese, a mapping table for comparing the traditional Chinese with the simplified Chinese needs to be prepared, and the mapping table at least comprises Chinese characters in the GBK character set range.
3. Converting the character case format:
English characters or Chinese pinyin characters in the input keywords are uniformly converted into upper case or lower case, which facilitates character pattern matching.
4. Handling browser compatibility:
A polyfill supporting Internet Explorer 8 is written.
A polyfill is a piece of code, usually a JavaScript code block on the Web, that provides older browser versions with newer functionality that they do not support natively.
5. Processing polyphone characters:
some Chinese characters have multiple pronunciations, and in this case, multiple mapping lines of Chinese characters and Pinyin need to be constructed.
6. Word segmentation:
Chinese and other input characters are segmented into words and phrases in preparation for character pattern matching.
7. Character pattern matching:
using the character pattern matching API, a search process is performed.
8. And displaying the search result:
the search result of the application system is processed, for example by highlighting and coloring, so that the user can recognize the target characters more clearly.
FIG. 1 is a flow chart of the implementation method of the Chinese pinyin search.
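Steps 3 and 7 above can be sketched together as a small search routine. This is an illustrative reading only: the names `PINYIN`, `normalize`, `full_pinyin` and `search` are assumptions, the six-entry table stands in for the full mapping library, and plain substring comparison stands in for the unspecified "character pattern matching API".

```python
# Hypothetical single-reading excerpt of the character-to-pinyin mapping library.
PINYIN = {"系": "xi", "统": "tong", "管": "guan", "理": "li", "用": "yong", "户": "hu"}


def normalize(keyword: str) -> str:
    """Step 3: unify case so that 'XiTong' and 'xitong' compare equal."""
    return keyword.strip().lower()


def full_pinyin(text: str) -> str:
    """Concatenate the pinyin of each character; unmapped characters pass through."""
    return "".join(PINYIN.get(ch, ch.lower()) for ch in text)


def search(keyword: str, items: list[str]) -> list[str]:
    """Step 7: keep items that contain the keyword as text or as complete pinyin."""
    key = normalize(keyword)
    if not key:
        return []
    return [item for item in items
            if key in item.lower() or key in full_pinyin(item)]
```

For instance, `search("XiTong", ["系统管理", "用户管理"])` returns only "系统管理", while `search("guanli", ...)` returns both entries. Because the pinyin is concatenated without syllable boundaries, a fragment such as "ito" would also match; a stricter implementation would align matches to syllable starts.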
The method supports search by pinyin initials or by complete pinyin, can remarkably improve the convenience and efficiency of searching a target function menu or general text, and is suitable for regions that use traditional Chinese, such as the Hong Kong Special Administrative Region, the Macao Special Administrative Region and Taiwan Province, so that the general applicability and portability of the software system are significantly enhanced without changing the related source code.
In the pinyin search process, handling polyphones is the difficult point. It can be solved readily by constructing a plurality of mapping lines of Chinese characters and pinyin. For example, during a search the algorithm first splits the candidate text into characters according to the pinyin initials of the input word, matches the polyphones line by line, then recombines the matching items, and finally highlights, colors and lists the matched results in their original order of occurrence.
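One possible reading of this polyphone-aware initial matching is sketched below. It assumes each character carries a list of readings and that a keyword letter matches a character if it is the first letter of any of those readings; `READINGS`, `initials` and `match_initials` are hypothetical names, and the source does not disclose the exact algorithm.

```python
# Hypothetical excerpt of the mapping lines: one list entry per reading.
READINGS = {
    "银": ["yin"],
    "行": ["xing", "hang"],
    "重": ["zhong", "chong"],
    "庆": ["qing"],
    "分": ["fen"],
}


def initials(ch: str) -> set[str]:
    """First letters of every reading of a character (the character itself if unmapped)."""
    return {r[0] for r in READINGS.get(ch, [ch.lower()])}


def match_initials(keyword: str, text: str) -> list[int]:
    """Return the positions of the first run of characters whose initials spell the keyword."""
    key = keyword.lower()
    if not key:
        return []
    for start in range(len(text) - len(key) + 1):
        # Each keyword letter may match any reading of the aligned character.
        if all(key[k] in initials(text[start + k]) for k in range(len(key))):
            return list(range(start, start + len(key)))
    return []
```

With this sketch both "yh" and "yx" match "银行", because "行" is checked against each of its readings in turn. The returned positions keep the original order of occurrence, so they can be passed directly to a highlighting step.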
The embodiment of the invention also demonstrates the implementation method of Chinese pinyin search with reference to FIGS. 2-8, which show the application effect of the algorithm in detail:
(1) initial state, as shown in FIG. 2;
(2) Chinese vocabulary search, as shown in FIG. 3;
(3) complete pinyin search, as shown in FIG. 4;
(4) traditional Chinese search, as shown in FIG. 5;
(5) partial pinyin initial search, as shown in FIG. 6;
(6) complete pinyin initial search, as shown in FIG. 7;
(7) non-continuous pinyin initial search, as shown in FIG. 8.
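The non-continuous initial search of mode (7) is not specified further in the text. One plausible interpretation, sketched here as an assumption, is subsequence matching: the keyword letters must appear as initials of characters in order, but not necessarily adjacent. `FIRST_LETTERS` and `match_scattered` are hypothetical names, and the sample table is illustrative.

```python
# Hypothetical table of pinyin initials per character; polyphones list several.
FIRST_LETTERS = {
    "系": {"x"},
    "统": {"t"},
    "管": {"g"},
    "理": {"l"},
    "行": {"x", "h"},
}


def match_scattered(keyword: str, text: str) -> list[int]:
    """Greedily match keyword letters against character initials, allowing gaps."""
    key = keyword.lower()
    positions = []
    k = 0  # index of the next keyword letter still to be matched
    for i, ch in enumerate(text):
        if k < len(key) and key[k] in FIRST_LETTERS.get(ch, {ch.lower()}):
            positions.append(i)
            k += 1
    # Only a complete match counts; otherwise report no match.
    return positions if key and k == len(key) else []
```

Under this reading, the keyword "xgl" matches "系统管理" at positions 0, 2 and 3, skipping "统". Greedy left-to-right scanning is sufficient to decide whether such a subsequence exists, although it always reports the earliest possible positions.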
The embodiment of the invention also provides a system for realizing Chinese pinyin search, which comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is used for calling the machine readable program to execute the implementation method of the Chinese pinyin search in the embodiment.
The embodiment of the invention also provides a computer readable medium, wherein a computer instruction is stored on the computer readable medium, and when the computer instruction is executed by a processor, the processor executes the implementation method of the Chinese pinyin search in the embodiment of the invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that various combinations of the code auditing means in the various embodiments described above may be used to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims (10)

1. A Chinese phonetic alphabet search implementation method is characterized in that a Chinese character and phonetic alphabet mapping library is established, a simplified Chinese and traditional Chinese mapping table is established, and the Chinese character and phonetic alphabet and simplified Chinese and traditional Chinese are mapped;
carrying out format conversion on the input keywords to realize character unification;
compiling Polyfill supporting Internet Explorer 8;
judging the input characters: if the browser is IE8, carrying out IE8-compatible code processing; if the input characters contain pinyin, starting a pinyin search engine, calling the Chinese character and pinyin mapping library and the simplified Chinese and traditional Chinese mapping table to perform character processing and target search, and displaying a search result.
2. The method as claimed in claim 1, wherein the character pattern matching is performed by performing word segmentation on the inputted character and splitting the word.
3. The method as claimed in claim 1 or 2, further comprising polyphone processing, constructing a plurality of mapping lines of Chinese characters and pinyin, and calling the mapping lines to perform character processing and target search when performing search.
4. The method as claimed in claim 3, wherein the starting of the pinyin search engine includes the following operations:
processing traditional Chinese;
matching word first letters;
processing polyphone characters;
and splitting words.
5. The method as claimed in claim 4, wherein target characters in the search result of the application system are highlighted by coloring the matched characters.
6. The method for implementing Chinese pinyin search according to claim 1 or 2, wherein the implementation process is as follows:
1) constructing a mapping library of Chinese characters and pinyin;
2) compiling a mapping table of simplified Chinese and traditional Chinese;
3) uniformly converting English characters or Chinese pinyin characters in the input keywords into upper case or lower case, so as to facilitate character pattern matching;
4) processing browser compatibility, and writing a polyfill supporting Internet Explorer 8;
5) processing polyphone characters;
6) carrying out word segmentation processing on Chinese characters and other input characters;
7) executing a search process using a character pattern matching API;
8) and displaying the search result of the application system.
7. The method as claimed in claim 1, wherein the Chinese character and pinyin mapping library at least includes the Chinese characters contained in the GBK standard.
8. The method of claim 7, wherein the simplified Chinese and traditional Chinese mapping table at least contains the Chinese characters in the GBK character set range.
9. A Chinese pinyin search implementation system is characterized by comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1 to 8.
10. Computer readable medium, characterized in that it has stored thereon computer instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 8.
CN202011125475.5A 2020-10-20 2020-10-20 Method and system for realizing Chinese pinyin search Pending CN112199576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011125475.5A CN112199576A (en) 2020-10-20 2020-10-20 Method and system for realizing Chinese pinyin search

Publications (1)

Publication Number Publication Date
CN112199576A true CN112199576A (en) 2021-01-08

Family

ID=74009789

Country Status (1)

Country Link
CN (1) CN112199576A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836070A (en) * 2021-02-02 2021-05-25 山东寻声网络科技有限公司 Application of NLP technology in data analysis
CN113722426A (en) * 2021-07-30 2021-11-30 福建拓尔通软件有限公司 Government website searching method, system, equipment and medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1421803A (en) * 2001-11-30 2003-06-04 英业达股份有限公司 System and method capable of performing pinyin romanization-phonetic notation conversion of multiple-syllable word
TW542977B (en) * 2001-12-18 2003-07-21 Inventec Besta Co Ltd Data sharing method for traditional and simplified Chinese input method
CN1954315A (en) * 2004-03-16 2007-04-25 Google公司 Systems and methods for translating chinese pinyin to chinese characters
CN101131690A (en) * 2006-08-21 2008-02-27 富士施乐株式会社 Method and system for mutual conversion between simplified Chinese characters and traditional Chinese characters
JP2008052720A (en) * 2006-08-21 2008-03-06 Fuji Xerox Co Ltd Method of mutual conversion between simplified characters and traditional characters, and its conversion apparatus
CN101814073A (en) * 2009-02-23 2010-08-25 未序网络科技(上海)有限公司 Search engine method based on special word form information
CN102567406A (en) * 2010-12-22 2012-07-11 北京新媒传信科技有限公司 Pinyin searching method
CN103365925A (en) * 2012-04-09 2013-10-23 高德软件有限公司 Method for acquiring polyphone spelling, method for retrieving based on spelling, and corresponding devices

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
一个被写代码耽误的厨师: "Polyfill", 《简书 HTTPS://WWW.JIANSHU.COM/P/8191AE28D1B1》 *
武助宇: "中文搜索引擎的发展现状、问题与对策", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210108