WO2009003328A1 - Data query system and method - Google Patents
Data query system and method
- Publication number
- WO2009003328A1 (PCT/CN2007/003409)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word segmentation
- homophone
- module
- search
- query
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Definitions
- a search engine system is a system that collects information resources and provides information inquiry to users after processing and organizing the information. The user can input the content that he wants to query, and the search engine system quickly and accurately locates the information required by the user in the massive information according to the user's input, and returns the search result to the user.
- when analyzing a user's input string for a search query, a search engine generally adopts one of the following methods: a directory search mode, in which the system presets the search words and the user selects keywords from them; using the user's input directly as the keyword for the query; or segmenting the user's query request into words according to a dictionary and using the word segmentation result as the keywords for the query.
- the search engine system directly splits the words into single characters for the query, so the search returns a very large number of results, most of them garbage, and the user cannot find the results they actually want among such massive data.
- the data query system includes: an entry module, configured to receive a query string that the user wants to query; a word segmentation module, configured to perform preliminary word segmentation on the query string, determine whether a homophone search is needed for the preliminary word segmentation result, and process the preliminary word segmentation result according to the judgment result; and a query module, configured to perform a data query according to the processed final word segmentation result.
- the word segmentation module includes: a Chinese word segmentation module, configured to perform preliminary word segmentation on the query string according to a specific rule and, if no homophone search is required for the preliminary word segmentation result, send the preliminary word segmentation result to the entry module; a search judgment module, configured to determine whether a homophone search is needed for the preliminary word segmentation result; and a homophone processing module, configured to, when a homophone search is needed, perform pinyin labeling on the preliminary word segmentation result and perform the homophone search on it according to the pinyin labeling information.
- the homophone processing module comprises: a pinyin labeling module, configured to perform pinyin labeling on the preliminary word segmentation result; and a homophone search module, configured to perform a homophone search in the homophone dictionary according to the pinyin labeling information.
- the query module includes: a data query module, configured to perform a data query according to the preliminary word segmentation result or the final word segmentation result; and a data indexing module, configured to index the data to be queried and maintain the index.
- the data query system according to the present invention may further comprise a homophone dictionary module for generating and storing a homophone dictionary and updating the homophone dictionary in real time.
- the data query method includes the following steps: S302, receiving a query string that the user wants to query; S304, performing preliminary word segmentation on the query string, determining whether a homophone search is needed for the preliminary word segmentation result, and processing the preliminary word segmentation result according to the judgment result; and S306, performing a data query according to the processed final word segmentation result.
- step S304 includes the following steps: S3042-1, performing preliminary word segmentation on the query string; S3044-1, determining whether a homophone search is needed for the preliminary word segmentation result; and S3046-1, when no homophone search is required for the preliminary word segmentation result, proceeding directly to step S306.
- alternatively, step S304 includes the following steps: S3042-2, performing preliminary word segmentation on the query string; S3044-2, determining whether a homophone search is needed for the preliminary word segmentation result; and S3046-2, when a homophone search is required for the preliminary word segmentation result, performing the homophone search for it in the homophone dictionary and then proceeding to step S306.
- in step S306, the data to be queried is also indexed, and the index is maintained.
- the data query method according to the present invention may further comprise the step of generating and storing a homophone dictionary and updating the homophone dictionary in real time.
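The branching in step S304 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `run_query`, the four callables passed to it, and the toy English data are all assumptions introduced here, because the text does not specify how segmentation, the homophone decision, or the data query are realized.

```python
from typing import Callable, Dict, List

Segments = List[str]

def run_query(
    query: str,
    segment: Callable[[str], Segments],
    needs_homophone_search: Callable[[Segments], bool],
    homophone_search: Callable[[Segments], Segments],
    data_query: Callable[[Segments], List[str]],
) -> List[str]:
    """S302 has delivered `query`; run S304 and then S306."""
    # S3042: preliminary word segmentation.
    preliminary = segment(query)
    # S3044: decide whether a homophone search is needed.
    if needs_homophone_search(preliminary):
        # S3046-2: look the segments up in the homophone dictionary.
        final = homophone_search(preliminary)
    else:
        # S3046-1: keep the preliminary result and go straight to S306.
        final = preliminary
    # S306: data query on the final word segmentation result.
    return data_query(final)

# Toy stand-ins (all hypothetical) so the sketch runs end to end.
KNOWN_WORDS = {"data", "query", "system"}
SOUND_ALIKE: Dict[str, str] = {"kwery": "query"}
DOCUMENTS: Dict[str, str] = {"doc1": "data query system", "doc2": "word segmentation"}

def toy_needs_search(segments: Segments) -> bool:
    # Assumed criterion: any segment missing from the lexicon triggers the search.
    return any(s not in KNOWN_WORDS for s in segments)

def toy_homophone_search(segments: Segments) -> Segments:
    return [SOUND_ALIKE.get(s, s) for s in segments]

def toy_data_query(segments: Segments) -> List[str]:
    # Return ids of documents that contain every segment as a whole word.
    return sorted(
        doc_id for doc_id, text in DOCUMENTS.items()
        if all(s in text.split() for s in segments)
    )

def toy_run(query: str) -> List[str]:
    return run_query(query, str.split, toy_needs_search,
                     toy_homophone_search, toy_data_query)

print(toy_run("data kwery"))
```

Passing the stages in as callables keeps the sketch independent of any particular segmentation rule or dictionary format.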
- the user can obtain a highly accurate result by inputting an accurate and clean keyword, and can search by inputting pinyin.
- FIG. 1 is a block diagram of a data query system in accordance with an embodiment of the present invention.
- FIG. 2 is a block diagram of a data query system in accordance with another embodiment of the present invention.
- FIGS. 3A through 3C are flow charts of the data query method and of the steps performed by the systems of FIGS. 1 and 2.
Detailed Description of the Preferred Embodiments
- FIG. 1 illustrates a data query system in accordance with an embodiment of the present invention. As shown in FIG. 1, the data query system includes: an entry module 102, configured to receive a query string that the user wants to query; a word segmentation module 104, configured to perform preliminary word segmentation on the query string, determine whether a homophone search is needed for the preliminary word segmentation result, and process the preliminary word segmentation result according to the judgment result; and a query module 106, configured to perform a data query according to the processed final word segmentation result.
- the word segmentation module 104 includes: a Chinese word segmentation module 1042, configured to perform preliminary word segmentation on the query string according to a specific rule and, when no homophone search is needed for the preliminary word segmentation result, send the preliminary word segmentation result to the entry module; a search judgment module 1044, configured to determine whether a homophone search is needed for the preliminary word segmentation result; and a homophone processing module 1046, configured to, when a homophone search is needed, perform pinyin labeling on the preliminary word segmentation result and perform the homophone search on it according to the pinyin labeling.
- the homophone processing module 1046 includes: a pinyin labeling module, configured to perform pinyin labeling on the preliminary word segmentation result; and a homophone search module, configured to perform a homophone search for the preliminary word segmentation result in the homophone dictionary according to the pinyin labeling information.
- the query module 106 includes: a data query module 1062, configured to perform a data query according to the preliminary word segmentation result or the final word segmentation result; and a data indexing module 1064, configured to index the data to be queried and maintain the index.
- the data query system according to the present invention may further comprise a homophone dictionary module for generating and storing a homophone dictionary and updating the homophone dictionary in real time.
- in the embodiment of FIG. 2, the data query system includes the following parts. A query entry module 202 is configured to receive the user's input through interaction with an external system, send the received text to the word segmentation module, receive the word segmentation information returned from the word segmentation module, and search using that word segmentation information as keywords.
- the word segmentation module 204 is configured to call the Chinese word segmentation module to obtain a preliminary word segmentation result and to determine, according to that result, whether a homophone search is needed. If the word segmentation information needs a homophone search, the module sends the information to be processed to the homophone processing module, receives the homophone information returned from it, and returns the homophone information to the query entry module together with the other word segmentation information.
- the Chinese word segmentation module 206 is configured to segment the information input by the user according to different word segmentation strategies.
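The text does not name the word segmentation strategies. As one possible illustration only, the sketch below uses forward maximum matching, a common dictionary-based technique that is swapped in here as an assumption; the function name `forward_max_match`, the `max_len` parameter, and the tiny lexicon are hypothetical.

```python
from typing import List, Set

def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: take the longest lexicon word at each position."""
    segments: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback segment.
            if size == 1 or piece in lexicon:
                segments.append(piece)
                i += size
                break
    return segments

LEXICON = {"数据", "查询", "系统", "数据查询"}  # hypothetical lexicon

print(forward_max_match("数据查询系统", LEXICON))
print(forward_max_match("数据茶询", LEXICON))
```

In the second call the mistyped part is not in the lexicon, so it falls apart into single characters. Segments like these are the kind of input a homophone search could then try to repair.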
- the homophone processing module 208 is configured to receive the text information from the word segmentation module, send it to the pinyin annotation module, and receive the pinyin annotation information returned from the pinyin annotation module.
- the pinyin annotation module 210 is configured to convert the text information into the corresponding pinyin information. The annotation result is sent to the query entry module; during homophone dictionary maintenance, the annotation result is sent to the homophone dictionary maintenance module.
- the homophone search module 212 is configured to search the multi-word dictionary, extract the homophones with the highest frequency of occurrence, and send the search result to the homophone processing module.
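Pinyin annotation followed by a frequency-ranked homophone search might look like the sketch below. The character-to-pinyin table, the dictionary layout (toneless pinyin key mapped to word and frequency pairs), the frequency values, and the function names are all assumptions made for illustration; the text does not describe the data format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table standing in for the pinyin annotation module (toneless pinyin).
CHAR_PINYIN: Dict[str, str] = {"查": "cha", "茶": "cha", "询": "xun", "寻": "xun"}

# Hypothetical homophone dictionary: pinyin key -> (word, frequency) entries.
HOMOPHONES: Dict[str, List[Tuple[str, int]]] = {
    "cha xun": [("查询", 120), ("茶寻", 1)],
}

def annotate(text: str) -> Optional[str]:
    """Return space-joined pinyin for `text`, or None if any character is unknown."""
    syllables: List[str] = []
    for ch in text:
        if ch not in CHAR_PINYIN:
            return None
        syllables.append(CHAR_PINYIN[ch])
    return " ".join(syllables)

def most_frequent_homophone(text: str) -> Optional[str]:
    """Annotate `text`, then pick the dictionary entry with the highest frequency."""
    key = annotate(text)
    if key is None:
        return None
    candidates = HOMOPHONES.get(key, [])
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry[1])[0]

print(most_frequent_homophone("茶询"))
```

Because the lookup key is the pinyin rather than the characters, a query typed directly in pinyin could use the same dictionary by skipping the annotation step.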
- the homophone dictionary maintenance module 214 is used to maintain the homophone dictionary that the homophone system needs. It annotates the dictionary word by word by calling the pinyin annotation module to form the homophone dictionary, and updates the homophone dictionary synchronously when the dictionary is updated.
- a word with multiple pronunciations is annotated with each of its pinyin readings, forming multiple entries.
- the homophone dictionary is sorted in pinyin order.
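A sketch of this maintenance step, under stated assumptions: the readings table, the toneless pinyin strings, and the function names are hypothetical, and the dictionary is modeled as a list of `(pinyin, word)` pairs kept sorted so that all homophones of a reading sit next to each other.

```python
import bisect
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (pinyin, word)

# Hypothetical readings; a word with several readings yields several entries.
READINGS: Dict[str, List[str]] = {
    "银行": ["yin hang"],
    "音乐": ["yin yue"],
    "重": ["zhong", "chong"],
    "中": ["zhong"],
}

def build_homophone_dictionary(readings: Dict[str, List[str]]) -> List[Entry]:
    """Create one entry per (reading, word) pair and sort by pinyin."""
    entries = [(pinyin, word) for word, pinyins in readings.items() for pinyin in pinyins]
    entries.sort()
    return entries

def add_word(entries: List[Entry], word: str, pinyins: List[str]) -> None:
    """Keep the dictionary sorted when the source lexicon gains a word."""
    for pinyin in pinyins:
        bisect.insort(entries, (pinyin, word))

def lookup(entries: List[Entry], pinyin: str) -> List[str]:
    """Binary-search the sorted entries for every word with this reading."""
    lo = bisect.bisect_left(entries, (pinyin, ""))
    hi = bisect.bisect_right(entries, (pinyin, "\U0010ffff"))
    return [word for _, word in entries[lo:hi]]

entries = build_homophone_dictionary(READINGS)
add_word(entries, "钟", ["zhong"])
print(lookup(entries, "zhong"))
```

Sorting by pinyin is what makes the binary search possible; a hash map keyed by pinyin would serve equally well if ordered traversal is not needed.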
- the query module 216 is configured to receive keywords from the query entry module, generate query conditions, query the index file to obtain the results matching the user's query request, and return the query results to the user.
- the index module 218 is used to maintain the full-text index. The index module scans each word in the information to be searched and builds an index entry for each word, recording the number of occurrences and the positions of the word in the article.
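A full-text index of this kind is commonly built as an inverted index. The sketch below is an illustration under assumptions: documents are taken to be already segmented into word lists, and the names `build_index`, `occurrences`, and `search` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Index = Dict[str, Dict[str, List[int]]]  # word -> doc id -> positions

def build_index(docs: Dict[str, List[str]]) -> Index:
    """Scan every word of every document and record where it occurs."""
    index: Index = defaultdict(dict)
    for doc_id, words in docs.items():
        for position, word in enumerate(words):
            index[word].setdefault(doc_id, []).append(position)
    return dict(index)

def occurrences(index: Index, word: str, doc_id: str) -> int:
    """Number of times `word` appears in the given document."""
    return len(index.get(word, {}).get(doc_id, []))

def search(index: Index, keywords: List[str]) -> List[str]:
    """Return ids of documents that contain every keyword."""
    doc_sets = [set(index.get(k, {})) for k in keywords]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

DOCS = {
    "d1": ["data", "query", "system", "data"],
    "d2": ["query", "method"],
}
index = build_index(DOCS)
print(index["data"])
print(search(index, ["data", "query"]))
```

Storing positions as well as counts costs more space than a plain word-to-document map, but it matches the description of recording both the number and the location of each word.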
- the query entry module sends the query string to the word segmentation module; the word segmentation module calls the Chinese word segmentation module to segment the user's query string according to different word segmentation strategies and judges, from the word segmentation result, whether a homophone search is needed. If not, the word segmentation result is returned directly to the query module and the data search process begins. If so, the information to be processed is sent to the homophone processing module.
- the homophone processing module receives the to-be-processed information and sends it to the pinyin annotation module; the pinyin annotation module performs pinyin annotation on the text information and returns the annotation result to the homophone processing module.
- after receiving the pinyin annotation information, the homophone processing module calls the homophone search module to look up the pinyin annotation information in the homophone dictionary and, once it has the search result, returns the information to the word segmentation module.
- the word segmentation module integrates the information obtained from the homophone processing module with the keyword information obtained from the other word segments and returns it to the query entry module as the search keywords.
- the query entry module sends the keyword to the query module.
- the query module uses the obtained keywords to search the index library, and returns the matching information in the index library to the query entry module.
- the result is adjusted by the query entry module and returned to the user.
- the data query system may further use an indexing module to index the data to be searched, maintain the index, and synchronously update the index library information when the source information is updated.
- the data query method includes the following steps: S302, receiving a query string that the user wants to query; S304, performing preliminary word segmentation on the query string, determining whether a homophone search is needed for the preliminary word segmentation result, and processing the preliminary word segmentation result according to the judgment result; and S306, performing a data query with the processed final word segmentation result.
- step S304 includes the following steps: S3042-1, performing preliminary word segmentation on the query string; S3044-1, determining whether a homophone search is needed for the preliminary word segmentation result; and S3046-1, when no homophone search is needed for the preliminary word segmentation result, proceeding directly to step S306.
- alternatively, step S304 includes the following steps: S3042-2, performing preliminary word segmentation on the query string; S3044-2, determining whether a homophone search is needed for the preliminary word segmentation result; and S3046-2, when a homophone search is needed for the preliminary word segmentation result, performing the homophone search for it in the homophone dictionary and then proceeding to step S306.
- in step S306, the data to be queried is also indexed, and the index is maintained.
- the data query method according to the present invention may further comprise the step of generating and storing a homophone dictionary and updating the homophone dictionary in real time.
- when analyzing the user's input, the invention can handle erroneous input such as homophone characters and fuzzy pronunciations: it automatically converts the input into standard input, returns the query results to the user after the search, and prompts the user about the erroneous input.
- the user can search for the information he needs quickly and conveniently, and the barrier to using the search engine is lowered. With the present invention, the user can even find information by entering pinyin directly.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
The present invention describes a data query system and method. The data query system comprises an entry module for receiving the query string that the user wants to search; a word segmentation module for performing preliminary segmentation on the query string, determining whether a homophone search is needed for the segmentation result, and processing the preliminary segmentation result according to the determination result; and a query module for performing a data query according to the processed final segmentation result.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200710126036.4 | 2007-06-29 | ||
CNA2007101260364A CN101082936A (zh) | 2007-06-29 | 2007-06-29 | 数据查询系统及方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009003328A1 (fr) | 2009-01-08 |
Family
ID=38912505
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2007/003409 WO2009003328A1 (fr) | 2007-06-29 | 2007-11-30 | Data query system and method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN101082936A (fr) |
WO (1) | WO2009003328A1 (fr) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408794A (zh) * | 2017-08-17 | 2019-03-01 | 阿里巴巴集团控股有限公司 | 一种频次词典建立方法、分词方法、服务器和客户端设备 |
CN109977398A (zh) * | 2019-02-21 | 2019-07-05 | 江苏苏宁银行股份有限公司 | 一种特定领域的语音识别文本纠错方法 |
CN110851484A (zh) * | 2019-11-13 | 2020-02-28 | 北京香侬慧语科技有限责任公司 | 一种获取多指标问题答案的方法及装置 |
CN112686041A (zh) * | 2021-01-06 | 2021-04-20 | 北京猿力未来科技有限公司 | 一种拼音标注方法及装置 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101685021B (zh) * | 2008-09-24 | 2012-12-26 | 高德软件有限公司 | 一种兴趣点信息获取方法及装置 |
CN101699440B (zh) * | 2009-11-24 | 2011-12-07 | 中国电信股份有限公司 | 按业务检索的方法及系统 |
CN101853280B (zh) * | 2010-05-19 | 2012-07-04 | 北京友录在线科技发展有限公司 | 一种手持设备中联系人查找方法 |
CN102467544B (zh) * | 2010-11-16 | 2015-01-21 | 中国电信股份有限公司 | 基于空间模糊编码的信息智能搜索方法及系统 |
CN103530380B (zh) * | 2013-10-17 | 2017-10-17 | 北京奇虎科技有限公司 | 一种垂直搜索设备及方法 |
CN103577591B (zh) * | 2013-11-12 | 2017-02-01 | 广东金宇恒软件科技有限公司 | 一种生成记账凭证的方法及装置 |
WO2016154838A1 (fr) * | 2015-03-29 | 2016-10-06 | 王志强 | Procédé pour fournir des informations de produit pendant l'affichage de marques de commerce homophones, et système de recherche de marque commerciale |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1335574A (zh) * | 2001-09-05 | 2002-02-13 | 罗笑南 | 智能语义搜索方法 |
DE10339697A1 (de) * | 2003-08-28 | 2005-04-07 | Siemens Ag | Verfahren, Computerprogramm mit Programmcode-Mitteln und Computerprogramm-Produkt zu einer Bearbeitung einer Suchanfrage unter Verwendung mindestens einer Suchmaschine und mindestens eines Verzeichnisdienstes |
US20050187920A1 (en) * | 2004-01-23 | 2005-08-25 | Porto Ranelli, Sa | Contextual searching |
CN1873642A (zh) * | 2006-04-29 | 2006-12-06 | 上海世纪互联信息系统有限公司 | 具有自动分类功能的搜索引擎 |
CN1909522A (zh) * | 2006-08-18 | 2007-02-07 | 北京金山软件有限公司 | 获取网页关键字的方法及其应用系统 |
CN101075308A (zh) * | 2006-11-08 | 2007-11-21 | 腾讯科技(深圳)有限公司 | 一种编辑电子邮件的方法 |
CN101079032A (zh) * | 2006-06-23 | 2007-11-28 | 腾讯科技(深圳)有限公司 | 数字串模糊匹配的方法 |
-
2007
- 2007-06-29 CN CNA2007101260364A patent/CN101082936A/zh active Pending
- 2007-11-30 WO PCT/CN2007/003409 patent/WO2009003328A1/fr active Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408794A (zh) * | 2017-08-17 | 2019-03-01 | 阿里巴巴集团控股有限公司 | 一种频次词典建立方法、分词方法、服务器和客户端设备 |
CN109977398A (zh) * | 2019-02-21 | 2019-07-05 | 江苏苏宁银行股份有限公司 | 一种特定领域的语音识别文本纠错方法 |
CN109977398B (zh) * | 2019-02-21 | 2023-06-06 | 江苏苏宁银行股份有限公司 | 一种特定领域的语音识别文本纠错方法 |
CN110851484A (zh) * | 2019-11-13 | 2020-02-28 | 北京香侬慧语科技有限责任公司 | 一种获取多指标问题答案的方法及装置 |
CN112686041A (zh) * | 2021-01-06 | 2021-04-20 | 北京猿力未来科技有限公司 | 一种拼音标注方法及装置 |
CN112686041B (zh) * | 2021-01-06 | 2024-06-04 | 北京猿力未来科技有限公司 | 一种拼音标注方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
CN101082936A (zh) | 2007-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2009003328A1 (fr) | Data query system and method | |
US7272558B1 (en) | Speech recognition training method for audio and video file indexing on a search engine | |
US9613166B2 (en) | Search suggestions of related entities based on co-occurrence and/or fuzzy-score matching | |
CN1171199C (zh) | 基于语言模型的信息检索和语音识别 | |
US8560513B2 (en) | Searching for information based on generic attributes of the query | |
US8126897B2 (en) | Unified inverted index for video passage retrieval | |
US20090287680A1 (en) | Multi-modal query refinement | |
CN106383836B (zh) | 将可操作属性归于描述个人身份的数据 | |
US9798776B2 (en) | Systems and methods for parsing search queries | |
Mendels et al. | Improving speech recognition and keyword search for low resource languages using web data | |
CN105159938B (zh) | 检索方法和装置 | |
CN101019121A (zh) | 对存储在数据库中的文档编制索引和进行检索的方法和系统 | |
CN101952824A (zh) | 计算机执行的对数据库中的文献进行索引和检索的方法以及信息检索系统 | |
CN101149758A (zh) | 搜索系统及搜索方法 | |
CN1744087A (zh) | 搜索文档的文档处理装置及其控制方法 | |
JP2004110808A (ja) | ネットワークを介してデータを検索及び提示する方法及びマシン可読記憶装置 | |
WO2003010754A1 (fr) | Systeme de recherche a entree vocale | |
CN103885949A (zh) | 一种基于歌词的歌曲检索系统及其检索方法 | |
KR101174057B1 (ko) | 인덱스 분석장치와 인덱스 검색장치 및 그 방법 | |
CN102339294A (zh) | 一种对关键词进行预处理的搜索方法和系统 | |
US20070112839A1 (en) | Method and system for expansion of structured keyword vocabulary | |
US9507834B2 (en) | Search suggestions using fuzzy-score matching and entity co-occurrence | |
CN105787029A (zh) | 一种基于solr的关键字词识别办法 | |
JP2002251402A (ja) | 文書検索方法及び文書検索装置 | |
CN110309258A (zh) | 一种输入检查方法、服务器和计算机可读存储介质 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07845772; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 07845772; Country of ref document: EP; Kind code of ref document: A1 |