CN113569010A - Method, device, equipment and storage medium for filtering search results - Google Patents


Info

Publication number
CN113569010A
Authority
CN
China
Legal status
Granted
Application number
CN202110841839.8A
Other languages
Chinese (zh)
Other versions
CN113569010B (en)
Inventor
谢楚曦
李雅楠
何伯磊
和为
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110841839.8A
Publication of CN113569010A
Application granted
Publication of CN113569010B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method, an apparatus, a device, a storage medium, and a program product for filtering search results, which relate to the field of computer technology, and in particular to the field of intelligent search. The specific implementation scheme is as follows: determining pronunciation information of a search term; for each of at least one search result corresponding to the search term, determining pronunciation information of a hit word in that search result; and filtering the at least one search result according to the pronunciation information of the search term and the pronunciation information of the hit word in each search result.

Description

Method, device, equipment and storage medium for filtering search results
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to the field of intelligent search technology.
Background
Chinese contains a large number of polyphonic characters (polyphones), that is, characters that have two or more pronunciations, where different pronunciations carry different meanings, usages, and parts of speech.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, storage medium, and program product for filtering search results.
According to an aspect of the present disclosure, there is provided a method of filtering search results, including: determining pronunciation information of a search term; for each of at least one search result corresponding to the search term, determining pronunciation information of a hit word in that search result; and filtering the at least one search result according to the pronunciation information of the search term and the pronunciation information of the hit word in each search result.
According to another aspect of the present disclosure, there is provided an apparatus for filtering search results, including: a first determining module configured to determine pronunciation information of a search term; a second determining module configured to determine, for each of at least one search result corresponding to the search term, pronunciation information of a hit word in that search result; and a filtering module configured to filter the at least one search result according to the pronunciation information of the search term and the pronunciation information of the hit word in each search result.
Another aspect of the present disclosure provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the embodiments of the present disclosure.
According to another aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to the embodiments of the present disclosure.
According to another aspect of the embodiments of the present disclosure, there is provided a computer program product including a computer program which, when executed by a processor, implements the method according to the embodiments of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a method of filtering search results according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flowchart of a method of determining pronunciation information of a search term according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a diagram of a prefix tree according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flowchart of a method of determining pronunciation information of a search term according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method of determining pronunciation information for a hit word in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method of determining pronunciation information for a hit word according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a group search method according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of an apparatus for filtering search results according to an embodiment of the present disclosure; and
FIG. 9 shows a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 schematically shows a flow chart of a method of filtering search results according to an embodiment of the present disclosure.
As shown in fig. 1, the method includes determining pronunciation information of a search term in operation S110.
According to an embodiment of the present disclosure, the search term may be a word expressing the content to be searched for; in this embodiment, the search term may include at least one character.
According to an embodiment of the present disclosure, the search term may be obtained from a search request (query). For example, a search request may be acquired and, if it contains special characters, each special character may be replaced with a space to obtain the search term. Special characters may include punctuation marks, operator symbols, separators, and the like.
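A minimal Python sketch of this search-term extraction step follows. The source does not define the exact set of special characters, so the sketch assumes that any Unicode punctuation (category P*) or symbol (category S*) character counts as special; the function name `extract_search_term` is hypothetical.

```python
import unicodedata


def extract_search_term(query: str) -> str:
    """Replace every special character in a search request with a space.

    Assumption: "special character" is approximated as any character whose
    Unicode general category is punctuation (P*) or symbol (S*).
    """
    return "".join(
        " " if unicodedata.category(ch)[0] in ("P", "S") else ch
        for ch in query
    )


print(extract_search_term("长江#1号"))  # 长江 1号
```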
According to an embodiment of the present disclosure, pronunciation information represents how a word is pronounced. In this embodiment, the pronunciation information may include, for example, pinyin.
Then, in operation S120, for each of at least one search result corresponding to the search term, pronunciation information of a hit word in that search result is determined.
According to an embodiment of the present disclosure, a search operation may be performed for the search term to obtain the at least one search result, where each search result includes at least one hit word that matches the search term.
According to an embodiment of the present disclosure, the field of a search result that hits the search term may be taken as the hit word. According to other embodiments of the present disclosure, a preset number of characters before and after the hitting field may also be taken and combined with that field to form the hit word. The preset number may be set as needed; in this embodiment, for example, it is 3.
For example, a group search may be performed for the search term to obtain group information corresponding to it, such as group names, group member names, and group member business card information. The field of the group information that hits the search term, together with the 3 characters before and the 3 characters after that field, is taken as the hit word.
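The hit-word window described above could be sketched as follows, assuming a hit is a literal substring occurrence of the search term in a field's text (in practice the hit may be found by pinyin, which this sketch does not model). `extract_hit_words` and its `window` parameter are illustrative names; the default of 3 mirrors this embodiment.

```python
from typing import List


def extract_hit_words(field_text: str, term: str, window: int = 3) -> List[str]:
    """Return every literal occurrence of `term` in `field_text`, widened by up
    to `window` characters on each side (clipped at the field boundaries)."""
    hits: List[str] = []
    if not term:
        return hits
    start = field_text.find(term)
    while start != -1:
        end = start + len(term)
        hits.append(field_text[max(0, start - window):end + window])
        start = field_text.find(term, start + 1)  # allow overlapping occurrences
    return hits


print(extract_hit_words("abcdefXYZghijkl", "XYZ"))  # ['defXYZghi']
```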
In operation S130, the at least one search result is filtered according to the pronunciation information of the search term and the pronunciation information of the hit word in each search result.
According to an embodiment of the present disclosure, for each search result, it may be determined whether the pronunciation information of the hit word in that result matches the pronunciation information of the search term. If it does not match, the hit word is unrelated to the content the search term expresses. Accordingly, the search results whose hit-word pronunciation information does not match that of the search term may be deleted, thereby filtering the at least one search result.
For example, it may be determined whether the pronunciation information of each hit word in a search result contains the pronunciation information of the search term; if so, that hit word is determined to match the search term. If none of the hit words in a search result match the search term, the search result is deleted.
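To illustrate this matching rule, the sketch below treats pronunciation information as a list of pinyin syllables and interprets "contains" as a contiguous run of syllables; both choices, and the helper names, are assumptions. A search result is kept when at least one of its hit words matches and is deleted otherwise.

```python
from typing import List, Sequence


def contains_pinyin(hit: Sequence[str], term: Sequence[str]) -> bool:
    """True if the syllable sequence `term` occurs contiguously in `hit`."""
    n = len(term)
    return any(list(hit[i:i + n]) == list(term) for i in range(len(hit) - n + 1))


def keep_result(hit_pinyins: List[List[str]], term_pinyin: List[str]) -> bool:
    """Keep a search result if any hit word's pinyin contains the search term's
    pinyin; a result for which this returns False would be deleted."""
    return any(contains_pinyin(h, term_pinyin) for h in hit_pinyins)


# A hit word read zhang, da does not contain the search term reading chang, da.
print(keep_result([["zhang", "da", "le"]], ["chang", "da"]))  # False
```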
With the method for filtering search results according to embodiments of the present disclosure, the correct pronunciation of the search term can be identified and the search results screened according to it, which can improve search accuracy.
According to other embodiments of the present disclosure, some polyphonic words, such as personal names, have more than one common pronunciation. For example, the word "罢了" ("that's all") has two common pronunciations, "ba, liao" and "ba, le". Therefore, when determining the pronunciation information of a search term or a hit word, the search term or hit word may be allowed to have multiple pinyin readings. In this embodiment, for a polyphonic word with multiple common pronunciations, those pronunciations are stored in the prefix tree as a regular expression, and the regular expression is used when matching against a search term or hit word. For example, the regular expression (ba;liao;|ba;le;) may be stored in the prefix tree entry for "罢了". After the phrase "这件事罢了" is matched, its pronunciation information is obtained as zhe;jian;shi;(ba;liao;|ba;le;). According to embodiments of the present disclosure, this storage method makes it more convenient to add or modify the pronunciation information of polyphonic words.
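The regular-expression storage can be illustrated with Python's `re` module. The sketch assumes pinyin syllables are joined with `;` and that a concrete reading is checked against the stored pattern with `re.fullmatch`; the source does not specify the exact matching call, and `matches_stored_reading` is a hypothetical helper.

```python
import re

# Stored pronunciation for the phrase 这件事罢了: the final word, 罢了, has two
# common readings kept as an alternation (";" as syllable delimiter is assumed).
STORED = re.compile(r"zhe;jian;shi;(ba;liao;|ba;le;)")


def matches_stored_reading(reading: str) -> bool:
    """True if a concrete pinyin reading is one of the stored alternatives."""
    return STORED.fullmatch(reading) is not None


print(matches_stored_reading("zhe;jian;shi;ba;le;"))  # True
print(matches_stored_reading("zhe;jian;shi;ba;la;"))  # False
```

Adding a third common reading would then only require editing the alternation, which is the convenience the text describes.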
Fig. 2 schematically shows a flowchart of a method of determining pronunciation information of a search term according to an embodiment of the present disclosure.
As shown in Fig. 2, the method 210 for determining pronunciation information of a search term includes, in operation S211, determining a first polyphonic word in the search term and the pronunciation information of the first polyphonic word by using a polyphonic word dictionary.
According to an embodiment of the present disclosure, the polyphonic word dictionary includes a plurality of polyphonic words stored in the form of a prefix tree, together with their pronunciation information. Each polyphonic word includes one or more characters, at least one of which is a polyphonic character.
According to an embodiment of the present disclosure, a forward longest matching algorithm may be used to determine a first polyphonic word in the search term that matches a word in the polyphonic word dictionary, together with the pronunciation information of the first polyphonic word.
Then, in operation S212, if the search term contains words other than the first polyphonic word, the pronunciation information of those other words is determined.
According to an embodiment of the present disclosure, the pronunciation information of the other words in the search term may be determined by a pinyin tokenizer.
According to an embodiment of the present disclosure, the polyphonic word dictionary may include at least one prefix tree. Each prefix tree includes a root node and at least one child node, the child nodes including at least one leaf node. Each node stores a character and its pronunciation information. A leaf node additionally stores the pronunciation information of the word formed by the characters on the path from the root node to that leaf node.
According to an embodiment of the present disclosure, for each polyphonic word, the combinations of its characters and their pinyin may be stored in the prefix tree structure. For example, for the polyphonic word "长江" (Yangtze River), entries such as "长" chang, "江" jiang, and "长江" chang jiang may be stored in the prefix tree.
Fig. 3 schematically shows a schematic diagram of a prefix tree according to an embodiment of the present disclosure.
As shown in Fig. 3, prefix tree 300 includes a root node 310, which stores the character "长" ("long") and its pronunciations chang and zhang. The root node 310 has two child nodes 321 and 322. Node 321 stores the character "江" ("river") and its pronunciation jiang; node 321 is also a leaf node and additionally stores the pronunciation chang, jiang of the word "长江" ("Yangtze River") formed by the characters on the path from root node 310 to node 321. Similarly, child node 322 stores the character "大" ("large, big") with its pronunciation da, and the pronunciation zhang, da of the word "长大" ("grow up").
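A minimal sketch of prefix tree 300 follows (长 with readings chang and zhang at the root; 江/jiang and 大/da below it). As in the description, the root node itself stores a character, each node stores the readings of its own character, and leaf nodes additionally store the reading of the whole word from the root. The class and function names are hypothetical, and `lookup` only illustrates reading a whole word back out of the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    char: str
    readings: List[str]                      # readings of this single character
    word_pinyin: Optional[List[str]] = None  # leaf only: reading of root..leaf word
    children: Dict[str, "TrieNode"] = field(default_factory=dict)


# Prefix tree 300 of Fig. 3.
root = TrieNode("长", ["chang", "zhang"])                                        # node 310
root.children["江"] = TrieNode("江", ["jiang"], word_pinyin=["chang", "jiang"])  # node 321
root.children["大"] = TrieNode("大", ["da"], word_pinyin=["zhang", "da"])        # node 322


def lookup(tree: TrieNode, word: str) -> Optional[List[str]]:
    """Walk from the root; return the stored word reading if `word` ends at a leaf."""
    if not word or word[0] != tree.char:
        return None
    node = tree
    for ch in word[1:]:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.word_pinyin


print(lookup(root, "长大"))  # ['zhang', 'da']
```

Per-character conversion would typically read 长 as chang here; the leaf entry is what supplies the correct reading zhang for 长大.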
In the related art, each character of the search term is converted into pronunciation information independently, and search results are recalled according to that pronunciation information. When the search term contains polyphonic characters, this can cause false recalls. According to embodiments of the present disclosure, the pronunciation information of the search term is determined using the polyphonic word dictionary, so the determined pronunciation information is more accurate.
Fig. 4 schematically shows a flowchart of a method of determining pronunciation information of a search term according to another embodiment of the present disclosure.
As shown in Fig. 4, the method for determining the polyphonic words and their pronunciation information in a search term includes, in operation S411, taking the first character of the search term as the current character.
In operation S412, it is determined whether the current character matches the root node of any prefix tree in the polyphonic word dictionary. If so, that root node is taken as the current node and operation S413 is performed; otherwise, operation S417 is performed.
In operation S413, it is determined whether the end of the search term has been reached. If so, operation S4110 is performed; otherwise, operation S414 is performed.
According to an embodiment of the present disclosure, the end of the search term has been reached if the current character is its last character.
In operation S414, the character following the current character in the search term is taken as the new current character.
In operation S415, it is determined whether the current character matches a child node of the current node. If so, that child node becomes the current node and the process returns to operation S413; otherwise, operation S416 is performed.
According to an embodiment of the present disclosure, if the current node has no child node, operation S416 is performed.
In operation S416, for each matched character, the pinyin stored at the node it matched is taken as the pinyin of that character.
According to an embodiment of the present disclosure, if the last matched node is a leaf node, the matched characters form a polyphonic word, so the pronunciation information of that word stored at the leaf node can be used directly as the pronunciation information of those characters.
In operation S417, the pinyin of the current character is determined using the pinyin tokenizer.
In operation S418, it is determined whether the end of the search term has been reached. If so, operation S4111 is performed; otherwise, operation S419 is performed.
In operation S419, the character following the current character in the search term is taken as the new current character, and the process returns to operation S412.
In operation S4110, for each matched character, the pinyin stored at the node it matched is taken as the pinyin of that character.
In operation S4111, the pinyin of the search term is output.
The following further describes the method for determining the pronunciation information of the search term with reference to a specific embodiment. Those skilled in the art will appreciate that the following example embodiments are only for the understanding of the present disclosure, and the present disclosure is not limited thereto.
Illustratively, in this embodiment, the search term is "长江1号" ("Yangtze River No. 1"), and the polyphonic word dictionary includes one prefix tree. The root node of the prefix tree stores the character "长" ("long") and its pronunciations chang and zhang. The root node has a child node that stores the character "江" ("river") with its pronunciation jiang, together with the pronunciation chang, jiang of the word "长江" formed by the characters of the root node and the child node.
According to an embodiment of the present disclosure, the first character "长" of the search term is read, and it is determined whether it matches the root node of any prefix tree. In this embodiment, the root node matches "长", so it is then determined whether the next character matches a child node of the root node; if so, the following character is checked in turn, until the run of consecutively matched characters ends. The forward maximum matching result obtained in this embodiment is "长江", with pinyin chang, jiang. The remaining characters "1" and "号" ("number") cannot be matched, so they are converted into the pinyin yi and hao, respectively. The final pinyin of the search term is chang, jiang, yi, hao.
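The forward longest matching walk can be sketched as below for the search term 长江1号 ("Yangtze river No. 1"). Unlike Fig. 3, this sketch assumes the more conventional trie whose root is empty and which marks every node where a dictionary word ends, and it keeps the longest complete dictionary word instead of using per-node character readings for a partial match. `FALLBACK` is a hypothetical stand-in for the pinyin tokenizer; a real system would use a full character-to-pinyin converter.

```python
from typing import Dict, List, Optional


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.word_pinyin: Optional[List[str]] = None  # set where a dictionary word ends


def build_dictionary(words: Dict[str, List[str]]) -> Node:
    root = Node()
    for word, pinyin in words.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.word_pinyin = pinyin
    return root


# Hypothetical per-character fallback standing in for the pinyin tokenizer.
FALLBACK = {"长": "chang", "江": "jiang", "大": "da", "1": "yi", "号": "hao"}


def to_pinyin(text: str, root: Node) -> List[str]:
    result: List[str] = []
    i = 0
    while i < len(text):
        node, j = root, i
        best_end, best_pinyin = -1, None
        # Follow the trie as far as the characters keep matching.
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.word_pinyin is not None:  # remember the longest word seen so far
                best_end, best_pinyin = j, node.word_pinyin
        if best_pinyin is not None:
            result.extend(best_pinyin)
            i = best_end
        else:  # no polyphonic word starts here: convert one character
            result.append(FALLBACK.get(text[i], text[i]))
            i += 1
    return result


dictionary = build_dictionary({"长江": ["chang", "jiang"], "长大": ["zhang", "da"]})
print(to_pinyin("长江1号", dictionary))  # ['chang', 'jiang', 'yi', 'hao']
```

The same function would apply unchanged to hit words, which is what the Fig. 5 and Fig. 6 flow describes.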
Fig. 5 schematically shows a flowchart of a method of determining pronunciation information of a hit word according to an embodiment of the present disclosure.
As shown in Fig. 5, the method 520 for determining pronunciation information of a hit word includes, in operation S521, for each hit word in the search result, determining a second polyphonic word in the hit word and the pronunciation information of the second polyphonic word by using the polyphonic word dictionary.
According to an embodiment of the present disclosure, a forward longest matching algorithm may be used, for example, to determine a second polyphonic word in the hit word that matches a word in the polyphonic word dictionary, together with the pronunciation information of the second polyphonic word.
In operation S522, if the hit word contains words other than the second polyphonic word, the pronunciation information of those other words is determined.
According to an embodiment of the present disclosure, the pronunciation information of the other words in the hit word may be determined, for example, by the pinyin tokenizer.
For example, a search may be performed for the search term to obtain one or more search results, each of which includes a hit word matching the search term. The pronunciation information of each hit word is determined in the same way as that of the search term.
It is then determined whether the pinyin of each hit word matches the pinyin of the search term, and a search result is deleted if none of its hit words match, so that incorrect search results are filtered out and search accuracy is improved.
Fig. 6 schematically shows a flowchart of a method of determining pronunciation information of a hit word according to another embodiment of the present disclosure.
As shown in Fig. 6, the method for determining the polyphonic words and their pronunciation information in a hit word includes, in operation S621, taking the first character of the hit word as the current character.
In operation S622, it is determined whether the current character matches the root node of any prefix tree in the polyphonic word dictionary. If so, that root node is taken as the current node and operation S623 is performed; otherwise, operation S627 is performed.
In operation S623, it is determined whether the end of the hit word has been reached. If so, operation S6210 is performed; otherwise, operation S624 is performed.
According to an embodiment of the present disclosure, the end of the hit word has been reached if the current character is its last character.
In operation S624, the character following the current character in the hit word is taken as the new current character.
In operation S625, it is determined whether the current character matches a child node of the current node. If so, that child node becomes the current node and the process returns to operation S623; otherwise, operation S626 is performed.
According to an embodiment of the present disclosure, if the current node has no child node, operation S626 is performed.
In operation S626, for each matched character, the pinyin stored at the node it matched is taken as the pinyin of that character.
According to an embodiment of the present disclosure, if the last matched node is a leaf node, the matched characters form a polyphonic word, so the pronunciation information of that word stored at the leaf node can be used directly as the pronunciation information of those characters.
In operation S627, the pinyin of the current character is determined using the pinyin tokenizer.
In operation S628, it is determined whether the end of the hit word has been reached. If so, operation S6211 is performed; otherwise, operation S629 is performed.
In operation S629, the character following the current character in the hit word is taken as the new current character, and the process returns to operation S622.
In operation S6210, for each matched character, the pinyin stored at the node it matched is taken as the pinyin of that character.
In operation S6211, the pinyin of the hit word is output.
According to embodiments of the present disclosure, the pronunciation information of the hit word is determined using the polyphonic word dictionary, so the determined pronunciation information is more accurate.
The method as shown above is further explained below with reference to fig. 7 in conjunction with specific embodiments. Those skilled in the art will appreciate that the following example embodiments are only for the understanding of the present disclosure, and the present disclosure is not limited thereto.
The method for filtering the search result according to the embodiment of the present disclosure can be applied to an application scenario of group search, for example.
Fig. 7 schematically shows a group search method according to an embodiment of the present disclosure.
As shown in Fig. 7, in operation S710, the pronunciation information of the search term is determined. The method for obtaining it is described above and is not repeated here.
In operation S720, a search is performed for the search term to obtain a plurality of search results that hit different fields. Illustratively, in this embodiment, depending on the field hit, the search results may be divided into search results 71 that hit a group member name pinyin, search results 72 that hit a group member business card pinyin, search results 73 that hit a group name pinyin, and search results 74 that hit other fields.
According to an embodiment of the present disclosure, whether to filter some or all of the search results can be decided according to actual requirements during the search, which offers greater flexibility. In this embodiment, the search results 73 that hit a group name pinyin are filtered, while the search results 71 that hit a group member name pinyin, the search results 72 that hit a group member business card pinyin, and the search results 74 that hit other fields are not filtered.
In operation S730, for each of the search results 73 that hit a group name pinyin, each field in the result that hits the search term, together with the 3 characters before and the 3 characters after that field, is taken as a hit word.
In operation S740, the pronunciation information of each hit word is determined.
In operation S750, it is determined whether the pronunciation information of each hit word in the search result matches the pronunciation information of the search term. If none of the hit words in the search result match, operation S760 is performed; otherwise, operation S770 is performed.
In operation S760, the search result is deleted, filtering out an incorrect result.
In operation S770, the filtered search results 73 that hit a group name pinyin are output together with the search results 71 that hit a group member name pinyin and the other unfiltered search results.
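The selective filtering of Fig. 7 might be wired up as follows. The field labels, the dictionary representation of a search result, and the `keep` callback (which would apply the hit-word pronunciation check of operations S730 to S750) are all hypothetical. Only results that hit the group name pinyin are subject to the check; every other result passes through unchanged, and the original order is preserved.

```python
from typing import Callable, Dict, List

Result = Dict[str, object]

# Hypothetical field labels; only results 73 (group name pinyin) are filtered.
FILTERED_FIELDS = {"group_name_pinyin"}


def group_search_filter(results: List[Result], keep: Callable[[Result], bool]) -> List[Result]:
    """Drop a result only if its hit field is filtered and `keep` rejects it."""
    return [r for r in results if r["field"] not in FILTERED_FIELDS or keep(r)]


results = [
    {"id": 1, "field": "member_name_pinyin", "ok": False},  # 71: never filtered
    {"id": 2, "field": "group_name_pinyin", "ok": False},   # 73: fails the check
    {"id": 3, "field": "group_name_pinyin", "ok": True},    # 73: passes the check
]
print([r["id"] for r in group_search_filter(results, lambda r: bool(r["ok"]))])  # [1, 3]
```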
According to embodiments of the present disclosure, filtering the search results by pronunciation information can improve their accuracy.
Fig. 8 schematically shows a block diagram of an apparatus for filtering search results according to an embodiment of the present disclosure.
As shown in fig. 8, the apparatus 800 for filtering search results may include a first determining module 810, a second determining module 820, and a filtering module 830.
The first determining module 810 may be configured to determine pronunciation information of a search term.
The second determining module 820 may be configured to determine, for each search result of the at least one search result corresponding to the search word, pronunciation information of a hit word in each search result.
The filtering module 830 may be configured to filter the at least one search result according to the pronunciation information of the search term and the pronunciation information of the hit word in each search result.
According to an embodiment of the present disclosure, the first determination module may include a first determination submodule and a second determination submodule. The first determining submodule may be configured to determine a first polyphonic word in the search word and pronunciation information of the first polyphonic word by using a polyphonic word dictionary. The second determining submodule may be configured to determine pronunciation information of other words in the search term when there are other words in the search term except for the first polyphonic word.
According to an embodiment of the present disclosure, the polyphonic word dictionary may include a plurality of polyphonic words stored in the form of an prefix tree, and pronunciation information of the plurality of polyphonic words.
According to an embodiment of the present disclosure, the first determining submodule includes a first determining unit, and is configured to determine, by using a forward longest matching algorithm, a first polyphonic word in the search word that matches a word in a polyphonic word dictionary and pronunciation information of the first polyphonic word.
According to an embodiment of the present disclosure, the second determination module may include a third determination submodule and a fourth determination submodule. The third determining submodule may be configured to determine, for each hit word in the search result, a second polyphonic word in the hit word and pronunciation information of the second polyphonic word by using a polyphonic word dictionary. And the fourth determining submodule can be used for determining the pronunciation information of other words in the hit word under the condition that other words except the second polyphonic word exist in the hit word.
According to an embodiment of the present disclosure, the third determining submodule may include a second determining unit, which may be configured to determine, by using a forward longest matching algorithm, a second polyphonic word in the hit word that matches a word in the polyphonic word dictionary, and pronunciation information of the second polyphonic word.
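The two-step determination (polyphonic words from the dictionary, then the remaining words) could be combined roughly as in the following sketch, which applies equally to a search word and to a hit word. It uses the classic substring-based form of forward longest matching instead of a trie. It also assumes a hypothetical default-reading table for characters outside any polyphonic word; the disclosure does not say where those readings come from.

```python
from typing import Dict, List

# Hypothetical polyphonic word dictionary: word -> one syllable per character.
POLYPHONIC_WORDS: Dict[str, List[str]] = {
    "银行": ["yin2", "hang2"],
    "行长": ["hang2", "zhang3"],
    "长大": ["zhang3", "da4"],
}
# Hypothetical default readings for characters outside any polyphonic word.
DEFAULT_READINGS: Dict[str, str] = {
    "中": "zhong1", "国": "guo2", "银": "yin2", "行": "xing2", "长": "chang2", "大": "da4",
}
MAX_WORD_LEN = max(len(word) for word in POLYPHONIC_WORDS)


def pronounce(word: str) -> List[str]:
    """Return one syllable per character of `word` (a search word or a hit word)."""
    syllables: List[str] = []
    i = 0
    while i < len(word):
        # Forward longest matching: try the longest candidate starting at i first.
        for size in range(min(MAX_WORD_LEN, len(word) - i), 0, -1):
            candidate = word[i:i + size]
            if candidate in POLYPHONIC_WORDS:
                syllables.extend(POLYPHONIC_WORDS[candidate])
                i += size
                break
        else:
            # Not covered by a polyphonic word: use the default reading,
            # or keep the character itself if no reading is known.
            syllables.append(DEFAULT_READINGS.get(word[i], word[i]))
            i += 1
    return syllables


print(pronounce("中国银行"))  # ['zhong1', 'guo2', 'yin2', 'hang2']
print(pronounce("行"))        # ['xing2']
```

Here 行 is read hang2 inside the dictionary word 银行 but falls back to the default xing2 when it stands alone, which is the distinction the polyphonic word dictionary exists to capture.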
According to an embodiment of the present disclosure, the filtering module may include a matching submodule and a deleting submodule. The matching submodule may be configured to determine, for each search result, whether the pronunciation information of the hit word in the search result matches the pronunciation information of the search word. The deleting submodule may be configured to delete, from the at least one search result, each search result in which the pronunciation information of the hit word does not match the pronunciation information of the search word.
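The matching and deleting steps can be sketched as follows. Two rules here are assumptions not stated in the disclosure. First, "matches" is taken to mean that the two syllable sequences are exactly equal, tones included. Second, a search result with several hit words is kept if any one of them matches. The `SearchResult` type and the precomputed pronunciation table are hypothetical stand-ins for the real retrieval and pronunciation steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class SearchResult:
    doc_id: str
    hit_words: List[str] = field(default_factory=list)


def filter_results(
    search_word: str,
    results: Sequence[SearchResult],
    pronounce: Callable[[str], List[str]],
) -> List[SearchResult]:
    """Keep results with at least one hit word pronounced like the search word."""
    target = pronounce(search_word)
    kept: List[SearchResult] = []
    for result in results:
        # Matching step: compare each hit word's syllables with the search word's.
        matched = any(pronounce(hit) == target for hit in result.hit_words)
        # Deleting step: a result with no matching hit word is simply not kept.
        if matched:
            kept.append(result)
    return kept


# Hypothetical precomputed pronunciations standing in for a real pronunciation step.
PRONUNCIATIONS = {
    "银行": ["yin2", "hang2"],
    "银杏": ["yin2", "xing4"],
}
results = [SearchResult("d1", ["银行"]), SearchResult("d2", ["银杏"])]
kept = filter_results("银行", results, lambda word: PRONUNCIATIONS[word])
print([r.doc_id for r in kept])  # ['d1']
```

Passing the pronunciation function in as a parameter keeps the filter independent of how pronunciations are computed, so a dictionary-based routine like the one described above could be plugged in unchanged.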
According to an embodiment of the present disclosure, the apparatus for filtering search results may further include a search module, which may be configured to perform a search operation on the search word to obtain the at least one search result, each of which includes at least one hit word.
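The disclosure does not say how the search operation finds hit words. Purely as an assumption, the sketch below uses a toy, pronunciation-tolerant matcher. A window of a document counts as a hit when each of its characters shares at least one toneless reading with the corresponding character of the search word. Such a matcher returns 银杏 as a hit for 银行 (because 行 can also be read xing), a result that a pronunciation-based filter of the kind described above would then remove. The reading table and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical table of every toneless reading a character can take.
CHAR_READINGS: Dict[str, Set[str]] = {
    "银": {"yin"}, "行": {"xing", "hang"}, "杏": {"xing"}, "很": {"hen"}, "好": {"hao"},
}


@dataclass
class SearchResult:
    doc_id: str
    hit_words: List[str]


def may_sound_alike(a: str, b: str) -> bool:
    """True if characters a and b share at least one toneless reading."""
    return bool(CHAR_READINGS.get(a, {a}) & CHAR_READINGS.get(b, {b}))


def search(term: str, documents: Dict[str, str]) -> List[SearchResult]:
    """Return one result per document that contains at least one hit word."""
    if not term:
        return []
    n = len(term)
    results: List[SearchResult] = []
    for doc_id, text in documents.items():
        hits = [
            text[i:i + n]
            for i in range(len(text) - n + 1)
            if all(may_sound_alike(t, c) for t, c in zip(term, text[i:i + n]))
        ]
        if hits:  # only documents with at least one hit word become results
            results.append(SearchResult(doc_id, hits))
    return results


docs = {"d1": "银行很好", "d2": "银杏很好", "d3": "很好"}
for result in search("银行", docs):
    print(result.doc_id, result.hit_words)
# d1 ['银行']
# d2 ['银杏']
```

A real system would normally rely on an index rather than scanning every window; the linear scan is used here only to keep the example short and self-contained.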
It should be noted that, in the technical solutions of the present disclosure, the acquisition, storage, and application of users' personal information all comply with the relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the methods and processes described above, such as the method of filtering search results. For example, in some embodiments, the method of filtering search results may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the method of filtering search results described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured in any other suitable manner (e.g., by means of firmware) to perform the method of filtering search results.
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special purpose or general purpose and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method of filtering search results, comprising:
determining pronunciation information of the search term;
determining, for each search result of at least one search result corresponding to the search term, pronunciation information of a hit word in the search result; and
filtering the at least one search result according to the pronunciation information of the search term and the pronunciation information of the hit word in each search result.
2. The method of claim 1, wherein the determining pronunciation information of the search term comprises:
determining, by using a polyphonic word dictionary, a first polyphonic word in the search term and pronunciation information of the first polyphonic word; and
determining, when the search term contains words other than the first polyphonic word, pronunciation information of the other words in the search term.
3. The method of claim 2, wherein the polyphonic word dictionary comprises a plurality of polyphonic words stored in a prefix tree and pronunciation information of the plurality of polyphonic words; and the determining, by using a polyphonic word dictionary, a first polyphonic word in the search term and pronunciation information of the first polyphonic word comprises:
determining, by using a forward longest matching algorithm, a first polyphonic word in the search term that matches a word in the polyphonic word dictionary, and pronunciation information of the first polyphonic word.
4. The method of claim 1, wherein the determining pronunciation information of a hit word in the search result comprises:
for each hit word in the search result, determining, by using a polyphonic word dictionary, a second polyphonic word in the hit word and pronunciation information of the second polyphonic word; and
determining, when the hit word contains words other than the second polyphonic word, pronunciation information of the other words in the hit word.
5. The method of claim 4, wherein the polyphonic word dictionary comprises a plurality of polyphonic words stored in a prefix tree and pronunciation information of the plurality of polyphonic words; and the determining, by using a polyphonic word dictionary, a second polyphonic word in the hit word and pronunciation information of the second polyphonic word comprises:
determining, by using a forward longest matching algorithm, a second polyphonic word in the hit word that matches a word in the polyphonic word dictionary, and pronunciation information of the second polyphonic word.
6. The method of claim 1, wherein the filtering the at least one search result according to the pronunciation information of the search term and the pronunciation information of the hit word in each search result comprises:
for each search result, determining whether the pronunciation information of the hit word in the search result matches the pronunciation information of the search term; and
deleting, from the at least one search result, each search result in which the pronunciation information of the hit word does not match the pronunciation information of the search term.
7. The method of claim 1, further comprising:
performing a search operation for the search term to obtain the at least one search result, wherein each search result of the at least one search result comprises at least one hit word.
8. An apparatus for filtering search results, comprising:
a first determining module configured to determine pronunciation information of a search term;
a second determining module configured to determine, for each search result of at least one search result corresponding to the search term, pronunciation information of a hit word in the search result; and
a filtering module configured to filter the at least one search result according to the pronunciation information of the search term and the pronunciation information of the hit word in each search result.
9. The apparatus of claim 8, wherein the first determining module comprises:
a first determining submodule configured to determine, by using a polyphonic word dictionary, a first polyphonic word in the search term and pronunciation information of the first polyphonic word; and
a second determining submodule configured to determine, when the search term contains words other than the first polyphonic word, pronunciation information of the other words in the search term.
10. The apparatus of claim 9, wherein the polyphonic word dictionary comprises a plurality of polyphonic words stored in a prefix tree and pronunciation information of the plurality of polyphonic words; and the first determining submodule comprises:
a first determining unit configured to determine, by using a forward longest matching algorithm, a first polyphonic word in the search term that matches a word in the polyphonic word dictionary, and pronunciation information of the first polyphonic word.
11. The apparatus of claim 8, wherein the second determining module comprises:
a third determining submodule configured to determine, for each hit word in the search result, a second polyphonic word in the hit word and pronunciation information of the second polyphonic word by using a polyphonic word dictionary; and
a fourth determining submodule configured to determine, when the hit word contains words other than the second polyphonic word, pronunciation information of the other words in the hit word.
12. The apparatus of claim 11, wherein the polyphonic word dictionary comprises a plurality of polyphonic words stored in a prefix tree and pronunciation information of the plurality of polyphonic words; and the third determining submodule comprises:
a second determining unit configured to determine, by using a forward longest matching algorithm, a second polyphonic word in the hit word that matches a word in the polyphonic word dictionary, and pronunciation information of the second polyphonic word.
13. The apparatus of claim 8, wherein the filtering module comprises:
a matching submodule configured to determine, for each search result, whether the pronunciation information of the hit word in the search result matches the pronunciation information of the search term; and
a deleting submodule configured to delete, from the at least one search result, each search result in which the pronunciation information of the hit word does not match the pronunciation information of the search term.
14. The apparatus of claim 8, further comprising:
a search module configured to perform a search operation for the search term to obtain the at least one search result, wherein each search result of the at least one search result comprises at least one hit word.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202110841839.8A 2021-07-23 2021-07-23 Method, device, equipment and storage medium for filtering search result Active CN113569010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110841839.8A CN113569010B (en) 2021-07-23 2021-07-23 Method, device, equipment and storage medium for filtering search result

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110841839.8A CN113569010B (en) 2021-07-23 2021-07-23 Method, device, equipment and storage medium for filtering search result

Publications (2)

Publication Number Publication Date
CN113569010A (en) 2021-10-29
CN113569010B (en) 2023-12-12

Family

ID=78167216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110841839.8A Active CN113569010B (en) 2021-07-23 2021-07-23 Method, device, equipment and storage medium for filtering search result

Country Status (1)

Country Link
CN (1) CN113569010B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102768681A (en) * 2012-06-26 2012-11-07 Beijing Qihoo Technology Co., Ltd. Recommending system and method used for search input
CN110008463A (en) * 2018-11-15 2019-07-12 Alibaba Group Holding Ltd. Method, apparatus and computer-readable medium for event extraction
CN112307183A (en) * 2020-10-30 2021-02-02 Beijing Jindi Credit Service Co., Ltd. Search data identification method and device, electronic equipment and computer storage medium
CN112527819A (en) * 2020-12-08 2021-03-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Address book information retrieval method and device, electronic equipment and storage medium

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN102768681A (en) * 2012-06-26 2012-11-07 Beijing Qihoo Technology Co., Ltd. Recommending system and method used for search input
WO2014000517A1 (en) * 2012-06-26 2014-01-03 Beijing Qihoo Technology Co., Ltd. Recommendation system and method for input searching
CN110008463A (en) * 2018-11-15 2019-07-12 Alibaba Group Holding Ltd. Method, apparatus and computer-readable medium for event extraction
CN112307183A (en) * 2020-10-30 2021-02-02 Beijing Jindi Credit Service Co., Ltd. Search data identification method and device, electronic equipment and computer storage medium
CN112527819A (en) * 2020-12-08 2021-03-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Address book information retrieval method and device, electronic equipment and storage medium

Non-Patent Citations (1)

Title
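Yang Xianze; Tan Wenrong; Liu Yuping; Zhang Nan; Yin Feng: "Research on Processing Methods for Chinese Homophones and Polyphonic Characters" (汉语同音字和多音字处理方法研究), Computer and Modernization (计算机与现代化), no. 02 *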

Also Published As

Publication number Publication date
CN113569010B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
CN112153206B (en) Contact person matching method and device, electronic equipment and storage medium
CN112528641A (en) Method and device for establishing information extraction model, electronic equipment and readable storage medium
CN113850386A (en) Model pre-training method, device, equipment, storage medium and program product
CN114399772B (en) Sample generation, model training and track recognition methods, devices, equipment and media
CN112527819B (en) Address book information retrieval method and device, electronic equipment and storage medium
CN115145924A (en) Data processing method, device, equipment and storage medium
CN113408660B (en) Book clustering method, device, equipment and storage medium
CN113408273B (en) Training method and device of text entity recognition model and text entity recognition method and device
CN115858773A (en) Keyword mining method, device and medium suitable for long document
CN113408280A (en) Negative example construction method, device, equipment and storage medium
CN113157877A (en) Multi-semantic recognition method, device, equipment and medium
CN113869046B (en) Method, device and equipment for processing natural language text and storage medium
CN113569010B (en) Method, device, equipment and storage medium for filtering search result
JP5921601B2 (en) Speech recognition dictionary update device, speech recognition dictionary update method, program
CN112560481B (en) Statement processing method, device and storage medium
CN113204616B (en) Training of text extraction model and text extraction method and device
CN113051926B (en) Text extraction method, apparatus and storage medium
CN115035890A (en) Training method and device of voice recognition model, electronic equipment and storage medium
CN114417862A (en) Text matching method, and training method and device of text matching model
CN113868254A (en) Method, device and storage medium for removing duplication of entity node in graph database
JP6840862B2 (en) Utterance sentence generation system and utterance sentence generation program
CN113051896A (en) Method and device for correcting text, electronic equipment and storage medium
CN112860626A (en) Document sorting method and device and electronic equipment
CN106598936B (en) Letter word extraction method and device
US20240221727A1 (en) Voice recognition model training method, voice recognition method, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant