CN117251521A - Content searching method, content searching device, computer equipment, storage medium and product

Info

Publication number: CN117251521A
Application number: CN202210657374.5A
Authority: CN
Inventors: 占浪
Original assignee: Shenzhen Tencent Computer Systems Co Ltd
Current assignee: Shenzhen Tencent Computer Systems Co Ltd
Priority date: 2022-06-10
Filing date: 2022-06-10
Publication date: 2023-12-19

Abstract

Embodiments of the application disclose a content searching method, a content searching device, computer equipment, a storage medium and a product. A query text containing at least two query keywords is obtained; a plurality of preset index sets are queried based on the query keywords to determine a target index set that contains a first query keyword, and a second query keyword for which no corresponding index set is found; expected search content is screened from the candidate search content that has a correspondence with the target index set; the similarity between the second query keyword and the content word segments of the expected search content is calculated to obtain set feedback data; and, according to the set feedback data, the second query keyword is added to the target index set corresponding to the expected search content to obtain an updated index set, so that candidate search content matching the updated index set can be found through the second query keyword. The scheme can reduce the accuracy required of the query text input and improve the accuracy of content search.

Description

Content searching method, content searching device, computer equipment, storage medium and product

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a content searching method, apparatus, computer device, storage medium and product, where the storage medium is a computer readable storage medium, and the product is a computer program product.

Background

In a content search scenario, a search is performed according to the content input by the user, and the search results are returned to the user. Two schemes are commonly adopted. Scheme one is whole-string fuzzy search: the user's complete input is used as a single search condition, so only content that contains the complete input can be found. The search effect is mediocre, the user must enter accurate information, and any incorrect input leads to inaccurate results or no results at all. Scheme two is ordinary word segmentation search: the user's input is segmented and the search is performed on the resulting word segments, so content containing those segments can be found. The more valid information the user enters, the better the search effect; however, this scheme places certain requirements on how standardized the input is. If the user searches with an abbreviation, the effect is poor and the accuracy is low.

Disclosure of Invention

The embodiment of the application provides a content searching method, a content searching device, computer equipment, a storage medium and a product, which reduce the requirement on the accuracy of query text input and improve the accuracy of content searching.

The content searching method provided by the embodiment of the application comprises the following steps:

acquiring a query text, wherein the query text comprises at least two query keywords;

querying a plurality of preset index sets based on the query keywords, and determining a target index set that contains a first query keyword, and a second query keyword for which no corresponding index set is found, wherein the first query keyword and the second query keyword are both among the query keywords;

screening expected search contents of the query text from candidate search contents with corresponding relations with the target index set;

calculating the similarity between the second query keyword and the content word segments contained in the expected search content to obtain set feedback data;

and adding the second query keyword to the target index set corresponding to the expected search content according to the set feedback data to obtain an updated index set, so that candidate search content matching the updated index set can be searched through the second query keyword.

Correspondingly, the embodiment of the application also provides a content searching device, which comprises:

an acquisition unit, configured to acquire a query text, wherein the query text comprises at least two query keywords;

a query unit, configured to query a plurality of preset index sets based on the query keywords, and determine a target index set that contains a first query keyword, and a second query keyword for which no corresponding index set is found, wherein the first query keyword and the second query keyword are both among the query keywords;

a screening unit, configured to screen expected search content of the query text from the candidate search content having a correspondence with the target index set;

a generation unit, configured to calculate the similarity between the second query keyword and the content word segments contained in the expected search content to obtain set feedback data;

and an updating unit, configured to add the second query keyword to the target index set corresponding to the expected search content according to the set feedback data to obtain an updated index set, so that candidate search content matching the updated index set can be searched through the second query keyword.

In an embodiment, the generating unit includes:

a word segmentation subunit, configured to perform word segmentation on the expected search content to obtain at least one content word segment;

a combination subunit, configured to combine, for each second query keyword, the second query keyword with each content word segment, to obtain a plurality of synonymous phrases associated with each second query keyword;

a calculation subunit, configured to calculate the similarity between the second query keyword and the content word segment contained in each synonymous phrase to obtain the similarity of the synonymous phrase;

and a data designation subunit, configured to use the synonymous phrases and the similarities of the synonymous phrases as the set feedback data.

In an embodiment, the updating unit includes:

a phrase designation subunit, configured to take, for each second query keyword, the synonymous phrase whose similarity satisfies a preset condition as the target synonymous phrase of that second query keyword;

and an adding subunit, configured to add the second query keyword to the target index set in which the content word segment of the target synonymous phrase is located, to obtain the updated index set.

In an embodiment, the adding subunit includes:

a phrase acquisition module, configured to acquire the historical target synonymous phrases of the second query keyword;

a statistics module, configured to perform data statistics on the target synonymous phrase and the historical target synonymous phrases to obtain a statistical result;

a word segment determining module, configured to determine a target content word segment, according to the statistical result, from the content word segments contained in the target synonymous phrase and the historical target synonymous phrases;

and a set updating module, configured to add the second query keyword to the target index set in which the target content word segment is located, to obtain the updated index set.

In an embodiment, the screening unit comprises:

a data statistics subunit, configured to perform data statistics on the candidate search contents to obtain the queried times of each candidate search content;

a selecting subunit, configured to select target search content from the candidate search content according to the queried times;

a content determination subunit configured to determine a desired search content from the target search content according to a content selection operation for the target search content.

In an embodiment, the content search device further includes:

an index word segmentation unit, configured to acquire a plurality of candidate search contents and perform word segmentation on the candidate search contents to obtain a plurality of index word segments;

an identification unit, configured to perform index synonym recognition on the index word segments to obtain an index set;

and a mapping unit, configured to map the index set to the candidate search content to obtain the candidate search content and the index set having a mapping relation.

In an embodiment, the identification unit comprises:

a content acquisition unit, configured to acquire composition content related to the candidate search content;

a synonymous expansion unit, configured to perform synonymous expansion on the composition content according to the text rule of the composition content to obtain an extended content set of the composition content;

and a synonym recognition unit, configured to perform synonym recognition on the index word segments according to the extended content set to obtain an index set.

Correspondingly, the embodiment of the application also provides computer equipment, which comprises a memory and a processor; the memory stores a computer program, and the processor is configured to run the computer program in the memory to perform any one of the content searching methods provided in the embodiments of the present application.

Accordingly, embodiments of the present application also provide a computer readable storage medium for storing a computer program loaded by a processor to perform any of the content search methods provided by the embodiments of the present application.

Accordingly, embodiments of the present application also provide a computer program product, including a computer program, which when executed by a processor implements any of the content search methods provided in the embodiments of the present application.

As can be seen from the above, in the embodiments of the present application, a query text including at least two query keywords is obtained; a plurality of preset index sets are queried based on the query keywords to determine a target index set that contains a first query keyword, and a second query keyword for which no corresponding index set is found, wherein the first query keyword and the second query keyword are both among the query keywords; expected search content of the query text is screened from the candidate search content having a correspondence with the target index set; the similarity between the second query keyword and the content word segments contained in the expected search content is calculated to obtain set feedback data; and the second query keyword is added to the target index set corresponding to the expected search content according to the set feedback data to obtain an updated index set, so that candidate search content matching the updated index set can be searched through the second query keyword.

In this scheme, the preset index sets have a correspondence with the candidate search content. Because a preset index set can contain multiple indexes of a candidate search content, the candidate search content can be found as long as the query text contains any one index in the index set, which reduces the accuracy required of the query text input. Updating the target index set according to the expected search content increases the number of indexes in the target index set, which further reduces the required input accuracy and improves the accuracy of content search.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a scene diagram of a content searching method provided in an embodiment of the present application;

FIG. 2 is a flow chart of a content search method provided by an embodiment of the present application;

FIG. 3 is another flow chart of a content search method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of index setup provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a feedback mechanism provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a content search apparatus provided in an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

The embodiment of the application provides a content searching method, a content searching device, computer equipment and a computer readable storage medium. The content search device may be integrated into a computer device, which may be a server or a terminal.

The terminal may include a mobile phone, a wearable smart device, a tablet computer, a notebook computer, a personal computer (PC), a vehicle-mounted computer, and the like.

The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

For example, as shown in FIG. 1, take the case where the computer device is a server. A client sends a query text to the server, and the server queries the preset index sets using the query keywords contained in the query text. For each query keyword, if an index set containing the query keyword is found, that index set is a target index set and the keyword is a first query keyword; otherwise, the keyword is a second query keyword, for which no corresponding index set is found. The server sends the candidate search contents corresponding to the target index set to the client. In response to a content selection operation performed at the client, the client sends a content selection instruction to the server, instructing the server to screen out the expected search content corresponding to the query text from the candidate search contents. The server then generates set feedback data according to the query text and the expected search content, and updates the target index set corresponding to the expected search content based on the set feedback data to obtain an updated index set, so that content can subsequently be searched through the updated index set.

Detailed descriptions are given below. Note that the order in which the following embodiments are described is not intended to limit the preferred order of the embodiments.

The present embodiment will be described from the viewpoint of a content search apparatus which may be integrated in a computer device, which may be a server or a terminal or the like.

As shown in fig. 2, a specific flow of the content searching method provided in the embodiment of the present application may be as follows:

101. Obtain a query text, the query text including at least two query keywords.

The query text may be text for retrieving candidate search contents, for example, text input by a user through a terminal.

The query keywords may be word segments obtained by performing word segmentation on the query text.

For example, a query text input by a user through a client may be obtained, and word segmentation may be performed on the query text to obtain at least two query keywords.
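The embodiment does not name a word segmentation algorithm. As a minimal sketch only, forward maximum matching against a small vocabulary is one way to split a query text into query keywords. The function `segment_query`, the vocabulary, and the `max_len` parameter are hypothetical and are not taken from the application.

```python
def segment_query(text, vocabulary, max_len=8):
    """Forward maximum matching: at each position take the longest
    vocabulary entry; fall back to a single character if none matches."""
    keywords = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # whitespace only separates keywords
            continue
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                keywords.append(piece)
                i += size
                break
    return keywords


# Hypothetical vocabulary and query text, for illustration only.
vocabulary = {"icbc", "shenzhen", "branch"}
query_keywords = segment_query("icbc shenzhenbranch", vocabulary)
```

In practice a dedicated segmentation library would likely replace this function; the later steps only need a list of query keywords.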

102. Query a plurality of preset index sets based on the query keywords, and determine a target index set that contains a first query keyword, and a second query keyword for which no corresponding index set is found, where the first query keyword and the second query keyword are both among the query keywords.

The index set is a preset set that has a correspondence with the candidate search content, so the corresponding candidate search content can be determined from the index set.

The candidate search content may include text, video, audio, pictures and other content, and each candidate search content corresponds to at least one synonym set. The specific content of the candidate search content can be set flexibly according to the application scenario. For example, in a banking-related scenario such as binding a bank card, the candidate search content may include content such as bank names. Optionally, one synonym set may correspond to at least one candidate search content.

The first query keyword is a query keyword for which a corresponding target index set can be found, and the second query keyword is a query keyword for which no corresponding index set can be found. The target index set corresponding to a first query keyword is the index set that contains that first query keyword.

Because the index set contains synonyms obtained by carrying out index synonym recognition on candidate search contents, the candidate search contents matched with the index set can be queried based on the index set.
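The distinction between first and second query keywords can be sketched as follows. This is an illustration under assumptions: index sets are modeled as a dictionary from a hypothetical set identifier to a Python `set` of index words, and the name `partition_keywords` is not from the application.

```python
def partition_keywords(query_keywords, index_sets):
    """Split query keywords into first keywords (found in some index set)
    and second keywords (found in none), and collect the target set ids."""
    first, second, target_ids = [], [], set()
    for keyword in query_keywords:
        hits = {set_id for set_id, members in index_sets.items()
                if keyword in members}
        if hits:
            first.append(keyword)
            target_ids |= hits
        else:
            second.append(keyword)
    return first, second, target_ids


# Hypothetical index sets, for illustration only.
index_sets = {
    "bank_a": {"Bank A", "BA"},
    "shenzhen": {"Shenzhen", "SZ"},
}
first, second, targets = partition_keywords(["BA", "nanshan"], index_sets)
```

Here `BA` is a first query keyword because the `bank_a` set contains it, while `nanshan` is a second query keyword because no set contains it.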

For example, for each query keyword, it may be determined whether an index set containing the query keyword can be found. If so, the target index set containing the query keyword is obtained; that is, the query keyword is a first query keyword, and the target index set is the index set containing the first query keyword. If not, the query keyword is a second query keyword. The index set contains synonyms obtained by performing index synonym recognition on the candidate search content. Specifically, before the step of "querying a plurality of preset index sets based on the query keywords, and determining a target index set that contains a first query keyword, and a second query keyword for which no corresponding index set is found", the content searching method provided by the embodiments of the present application may further include:

Acquiring a plurality of candidate search contents, and performing word segmentation processing on the candidate search contents to obtain a plurality of index word segments;

carrying out index synonym recognition on the index segmentation words to obtain an index set;

and mapping the index set to the candidate search content to obtain the candidate search content and the index set having a mapping relation.

For example, a plurality of candidate search contents may be obtained from a database or a blockchain, and word segmentation may be performed on them to obtain the index word segments corresponding to each candidate search content. Index synonym recognition is then performed on the index word segments to obtain at least one index synonym for each index word segment, and an index set is generated from the at least one index synonym, so that a corresponding index set is obtained for each index word segment. A mapping relation between the index set and the candidate search content can then be established: the index set generated from each index word segment is an index set matching the candidate search content, and each index synonym in the index set is an index of the candidate search content.
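The index construction described above can be sketched as below. The sketch assumes whitespace segmentation and a preset synonym lookup table; both, along with the name `build_index` and the sample data, are hypothetical stand-ins for whatever segmentation and synonym recognition an implementation actually uses.

```python
from collections import defaultdict


def build_index(candidates, synonym_lookup, segment=str.split):
    """Build one index set per index word segment and map each
    index set to the candidate search contents it was derived from."""
    index_sets = {}
    set_to_contents = defaultdict(set)
    for content in candidates:
        for word in segment(content):
            # The index set holds the segment itself plus its known synonyms.
            if word not in index_sets:
                index_sets[word] = {word} | set(synonym_lookup.get(word, ()))
            set_to_contents[word].add(content)
    return index_sets, dict(set_to_contents)


# Hypothetical candidate contents and synonym table, for illustration only.
candidates = ["BankA Shenzhen branch", "BankA Beijing branch"]
synonym_lookup = {"Shenzhen": ["SZ"], "Beijing": ["BJ"]}
index_sets, set_to_contents = build_index(candidates, synonym_lookup)
```

Keying each index set by its index word segment is a simplification chosen for the sketch; it makes the mapping from index set to candidate search content a plain dictionary lookup.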

It can be appreciated that the synonyms contained in an index set may be incomplete or not updated in time, so that no corresponding index set can be found for a second query keyword input by the user; the second query keyword can therefore be added to an appropriate index set. Thus, in addition to index word segments and their synonyms, an index set may include other synonyms, such as query keywords entered by users during searches.

In an embodiment, a synonym set may be preset to obtain the synonyms of an index word segment, and the index set may also be generated according to features of the candidate search content. For example, when candidate search contents contain the same sub-content, a synonym set may be generated for the sub-content, and the index set of the candidate search content may be obtained from the synonym set of the sub-content. That is, the step of "performing index synonym recognition on the index word segments to obtain an index set" may include:

acquiring composition content related to the candidate search content;

performing synonymous expansion on the composition content according to the text rule of the composition content to obtain an extended content set of the composition content;

and performing synonym recognition on the index word segments according to the extended content set to obtain an index set.

The composition content may be a sub-content of the candidate search content. For example, if the candidate search content is a bank branch name, the composition content may be the bank name, because a bank branch name generally includes the bank name.

The text rule is, for example, the naming convention of the composition content. For example, bank names generally follow certain text rules: China + XX + Bank, China + XX + co-worker, etc.

For example, the composition content of the candidate search content may be obtained and divided according to its text rule, for example by country name, geographic name, and the like, to obtain a plurality of word segments. A synonymous expansion word may then be obtained in any of the following ways: combining the first characters of all the word segments; combining the first characters of several of the word segments; combining several of the word segments themselves; or combining one character taken from each of the word segments. The extended content set is obtained from the resulting synonymous expansion words.

The extended content set of the constituent content is taken as an index set containing candidate search content of the constituent content.
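A minimal sketch of rule-based synonymous expansion follows. It covers three of the combinations mentioned above (first characters of all segments, first characters of several segments, and several whole segments) and assumes the composition content has already been divided into ordered parts. The function name `expand_by_text_rule` and the sample bank name are illustrative assumptions.

```python
from itertools import combinations


def expand_by_text_rule(parts):
    """Generate synonymous expansion words from the ordered parts of a
    composition content, e.g. ["China", "Construction", "Bank"]."""
    expanded = set()
    # First characters of all parts, e.g. an acronym.
    expanded.add("".join(part[0] for part in parts))
    # Ordered subsets of two or more (but not all) parts.
    for size in range(2, len(parts)):
        for subset in combinations(parts, size):
            expanded.add("".join(part[0] for part in subset))  # initials
            expanded.add(" ".join(subset))                     # whole parts
    return expanded


extended_content_set = expand_by_text_rule(["China", "Construction", "Bank"])
```

Some generated words, such as two-letter initials, may be too ambiguous to be useful indexes; a real system would probably filter them, which this sketch does not attempt.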

103. Screen the expected search content of the query text from the candidate search content having a correspondence with the target index set.

The desired search content is the content that the user wants to find with the query text, that is, the correct search result for the query text.

For example, at least one index set may be found according to the query keywords contained in the query text, and each index set may correspond to at least one candidate search content; that is, at least one candidate search content may be found based on the query text. Each time a target index set is found according to a first query keyword, every candidate search content corresponding to that target index set is considered to have been queried once. The desired search content is then obtained from the candidate search contents according to the number of times each candidate search content is queried by the query text (the queried times).

Alternatively, the candidate search content may be further sent to the client for selection by the user, and the desired search content is obtained from the candidate search content in response to a content selection operation performed by the user on the candidate search content.

Optionally, the query text may hit a large number of candidate search contents, making it difficult for the user to pick the desired search content from among them. Therefore, target search content may first be screened from the candidate search contents according to the queried times, and the desired search content is then obtained from the target search content in response to the user's content selection operation on the target search content. That is, in an embodiment, the step of "screening the desired search content of the query text from the candidate search content having a correspondence with the target index set" may specifically include:

carrying out data statistics on the candidate search contents to obtain the queried times of each candidate search content;

selecting target search content from the candidate search content according to the queried times;

according to the content selection operation for the target search content, the desired search content is determined from among the target search content.

For example, data statistics may be performed on the candidate search contents by counting the number of times each candidate search content is queried by the query text, giving the queried times of each candidate search content; candidate search contents whose queried times meet a preset threshold are used as the target search content. Optionally, the candidate search contents may be ranked by queried times, and a preset number of candidate search contents may be taken from the ranked list as the target search content. Alternatively, when the number of candidate search contents is small, all candidate search contents may be used as the target search content.

The target search content is sent to the client so that the user can select, at the client, the content they want to query; in response to the content selection operation on the target search content, the desired search content is selected from the target search content.
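The counting and screening steps above can be sketched as follows. The data layout (index sets keyed by a hypothetical identifier, plus a mapping from identifier to candidate search contents) and the names `count_queried_times` and `select_targets` are assumptions made for illustration; the final user selection at the client is outside the sketch.

```python
from collections import Counter


def count_queried_times(first_keywords, index_sets, set_to_contents):
    """Count, per candidate search content, how many times it is hit
    by the target index sets of the first query keywords."""
    counts = Counter()
    for keyword in first_keywords:
        for set_id, members in index_sets.items():
            if keyword in members:
                counts.update(set_to_contents.get(set_id, ()))
    return counts


def select_targets(counts, min_times=None, top_n=None):
    """Keep contents meeting a threshold and/or the top-N by queried times."""
    ranked = [content for content, _ in counts.most_common()]
    if min_times is not None:
        ranked = [c for c in ranked if counts[c] >= min_times]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Hypothetical data, for illustration only.
index_sets = {"bank_a": {"Bank A", "BA"}, "shenzhen": {"Shenzhen", "SZ"}}
set_to_contents = {
    "bank_a": ["Bank A Shenzhen branch", "Bank A Beijing branch"],
    "shenzhen": ["Bank A Shenzhen branch", "Bank B Shenzhen branch"],
}
counts = count_queried_times(["BA", "SZ"], index_sets, set_to_contents)
target_contents = select_targets(counts, min_times=2)
```

With both `BA` and `SZ` as first query keywords, only the content hit by both target index sets reaches a queried count of 2 and survives the threshold.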

104. Calculate the similarity between the second query keyword and the content word segments contained in the expected search content to obtain the set feedback data.

For example, word segmentation may be performed on the desired search content to obtain content word segments, the similarity between the second query keyword and each content word segment may be calculated, and the similarities between the second query keyword and the content word segments may be used as the set feedback data.

Optionally, because the second query keyword is associated with the desired search content, each second query keyword can be combined with each content word segment of the desired search content to obtain a plurality of synonymous phrases related to that keyword, and the synonymous phrases related to each second query keyword, together with their similarities, are used as the set feedback data. That is, in an embodiment, the step of "calculating the similarity between the second query keyword and the content word segments contained in the desired search content to obtain the set feedback data" may specifically include:

performing word segmentation on the desired search content to obtain at least one content word segment;

for each second query keyword, combining the second query keyword with each content word segment to obtain a plurality of synonymous phrases associated with each second query keyword;

calculating the similarity between the second query keyword and the content word segment contained in each synonymous phrase to obtain the similarity of the synonymous phrase;

and using the synonymous phrases and the similarities of the synonymous phrases as the set feedback data.

For example, word segmentation may be performed on the desired search content to obtain at least one content word segment. For each second query keyword, the second query keyword is combined with each content word segment in turn to obtain the synonymous phrases associated with that keyword, where each synonymous phrase consists of the second query keyword and one content word segment. The number of synonymous phrases obtained for one second query keyword therefore equals the number of content word segments contained in the desired search content. The synonymous phrases, together with the similarity between the second query keyword and the content word segment in each synonymous phrase, are used as the set feedback data.

The similarity between a query keyword and a content word segment may be calculated from the distance between the word vector of the query keyword and the word vector of the content word segment; optionally, it may instead be calculated with an edit distance algorithm.
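The following sketch builds set feedback data using the edit distance option. Normalizing the Levenshtein distance by the longer string's length to get a similarity in [0, 1] is an assumption of this sketch, not something the application specifies; the names `edit_distance`, `similarity`, and `build_feedback` are likewise hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for disjoint."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def build_feedback(second_keywords, content_segments):
    """One synonymous phrase (keyword, segment) per pair, with its similarity."""
    return [(keyword, segment, similarity(keyword, segment))
            for keyword in second_keywords
            for segment in content_segments]


# Hypothetical misspelled keyword and content word segments.
feedback = build_feedback(["shenzen"], ["bank", "shenzhen", "branch"])
```

Each second query keyword yields as many synonymous phrases as there are content word segments, matching the count described above.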

105. Add the second query keyword to the target index set corresponding to the expected search content according to the set feedback data to obtain an updated index set, so that candidate search content matching the updated index set can be searched through the second query keyword.

For example, a second query keyword whose similarity to a content word segment is greater than a preset threshold may be added to the index set (target index set) containing that content word segment, thereby updating the target index set. The updated target index set containing the content word segment can then be found based on the second query keyword, and the candidate search content corresponding to the target index set, such as the desired search content, can in turn be found based on the second query keyword.

Second query keywords whose similarity satisfies a condition (for example, is greater than a preset threshold) may be added to the target index set according to the similarity between the second query keyword and the content word segments in the set feedback data. That is, in an embodiment, the step of "updating the second query keyword to the target index set corresponding to the desired search content according to the set feedback data to obtain an updated index set" may specifically include:

for each second query keyword, taking the synonymous phrase whose similarity satisfies a preset condition as the target synonymous phrase of the second query keyword;

and adding the second query keyword to the target index set in which the content word segment of the target synonymous phrase is located, to obtain the updated index set.

For example, the synonymous phrase whose similarity satisfies the condition (for example, is greater than a preset threshold, or is the largest) may be used as the target synonymous phrase of the second query keyword, and the second query keyword is added to the target index set containing the content word segment of the target synonymous phrase, thereby updating the target index set and obtaining the updated index set.
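The update step can be sketched as follows, combining both example conditions: the target synonymous phrase must exceed a threshold and be the most similar phrase for its keyword. The threshold value of 0.8, the feedback tuple layout, and the name `update_index_sets` are assumptions for illustration.

```python
def update_index_sets(feedback, index_sets, threshold=0.8):
    """feedback: list of (second_keyword, content_segment, similarity).
    Adds each keyword to every index set containing the content segment
    of its best phrase, provided the similarity exceeds the threshold."""
    best = {}
    for keyword, segment, sim in feedback:
        if sim > threshold and sim > best.get(keyword, (None, -1.0))[1]:
            best[keyword] = (segment, sim)  # target synonymous phrase
    for keyword, (segment, _) in best.items():
        for members in index_sets.values():
            if segment in members:
                members.add(keyword)
    return index_sets


# Hypothetical index sets and feedback, for illustration only.
index_sets = {"shenzhen": {"shenzhen", "sz"}, "bank": {"bank"}}
feedback = [("shenzen", "bank", 0.14), ("shenzen", "shenzhen", 0.875)]
update_index_sets(feedback, index_sets)
```

After the update, a later query containing `shenzen` hits the `shenzhen` index set directly and is treated as a first query keyword.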

Optionally, a candidate search content may correspond to one index set, and the index set may include a plurality of sub-index sets, each of which contains an index word segment obtained by performing word segmentation on the candidate search content. When different candidate search contents have the same index word segment, they may correspond to the same sub-index set, namely the one that contains that index word segment.

The second query keyword may be added to the target index set where the content word is located, or the second query keyword may be added to the sub index set containing the content word.

After the sub-index set is updated, the second query keyword is input, so that not only the expected search content, but also other candidate search contents with a mapping relation with the sub-index set can be searched.

It can be understood that if no target index set containing the content word segment exists, the second query keyword and the content word segment of the synonymous phrase are taken together as a new target index set corresponding to the desired search content, giving the updated index set.
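The update rule described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the representation of index sets as Python sets, the function name `update_index_sets`, and the default threshold of 0.5 are all assumptions made for the example.

```python
from typing import List, Set, Tuple

def update_index_sets(
    index_sets: List[Set[str]],
    keyword: str,
    phrases: List[Tuple[str, float]],  # (content word segment, similarity) pairs
    threshold: float = 0.5,            # assumed value for the "preset threshold"
) -> List[Set[str]]:
    """Add `keyword` to each index set that holds a sufficiently similar content word."""
    for content_word, similarity in phrases:
        if similarity <= threshold:
            continue  # similarity does not satisfy the condition
        targets = [s for s in index_sets if content_word in s]
        if targets:
            for target in targets:
                target.add(keyword)
        else:
            # No index set contains the content word: form a new one.
            index_sets.append({keyword, content_word})
    return index_sets
```

After the call, a later query containing `keyword` finds the same index sets as the content word segment it was paired with.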

A user's input error may be the reason the second query keyword does not match any index set. If every such keyword were added directly to the target index set, query accuracy would not improve; instead, an index set holding too many synonyms would slow down queries and occupy more storage space, wasting resources. Therefore, besides requiring the similarity to satisfy the condition, whether to update the target index set with the second query keyword may also be decided from historical records. That is, in an embodiment, the step of "adding the second query keyword to the target index set in which the content word segment of the target synonymous phrase is located, to obtain an updated index set" may specifically include:

acquiring a historical target synonymous phrase of the second query keyword;

carrying out data statistics on the target synonymous phrase and the historical target synonymous phrase to obtain a statistical result;

Determining target content word segmentation from the content word segmentation contained in the target synonymous word group and the historical target synonymous word group according to the statistical result;

and adding the second query keyword into the target index set where the target content word segment is located, so as to obtain an updated index set.

The historical target synonymous phrase may include the second query keyword and a historical content word segment, where the historical content word segment is a content word segment of the desired search content obtained in a past content search.

For example, when searching for content through a query text in the past, the query text includes a second query keyword, and a history synonym phrase is obtained based on the second query keyword, and the specific process may refer to the above description about obtaining the second synonym phrase, which is not described herein.

For example, for each second query keyword, the related historical target synonymous phrases are acquired; the content word segments in the target synonymous phrase and in the historical target synonymous phrases may be the same or different. Data statistics are performed on the target synonymous phrase and the historical target synonymous phrases according to the content word segments they contain, giving a statistical result that includes, for each distinct content word segment, the number of phrases containing it, together with the total number of target and historical target synonymous phrases. The accuracy of each content word segment is then calculated from the statistical result; for example, the ratio of the number of phrases containing the content word segment to the total number of phrases may be taken as its accuracy.

A content word segment for which the total number of phrases is greater than the preset number and the accuracy is greater than the preset accuracy is taken as the target content word segment, and the second query keyword is added to the target index set in which the target content word segment is located, to obtain the updated index set.

If the index set where the target content word is located does not exist, the second query keyword and the target content word are used as a target index set, and an updated index set is obtained.
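A minimal sketch of this history-based selection is given below. The function name `select_target_content_words` and the default values standing in for the "preset number" and "preset accuracy" are assumptions for illustration; the source does not fix them.

```python
from collections import Counter
from typing import List, Tuple

def select_target_content_words(
    phrases: List[Tuple[str, str]],  # current + historical (keyword, content word) phrases
    min_total: int = 2,              # assumed "preset number" of phrases
    min_accuracy: float = 0.6,       # assumed "preset accuracy"
) -> List[str]:
    """Return content words whose share of all phrases exceeds the accuracy threshold."""
    total = len(phrases)
    if total <= min_total:
        return []  # not enough evidence collected yet
    counts = Counter(content_word for _, content_word in phrases)
    return [word for word, hits in counts.items() if hits / total > min_accuracy]
```

Requiring a minimum number of phrases before any update keeps a single mistyped query from adding a synonym to the index.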

Alternatively, step 105 may be performed periodically, or may be processed based on a condition, for example, when the total number of phrases corresponding to the second query keyword satisfies the preset number, step 105 is performed.

On the basis of the above embodiments, examples will be described in further detail below.

This embodiment is described from the perspective of a content search apparatus, which may be integrated in a computer device such as a server, taking the case in which the candidate search content is a bank branch name as an example.

The content searching method provided by the embodiment of the application can comprise three parts: index creation, content search, and synonym refinement, as shown in fig. 3, the specific flow of the content search method may be as follows:

1. Index creation:

201. The server acquires a plurality of bank branch names and performs word segmentation on them to obtain a plurality of index word segments.

For example, the server may obtain a plurality of bank branch names from a database or a blockchain, perform word segmentation on the bank branch names through an index word segmentation device to obtain index words corresponding to the bank branch names, and perform index synonym identification on the index words to obtain at least one index synonym corresponding to each index word.

202. The server performs index synonym recognition on the index word segments to obtain an index set.

For example, as shown in fig. 4, the server generates a synonym set by using the synonym identifier for at least one index synonym, and a corresponding synonym set can be obtained for each index word.

For example, the bank branch names are "xx city bank BVB branch", "China A bank KKS branch", and so on. When the bank branch names are segmented, mixed word segmentation is used, so that words having a containment relationship are output together. For example, "city" is a word and "city bank" is also a word, so the index word segments include both "city" and "city bank".
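The idea of outputting nested words together can be illustrated with a naive dictionary lookup. This is only a sketch: the function `mixed_segment` and the vocabulary are hypothetical, and a real segmenter would respect word boundaries rather than match raw substrings.

```python
from typing import List, Set

def mixed_segment(text: str, vocabulary: Set[str]) -> List[str]:
    """Return every vocabulary word found in `text`, including words nested in longer ones."""
    found: List[str] = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocabulary and piece not in found:
                found.append(piece)
    return found

# Hypothetical vocabulary for the branch name used in the text.
segments = mixed_segment("xx city bank BVB branch", {"city", "city bank", "branch"})
```

Here `segments` contains both "city" and "city bank", so a query for either form can reach the same branch name.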

Optionally, as shown in fig. 4, the server may further store a synonym library in which a plurality of synonym sets exist in advance. The index word segments are matched against the synonym sets in the synonym library to obtain the synonym set corresponding to each index word segment.

Optionally, since a bank branch name includes a bank name, synonym sets related to bank names may be stored in the synonym library. Specifically, as shown in fig. 4, a bank name synonym generator may acquire a plurality of bank names and expand them into synonyms based on the naming rules of bank names, for example by segmenting a bank name into a plurality of word segments and combining the whole segments or their head/tail characters to obtain a plurality of bank name synonyms.

For example, "China Aa bank" can be segmented into "China", "Aa", and "bank"; combining these segments, in whole or by their head or tail characters, yields bank name synonyms such as "Aa bank" and the abbreviated forms rendered here as "middle A line" and "A line".
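One way to read this expansion rule is sketched below. The function `expand_name_synonyms` is hypothetical, and allowing a segment to be omitted entirely is an assumption made so that forms like "Aa bank" (which drops "China") can be produced; the source only states that whole segments or head/tail characters are combined.

```python
from itertools import product
from typing import List, Set

def expand_name_synonyms(segments: List[str]) -> Set[str]:
    """Combine, per segment, its whole form, head character, tail character, or nothing."""
    # Each segment is assumed to be a non-empty string.
    options = [(seg, seg[0], seg[-1], "") for seg in segments]
    candidates = {"".join(choice) for choice in product(*options)}
    candidates.discard("")                  # drop the empty combination
    candidates.discard("".join(segments))   # drop the unchanged full name
    return candidates

# English stand-ins for the segments of the bank name in the text.
synonyms = expand_name_synonyms(["China", "Aa", "Bank"])
```

This enumeration over-generates, so in practice the candidates would need filtering against the naming rules before being stored in the synonym library.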

203. The server maps the index set to the bank branch names to obtain an index set having a mapping relationship with the bank branch names.

For example, as shown in fig. 4, the server may establish a mapping relationship between the index set and the bank branch name, and the index set generated according to each index word is an index set having a mapping relationship with the bank branch name.
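Steps 201 to 203 amount to building an inverted index from synonym-expanded index sets to branch names. The sketch below assumes a dictionary keyed by `frozenset`; the function `build_index` and its input shapes are illustrative choices, not part of the source.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

def build_index(
    branch_words: Dict[str, List[str]],  # branch name -> its index word segments
    synonyms: Dict[str, Set[str]],       # index word segment -> known synonyms
) -> Dict[FrozenSet[str], Set[str]]:
    """Map each index set (a word plus its synonyms) to the branch names containing it."""
    index: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for branch, words in branch_words.items():
        for word in words:
            index_set = frozenset({word} | synonyms.get(word, set()))
            index[index_set].add(branch)
    return dict(index)
```

Branch names that share an index word segment end up under the same index set, which matches the shared sub-index set behaviour described earlier.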

2. Content search:

204. The server acquires the query text input through the client and performs word segmentation on it to obtain query keywords.

For example, a user inputs a query text on a user interface displayed by the client, and the client sends the query text to the server. The server thus obtains the query text input by the user through the client and performs word segmentation on it to obtain at least one query keyword.

The client may be a bank-enterprise payment client. Based on a bank account system, it provides enterprises with management capabilities such as payment and account inquiry, for example single payment, batch payment, reimbursement, balance inquiry, account detail inquiry, receipts, and reconciliation. It helps enterprises resolve problems in their own systems such as unsynchronized bank-enterprise information and the separation of business and finance, is mainly suited to Software-as-a-Service (SaaS) platforms that provide internal services for enterprises, and helps improve the efficiency of enterprise financial management.

205. The server queries a plurality of preset index sets according to the query keywords to obtain the target index set corresponding to the first query keyword and the second query keyword for which no corresponding index set is found.

For example, as shown in fig. 4, the server may specifically query whether there is an index set including the query keyword based on the query keyword, and if so, obtain a target index set including the query keyword, that is, the query keyword is a first query keyword, and the target index set is an index set including the first query keyword, and if not, the query keyword is a second query keyword.
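The split into first and second query keywords can be sketched as a simple membership test. The function `classify_keywords` and the list-of-sets representation are assumptions for illustration.

```python
from typing import List, Set, Tuple

def classify_keywords(
    keywords: List[str],
    index_sets: List[Set[str]],
) -> Tuple[List[str], List[str], List[Set[str]]]:
    """Split keywords into those found in some index set and those found in none."""
    first: List[str] = []             # first query keywords (matched)
    second: List[str] = []            # second query keywords (unmatched)
    targets: List[Set[str]] = []      # target index sets hit by first keywords
    for keyword in keywords:
        hits = [s for s in index_sets if keyword in s]
        if hits:
            first.append(keyword)
            for hit in hits:
                if hit not in targets:
                    targets.append(hit)
        else:
            second.append(keyword)
    return first, second, targets
```

The unmatched keywords are the ones later paired with content word segments of the branch name the user actually selects.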

206. The server obtains the bank branch names having a mapping relationship with the target index set.

Because the target index set contains the synonyms obtained by carrying out index synonym recognition on the bank branch names, the server obtains the bank branch names matched with the target index set based on the mapping relationship between the synonym set and the bank branch names.

207. The server selects a desired bank branch name from the bank branch names in response to a content selection operation for the client.

For example, data statistics may be performed on the bank branch names, and the number of times each bank branch name is queried by the query text is counted to obtain the queried number of times of each bank branch name, where the bank branch name whose queried number of times meets a preset threshold is used as target search content; optionally, sorting the bank branch names according to the number of times to be queried to obtain sorted bank branch names, and obtaining a preset number of bank branch names from the sorted bank branch names as target search content; optionally, when the number of the bank branch names is small, taking the bank branch names as target search contents; optionally, the bank branch names may be screened according to the query keyword, for example, if the query text input by the user includes the bank names, the bank branch names that do not include the bank names are filtered from the bank branch names, and then the filtered bank branch names are screened to obtain the target search content.
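Two of the screening options above, filtering by bank name and keeping the most frequently queried names, can be combined in a short sketch. The function `select_target_content`, the `top_n` default, and the substring test for the bank name are illustrative assumptions.

```python
from typing import Dict, List, Optional

def select_target_content(
    candidates: List[str],
    query_counts: Dict[str, int],     # branch name -> number of times it was queried
    top_n: int = 5,                   # assumed "preset number" of results to return
    bank_name: Optional[str] = None,  # bank name found in the query text, if any
) -> List[str]:
    """Filter candidates by bank name, then keep the most frequently queried ones."""
    if bank_name is not None:
        candidates = [name for name in candidates if bank_name in name]
    ranked = sorted(candidates, key=lambda name: query_counts.get(name, 0), reverse=True)
    return ranked[:top_n]
```

The returned list is what would be sent to the client for the user to choose the desired bank branch name from.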

The target search content is sent to the client so that the user can select, at the client, the target search content to be queried. In response to the content selection operation on the target search content, the desired bank branch name is selected from the bank branch names.

Optionally, related information about the bank branch may also be returned according to the needs of the actual application scenario, for example the bank serial number of the bank branch, which uniquely identifies a regional bank.

3. Synonym refinement:

208. The server generates set feedback data according to the second query keyword and the desired bank branch name.

For example, as shown in fig. 5, the server may obtain a query text input by the user through the client and the selected desired bank branch name, and perform word segmentation processing on the desired bank branch name through the word segmentation tool to obtain at least one content word.

As shown in fig. 5, for each second query keyword in the query text, the server uses the filter to combine the second query keyword with each content word segment, obtaining the synonymous phrases associated with that keyword. Each synonymous phrase contains the second query keyword and one content word segment, so the number of synonymous phrases obtained for one second query keyword equals the number of content word segments produced by segmenting the desired bank branch name. The synonymous phrases, together with the similarity between the second query keyword and the content word segment in each phrase, are used as the set feedback data.

Optionally, a query keyword set corresponding to the query text is obtained from the second query keywords contained in the query text, and a content word segment set is obtained by segmenting the desired bank branch name. The Cartesian product of the query keyword set and the content word segment set is then computed, and each element of the Cartesian product is one synonymous phrase. The similarity between the second query keyword and the content word segment in a synonymous phrase is calculated with an edit distance algorithm, where $ED_{AB}$ denotes the edit distance between the second query keyword A and the content word segment B. The edit distance, also called the Levenshtein distance, measures the similarity of two character strings: it is the minimum number of single-character editing operations required to convert one string into the other. The similarity is calculated as $\text{similarity} = 1 - ED_{AB} / \max(L_A, L_B)$, where $L_A$ and $L_B$ are the lengths of the two character strings.
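The similarity computation can be sketched with a standard dynamic-programming Levenshtein distance. The function names `edit_distance`, `similarity`, and `build_feedback` are illustrative, and returning 1.0 for two empty strings is an assumption to avoid division by zero.

```python
from itertools import product
from typing import List, Tuple

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """similarity = 1 - ED_AB / max(L_A, L_B)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings are treated as identical
    return 1 - edit_distance(a, b) / longest

def build_feedback(keywords: List[str], content_words: List[str]) -> List[Tuple[str, str, float]]:
    """Cartesian product of keywords and content words, each pair scored by similarity."""
    return [(kw, cw, similarity(kw, cw)) for kw, cw in product(keywords, content_words)]
```

Normalizing by the longer string keeps the score between 0 and 1, so one threshold can be applied to keywords of different lengths.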

For each second query keyword, among its pairings with all content word segments, the synonymous phrase whose edit distance is greater than 1 ($ED_{AB} > 1$) and whose similarity is the highest is taken as a target synonymous phrase.

209. The server updates a target index set corresponding to the expected bank branch name based on the set feedback data to obtain an updated index set, so that the bank branch name is searched through the updated index set.

For example, as shown in fig. 5, the server may specifically use, as the target synonym phrase of the second query keyword, a synonym phrase whose similarity satisfies a condition (for example, greater than a preset threshold or the maximum similarity) for each second query keyword through the filter.

When content was searched in the past through a query text containing the second query keyword, historical synonymous phrases were obtained based on that keyword. For each second query keyword, the related historical target synonymous phrases are acquired; the content word segments in the target synonymous phrase and in the historical target synonymous phrases may be the same or different. Data statistics are performed on the target synonymous phrase and the historical target synonymous phrases according to the content word segments they contain, giving a statistical result that includes, for each distinct content word segment, the number of phrases in which it occurs, together with the total number of target and historical target synonymous phrases. The accuracy of each content word segment is calculated from the statistical result. For example, as shown in fig. 5, the server may use the analyzer to calculate the ratio of the number of phrases containing the content word segment (E) to the total number of phrases (E+N) and take the ratio as the accuracy of the content word segment: ratio=E/(E+N). Here, for each content word segment in the target and historical target synonymous phrases associated with the second query keyword, E is the number of phrases that contain the content word segment and N is the number that do not. For example, if the target synonymous phrases are (a, B), (a, C), and (a, B), where a is the second query keyword and B and C are content word segments, then for content word segment B, two phrases contain B and one does not, so E=2 and N=1.
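Carrying the arithmetic of this example through for both content word segments gives the following accuracies (for C, one phrase contains it and two do not, so E=1 and N=2):

```latex
\text{ratio}_B = \frac{E}{E+N} = \frac{2}{2+1} = \frac{2}{3} \approx 0.67,
\qquad
\text{ratio}_C = \frac{E}{E+N} = \frac{1}{1+2} = \frac{1}{3} \approx 0.33
```

Whether B then qualifies as the target content word segment depends on the preset number and preset accuracy, which the text leaves unspecified.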

As shown in fig. 5, the content word with the total word group number larger than the preset number and the accuracy larger than the preset accuracy is used as the target content word, and the second query keyword is added to the target index set where the target content word is located, so as to obtain the updated index set.

If the target index set where the target content word is located does not exist, the second query keyword and the target content word are used as one target index set, and an updated index set is obtained.

As shown in fig. 5, for each second query keyword, the server uses the analyzer to generate a report from the target synonymous phrases and data such as the similarity, the accuracy, and the total number of phrases, so that updates to the index sets, abnormal conditions, and the like can be learned in time.

Synonym refinement automatically screens and records synonyms. Based on how users actually search, the server keeps supplementing synonyms to optimize the search logic, which reduces manual operation and maintenance costs, avoids the omissions that occur when synonyms are entered by hand, and improves user experience and content search efficiency.

As can be seen from the above, the server in the embodiment of the present application acquires a plurality of bank branch names and performs word segmentation on them to obtain a plurality of index word segments; performs index synonym recognition on the index word segments to obtain index sets; maps the index sets to the bank branch names to obtain index sets having a mapping relationship with the bank branch names; acquires the query text input through the client and performs word segmentation on it to obtain query keywords; queries a plurality of preset index sets according to the query keywords to obtain the target index set corresponding to the first query keyword and the second query keyword for which no corresponding index set is found; obtains the bank branch names having a mapping relationship with the target index set; selects the desired bank branch name from the bank branch names in response to a content selection operation on the client; generates set feedback data according to the second query keyword and the desired bank branch name; and updates the target index set corresponding to the desired bank branch name based on the set feedback data to obtain an updated index set, so that bank branch names can be searched through the updated index set.

To facilitate better implementation of the content searching method provided in the embodiments of the present application, an embodiment further provides a content search apparatus. The terms used have the same meanings as in the content search method described above; for specific implementation details, refer to the description in the method embodiments.

The content search apparatus may be integrated in a computer device, as shown in fig. 6, and may include: the acquisition unit 301, the inquiry unit 302, the screening unit 303, the generation unit 304, and the update unit 305 are specifically as follows:

(1) The acquisition unit 301: used to acquire a query text, where the query text includes at least two query keywords. In an embodiment, the content search apparatus may further include an index word segmentation unit, an identification unit, and a mapping unit, specifically:

index word segmentation unit: used to acquire a plurality of candidate search contents and perform word segmentation on them to obtain a plurality of index word segments;

identification unit: used to perform index synonym recognition on the index word segments to obtain an index set;

mapping unit: used to map the synonym set to the candidate search content to obtain candidate search content and a synonym set having a mapping relationship.

In an embodiment, the identification unit comprises a content acquisition unit, a synonym expansion unit and a synonym identification unit, in particular:

(2) The query unit 302: used to query a plurality of preset index sets based on the query keywords, and to determine a target index set containing a first query keyword and a second query keyword for which no corresponding index set is found, where the first query keyword and the second query keyword are both included in the query keywords.

(3) The screening unit 303: used to screen the desired search content of the query text from the candidate search contents that correspond to the target index set.

In an embodiment, the screening unit 303 may include a data statistics subunit, a selection subunit, and a content determination subunit, specifically:

data statistics subunit: used to perform data statistics on the candidate search contents to obtain the number of times each candidate search content has been queried;

selection subunit: used to select target search content from the candidate search contents according to the number of times queried;

content determination subunit: used to determine the desired search content from the target search content according to a content selection operation on the target search content.

(4) The generation unit 304: used to calculate the similarity between the second query keyword and the content word segments contained in the desired search content to obtain the set feedback data.

In an embodiment, the generation unit 304 may include a word segmentation subunit, a combination subunit, a calculation subunit, and a data-as subunit, specifically:

word segmentation subunit: used to perform word segmentation on the desired search content to obtain at least one content word segment;

combination subunit: used to combine, for each second query keyword, the second query keyword with each content word segment to obtain a plurality of synonymous phrases associated with each second query keyword;

data-as subunit: used to take the synonymous phrases and the similarities of the synonymous phrases as positive feedback data.

(5) The updating unit 305: used to update the second query keyword into the target index set corresponding to the desired search content according to the set feedback data to obtain an updated index set, so that candidate search content matching the updated index set can be searched through the second query keyword.

In an embodiment, the updating unit 305 may include a phrase-as subunit and an adding subunit, specifically:

phrase-as subunit: used to take, for each second query keyword, the synonymous phrase whose similarity satisfies a preset condition as the target synonymous phrase of the second query keyword;

adding subunit: used to add the second query keyword to the target index set in which the content word segment of the target synonymous phrase is located, to obtain an updated index set.

In an embodiment, the adding subunit may include a phrase acquisition module, a statistics module, a word segmentation determining module, and a set updating module, specifically:

phrase acquisition module: used to acquire the historical target synonymous phrases of the second query keyword;

statistics module: used to perform data statistics on the target synonymous phrase and the historical target synonymous phrases to obtain a statistical result;

word segmentation determining module: used to determine, according to the statistical result, the target content word segment from the content word segments contained in the target synonymous phrase and the historical target synonymous phrases;

set updating module: used to add the second query keyword to the target index set in which the target content word segment is located, to obtain an updated index set.

As can be seen from the above, the content search device according to the embodiment of the present application acquires a query text through the acquisition unit 301, where the query text includes at least two query keywords; querying, by the querying unit 302, a plurality of preset index sets based on the query keywords, and determining a target index set including a first query keyword and a second query keyword for which no corresponding index set is queried, where the first query keyword and the second query keyword are included in the query keyword; screening desired search contents of the query text from candidate search contents having a correspondence with the target index set by a screening unit 303; calculating the similarity between the second query keyword and the content segmentation word contained in the expected search content by the generating unit 304 to obtain set feedback data; finally, the second query keyword is updated to the target index set corresponding to the desired search content according to the set feedback data by the updating unit 305, so as to obtain an updated index set, and candidate search content matched with the updated index set is searched through the second query keyword.

The embodiment of the present application further provides a computer device, which may be a terminal or a server, as shown in fig. 7, which shows a schematic structural diagram of the computer device according to the embodiment of the present application, specifically:

The computer device may include a processor 1001 with one or more processing cores, a memory 1002 comprising one or more computer-readable storage media, a power supply 1003, an input unit 1004, and other components. Those skilled in the art will appreciate that the structure shown in fig. 7 does not limit the computer device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:

The processor 1001 is a control center of the computer device, connects respective portions of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 1002 and calling data stored in the memory 1002, thereby performing overall monitoring of the computer device. Optionally, the processor 1001 may include one or more processing cores; preferably, the processor 1001 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, a computer program, and the like, and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 1001.

The memory 1002 may be used to store software programs and modules, and the processor 1001 executes various functional applications and data processing by executing the software programs and modules stored in the memory 1002. The memory 1002 may mainly include a stored program area that may store an operating system, computer programs required for at least one function (such as a sound playing function, an image playing function, etc.), and a stored data area; the storage data area may store data created according to the use of the computer device, etc. In addition, memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 1002 may also include a memory controller to provide the processor 1001 with access to the memory 1002.

The computer device also includes a power supply 1003 for powering the various components, preferably, the power supply 1003 is logically connected to the processor 1001 by a power management system, such that charge, discharge, and power consumption management functions are performed by the power management system. The power supply 1003 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.

The computer device may also include an input unit 1004, which input unit 1004 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

Although not shown, the computer device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 1001 in the computer device loads executable files corresponding to the processes of one or more computer programs into the memory 1002 according to the following instructions, and the processor 1001 executes the computer programs stored in the memory 1002, so as to implement various functions, as follows:

querying a plurality of preset index sets based on the query keywords, and determining a target index set containing a first query keyword and a second query keyword for which no corresponding index set is found, where the first query keyword and the second query keyword are both included in the query keywords;

calculating the similarity between the second query keyword and the content word segments contained in the desired search content to obtain set feedback data;

and updating the second query keyword into the target index set corresponding to the desired search content according to the set feedback data to obtain an updated index set, so that candidate search content matching the updated index set can be searched through the second query keyword. For the specific implementation of each operation, refer to the previous embodiments; details are not repeated here.

As can be seen from the above, the computer device in the embodiments of the present application may acquire a query text that includes at least two query keywords; query a plurality of preset index sets based on the query keywords, and determine a target index set containing a first query keyword and a second query keyword for which no corresponding index set is found, where the first query keyword and the second query keyword are both included in the query keywords; screen the desired search content of the query text from the candidate search contents that correspond to the target index set; calculate the similarity between the second query keyword and the content word segments contained in the desired search content to obtain set feedback data; and update the second query keyword into the target index set corresponding to the desired search content according to the set feedback data to obtain an updated index set, so that candidate search content matching the updated index set can be searched through the second query keyword.

According to one aspect of the present application, there is provided a computer program product comprising a computer program containing computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the methods provided in the various alternative implementations of the above embodiments.

It will be appreciated by those of ordinary skill in the art that all or part of the steps of the various methods of the above embodiments may be completed by a computer program, or by a computer program controlling related hardware. The computer program may be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, embodiments of the present application provide a computer readable storage medium having stored therein a computer program that can be loaded by a processor to perform any of the content search methods provided by embodiments of the present application.

For the specific implementation of each of the above operations, refer to the previous embodiments; details are not repeated here.

The computer-readable storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Since the computer program stored in the computer-readable storage medium can execute any content search method provided in the embodiments of the present application, it can achieve the beneficial effects of any such method. These effects are detailed in the previous embodiments and are not repeated here.

The foregoing has described in detail the content search method, apparatus, computer device, and computer-readable storage medium provided by the embodiments of the present application. Specific examples have been used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is intended only to aid in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may, in light of the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims

1. A content search method, comprising:

2. The method of claim 1, wherein the calculating the similarity between the second query keyword and the content segmentation words contained in the expected search content to obtain the set feedback data comprises:

for each second query keyword, combining the second query keyword with each content segmentation word to obtain a plurality of synonymous phrases associated with the second query keyword;

calculating the similarity between the second query keyword and the content segmentation word contained in each synonymous phrase to obtain the similarity of the synonymous phrase;

and taking the synonymous phrases and the similarities of the synonymous phrases as the set feedback data.

3. The method of claim 1, wherein the updating the second query keyword into the target index set corresponding to the expected search content according to the set feedback data to obtain an updated index set comprises:

for each second query keyword, taking a synonymous phrase whose similarity meets a preset condition as a target synonymous phrase of the second query keyword;

and adding the second query keyword to the target index set in which the content segmentation word of the target synonymous phrase is located, to obtain the updated index set.

4. The method of claim 3, wherein the adding the second query keyword to the target index set in which the content segmentation word of the target synonymous phrase is located, to obtain the updated index set, comprises:

acquiring a historical target synonymous phrase of the second query keyword;

and adding the second query keyword to a period target index set where the target content word is located, so as to obtain the updated index set.

5. The method of claim 1, wherein the screening the expected search content of the query text from the candidate search content that corresponds to the target index set comprises:

and determining the expected search content from the target search content according to a content selection operation performed on the target search content.

6. The method according to any one of claims 1-5, wherein before the querying a plurality of preset index sets based on the query keywords and determining a target index set containing a first query keyword, and a second query keyword for which no corresponding index set is found, the method further comprises:

and mapping the index set to the candidate search content to obtain candidate search content and an index set having a mapping relationship.

7. The method of claim 6, wherein the performing synonym recognition on the index word to obtain the index set comprises:

acquiring composition content related to the candidate search content;

and carrying out synonym recognition on the index word according to the extended content set to obtain an index set.

8. A content search apparatus, comprising:

9. A computer device comprising a memory and a processor; the memory stores a computer program, and the processor is configured to execute the computer program in the memory to perform the content search method according to any one of claims 1 to 7.

10. A computer readable storage medium for storing a computer program, the computer program being loaded by a processor to perform the content search method of any one of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the content search method of any one of claims 1 to 7.