WO2023020506A1

WO2023020506A1 - Search method with diversified and equalized search results, and computer device

Info

Publication number: WO2023020506A1
Application number: PCT/CN2022/112863
Authority: WO
Inventors: 包伟
Original assignee: 深圳市世强元件网络有限公司
Priority date: 2021-08-16
Filing date: 2022-08-16
Publication date: 2023-02-23
Also published as: CN113779433A

Abstract

The present invention relates to a search method with diversified and equalized search results, and a computer device. The method comprises the following steps: S1, establishing an industry lexicon, wherein the industry lexicon comprises a plurality of professional industry vocabularies; and converting various types of original data models into preset data models; S2, receiving search content input by a user, and extracting, according to the industry lexicon, a search keyword from the search content; and S3, retrieving all the preset data models by using the search keyword, calculating the total weight value of each preset data model in search results, and sorting the search results according to the total weight values. By means of the present invention, various types of original data models are uniformly converted into preset data models, so as to avoid affecting a search due to expression forms of data types, such that search results are more diversified and equalized.

Description

A search method and computer equipment for diversification and equalization of search results

Technical field

The invention relates to the field of search, and more specifically, to a search method and computer equipment for diversification and equalization of search results.

Background art

Search is a commonly used technology on the Internet: users find target content by entering search content. Most existing search technologies consider only the correlation between the search content and the target content. For example, the more often the search content occurs in the target content, the higher the correlation, and the results are sorted and displayed in order of correlation. This approach does not take the type of the target content into account, so some types of target content are rarely displayed while other types are displayed too often, and the search results are not sufficiently diversified and balanced.

Technical problem

The technical problem to be solved by the present invention is to provide a search method and computer equipment for diversification and equalization of search results in view of the above-mentioned defects of the prior art.

Technical solution

The technical solution adopted by the present invention to solve the technical problem is: to construct a search method for diversification and equalization of search results, comprising the following steps:

S1. Establish an industry thesaurus, which includes a plurality of industry professional vocabularies; convert various types of original data models into preset data models;

S2. Receive the search content input by the user, and extract search keywords from the search content according to the industry thesaurus;

S3. Retrieve all the preset data models by using the search keywords, calculate the total weight value of each preset data model in the search results, and sort the search results according to the total weight values.

Further, in the search method for diversifying and equalizing search results according to the present invention, the preset data model includes content title, content abstract, text, keywords and content type.

Further, in the method for diversifying and equalizing search results according to the present invention, converting various types of original data models into preset data models in the step S1 includes:

Convert various types of original data models into preset data models and set the weight value of each part of the preset data models, wherein the weight value of the keywords is greater than the weight value of the content title, the weight value of the content title is greater than the weight value of the content abstract, and the weight value of the content abstract is greater than the weight value of the text.

Further, in the method for diversifying and equalizing search results according to the present invention, calculating the total weight value of each of the preset data models in the search results in the step S3 includes: separately calculating the sub-weight values of the search keywords in the content title, content abstract, text, keywords and content type, and obtaining the total weight value from all the sub-weight values.

Further, in the method for diversifying and equalizing search results according to the present invention, when calculating the sub-weight values of the search keywords in the content title, content abstract, text, keywords and content type, each sub-weight value is positively correlated with the number of occurrences of the search keyword.

Further, in the method for diversifying and equalizing the search results of the present invention, after the step S3, it also includes:

S4. According to the distribution of each type of data model corresponding to the preset data model in the search result, adjust the weight value of each part of the preset data model, so that the distribution of each type in the search result is balanced.

Further, in the search method for diversifying and equalizing search results according to the present invention, using the search keyword to retrieve all the preset data models in the step S3 includes:

S31. Classify all the preset data models according to the classification standard;

S32. Counting the total number of the preset data types in each category, and dividing the categories with the same total number into the same group;

S33. Retrieve all the preset data models in each group by using the search keyword.

Further, in the search method for diversification and equalization of search results according to the present invention, after the step S33, it further includes: making each group generate a preset number of preset data models.

Further, in the search method for diversification and equalization of search results according to the present invention, the preset quantity corresponding to each group is positively correlated with the total number of the group.

In addition, the present invention also provides a computer device, including a memory and a processor, and the processor is communicatively connected to the memory. The memory is used to store computer programs; the processor is used to execute the computer programs stored in the memory to realize the search method for diversification and equalization of search results as described above.

Beneficial effect

Implementing the search method and computer equipment for diversification and equalization of search results of the present invention has the following beneficial effects: the present invention uniformly converts various types of original data models into preset data models, which prevents the form in which each data type is expressed from affecting the search and makes the search results more diversified and balanced.

Description of drawings

The present invention will be further described below in conjunction with accompanying drawing and embodiment, in the accompanying drawing:

FIG. 1 is a flow chart of a search method for diversification and equalization of search results provided by an embodiment of the present invention;

FIG. 2 is a flow chart of a search method for diversifying and equalizing search results provided by an embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

In order to have a clearer understanding of the technical features, purposes and effects of the present invention, the specific implementation manners of the present invention will now be described in detail with reference to the accompanying drawings.

In a preferred embodiment, with reference to FIG. 1, the search method for diversification and equalization of search results in this embodiment includes the following steps:

S1. Establish an industry thesaurus, which includes multiple industry professional vocabularies; convert various types of original data models into preset data models. Specifically, industry professional vocabulary refers to the professional terms used in a certain industry. These terms differ from everyday expressions and are proper nouns with a specific meaning in the industry. An industry thesaurus makes it possible to segment the search content entered by users into words in a principled way, which improves the professionalism and accuracy of the search. There can be one or more industry thesauruses. When there are multiple industry thesauruses, they are classified according to content type to form industry thesauruses of different categories; that is, the industry professional vocabularies contained in each industry thesaurus belong to the same category.

In the prior art, the original data model is used in the search; that is, the original data is kept in its original format and searched directly. Because the various original data models differ widely, they are not on the "same starting line". As a result, certain types of data are displayed too often in the search results, some types are displayed too rarely, and some types are not displayed at all, so the search results are not sufficiently diversified and balanced. For example, the original data models of news, movies, songs, encyclopedias and variety shows all contain "Andy Lau". If news items contain too many occurrences of the keyword "Andy Lau", the search results will be mostly news, with few movies, songs, encyclopedia entries and variety shows. Movies and songs in particular, limited by their data types, rarely appear directly on the first page of search results, so the results that users see are relatively uniform and not sufficiently diversified and balanced. To solve the problem of insufficient diversity and balance of search results caused by differences between data models, this embodiment converts various types of original data models into preset data models. After conversion, all original data models share a unified data model, so that all preset data models have the "same starting line" and a more balanced probability of being found when retrieved, which makes the search results more diversified and balanced. Optionally, the industry thesaurus and all converted preset data models are stored on the server.

S2. Receive the search content input by the user, and extract search keywords from the search content according to the industry thesaurus. Specifically, the user enters search content in the search box, and the search content is uploaded to the server through the network. The server segments the search content into words according to the industry professional vocabulary in the industry thesaurus and extracts the search keywords corresponding to the search content. For example, if the search content is "epson S1C17801 mcu data booklet", the segmentation results identified according to the industry thesaurus are: "epson" is a brand word, "S1C17801" is a model word, "mcu" is a category word, and "data booklet" is a resource word. The extracted search keywords are therefore "epson", "S1C17801", "mcu" and "data booklet". Optionally, if the industry thesaurus does not cover the search content, basic language structure can be used to extract the search keywords; that is, the search content is parsed according to its subject-verb-object-complement structure to obtain the search keywords.
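The word segmentation in step S2 can be illustrated with a minimal Python sketch. It is not the implementation described by the embodiment: the dictionary `INDUSTRY_THESAURUS`, the function `extract_keywords` and the greedy longest-match strategy are assumptions chosen for illustration, and the grammatical fallback for content outside the thesaurus is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical thesaurus: term -> word category, mirroring the example in the text.
INDUSTRY_THESAURUS: Dict[str, str] = {
    "epson": "brand",
    "s1c17801": "model",
    "mcu": "category",
    "data booklet": "resource",
}


def extract_keywords(search_content: str,
                     thesaurus: Dict[str, str] = INDUSTRY_THESAURUS
                     ) -> List[Tuple[str, str]]:
    """Greedy longest-match segmentation against the thesaurus.

    Returns (keyword, category) pairs in order of appearance. Tokens that
    the thesaurus does not cover are skipped (assumption).
    """
    tokens = search_content.split()
    max_len = max(len(term.split()) for term in thesaurus)
    keywords: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            category = thesaurus.get(candidate.lower())
            if category is not None:
                keywords.append((candidate, category))
                i += n
                break
        else:
            i += 1  # no thesaurus entry starts at this token
    return keywords
```

Trying the longest phrase first keeps multi-word terms such as "data booklet" together instead of splitting them into two unrelated tokens.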

S3. Use the search keywords to retrieve all preset data models, calculate the total weight value of each preset data model in the search results, and sort the search results according to the total weight values. Specifically, if the search content contains only one search keyword, that keyword is used to retrieve all preset data models, the total weight value of each preset data model in the search results is calculated, and the search results are sorted according to the total weight values. If the search content contains at least two search keywords, one search keyword is first used to retrieve all preset data models to obtain a first search result; another search keyword is then used to search within the first search result to obtain a second search result; and so on, until all search keywords have been searched. After the search is completed, the total weight value of each preset data model in the search results is calculated, and the search results are sorted according to the total weight values. After the server completes the sorting, the sorted search results are sent to the user terminal for display. It can be understood that the search result delivered by the server to the user terminal is not the preset data model itself, but the original data model corresponding to the preset data model.
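The successive narrowing described for multiple search keywords can be sketched as follows. This is only an illustration under stated assumptions: preset data models are represented as plain dictionaries of strings, and the helper names `matches` and `retrieve`, as well as the case-insensitive substring test, are hypothetical.

```python
from typing import Dict, Iterable, List


def matches(model: Dict[str, str], keyword: str) -> bool:
    """True if the keyword occurs in any field of the model (case-insensitive)."""
    needle = keyword.lower()
    return any(needle in str(value).lower() for value in model.values())


def retrieve(models: Iterable[Dict[str, str]],
             keywords: List[str]) -> List[Dict[str, str]]:
    """Search with the first keyword, then search within that result with the next."""
    results = list(models)
    for keyword in keywords:
        results = [m for m in results if matches(m, keyword)]
    return results
```

Each pass only scans the previous result, so later keywords are checked against a progressively smaller candidate set.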

In this embodiment, various types of original data models are uniformly transformed into preset data models, so as to avoid affecting the search due to the expression form of the data type, and make the search results more diversified and balanced.

In the search method for diversification and equalization of search results in some embodiments, the preset data model includes content title, content abstract, text, keywords and content type. When various types of original data models are converted into preset data models, the converted preset data model has a content title, content abstract, text, keywords and content type regardless of whether the original data model has them. For example, a song file usually has only a song title and artist information, but no content abstract or text. In this case, the song lyrics can be used as the content abstract and the text to complete the conversion. In this embodiment, various types of original data models are uniformly converted into preset data models, which prevents the form in which each data type is expressed from affecting the search and makes the search results more diversified and balanced.
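The unified model and the song example can be sketched in Python as below. The class name `PresetDataModel`, the function `song_to_preset`, the input field names and the choice to use a short excerpt of the lyrics as the abstract are all assumptions for illustration; the text only states that lyrics can fill the abstract and the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PresetDataModel:
    """The five parts that every converted model carries."""
    content_title: str
    content_abstract: str
    text: str
    keywords: List[str]
    content_type: str


def song_to_preset(song: Dict[str, str]) -> PresetDataModel:
    """Convert a hypothetical song record; lyrics fill the missing abstract and text."""
    lyrics = song.get("lyrics", "")
    return PresetDataModel(
        content_title=song["title"],
        content_abstract=lyrics[:80],  # assumption: abstract is a short excerpt
        text=lyrics,
        keywords=[song["title"], song["artist"]],  # assumption: title and artist
        content_type="song",
    )
```

A converter of this kind would be written once per original data type, so that every type ends up with the same five fields before retrieval.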

In the search method for diversifying and equalizing search results in some embodiments, converting various types of original data models into preset data models in step S1 includes: converting various types of original data models into preset data models and setting the weight value of each part of the preset data models, wherein the weight value of the keywords is greater than the weight value of the content title, the weight value of the content title is greater than the weight value of the content abstract, and the weight value of the content abstract is greater than the weight value of the text. Correspondingly, calculating the total weight value of each preset data model in the search results in step S3 includes: separately calculating the sub-weight values of the search keywords in the content title, content abstract, text, keywords and content type, and obtaining the total weight value from all the sub-weight values. Optionally, all sub-weight values are summed directly to obtain the total weight value. In addition, when calculating the sub-weight values of the search keywords in the content title, content abstract, text, keywords and content type, each sub-weight value is positively correlated with the number of occurrences of the search keyword; that is, the more times the search keyword appears in a certain part, the greater the sub-weight value obtained for that part. In this embodiment, differences between original data models are balanced through weight configuration and unified preset data models, so that search results are more diversified and balanced.
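A minimal sketch of the weight calculation, assuming the direct-sum option and a linear dependence on the occurrence count, is shown below. The numeric values in `FIELD_WEIGHTS` are hypothetical: the text fixes only the order keywords > content title > content abstract > text and gives no value for content type. Models are represented as dictionaries of strings for brevity.

```python
from typing import Dict

# Hypothetical weights; only their relative order comes from the text.
# The weight for content_type is not ordered by the text and is assumed.
FIELD_WEIGHTS: Dict[str, float] = {
    "keywords": 4.0,
    "content_title": 3.0,
    "content_abstract": 2.0,
    "text": 1.0,
    "content_type": 1.0,
}


def sub_weight(field_text: str, keyword: str, weight: float) -> float:
    """Sub-weight grows with the number of occurrences (here: linearly)."""
    return weight * field_text.lower().count(keyword.lower())


def total_weight(model: Dict[str, str], keyword: str,
                 weights: Dict[str, float] = FIELD_WEIGHTS) -> float:
    """Sum the sub-weights over all five parts of the preset data model."""
    return sum(sub_weight(model.get(name, ""), keyword, w)
               for name, w in weights.items())


def rank(models, keyword: str):
    """Sort search results by total weight value, highest first."""
    return sorted(models, key=lambda m: total_weight(m, keyword), reverse=True)
```

Any monotonically increasing function of the occurrence count would satisfy the "positively correlated" requirement; a damped function such as a logarithm is one alternative to the linear form assumed here.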

In the search method for diversification and equalization of search results in some embodiments, referring to FIG. 2, after step S3 the method further includes: S4, adjusting the weight value of each part of the preset data model according to the distribution, in the search results, of the preset data models corresponding to each type of data model, so that the types are evenly distributed in the search results. Here, the distribution refers to whether the preset data models corresponding to each type of data model appear within a preset number of top-ranked results (for example, the first page of displayed search results). If the preset data models corresponding to every type of data model appear within the preset number of top-ranked results, the existing weight value setting is relatively reasonable. If the preset data models corresponding to one or several types of data model do not appear within the preset number of top-ranked results, the existing weight value setting is unreasonable and the diversification and balance of search results cannot be achieved. In that case the weight value of each part of the preset data model needs to be adjusted so that the distribution of the various types in the search results is balanced.

Furthermore, the distribution may refer to the proportion that the preset data models corresponding to each type of data model occupy within the preset number of top-ranked results (for example, the first page of displayed search results). If the proportions of the preset data models corresponding to each type of data model within the preset number of top-ranked results are balanced, the existing weight value setting is relatively reasonable. If the proportion for one or several types of data model is too low or too high, so that the diversification and balance of search results cannot be achieved, the weight value of each part of the preset data model needs to be adjusted to balance the distribution of the various types in the search results.

In this embodiment, the weight value of each part of the content of the preset data model is adjusted through the feedback of search results, and the setting of the weight value of each part of the content of the preset data model is continuously optimized, so that the search results are more diversified and balanced.
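The feedback check behind step S4 can be sketched as follows. The text does not say how the weight values are adjusted, so this sketch covers only the detection side: it measures each content type's share of the first page and reports the types that are missing or below a threshold. The function names, `page_size` and `min_share` are hypothetical parameters.

```python
from collections import Counter
from typing import Dict, List, Set


def type_shares(ranked_types: List[str], page_size: int) -> Dict[str, float]:
    """Share of each content type among the first `page_size` results."""
    top = ranked_types[:page_size]
    counts = Counter(top)
    return {t: c / len(top) for t, c in counts.items()} if top else {}


def underrepresented(ranked_types: List[str], all_types: Set[str],
                     page_size: int, min_share: float) -> Set[str]:
    """Types that are missing from the first page or fall below `min_share`."""
    shares = type_shares(ranked_types, page_size)
    return {t for t in all_types if shares.get(t, 0.0) < min_share}
```

A non-empty result from `underrepresented` would be the signal that the current weight values need to be revisited.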

In the search method for diversification and equalization of search results in some embodiments, using the search keyword to retrieve all preset data models in step S3 includes:

S31. Classify all preset data models according to the classification standard. Classification standards can be flexibly selected according to user needs, such as manufacturers, processors, resources, etc., and news, film and television, songs, encyclopedias, and variety shows, etc.

S32. Count the total number of preset data types in each category, and classify the categories with the same total number into the same group. "The same total number" means that the total numbers fall within the same preset number range. For example, some categories contain more than 10 million items, some between 5 million and 10 million, some between 1 million and 5 million, some between 500,000 and 1 million, some between 100,000 and 500,000, and some fewer than 100,000. Correspondingly, if the totals of Type A and Type B are 6.5 million and 8.5 million respectively, then Type A and Type B form one group; if the totals of Type C and Type D are 650,000 and 850,000 respectively, then Type C and Type D form one group; and if the totals of Type E and Type F are 60,000 and 80,000 respectively, then Type E and Type F form one group.

S33. Retrieve all preset data models in each group by using the search keywords. Specifically, all the preset data models in each group are retrieved separately by using the search keywords, and the search results of the search keywords in that group are obtained. To balance the distribution of the various types in the search results, every group must be represented within the preset number of top-ranked results (for example, the first page of displayed search results). Each group is therefore required to output a preset number of preset data models, and the preset number corresponding to each group is positively correlated with the total number of that group. In other words, the larger the total number of a group, the more places it occupies within the preset number of top-ranked results. This ensures both that every type is displayed and that groups with larger totals display more preset data models, so that the search results are more diversified and balanced.
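The grouping of categories by size range in steps S31 to S33 can be sketched as below. The boundaries are taken from the example ranges in the text; the function name `group_by_size` and the treatment of totals that fall exactly on a boundary (assigned to the higher range) are assumptions.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List

# Range boundaries taken from the example in the text.
BOUNDARIES = [100_000, 500_000, 1_000_000, 5_000_000, 10_000_000]


def group_by_size(category_totals: Dict[str, int],
                  boundaries: List[int] = BOUNDARIES) -> Dict[int, List[str]]:
    """Map range index -> categories whose totals fall in that range."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for category, total in category_totals.items():
        # bisect_right returns how many boundaries are <= total,
        # which serves as the index of the range the total falls in.
        groups[bisect_right(boundaries, total)].append(category)
    return dict(groups)
```

Each resulting group can then be searched separately with the search keywords, as step S33 describes.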

In this embodiment, groups are grouped according to the quantity level, and searches are performed in each group separately, so as to ensure that each group has a preset data model output, so that the search results are more diversified and balanced.
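One way to choose the preset number for each group is sketched below. The text requires only that every group is represented and that the number is positively correlated with the group's total; the specific rule used here (one guaranteed slot per group, with the remaining slots shared in proportion to size by the largest-remainder method) and the name `allocate_quota` are assumptions.

```python
from typing import Dict


def allocate_quota(group_totals: Dict[str, int], page_size: int) -> Dict[str, int]:
    """Give every group at least one slot, then share the rest by size."""
    if page_size < len(group_totals):
        raise ValueError("page_size must be at least the number of groups")
    quotas = {g: 1 for g in group_totals}
    spare = page_size - len(group_totals)
    grand_total = sum(group_totals.values())
    # Exact (fractional) share of the spare slots for each group.
    exact = {g: spare * t / grand_total for g, t in group_totals.items()}
    for g in quotas:
        quotas[g] += int(exact[g])
    leftover = page_size - sum(quotas.values())
    # Hand remaining slots to the groups with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda g: exact[g] - int(exact[g]), reverse=True)
    for g in by_fraction[:leftover]:
        quotas[g] += 1
    return quotas
```

The guaranteed slot keeps small groups visible on the first page, while the proportional part lets larger groups occupy more of it.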

In a preferred embodiment, the computer device in this embodiment includes a memory and a processor, and the processor is communicatively connected to the memory. The memory is used to store computer programs; the processor is used to execute the computer programs stored in the memory to implement the search method for diversification and equalization of search results as in the above embodiments. Optionally, the computer device is a server. The computer device in this embodiment uniformly converts various types of original data models into preset data models, which prevents the form in which each data type is expressed from affecting the search and makes the search results more diversified and balanced.

Each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same and similar parts of each embodiment can be referred to each other. As for the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and for the related information, please refer to the description of the method part.

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present invention.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be directly implemented by hardware, software modules executed by a processor, or a combination of both. Software modules can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other known form of storage medium.

The above embodiments are only to illustrate the technical conception and characteristics of the present invention. The purpose is to enable those skilled in the art to understand the content of the present invention and implement it accordingly, and cannot limit the protection scope of the present invention. All equivalent changes and modifications made in accordance with the scope of the claims of the present invention shall fall within the scope of the claims of the present invention.

Claims

1. A search method for diversification and equalization of search results, characterized in that it comprises the following steps:

S1. Establish an industry thesaurus, which includes a plurality of industry professional vocabularies; convert various types of original data models into preset data models;

S2. Receive the search content input by the user, and extract search keywords from the search content according to the industry thesaurus;

S3. Use the search keywords to retrieve all the preset data models, calculate the total weight value of each preset data model in the search results, and sort the search results according to the total weight values.

2. The search method for diversification and equalization of search results according to claim 1, wherein the preset data model includes content title, content abstract, text, keywords and content type.

3. The search method for diversification and equalization of search results according to claim 2, wherein converting various types of original data models into preset data models in the step S1 includes:

Convert various types of original data models into preset data models and set the weight value of each part of the preset data models, wherein the weight value of the keywords is greater than the weight value of the content title, the weight value of the content title is greater than the weight value of the content abstract, and the weight value of the content abstract is greater than the weight value of the text.

4. The search method for diversification and equalization of search results according to claim 3, wherein calculating the total weight value of each of the preset data models in the search results in the step S3 includes: separately calculating the sub-weight values of the search keywords in the content title, content abstract, text, keywords and content type, and obtaining the total weight value from all the sub-weight values.

5. The search method for diversification and equalization of search results according to claim 4, wherein when calculating the sub-weight values of the search keywords in the content title, content abstract, text, keywords and content type, each sub-weight value is positively correlated with the number of occurrences of the search keyword.

6. The search method for diversification and equalization of search results according to claim 3, further comprising:

S4. According to the distribution of each type of data model corresponding to the preset data model in the search result, adjust the weight value of each part of the preset data model, so that the distribution of each type in the search result is balanced.

7. The search method for diversification and equalization of search results according to claim 1, characterized in that using the search keywords in the step S3 to retrieve all the preset data models includes:

S31. Classify all the preset data models according to the classification standard;

S32. Count the total number of the preset data types in each category, and divide the categories with the same total number into the same group;

S33. Retrieve all the preset data models in each group by using the search keywords.

8. The search method for diversification and equalization of search results according to claim 7, further comprising: making each group generate a preset number of preset data models after the step S33.

9. The search method for diversification and equalization of search results according to claim 8, wherein the preset quantity corresponding to each group is positively correlated with the total number of the group.

10. A computer device, characterized in that it includes a memory and a processor, and the processor is communicatively connected to the memory;

The memory is used to store computer programs;

The processor is configured to execute the computer program stored in the memory to realize the search method for diversification and equalization of search results according to any one of claims 1 to 9.