CN113204697A - Searching method, searching device, electronic equipment and storage medium - Google Patents

Searching method, searching device, electronic equipment and storage medium

Info

Publication number
CN113204697A
Authority
CN
China
Prior art keywords
search
search result
statement
result
relevance score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110476131.7A
Other languages
Chinese (zh)
Inventor
李刚强
侯志明
张熠
史忠伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuba Co Ltd
Original Assignee
Wuba Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuba Co Ltd
Priority to CN202110476131.7A
Publication of CN113204697A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a searching method, a searching device, electronic equipment and a storage medium, and relates to the technical field of computers. The method comprises the following steps: acquiring a search statement; obtaining at least one first search result according to the search statement; determining a relevance score for the search statement and each of the first search results; selecting a second search result from the first search results, wherein the second search result is the first search result with the relevance score larger than a first threshold value; determining a return parameter of the second search result, wherein the return parameter is used for representing the probability that the second search result is executed with a preset operation; and selecting at least one search result from the second search results according to the return parameters for displaying. Therefore, in the embodiment of the application, the accuracy of the returned search result can be improved.

Description

Searching method, searching device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a search method, an apparatus, an electronic device, and a storage medium.
Background
Search engines are generally classified into general search engines and vertical search engines. The general search engine integrates information on all websites into one platform to provide services. The vertical search engine is a professional search engine for a certain industry, and even an in-site search service of a certain company. Thus, for a search in a particular area of expertise, vertical search engines return more accurate search results than general search engines.
However, whether a general or a vertical search engine is used, the retrieved results are currently presented to the user directly, without further screening, so the displayed results often fail to meet the user's expectations.
Therefore, the accuracy of the search results returned by current search engines is low.
Disclosure of Invention
The embodiment of the application provides a searching method, a searching device, electronic equipment and a storage medium, and aims to solve the problem that the accuracy of a searching result returned by a current search engine is low.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a search method, where the method includes:
acquiring a search statement;
obtaining at least one first search result according to the search statement;
determining a relevance score for the search statement and each of the first search results;
selecting a second search result from the first search results, wherein the second search result is the first search result with the relevance score larger than a first threshold value;
determining a return parameter of the second search result, wherein the return parameter is used for representing the probability that the second search result is executed with a preset operation;
and selecting at least one search result from the second search results to display according to the return parameters.
In a second aspect, an embodiment of the present application additionally provides a search apparatus, where the apparatus includes:
the search sentence acquisition module is used for acquiring a search sentence;
a first result obtaining module, configured to obtain at least one first search result according to the search statement;
a score determining module for determining a relevance score of the search statement and each of the first search results;
the screening module is used for selecting a second search result from the first search results, wherein the second search result is the first search result with the relevance score larger than a first threshold value;
the parameter determination module is used for determining a return parameter of the second search result, wherein the return parameter is used for representing the probability that the second search result is executed with a preset operation;
and the display module is used for selecting at least one search result from the second search results to display according to the return parameters.
In a third aspect, an embodiment of the present application additionally provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the search method as set forth in the preceding first aspect.
In a fourth aspect, the present embodiments additionally provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the search method according to the first aspect.
In the embodiment of the application, a search statement is acquired and at least one first search result is obtained according to it. A relevance score between the search statement and each first search result is then determined, and the first search results whose relevance score is larger than a first threshold are selected as second search results. Next, a return parameter of each second search result is determined, and at least one search result is selected from the second search results according to the return parameters and displayed. The return parameter represents the probability that a preset operation is performed on the second search result, that is, the user's degree of interest in the second search result. The embodiment can therefore select for display search results that both have a higher relevance score and interest the user, so that the user sees results that match the search purpose and the user's interests, which improves the accuracy of the returned search results.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features, and advantages of the present application are easier to understand, specific embodiments of the present application are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of steps of a search method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a specific implementation of a search apparatus according to an embodiment of the present application;
FIG. 3 is a flow chart illustrating steps performed by the intent recognition layer in an embodiment of the present application;
FIG. 4 is a schematic diagram of a training process of a probabilistic predictive model in an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an iterative update process of a probability prediction model in an embodiment of the present application;
FIG. 6 is a block diagram of a search apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The search method of the embodiments of the present application may run on a terminal device or on a server. The terminal device may be a local terminal device. When the method runs on a server, it may be implemented as a cloud presentation.
In an optional embodiment, the cloud presentation refers to an information presentation manner based on cloud computing. In the cloud display operation mode, an operation main body and an information picture presentation main body of an information processing program are separated, storage and operation of a display switching method are completed on a cloud display server, and a cloud display client is used for receiving and sending data and presenting an information picture, for example, the cloud display client can be a display device with a data transmission function close to a user side, such as a mobile terminal, a television, a computer, a palm computer and the like; however, the terminal device for processing the information data is a cloud display server at the cloud end. When browsing, a user operates the cloud display client to send an operation instruction to the cloud display server, the cloud display server performs coding compression on data according to operation instruction display information, returns the data to the cloud display client through a network, and finally decodes the data through the cloud display client and outputs a cell live-action picture and a landmark live-action picture.
In another alternative embodiment, the terminal device may be a local terminal device. The local terminal device stores an application program and is used for presenting an application interface. The local terminal device is used for interacting with a user through a graphical user interface, namely, downloading and installing an application program through the electronic device and running the application program conventionally. The manner in which the local terminal device provides the graphical user interface to the user may include a variety of ways, for example, it may be rendered for display on a display screen of the terminal or provided to the user by holographic projection. For example, the local terminal device may include a display screen for presenting a graphical user interface including an application screen and a processor for running the application, generating the graphical user interface, and controlling display of the graphical user interface on the display screen.
The application provides a searching method, which can select the searching result with higher relevance score to be displayed and display the searching result according to the predicted interest degree of the user to the second searching result, so that the user can preferentially check the interested searching result, and the accuracy of the returned searching result is improved.
Referring to FIG. 1, a flow chart of the steps of a search method in an embodiment of the present application is shown. The method may include the following steps 101 to 106.
Step 101: acquiring a search statement.
When the user provides input in text form, the search statement is the text content entered by the user; when the user provides input in the form of a picture, the search statement may be the name and/or text content of an object included in the input picture; when the user provides input in voice form, the search statement is the text content contained in the voice information entered by the user.
Step 102: acquiring at least one first search result according to the search statement.
In the embodiment of the application, after the search statement is acquired, at least one first search result matching the search statement is retrieved.
Step 103: determining a relevance score for the search statement and each of the first search results.
The relevance score indicates the degree of relevance, or match, between the search statement and the first search result. That is, the greater the relevance score, the more closely the first search result matches the search statement; conversely, the smaller the relevance score, the less closely it matches.
Optionally, the determining the relevance score of the search statement and each of the first search results includes: calculating the relevance score of the search statement and each first search result by using the BM25 text similarity algorithm.
BM25 is an algorithm for evaluating the relevance between search terms and a document; therefore, in the embodiment of the present application, the BM25 algorithm may be used to calculate the relevance score of the search statement with each first search result. It is to be understood that the algorithm for calculating the relevance score is not limited to BM25.
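As an illustration of step 103, the following is a minimal sketch of a common BM25 variant. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, and the toy documents are assumptions made for this example; the application does not specify which BM25 formulation or parameters are used.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Return one BM25 score per tokenized document in `docs`.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical first search results, already split into words.
docs = [
    ["two", "bedroom", "apartment", "near", "subway"],
    ["used", "car", "for", "sale"],
    ["apartment", "for", "rent"],
]
scores = bm25_scores(["apartment", "subway"], docs)
```

In this toy corpus the first document matches both query words and therefore scores highest, while the second document matches neither and scores zero.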
Step 104: selecting a second search result from the first search results.
Wherein the second search result is the first search result with a relevance score greater than a first threshold. That is, in the embodiment of the present application, after determining the relevance score of the search statement and the first search result, the first search result with the relevance score greater than the first threshold needs to be selected as the second search result.
Step 105: determining a return parameter of the second search result.
The return parameter represents the probability that a preset operation is performed on the second search result; that is, it indicates the user's level of interest in the second search result. Therefore, the larger the return parameter, the greater the probability that the preset operation is performed on the second search result, and the more interested the user is in it; conversely, the smaller the return parameter, the smaller that probability, and the less interested the user is in the second search result.
In addition, the preset operation may be one of a click operation, a drag operation, and a long-press operation. For example, if a user clicks a search result, drags a displayed search result to a preset position (e.g., the right edge of the display screen), or presses the display position of a search result on the display screen for longer than a preset time (e.g., 2 seconds), this indicates that the user is interested in the search result and wants to view it.
Step 106: selecting at least one search result from the second search results for display according to the return parameters.
Optionally, step 106 includes: sorting the second search results in descending order of return parameter to obtain a target ranking; and selecting a preset number of second search results at the top of the target ranking for display.
Alternatively, step 106 may include: selecting and displaying the second search results whose return parameter is larger than a fourth threshold.
Therefore, in the embodiment of the application, the second search results to be displayed can be selected according to the size of the return parameter, so that the user preferentially views the second search results with larger return parameters, that is, the second search results the user is interested in.
Optionally, in step 106, when the selected second search result is displayed, the second search result may be displayed in an order from a large return parameter to a small return parameter, so that the user may preferentially see the search result that the user is interested in, and the click rate of the search result may be further improved.
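The two optional forms of step 106 can be sketched as follows. The dictionary layout of a search result and the names `select_top_n` and `select_above` are hypothetical and chosen only for this example.

```python
def select_top_n(second_results, preset_number):
    """Sort by return parameter, largest first, and keep the first N."""
    ranked = sorted(second_results, key=lambda r: r["return_param"], reverse=True)
    return ranked[:preset_number]


def select_above(second_results, fourth_threshold):
    """Keep results whose return parameter exceeds the threshold, largest first."""
    kept = [r for r in second_results if r["return_param"] > fourth_threshold]
    return sorted(kept, key=lambda r: r["return_param"], reverse=True)


# Hypothetical second search results with already-computed return parameters.
second_results = [
    {"title": "A", "return_param": 0.30},
    {"title": "B", "return_param": 0.85},
    {"title": "C", "return_param": 0.55},
]
top_two = [r["title"] for r in select_top_n(second_results, 2)]
above_half = [r["title"] for r in select_above(second_results, 0.5)]
```

Both variants display the selected results in descending order of return parameter, which matches the optional display order described above.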
As can be seen from the foregoing steps 101 to 106, in the embodiment of the present application, a search statement is acquired and at least one first search result is obtained according to it. A relevance score between the search statement and each first search result is then determined, and the first search results whose relevance score is larger than a first threshold are selected as second search results. Next, a return parameter of each second search result is determined, and at least one search result is selected from the second search results according to the return parameters and displayed. The return parameter represents the probability that a preset operation is performed on the second search result, that is, the user's degree of interest in the second search result. The embodiment can therefore select for display search results that both have a higher relevance score and interest the user, so that the user sees results that match the search purpose and the user's interests, which improves the accuracy of the returned search results.
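Steps 101 to 106 can be chained as in the sketch below. The retrieval, relevance, and return-parameter functions are passed in as callables; the toy stand-ins used here (word overlap as relevance, a fixed lookup table as return parameter) are assumptions for illustration and are not the techniques of the application.

```python
def search(statement, retrieve, relevance, return_param, first_threshold, preset_number):
    """Run steps 102-106 for an already acquired search statement (step 101)."""
    first_results = retrieve(statement)                                   # step 102
    scored = [(r, relevance(statement, r)) for r in first_results]        # step 103
    second_results = [r for r, s in scored if s > first_threshold]        # step 104
    ranked = sorted(second_results, key=return_param, reverse=True)       # step 105
    return ranked[:preset_number]                                         # step 106


# Toy stand-ins, hypothetical and for illustration only.
CORPUS = ["apartment near subway", "apartment for rent", "used car for sale"]
CLICK_PROB = {
    "apartment near subway": 0.2,
    "apartment for rent": 0.7,
    "used car for sale": 0.9,
}


def toy_retrieve(statement):
    return list(CORPUS)


def toy_relevance(statement, result):
    query_words, result_words = set(statement.split()), set(result.split())
    return len(query_words & result_words) / len(query_words)


shown = search("apartment rent", toy_retrieve, toy_relevance, CLICK_PROB.get, 0.4, 2)
```

Note that the car listing has the largest return parameter in this toy data but is never displayed, because it is removed by the relevance threshold before the return parameters are considered.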
Optionally, the obtaining at least one first search result according to the search statement includes:
acquiring a target word set, wherein the target word set comprises at least one of: words in the search statement, synonyms of the words in the search statement, keywords in the search statement, and rewritten words of the words in the search statement;
and respectively acquiring a search result matched with each word in the target word set as the first search result.
The rewritten words of the words in the search statement are the words obtained after the words in the search statement are rewritten according to the meaning expressed by the search statement. That is, a rewritten word is determined from the meaning expressed by the search statement, and therefore the rewritten words of a word in the search statement may include words that differ from the synonyms of that word.
Therefore, in the embodiment of the application, after the search statement is acquired, the words in the search statement may be extracted, synonyms of those words may be obtained, keywords in the search statement may be determined, and at least some of the words may be rewritten based on the meaning expressed by the search statement to obtain rewritten words. At least some of these words then constitute a target word set, and the search results matching each word in the target word set are obtained separately as the first search results.
Therefore, according to the embodiment of the application, the search statement can be processed from multiple angles, the terms associated with the search statement are extracted, and further more first search results matched with the search statement can be obtained according to the terms, that is, more diversified first search results can be obtained.
As can be seen from the above, in the embodiments of the present application, intent recognition may be performed on a search statement to obtain a target word set that represents the search intent of the user. That is, embodiments of the present application may obtain, at the word level, first search results that match the search intent expressed by the search statement.
Optionally, before the obtaining the search result matching with each word in the target word set as the first search result, the method further includes:
performing de-duplication processing on the words in the target word set.
After the words in the target word set are de-duplicated, the same word is not searched for repeatedly, which simplifies the search process and saves search time.
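The construction and de-duplication of the target word set can be sketched as follows. The synonym and rewrite tables are hypothetical lookup dictionaries; how synonyms, keywords, and rewritten words are actually produced is not specified here.

```python
def build_target_word_set(words, synonyms, keywords, rewrites):
    """Collect words, their synonyms, keywords, and rewritten words, then de-duplicate.

    `synonyms` and `rewrites` map a word to a list of related words
    (hypothetical lookup tables used only for this example).
    """
    candidates = list(words)
    for word in words:
        candidates.extend(synonyms.get(word, []))
    candidates.extend(keywords)
    for word in words:
        candidates.extend(rewrites.get(word, []))
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(candidates))


target_words = build_target_word_set(
    words=["cheap", "flat"],
    synonyms={"flat": ["apartment"]},
    keywords=["flat"],
    rewrites={"cheap": ["low-rent"]},
)
```

Because "flat" appears both as a word and as a keyword, it is kept only once, so it is searched only once.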
Optionally, the obtaining the target word set includes:
processing the search statement into a preset font format (for example, performing full-width/half-width conversion, traditional/simplified Chinese conversion, and typo replacement) to obtain a candidate sentence;
performing word segmentation on the candidate sentence (for example, segmenting the text according to a pre-established word library, performing part-of-speech tagging, and then removing stop words) to obtain candidate words;
when the total length of the characters included in the candidate words is greater than a third threshold, performing at least one of the following steps H1-H3;
when the total length of the characters included in the candidate words is less than or equal to the third threshold, removing at least one of an adjective and an adverb from the candidate words, then performing at least one of the following steps H1-H3, and taking the words obtained after performing at least one of the steps H1-H3 and/or the candidate words as words in the target word set;
Step H1: obtaining synonyms of the candidate words;
Step H2: extracting keywords from the candidate words;
Step H3: rewriting the candidate words to obtain rewritten words.
It should be noted here that the above process of removing adjectives and/or adverbs in the candidate words is mainly to eliminate interfering terms in the long search sentence.
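The first two preprocessing steps (format normalization and word segmentation with stop-word removal) can be sketched as below. This sketch assumes Unicode NFKC normalization for full-width to half-width conversion, a small hypothetical typo table, and whitespace splitting as a stand-in for dictionary-based segmentation; traditional/simplified conversion and part-of-speech tagging are omitted.

```python
import unicodedata

# Hypothetical resources, for illustration only.
STOP_WORDS = {"a", "an", "the", "in"}
TYPO_FIXES = {"apartmant": "apartment"}


def to_candidate_sentence(search_statement):
    """Normalize width variants and case, then replace known typos."""
    # NFKC maps full-width letters and the ideographic space to ASCII forms.
    text = unicodedata.normalize("NFKC", search_statement).lower()
    return " ".join(TYPO_FIXES.get(token, token) for token in text.split())


def to_candidate_words(candidate_sentence):
    """Split on whitespace (a stand-in for a real segmenter) and drop stop words."""
    return [token for token in candidate_sentence.split() if token not in STOP_WORDS]


candidate_sentence = to_candidate_sentence("Ｒｅｎｔ　ａｎ　ａｐａｒｔｍａｎｔ")
candidate_words = to_candidate_words(candidate_sentence)
```

For Chinese text, whitespace splitting would not work; a dictionary-based segmenter would take the place of `to_candidate_words`.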
Optionally, the obtaining at least one first search result according to the search statement further includes:
determining a target type to which the search statement belongs;
and searching a pre-established index library, which stores target content and the type of each piece of target content, for the target content belonging to the target type, and taking that target content as the first search result.
Therefore, in the embodiment of the application, an index library is established in advance, and the target content and the type of the target content are stored in the index library. The target type of the search statement can then be determined, and the target content in the index library that belongs to the target type is used as the first search result.
That is, in the embodiments of the present application, the first search result matched with the search statement may be obtained based on the type to which the search statement belongs.
The target content comprises at least one of a video file, an audio file, and a text file. For example, when the target content includes a video file, the type of the target content (e.g., a movie video, a product sales video, a user-shot video, etc.) can be determined according to the video content of the video file; when the target content includes an audio file, the type of the target content (e.g., relaxed or sad) can be determined according to the theme expressed by the audio file; when the target content includes a text file, the type of the target content can be determined according to the keywords of the text file.
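Type-based retrieval can be sketched as follows. The index entries, the type names, and the keyword rule used to determine the target type of a search statement are all hypothetical; the application does not state how the target type is determined.

```python
# Hypothetical pre-established index library: target content plus its type.
INDEX = [
    {"content": "city-centre flat tour", "type": "rental"},
    {"content": "second-hand sedan walkaround", "type": "vehicle"},
    {"content": "studio near the station", "type": "rental"},
]

# Hypothetical keyword rule standing in for a real type classifier.
TYPE_KEYWORDS = {
    "rental": {"rent", "flat", "apartment"},
    "vehicle": {"car", "sedan"},
}


def target_type(search_statement):
    """Return the first type whose keywords overlap the statement, else None."""
    words = set(search_statement.lower().split())
    for type_name, keywords in TYPE_KEYWORDS.items():
        if words & keywords:
            return type_name
    return None


def search_by_type(search_statement, index=INDEX):
    """Return all target content whose stored type equals the statement's type."""
    wanted = target_type(search_statement)
    return [entry["content"] for entry in index if entry["type"] == wanted]


rental_hits = search_by_type("flat to rent")
```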
Optionally, the obtaining at least one first search result according to the search statement further includes:
obtaining a semantic vector of the search statement;
and searching a pre-established index library, which stores target content and the semantic vector of each piece of target content, for the target content whose semantic vector is at a distance smaller than a second threshold from the semantic vector of the search statement, and taking that target content as the first search result.
Therefore, in the embodiment of the application, an index library is established in advance, and the target content and the semantic vector of the target content are stored in the index library. The semantic vector of the search statement can then be acquired, the distance between it and the semantic vector of each piece of target content in the index library can be calculated, and the target content whose distance is smaller than the second threshold is selected as the first search result.
That is, the embodiment of the application may obtain the first search result matched with the search statement based on the semantic vector of the search statement.
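Semantic-vector retrieval can be sketched as below. The three-dimensional vectors are hand-written placeholders rather than the output of a real embedding model, and Euclidean distance is an assumption, since the application does not name a distance measure.

```python
import math

# Hypothetical index library: target content with a precomputed semantic vector.
INDEX = [
    ("apartment for rent", (0.9, 0.1, 0.0)),
    ("flat to let", (0.8, 0.2, 0.1)),
    ("used car for sale", (0.0, 0.1, 0.9)),
]


def search_by_vector(query_vector, index, second_threshold):
    """Return target content whose vector lies within the threshold distance."""
    return [
        content
        for content, vector in index
        if math.dist(query_vector, vector) < second_threshold
    ]


# Placeholder semantic vector of the search statement.
vector_hits = search_by_vector((0.85, 0.15, 0.0), INDEX, second_threshold=0.5)
```

A linear scan like this is only practical for small indexes; a larger index library would typically use an approximate nearest-neighbour structure instead.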
As can be seen from the above, in the embodiment of the present application, text analysis may be performed on a search statement to obtain the target word set, the type to which the search statement belongs, and the semantic vector of the search statement, so that more first search results matching the search statement can be obtained from multiple aspects, and more search results of interest to the user can then be screened from the first search results.
Optionally, a probability prediction model is pre-established, and input data of the probability prediction model includes feature information of a user, a target word set of a search statement, and a relevance score between the search statement and a search result;
before the determining the return parameters of the second search result, the method further comprises:
acquiring feature information of a user associated with the search statement;
the determining the return parameters of the second search result comprises:
inputting the feature information of the user related to the search statement, the target word set of the search statement, and the relevance score of the search statement and the second search result into the probability prediction model, and outputting the return parameters of the second search result.
As can be seen from the above, in the embodiment of the present application, a probability prediction model is obtained in advance by training on training samples with a machine learning algorithm, so that the probability prediction model can be used to obtain the return parameters of the second search result.
The training sample comprises feature information of a user associated with the search statement, a target word set of the search statement, a relevance score of the search statement and the search result, whether the search result is clicked, the display position of the search result when it is displayed (namely, its rank when displayed), and a manually labeled score for the search result.
That is, a plurality of search statements and the search results retrieved with them may be collected in advance, the training samples are constructed from these search statements and search results, and the probability prediction model is then trained on the training samples by using a machine learning algorithm.
Optionally, the characteristic information of the user includes at least one of an age, a gender, and a geographic location of the user.
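As one possible reading of the probability prediction model, the sketch below trains a logistic regression with stochastic gradient descent. Logistic regression is an assumption, since the text only says "a machine learning algorithm", and the feature vector is simplified to a relevance score and a scaled age; encoding the target word set and other user features is left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Return the predicted probability that the preset operation is performed."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(samples, labels, epochs=300, learning_rate=0.5):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training samples: [relevance score, user age / 100].
SAMPLES = [[0.9, 0.2], [0.8, 0.7], [0.2, 0.3], [0.1, 0.8]]
CLICKED = [1, 1, 0, 0]  # whether the search result was clicked

weights, bias = train(SAMPLES, CLICKED)
return_param = predict(weights, bias, [0.85, 0.5])
```

The value `return_param` plays the role of the return parameter for one second search result; with this toy data, results with a higher relevance score receive a higher predicted probability.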
Optionally, after selecting at least one search result from the second search results to display according to the return parameter, the method further includes:
acquiring indication information of a third search result, wherein the third search result is the displayed second search result, and the indication information is used for indicating whether the third search result is executed with the preset operation;
and updating the probability prediction model according to the feature information of the user associated with the search statement, the target word set of the search statement, the relevance score of the search statement and the third search result, the display position of the third search result and the indication information.
In other words, in the embodiment of the application, after the third search result is displayed, indication information of whether the preset operation is performed on the third search result may further be obtained. The probability prediction model is then updated according to the indication information, the feature information of the user associated with the search statement, the target word set of the search statement, the relevance score between the search statement and the third search result, and the display position of the third search result. As a result, the output of the updated probability prediction model is more accurate, that is, the search results selected according to its output are more likely to have the preset operation performed on them.
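The iterative update can be sketched as a single online gradient step on one displayed result. As before, the logistic model, the starting weights, and the feature layout (relevance score, scaled age, scaled display position) are assumptions for illustration; the application does not specify the update rule.

```python
import math


def predict(weights, bias, features):
    """Predicted probability that the preset operation is performed."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, bias, features, performed, learning_rate=0.1):
    """Apply one gradient step using the indication information for a shown result."""
    error = predict(weights, bias, features) - (1.0 if performed else 0.0)
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return new_weights, bias - learning_rate * error


# Hypothetical current model and one displayed (third) search result:
# [relevance score, user age / 100, display position / 10].
weights, bias = [1.2, -0.3, -0.5], -0.4
shown_features = [0.8, 0.35, 0.1]

before = predict(weights, bias, shown_features)
weights, bias = update(weights, bias, shown_features, performed=True)
after = predict(weights, bias, shown_features)
```

After the user performs the preset operation on the displayed result, the updated model assigns that feature vector a higher probability than before; if the operation is not performed, the same step lowers it.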
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.
Referring to fig. 2, a block diagram of a search apparatus in an embodiment of the present application is shown, where the search apparatus 200 may include the following modules:
a search statement acquisition module 201, configured to acquire a search statement;
a first result obtaining module 202, configured to obtain at least one first search result according to the search statement;
a score determining module 203, configured to determine a relevance score of the search statement and each of the first search results;
a screening module 204, configured to select a second search result from the first search results, where the second search result is the first search result whose relevance score is greater than a first threshold;
a parameter determining module 205, configured to determine a return parameter of the second search result, where the return parameter is used to indicate a probability that the second search result is executed by a preset operation;
and a display module 206, configured to select at least one search result from the second search results according to the return parameter and display it.
Optionally, the first result obtaining module 202 is specifically configured to:
acquiring a target word set of a search statement, wherein the target word set comprises at least one of words in the search statement, synonyms of the words in the search statement, keywords in the search statement and rewriting words of the words in the search statement;
and respectively acquiring a search result matched with each word in the target word set as the first search result.
Optionally, the first result obtaining module 202 is further configured to:
determining a target type to which the search statement belongs;
and searching the target content belonging to the target type in the index database according to the target content stored in the pre-established index database and the type of the target content, and taking the target content as the first search result.
Optionally, the first result obtaining module 202 is further configured to:
obtaining a semantic vector of the search statement;
and searching the target content of which the distance from the semantic vector of the search statement is smaller than a second threshold value according to the target content stored in a pre-established index library and the semantic vector of the target content to serve as the first search result.
Optionally, the score determining module 203 is specifically configured to:
and calculating the relevance score of the search statement and each first search result by adopting a BM25 algorithm.
Optionally, a probability prediction model is pre-established, and input data of the probability prediction model includes feature information of a user, a target word set of a search statement, and a relevance score between the search statement and a search result;
the device further comprises:
the user information acquisition module is used for acquiring the characteristic information of the user associated with the search statement;
the parameter determining module 205 is specifically configured to:
inputting the feature information of the user related to the search statement, the target word set of the search statement, and the relevance score of the search statement and the second search result into the probability prediction model, and outputting the return parameters of the second search result.
Optionally, the apparatus further comprises:
the indication information acquisition module is used for acquiring indication information of a third search result, wherein the third search result is the displayed second search result, and the indication information is used for indicating whether the third search result is executed with the preset operation or not;
and the updating module is used for updating the probability prediction model according to the feature information of the user related to the search statement, the target word set of the search statement, the relevance score of the search statement and the third search result, the display position of the third search result and the indication information.
Therefore, in the embodiment of the application, a search statement can be obtained, at least one first search result is obtained according to the search statement, so that the relevance score of the search statement and each first search result is determined, the first search result with the relevance score larger than a first threshold value is selected to serve as a second search result, then the return parameter of the second search result is determined, and at least one search result is selected from the second search result according to the return parameter and is displayed. The return parameter is used for representing the probability that the second search result is executed with the preset operation, namely the return parameter represents the interest degree of the user in the second search result. Therefore, according to the embodiment of the application, the search results with higher relevance scores and interest of the user can be selected for display, so that the user can view the search results which accord with the search purpose and are interested by the user, and the accuracy of the returned search results is improved.
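The flow summarized above can be sketched as a short pipeline. This is only an illustrative sketch: the callables `retrieve`, `relevance` and `return_parameter`, and the `top_k` cut-off, are hypothetical placeholders that stand in for the recall step, the relevance scoring step and the probability prediction model, none of which are specified at this level of detail in the text.

```python
def search(statement, retrieve, relevance, return_parameter, first_threshold, top_k):
    """Illustrative pipeline: recall, relevance filter, probability-based selection."""
    # Step 1: obtain the first search results for the search statement.
    first_results = retrieve(statement)
    # Step 2: score each first search result against the search statement.
    scored = [(result, relevance(statement, result)) for result in first_results]
    # Step 3: keep results whose relevance score is larger than the first threshold.
    second_results = [result for result, score in scored if score > first_threshold]
    # Step 4: order by the return parameter (probability of the preset operation).
    ranked = sorted(
        second_results,
        key=lambda result: return_parameter(statement, result),
        reverse=True,
    )
    # Step 5: select at least one result to display.
    return ranked[:top_k]
```

Passing the three stages in as functions keeps the sketch independent of any particular index, scoring algorithm or model.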
Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for the relevant points, refer to the corresponding description of the method embodiment.
For example, the specific implementation of the search apparatus according to the embodiment of the present application may be as follows:
as shown in fig. 3, the search apparatus of the embodiment of the present application includes an intention identifying module, a recall module, a ranking module, an index building module, and a content library.
In a first aspect, the content library provides target content to the index building module.
The index building module is mainly responsible for processing, on the content side, the target content provided by the content library, building a text index, a tag index and a semantic index, and storing the built indexes in the index library, so that an index query service is provided for the recall module.
Namely, the index building module builds an index relationship between the target content and the target word set of the target content (namely, the text index), an index relationship between the target content and the type of the target content (namely, the tag index), and an index relationship between the target content and the semantic vector of the target content (namely, the semantic index). The target word set comprises words related to the target content, synonyms of those words, and rewrites of those words. The words related to the target content include the words in the textual information included in the target content.
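The three index relationships described above can be sketched with plain dictionaries. This is a minimal sketch under stated assumptions: the `TargetContent` and `IndexLibrary` classes and their field names are hypothetical, and the semantic vector is assumed to have been computed elsewhere.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TargetContent:
    content_id: str
    words: set[str]            # target word set of the content (words, synonyms, rewrites)
    content_type: str          # type of the content
    vector: tuple[float, ...]  # semantic vector, assumed to be precomputed


@dataclass
class IndexLibrary:
    # word -> ids of contents whose target word set contains the word
    text_index: dict = field(default_factory=lambda: defaultdict(set))
    # type -> ids of contents of that type
    tag_index: dict = field(default_factory=lambda: defaultdict(set))
    # content id -> semantic vector
    semantic_index: dict = field(default_factory=dict)


def build_indexes(contents: list[TargetContent]) -> IndexLibrary:
    """Build the text, tag and semantic indexes for a batch of target contents."""
    library = IndexLibrary()
    for item in contents:
        for word in item.words:
            library.text_index[word].add(item.content_id)
        library.tag_index[item.content_type].add(item.content_id)
        library.semantic_index[item.content_id] = item.vector
    return library
```

Inverted maps from word and type to content ids make the text and tag lookups a single dictionary access, while the semantic index is kept per content so that distances can be computed at query time.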
In a second aspect, the intention recognition module is the processing module closest to the user input, and is mainly responsible for making a comprehensive understanding of the search sentence input by the user, finding the real intention of the user search, and preparing for the following recall module.
Specifically, as shown in fig. 4, the intention identifying module is configured to perform the following processes:
acquiring a search statement;
preprocessing the search statement, namely performing full-width/half-width character conversion, traditional-to-simplified Chinese conversion and wrongly written character replacement, to obtain a candidate sentence;
performing word segmentation on the candidate sentences, namely performing text word segmentation according to a pre-established word library, performing part-of-speech tagging, and further removing stop words to obtain candidate words;
when the total length of characters included in the candidate words is larger than a third threshold value, obtaining synonyms of the candidate words, keywords in the candidate words and rewrites of the candidate words;
when the total length of the characters included in the candidate words is smaller than or equal to a third threshold value, removing at least one of adjectives and adverbs in the candidate words, and then obtaining synonyms of the candidate words, keywords in the candidate words and candidate word rewrites;
the type of the candidate sentence is determined.
The candidate words obtained in the above process may be stored in the original word set, the synonyms of the candidate words may be stored in the synonym set, the keywords in the candidate words may be stored in the keyword set, the rewrites of the candidate words may be stored in the rewritten word set, and the types of the candidate sentences may be stored in the classification tag set.
In addition, the classification tag set, the original word set, the synonym set, the keyword set and the rewritten word set may be collectively referred to as the target word set. The words in the target word set can be output to the recall module as user intention data, so that the recall module can search for the first search results matching the words.
In a third aspect, the recall module uses the user intention data output by the intention identification module to screen out a batch of first search results in a multi-way recall mode, forming a candidate content set, and then screens out a batch of second search results from the candidate content set, so that the screened second search results are output to the sorting module for processing.
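The preprocessing and word-set construction steps above can be sketched as follows. This is only a sketch under loud assumptions: the lookup tables are tiny hypothetical stand-ins for real resources, whitespace splitting stands in for dictionary-based word segmentation, and part-of-speech tagging, keyword extraction, rewriting, the length threshold and type classification are omitted.

```python
from __future__ import annotations

import unicodedata

# Hypothetical lookup tables; a real system would load much larger resources.
TRADITIONAL_TO_SIMPLIFIED = {"車": "车", "術": "术"}
TYPO_FIXES = {"aprtment": "apartment"}
STOP_WORDS = {"the", "a", "in", "for"}
SYNONYMS = {"apartment": ["flat"]}


def preprocess(statement: str) -> str:
    """Return the candidate sentence for a raw search statement."""
    # NFKC normalization folds full-width letters and digits into half-width forms.
    text = unicodedata.normalize("NFKC", statement)
    # Character-level traditional-to-simplified conversion (toy mapping).
    text = "".join(TRADITIONAL_TO_SIMPLIFIED.get(ch, ch) for ch in text)
    # Replace known wrongly written words.
    words = [TYPO_FIXES.get(word, word) for word in text.lower().split()]
    return " ".join(words)


def build_target_word_set(statement: str) -> dict[str, list[str]]:
    """Return the original word set and the synonym set for a search statement."""
    candidate_sentence = preprocess(statement)
    # Whitespace splitting is an assumption that replaces real word segmentation.
    candidates = [w for w in candidate_sentence.split() if w not in STOP_WORDS]
    synonyms = [s for w in candidates for s in SYNONYMS.get(w, [])]
    return {"original": candidates, "synonyms": synonyms}
```

For example, `build_target_word_set("The ｃｈｅａｐ aprtment")` normalizes the full-width letters, corrects the misspelling, drops the stop word, and attaches the synonym of "apartment".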
Specifically, as shown in fig. 3, the recall module is configured to perform the following processes:
executing a text recall process, namely searching target contents matched with the words from the index library according to the words in the original word set, the synonym set, the keyword set and the rewritten word set output by the intention identification module;
executing a tag recall process, namely searching target contents belonging to the type represented by the words from the index library according to the words in the classification tag set;
executing a semantic recall process, namely determining a semantic vector of the search statement, and then calculating the distance between the semantic vector of the search statement and the semantic vector of the target content in the index library, so as to select the target content corresponding to the distance smaller than a second threshold value;
and the target content obtained after the text recall process, the tag recall process and the semantic recall process are executed is taken as a first search result matched with the search statement.
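The three recall processes listed above can be sketched as set lookups whose results are merged. This is an illustrative sketch only: the index layout (word to ids, type to ids, id to vector) and the use of Euclidean distance are assumptions, since the text does not fix a storage format or a distance metric.

```python
import math


def text_recall(words, text_index):
    """Contents matched by any word of the target word set."""
    hits = set()
    for word in words:
        hits |= text_index.get(word, set())
    return hits


def tag_recall(type_labels, tag_index):
    """Contents belonging to any of the types in the classification tag set."""
    hits = set()
    for label in type_labels:
        hits |= tag_index.get(label, set())
    return hits


def semantic_recall(query_vector, semantic_index, second_threshold):
    """Contents whose vector lies closer to the query than the second threshold."""
    return {
        content_id
        for content_id, vector in semantic_index.items()
        if math.dist(query_vector, vector) < second_threshold
    }


def multi_way_recall(words, type_labels, query_vector, indexes, second_threshold):
    """Merge the three recall paths, remembering the first path that found each item."""
    text_index, tag_index, semantic_index = indexes
    paths = {
        "text": text_recall(words, text_index),
        "tag": tag_recall(type_labels, tag_index),
        "semantic": semantic_recall(query_vector, semantic_index, second_threshold),
    }
    first_results = {}
    for path, content_ids in paths.items():
        for content_id in content_ids:
            first_results.setdefault(content_id, path)
    return first_results
```

Recording the recall path alongside each content id is a design choice of this sketch; it makes it possible to sort later by recall-path priority.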
In addition, after the recall module obtains the first search results, the recall module may calculate the relevance score between the search statement and each of the first search results, select the first search results whose relevance scores are larger than the first threshold value, and return them as second search results to the sorting module for processing.
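The relevance filtering step can be illustrated with a small BM25 scorer followed by the threshold test. This is a sketch, not the implementation used in the application: the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the representation of each first search result as a list of tokens are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document; each document is a list of tokens."""
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def select_second_results(query_terms, first_results, first_threshold):
    """Keep the first search results whose relevance score exceeds the threshold.

    first_results maps a content id to the token list of that content.
    """
    ids = list(first_results)
    scores = bm25_scores(query_terms, [first_results[i] for i in ids])
    return {i: s for i, s in zip(ids, scores) if s > first_threshold}
```

A suitable value for the first threshold depends on the corpus and would have to be tuned; the text does not give one.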
In a fourth aspect, the sorting module sorts the second search results returned by the recall module, so as to select the second search results that best meet the user requirements, and displays the second search results to the user (i.e. the service implementing party).
Specifically, the sorting module may determine the probability that each second search result returned by the recall module is clicked by the user, sort the second search results in descending order of that probability, and then select and display the top N second search results, where N is a positive integer.
Alternatively, the second search results recalled by different recall processes may be assigned different priorities in advance, so that the sorting module sorts the second search results according to the priority of the recall path each result comes from. For example, if the priorities of the text recall process, the tag recall process and the semantic recall process decrease in that order, the second search results can be sorted in that priority order, and the top M second search results are then selected for display, where M is a positive integer. The second search results from the same recall process may be ordered randomly or according to the probability that they are clicked by the user.
The probability prediction model is pre-established and is used for determining the probability that a search result is clicked by the user. Therefore, the probability prediction model may also be referred to as a click-through rate (CTR) prediction model. The input of the CTR prediction model includes the feature information of the user, the target word set of the search statement, and the relevance score between the search statement and the search result. The ranking module therefore inputs the feature information of the user associated with the search statement, the target word set of the search statement, and the relevance score between the search statement and the second search result into the probability prediction model, which outputs the probability that the second search result is clicked by the user.
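The two sorting strategies described above can be sketched as two small functions. The tuple layout `(content_id, click_probability, recall_path)` and the concrete priority values are hypothetical; within one recall path, this sketch orders results by click probability, which is one of the two options the text allows.

```python
# Smaller value means higher priority (text > tag > semantic), as in the example above.
PATH_PRIORITY = {"text": 0, "tag": 1, "semantic": 2}


def rank_by_probability(candidates, top_n):
    """Sort by click probability in descending order and keep the top N ids."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [c[0] for c in ordered[:top_n]]


def rank_by_path_priority(candidates, top_m):
    """Sort by recall-path priority, then by click probability, and keep the top M ids."""
    ordered = sorted(candidates, key=lambda c: (PATH_PRIORITY[c[2]], -c[1]))
    return [c[0] for c in ordered[:top_m]]
```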
Specifically, the training process of the CTR prediction model is as shown in fig. 5, that is, training samples are constructed, and the model is then trained on the training samples by using a machine learning algorithm. The training sample comprises characteristic information of a user associated with the search statement, a target word set of the search statement, a relevance score of the search statement and a search result, whether the search result is clicked, a display position where the search result is displayed, and a manually labeled score for the search result.
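As an illustration of training on such samples, the sketch below fits a logistic regression model with stochastic gradient descent. Logistic regression is an assumption chosen for brevity; the text only says "a machine learning algorithm". The encoding of a sample as a numeric feature vector plus a 0/1 click label is likewise hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """Predicted click probability for one feature vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_ctr_model(samples, epochs=200, learning_rate=0.1):
    """Fit weights by SGD on log loss.

    samples: list of (feature_vector, clicked) pairs with clicked in {0, 1}.
    """
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for features, clicked in samples:
            # Gradient of the log loss with respect to the linear output.
            error = predict(weights, bias, features) - clicked
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias
```

In practice, categorical inputs such as the target word set or the user's geographic location would need to be encoded numerically before being passed to a model of this kind.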
The training sample can be constructed in the following two ways:
the first method is as follows: relevant data is queried from the buried point logs by means of a data warehouse (Hive) script, and the resulting data set is sent to a sample construction program, which extracts the information included in the training samples from the data set to obtain a plurality of training samples; the training samples are then stored in a distributed file system (HDFS).
The second method is as follows: the buried point logs are sent to a distributed publish-subscribe messaging system (Kafka); an open-source stream processing framework (Flink) reads search records from the buried point logs stored in Kafka and stores them in a relational database management system (MySQL); the search records are synchronized to generate a Hive table; and the training samples are extracted from the search records stored in the Hive table and stored in the HDFS.
In addition, after the CTR prediction model is obtained by training on the training samples with a machine learning algorithm, offline evaluation verification can be performed on the trained CTR prediction model, and the verified CTR prediction model (namely, the CTR prediction model whose AUC value reaches the preset value) can be applied to the actual searching process.
In addition, the CTR prediction model may be updated using an iterative closed-loop process as shown in fig. 6. After the CTR prediction model obtained through the training process is applied to an actual searching process, new training samples can be constructed based on data generated in the searching process (including the search statement, the returned search results, the clicked search results, the display positions of the clicked search results, and the characteristic information of the user who input the search statement), and the CTR prediction model is then trained on the new training samples to obtain a new version of the CTR prediction model. The new version can be verified by offline evaluation according to its AUC value, and a verified CTR prediction model (namely, one whose AUC value reaches the preset value) can be applied to the actual searching process. After the newly released CTR prediction model is applied in the actual searching process, new data is generated, and new training samples can be constructed from that data, so that the training process is executed repeatedly.
Here, AUC (area under the receiver operating characteristic curve) is a model evaluation index in the field of machine learning.
In summary, the embodiment of the application can acquire more first search results matched with the search statement from multiple aspects of the search statement, so that more search results of interest to the user can be screened from the first search results, and the search recall rate can be further improved; in addition, according to the embodiment of the application, the probability that a recalled search result is clicked by the user can be predicted, and the search results are displayed in order of that probability, so that the user can preferentially view the search results of interest, and the click rate of the search results can be improved; in addition, new training samples are obtained from data generated after the CTR prediction model is applied in the actual searching process, and the CTR prediction model is updated with them, so that the iteration efficiency can be improved.
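Whichever of the two ways is used, the sample construction step boils down to turning logged search records into flat training samples. The sketch below shows that step in isolation, assuming each log line is a JSON object; every field name (`user`, `target_words`, `results`, `clicked_ids`, `relevance`) is hypothetical because the text does not define a log schema, and the manually labeled score is left out.

```python
import json


def build_training_samples(log_lines):
    """Turn JSON buried point log lines into flat training samples.

    All field names are assumptions made for this illustration.
    """
    samples = []
    for line in log_lines:
        record = json.loads(line)
        clicked_ids = set(record.get("clicked_ids", []))
        # Display position is the 1-based rank of the result in the returned list.
        for position, result in enumerate(record["results"], start=1):
            samples.append({
                "user": record["user"],
                "target_words": record["target_words"],
                "relevance": result["relevance"],
                "position": position,
                "clicked": int(result["id"] in clicked_ids),
            })
    return samples
```

Each displayed result yields one sample, so a single search record with several results contributes both clicked and non-clicked examples.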
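The offline check against a preset AUC value can be sketched as follows. The pairwise definition used here (the probability that a randomly chosen clicked sample is scored above a randomly chosen non-clicked one, with ties counted as one half) is the standard interpretation of AUC; the function names and the preset value are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            # A tie counts as half a win.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


def passes_offline_check(labels, scores, preset_auc):
    """Release gate: the model is accepted only if its AUC reaches the preset value."""
    return auc(labels, scores) >= preset_auc
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable on large evaluation sets.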
An embodiment of the present application further provides an electronic device, including:
one or more processors; and
one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the electronic device to perform the methods of embodiments of the present application.
Embodiments of the present application also provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the methods of embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The searching method, searching device, electronic equipment and storage medium provided by the application have been introduced in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the embodiments is only intended to help understand the method and core idea of the application. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (10)

1. A method of searching, the method comprising:
acquiring a search statement;
obtaining at least one first search result according to the search statement;
determining a relevance score for the search statement and each of the first search results;
selecting a second search result from the first search results, wherein the second search result is the first search result with the relevance score larger than a first threshold value;
determining a return parameter of the second search result, wherein the return parameter is used for representing the probability that the second search result is executed with a preset operation;
and selecting at least one search result from the second search results to display according to the return parameters.
2. The method according to claim 1, wherein the obtaining at least one first search result according to the search statement comprises:
acquiring a target word set of a search statement, wherein the target word set comprises at least one of words in the search statement, synonyms of the words in the search statement, keywords in the search statement and rewriting words of the words in the search statement;
and respectively acquiring a search result matched with each word in the target word set as the first search result.
3. The method according to claim 2, wherein the obtaining at least one first search result according to the search statement further comprises:
determining a target type to which the search statement belongs;
and searching the target content belonging to the target type in the index database according to the target content stored in the pre-established index database and the type of the target content, and taking the target content as the first search result.
4. The method according to claim 2 or 3, wherein the obtaining at least one first search result according to the search statement further comprises:
obtaining a semantic vector of the search statement;
and searching the target content of which the distance from the semantic vector of the search statement is smaller than a second threshold value according to the target content stored in a pre-established index library and the semantic vector of the target content to serve as the first search result.
5. The search method of claim 1, wherein said determining a relevance score for the search statement and each of the first search results comprises:
and calculating the relevance score of the search statement and each first search result by adopting a text similarity BM25 algorithm.
6. The search method according to claim 2, wherein a probabilistic predictive model is pre-established, and input data of the probabilistic predictive model includes feature information of a user, a target word set of a search sentence, and a relevance score of the search sentence and a search result;
before the determining the return parameters of the second search result, the method further comprises:
acquiring feature information of a user associated with the search statement;
the determining the return parameters of the second search result comprises:
inputting the feature information of the user related to the search statement, the target word set of the search statement, and the relevance score of the search statement and the second search result into the probability prediction model, and outputting the return parameters of the second search result.
7. The method according to claim 6, wherein after selecting at least one search result from the second search results for display according to the return parameter, the method further comprises:
acquiring indication information of a third search result, wherein the third search result is the displayed second search result, and the indication information is used for indicating whether the third search result is executed with the preset operation;
and updating the probability prediction model according to the feature information of the user associated with the search statement, the target word set of the search statement, the relevance score of the search statement and the third search result, the display position of the third search result and the indication information.
8. A search apparatus, characterized in that the apparatus comprises:
the search sentence acquisition module is used for acquiring a search sentence;
a first result obtaining module, configured to obtain at least one first search result according to the search statement;
a score determining module for determining a relevance score of the search statement and each of the first search results;
the screening module is used for selecting a second search result from the first search results, wherein the second search result is the first search result with the relevance score larger than a first threshold value;
the parameter determination module is used for determining a return parameter of the second search result, wherein the return parameter is used for representing the probability that the second search result is executed with a preset operation;
and the display module is used for selecting at least one search result from the second search results to display according to the return parameters.
9. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the search method according to one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the search method according to one of claims 1 to 7.
CN202110476131.7A 2021-04-29 2021-04-29 Searching method, searching device, electronic equipment and storage medium Pending CN113204697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110476131.7A CN113204697A (en) 2021-04-29 2021-04-29 Searching method, searching device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110476131.7A CN113204697A (en) 2021-04-29 2021-04-29 Searching method, searching device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113204697A true CN113204697A (en) 2021-08-03

Family

ID=77029571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110476131.7A Pending CN113204697A (en) 2021-04-29 2021-04-29 Searching method, searching device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113204697A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186023A (en) * 2021-12-07 2022-03-15 北京金堤科技有限公司 Search processing method, device, equipment and medium for specific search scene
CN114662002A (en) * 2022-04-07 2022-06-24 杭州网易云音乐科技有限公司 Object recommendation method, medium, device and computing equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426508A (en) * 2015-11-30 2016-03-23 百度在线网络技术(北京)有限公司 Webpage generation method and apparatus
CN106547871A (en) * 2016-10-31 2017-03-29 北京百度网讯科技有限公司 Method and apparatus is recalled based on the Search Results of neutral net
CN106815252A (en) * 2015-12-01 2017-06-09 阿里巴巴集团控股有限公司 A kind of searching method and equipment
CN107491534A (en) * 2017-08-22 2017-12-19 北京百度网讯科技有限公司 Information processing method and device
CN108153792A (en) * 2016-12-02 2018-06-12 阿里巴巴集团控股有限公司 A kind of data processing method and relevant apparatus
CN109740077A (en) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 Answer searching method, device and its relevant device based on semantic indexing
CN110489638A (en) * 2019-07-08 2019-11-22 广州视源电子科技股份有限公司 A kind of searching method, device, server, system and storage medium
US20200081908A1 (en) * 2018-09-10 2020-03-12 Baidu Online Network Technology (Beijing) Co., Ltd. Internet text mining-based method and apparatus for judging validity of point of interest
CN111046294A (en) * 2019-12-27 2020-04-21 支付宝(杭州)信息技术有限公司 Click rate prediction method, recommendation method, model, device and equipment
CN111340522A (en) * 2019-12-30 2020-06-26 支付宝实验室(新加坡)有限公司 Resource recommendation method, device, server and storage medium
CN112035598A (en) * 2020-11-03 2020-12-04 北京淇瑀信息科技有限公司 Intelligent semantic retrieval method and system and electronic equipment
CN112328890A (en) * 2020-11-23 2021-02-05 北京百度网讯科技有限公司 Method, device, equipment and storage medium for searching geographical location point

Similar Documents

Publication Publication Date Title
CN112632385B (en) Course recommendation method, course recommendation device, computer equipment and medium
CN107491534B (en) Information processing method and device
CN110888990B (en) Text recommendation method, device, equipment and medium
US20200184307A1 (en) Utilizing recurrent neural networks to recognize and extract open intent from text inputs
CN104834729B (en) Topic recommendation method and topic recommendation apparatus
US20180365257A1 (en) Method and apparatus for querying
CN109284399B (en) Similarity prediction model training method and device and computer readable storage medium
CN111708949B (en) Medical resource recommendation method and device, electronic equipment and storage medium
EP3579124A1 (en) Method and apparatus for providing search results
CN110019616B (en) POI (Point of interest) situation acquisition method and equipment, storage medium and server thereof
CN107577807B (en) Method and device for pushing information
US11151191B2 (en) Video content segmentation and search
CN110968695A (en) Intelligent labeling method, device and platform based on active learning of weak supervision technology
CN108228567B (en) Method and device for extracting short names of organizations
CN109492081B (en) Text information searching and information interaction method, device, equipment and storage medium
CN110543592A (en) Information searching method and device and computer equipment
CN107239564B (en) Text label recommendation method based on supervision topic model
CN111400586A (en) Group display method, terminal, server, system and storage medium
US11645095B2 (en) Generating and utilizing a digital knowledge graph to provide contextual recommendations in digital content editing applications
US20220121668A1 (en) Method for recommending document, electronic device and storage medium
CN110941702A (en) Retrieval method and device for laws and regulations and laws and readable storage medium
CN110737824B (en) Content query method and device
CN113204697A (en) Searching method, searching device, electronic equipment and storage medium
CN112417996A (en) Information processing method and device for industrial drawing, electronic equipment and storage medium
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination