CN111611490A - Resource searching method, device, equipment and storage medium

Resource searching method, device, equipment and storage medium

Info

Publication number
CN111611490A
Authority
CN
China
Prior art keywords
resource
candidate
historical
feature
resources
Legal status
Pending
Application number
CN202010448846.7A
Other languages
Chinese (zh)
Inventor
张志伟
王希爱
郑仲奇
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010448846.7A
Publication of CN111611490A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a resource searching method, apparatus, device, and storage medium, belonging to the field of computer technologies. Taking the user's historical interest into account, the method weights the resource features of candidate resources according to the similarity between the candidate resources and the resources the user has historically clicked, so that the obtained target features not only contain the features of the candidate resources but also incorporate the user's preference for resources.

Description

Resource searching method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a resource search method, apparatus, device, and storage medium.
Background
With the development of computer technology, computers can provide resource search services, where the searched resources include images, videos, audio, information, and the like. For example, in a short video application scenario, when a user initiates a search request, the computer may, in response to the request, search for short videos according to keywords and provide the retrieved short videos to the user.
In the related art, a computer usually marks a corresponding keyword (such as a title, a brief introduction, etc.) for each resource in a database in advance, and stores a corresponding relationship between the resource and the keyword. After the computer obtains the keywords input by the user, the computer searches in the database according to the keywords input by the user. If the keywords of the resource in the database are matched with the keywords input by the user, the computer determines the resource as a search result and returns the search result to the user.
This method considers only whether a resource matches the keyword, so the accuracy of the search results provided by the computer is poor.
Disclosure of Invention
The present disclosure provides a resource searching method, apparatus, device, and storage medium, to at least solve the problem of poor search result accuracy in resource search scenarios in the related art. The technical solutions of the present disclosure are as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a resource search method, including:
receiving a search request of a terminal, wherein the search request comprises a keyword;
respectively obtaining similarities between a plurality of candidate resources matched with the keyword and historical resources, wherein the historical resources are resources historically clicked by the user logged in on the terminal;
processing the resource characteristics of the candidate resources according to the similarity between each candidate resource and the historical resource to obtain target characteristics of the candidate resources;
respectively predicting target parameters of the candidate resources according to the target features of the candidate resources, wherein a target parameter indicates the probability that the user will perform a target behavior (for example, a click) on the candidate resource;
and determining a search result from the candidate resources according to the target parameters of the candidate resources.
Optionally, the respectively obtaining the similarity between the multiple candidate resources matched with the keyword and the historical resource includes:
fusing the multi-type features of the historical resources to obtain first fused features;
for each candidate resource in the multiple candidate resources, fusing the multi-class characteristics of the candidate resource to obtain a second fusion characteristic, and obtaining the similarity between the candidate resource and the historical resource according to the first fusion characteristic and the second fusion characteristic.
Optionally, the fusing the multiple types of features of the historical resource to obtain the first fused feature includes:
performing word embedding on the identifier of the historical resource to obtain a first embedding characteristic;
performing feature extraction on the content of the historical resource to obtain a first content feature;
and fusing the first embedded feature and the first content feature to obtain the first fused feature.
Optionally, the fusing the multiple types of features of the candidate resource to obtain a second fused feature includes:
performing word embedding on the identification of the candidate resource to obtain a second embedding characteristic;
performing feature extraction on the content of the candidate resource to obtain a second content feature;
and fusing the second embedded characteristic and the second content characteristic to obtain the second fused characteristic.
Optionally, the predicting the target parameters of the candidate resources according to the target features of the candidate resources respectively includes:
and for each candidate resource in the plurality of candidate resources, fusing the target characteristics of the candidate resource, the target characteristics of the keyword and the user identification characteristics to obtain multi-modal characteristics of the candidate resource, and predicting the target parameters of the candidate resource according to the multi-modal characteristics.
Optionally, before fusing the target feature of the candidate resource, the target feature of the keyword, and the user identification feature, the method further includes:
acquiring similarity between the keywords and historical words, wherein the historical words are words input historically when the user searches the historical resources;
and weighting the semantic features of the keywords according to the similarity between the keywords and the historical words to obtain the target features of the keywords.
Optionally, the obtaining of the similarity between the keyword and the historical word includes:
fusing the multi-type features of the keywords to obtain third fused features;
fusing the multi-class characteristics of the historical words to obtain fourth fused characteristics;
and acquiring the similarity between the keyword and the historical word according to the third fusion characteristic and the fourth fusion characteristic.
Optionally, the fusing the multiple types of features of the keyword to obtain a third fused feature includes:
performing word embedding on the identification of the keyword to obtain a third embedding characteristic;
inputting the keywords into a word vector model, processing the keywords through the word vector model, and outputting third content characteristics;
and fusing the third embedded characteristic and the third content characteristic to obtain a third fused characteristic.
Optionally, the fusing the multiple types of features of the historical word to obtain a fourth fused feature includes:
performing word embedding on the identifier of the historical word to obtain a fourth embedding characteristic;
inputting the historical words into a word vector model, processing the historical words through the word vector model, and outputting fourth content characteristics;
and fusing the fourth embedded feature and the fourth content feature to obtain the fourth fused feature.
Optionally, the user identification feature includes a fifth embedded feature, and before the target feature of the candidate resource, the target feature of the keyword, and the user identification feature are fused, the method further includes:
and performing word embedding on the user identification of the user to obtain the fifth embedding characteristic.
Optionally, the predicting target parameters of the candidate resource according to the multi-modal features includes:
inputting the multi-modal characteristics into a prediction model, processing the multi-modal characteristics through the prediction model, and outputting target parameters of the candidate resources, wherein the prediction model is used for predicting the target parameters of the resources according to the multi-modal characteristics of the resources.
According to a second aspect of the embodiments of the present disclosure, there is provided a resource searching apparatus, including:
a receiving unit configured to receive a search request from a terminal, the search request including a keyword;
an obtaining unit configured to respectively obtain similarities between a plurality of candidate resources matched with the keyword and historical resources, where the historical resources are resources historically clicked by the user logged in on the terminal;
the weighting unit is configured to process the resource characteristics of the candidate resources according to the similarity between each candidate resource and the historical resource to obtain target characteristics of the candidate resources;
a prediction unit configured to respectively predict target parameters of the plurality of candidate resources according to the target features of the plurality of candidate resources, a target parameter indicating the probability that the user will perform a target behavior on the candidate resource;
a determining unit configured to perform determining a search result from the plurality of candidate resources according to the target parameters of the plurality of candidate resources.
Optionally, the obtaining unit is configured to perform fusion of multiple types of features of the historical resource to obtain a first fused feature; for each candidate resource in the multiple candidate resources, fusing the multi-class characteristics of the candidate resource to obtain a second fusion characteristic, and obtaining the similarity between the candidate resource and the historical resource according to the first fusion characteristic and the second fusion characteristic.
Optionally, the obtaining unit is configured to perform word embedding on the identifier of the historical resource to obtain a first embedding characteristic; performing feature extraction on the content of the historical resource to obtain a first content feature; and fusing the first embedded feature and the first content feature to obtain the first fused feature.
Optionally, the obtaining unit is configured to perform word embedding on the identifier of the candidate resource to obtain a second embedding feature; performing feature extraction on the content of the candidate resource to obtain a second content feature; and fusing the second embedded characteristic and the second content characteristic to obtain the second fused characteristic.
Optionally, the predicting unit is configured to perform, for each candidate resource in the plurality of candidate resources, fusing the target feature of the candidate resource, the target feature of the keyword, and the user identification feature to obtain a multi-modal feature of the candidate resource, and predicting the target parameter of the candidate resource according to the multi-modal feature.
Optionally, the weighting unit is further configured to perform obtaining a similarity between the keyword and a history word, where the history word is a word that is historically input when the user searches the history resource; and weighting the semantic features of the keywords according to the similarity between the keywords and the historical words to obtain the target features of the keywords.
Optionally, the obtaining unit is further configured to perform fusion of multiple types of features of the keyword to obtain a third fused feature; fusing the multi-class characteristics of the historical words to obtain fourth fused characteristics; and acquiring the similarity between the keyword and the historical word according to the third fusion characteristic and the fourth fusion characteristic.
Optionally, the obtaining unit is configured to perform word embedding on the identifier of the keyword to obtain a third embedding characteristic; inputting the keywords into a word vector model, processing the keywords through the word vector model, and outputting third content characteristics; and fusing the third embedded characteristic and the third content characteristic to obtain a third fused characteristic.
Optionally, the obtaining unit is configured to perform word embedding on the identifier of the history word, so as to obtain a fourth embedding characteristic; inputting the historical words into a word vector model, processing the historical words through the word vector model, and outputting fourth content characteristics; and fusing the fourth embedded feature and the fourth content feature to obtain the fourth fused feature.
Optionally, the user identifier features include a fifth embedding feature, and the obtaining unit is further configured to perform word embedding on the user identifier of the user to obtain the fifth embedding feature.
Optionally, the predicting unit is configured to perform inputting the multi-modal features into a prediction model, processing the multi-modal features through the prediction model, and outputting target parameters of the candidate resource, where the prediction model is used for predicting the target parameters of the resource according to the multi-modal features of the resource.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
one or more processors;
one or more memories for storing the processor-executable instructions;
wherein the one or more processors are configured to execute the instructions to implement the resource search method described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the above-mentioned resource search method.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising one or more instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the above-mentioned resource search method.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
the embodiment provides a resource searching method based on historical clicking behaviors of a user, and the resource characteristics of candidate resources are weighted according to the similarity between the candidate resources and the resources clicked by the user in the history by considering the historical interest of the user, so that the obtained target characteristics not only contain the characteristics of the candidate resources, but also integrate the preference of the user for the resources.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic illustration of an image shown in accordance with an exemplary embodiment;
FIG. 2 is a diagram illustrating an implementation environment of a multimedia resource searching method according to an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of resource searching in accordance with an exemplary embodiment;
FIG. 4 is a flow diagram illustrating a method of resource searching in accordance with an exemplary embodiment;
FIG. 5 is an architecture diagram illustrating an end-to-end model for predicting a target parameter, in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a resource search apparatus in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating a terminal in accordance with an exemplary embodiment;
FIG. 8 is a block diagram illustrating a server in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The user information (e.g., click history) to which the present disclosure relates may be information that is authorized by the user or sufficiently authorized by various parties.
The resource searching method provided by the embodiments of the present application can be applied to scenarios of searching resources based on deep learning technology; this application scenario is briefly introduced below.
In recent years, deep learning has been widely used in fields such as video and image processing, speech recognition, and natural language processing. Owing to its strong fitting capability and end-to-end global optimization capability, deep learning shines in multimedia content understanding scenarios. At present, deep-learning-based image and video classification and detection tasks exceed human performance in some scenarios, and in the field of speech recognition, deep learning algorithms can accurately convert speech into text.
Although deep learning algorithms have made breakthrough progress in the multimedia field, they achieve good results there because the problem domain is relatively fixed and the problem definition is relatively clear; when their results are applied directly to the search field, they fall short.
Take fig. 1 as an example: fig. 1 includes two images, (a) and (b). In an image search scenario, when a user inputs the keyword "cat", an image classification network will mostly identify the content of both (a) and (b) in fig. 1 as "cat"; in other words, both images shown in fig. 1 can serve as candidate results for "cat". However, most people clearly prefer (a) in fig. 1, and (a) is obviously more accurate as a search result than (b).
Meanwhile, when different user groups search with the same keyword in a search system, they may behave differently toward the same content due to factors of the users themselves. Therefore, when estimating <userid, query, photo, click>, the user's historical click data must be considered comprehensively and the user's historical interest modeled, so that click behavior can be estimated accurately.
In view of this, some embodiments of the present application provide a click-through-rate estimation method based on historical viewing data, suitable for search scenarios such as picture retrieval, video search, song search, web search, and other scenarios requiring multimedia resource search.
Hereinafter, a hardware environment of the embodiments of the present disclosure is exemplified.
Fig. 2 is a schematic diagram of an implementation environment of a multimedia resource searching method according to an exemplary embodiment. Referring to fig. 2, the implementation environment may include at least one terminal 101 and a server 102, described in detail below:
the at least one terminal 101 is installed and operated with an application program supporting a search function, where the application program may be at least one of a browser, a social application, a live application, a shopping application, and a payment application, and the category of the application program is not specifically limited in the embodiment of the present disclosure.
The server 102 may include at least one of a server, a plurality of servers, a cloud computing platform, or a virtualization center, and the server 102 is configured to provide a background service for an application supporting a search function. Alternatively, the server 102 may undertake primary computational tasks and at least one terminal 101 undertakes secondary computational tasks; alternatively, the server 102 may undertake secondary computing tasks, with at least one terminal 101 undertaking primary computing tasks; alternatively, at least one terminal 101 and the server 102 perform cooperative computing by using a distributed computing architecture.
The at least one terminal 101 and the server 102 may be connected to each other through a wired network or a wireless network.
In an exemplary scenario, a user may start an application on any one of the at least one terminal 101, the application may display a user interface carrying a search box, the user inputs a keyword to be searched in the search box, when the terminal detects a trigger operation of the user on a search option, the terminal generates a search request carrying the keyword, and sends the search request to the server 102. The server 102 receives a search request from a terminal, generates a search result based on the multimedia resource search method provided by the embodiment of the present disclosure, and sends the search result to the terminal, which will be described in detail in the following embodiments.
The applications installed on the terminals of the at least one terminal 101 may be the same, or may be the same type of application on different operating system platforms. The device types of the terminals may be the same or different, and may include at least one of a smartphone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop, or a desktop computer. The following embodiments are illustrated with the terminal being a smartphone.
Those skilled in the art will appreciate that the number of each terminal may be only one, and may also be several tens or hundreds, or more, and the number and the device type of at least one terminal 101 are not specifically limited in the embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a resource searching method, as shown in fig. 3, for use in an electronic device, according to an example embodiment, including the following steps.
In step S21, the electronic device receives a search request of the terminal, the search request including a keyword.
In step S22, the electronic device respectively obtains similarities between a plurality of candidate resources matched with the keyword and historical resources, where the historical resources are resources historically clicked by the user logged in on the terminal.
In step S23, the electronic device processes the resource features of the multiple candidate resources according to the similarity between each candidate resource and the history resource, so as to obtain the target features of the multiple candidate resources.
In step S24, the electronic device predicts target parameters of the multiple candidate resources respectively according to the target features of the multiple candidate resources, where the target parameters are used to indicate the probability that the candidate resources are triggered by the user to perform the target behavior.
In step S25, the electronic device determines a search result from the plurality of candidate resources according to the target parameters of the plurality of candidate resources.
This embodiment provides a resource searching method based on the user's historical click behavior. Taking the user's historical interest into account, the resource features of the candidate resources are weighted according to the similarity between the candidate resources and the resources the user has historically clicked, so that the obtained target features not only contain the features of the candidate resources but also incorporate the user's preference for resources.
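As a rough illustration of steps S21 to S25, the following Python sketch strings the five steps together; every helper name (match_candidates, get_clicked_resources, similarity, resource_feature, predict_target_parameter) is a hypothetical placeholder, not an identifier from this disclosure.

```python
# Hypothetical end-to-end sketch of steps S21-S25; helper names are
# illustrative placeholders only.
def search(server, request):
    keyword = request["keyword"]                          # S21: receive search request
    candidates = server.match_candidates(keyword)         # S22: resources matching keyword
    history = server.get_clicked_resources(request["userid"])

    scored = []
    for res in candidates:
        sim = server.similarity(res, history)              # S22: similarity to history
        target_feat = sim * server.resource_feature(res)   # S23: weight resource feature
        ctr = server.predict_target_parameter(target_feat) # S24: target parameter
        scored.append((ctr, res))

    scored.sort(key=lambda pair: pair[0], reverse=True)    # S25: rank by target parameter
    return [res for _, res in scored]
```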
Optionally, the obtaining the similarity between the multiple candidate resources matched with the keyword and the historical resource respectively includes:
fusing the multi-type characteristics of the historical resources to obtain first fusion characteristics;
and for each candidate resource in the plurality of candidate resources, fusing the multi-class characteristics of the candidate resource to obtain a second fusion characteristic, and acquiring the similarity between the candidate resource and the historical resource according to the first fusion characteristic and the second fusion characteristic.
Through fusing the multi-class characteristics of the historical resources, the similarity between the candidate resources and the historical resources is determined according to the fused characteristics because the fused characteristics have stronger expression capability, and the accuracy of the similarity is improved.
Optionally, the fusing the multiple types of features of the historical resources to obtain a first fused feature, including:
performing word embedding on the identifier of the historical resource to obtain a first embedding characteristic;
extracting the characteristics of the content of the historical resources to obtain first content characteristics;
and fusing the first embedded characteristic and the first content characteristic to obtain a first fused characteristic.
By fusing the word-embedded characteristics and the content characteristics of the historical resources, the fused characteristics can express the characteristics of the historical resources in the aspect of identification and the characteristics of the historical resources in the aspect of content, so that the expression capability of the fused characteristics is stronger, the similarity between the candidate resources and the historical resources is determined according to the fused characteristics, and the accuracy of the similarity is improved.
Optionally, the fusing the multiple types of features of the candidate resource to obtain a second fused feature, including:
performing word embedding on the identification of the candidate resource to obtain a second embedding characteristic;
performing feature extraction on the content of the candidate resource to obtain a second content feature;
and fusing the second embedded characteristic and the second content characteristic to obtain a second fused characteristic.
By fusing the word-embedded characteristics and the content characteristics of the candidate resources, the fused characteristics can express the characteristics of the candidate resources in the aspect of identification and the characteristics of the candidate resources in the aspect of content, so that the fused characteristics have stronger expression capability on the candidate resources, the similarity between the candidate resources and the historical resources is determined according to the fused characteristics, and the accuracy of the similarity is improved.
Optionally, predicting target parameters of the multiple candidate resources according to target features of the multiple candidate resources respectively includes:
and for each candidate resource in the plurality of candidate resources, fusing the target characteristics of the candidate resource, the target characteristics of the keyword and the user identification characteristics to obtain multi-modal characteristics of the candidate resource, and predicting the target parameters of the candidate resource according to the multi-modal characteristics.
By fusing the features of the resources, the features of the keywords and the user identification features, the fused features can express the features of the candidate resources in multiple modes, so that the expression capability of the fused features on the candidate resources is stronger, the target parameters of the candidate resources are predicted according to the fused features, and the accuracy of the target parameters is improved.
Optionally, before fusing the target feature of the candidate resource, the target feature of the keyword, and the user identification feature, the method further includes:
acquiring similarity between the keywords and historical words, wherein the historical words are historically input words when a user searches historical resources;
and weighting the semantic features of the keywords according to the similarity between the keywords and the historical words to obtain the target features of the keywords.
The semantic features of the keywords are weighted according to the similarity between the keywords and the historical words, so that the weights of the semantic features of the keywords similar to the historical words are enlarged, the multi-modal feature expression capability is improved, and the accuracy of target parameters is improved.
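A minimal sketch of this weighting step, under the assumptions (not fixed by the disclosure) that the semantic feature is a vector and the similarity is a scalar weight:

```python
import numpy as np

# Assumed data forms: semantic_feature is a vector, similarity a scalar.
def keyword_target_feature(semantic_feature, similarity_to_history):
    # Scale the keyword's semantic feature by its similarity to the
    # user's historically entered search words.
    return similarity_to_history * np.asarray(semantic_feature)

target = keyword_target_feature([0.2, 0.7, 0.1], similarity_to_history=0.85)
```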
Optionally, obtaining the similarity between the keyword and the history word includes:
fusing the multi-type features of the keywords to obtain third fused features;
fusing the multi-class characteristics of the historical words to obtain fourth fused characteristics;
and acquiring the similarity between the keywords and the historical words according to the third fusion characteristic and the fourth fusion characteristic.
Through fusing the multi-class characteristics of the keywords and the historical words respectively, the similarity between the keywords and the historical words is determined according to the fused characteristics because the fused characteristics have stronger expression capability, and the accuracy of the similarity is improved.
Optionally, the method includes fusing the multiple types of features of the keyword to obtain a third fused feature, including:
performing word embedding on the identifier of the keyword to obtain a third embedding characteristic;
inputting the keywords into a word vector model, processing the keywords through the word vector model, and outputting third content characteristics;
and fusing the third embedded characteristic and the third content characteristic to obtain a third fused characteristic.
By fusing the word-embedded feature and the content feature of the keyword, the fused feature can express both the identification-level and the content-level characteristics of the keyword, so that the fused feature has stronger expression capability; determining the similarity between the keyword and the historical words from the fused features therefore improves the accuracy of the similarity.
Optionally, the fusing the multi-class features of the historical word to obtain a fourth fused feature, including:
performing word embedding on the identifier of the historical word to obtain a fourth embedding characteristic;
inputting the historical words into a word vector model, processing the historical words through the word vector model, and outputting fourth content characteristics;
and fusing the fourth embedded characteristic and the fourth content characteristic to obtain a fourth fused characteristic.
By fusing the word-embedded feature and the content feature of the historical word, the fused feature can express both the identification-level and the content-level characteristics of the historical word, so that the fused feature has stronger expression capability; determining the similarity between the keyword and the historical words from the fused features therefore improves the accuracy of the similarity.
Optionally, the user identification feature includes a fifth embedded feature, and before the target feature of the candidate resource, the target feature of the keyword, and the user identification feature are fused, the method further includes:
and performing word embedding on the user identification of the user to obtain a fifth embedding characteristic.
The characteristics corresponding to the user identification are extracted by adopting a word embedding mode, so that the accuracy of the characteristics of the user identification is improved.
Optionally, predicting target parameters of the candidate resource according to the multi-modal features comprises:
inputting the multi-modal characteristics into a prediction model, processing the multi-modal characteristics through the prediction model, and outputting target parameters of the candidate resources, wherein the prediction model is used for predicting the target parameters of the resources according to the multi-modal characteristics of the resources.
By using the artificial intelligence technology, the target parameters of the resources are predicted by using the prediction model, and the prediction model can automatically learn the mapping relation between the characteristics and the target parameters through the samples, so that the accuracy of the obtained target parameters is improved.
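The disclosure does not fix a particular network structure for the prediction model; the following is a minimal sketch assuming a small feed-forward network in PyTorch with a sigmoid output for the probability.

```python
import torch
import torch.nn as nn

# Sketch of a prediction model, assuming a two-layer MLP; the disclosure
# only requires mapping multi-modal features to a target parameter
# (a probability), not this particular architecture.
class TargetParameterModel(nn.Module):
    def __init__(self, feature_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # output a probability in [0, 1]
        )

    def forward(self, multimodal_feature):
        return self.net(multimodal_feature)

model = TargetParameterModel(feature_dim=256)  # feature_dim is illustrative
```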
Fig. 4 is a flowchart illustrating a resource searching method according to an exemplary embodiment, where an interaction subject of the method includes a terminal, for example, the terminal 101 in the implementation environment shown in fig. 2, and a server, for example, the server 102 in the implementation environment shown in fig. 2, as shown in fig. 4. The method comprises the following steps.
In step S30, the terminal transmits a search request to the server.
The search request includes a keyword and is used to indicate that resources should be searched according to the keyword. The search request can be triggered in a variety of ways. For example, a user may start an application on a terminal, and the application may display a user interface. The user interface may be a search interface providing a search function, or the home page of the application, and may include a search box and a search option, the search option typically located near the search box. The user may click the search box and input a keyword in it, either directly in text form through an on-screen keyboard or in voice form, which the terminal converts into text. When the terminal detects the user's trigger operation on the search option, the terminal generates a search request carrying the keyword and sends it to the server.
Keywords refer to words entered by a user in a search box and may be referred to as queries. For example, the keyword is "cat", "XX program", or the like. The keywords may include texts, numbers, letters, special symbols, and the like, and the content of the keywords is not specifically limited in the embodiments of the present disclosure.
In step S31, the server receives the search request from the terminal and acquires a plurality of candidate resources matching the keyword.
The server analyzes the search request to obtain keywords carried by the search request, and determines a plurality of candidate resources according to the keywords.
A candidate resource is a resource that matches the keyword. For example, at least one of the name, identity (ID), author, tag, or upload time of the candidate resource is the same as the keyword. Optionally, the candidate resource is a resource stored in advance, and the server accesses a database to obtain the resources stored in it. Alternatively, the candidate resource is a resource recorded in real time, such as a currently live video. Candidate resources may be of various types; for example, a candidate resource is a multimedia resource, including at least one of text, video, audio, an image, or a web page.
Candidate resources can be obtained in various ways. Optionally, if a search scope is specified in the search request, the server may screen the respective databases for candidate resources within that scope. For example, if the search request specifies searching for videos, the server may determine all videos pre-stored in a video database as the plurality of candidate resources. If the search request specifies searching for pictures, the server may determine all pictures pre-stored in a picture database as the candidate resources. If the search request specifies searching for resources before a certain historical time point, the server may determine the resources stored in each database before that time point as the candidate resources.
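A minimal sketch of this screening, assuming the database is a list of record dictionaries; the field names ("type", "timestamp", "name", "id", "author", "tags") are illustrative assumptions:

```python
# Illustrative only: filter a database by the search scope carried in
# the request; field names are assumed, not from the disclosure.
def get_candidates(database, keyword, scope=None, before=None):
    candidates = []
    for res in database:
        if scope is not None and res["type"] != scope:
            continue                 # e.g. scope == "video" keeps only videos
        if before is not None and res["timestamp"] >= before:
            continue                 # only resources stored before the time point
        if keyword in (res["name"], res["id"], res["author"], *res["tags"]):
            candidates.append(res)   # keyword matches name/ID/author/tag
    return candidates
```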
In step S32, the server acquires a history resource that is a resource clicked historically by the user who logs in the terminal.
The user has historical click behavior on the historical resources. For example, in a video search scenario, the historical resources are videos the user has historically clicked, that is, videos the user has historically watched. As another example, in an image search scenario, the historical resources are images the user has historically clicked. As another example, the historical resources are web pages the user has historically clicked, and so on. Optionally, the historical resource and the candidate resource are of the same type; for example, both are images or both are videos.
How to obtain historical resources includes a variety of implementations. In one possible implementation, the server obtains a click history of the user, the server obtains a resource identifier from the click history, and a resource corresponding to the resource identifier is determined as a history resource.
The click history is used to indicate whether the user has clicked on a resource. The click history may take the form of a quadruple comprising a user identifier, a history word, an identifier of a historical resource, and a click flag. The user identifier may be information that uniquely identifies a user, such as at least one of a user name, a user identification code, a user mobile phone number, or a user mailbox. History words, which may also be called history keywords, are words the user historically input when searching for the historical resources, for example keywords the user has entered in a search box during past searches. The identifier of the historical resource is, for example, at least one of its ID, name, title, author, or tag. For example, if the historical resource is an image, its identifier is the ID of the image, written for example as photoid. The click flag marks whether the user clicked the resource corresponding to the resource identifier when searching according to the history word. For example, the click flag is a binary value: a value of 1 means the user clicked the resource when searching according to the history word, and a value of 0 means the user did not.
Taking the resource as an image, a click history record is, for example, <userid, query, photoid, click>, and such a record characterizes whether the user corresponding to userid clicked the image corresponding to photoid. Here userid represents the user identifier, query represents the history word, photoid represents the identifier of the image, and click represents the click flag. For example, the click history record <A, cat, B, 1> indicates that user A searched for images with "cat" as the keyword in the search interface (also called the search system) provided by an application and clicked image B; image B is a historical resource.
The click history can be obtained in a variety of ways. For example, the server parses the search request, obtains the user identifier it carries, obtains the user usage log corresponding to that identifier, and derives the click history from the log. In general, for a User Generated Content (UGC) platform, the click history can be mined from user usage logs.
In addition, the click history corresponding to any user can be analyzed according to the user identification (userid). For example, the click history is analyzed by the following formula (1).
<userid, query, photoid> = { <userid, query, photoid, click> | click = 1 }    Formula (1)
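A small sketch of formula (1), assuming click-history records are stored as dictionaries mirroring the <userid, query, photoid, click> quadruple:

```python
# Sketch of formula (1): keep only the quadruples the given user
# actually clicked (click == 1). Record layout is an assumption.
def clicked_history(records, user_id):
    return [(r["userid"], r["query"], r["photoid"])
            for r in records
            if r["userid"] == user_id and r["click"] == 1]

records = [
    {"userid": "A", "query": "cat", "photoid": "B", "click": 1},
    {"userid": "A", "query": "cat", "photoid": "C", "click": 0},
]
print(clicked_history(records, "A"))  # [('A', 'cat', 'B')]
```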
In step S33, the server fuses the multiple types of features of the historical resource to obtain a first fused feature.
The first fused feature can be regarded as a combination of the multi-class features of the same historical resource, or a cross feature (feature cross) of the same historical resource.
Which features of the historical resource to fuse admits multiple implementations. In some possible embodiments, the features of a historical resource are divided into two classes, one being the ID of the historical resource and the other being its content, and the first fused feature is obtained by fusing the ID-class feature with the content-class feature. How to fuse the ID-class feature and the content-class feature also admits various implementations, exemplified by S331 to S333 below.
S331, the server carries out word embedding on the identification of the historical resource to obtain a first embedding characteristic.
The first embedded feature is a word embedding of the identifier of the historical resource. The data form of the first embedded feature is, for example, a vector or a matrix. Optionally, the first embedded feature includes a word vector corresponding to each word in the identifier of the historical resource. A word vector is a vector that can represent a word and capture its semantics.
Word embedding (Embedding) refers to the process of converting data from text form into a vector form that a computer can process. Word embedding can be implemented in a variety of ways. For example, the server inputs the identifier of the historical resource into a pre-stored word vector model, which embeds the identifier to obtain its word vector; this word vector is the first embedded feature. In other words, the word vector model obtains word vectors through word embedding, converting the identifier of the historical resource from text form into a computable vector form, which improves the processability and expression capability of the identifier in subsequent computations based on the first embedded feature.
Taking the historical resource as an image (photo) as an example, and referring to fig. 5, S331 is, for example: the server performs Embedding on the photo ID to obtain feature_emb_photo. feature_emb_photo represents the first embedded feature of the image and corresponds to Embedding3 in fig. 5.
The word vector model is a model capable of extracting text features. For example, the word vector model may be a Chinese word vector model (e.g., Chinese2Vector) or a word vector model for another language; word vector models for different languages may be adopted according to the language of the resource identifier.
It should be understood that the word vector model is only one optional approach. In some embodiments, the server may not extract the ID-class feature through a word vector model but instead obtain the first embedded feature by one-hot encoding the identifier of the historical resource.
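A minimal sketch of both options, the embedding lookup and the one-hot alternative, assuming PyTorch and an illustrative vocabulary size and embedding dimension:

```python
import torch
import torch.nn as nn

# Vocabulary size and dimension are illustrative assumptions.
id_vocab_size, embed_dim = 100_000, 64
embedding = nn.Embedding(id_vocab_size, embed_dim)

photo_id_index = torch.tensor([42])            # index assigned to this photo ID
feature_emb_photo = embedding(photo_id_index)  # first embedded feature, shape (1, 64)

# One-hot alternative mentioned above:
one_hot = nn.functional.one_hot(photo_id_index, id_vocab_size).float()
```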
S332, the server extracts the characteristics of the content of the historical resources to obtain first content characteristics.
The content of the historical resources may include a variety of circumstances. For example, the historical resource is a video that the user has clicked historically, and the content of the historical resource is any one or more of the video itself, one or more frames of images in the video, or a video summary. The image in the video is, for example, a key frame of the video, a first frame of the video, or a cover of the video. As another example, the history resource includes images that the user has clicked historically, and the content of the history resource is the images themselves or key areas in the images. For another example, the history resource is a web page or text clicked by the user, and the content of the history resource is a character string included in the web page or text.
The first content characteristic refers to a characteristic of the content of the historical resource. For example, the history resource is an image that the user has clicked historically, and the first content feature is a feature map of the image. For another example, the historical resource is a video that the user has historically clicked on, and the first content feature comprises a feature map of an image in the video. As another example, the history resource is text that the user has historically clicked on, and the first content feature is a semantic representation vector of the text.
How to extract the content features of the historical resources includes a variety of implementations. Optionally, the first content feature is extracted using a convolutional neural network. For example, the historical resource comprises images clicked by the user in history, the server inputs the images into a convolutional neural network, and the images are processed through the convolutional neural network to output the first content characteristics. For example, referring to fig. 5, after photo is input into the convolutional neural network, the first content feature is obtained.
Specifically, a convolutional neural network (CNN) includes at least one convolutional layer, and adjacent convolutional layers are connected in series, that is, the output feature map of any convolutional layer serves as the input of the next convolutional layer. Convolution is performed layer by layer, and the output feature map of the last convolutional layer is taken as the first content feature. Extracting features with a convolutional neural network realizes linear and nonlinear mappings of the features through repeated convolutions, so the obtained first content feature has stronger expression capability.
It should be understood that extracting content features with a CNN is only an example. In some embodiments, the server may instead use different models for different types of resources: for example, a CNN for the content features of pictures, an LSTM (Long Short-Term Memory network) for the content features of videos, a VGG (Visual Geometry Group) network for the content features of audio, and so on. In this way, feature extraction is tailored to each resource type, improving the expression capability of the content features.
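A minimal sketch of content-feature extraction, assuming a torchvision ResNet-18 backbone with its classifier head removed; the disclosure only requires stacked convolutional layers, not this particular network:

```python
import torch
import torchvision.models as models

# ResNet-18 as the CNN backbone is an assumption for illustration.
backbone = models.resnet18(weights=None)
extractor = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop classifier

image = torch.randn(1, 3, 224, 224)                   # a preprocessed historical image
feature_content_photo = extractor(image).flatten(1)   # first content feature, shape (1, 512)
```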
S333, the server fuses the first embedded feature and the first content feature to obtain a first fused feature.
Feature fusion can be implemented in a variety of ways. In one possible implementation, the first embedded feature and the first content feature are fused by element-wise multiplication. Specifically, the value at each position of the first embedded feature is multiplied by the value at the corresponding position of the first content feature, so that each position of the first fused feature is the product of one value from the first embedded feature and one value from the first content feature. Taking the resource as an image (photo) as an example, the first fused feature is obtained with the following formula (2).
feature_photo = feature_content_photo ⊙ feature_emb_photo    Formula (2)
where feature_photo represents the first fused feature, feature_content_photo represents the first content feature, feature_emb_photo represents the first embedded feature, and ⊙ denotes element-wise multiplication.
For example, if the values of the first content feature of the image and the first embedded feature of the image are as follows:
feature contentphoto=[1,2,3,4,5]
feature embphoto=[5,4,3,2,1]
the first fusion feature of the image takes the following values:
featurephoto=[5,8,9,8,5]
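The example can be reproduced with a one-line element-wise product; note that this fusion assumes the embedded feature and the content feature share the same dimensionality:

```python
import numpy as np

# Formula (2) on the example above: element-wise product of the
# content feature and the embedded feature (both must be equal length).
feature_content_photo = np.array([1, 2, 3, 4, 5])
feature_emb_photo = np.array([5, 4, 3, 2, 1])

feature_photo = feature_content_photo * feature_emb_photo
print(feature_photo)  # [5 8 9 8 5]
```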
By integrating the ID-class feature and the content-class feature of the historically clicked resource, the first fused feature expresses both the ID and the content of the historical resource, so the expression capability of the fused feature is stronger and subsequent computations based on the first fused feature are more accurate.
Alternatively, if there are multiple historical resources, that is, the user has historically clicked multiple resources, the first fused features of the multiple historical resources may be averaged, and the averaged first fused feature used to calculate the similarity with the candidate resources. For example, the same network structure is used to extract features for each <userid, query, photoid> with historical click behavior, and all features in the list are averaged using formula (3) below.
avg_feature_j = (1/K) * Σ_{i=1..K} feature_{i,j}    Formula (3)
where avg_feature_j is the average, over all K fused features, of the values at any position j, and K represents the length of the list of historically clicked resources.
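A small sketch of formula (3), averaging the K fused features position by position:

```python
import numpy as np

# Formula (3): average the K fused features of the user's historically
# clicked resources position by position.
def average_history_features(fused_features):
    # fused_features: list of K equal-length vectors, one per clicked resource
    return np.mean(np.stack(fused_features), axis=0)

avg_feature = average_history_features([np.array([5, 8, 9, 8, 5]),
                                        np.array([1, 0, 3, 0, 1])])
print(avg_feature)  # [3. 4. 6. 4. 3.]
```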
In step S34, for each candidate resource of the multiple candidate resources, the server fuses the multi-class features of the candidate resource to obtain a second fused feature.
The second fused feature may be regarded as a combination of the multi-class features of the same candidate resource, or a cross feature (feature cross) of the same candidate resource.
Which features of the candidate resource to fuse admits multiple implementations. In some possible embodiments, the features of a candidate resource are divided into two classes, one being the ID of the candidate resource and the other being its content, and the second fused feature is obtained by fusing the ID-class feature with the content-class feature. How to fuse them admits various implementations, exemplified by S341 to S343 below. It should be understood that the feature extraction process for a candidate resource is the same as for a historical resource; for details not shown in step S34, refer to step S33, which are not repeated here.
S341. The server performs word embedding on the identifier of the candidate resource to obtain a second embedded feature.
The identity of the candidate resource is, for example, at least one of an ID of the candidate resource, a name of the candidate resource, a title of the candidate resource, an author of the candidate resource, or a tag of the candidate resource. For example, the candidate resource is an image, the identifier of the candidate resource is an ID of the image, and the ID of the image is, for example, written as photo ID.
The second embedded feature is a word embedding of the identifier of the candidate resource. Its data form is, for example, a vector or a matrix. Optionally, the second embedded feature includes a word vector corresponding to each word in the identifier of the candidate resource.
For example, the server inputs the identifier of the candidate resource into a pre-stored word vector model, and performs embedding processing on the identifier of the candidate resource through the word vector model to obtain a word vector of the identifier of the candidate resource, where the word vector of the identifier of the candidate resource is the second embedding feature. The word vector model used in extracting the second embedded feature and the word vector model used in extracting the first embedded feature may be the same.
S342, the server extracts the features of the content of the candidate resource to obtain a second content feature;
the second content characteristic refers to a characteristic of the content of the candidate resource. For example, in the context of searching for images, the second content feature is a feature map of candidate images that match the keyword. For another example, in a scene of a search video, the second content features include feature maps of multiple frames of images in candidate videos with matched keywords. As another example, in the context of searching text, the second content feature is a semantic representation vector of keyword-matched text.
How to extract content features of candidate resources includes various implementations. Optionally, the second content features are extracted using a convolutional neural network. For example, the candidate resource comprises an image matched with the keyword, the server inputs the candidate image into a convolutional neural network, processes the candidate image through the convolutional neural network, and outputs the second content feature.
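As a minimal sketch, assuming the candidate resource is an RGB image and using resnet18 merely as a stand-in for whatever convolutional neural network the server actually deploys:

    import torch
    from torchvision import models

    backbone = models.resnet18(weights=None)   # stand-in CNN (torchvision >= 0.13 API)
    backbone.fc = torch.nn.Identity()          # drop the classifier head, keep features

    image = torch.randn(1, 3, 224, 224)        # placeholder candidate image
    with torch.no_grad():
        second_content_feature = backbone(image)  # pooled feature, shape (1, 512)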
And S343, the server fuses the second embedded feature and the second content feature to obtain a second fused feature.
The implementation of S343 is the same as that of S333: the second embedding feature and the second content feature may be fused by element-wise multiplication. By integrating the ID-class features of the candidate resource with its content-class features, the second fusion feature expresses both the ID of the candidate resource and the semantics of the candidate resource, so the fusion feature has stronger expressive power, and the accuracy of subsequent calculations based on the second fusion feature is improved.
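A minimal sketch of this fusion, assuming the two features are first projected to a common dimension so that element-wise multiplication is well defined (the 512 and 64 below are assumed dimensions):

    import torch
    import torch.nn as nn

    proj = nn.Linear(512, 64)  # assumed projection of the CNN feature to the ID dimension

    def fuse(embedded: torch.Tensor, content: torch.Tensor) -> torch.Tensor:
        # Element-wise product of the ID-class and content-class features,
        # yielding the second fusion feature, shape (1, 64).
        return embedded * proj(content)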
It should be noted that, in this embodiment, the order of step S32 and step S34 is not limited. In some embodiments, steps S32 and S34 may be performed sequentially. For example, step S32 may be executed first, and then step S34 may be executed; step S34 may be executed first, and then step S32 may be executed. In other embodiments, step S32 and step S34 may be executed in parallel, that is, step S32 and step S34 may be executed simultaneously.
In step S35, the server obtains the similarity between the candidate resource and the historical resource according to the first fusion feature and the second fusion feature.
By executing the above steps, two features are obtained from the historical information and the current information: one is the fusion feature of the historical resource and the other is the fusion feature of the current candidate resource, which may be denoted feature_hist and feature_now, respectively. The similarity between these two features is then calculated to obtain the similarity between the historical resource and the candidate resource. For example, the similarity between the first fusion feature and the second fusion feature is calculated, and the similarity between the candidate resource and the historical resource is obtained from it. The similarity between the first fusion feature and the second fusion feature can be calculated in various ways, for example as the cosine similarity or a distance between them. For example, the similarity is calculated using the following formula (4).
sim = 1 - cosine(feature_hist, feature_now)    Formula (4)

where sim represents the similarity between the candidate resource and the historical resource, and cosine represents the cosine similarity.
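A minimal sketch of formula (4) as written above (whether the result is read as a similarity or a distance follows the convention of the formula itself):

    import torch
    import torch.nn.functional as F

    def sim(feature_hist: torch.Tensor, feature_now: torch.Tensor) -> torch.Tensor:
        # Formula (4): one minus the cosine similarity of the two fusion features.
        return 1.0 - F.cosine_similarity(feature_hist, feature_now, dim=-1)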
Compared with a single class of features of the historical resource, the first fusion feature, which fuses different classes of features of the same historical resource, has stronger expressive power. Likewise, compared with a single class of features of the candidate resource, the second fusion feature, which fuses multiple classes of features of the same candidate resource, has stronger expressive power. Calculating the similarity from the first fusion feature and the second fusion feature therefore yields a similarity that more effectively expresses how similar the candidate resource and the historical resource are, improving the accuracy of the similarity.
It should be understood that how to obtain the similarity between the candidate resource and the historical resource includes various implementation manners, and the above description is only an example, and optionally, the similarity is calculated not according to the fusion feature of the candidate resource and the fusion feature of the historical resource, but in other manners. For example, the similarity between the candidate resource and the historical resource is calculated according to the content feature (such as the second content feature) of the candidate resource and the content feature (such as the first content feature) of the historical resource. As another example, a similarity between the candidate resource and the historical resource is calculated based on the identified embedded feature (e.g., the second embedded feature) of the candidate resource and the identified embedded feature (e.g., the first embedded feature) of the historical resource.
In step S36, the server processes the resource features of the multiple candidate resources according to the similarity between each candidate resource and the history resource, to obtain the target features of the multiple candidate resources.
The resource features include, without limitation, at least one of embedded features of the resource, content features of the resource, or fused features of the resource. For example, the resource features of the candidate resource include, without limitation, at least one of a second embedded feature, a second content feature, or a second fused feature.
There are multiple ways to process the resource features using the similarity between the candidate resource and the historical resource. Optionally, the server weights the resource features of the multiple candidate resources by the similarity between each candidate resource and the historical resource to obtain the target features of the multiple candidate resources; when implemented in this way, the target features may also be referred to as weighted features. The weighting itself admits multiple implementations. For example, the server applies the similarity to every dimension of the resource feature: referring to formula (5) below, the server multiplies the value of each dimension of the resource feature by the similarity to obtain the target feature, so the target feature has the same number of dimensions as the resource feature, and the value of each dimension of the target feature is the product of the value of the corresponding dimension of the resource feature and the similarity. Alternatively, the server applies the similarity to only some dimensions of the resource feature. For example, referring to fig. 5, after the fusion feature of the historical photos and the fusion feature of the current candidate photo are obtained, a similarity weight is computed from the two fusion features, and the fusion feature of the current candidate photo is then weighted by this similarity weight to obtain the target feature.
feature_new = sim ⊗ feature    Formula (5)

where feature_new represents the target feature of the candidate resource, sim represents the similarity between the candidate resource and the historical resource, feature represents the resource feature of the candidate resource, and ⊗ represents element-wise multiplication.
In some embodiments, the server may not perform the processing in a weighted manner. In a possible implementation, the server performs processing in a summation manner, for example, the server adds the similarity between the candidate resource and the historical resource to the value of each dimension in the resource feature of the candidate resource to obtain the target feature of the candidate resource, where the value of each dimension in the target feature is the sum of the value of the dimension of the resource feature and the similarity. In another possible implementation, the server performs processing in a feature splicing manner, for example, the server splices the similarity between the candidate resource and the historical resource and the resource feature of the candidate resource to obtain a target feature of the candidate resource, where the target feature includes the resource feature and the similarity of the candidate resource, in other words, the similarity between the candidate resource and the historical resource is a feature value of a newly added dimension of the candidate resource.
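The three processing strategies described above (weighting per formula (5), summation, and splicing) can be sketched as follows, assuming feature is a one-dimensional resource feature and sim is a scalar similarity:

    import torch

    def weighted(feature: torch.Tensor, sim: float) -> torch.Tensor:
        return sim * feature  # formula (5): every dimension scaled by sim

    def summed(feature: torch.Tensor, sim: float) -> torch.Tensor:
        return feature + sim  # sim added to the value of every dimension

    def spliced(feature: torch.Tensor, sim: float) -> torch.Tensor:
        # sim becomes the feature value of a newly added dimension
        return torch.cat([feature, torch.tensor([sim])])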
It should be understood that the above is only an example of how to obtain the target feature of one candidate resource, the number of candidate resources obtained by the server may be multiple, and the target feature may be calculated separately for each of the multiple candidate resources in a similar manner.
In step S37, the server predicts target parameters of the plurality of candidate resources based on the target features of the plurality of candidate resources, respectively.
The target parameter indicates the probability that the user triggers the target behavior on the candidate resource. The target behavior is any behavior that the terminal can perform on a resource while displaying it. For example, the target behavior includes, without limitation, any one or more of clicking, liking, commenting, following, favoriting, forwarding, or accessing a purchase page associated with the candidate resource.
For example, the target behavior is click behavior, and the target parameter is click rate; as another example, the target behavior is a behavior of accessing a purchase page associated with the candidate resource and the target parameter is a conversion rate.
For each candidate resource among the multiple candidate resources, the server predicts the target parameter of the candidate resource according to the target feature of the candidate resource. When the target parameter is the click-through rate, the predicted target parameter is the estimated click-through rate. How to predict the target parameter includes various implementations, which are exemplified by S371 to S372 below.
S371, for each candidate resource in the plurality of candidate resources, the server fuses the target feature of the candidate resource, the target feature of the keyword and the user identification feature to obtain the multi-modal feature of the candidate resource.
The multi-modal feature combines information from multiple modalities, for example the resource, the keyword, and the user identification. How to fuse the target features and the user identification feature includes a variety of ways. For example, the server splices the target feature of the candidate resource, the target feature of the keyword, and the user identification feature; the spliced result is the multi-modal feature, whose number of dimensions is the sum of the dimensions of the candidate resource's target feature, the keyword's target feature, and the user identification feature. Splicing preserves the target features and the user identification feature as far as possible and avoids losing part of the feature information during fusion.
Of course, the way of stitching is only an example of obtaining multi-modal features, and in some embodiments, the server may also obtain multi-modal features by: and the server performs dimension transformation on the target characteristics of the candidate resources to obtain the target characteristics after the dimension transformation, wherein the target characteristics of the candidate resources after the dimension transformation and the target characteristics of the keywords have the same dimension quantity. And then, the server adds each element in the target feature of the keyword with each element at the corresponding position in the target feature of the candidate resource after dimension conversion to obtain the multi-modal feature, so that the target feature of the keyword and the target feature of the candidate resource can be subjected to more compact feature fusion.
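Both fusion options can be sketched as follows; all dimensions are assumptions chosen only for illustration:

    import torch
    import torch.nn as nn

    photo_feat = torch.randn(1, 64)  # target feature of the candidate resource (assumed dim)
    query_feat = torch.randn(1, 32)  # target feature of the keyword (assumed dim)
    user_feat = torch.randn(1, 16)   # user identification feature (assumed dim)

    # Option 1: splicing; the multi-modal feature has 64 + 32 + 16 dimensions.
    multimodal = torch.cat([photo_feat, query_feat, user_feat], dim=-1)

    # Option 2: transform the candidate's target feature to the keyword's
    # dimension, then add element by element for a more compact fusion.
    to_query_dim = nn.Linear(64, 32)
    multimodal_compact = query_feat + to_query_dim(photo_feat)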
How to obtain the target feature of the keyword includes various ways, for example, the server obtains the semantic feature of the keyword. The server obtains the similarity between the keywords and the historical words, and the server weights the semantic features of the keywords according to the similarity between the keywords and the historical words to obtain the target features of the keywords.
How to obtain the similarity between the keyword and the history word includes various ways, which are exemplified by S3711 to S3713 below.
S3711, the server fuses the multiple types of features of the keyword to obtain a third fused feature.
Optionally, S3711 includes the following steps a to C.
Step A, the server carries out word embedding on the identification of the keyword to obtain a third embedding characteristic;
the third embedding characteristic is word embedding of the identification of the keyword carried by the search request, namely word embedding of the identification of the query currently input by the user. The data form of the third embedded feature is, for example, a vector or a matrix. For example, the third embedded feature includes a word vector corresponding to each word in the identification of the currently input keyword. For example, referring to FIG. 5, after Embedding the query ID, the Embedding2 is obtained.
Step B, the server inputs the keywords into a word vector model, processes the keywords through the word vector model and outputs third content characteristics;
the third content feature refers to a feature of the keyword itself carried by the search request and indicates the semantics of the keyword; for example, the third content feature is a semantic representation vector of the keyword. For example, referring to fig. 5, after the query is input into the Word2Vector model, the output is the third content feature.
And step C, the server fuses the third embedded characteristic and the third content characteristic to obtain a third fused characteristic.
For example, referring to fig. 5, the result of Embedding2 and the output of the Word2Vector model are fused, and the result is the third fusion feature. By integrating the ID-class features of the keyword with its content-class features, the third fusion feature expresses both the ID of the keyword and the semantics of the keyword, so the fusion feature has stronger expressive power, and the accuracy of subsequent calculations based on the third fusion feature is improved.
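A minimal sketch of steps A to C, assuming the query ID has been mapped to an integer index and that semantic_vec is the Word2Vector output for the keyword; the element-wise product mirrors the fusion used in S333 and S343:

    import torch
    import torch.nn as nn

    query_id_embedding = nn.Embedding(50000, 32)  # stands in for Embedding2 in fig. 5

    def third_fusion_feature(query_id: torch.Tensor,
                             semantic_vec: torch.Tensor) -> torch.Tensor:
        embedded = query_id_embedding(query_id)  # step A: third embedding feature
        return embedded * semantic_vec           # step C: fuse with the Word2Vector output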
S3712, the server fuses the multiple types of features of the historical words to obtain a fourth fused feature;
optionally, S3712 includes the following steps a to c.
Step a, the server carries out word embedding on the identification of the historical word to obtain a fourth embedding characteristic;
a fourth embedding feature is word embedding of the identity of the historical word. The data form of the fourth embedded feature is, for example, a vector or a matrix. For example, the fourth embedded feature includes a word vector corresponding to each word in the identification of the historical words.
For example, the identifier of the historical word is denoted as queryid, where the historical word is, for example, the keyword that was used when the historical resource was searched out.
B, the server inputs the historical words into a word vector model, processes the historical words through the word vector model and outputs fourth content characteristics;
the fourth content feature refers to a feature of the history word itself, and is used for indicating the semantic meaning of the history word, for example, the fourth content feature is a semantic representation vector of the history word.
And c, the server fuses the fourth embedded characteristic and the fourth content characteristic to obtain a fourth fused characteristic.
By integrating the ID class characteristics of the historical words with the content class characteristics of the historical words, the fusion characteristics of the historical words not only express the IDs of the historical words, but also express the semantics of the historical words, so that the characteristic expression capability of the fusion characteristics is stronger, and the accuracy of a calculation result is improved when subsequent calculation is performed according to the fusion characteristics of the historical words.
S3713, the server obtains the similarity between the keyword and the history word according to the third fusion feature and the fourth fusion feature.
For example, the server calculates a similarity (e.g., cosine similarity) between the third fusion feature and the fourth fusion feature, and obtains a similarity between the keyword and the history word according to the similarity between the third fusion feature and the fourth fusion feature, and this implementation is similar to the above formula (4), which is not described herein again.
The user identification feature refers to a feature derived from the user identification of the user. For example, the user identification feature is the word embedding of the user identification. For example, the user identification feature includes a fifth embedding feature, and the server performs word embedding on the user identification of the user to obtain the fifth embedding feature.
Of course, how to obtain multi-modal features includes various implementations, and S371 is merely an illustration, and alternatively, multi-modal features of candidate resources may be obtained by other means. For example, S371 is replaced with: for each candidate resource in the plurality of candidate resources, the server fuses the target feature of the candidate resource, the semantic feature of the keyword and the user identification feature to obtain the multi-modal feature of the candidate resource. As another example, S371 is replaced with: for each candidate resource in the plurality of candidate resources, the server fuses the target characteristics of the candidate resource and the semantic characteristics of the keywords to obtain the multi-modal characteristics of the candidate resource. As another example, S371 is replaced with: for each candidate resource in the plurality of candidate resources, the server fuses the target feature and the user identification feature of the candidate resource to obtain the multi-modal feature of the candidate resource.
And S372, the server predicts the target parameters of the candidate resources according to the multi-modal characteristics of the candidate resources.
How to predict the target parameter includes a variety of implementations. Optionally, the server inputs the multi-modal features into the prediction model, processes the multi-modal features through the prediction model, and outputs the target parameters of the candidate resources.
The prediction model is used to predict a target parameter of a resource (for example, a click-through rate, CTR) based on the multi-modal feature of the resource. Specifically, the prediction model predicts the probability that the user clicks a candidate resource given the searched keyword, and the prediction model may be pre-stored locally by the server. The prediction model may comprise an input layer, hidden layers, and an output layer; each layer may comprise multiple neurons, and each neuron may apply a linear mapping and a nonlinear mapping to the input multi-modal feature to obtain the target parameter. A neuron may comprise at least one of a convolution kernel, a Gaussian kernel, a kernel structure, a gate structure, or a memory cell. Optionally, the prediction model is a Deep Neural Network (DNN) comprising at least one hidden layer and a normalization layer, where adjacent hidden layers are connected in series, that is, the output of any hidden layer serves as the input of the next hidden layer. For any multi-modal feature, the server may input the multi-modal feature into the at least one hidden layer of the DNN, process it through the hidden layers, input the output of the last hidden layer into the normalization layer, and apply exponential normalization (softmax) to obtain the target parameter of the candidate resource corresponding to that multi-modal feature, repeating these steps until the target parameter of each candidate resource is obtained. In some embodiments, besides a DNN, the prediction model may be a Wide & Deep network, a GBDT (Gradient Boosting Decision Tree), an XGBoost (eXtreme Gradient Boosting) model, and the like; the embodiment of the present application does not specifically limit the type of the prediction model. The prediction model is trained in advance on the multi-modal features of sample resources labeled with target parameters, from which it learns the correspondence between multi-modal features and target parameters.
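A minimal sketch of such a prediction model as a small DNN; the input dimension is an assumption, and a sigmoid stands in for the normalization layer in the single-probability case:

    import torch
    import torch.nn as nn

    class PredictionModel(nn.Module):
        def __init__(self, in_dim: int = 112):  # assumed multi-modal feature dimension
            super().__init__()
            self.hidden = nn.Sequential(
                nn.Linear(in_dim, 128), nn.ReLU(),  # hidden layers connected in series
                nn.Linear(128, 64), nn.ReLU(),
            )
            self.out = nn.Linear(64, 1)

        def forward(self, multimodal: torch.Tensor) -> torch.Tensor:
            # Output the target parameter, e.g. an estimated click-through rate.
            return torch.sigmoid(self.out(self.hidden(multimodal)))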
In an exemplary embodiment, referring to fig. 5, fig. 5 shows a logic architecture diagram for estimating the target parameter. Fig. 5 shows an end-to-end model, which may be called a target parameter embedding (Embedding) model; the model is a neural network trained with clicks as the supervision signal and can be applied in image-search or video-search scenarios. For example, a user triggers a search request carrying a keyword (Query) and a user identifier (Userid), and the server finds the historical Photos clicked by the user and their IDs according to the Userid. The server inputs each historical Photo into the convolutional neural network, embeds the ID of the historical Photo, and fuses the output of the convolutional neural network with the embedding result (Embedding3) to obtain fusion feature 1 (i.e., the first fusion feature). The server also searches for Photos matching the Query, inputs each Photo into the convolutional neural network, embeds the ID of the Photo, and fuses the output of the convolutional neural network with the embedding result to obtain fusion feature 2 (i.e., the second fusion feature). The server then calculates the similarity between fusion feature 1 and fusion feature 2 and weights fusion feature 2 by this similarity to obtain the target feature of the Photo. Next, the server inputs the Query carried by the search request into the Word2Vec model, processes the Query through the Word2Vec model, embeds the ID of the Query, and fuses the output of the Word2Vec model with the embedding result (Embedding2) to obtain fusion feature 3 (i.e., the third fusion feature). The server likewise inputs the Query corresponding to the historical Photo into the Word2Vec model, processes it through the Word2Vec model, embeds the ID of that Query, and fuses the output of the Word2Vec model with the embedding result (Embedding2) to obtain fusion feature 4 (i.e., the fourth fusion feature). The server then calculates the similarity between fusion feature 3 and fusion feature 4 and weights fusion feature 3 by this similarity to obtain the target feature of the Query. In addition, the server embeds the Userid to obtain Embedding1 (i.e., the fifth embedding feature). Finally, the server fuses Embedding1, the target feature of the Query, and the target feature of the Photo into the multi-modal feature, inputs the multi-modal feature into the deep neural network, processes it through the deep neural network, and outputs the target parameter (Click).
In step S38, the server determines a search result from the plurality of candidate resources according to the target parameters of the plurality of candidate resources.
The server selects, according to the target parameters of the multiple candidate resources, the target candidate resources whose target parameters satisfy a condition, and takes those target candidate resources as the search result.
The condition on the target parameter covers various cases. For example, the condition may be that the candidate resource ranks within a preset position by target parameter, or that the candidate resource ranks within a preset proportion; the content of the condition is not specifically limited in the embodiment of the present disclosure.
In some embodiments, the server may rank the plurality of candidate resources in order of decreasing target parameters, and encapsulate the candidate resources ranked at the top preset position as the search result, so that the target parameters of the candidate resources in the search result are as high as possible. Optionally, since some candidate resources (e.g., videos) usually occupy a large space, titles, thumbnails, and jump links of the candidate resources ordered in the top preset position may be packaged as search results, so that the consumed time for resource transmission can be saved.
In some embodiments, the server may rank the plurality of candidate resources in descending order of target parameter, determine the candidate resources ranked within a preset proportion, and encapsulate at least one of those candidate resources as the search result. In this process, the server may select at least one candidate resource at random from the candidate resources ranked within the preset proportion and encapsulate it as the search result, which avoids over-fitting of the search results and increases their generalization and randomness. In some embodiments, since some candidate resources (e.g., video resources) usually occupy a large space, the title, thumbnail, and jump link of the at least one candidate resource can be encapsulated as the search result, saving resource-transmission time.
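Both selection conditions can be sketched as follows; top_k mirrors the preset position, top_ratio the preset proportion, and the random pick provides the randomness described above (exactly one of the two parameters is expected):

    import random
    from typing import List, Optional

    def select_results(candidates: List[str], params: List[float],
                       top_k: Optional[int] = None,
                       top_ratio: Optional[float] = None) -> List[str]:
        # Rank candidates in descending order of their target parameters.
        ranked = [c for _, c in sorted(zip(params, candidates),
                                       key=lambda pair: pair[0], reverse=True)]
        if top_k is not None:
            return ranked[:top_k]                   # candidates within the preset position
        cut = max(1, int(len(ranked) * top_ratio))  # candidates within the preset proportion
        return random.sample(ranked[:cut], k=1)     # at least one, chosen at random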
In step S39, the server transmits the search result to the terminal.
When the terminal receives the search result, the resources in the search result can be displayed in a user interface provided by the application program, for example, the display area of the search result can be positioned below the search box.
In some embodiments, if the search result carries the title, thumbnail, and jump link of each resource, the terminal may display them in the user interface. When a click on the jump link of any resource is detected, the terminal sends a resource request to the server to request access to that resource; the server responds by sending the resource to the terminal, and on receiving it the terminal can display the resource in the application program.
The embodiment provides a resource searching method based on historical clicking behaviors of a user, and the resource characteristics of candidate resources are weighted according to the similarity between the candidate resources and the resources clicked by the user in the history by considering the historical interest of the user, so that the obtained target characteristics not only contain the characteristics of the candidate resources, but also integrate the preference of the user for the resources.
Fig. 6 is a block diagram illustrating a resource search apparatus according to an example embodiment. Referring to fig. 6, the apparatus includes a receiving unit 401, an acquiring unit 402, a weighting unit 403, a prediction unit 404, and a determination unit 405.
A receiving unit 401 configured to perform a search request of a receiving terminal, the search request including a keyword;
an obtaining unit 402 configured to perform obtaining similarity between a plurality of candidate resources matched with the keyword and a historical resource respectively, wherein the historical resource is a resource clicked by a user logged in the terminal in a history;
a weighting unit 403, configured to perform processing on resource features of multiple candidate resources according to a similarity between each candidate resource and a history resource, so as to obtain target features of the multiple candidate resources;
a prediction unit 404 configured to perform prediction of target parameters of the plurality of candidate resources according to target features of the plurality of candidate resources, respectively, the target parameters being used for indicating a probability that the candidate resources are triggered by the user to perform a target action;
a determining unit 405 configured to perform determining a search result from the plurality of candidate resources according to the target parameter of the plurality of candidate resources.
The embodiment provides a device for searching resources based on historical click behaviors of a user, and the device weights the resource characteristics of candidate resources according to the similarity between the candidate resources and the resources clicked by the user in the history by considering the historical interest of the user, so that the obtained target characteristics not only contain the characteristics of the candidate resources, but also integrate the preference of the user on the resources.
Optionally, the obtaining unit 402 is configured to perform fusion of multiple types of features of the historical resource to obtain a first fused feature; and for each candidate resource in the plurality of candidate resources, fusing the multi-class characteristics of the candidate resource to obtain a second fusion characteristic, and acquiring the similarity between the candidate resource and the historical resource according to the first fusion characteristic and the second fusion characteristic.
Optionally, the obtaining unit 402 is configured to perform word embedding on the identifier of the history resource, so as to obtain a first embedding characteristic; extracting the characteristics of the content of the historical resources to obtain first content characteristics; and fusing the first embedded characteristic and the first content characteristic to obtain a first fused characteristic.
Optionally, the obtaining unit 402 is configured to perform word embedding on the identifier of the candidate resource, so as to obtain a second embedding characteristic; performing feature extraction on the content of the candidate resource to obtain a second content feature; and fusing the second embedded characteristic and the second content characteristic to obtain a second fused characteristic.
Optionally, the predicting unit 404 is configured to perform, for each candidate resource in the multiple candidate resources, fusing the target feature of the candidate resource, the target feature of the keyword, and the user identification feature to obtain a multi-modal feature of the candidate resource, and predicting the target parameter of the candidate resource according to the multi-modal feature.
Optionally, the weighting unit 403 is further configured to perform obtaining a similarity between the keyword and a history word, where the history word is a word that is historically input when the user searches a history resource; and weighting the semantic features of the keywords according to the similarity between the keywords and the historical words to obtain the target features of the keywords.
Optionally, the obtaining unit 402 is further configured to perform fusion of multiple types of features of the keyword to obtain a third fused feature; fusing the multi-class characteristics of the historical words to obtain fourth fused characteristics; and acquiring the similarity between the keywords and the historical words according to the third fusion characteristic and the fourth fusion characteristic.
Optionally, the obtaining unit 402 is configured to perform word embedding on the identifier of the keyword, so as to obtain a third embedding characteristic; inputting the keywords into a word vector model, processing the keywords through the word vector model, and outputting third content characteristics; and fusing the third embedded characteristic and the third content characteristic to obtain a third fused characteristic.
Optionally, the obtaining unit 402 is configured to perform word embedding on the identifier of the history word, so as to obtain a fourth embedding characteristic; inputting the historical words into a word vector model, processing the historical words through the word vector model, and outputting fourth content characteristics; and fusing the fourth embedded characteristic and the fourth content characteristic to obtain a fourth fused characteristic.
Optionally, the user identification feature includes a fifth embedding feature, and the obtaining unit 402 is further configured to perform word embedding on the user identification of the user to obtain the fifth embedding feature.
Optionally, the predicting unit 404 is configured to perform inputting the multi-modal features into a prediction model, processing the multi-modal features through the prediction model, and outputting target parameters of the candidate resource, wherein the prediction model is used for predicting the target parameters of the resource according to the multi-modal features of the resource.
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
The electronic device in the above method embodiment may be implemented as a terminal or a server, for example, fig. 7 shows a block diagram of a terminal 500 provided in an exemplary embodiment of the present disclosure. The terminal 500 may be: a smart phone, a tablet computer, an MP3(Moving Picture Experts Group Audio Layer III, motion video Experts compression standard Audio Layer 3) player, an MP4(Moving Picture Experts Group Audio Layer IV, motion video Experts compression standard Audio Layer 4) player, a notebook computer or a desktop computer. Terminal 500 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.
In general, the terminal 500 includes: one or more processors 501 and one or more memories 502.
The processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 501 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the resource search method provided by method embodiments in the present disclosure.
In some embodiments, the terminal 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502 and peripheral interface 503 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, display screen 505, camera assembly 506, audio circuitry 507, positioning assembly 508, and power supply 509.
The peripheral interface 503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 504 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 504 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 504 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 504 may further include NFC (Near Field Communication) related circuits, which are not limited by this disclosure.
The display screen 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, the display screen 505 also has the ability to capture touch signals on or over the surface of the display screen 505. The touch signal may be input to the processor 501 as a control signal for processing. At this point, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display screen 505 may be one, providing the front panel of the terminal 500; in other embodiments, the display screens 505 may be at least two, respectively disposed on different surfaces of the terminal 500 or in a folded design; in other embodiments, the display 505 may be a flexible display disposed on a curved surface or a folded surface of the terminal 500. Even more, the display screen 505 can be arranged in a non-rectangular irregular figure, i.e. a shaped screen. The Display screen 505 may be made of LCD (liquid crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 506 is used to capture images or video. Optionally, camera assembly 506 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 506 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuitry 507 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 501 for processing, or inputting the electric signals to the radio frequency circuit 504 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 500. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 507 may also include a headphone jack.
The positioning component 508 is used to locate the current geographic position of the terminal 500 for navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 509 is used to power the various components in terminal 500. The power source 509 may be alternating current, direct current, disposable or rechargeable. When power supply 509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 500 also includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: acceleration sensor 511, gyro sensor 512, pressure sensor 513, fingerprint sensor 514, optical sensor 515, and proximity sensor 516.
The acceleration sensor 511 may detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 501 may control the display screen 505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 511. The acceleration sensor 511 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 512 may detect a body direction and a rotation angle of the terminal 500, and the gyro sensor 512 may cooperate with the acceleration sensor 511 to acquire a 3D motion of the user on the terminal 500. The processor 501 may implement the following functions according to the data collected by the gyro sensor 512: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 513 may be disposed on a side frame of the terminal 500 and/or underneath the display screen 505. When the pressure sensor 513 is disposed on the side frame of the terminal 500, a user's holding signal of the terminal 500 may be detected, and the processor 501 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed at the lower layer of the display screen 505, the processor 501 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 505. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 514 is used for collecting a fingerprint of the user, and the processor 501 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 501 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 514 may be provided on the front, back, or side of the terminal 500. When a physical button or a vendor Logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or the vendor Logo.
The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the display screen 505 based on the ambient light intensity collected by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the display screen 505 is increased; when the ambient light intensity is low, the display brightness of the display screen 505 is reduced. In another embodiment, processor 501 may also dynamically adjust the shooting parameters of camera head assembly 506 based on the ambient light intensity collected by optical sensor 515.
A proximity sensor 516, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 500. The proximity sensor 516 is used to collect the distance between the user and the front surface of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front surface of the terminal 500 gradually decreases, the processor 501 controls the display screen 505 to switch from the bright screen state to the dark screen state; when the proximity sensor 516 detects that the distance between the user and the front surface of the terminal 500 becomes gradually larger, the display screen 505 is controlled by the processor 501 to switch from the breath screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 7 is not intended to be limiting of terminal 500 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
The electronic device in the foregoing method embodiment may be implemented as a server, for example, fig. 8 is a schematic structural diagram of a server provided in the present disclosure, and the server 600 may generate a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 601 and one or more memories 602, where the memories 602 store at least one instruction, and the at least one instruction is loaded and executed by the processors 601 to implement the resource search method provided by each of the foregoing method embodiments. Of course, the server may also have a wired or wireless network interface, an input/output interface, and other components to facilitate input and output, and the server may also include other components for implementing the functions of the device, which are not described herein again.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of an electronic device to perform the above-described resource search method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for resource search, comprising:
receiving a search request of a terminal, wherein the search request comprises a keyword;
respectively obtaining similarity between a plurality of candidate resources matched with the keywords and historical resources, wherein the historical resources are resources which are clicked by a user logged in by the terminal in a historical manner;
processing the resource characteristics of the candidate resources according to the similarity between each candidate resource and the historical resource to obtain target characteristics of the candidate resources;
respectively predicting target parameters of the candidate resources according to target characteristics of the candidate resources, wherein the target parameters are used for indicating the probability that the candidate resources are triggered by the user to perform target behaviors;
and determining a search result from the candidate resources according to the target parameters of the candidate resources.
2. The resource searching method according to claim 1, wherein the obtaining the similarity between the plurality of candidate resources matched with the keyword and the historical resource respectively comprises:
fusing the multi-type features of the historical resources to obtain first fused features;
for each candidate resource in the multiple candidate resources, fusing the multi-class characteristics of the candidate resource to obtain a second fusion characteristic, and obtaining the similarity between the candidate resource and the historical resource according to the first fusion characteristic and the second fusion characteristic.
3. The resource search method according to claim 2, wherein the fusing the multiple types of features of the historical resource to obtain a first fused feature comprises:
performing word embedding on the identifier of the historical resource to obtain a first embedding characteristic;
performing feature extraction on the content of the historical resource to obtain a first content feature;
and fusing the first embedded feature and the first content feature to obtain the first fused feature.
4. The method according to claim 2, wherein the fusing the multi-class features of the candidate resource to obtain a second fused feature comprises:
performing word embedding on the identification of the candidate resource to obtain a second embedding characteristic;
performing feature extraction on the content of the candidate resource to obtain a second content feature;
and fusing the second embedded characteristic and the second content characteristic to obtain the second fused characteristic.
5. The resource searching method according to claim 1, wherein the predicting the target parameters of the candidate resources according to the target features of the candidate resources comprises:
and for each candidate resource in the plurality of candidate resources, fusing the target characteristics of the candidate resource, the target characteristics of the keyword and the user identification characteristics to obtain multi-modal characteristics of the candidate resource, and predicting the target parameters of the candidate resource according to the multi-modal characteristics.
6. The resource search method of claim 5, wherein before fusing the target features of the candidate resources, the target features of the keywords, and the user identification features, the method further comprises:
acquiring similarity between the keywords and historical words, wherein the historical words are words input historically when the user searches the historical resources;
and weighting the semantic features of the keywords according to the similarity between the keywords and the historical words to obtain the target features of the keywords.
7. The resource searching method according to claim 6, wherein the obtaining of the similarity between the keyword and the history word comprises:
fusing the multi-type features of the keywords to obtain third fused features;
fusing the multi-class characteristics of the historical words to obtain fourth fused characteristics;
and acquiring the similarity between the keyword and the historical word according to the third fusion characteristic and the fourth fusion characteristic.
8. A resource search apparatus, comprising:
a receiving unit configured to perform a search request of a receiving terminal, the search request including a keyword;
an obtaining unit configured to perform obtaining similarity between a plurality of candidate resources matched with the keyword and a historical resource respectively, where the historical resource is a resource that has been clicked historically by a user logged in the terminal;
the weighting unit is configured to process the resource characteristics of the candidate resources according to the similarity between each candidate resource and the historical resource to obtain target characteristics of the candidate resources;
a prediction unit configured to perform prediction of target parameters of the plurality of candidate resources according to target features of the plurality of candidate resources, respectively, the target parameters indicating probabilities of the candidate resources being triggered by the user for target behaviors;
a determining unit configured to perform determining a search result from the plurality of candidate resources according to the target parameters of the plurality of candidate resources.
9. An electronic device, comprising:
one or more processors;
one or more memories for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to execute the instructions to implement the resource search method of any one of claims 1 to 7.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the resource search method of any one of claims 1 to 7.
CN202010448846.7A 2020-05-25 2020-05-25 Resource searching method, device, equipment and storage medium Pending CN111611490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010448846.7A CN111611490A (en) 2020-05-25 2020-05-25 Resource searching method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111611490A 2020-09-01

Family

ID=72196395

Country Status (1)

Country Link
CN (1) CN111611490A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016107354A1 (en) * 2014-12-29 2016-07-07 北京奇虎科技有限公司 Method and apparatus for providing user personalised resource message pushing
CN110019941A (en) * 2017-08-22 2019-07-16 飞狐信息技术(天津)有限公司 Video recommendation method and system
CN109033140A (en) * 2018-06-08 2018-12-18 北京百度网讯科技有限公司 A kind of method, apparatus, equipment and the computer storage medium of determining search result
CN110674386A (en) * 2018-06-14 2020-01-10 北京百度网讯科技有限公司 Resource recommendation method, device and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214505A (en) * 2020-10-21 2021-01-12 北京金堤征信服务有限公司 Data synchronization method and device, computer readable storage medium and electronic equipment
CN112948449A (en) * 2021-02-23 2021-06-11 北京三快在线科技有限公司 Information recommendation method and device
CN113032673A (en) * 2021-03-24 2021-06-25 北京百度网讯科技有限公司 Resource acquisition method and device, computer equipment and storage medium
CN113032673B (en) * 2021-03-24 2024-04-19 北京百度网讯科技有限公司 Resource acquisition method and device, computer equipment and storage medium
CN113377971A (en) * 2021-05-31 2021-09-10 北京达佳互联信息技术有限公司 Multimedia resource generation method and device, electronic equipment and storage medium
CN113377971B (en) * 2021-05-31 2024-02-27 北京达佳互联信息技术有限公司 Multimedia resource generation method and device, electronic equipment and storage medium
CN113312523A (en) * 2021-07-30 2021-08-27 北京达佳互联信息技术有限公司 Dictionary generation and search keyword recommendation method and device and server
CN113377976A (en) * 2021-08-16 2021-09-10 北京达佳互联信息技术有限公司 Resource searching method and device, computer equipment and storage medium
CN114154026A (en) * 2021-11-12 2022-03-08 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Title
CN108304441B (en) Network resource recommendation method and device, electronic equipment, server and storage medium
CN110149541B (en) Video recommendation method and device, computer equipment and storage medium
CN111652678B (en) Method, device, terminal, server and readable storage medium for displaying article information
CN109740068B (en) Media data recommendation method, device and storage medium
CN111476306B (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN111866607B (en) Video clip positioning method and device, computer equipment and storage medium
CN111611490A (en) Resource searching method, device, equipment and storage medium
CN109189879B (en) Electronic book display method and device
CN109918669B (en) Entity determining method, device and storage medium
CN111897996B (en) Topic label recommendation method, device, equipment and storage medium
CN111737573A (en) Resource recommendation method, device, equipment and storage medium
CN109800325A (en) Video recommendation method, device and computer readable storage medium
CN112069414A (en) Recommendation model training method and device, computer equipment and storage medium
CN112163428A (en) Semantic tag acquisition method and device, node equipment and storage medium
CN111506758A (en) Method and device for determining article name, computer equipment and storage medium
CN113205183B (en) Article recommendation network training method and device, electronic equipment and storage medium
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN113505256B (en) Feature extraction network training method, image processing method and device
CN111291200A (en) Multimedia resource display method and device, computer equipment and storage medium
CN111428522B (en) Translation corpus generation method, device, computer equipment and storage medium
CN114117206B (en) Recommendation model processing method and device, electronic equipment and storage medium
CN114281936A (en) Classification method and device, computer equipment and storage medium
CN114282587A (en) Data processing method and device, computer equipment and storage medium
CN109829067B (en) Audio data processing method and device, electronic equipment and storage medium
CN113377976B (en) Resource searching method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination