CN113378064A - Method for determining content similarity and content recommendation method based on similarity


Info

Publication number
CN113378064A
CN113378064A (application number CN202110779922.7A)
Authority
CN
China
Prior art keywords
content
vector
similarity
feature group
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110779922.7A
Other languages
Chinese (zh)
Inventor
黄彦华
王维堃
张雷
徐瑞文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaohongshu Technology Co ltd
Original Assignee
Xiaohongshu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaohongshu Technology Co ltd filed Critical Xiaohongshu Technology Co ltd
Priority to CN202110779922.7A priority Critical patent/CN113378064A/en
Publication of CN113378064A publication Critical patent/CN113378064A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files

Abstract

The application relates to the field of computer technologies and discloses a method for determining content similarity and a recommendation method based on the content similarity. The method for determining content similarity includes the following steps: determining first content and second content; extracting a first standard feature group from the first content and a second standard feature group from the second content; determining a tag associated with the first content and the second content; performing multilayer perceptron (MLP) processing on the first standard feature group and the second standard feature group to obtain a first vector and a second vector; and calculating the similarity of the first content and the second content according to the first vector, the second vector and the tag. The method solves the long-tail problem, establishing a similarity relation between long-tail notes and popular notes and helping the embedding model generalize so that similarity can be inferred even for long-tail notes.

Description

Method for determining content similarity and content recommendation method based on similarity
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for determining content similarity, and a method, an apparatus, a device, and a medium for recommending content based on similarity.
Background
With the advent of the big data age, content recommendation has become the best solution to the problem of screening massive internet information.
Content recommendation in the conventional art generally relies on analyzing user interactions with, or exposures of, content. In both the recall and ranking stages, traditional recommendation algorithms tend to recommend popular content items or items matching the user's known preferences. Such recommendation gives little consideration to the user's actual needs, so users interested in long-tail content items can hardly find content of interest, and the recommendation effect is poor.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a medium for determining content similarity.
In a first aspect, an embodiment of the present application provides a method for determining content similarity, where the method is used for an electronic device, and the method includes:
determining first content and second content;
extracting a first standard feature group from the first content and a second standard feature group from the second content, wherein the first/second standard feature group is a vector group for describing at least part of image information and/or text information included in the first/second content;
determining a tag associated with the first content and the second content, wherein the tag is an interaction association degree of the first content and the second content determined based on an interaction of users of the first content and the second content;
performing MLP processing on the first standard feature group and the second standard feature group to obtain a first vector and a second vector; and
calculating the similarity of the first content and the second content according to the first vector, the second vector and the tag.
In a possible implementation of the first aspect, when the number of users who interact with both the first content and the second content is greater than or equal to a preset number, the tag value is 1; otherwise, the tag value is 0.
In a possible implementation of the first aspect, the MLP processing includes: subjecting the first standard feature group and the second standard feature group to nonlinear transformation and compression through the MLP, respectively, to obtain the first vector and the second vector.
In a possible implementation of the first aspect, calculating the similarity between the first content and the second content includes: applying the value of the tag to a calculation formula for the similarity probability between the first vector and the second vector to obtain the similarity of the first content and the second content.
In a possible implementation of the first aspect, the similarity probability between the first vector and the second vector is given by: p(label | v_A, v_B) = [σ(8·function(v_A, v_B))]^label · [1 − σ(8·function(v_A, v_B))]^(1 − label), where p represents the similarity probability between the first vector and the second vector, label represents the tag, v_A represents the first vector, v_B represents the second vector, σ represents the sigmoid function, and function(v_A, v_B) represents a function of the similarity between the first vector and the second vector.
In one possible implementation of the first aspect, the function may be one of a cosine value between the first vector and the second vector, an inner product between the first vector and the second vector, and a norm of a difference value of the first vector and the second vector.
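For illustration only (not part of the claimed embodiments), a minimal Python sketch of this probability computation is given below; the factor 8 is taken verbatim from the formula above, the three function choices mirror the options just listed, and NumPy is an assumed dependency.

```python
# A minimal sketch of p(label | v_A, v_B); the scaling factor 8 follows the formula above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def similarity_function(v_a, v_b, kind="cosine"):
    # One of the three options named in the text.
    if kind == "cosine":
        return float(np.dot(v_a, v_b) / (np.linalg.norm(v_a) * np.linalg.norm(v_b)))
    if kind == "inner_product":
        return float(np.dot(v_a, v_b))
    if kind == "difference_norm":
        return float(np.linalg.norm(v_a - v_b))
    raise ValueError(f"unknown similarity function: {kind}")

def similarity_probability(label, v_a, v_b, kind="cosine"):
    s = sigmoid(8 * similarity_function(v_a, v_b, kind))
    return (s ** label) * ((1 - s) ** (1 - label))
```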
In a possible implementation of the first aspect, the interaction means that the user performs a preset interaction behavior for both the first content and the second content, where the preset interaction behavior includes at least one of the following behaviors: clicking, browsing, liking, favoriting, commenting, sending a bullet-screen comment, sharing, following the author, browsing for a duration exceeding a certain threshold, and entering the author's personal page.
In one possible implementation of the first aspect, extracting a first standard feature group from the first content and extracting a second standard feature group from the second content includes: extracting a first initial feature group and a second initial feature group from the first content and the second content, respectively, wherein the first initial feature group comprises a first initial picture of the first content and first initial text of the first content, and the second initial feature group comprises a second initial picture of the second content and second initial text of the second content; and performing feature extraction preprocessing on the first initial feature group and the second initial feature group to obtain the first standard feature group and the second standard feature group.
In a possible implementation of the first aspect, the preprocessing includes: inputting the first initial picture and the first initial text into an Inception-V3 model and a BERT model, respectively, for processing to obtain the first standard feature group; and inputting the second initial picture and the second initial text into the Inception-V3 model and the BERT model, respectively, for processing to obtain the second standard feature group.
In a possible implementation of the first aspect, at least one of the following is extracted from the first content and the second content as the first initial picture or the second initial picture, respectively: still pictures, a frame of video, and a frame of a motion picture.
In a possible implementation of the first aspect, at least one of the following is extracted from the first content and the second content as the first initial text and the second initial text, respectively: titles, body text, comments, labels of pictures or videos in the first content and the second content, and bullet-screen comments in videos and/or motion pictures.
In one possible implementation of the first aspect, the first vector and the second vector have the same dimension and are 64-dimensional.
In a possible implementation of the first aspect, the method further includes: initializing a weight parameter of the MLP with a random number.
In a second aspect, an embodiment of the present application provides a content recommendation method based on similarity, which is used for an electronic device, and the method includes:
determining first content;
determining, according to any one of the possible methods of the first aspect above, a similarity between the first content and a second content from a set of contents to be recommended; and
in response to determining that the similarity of the first content and the second content is greater than or equal to a preset threshold, determining the second content as a recommendation object.
In a third aspect, an embodiment of the present application provides an apparatus for determining content similarity, including:
a determination module that determines a first content and a second content;
the extraction module is used for extracting a first standard feature group from the first content and extracting a second standard feature group from the second content, wherein the first/second standard feature group is a vector group used for describing at least part of image information and/or text information included in the first/second content;
an analysis module that determines tags associated with the first content and the second content, wherein the tags are degrees of interactive relevance of the first content and the second content determined based on interactions of users of the first content and the second content;
performing MLP processing on the first standard feature group and the second standard feature group to obtain a first vector and a second vector; and
a calculation module that calculates the similarity of the first content and the second content according to the first vector, the second vector and the tag.
In a possible implementation of the third aspect, when the number of users who interact with both the first content and the second content is greater than or equal to a preset number, the tag value is 1; otherwise, the tag value is 0.
In a possible implementation of the third aspect, the MLP processing includes: subjecting the first standard feature group and the second standard feature group to nonlinear transformation and compression through the MLP, respectively, to obtain the first vector and the second vector.
In a possible implementation of the third aspect, calculating the similarity between the first content and the second content includes: applying the value of the tag to a calculation formula for the similarity probability between the first vector and the second vector to obtain the similarity of the first content and the second content.
In a possible implementation of the third aspect, the similarity probability between the first vector and the second vector is given by: p(label | v_A, v_B) = [σ(8·function(v_A, v_B))]^label · [1 − σ(8·function(v_A, v_B))]^(1 − label), where p represents the similarity probability between the first vector and the second vector, label represents the tag, v_A represents the first vector, v_B represents the second vector, σ represents the sigmoid function, and function(v_A, v_B) represents a function of the similarity between the first vector and the second vector.
In a possible implementation of the third aspect, the function may be one of a cosine value between the first vector and the second vector, an inner product between the first vector and the second vector, and a norm of a difference value of the first vector and the second vector.
In a possible implementation of the third aspect, the interaction means that the user performs a preset interaction behavior for both the first content and the second content, where the preset interaction behavior includes at least one of the following behaviors: clicking, browsing, liking, favoriting, commenting, sending a bullet-screen comment, sharing, following the author, browsing for a duration exceeding a certain threshold, and entering the author's personal page.
In a possible implementation of the third aspect, extracting a first standard feature group from the first content and a second standard feature group from the second content includes: extracting a first initial feature group and a second initial feature group from the first content and the second content, respectively, wherein the first initial feature group comprises a first initial picture of the first content and first initial text of the first content, and the second initial feature group comprises a second initial picture of the second content and second initial text of the second content; and performing feature extraction preprocessing on the first initial feature group and the second initial feature group to obtain the first standard feature group and the second standard feature group.
In a possible implementation of the third aspect, the preprocessing includes: inputting the first initial picture and the first initial text into an Inception-V3 model and a BERT model, respectively, for processing to obtain the first standard feature group; and inputting the second initial picture and the second initial text into the Inception-V3 model and the BERT model, respectively, for processing to obtain the second standard feature group.
In a possible implementation of the third aspect, at least one of the following is extracted from the first content and the second content as the first initial picture or the second initial picture, respectively: still pictures, a frame of video, and a frame of a motion picture.
In a possible implementation of the third aspect, at least one of the following is extracted from the first content and the second content as the first initial text and the second initial text, respectively: titles, body text, comments, labels of pictures or videos in the first content and the second content, and bullet-screen comments in videos and/or motion pictures.
In one possible implementation of the third aspect, the first vector and the second vector have the same dimension and are 64-dimensional.
In a possible implementation of the third aspect, the method further includes: initializing a weight parameter of the MLP with a random number.
In a fourth aspect, an embodiment of the present application provides a device for determining content similarity, where the device for determining content similarity includes:
a memory for storing instructions for execution by one or more processors of the system; and
a processor, being one of the processors of the system, configured to execute the instructions to implement any one of the possible methods of the first aspect described above.
In a fifth aspect, the present application provides a computer-readable medium, on which instructions are stored, and when executed on a computer, the instructions may cause the computer to perform any one of the possible methods of the first aspect.
In a sixth aspect, an embodiment of the present application provides a content recommendation device based on similarity, including:
an acquisition module that determines first content;
a processing module that determines, according to any one of the possible methods of the first aspect, a similarity between the first content and a second content from a set of contents to be recommended; and
a recommendation module that, upon determining that the similarity of the first content and the second content is greater than or equal to a preset threshold, determines the second content as a recommendation object.
In a seventh aspect, an embodiment of the present application provides a device for determining content similarity, where the device for determining content similarity includes:
a memory for storing instructions for execution by one or more processors of the system; and
a processor, being one of the processors of the system, configured to execute the instructions to implement any one of the possible methods of the second aspect described above.
In an eighth aspect, the present application provides a computer-readable medium, on which instructions are stored, and when executed on a computer, the instructions may cause the computer to perform any one of the possible methods of the second aspect.
Compared with the prior art, the application has the following effects:
in the prior art, the user's needs are given little consideration, so users interested in long-tail content items can hardly find content of interest, and the recommendation effect is poor. The method for determining content similarity disclosed in the application combines the CF method and the CB method, forcing the model to learn from content alone the view of content similarity that users implicitly express through their shared engagement with content. It differs from the prior-art CB2CF method in that the content is not mapped into a CF vector; instead, the model is forced to learn from content to infer an enhanced similarity signal produced by the CF method. This solves the long-tail problem, establishing a similarity relation between long-tail notes and popular notes and helping the embedding model generalize so that similarity can be inferred even for long-tail notes.
Drawings
FIG. 1 illustrates a long tail effect diagram of a proposed method according to some embodiments of the present application;
FIG. 2 illustrates an application scenario diagram of a method of determining content similarity, according to some embodiments of the present application;
FIG. 3 illustrates a block diagram of a hardware architecture of a method of determining content similarity, according to some embodiments of the present application;
FIG. 4 illustrates an architectural diagram of a method of determining content similarity, according to some embodiments of the present application;
FIG. 5 illustrates a flow diagram of a method of determining content similarity, according to some embodiments of the present application;
FIG. 6a illustrates a target note schematic, according to some embodiments of the present application;
FIG. 6b illustrates a random note schematic determined using the CF method from the target note of FIG. 6a, according to some embodiments of the present application;
FIG. 6c shows a diagram of a random note determined using the NCB2CF method of the present application from the target note of FIG. 6a, according to some embodiments of the present application;
FIG. 7 illustrates a flow diagram of a method for similarity-based content recommendation, according to some embodiments of the present application;
FIG. 8 illustrates an exemplary architecture of an apparatus for determining content similarity, according to some embodiments of the present application;
fig. 9 illustrates a schematic diagram of a similarity-based content recommendation apparatus, according to some embodiments of the present application.
Detailed Description
The illustrative embodiments of the present application include, but are not limited to, a method, apparatus, device, and medium for determining similarity of contents and a method, apparatus, device, and medium for recommending contents based on the similarity.
It is to be appreciated that the methods of determining content similarity provided herein can be implemented on a variety of electronic devices including, but not limited to, a server, a distributed server cluster of multiple servers, a cell phone, a tablet, a laptop, a desktop, a wearable device, a head-mounted display, a mobile email device, a portable game console, a portable music player, a reader device, a personal digital assistant, a virtual reality or augmented reality device, a television or other electronic device having one or more processors embedded or coupled therein, and the like.
It is to be appreciated that in various embodiments of the present application, the processor may be a microprocessor, a digital signal processor, a microcontroller, or the like, and/or any combination thereof. According to another aspect, the processor may be a single-core processor, a multi-core processor, the like, and/or any combination thereof.
The inventive concepts of the embodiments of the present application are briefly described below.
Currently, content-to-content recommendation is a key component of a recommendation system, in which the content most similar to content the user likes is presented to the user. One implementation of content recommendation computes content similarity with a Collaborative Filtering (CF) based approach or a Content Based (CB) approach. CF is usually used for various personalized tasks; the algorithm builds a list of similar items from users' implicit feedback. CB usually calculates content similarity based only on information such as the content description and images. However, a long-tail effect appears in the feeds of many recommendation systems. FIG. 1 illustrates a long-tail effect diagram of a proposed method according to some embodiments of the present application. As shown in fig. 1, the long-tail effect manifests in that only a small amount of content receives most of the user engagement, while a large amount of content has no or little user interaction. The long-tail effect makes the similarity measurement of CF less effective, because user engagement is sparse for much of the content: when the content that makes up the bulk of a social media platform (i.e., long-tail or new content) has no or little feedback, the CF method is limited and the CB method is employed instead. However, a purely content-based similarity measure also has disadvantages: the CB method ignores the user's historical behavior and computes content similarity based only on content information (e.g., content description, images, etc.). Collecting a large amount of training data for content-based similarity models is very time consuming and tends to reflect the annotators' notion of content similarity rather than that of end users. Many attempts have therefore been made to address this problem with content-based features, for example by refining the model with a multi-view neural attention mechanism. However, an attention model is not suitable for measuring content similarity for diverse recommendation tasks, because it is a pairwise model and, given the large number of content pairs to be evaluated, is computationally prohibitive.
In view of this, embodiments of the present application provide a method of determining content similarity, in which: first content and second content are first determined; a first/second standard feature group is extracted from the first/second content respectively, wherein the first/second standard feature group is a vector group describing at least part of the image information and/or text information included in the first/second content; a label is determined that represents the degree of interaction association between the first content and the second content based on user interactions; next, MLP processing is performed on the first/second standard feature groups, and the similarity between the first content and the second content is calculated in combination with the label.
The method of determining content similarity disclosed in this application combines both the CF and CB methods and forces a model to learn, generalizing from content alone, the view of content similarity that users implicitly express through their shared engagement with content. It differs from the CB2CF method in that the content is not mapped into a CF vector; instead, the model is forced to learn from content to infer an enhanced similarity signal produced by the CF method. This method of determining content similarity is referred to as NCB2CF.
After the inventive concept of the embodiment of the present application is introduced, some simple descriptions are made below on application scenarios to which the technical solution of the embodiment of the present application can be applied, and it should be noted that the application scenarios described below are only used for describing the embodiment of the present application and are not limited. In a specific implementation process, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
The technical scheme provided by the embodiment of the application is suitable for multimedia content recommendation scenes such as characters, pictures (including static pictures in formats such as jpeg, and dynamic pictures in formats such as gif), videos and the like, and is mainly exemplified by recommendation of relevance of notes. The note may be a note including at least one of text, picture, video and other multimedia contents on a certain platform, and is used for recording the mood or opinion issued by the user for a certain subject. FIG. 2 illustrates a scene diagram for determining content similarity, according to some embodiments of the present application. Specifically, the scenario includes a terminal 101, a server 102, and a network 103.
The terminal 101 may be a desktop terminal or a mobile terminal, and the mobile terminal may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, and the like. The terminal 101 may be installed with an application, such as a browser or a client, that can browse notes. The application related to the embodiment of the application may be a software client, or a client such as a web page or an applet; if the application is a web page or applet client, the background server is the background server corresponding to that software, web page, or applet, and the specific type of the client is not limited. The user can log in to the application to browse notes, and while the user browses a note, the multimedia content recommended to the user can be determined using the method of the embodiment of the application; the multimedia content can be displayed together on the note interface, or displayed on a slide-down interface, and the form is not limited to this. Even if the user does not log in, the server corresponding to the client typically identifies the user, for example through the terminal used by the user, and this identification can thus be understood as identifying the user.
The server 102 may be a background server corresponding to an application installed on the terminal 101, for example, an independent physical server or a server cluster or distributed system composed of a plurality of servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform, but is not limited thereto.
The server 102 may include one or more processors 1021, memory 1022, and an I/O interface 1023 to interact with the terminal, among other things. In addition, the server 102 may further configure a database 1024, and the database 1024 may be used to store note records browsed by the user, and multidimensional data such as texts, pictures, annotations and the like of the notes. The memory 1022 of the server 102 may further store program instructions of the method for determining content similarity provided in the embodiment of the present application, and when executed by the processor 1021, the program instructions can be configured to implement the steps of the method for determining content similarity provided in the embodiment of the present application, to determine content recommended to the user, and further push the content to the target user, so as to push the content in the terminal 101.
The terminal 101 and the server 102 are connected via a network 103, where the network 103 includes one or more networks and may include various connection types, such as wired or wireless communication links, the cloud, or fiber optic cables; a specific example of the network described above is the internet provided by the communication provider of the terminal 101.
First, the processor 1021 reads the currently browsed note of the user stored in the database 1024 corresponding to the terminal 101 through the I/O interface 1023 interacting with the terminal 101, and then the memory 1022 determines the second content by executing the stored program instruction of the method for determining content similarity, and pushes the second content to the terminal 101 through the I/O interface 1023 interacting with the terminal, and displays the second content to the user.
The following describes in detail a technical solution for determining content similarity applied to the scenario shown in fig. 2 according to some embodiments of the present application. The processor 1021 receives user information from the terminal 101 through the I/O interface 1023 and determines that the note currently browsed by the user is a target note (as an example of the first content). Any note randomly determined in the database 1024 is taken as a random note (as an example of the second content). The text and pictures in the target note and the random note are extracted respectively to form a target initial feature group (as an example of a first initial feature group) and a random initial feature group (as an example of a second initial feature group), where the text is derived from the title, body, comments, bullet-screen comments and the like of the note, and the picture is derived from a still picture or a frame of a video/motion picture. The text and pictures of the notes are then input into a BERT model and an Inception-V3 model, respectively, and feature extraction preprocessing is performed to obtain a target feature group (as an example of a first standard feature group) and a random feature group (as an example of a second standard feature group). The target feature group and the random feature group are subjected to nonlinear transformation and compression through an MLP to output a target vector (as an example of a first vector) and a random vector (as an example of a second vector). A label value is determined according to the number of users who interacted with both the target note and the random note: if the number of users who interacted with both the target note and the random note is greater than or equal to a certain preset value, the label is 1; otherwise, the label is 0. Then, based on the target vector, the random vector and the label, the similarity between the random note and the target note is calculated, and based on the similarity it is judged whether to recommend the random note to the terminal 101 and display it to the user. This technical scheme solves the long-tail problem, establishing a similarity relation between long-tail notes and popular notes and helping the embedding model generalize so that similarity can be inferred even for long-tail notes.
Embodiments of the method provided in this application may be executed in the terminal 101. Fig. 3 shows a hardware structure block diagram of a method for determining content similarity according to some embodiments of the present application. As shown in fig. 3, the terminal 101 may include one or more (only one shown) processors 1012 (the processor 1012 may include, but is not limited to, a processing device such as a central processing unit CPU, an image processor GPU, a digital signal processor DSP, a microprocessor MCU, or a programmable logic device FPGA) and a memory 1011. The specific connection medium between the memory 1011 and the processor 1012 is not limited in the embodiments of the present application. In the embodiment of the present application, the memory 1011 and the processor 1012 are connected by the bus 1013 in fig. 3; the bus 1013 is shown by a thick line in fig. 3, and the connection manner between other components is only schematically illustrated and not limited thereto. The bus 1013 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 3, but this does not mean there is only one bus or one type of bus. It will be understood by those skilled in the art that the structure shown in fig. 3 is only an illustration and is not intended to limit the structure of the electronic device. For example, the terminal 101 may also include more or fewer components than shown in fig. 3, or have a different configuration than shown in fig. 3.
The processor 1012 executes various functional applications and data processing by executing software programs and modules stored in the memory 1011, that is, implements the above-described method of determining content similarity.
The memory 1011 may be used to store software programs and modules, such as the program instructions/modules corresponding to the method for determining content similarity in some embodiments of the present application, which the processor 1012 executes. The memory 1011 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1011 may further include memory located remotely from the processor 1012, which may be connected to the terminal 101 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Fig. 4 shows an architecture diagram for determining content similarity. As shown in fig. 4, first, in a preprocessing stage, feature extraction is performed on the pictures and text information extracted from the target note and the random note through a pre-trained BERT model and Inception-V3 model, respectively, to obtain a target feature group and a random feature group, where the target feature group includes features extracted from multimedia information such as the text and pictures in the target note and used to represent the target note, and the random feature group includes features extracted from multimedia information such as the text and pictures in the random note and used to represent the random note. Then, in the model initialization stage, an MLP model is randomly initialized, and MLP processing with shared parameters is performed on the target feature group and the random feature group respectively to obtain a target vector and a random vector. The MLP processing is specifically nonlinear transformation and compression, where the weight parameters of the MLP model are initialized with random numbers, the output dimension is D, and the same MLP parameters are used to process the target feature group and the random feature group. Finally, in the gradient-descent parameter-update stage, a cross-entropy function is calculated from the target vector, the random vector and the value of the label, the cross-entropy function is minimized using a gradient descent method, and the parameter values of the MLP model are updated accordingly; the smaller the cross-entropy function, the higher the similarity between the target note and the random note, i.e., the more similar the two notes are. In this way, random notes whose similarity to the target note meets the requirement are obtained.
By the method for determining content similarity, the long-tail problem is solved, a similarity relation is established between long-tail notes and popular notes, and the embedding model is helped to generalize so that similarity can be inferred even for long-tail notes.
FIG. 5 illustrates a flow diagram of a method of determining content similarity, according to some embodiments of the present application. As shown in fig. 5, in some embodiments, the method may include:
step 501: a target note and a random note are determined.
Specifically, in step 501, in some embodiments, the note being browsed by the user, the note historically browsed by the user, and the note of interest may all be determined as the target note, specifically, fig. 6a illustrates a target note schematic according to some embodiments of the present application, and referring to fig. 6a, the user is browsing a note related to calligraphy, and takes the calligraphy note as the target note. FIG. 6b illustrates a schematic diagram of a random note determined by the CF method according to the target note of FIG. 6a according to some embodiments of the present application, and reference is made to the upper right diagram of FIG. 6b, which is used to illustrate an example of the random note.
Step 502: and respectively extracting a target initial feature group and a random initial feature group comprising characters and pictures from the target note and the random note.
Specifically, in step 502, as shown in fig. 6a, text information such as "open-cut", "civilian", "regular script" and the like in the target note and the picture of the note cover are extracted to form a target initial feature group; as shown in fig. 6b, text information such as "regular script", "flying warp" and the like in the random note and the picture of the note cover are extracted to form a random initial feature group. In some embodiments, the text information may be extracted from user-editable textual portions of the note, such as at least one of the title, the body, comments, labels of pictures or videos, and bullet-screen comments in videos and/or motion pictures. In some embodiments, the picture information may be extracted from at least one of a still picture and a frame of a video and/or motion picture in the note. In some embodiments, the picture is a three-dimensional array composed of RGB pixels; in some embodiments, the text is a one-dimensional array composed of characters.
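A minimal sketch of assembling such an initial feature group is given below. The Note structure and its field names (title, body, comments, cover_image, video_frames) are hypothetical and used only for this example; the embodiment only requires that some text and some picture be extracted from the note.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class Note:
    title: str
    body: str
    comments: List[str] = field(default_factory=list)
    cover_image: Optional[np.ndarray] = None               # H x W x 3 array of RGB pixels
    video_frames: List[np.ndarray] = field(default_factory=list)

def initial_feature_group(note: Note):
    # Text: a one-dimensional character sequence drawn from the user-editable parts of the note.
    text = " ".join([note.title, note.body] + note.comments)
    # Picture: a still picture if present, otherwise one frame taken from the video / motion picture.
    picture = note.cover_image if note.cover_image is not None else note.video_frames[0]
    return text, picture
```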
Step 503: and respectively inputting the target initial feature group and the random initial feature group into a BERT model and an inclusion-V3 model for feature extraction to obtain a target feature group and a random feature group.
Specifically, in step 503, in some embodiments, the picture in the target initial feature group is input into the Inception-V3 model to obtain a one-dimensional array vector v_img_A, and the text in the target initial feature group is input into the BERT model to obtain a one-dimensional array vector v_txt_A; the one-dimensional array vector v_img_A and the one-dimensional array vector v_txt_A form the target feature group. The same feature extraction operation is performed on the random initial feature group based on the twin network to obtain a random feature group consisting of a one-dimensional array vector v_img_B and a one-dimensional array vector v_txt_B. The twin network is able to naturally learn note embeddings from note pairs consisting of the target note and the random note, without having to decide the order of the notes during training or select a branch of the model at inference time. Given the target note, a text representation and an image representation are created for each note using the pre-trained models, with the note description and the cover image as inputs. For the text representation, BERT is used as the base model in the application, standard procedures are followed, and the hidden state corresponding to the first token is used as the embedding. For the image representation, the Inception-V3 model is used. The text and image representations are then merged by concatenating them, and a fully connected output layer merges them into the target/random feature group. In some embodiments, the BERT model and the Inception-V3 model are pre-trained and are merely exemplary neural network models for text and image processing; the model may be selected according to the actual training data. In some embodiments, the text and pictures are not limited to being processed by the pre-trained BERT model and Inception-V3 model, and any suitable model may be used.
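As an illustrative sketch of this preprocessing step, the snippet below uses the Hugging Face transformers library and torchvision as stand-ins for the pre-trained BERT and Inception-V3 models; the checkpoint names, the truncation length, and the use of a PIL image as input are assumptions, not specifics taken from the patent.

```python
import torch
from transformers import BertTokenizer, BertModel
from torchvision import transforms
from torchvision.models import inception_v3

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese").eval()
image_model = inception_v3(weights="DEFAULT").eval()             # assumed pre-trained weights
image_model.fc = torch.nn.Identity()   # expose the 2048-d pooled feature instead of class logits

preprocess = transforms.Compose([
    transforms.Resize((299, 299)),                               # Inception-V3 input size
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def standard_feature_group(text, pil_image):
    # v_txt: hidden state of the first token as the text representation (768-d for BERT base).
    tokens = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    v_txt = bert(**tokens).last_hidden_state[:, 0, :]
    # v_img: pooled Inception-V3 representation of the cover picture / extracted frame (2048-d).
    v_img = image_model(preprocess(pil_image).unsqueeze(0))
    # Cascade (concatenate) the two representations into one standard feature group.
    return torch.cat([v_txt, v_img], dim=-1)
```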
Only item content such as the text representation and the image representation is used as features in the embedding model of the present application; user engagement with the content is not taken as a basis for feature extraction. By doing so, even notes with no or little user involvement still have enough content features for the model to reliably compute their embedded feature group, forcing similarity to be inferred purely from note content. This achieves the technical effect of avoiding the long-tail effect from the initial feature extraction stage of the model onward.
Step 504: tags associated with the target note and the random note are determined.
Specifically, in step 504, the label value is determined according to the number of users who have interacted with both the target note and the random note: if the number of users who have interacted with both the target note and the random note is greater than or equal to a certain preset value N, the label is equal to 1; otherwise, the label is equal to 0. The preset value N is a hyperparameter and can be set as desired. For example, if the preset value N is set to 5, then when 5 or more users have interacted with the target note and also interacted with the random note, the degree of interaction correlation is high and the corresponding label value is 1; when fewer than 5 users have interacted with both the target note and the random note, the degree of interaction correlation is low and the corresponding label value is 0. In some embodiments, the user interaction includes at least one of the following behaviors: clicking, browsing, liking, favoriting, commenting, sending a bullet-screen comment, sharing, following the author, browsing for a duration exceeding a certain threshold, entering the author's personal page, and the like. The above interaction behaviors are merely illustrative, and any behavior that reflects a user's association with a note may be considered an interaction. In some embodiments, first finding the pairs with label equal to 1 and then finding the pairs with label equal to 0 makes the actual operation more efficient. In some embodiments, in order to capture the user interaction relevance more accurately, only interactions within a certain time period are considered, for example only the interactions of users with the target note and the random note within the most recent week.
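A minimal sketch of this label determination is shown below, under the assumption that an interaction log is available as a mapping from note id to the set of user ids that interacted with the note within the chosen time window (e.g. the most recent week); the preset value N defaults to the value 5 used in the example above.

```python
def label_for_pair(target_note_id, random_note_id, users_by_note, preset_n=5):
    """Return 1 if at least preset_n users interacted with both notes, else 0."""
    co_users = users_by_note.get(target_note_id, set()) & users_by_note.get(random_note_id, set())
    return 1 if len(co_users) >= preset_n else 0

# Example: users_by_note maps a note id to the set of user ids that clicked, browsed, liked,
# favorited, commented on, shared, etc. the note within the chosen window.
users_by_note = {
    "target_note": {"u1", "u2", "u3", "u4", "u5", "u6"},
    "random_note": {"u2", "u3", "u4", "u5", "u6", "u9"},
}
print(label_for_pair("target_note", "random_note", users_by_note))   # -> 1 (5 users in common)
```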
Step 503 and step 504 are two relatively independent steps, and the sequence of steps in the embodiment of the present application is only an example and is not a limitation on the operation sequence. The two can be implemented in reverse order or simultaneously.
Step 505: and performing MLP processing on the target feature group and the random feature group to output a target vector and a random vector.
Specifically, in step 505, the MLP model is used to perform nonlinear transformation and compression on the target feature group and the random feature group respectively, so as to obtain the target vector and the random vector. In some embodiments, the MLP model is shared by the target note and the random note: the weight parameters of the MLP model are randomly initialized, its output dimension is a hyperparameter D, and the same MLP parameters are used to process the target feature group and the random feature group, where the hyperparameter D is customizable. For example, if the hyperparameter D is 64, the target vector and the random vector are 64-dimensional vectors; with D equal to 64, the compression rate is not too low and the occupied space is small.
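The following PyTorch sketch illustrates the shared-parameter (twin) MLP of this step; the hidden-layer size is illustrative, only the output dimension D = 64 comes from the text, and the random weight initialization is PyTorch's default.

```python
import torch
from torch import nn

class SharedMLP(nn.Module):
    """Shared-parameter MLP applied to both the target and the random feature group."""
    def __init__(self, in_dim, d=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512),   # nonlinear transformation (hidden size is illustrative)
            nn.ReLU(),
            nn.Linear(512, d),        # compression down to the D-dimensional output vector
        )

    def forward(self, x):
        return self.net(x)

mlp = SharedMLP(in_dim=768 + 2048, d=64)       # e.g. concatenated BERT + Inception-V3 features
# The same module (same randomly initialized weights) transforms both feature groups:
# v_A = mlp(target_feature_group); v_B = mlp(random_feature_group)
```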
Step 506: and calculating the similarity of the target note and the random note based on the target vector, the random vector and the label.
Specifically, in step 506, in some embodiments, based on the target vector, the random vector, and the label, the formula for calculating the similarity between the target note and the random note is as follows:
p(label | v_A, v_B) = [σ(8·function(v_A, v_B))]^label · [1 − σ(8·function(v_A, v_B))]^(1 − label);  (1)
where p represents the probability of similarity between the target vector and the random vector; label represents the label value of 0 or 1; v_A represents the target vector; v_B represents the random vector; σ denotes the sigmoid function; and function(v_A, v_B) represents a function of the similarity between the target vector and the random vector.
The loss to be minimized is: −(label·log(σ(8·function(v_A, v_B))) + (1 − label)·log(1 − σ(8·function(v_A, v_B)))). This loss is minimized using a gradient descent method, thereby updating the parameters of the MLP. Corresponding to equation (1), this cross-entropy loss function equals −log p(label | v_A, v_B), so minimizing the loss is equivalent to maximizing p. When label is 1, the larger the value of function(v_A, v_B), the larger σ(8·function(v_A, v_B)) becomes (σ is a monotonically increasing function), and the larger p becomes. The function may be one of a cosine value between the first vector and the second vector, an inner product, and a norm of the difference v_A − v_B.
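Putting equation (1) and the loss together, the sketch below shows one gradient-descent training step in PyTorch, with the cosine value chosen as function(v_A, v_B); the MLP dimensions and the optimizer settings are assumptions.

```python
import torch
import torch.nn.functional as F

# Stand-in shared MLP (see the sketch in step 505); dimensions are illustrative.
mlp = torch.nn.Sequential(
    torch.nn.Linear(768 + 2048, 512), torch.nn.ReLU(), torch.nn.Linear(512, 64)
)
optimizer = torch.optim.SGD(mlp.parameters(), lr=0.01)          # plain gradient descent

def training_step(target_features, random_features, label):
    """One gradient-descent update minimizing -log p(label | v_A, v_B) from equation (1)."""
    v_a = mlp(target_features)
    v_b = mlp(random_features)
    logit = 8 * F.cosine_similarity(v_a, v_b, dim=-1)           # function(v_A, v_B) = cosine here
    # binary_cross_entropy_with_logits computes
    # -(label*log(sigma(logit)) + (1-label)*log(1-sigma(logit)))
    loss = F.binary_cross_entropy_with_logits(logit, label.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```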
According to this technical scheme, for any target note with user interaction, random notes are retrieved by the CF method; if a random note obtains enough interaction in the final recommendation result, the pair has a high confidence level, i.e., the target note and the random note are very similar from the user's point of view. CF ensures, based on users' interaction behavior, that a basic user-perceived similarity exists between the target note and the random note. If a random note is recommended in the recommendation system and is interacted with multiple times by multiple users, the random note is closer to a popular note for those users. In this case, the target note is not greatly affected by the long-tail effect, because even a note with little user interaction can be treated like a popular note. The random note, however, is more likely to be a popular note, because the scheme requires it to have at least some user interaction, whereas no user participation is required for the target note, which further increases the likelihood that a long-tail note can serve as a target note. The technical scheme solves the long-tail problem, establishing a similarity relation between long-tail notes and popular notes and helping the embedding model generalize so that similarity can be inferred even for long-tail notes.
In order to verify the above technical effects, we also performed extensive experimental verification; the experimental results are described below with reference to fig. 6a to 6c. In the NCB2CF embedding of the present application, we have two main feature categories: text and images. For text, we pre-trained a BERT base model with about 4 million notes published on a social application, with the sequence length set to 128. For images, we used the Inception-V3 model pre-trained on an internal image classification dataset of a social application. To ensure accurate positive labels (i.e., the case of label = 1), we collected user feedback over two months on notes retrieved by the classical note-to-note CF technique, i.e., we only consider notes that have been interacted with by a certain number of users as positive notes. We randomly drew 400,000 pairs from this data set as the final positive training set. To avoid bias toward the most popular notes, we ensure that each note appears only once in this set. We randomly select notes from impressions to form a negative set of the same size as the positive set. The embedding dimension is set to D = 64 for efficient inference. Furthermore, we can pre-compute the embeddings of all available notes for the online service.
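A minimal sketch of this training-set construction is given below; it assumes the CF-retrieved note pairs with sufficient feedback and the pool of impression notes are already available as Python lists, and pairing each target with a randomly drawn impression note for the negative set is one reading of the text.

```python
import random

def build_training_pairs(cf_positive_pairs, impression_notes, n_positive=400_000):
    """Assemble the positive/negative training sets sketched in the text."""
    random.shuffle(cf_positive_pairs)
    positives, seen = [], set()
    for target, similar in cf_positive_pairs:
        if target in seen or similar in seen:       # each note appears only once in the positive set
            continue
        positives.append((target, similar, 1))
        seen.update((target, similar))
        if len(positives) == n_positive:
            break
    # Negative set of the same size: pair each target note with a randomly drawn impression note.
    negatives = [(target, random.choice(impression_notes), 0) for target, _, _ in positives]
    return positives + negatives
```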
TABLE 1: taxonomy distribution of some popular taxonomies among the recommended notes (provided as an image in the original publication)
Also, we have conducted offline experiments to better understand the difference between note similarity calculations using CF techniques and NCB2CF note embedding proposed for notes at the long tail.
We limit attention to long-tail notes with only a small number of interactions. We randomly drew 100,000 target notes, each with fewer than 3 engagements within a month, and obtained the top 10 most similar notes by CF score. For the NCB2CF method, note embeddings are computed for the same note set and up to 10 similar notes per target note are retrieved in the note embedding space. To compare the similarity of these retrieved notes, we labeled each note with a classification label, such as sports, fashion, food, etc., using our internal classification system to represent the category of the note. Here we only use the taxonomy as a coarse-grained approximation for comparing note similarity. The taxonomy distribution of some popular taxonomies among the recommended notes is shown in table 1. The notes in these four taxonomies are usually distinct from each other. It is clear that the NCB2CF method proposed in this application is more effective in measuring similar notes for long-tail notes. Further, fig. 6b and 6c show example diagrams of the most similar notes generated using CF and NCB2CF, given a low-engagement note describing Chinese calligraphy. Evidently, the NCB2CF method proposed in this application can retrieve and recommend more similar notes, whereas CF in this case recommends entertainment- and drawing-related notes.
According to some embodiments of the present application, a similarity-based content recommendation method 700 is provided, and fig. 7 shows a flow chart of a similarity-based content recommendation method according to some embodiments of the present application. As shown in fig. 7, the method is as follows:
step 701: and determining the target note.
Specifically, in step 701, the note being browsed by the user, notes historically browsed by the user, and notes of interest may all be determined as the target note. Specifically, fig. 6a illustrates a target note schematic according to some embodiments of the present application; referring to fig. 6a, the user is browsing a calligraphy-related note, and the calligraphy note is taken as the target note.
Step 702: and determining the similarity between the random note and the target note according to the method for determining the content similarity.
Specifically, in step 702, the similarity between the target note and the random note is determined according to any one of the methods for determining content similarity of the first embodiment. For the notes within a preset range of the database, either all notes are traversed or a neighbor-search method is adopted, and the similarity between each note and the target note is determined respectively.
Step 703: and determining one or more random notes with the similarity greater than or equal to a preset threshold value, and determining the one or more random notes as recommended objects.
Specifically, in step 703, in some embodiments, after obtaining the random note with the similarity p to the target note, a threshold may be set, so that when the similarity p between the random note and the target note is greater than the threshold, the random note is placed in a note set to be recommended and determined as a recommended object; otherwise, the random note is discarded.
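A minimal sketch of this filtering step follows; similarity_fn stands for the method of determining content similarity of the first embodiment, and the threshold value 0.8 is purely an assumption for the example. In practice, as noted in step 702, a neighbor search over precomputed embeddings can replace the full scan.

```python
def recommend(target_note, candidate_notes, similarity_fn, threshold=0.8):
    """Keep candidates whose similarity to the target note is at least the preset threshold."""
    scored = [(note, similarity_fn(target_note, note)) for note in candidate_notes]
    kept = [(note, p) for note, p in scored if p >= threshold]   # below threshold -> discarded
    kept.sort(key=lambda item: item[1], reverse=True)            # most similar notes first
    return [note for note, _ in kept]
```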
According to some embodiments of the present application, there is provided a device 800 for determining content similarity, and fig. 8 is a schematic structural diagram of a device for determining content similarity according to some embodiments of the present application. As shown in fig. 8, the apparatus 800 for determining content similarity is as follows:
a determination module 801 determines a target note and a random note.
An extraction module 802, which extracts a target feature group from the target note and a random feature group from the random note, wherein the target/random feature group is a vector group for describing at least part of image information and/or text information included in the target/random note;
the analysis module 803 determines tags associated with the target note and the random note, wherein the tags are interaction association degrees of the target note and the random note determined based on interaction of users of the target note and the random note; and performing MLP (multi-level linear programming) processing on the target feature group and the random feature group to obtain a target vector and a random vector.
The calculating module 804 calculates the similarity between the target note and the random note according to the target vector, the random vector and the label.
The first embodiment is a method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the first embodiment.
A fourth embodiment of the present application relates to a content similarity determination apparatus including:
a memory for storing instructions for execution by one or more processors of the system; and
a processor, being one of the processors of the system, configured to execute the instructions to implement any one of the possible methods of the first embodiment described above.
The first embodiment is a method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the first embodiment.
A fifth embodiment of the present application relates to a computer-readable storage medium encoded with a computer program, the medium having instructions stored thereon that, when executed on a computer, cause the computer to perform any one of the possible methods of the first embodiment described above.
The first embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment remain valid in the present embodiment and are not repeated here in order to reduce repetition; conversely, the related technical details mentioned in the present embodiment also apply to the first embodiment.
According to some embodiments of the present application, a similarity-based content recommendation apparatus 900 is provided. Fig. 9 is a schematic structural diagram of a similarity-based content recommendation apparatus according to some embodiments of the present application. As shown in Fig. 9, the apparatus 900 comprises the following modules:
an acquisition module 901, configured to determine a target note;
a processing module 902, configured to determine the similarity between the target note and a random note according to any one of the methods for determining content similarity in the first embodiment, where the random note comes from a set of contents to be recommended; and
a recommendation module 903, configured to determine that the similarity is greater than or equal to a preset threshold and to determine the random note as a recommendation object.
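Purely as an illustration of how the three modules of apparatus 900 might be wired together, here is a minimal class sketch; the class name, method names, and default threshold are assumptions, and the similarity function is whichever first-embodiment method is plugged in:

```python
class SimilarityRecommender:
    """Sketch of apparatus 900: acquisition, processing and recommendation modules."""

    def __init__(self, similarity_fn, threshold=0.8):
        self.similarity_fn = similarity_fn  # any first-embodiment similarity method
        self.threshold = threshold          # preset threshold (value assumed)

    def acquire(self, target_note):
        """Acquisition module 901: determine the target note."""
        self.target_note = target_note

    def process(self, random_note):
        """Processing module 902: similarity between the target note and a candidate."""
        return self.similarity_fn(self.target_note, random_note)

    def recommend(self, candidate_notes):
        """Recommendation module 903: keep candidates at or above the threshold."""
        return [note for note in candidate_notes if self.process(note) >= self.threshold]


# Usage sketch:
#   recommender = SimilarityRecommender(similarity_fn=my_similarity)
#   recommender.acquire(target_note)
#   recommendations = recommender.recommend(notes_to_recommend)
```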
The second embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the second embodiment. The related technical details mentioned in the second embodiment remain valid in the present embodiment and are not repeated here in order to reduce repetition; conversely, the related technical details mentioned in the present embodiment also apply to the second embodiment.
A seventh embodiment of the present application relates to a similarity-based content recommendation apparatus, comprising:
a memory for storing instructions for execution by one or more processors of the system; and
a processor, which is one of the processors of the system and is configured to execute the instructions to implement any one of the possible methods of the second embodiment described above.
The second embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the second embodiment. The related technical details mentioned in the second embodiment remain valid in the present embodiment and are not repeated here in order to reduce repetition; conversely, the related technical details mentioned in the present embodiment also apply to the second embodiment.
An eighth embodiment of the present application relates to a computer-readable storage medium encoded with a computer program, the medium having instructions stored thereon that, when executed on a computer, cause the computer to perform any one of the possible methods of the second embodiment described above.
The second embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the second embodiment. The related technical details mentioned in the second embodiment remain valid in the present embodiment and are not repeated here in order to reduce repetition; conversely, the related technical details mentioned in the present embodiment also apply to the second embodiment.
It should be noted that the method embodiments of the present application may be implemented in software, hardware, firmware, or the like. Regardless of whether the implementation is in software, hardware, or firmware, the instruction code may be stored in any type of computer-accessible memory (for example, permanent or rewritable, volatile or non-volatile, solid-state or non-solid-state, fixed or removable media, and so on). Likewise, the memory may be, for example, Programmable Array Logic (PAL), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic disk, an optical disc, a Digital Versatile Disc (DVD), or the like.
It should also be noted that all of the units/modules mentioned in the apparatus embodiments of this application are logical units/modules. Physically, a logical unit may be a physical unit, a part of a physical unit, or a combination of several physical units; the physical implementation of the logical unit itself is not the most important aspect, and it is the combination of the functions implemented by these logical units that solves the technical problem addressed by this application. In addition, in order to highlight the innovative part of the application, the apparatus embodiments above do not introduce units that are less closely related to solving the technical problem addressed by this application, which does not mean that no other units exist in the apparatus embodiments above.
It should be noted that, in the claims and the description of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (20)

1. A method for determining content similarity for an electronic device, the method comprising:
determining first content and second content;
extracting a first standard feature group from the first content and a second standard feature group from the second content, wherein the first/second standard feature group is a vector group for describing at least part of image information and/or text information included in the first/second content;
determining a label associated with the first content and the second content, wherein the label is an interaction association degree of the first content and the second content determined based on interactions of users with the first content and the second content;
performing MLP processing on the first standard feature group and the second standard feature group to obtain a first vector and a second vector; and
calculating the similarity between the first content and the second content according to the first vector, the second vector, and the label.
2. The method of claim 1, wherein the label value is 1 when at least a predetermined number of users interact with both the first content and the second content, and the label value is 0 otherwise.
3. The method of claim 2, wherein the MLP processing comprises:
subjecting the first standard feature group and the second standard feature group to nonlinear transformation and compression through the MLP, respectively, to obtain the first vector and the second vector.
4. The method of claim 3, wherein calculating the similarity between the first content and the second content comprises:
applying the value of the label to a formula for calculating the similarity probability between the first vector and the second vector, so as to obtain the similarity between the first content and the second content.
5. The method of claim 4, wherein the probability of similarity between the first vector and the second vector is given by:
p(label | v_A, v_B) = [σ(8 * function(v_A, v_B))]^label * [1 - σ(8 * function(v_A, v_B))]^(1 - label);
wherein p represents the similarity probability between the first vector and the second vector; label represents the label; v_A represents the first vector; v_B represents the second vector; σ denotes the sigmoid function; and function(v_A, v_B) represents a similarity function between the first vector and the second vector.
6. The method of claim 5, wherein the function is one of a cosine value between the first vector and the second vector, an inner product between the first vector and the second vector, and a norm of a difference between the first vector and the second vector.
7. The method according to claim 1, wherein the interaction means that the user performs a preset interaction behavior on both the first content and the second content, and the preset interaction behavior comprises at least one of the following behaviors:
clicking, browsing for a duration exceeding a certain threshold, liking, favoriting, commenting, sending a bullet-screen comment, sharing, following the author, and visiting the author's personal page.
8. The method of claim 1, wherein extracting the first standard feature group from the first content and the second standard feature group from the second content comprises:
extracting a first initial feature group and a second initial feature group from the first content and the second content, respectively, wherein the first initial feature group comprises a first initial picture of the first content and first initial text of the first content, and the second initial feature group comprises a second initial picture of the second content and second initial text of the second content; and
performing feature-extraction preprocessing on the first initial feature group and the second initial feature group to obtain the first standard feature group and the second standard feature group.
9. The method of claim 8, wherein the preprocessing comprises:
inputting the first initial picture and the first initial text into an Inception-V3 model and a BERT model, respectively, for processing to obtain the first standard feature group; and
inputting the second initial picture and the second initial text into the Inception-V3 model and the BERT model, respectively, for processing to obtain the second standard feature group.
10. The method according to claim 9, wherein at least one of the following is extracted from the first content and the second content, respectively, as the first initial picture or the second initial picture: a still picture, a frame of a video, and a frame of an animated image.
11. The method of claim 9, wherein at least one of the following is extracted from the first content and the second content as the first initial text and the second initial text, respectively:
titles, body text, comments, tags of pictures or videos in the first content and the second content, and bullet-screen comments in videos and/or animated images.
12. The method of claim 1, wherein the first vector and the second vector have the same dimensionality and are both 64-dimensional.
13. The method of claim 1, further comprising:
initializing the weight parameters of the MLP with random numbers.
14. A similarity-based content recommendation method for an electronic device, characterized in that the method comprises:
determining first content;
determining a similarity between the first content and second content according to the method of any one of claims 1-13, wherein the second content comes from a set of contents to be recommended; and
determining that the similarity between the first content and the second content is greater than or equal to a preset threshold, and determining the second content as a recommendation object.
15. An apparatus for determining content similarity, the apparatus comprising:
a determination module that determines a first content and a second content;
the extraction module is used for extracting a first standard feature group from the first content and extracting a second standard feature group from the second content, wherein the first/second standard feature group is a vector group used for describing at least part of image information and/or text information included in the first/second content;
an analysis module that determines a label associated with the first content and the second content, wherein the label is an interaction association degree of the first content and the second content determined based on interactions of users with the first content and the second content, and that performs MLP processing on the first standard feature group and the second standard feature group to obtain a first vector and a second vector; and
a calculation module that calculates the similarity between the first content and the second content according to the first vector, the second vector, and the label.
16. An apparatus for determining content similarity, comprising:
a memory for storing instructions for execution by one or more processors of the system; and
a processor, which is one of the processors of the system and is configured to execute the instructions to implement the method for determining content similarity according to any one of claims 1-13.
17. A computer-readable storage medium encoded with a computer program, having instructions stored thereon, which, when executed on a computer, cause the computer to perform the method of determining content similarity as claimed in any one of claims 1-13.
18. A similarity-based content recommendation apparatus, the apparatus comprising:
an acquisition module that determines first content;
a processing module that determines a similarity between the first content and second content according to the method of any one of claims 1-13, wherein the second content comes from a set of contents to be recommended; and
a recommendation module that determines that the similarity between the first content and the second content is greater than or equal to a preset threshold and determines the second content as a recommendation object.
19. A similarity-based content recommendation apparatus, comprising:
a memory for storing instructions for execution by one or more processors of the system; and
a processor, which is one of the processors of the system and is configured to execute the instructions to implement the similarity-based content recommendation method of claim 14.
20. A computer-readable storage medium encoded with a computer program, having instructions stored thereon which, when executed on a computer, cause the computer to perform the similarity-based content recommendation method of claim 14.
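As a concrete, non-authoritative illustration of claims 2, 5, and 6, the sketch below builds the label from the number of users who interacted with both contents and then evaluates the claimed probability for each of the three candidate similarity functions. The interaction-log format, the user threshold of 3, the vector scale, and all function names are assumptions made for the sake of the example.

```python
import numpy as np

def make_label(users_a, users_b, min_common_users=3):
    """Claim 2: label = 1 if at least a predetermined number of users
    interacted with both contents, otherwise 0 (threshold value assumed)."""
    return 1 if len(set(users_a) & set(users_b)) >= min_common_users else 0

def cosine(v_a, v_b):
    return float(v_a @ v_b / (np.linalg.norm(v_a) * np.linalg.norm(v_b) + 1e-12))

def inner_product(v_a, v_b):
    return float(v_a @ v_b)

def neg_diff_norm(v_a, v_b):
    # norm of the difference, negated here so that a larger value still means
    # "more similar" (one possible reading of claim 6)
    return -float(np.linalg.norm(v_a - v_b))

def claimed_probability(v_a, v_b, label, function):
    """Claim 5: p(label|v_A,v_B) = sigma(8*f)^label * (1 - sigma(8*f))^(1-label)."""
    sigma = 1.0 / (1.0 + np.exp(-8.0 * function(v_a, v_b)))
    return sigma ** label * (1.0 - sigma) ** (1 - label)

# Toy check with two 64-dimensional vectors (dimensionality per claim 12).
rng = np.random.default_rng(0)
v_a, v_b = rng.normal(scale=0.1, size=64), rng.normal(scale=0.1, size=64)
label = make_label(["u1", "u2", "u3", "u4"], ["u2", "u3", "u4", "u9"])
for f in (cosine, inner_product, neg_diff_norm):
    print(f.__name__, claimed_probability(v_a, v_b, label, f))
```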
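The preprocessing of claims 8 and 9 can be sketched in a similar spirit. The claims name Inception-V3 and BERT but do not specify checkpoints, pooling, or how the per-picture and per-text features are combined into a standard feature group, so everything beyond the two model names below (the Chinese BERT checkpoint, the averaging, the concatenation, and the input conventions) is an assumption:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
import torch
from transformers import BertModel, BertTokenizer

# Image branch: globally pooled Inception-V3 features (2048-d per picture).
image_encoder = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

# Text branch: BERT [CLS] features (768-d per text); a Chinese checkpoint is assumed.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
text_encoder = BertModel.from_pretrained("bert-base-chinese")

def encode_note(pictures, texts):
    """Build one 'standard feature group' from a note's pictures and texts.

    pictures: float array of shape (num_pictures, 299, 299, 3), RGB values in [0, 255]
    texts:    list of strings (title, body text, comments, tags, bullet-screen text, ...)
    """
    img_feats = image_encoder.predict(preprocess_input(np.asarray(pictures)), verbose=0)
    with torch.no_grad():
        tokens = tokenizer(texts, return_tensors="pt", padding=True,
                           truncation=True, max_length=128)
        txt_feats = text_encoder(**tokens).last_hidden_state[:, 0].numpy()
    # One possible combination: average each modality, then concatenate (assumption).
    return np.concatenate([img_feats.mean(axis=0), txt_feats.mean(axis=0)])  # (2048 + 768,)
```

The resulting 2816-dimensional vector matches the assumed input dimension of the MLP sketch given earlier in the apparatus description.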
CN202110779922.7A 2021-07-09 2021-07-09 Method for determining content similarity and content recommendation method based on similarity Pending CN113378064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110779922.7A CN113378064A (en) 2021-07-09 2021-07-09 Method for determining content similarity and content recommendation method based on similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110779922.7A CN113378064A (en) 2021-07-09 2021-07-09 Method for determining content similarity and content recommendation method based on similarity

Publications (1)

Publication Number Publication Date
CN113378064A true CN113378064A (en) 2021-09-10

Family

ID=77581580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110779922.7A Pending CN113378064A (en) 2021-07-09 2021-07-09 Method for determining content similarity and content recommendation method based on similarity

Country Status (1)

Country Link
CN (1) CN113378064A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020145A (en) * 2017-11-23 2019-07-16 北京搜狗科技发展有限公司 A kind of information recommendation method and device
CN108304556A (en) * 2018-02-06 2018-07-20 中国传媒大学 The personalized recommendation method being combined with collaborative filtering based on content
US20190286656A1 (en) * 2018-03-16 2019-09-19 Under Armour, Inc. Deep multi-modal pairwise ranking model for crowdsourced food data
WO2020130689A1 (en) * 2018-12-19 2020-06-25 삼성전자 주식회사 Electronic device for recommending play content, and operation method therefor
CN110837602A (en) * 2019-11-05 2020-02-25 重庆邮电大学 User recommendation method based on representation learning and multi-mode convolutional neural network
CN111125386A (en) * 2019-12-02 2020-05-08 腾讯科技(北京)有限公司 Media resource processing method and device, storage medium and electronic device
CN111400591A (en) * 2020-03-11 2020-07-10 腾讯科技(北京)有限公司 Information recommendation method and device, electronic equipment and storage medium
CN112163165A (en) * 2020-10-21 2021-01-01 腾讯科技(深圳)有限公司 Information recommendation method, device, equipment and computer readable storage medium
CN112948676A (en) * 2021-02-26 2021-06-11 网易传媒科技(北京)有限公司 Training method of text feature extraction model, and text recommendation method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Zhang Li, "Research on Key Technologies of Mobile Social E-commerce Recommendation", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Zhang Li, "Research on Key Technologies of Mobile Social E-commerce Recommendation", China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 January 2021 (2021-01-15) *
Tian Xuan, "Image Semantic Segmentation Technology Based on Deep Learning", China Ocean Press, pages: 18 - 19 *
Guo Jie, "Research on Key Issues of Scene Recognition for Micro-videos", China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 August 2020 (2020-08-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220305A (en) * 2021-12-08 2022-03-22 安徽新华传媒股份有限公司 Teaching system based on artificial intelligence image recognition technology
CN114220305B (en) * 2021-12-08 2024-04-02 安徽新华传媒股份有限公司 Teaching system based on artificial intelligent image recognition technology

Similar Documents

Publication Publication Date Title
CN110781347B (en) Video processing method, device and equipment and readable storage medium
CN111143610B (en) Content recommendation method and device, electronic equipment and storage medium
Kavasidis et al. An innovative web-based collaborative platform for video annotation
US20230017667A1 (en) Data recommendation method and apparatus, computer device, and storage medium
CN111241345A (en) Video retrieval method and device, electronic equipment and storage medium
CN111737582B (en) Content recommendation method and device
WO2018112696A1 (en) Content pushing method and content pushing system
CN111680217A (en) Content recommendation method, device, equipment and storage medium
Abousaleh et al. Multimodal deep learning framework for image popularity prediction on social media
CN113761253A (en) Video tag determination method, device, equipment and storage medium
CN112818251B (en) Video recommendation method and device, electronic equipment and storage medium
CN112131430A (en) Video clustering method and device, storage medium and electronic equipment
CN112101344B (en) Video text tracking method and device
CN114201516A (en) User portrait construction method, information recommendation method and related device
CN115687670A (en) Image searching method and device, computer readable storage medium and electronic equipment
CN113590854B (en) Data processing method, data processing equipment and computer readable storage medium
CN113378064A (en) Method for determining content similarity and content recommendation method based on similarity
CN114330514A (en) Data reconstruction method and system based on depth features and gradient information
CN111783734B (en) Original edition video recognition method and device
CN115640449A (en) Media object recommendation method and device, computer equipment and storage medium
Fei et al. Learning user interest with improved triplet deep ranking and web-image priors for topic-related video summarization
CN113822127A (en) Video processing method, video processing device, video processing equipment and storage medium
CN116701706A (en) Data processing method, device, equipment and medium based on artificial intelligence
CN116955707A (en) Content tag determination method, device, equipment, medium and program product
Bendouch et al. A visual-semantic approach for building content-based recommender systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination