CN115438201A - Data recommendation method and device, storage medium and electronic equipment - Google Patents

Data recommendation method and device, storage medium and electronic equipment

Info

Publication number
CN115438201A
Authority
CN
China
Prior art keywords
multimedia
multimedia object
time sequence
data
weight parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211086246.6A
Other languages
Chinese (zh)
Inventor
赵鑫萍
章莺
吴敏
肖强
李勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202211086246.6A
Publication of CN115438201A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles

Abstract

The embodiment of the invention relates to the technical field of computers, in particular to a data recommendation method and device, a storage medium and an electronic device. The method comprises the following steps: acquiring a multimedia object set of a target user in a preset statistical period, and performing interactive behavior statistics on the multimedia objects in the multimedia object set to determine time sequence interactive data based on time sequence analysis; calculating a first weight parameter between any two multimedia objects in the multimedia object set by using the time sequence interactive data; calculating a second weight parameter between any two multimedia objects in the multimedia object set according to the time sequence distribution information of the time sequence interactive data; and performing an aggregation operation on any two multimedia objects in the multimedia object set based on the first weight parameter and the second weight parameter to obtain an incidence matrix, so as to perform data recommendation by using the incidence matrix. The method can improve the accuracy of the recommended data.

Description

Data recommendation method and device, storage medium and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a data recommendation method and device, a storage medium and an electronic device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims and the description herein is not admitted to be prior art by inclusion in this section.
Data recommendation is an important function in the multimedia field. For example, in an audio application program or a video application program, content such as music and videos that may interest a user can be recommended to the user according to the user's historical playing data; alternatively, multimedia data such as news may be recommended to the user.
Disclosure of Invention
However, in some technologies, the related recommendation algorithms have certain disadvantages. For example, some algorithms require online real-time inference, have high latency, and cannot meet the requirement of rapidly screening data; in other algorithms, the strategy is unreasonable in that only the attributes or tag data of the multimedia data are used in the calculation, so the recommended results carry weak association information and do not match the user's preferences.
For this reason, an improved data recommendation method and apparatus, a storage medium, and an electronic device are needed to provide a data recommendation scheme capable of improving the accuracy of recommended data.
In this context, embodiments of the present invention are intended to provide a data recommendation method and apparatus, a storage medium, and an electronic device.
According to an aspect of the present disclosure, there is provided a data recommendation method including: acquiring a multimedia object set of a target user in a preset statistical period, and performing interactive behavior statistics on multimedia objects in the multimedia object set to determine time sequence interactive data based on time sequence analysis;
calculating a first weight parameter between any two multimedia objects in the multimedia object set by using the time sequence interactive data;
calculating a second weight parameter between any two multimedia objects in the multimedia object set according to the time sequence distribution information of the time sequence interactive data;
and performing aggregation operation on any two multimedia objects in the multimedia object set based on the first weight parameter and the second weight parameter to obtain an incidence matrix, so as to perform data recommendation by using the incidence matrix.
In an exemplary embodiment of the disclosure, the performing interactive behavior statistics on the multimedia objects in the multimedia object set to determine time-series interactive data based on time-series analysis includes:
counting the total complete-play count of the multimedia object (the total number of times it was played to completion) in a preset statistical period, and the distribution of that count along the time axis of the preset statistical period;
and constructing time sequence interactive data based on time sequence analysis from the total complete-play count and its distribution.
In an exemplary embodiment of the disclosure, the calculating a first weight parameter between any two multimedia objects in the set of multimedia objects by using the time-series interaction data includes:
calculating a first weight parameter between a first multimedia object and a second multimedia object by using the total complete-play counts of the first multimedia object and the second multimedia object in a preset statistical period; the first multimedia object and the second multimedia object are any two multimedia objects in the multimedia object set.
In an exemplary embodiment of the disclosure, the calculating a second weight parameter between any two multimedia objects in the set of multimedia objects according to the time sequence distribution information of the time sequence interactive data includes:
determining the corresponding marginal distributions according to the time sequence distributions of the complete-play counts of the first multimedia object and the second multimedia object;
determining the joint distribution corresponding to the first multimedia object and the second multimedia object according to the marginal distributions corresponding to the first multimedia object and the second multimedia object;
calculating the minimum moving matching distance of the time sequence distributions between the first multimedia object and the second multimedia object based on the time sequence distribution difference and the joint distribution of the first multimedia object and the second multimedia object;
and determining a second weight parameter between the first multimedia object and the second multimedia object by combining the optimal joint distribution that attains the minimum moving matching distance with the predefined movement costs between multimedia objects.
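The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the movement cost between day i and day j is assumed to be |i - j|, in which case the minimum moving distance between two one-dimensional distributions has a closed form (the sum of absolute differences of their cumulative distributions), so the optimal joint distribution is never materialized. The names `normalize`, `moving_distance`, `second_weight` and the `exp(-d / scale)` mapping from distance to weight are hypothetical choices made for illustration.

```python
import math


def normalize(dist, n_days):
    """Turn {relative_day: complete_play_count} into a probability vector.

    The vector is the marginal distribution of one song over days 1..n_days.
    """
    total = sum(dist.values())
    if total == 0:
        raise ValueError("distribution has no complete plays")
    return [dist.get(day, 0) / total for day in range(1, n_days + 1)]


def moving_distance(p, q):
    """Minimum moving distance between two 1-D marginals, cost |i - j| days.

    For this cost the optimum equals the sum of |CDF_p - CDF_q| over days.
    """
    distance, carried = 0.0, 0.0
    for p_day, q_day in zip(p, q):
        carried += p_day - q_day  # mass that still has to move past this day
        distance += abs(carried)
    return distance


def second_weight(dist_i, dist_j, n_days, scale=7.0):
    """Map the distance to a weight in (0, 1]; exp(-d / scale) is an assumption."""
    d = moving_distance(normalize(dist_i, n_days), normalize(dist_j, n_days))
    return math.exp(-d / scale)


# Two songs played on adjacent days are weighted higher than two songs
# played at opposite ends of a 90-day period.
near = second_weight({1: 8}, {2: 8}, n_days=90)
far = second_weight({1: 8}, {90: 8}, n_days=90)
```

With a non-trivial cost matrix between multimedia objects, the closed form no longer applies and a general optimal-transport solver would be needed instead.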
In an exemplary embodiment of the present disclosure, the incidence matrix is an i2i (item-to-item) association matrix;
the performing aggregation operation based on the first weight parameter and the second weight parameter to obtain the incidence matrix includes:
performing aggregation operation by using the first weight parameter and the second weight parameter of any two multimedia objects in the multimedia object set and combining the play-completed user sets corresponding to the two multimedia objects to determine aggregation weights corresponding to the two multimedia objects;
and constructing an i2i association matrix according to the aggregation weights corresponding to any two multimedia objects in the multimedia object set.
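One possible shape of this aggregation is sketched below. The source does not give the aggregation formula in this passage, so the choice made here (multiply the two weights per user, then average over the users who completed both songs) is purely an assumption, as are the function name `build_i2i_matrix` and the input layout.

```python
from collections import defaultdict


def build_i2i_matrix(per_user_weights):
    """Aggregate per-user pair weights into a symmetric i2i matrix.

    per_user_weights: {user_id: {(song_a, song_b): (w1, w2)}}
    A pair appears under a user only if that user completed both songs,
    so each pair is aggregated over the intersection of the two
    play-completed user sets.
    Returns {song: {other_song: aggregated_weight}}.
    """
    sums = defaultdict(float)
    user_counts = defaultdict(int)
    for weights in per_user_weights.values():
        for pair, (w1, w2) in weights.items():
            sums[pair] += w1 * w2  # assumed combination: product of the weights
            user_counts[pair] += 1

    matrix = defaultdict(dict)
    for (a, b), total in sums.items():
        score = total / user_counts[(a, b)]  # assumed aggregation: mean over users
        matrix[a][b] = score
        matrix[b][a] = score
    return dict(matrix)


# Hypothetical sample input: two users who both completed S-1 and S-2.
sample = {
    "U-1": {("S-1", "S-2"): (0.5, 1.0)},
    "U-2": {("S-1", "S-2"): (1.0, 0.5)},
}
i2i = build_i2i_matrix(sample)
```

Averaging keeps the aggregated weight in [0, 1]; summing instead would also reward pairs shared by many users, which may or may not be desirable for long-tail items.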
In an exemplary embodiment of the present disclosure, the target user belongs to the intersection of the play-completed user sets corresponding to any two multimedia objects in the multimedia object set.
In an exemplary embodiment of the present disclosure, the acquiring a set of multimedia objects of a target user in a preset statistical period includes: collecting the multimedia operation behavior historical data of the target user in the preset statistical period;
and screening the multimedia operation behavior historical data according to preset conditions to construct the multimedia object set of the target user in a preset statistical period according to a screening result.
In an exemplary embodiment of the present disclosure, the method further comprises: responding to a data recall request triggered by a currently played multimedia object, and inquiring the incidence matrix by using the currently played multimedia object to acquire a recalled multimedia object;
and pushing the recalled multimedia object.
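Because the incidence matrix is computed offline, the online recall step reduces to a lookup and a sort. The sketch below assumes the matrix is held as a nested dictionary; the function name `recall` and the parameters `top_k` and `exclude` are hypothetical.

```python
def recall(matrix, current_song, top_k=10, exclude=()):
    """Return up to top_k songs most strongly associated with current_song.

    matrix: {song: {other_song: weight}}; unknown songs yield an empty list.
    exclude: songs that should not be pushed again (e.g. just played).
    """
    neighbours = matrix.get(current_song, {})
    ranked = sorted(
        (item for item in neighbours.items() if item[0] not in exclude),
        key=lambda item: (-item[1], item[0]),  # weight descending, id as tie-break
    )
    return [song for song, _ in ranked[:top_k]]


# Hypothetical sample matrix row for the currently played song S-1.
sample_matrix = {"S-1": {"S-2": 0.9, "S-3": 0.4, "S-4": 0.7}}
recalled = recall(sample_matrix, "S-1", top_k=2)
```

No model inference is needed at request time, which is what allows the fast online recall described above.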
According to an aspect of the present disclosure, there is provided a data recommendation apparatus including:
the time sequence interactive data statistics module is used for acquiring a multimedia object set of a target user in a preset statistics period and carrying out interactive behavior statistics on multimedia objects in the multimedia object set so as to determine time sequence interactive data based on time sequence analysis;
the first weight parameter calculation module is used for calculating a first weight parameter between any two multimedia objects in the multimedia object set by utilizing the time sequence interactive data;
the second weight parameter calculation module is used for calculating a second weight parameter between any two multimedia objects in the multimedia object set according to the time sequence distribution information of the time sequence interactive data;
and the database construction module is used for performing aggregation operation on any two multimedia objects in the multimedia object set based on the first weight parameter and the second weight parameter to obtain an incidence matrix, so as to perform data recommendation by using the incidence matrix.
In an exemplary embodiment of the present disclosure, the time sequence interactive data statistics module is configured to count the total complete-play count of the multimedia object in a preset statistical period and the distribution of that count along the time axis of the preset statistical period; and to construct time sequence interactive data based on time sequence analysis from the total complete-play count and its distribution.
In an exemplary embodiment of the present disclosure, the first weight parameter calculation module is configured to: calculate a first weight parameter between a first multimedia object and a second multimedia object by using the total complete-play counts of the first multimedia object and the second multimedia object in a preset statistical period; the first multimedia object and the second multimedia object are any two multimedia objects in the multimedia object set.
In an exemplary embodiment of the present disclosure, the second weight parameter calculation module is configured to: determine the corresponding marginal distributions according to the time sequence distributions of the complete-play counts of the first multimedia object and the second multimedia object; determine the joint distribution corresponding to the first multimedia object and the second multimedia object according to the marginal distributions corresponding to the first multimedia object and the second multimedia object; calculate the minimum moving matching distance of the time sequence distributions between the first multimedia object and the second multimedia object based on the time sequence distribution difference and the joint distribution of the first multimedia object and the second multimedia object; and determine a second weight parameter between the first multimedia object and the second multimedia object by combining the optimal joint distribution that attains the minimum moving matching distance with the predefined movement costs between multimedia objects.
In an exemplary embodiment of the present disclosure, the incidence matrix is an i2i (item-to-item) association matrix; the database construction module is configured to: perform an aggregation operation by using the first weight parameter and the second weight parameter of any two multimedia objects in the multimedia object set, in combination with the play-completed user sets corresponding to the two multimedia objects, to determine the aggregation weight corresponding to the two multimedia objects; and construct an i2i association matrix according to the aggregation weights corresponding to any two multimedia objects in the multimedia object set.
In an exemplary embodiment of the present disclosure, the target user belongs to the intersection of the play-completed user sets corresponding to any two multimedia objects in the multimedia object set.
In an exemplary embodiment of the disclosure, the time series interaction data statistics module is further configured to: collecting multimedia operation behavior historical data of a target user in the preset statistical period; and screening the multimedia operation behavior historical data according to preset conditions to construct the multimedia object set of the target user in a preset statistical period according to a screening result.
In an exemplary embodiment of the present disclosure, the apparatus further includes:
the data recommendation module is used for responding to a data recall request triggered by a currently played multimedia object, and inquiring the incidence matrix by using the currently played multimedia object so as to acquire a recalled multimedia object; and pushing the recalled multimedia object.
According to an aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the above-described data recommendation method.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any one of the data recommendation methods described above via execution of the executable instructions.
According to the data recommendation method provided by the embodiment of the invention, time sequence interactive data based on time sequence analysis is obtained by performing data statistics on the multimedia object set of the target user in the preset statistical period. A first weight parameter can then be calculated from the time sequence interactive data, a second weight parameter can be calculated from the time sequence distribution information of the time sequence interactive data, and an incidence matrix can be determined from the first weight parameter and the second weight parameter, so that data recommendation can be performed by using the incidence matrix. By introducing the time dimension into the calculation of the weight parameters and using the characteristics of the time sequence distribution, the similarity between multimedia objects can be acquired more accurately, so that the recall hit rate can be improved when the incidence matrix is used for data recommendation; fast online recall can also be realized, and the long tail effect is effectively mitigated.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically shows a flow diagram of a data recommendation method according to an embodiment of the invention;
FIG. 2 schematically illustrates a user song play sequence;
FIG. 3 schematically illustrates a time sequence profile of two songs;
FIG. 4 schematically illustrates a diagram of the movement between two song time series distributions;
FIG. 5 schematically illustrates a flow chart of a method of calculating a second weight parameter according to an embodiment of the invention;
FIG. 6 schematically illustrates a flow diagram of a data recall method according to an embodiment of the present invention;
FIG. 7 schematically shows a block diagram of a data pushing apparatus according to an embodiment of the present invention;
FIG. 8 schematically shows a block diagram of an electronic device according to an embodiment of the invention; and
FIG. 9 shows a schematic diagram of a storage medium according to an embodiment of the present invention.
In the drawings, like or corresponding reference characters designate like or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to several exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
The data related to the present disclosure may be data authorized by a user or fully authorized by each party, and the collection, transmission, use, and the like of the data all meet the requirements of relevant national laws and regulations, and the embodiments/examples of the present disclosure may be combined with each other.
According to an embodiment of the invention, a data recommendation method, a data recommendation device, a storage medium and an electronic device are provided.
In this document, any number of elements in the drawings is by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Summary of The Invention
The inventor has found that, in the prior art, when an application program recommends data to a user, for example when recommending songs, it usually needs to recall a candidate set that may interest the user from a massive pool of candidate music by using a simple and efficient algorithm or strategy. In some mainstream approaches, an i2i (item-to-item) association is constructed by a collaborative filtering algorithm or a content-based algorithm. Taking i2i as an example, the items similar to each item are found and stored in the form of an inverted index; the index service is queried with the user's historical behavior as the trigger to fetch the sets of similar items, and the user's topK recall set is then obtained through aggregation. However, this method does not consider the time difference between different interactions, so interactions separated by any time difference are treated identically; and content-based recall uses only the attribute or tag information of the item and does not mine association information from the very large-scale interaction data at all. In other mainstream schemes, vector recall is adopted: supervised learning is performed on users and items with a two-tower model that outputs embeddings (feature embeddings) of the user and the item; the user tower is split off and deployed as an online service, while the item side computes the embeddings of the candidate item set offline and builds a vector similarity retrieval index; during online recall, features are fed to the user side to obtain the user's embedding, and the K-nearest-neighbor index is then queried to obtain the topK recall set. However, this method requires online real-time inference, has high latency, and does not meet the goal of rapid screening in the recall stage.
In view of the above, the basic idea of the present invention is as follows: in the data recommendation method and device provided by the embodiments of the invention, a time dimension is introduced into the calculation of the weight parameters and the characteristics of the time sequence distribution are used, so that the similarity between multimedia objects can be acquired more accurately and the recall hit rate can be improved when the incidence matrix is used for data recommendation. This addresses the problems that, in existing recall methods, songs associated with a specific time are drowned out by popular songs and the association among songs at a specific time is not accurately captured. The method also enables fast online recall and effectively mitigates the long tail effect.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Exemplary method
A data recommendation method according to an exemplary embodiment of the present invention is described below with reference to fig. 1.
Referring to fig. 1, the data recommendation method may include the steps of:
S1, acquiring a multimedia object set of a target user in a preset statistical period, and performing interactive behavior statistics on multimedia objects in the multimedia object set to determine time sequence interactive data based on time sequence analysis;
S2, calculating a first weight parameter between any two multimedia objects in the multimedia object set by using the time sequence interactive data;
S3, calculating a second weight parameter between any two multimedia objects in the multimedia object set according to the time sequence distribution information of the time sequence interactive data;
and S4, performing an aggregation operation on any two multimedia objects in the multimedia object set based on the first weight parameter and the second weight parameter to obtain an incidence matrix, so as to perform data recommendation by using the incidence matrix.
In the data recommendation method of the embodiment of the invention, time sequence interactive data based on time sequence analysis is obtained by performing data statistics on the multimedia object set of the target user in the preset statistical period. A first weight parameter can then be calculated from the time sequence interactive data, a second weight parameter can be calculated from the time sequence distribution information of the time sequence interactive data, and an incidence matrix can be determined from the first weight parameter and the second weight parameter, so that data recommendation can be performed by using the incidence matrix. By introducing the time dimension into the calculation of the weight parameters and using the characteristics of the time sequence distribution, the similarity between multimedia objects can be acquired more accurately, so that the recall hit rate can be improved when the incidence matrix is used for data recommendation; fast online recall can also be realized, and the long tail effect is effectively mitigated.
In step S1, a multimedia object set of a target user in a preset statistical period is obtained, and interactive behavior statistics is performed on multimedia objects in the multimedia object set to determine time sequence interactive data based on time sequence analysis.
In an exemplary embodiment of the present disclosure, the multimedia object may be music, video, short video, an audio program, or multimedia data of the news type. In the following exemplary embodiment, the multimedia object is music, and music recall is described as an example. The method described above may be performed on a terminal device on which an application is installed. Alternatively, it may be performed by a server and the terminal device in cooperation.
In an exemplary embodiment of the disclosure, the obtaining a multimedia object set of a target user in a preset statistical period may specifically include: collecting the multimedia operation behavior historical data of the target user in the preset statistical period; and screening the multimedia operation behavior historical data according to preset conditions to construct the multimedia object set of the target user in a preset statistical period according to a screening result.
Specifically, the multimedia operation behavior history data may be a play behavior of music by the user. For example, the music playing records of the target user within N days may be collected through log data or in a server-side database, and then the songs involved in the music playing records are filtered.
Specifically, the preset condition for screening may be whether the song was played to completion, whether the song has been collected (favorited), or the like. The songs that pass these conditions can be used as the multimedia object set of the target user in the preset statistical period. Of course, in other exemplary embodiments of the present disclosure, when recommending video data for a user, the video playing records of the target user within N days may be used as the multimedia object set of the preset statistical period.
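The screening described above can be illustrated with a small sketch. The record layout (`PlayRecord`), the function name `build_object_set`, the decision to require both conditions at once, and the sample records are all assumptions made for illustration; the source only says the conditions "may be" completion, collection, or the like.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PlayRecord:
    user_id: str
    song_id: str
    day: date
    completed: bool  # the song was played from beginning to end
    favorited: bool  # the user explicitly collected the song


def build_object_set(records, user_id, end_day, n_days):
    """Return the songs of one user that pass the preset conditions.

    Only records inside the n_days statistical period ending at end_day
    are considered; both conditions are required here (an assumption).
    """
    start_day = end_day - timedelta(days=n_days - 1)
    return {
        r.song_id
        for r in records
        if r.user_id == user_id
        and start_day <= r.day <= end_day
        and r.completed
        and r.favorited
    }


# Made-up sample records for illustration only.
records = [
    PlayRecord("U-1", "S-1", date(2022, 9, 1), completed=True, favorited=True),
    PlayRecord("U-1", "S-2", date(2022, 9, 1), completed=False, favorited=True),
    PlayRecord("U-1", "S-3", date(2022, 5, 1), completed=True, favorited=True),
    PlayRecord("U-2", "S-4", date(2022, 9, 1), completed=True, favorited=True),
]
object_set = build_object_set(records, "U-1", end_day=date(2022, 9, 2), n_days=90)
```

In this sample only S-1 survives: S-2 was not played to completion, S-3 falls outside the 90-day period, and S-4 belongs to another user.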
In an exemplary embodiment of the present disclosure, the target user belongs to the intersection of the play-completed user sets corresponding to any two multimedia objects in the multimedia object set.
In an exemplary embodiment of the disclosure, the performing interaction behavior statistics on the multimedia objects in the set of multimedia objects to determine time-series interaction data based on time-series analysis includes:
S11, counting the total complete-play count of the multimedia object in a preset statistical period and the distribution of that count along the time axis of the preset statistical period;
and S12, constructing time sequence interactive data based on time sequence analysis from the total complete-play count and its distribution.
Specifically, taking music playback as an example, the multimedia objects in the multimedia object set may be songs that the user has marked as favorites ("red-heart" songs) by clicking to collect them or through a similar click operation. The interactive behavior may be the user's complete-play behavior on a song, that is, the user clicking a song and playing it from beginning to end. Specifically, the number of complete plays of each red-heart song within the statistical period may be counted, and the distribution of the complete-play count of each red-heart song over the preset statistical period may be determined.
In the multimedia object set, screening for songs that the user has both played to completion and red-heart collected effectively removes noisy interactive behaviors. For example, a user may randomly play a large number of songs on the same day, and such implicit behavior hardly indicates that the user likes those songs; the user's explicit collection behavior is therefore introduced for effective data screening. By grouping on the user and the corresponding red-heart song and counting, the total complete-play count of a given user for a given song over N days and the distribution of the complete plays along the N-day time axis are calculated. After the interactive behaviors are cleaned and counted, the time sequence interactive data of user songs based on time sequence analysis is obtained. For example, in the user song interaction information table shown in Table 1, the key in the time sequence distribution field indicates the relative time and the value indicates the number of complete plays on that day; for example, red-heart song S-1 of user U-1 has a total complete-play count of 16 in the N-day statistical period, with 8 complete plays on the first day of the period and 8 on the Nth day.
[Table 1: user song interaction information table (image BDA0003835238220000101)]
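The grouping and counting described above can be sketched as follows. The input layout (tuples of user, song and an integer day index), the function name `build_interaction_table`, and the output field names `total` and `dist` are assumptions; the example reproduces the U-1 / S-1 case from the text with N taken as 90.

```python
from collections import Counter, defaultdict


def build_interaction_table(plays, period_start_day):
    """Group complete plays by (user, song) and count them per relative day.

    plays: iterable of (user_id, song_id, day_index), one entry per complete play.
    Returns {(user_id, song_id): {"total": int, "dist": {relative_day: count}}},
    where relative day 1 is the first day of the statistical period.
    """
    per_pair = defaultdict(Counter)
    for user_id, song_id, day_index in plays:
        relative_day = day_index - period_start_day + 1
        per_pair[(user_id, song_id)][relative_day] += 1
    return {
        pair: {"total": sum(counts.values()), "dist": dict(counts)}
        for pair, counts in per_pair.items()
    }


# U-1 completes S-1 eight times on the first day and eight times on day N = 90.
plays = [("U-1", "S-1", 100)] * 8 + [("U-1", "S-1", 189)] * 8
table = build_interaction_table(plays, period_start_day=100)
```

Here `table[("U-1", "S-1")]` holds a total of 16 and the time sequence distribution `{1: 8, 90: 8}`, matching the key/value layout described for the time sequence distribution field.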
Alternatively, in some exemplary embodiments of the present disclosure, the above-mentioned interactive behavior for the multimedia object may also be related interactive behavior for comments; for example, write comments, read comments; alternatively, the interactive behavior may also be a forwarding or sharing behavior for the song.
In step S2, a first weight parameter between any two multimedia objects in the set of multimedia objects is calculated using the time-series interaction data.
In an exemplary embodiment of the present disclosure, the step S2 may specifically include: calculating a first weight parameter between a first multimedia object and a second multimedia object by using the total complete-play counts of the first multimedia object and the second multimedia object in a preset statistical period; the first multimedia object and the second multimedia object are any two multimedia objects in the multimedia object set.
Specifically, for the multimedia object set, any two songs may be selected from the multimedia object set, and the degree of association between the selected any two songs may be calculated as the first weighting parameter.
Specifically, for all songs of the same user, the weight between each two songs is calculated, the co-occurrence information is captured, the input parameter is the identifier of the user and the two songs, and the weight between [0,1] is output, and the calculation formula may include:
Figure BDA0003835238220000102
where
Figure BDA0003835238220000103
is the association weight of the two songs, calculated from the interaction information of user u with songs s_i and s_j; c(·) gives the user's play count for a given song, and the function f(·) computes the weight from the ratio of the user's play counts for the two songs. The function f(·) may vary with the actual scenario, as long as it satisfies the above definition and outputs a weight in [0,1].
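One possible form of the first weight parameter can be sketched as follows. The exact formula is contained in a figure that is not reproduced in this text, so the choice of f as the smaller play count divided by the larger one is an assumption; it merely satisfies the stated requirements (a ratio of the two play counts, with output in [0,1]). The names `first_weight` and `play_counts` are hypothetical.

```python
def first_weight(play_counts, user, song_i, song_j):
    """Association weight in [0, 1] between two songs of one user.

    play_counts maps (user, song) to c(.), the user's play count for
    that song. ASSUMPTION: f is min/max of the two counts; the patent
    only requires a ratio-based function with output in [0, 1].
    """
    c_i = play_counts.get((user, song_i), 0)
    c_j = play_counts.get((user, song_j), 0)
    if c_i == 0 or c_j == 0:
        return 0.0  # no co-occurrence for this user
    return min(c_i, c_j) / max(c_i, c_j)


counts = {("U-1", "S-1"): 16, ("U-1", "S-2"): 8}
w1 = first_weight(counts, "U-1", "S-1", "S-2")
```

Under this assumed f, two songs that a user plays about equally often receive a weight near 1, while a heavily played song paired with a rarely played one receives a weight near 0.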
In step S3, a second weight parameter between any two multimedia objects in the set of multimedia objects is calculated according to the time sequence distribution information of the time sequence interactive data.
In an exemplary embodiment of the present disclosure, the first weight parameter calculated in the above step gives the association weight between songs, but it does not involve timing information. It does not account for the fact that the time-series distributions of two songs listened to by the same user may differ greatly, or that the user's taste may have changed. Referring to fig. 2, two songs played by the same user are not necessarily similar, and the effect is especially severe when the raw data spans a long time (for example, N equal to 90 days). Ignoring the time-series distribution information may therefore introduce a large amount of negative information. Step S3 thus starts from time-series distribution similarity and dynamically weights the association between the two songs according to the similarity of their distributions.
Specifically, referring to fig. 5, the step S3 may specifically include:
step S31, determining the corresponding marginal distributions according to the time-series distributions of the completed-play counts of the first multimedia object and the second multimedia object;
step S32, determining the joint distribution corresponding to the first multimedia object and the second multimedia object according to their marginal distributions;
step S33, calculating the minimum movement matching distance between the time-series distributions of the first multimedia object and the second multimedia object based on the difference between their time-series distributions and on the joint distribution;
and step S34, determining a second weight parameter between the first multimedia object and the second multimedia object by combining the optimal joint distribution for the minimum movement matching distance with the predefined movement cost between multimedia objects.
Specifically, any two multimedia objects may be selected from the multimedia object set as the first multimedia object and the second multimedia object for calculation. Taking a song set as an example, any two songs may be selected, and a time-series-distribution-aware dynamic weight is calculated as the second weight parameter. The aim is to measure the time-series distribution similarity between different songs accurately, avoiding both the inaccuracy that results from relying on a single index and the excess of hand-tuned empirical values, with the resulting lack of practicability, that comes from combining multiple indexes. Therefore, this embodiment measures similarity with the minimum movement matching distance between the songs' time-series distributions, and normalizes by the song play count to obtain each song's discrete probability distribution (time-series distribution) over the N days. Suppose the time-series distributions of user u's plays of songs s_i and s_j are
Figure BDA0003835238220000121
and
Figure BDA0003835238220000122
as shown in fig. 3. By the properties of a probability distribution, the probabilities of all possible cases sum to 1, which is expressed by the formula:
Figure BDA0003835238220000123
If
Figure BDA0003835238220000124
then the excess at k (i.e.,
Figure BDA0003835238220000125
e.g., the distribution on day N-1) must be moved elsewhere; if
Figure BDA0003835238220000126
then mass from elsewhere must be moved to k (i.e.,
Figure BDA0003835238220000127
e.g., the distribution on day 1).
Define a joint distribution
Figure BDA0003835238220000128
which denotes the amount moved from x to y, and whose marginal distributions are
Figure BDA0003835238220000129
and
Figure BDA00038352382200001210
The formula may include:
Figure BDA00038352382200001211
where the cost of moving from x to y is defined as d(x, y); specifically, the L1 norm may be used, or the cost may be defined according to the specific scenario. The formula may include:
d(x,y)=|x-y|
Finally, the minimum movement matching distance between the time-series distributions of the two songs is calculated and normalized, which may specifically include the following formulas:
Figure BDA00038352382200001212
Figure BDA00038352382200001213
where
Figure BDA00038352382200001214
is the optimal joint distribution that attains the minimum movement matching distance. Since both time-series distributions are discrete, the optimal joint distribution can be obtained by linear programming. Specifically, it may be solved by using a scipy.
Figure BDA00038352382200001215
is the second weight parameter between the first multimedia object and the second multimedia object.
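The linear program described above can be sketched with SciPy as follows. The text names scipy but the specific routine is truncated in this copy, so the use of `scipy.optimize.linprog` here is an assumption, as are the function name `min_movement_distance` and the zero-based day indexing. The objective is the total cost of the joint distribution under d(x, y) = |x - y|, and the equality constraints force its marginal distributions to equal the two time-series distributions.

```python
import numpy as np
from scipy.optimize import linprog


def min_movement_distance(p, q):
    """Minimum movement matching distance between two discrete
    distributions p and q defined over the same N days.

    Returns (distance, optimal_joint) where optimal_joint[x, y] is the
    amount moved from day x to day y.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    n = len(p)
    days = np.arange(n)
    # Cost d(x, y) = |x - y|, flattened row-major to match the variables.
    cost = np.abs(days[:, None] - days[None, :]).astype(float).ravel()
    a_eq = np.zeros((2 * n, n * n))
    for x in range(n):
        a_eq[x, x * n:(x + 1) * n] = 1.0  # sum over y of joint[x, y] = p[x]
    for y in range(n):
        a_eq[n + y, y::n] = 1.0           # sum over x of joint[x, y] = q[y]
    b_eq = np.concatenate([p, q])
    result = linprog(cost, A_eq=a_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    if not result.success:
        raise ValueError(result.message)
    return result.fun, result.x.reshape(n, n)


distance, joint = min_movement_distance([0.5, 0.5, 0.0, 0.0],
                                        [0.0, 0.5, 0.5, 0.0])
```

Both inputs must sum to the same total (1 for probability distributions); otherwise the constraints are infeasible and the sketch raises an error.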
In addition, by performing normalization processing, weights between [0,1] can be obtained for dynamically adjusting the song association weights. Specifically, the normalization process may be calculated by dividing the minimum moving matching distance by the maximum moving distance.
In addition, when the multimedia object is data of other types, the calculation mode of normalization can be customized according to the scene.
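The normalization step can be illustrated with a small standard-library sketch. Two points here are assumptions rather than statements of the disclosure. First, the distance is computed with a shortcut instead of linear programming: for distributions on a shared day axis with cost |x - y|, the minimum movement matching distance equals the sum of absolute differences of the two cumulative distributions. Second, the maximum moving distance is taken as N - 1 (all mass moved from the first day to the last), and the weight is taken as one minus the normalized distance so that similar distributions score higher; the original formula is in a figure not reproduced here, so the direction of the weight is not confirmed. The function names are hypothetical.

```python
from itertools import accumulate


def movement_distance_1d(p, q):
    """Minimum movement matching distance for cost |x - y| on a shared
    day axis, via cumulative sums (shortcut, not the LP itself)."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of days")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def second_weight(p, q):
    """Weight in [0, 1]. ASSUMPTION: 1 - distance / (N - 1)."""
    n = len(p)
    if n < 2:
        return 1.0  # a single day leaves nothing to move
    normalized = movement_distance_1d(p, q) / (n - 1)
    return 1.0 - normalized


w_far = second_weight([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0])
w_same = second_weight([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
```

If a different movement cost or another type of multimedia object is used, both the shortcut and the N - 1 bound would need to be replaced accordingly.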
For example, assume N equals 4;
Figure BDA00038352382200001216
the distribution is {1, 0.4,2, 0.3, 3;
Figure BDA0003835238220000131
the distribution is {1, 0.3, 3; the optimal joint distribution is found from the linear programming as follows:
Figure BDA0003835238220000132
where the value of the x row and y column represents the amount moved from x to y.
Then
Figure BDA0003835238220000133
is calculated as follows:
Figure BDA0003835238220000134
the calculation results are shown in fig. 4.
The above is the minimum movement matching distance; if an arbitrary joint distribution is used instead of the optimal joint distribution, the resulting movement matching distance cannot be less than 0.3. Taking
Figure BDA0003835238220000135
as an example:
Figure BDA0003835238220000136
the corresponding
Figure BDA0003835238220000137
is calculated as follows:
Figure BDA0003835238220000138
In step S4, an aggregation operation is performed on any two multimedia objects in the multimedia object set based on the first weight parameter and the second weight parameter to obtain an association matrix, so that data recommendation can be performed by using the association matrix.
In some exemplary embodiments of the present disclosure, the step S4 may specifically include:
step S41, carrying out aggregation operation by using the first weight parameter and the second weight parameter of any two multimedia objects in the multimedia object set and combining the played user sets corresponding to the two multimedia objects, and determining the aggregation weights corresponding to the two multimedia objects;
and S42, constructing an i2i incidence matrix according to the aggregation weights corresponding to any two multimedia objects in the multimedia object set.
Specifically, after the first weight parameter and the second weight parameter (namely, the inter-song association weight and the time-series-distribution-aware dynamic weight, both fine-grained representations at the level of the individual user) between any two multimedia objects in the multimedia object set are obtained, an aggregation function is applied to gather the individual knowledge into collective intelligence. The aggregation function may include:
Figure BDA0003835238220000141
where NE_i denotes the set of users who have completed a play of song s_i. The numerator of the formula aggregates the information of the users in the intersection of those who have played song s_i and those who have played song s_j; weight1 and weight2 are the first weight parameter and the second weight parameter respectively, and weight2 dynamically adjusts weight1 to achieve fine-grained adjustment. In the denominator of the formula,
Figure BDA0003835238220000142
are normalization terms, which not only normalize the result but also balance popular and unpopular songs. For example, if s_j is a popular song, the normalization penalty is larger.
Through the steps, each multimedia object in the multimedia object set is respectively operated, and the obtained aggregation weight is used for constructing the incidence matrix of i2 i. For the incidence matrix, the incidence matrix can be stored in the database in the form of an inverted index.
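The aggregation and the inverted-index storage can be sketched as follows. The aggregation formula itself is in a figure not reproduced in this text, so two choices below are assumptions: weight2 adjusts weight1 by multiplication, and the normalization term in the denominator is the square root of |NE_i| times |NE_j|. The function name `build_i2i_index` and the input layouts are hypothetical.

```python
import math
from collections import defaultdict


def build_i2i_index(pair_weights, listeners):
    """Aggregate per-user weights into an i2i inverted index.

    pair_weights: {(user, song_i, song_j): (weight1, weight2)}
    listeners:    {song: set of users who completed a play of it} (NE)
    Returns {song_i: [(song_j, score), ...]} sorted by descending score.
    ASSUMPTIONS: numerator sums weight1 * weight2 over users in
    NE_i & NE_j; denominator is sqrt(|NE_i| * |NE_j|).
    """
    numerators = defaultdict(float)
    for (user, s_i, s_j), (w1, w2) in pair_weights.items():
        # Only users in the intersection of both listener sets contribute.
        if user in listeners.get(s_i, ()) and user in listeners.get(s_j, ()):
            numerators[(s_i, s_j)] += w1 * w2
    index = defaultdict(list)
    for (s_i, s_j), total in numerators.items():
        norm = math.sqrt(len(listeners[s_i]) * len(listeners[s_j]))
        index[s_i].append((s_j, total / norm))
    for related in index.values():
        related.sort(key=lambda item: item[1], reverse=True)
    return dict(index)


listeners = {
    "A": {"u1", "u2"},
    "B": {"u1", "u2"},
    "C": {"u1", "u3", "u4", "u5", "u6", "u7", "u8", "u9"},  # popular song
}
pair_weights = {
    ("u1", "A", "B"): (1.0, 1.0),
    ("u2", "A", "B"): (0.5, 1.0),
    ("u1", "A", "C"): (1.0, 1.0),
}
index = build_i2i_index(pair_weights, listeners)
```

With this assumed denominator, the popular song C is penalized relative to B even for the same per-user weight, which is the balancing effect the normalization terms are described as providing.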
In some exemplary embodiments of the present disclosure, the method described above may further include:
step S51, responding to a data recall request triggered by a currently played multimedia object, and inquiring the incidence matrix by using the currently played multimedia object to acquire a recalled multimedia object;
and S52, pushing the recalled multimedia object.
Specifically, when the user plays a song on the terminal device, a data recall request may be triggered according to the currently played song. For example, when the user plays songs in shuffle mode, a data recall request may be triggered based on the currently played song. The identification information of the currently played song can then be used to query the association matrix in the database; the top-K songs with the highest association to the current song are retrieved as recalled multimedia objects and pushed to the user on the terminal device. This enables fast online recall and can improve the hit rate of the recalled data. In addition, a cache may be placed in front of the database to reduce the number of database accesses.
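The online recall path can be sketched as follows. The in-memory dictionary `I2I_INDEX` is a hypothetical stand-in for the database holding the inverted index, and `functools.lru_cache` stands in for the cache placed in front of it; neither is specified by the disclosure.

```python
from functools import lru_cache

# Hypothetical stand-in for the database that stores the i2i inverted
# index; each list is already sorted by descending association.
I2I_INDEX = {
    "S-1": [("S-2", 0.75), ("S-3", 0.25), ("S-4", 0.10)],
}


@lru_cache(maxsize=1024)
def lookup_related(song_id):
    """Simulates one database read; results are cached per song."""
    return tuple(I2I_INDEX.get(song_id, ()))


def recall(song_id, k):
    """Return the identifiers of the top-K associated songs."""
    return [song for song, _ in lookup_related(song_id)[:k]]


first = recall("S-1", 2)
second = recall("S-1", 1)  # served from the cache, no second "read"
```

Because the index is stored pre-sorted, the online step reduces to one keyed lookup and a slice, which is what makes the recall fast.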
In summary, the data recommendation method provided by the present disclosure obtains time-series interaction data for time-series analysis by performing interactive behavior statistics on multimedia objects, and achieves fine-grained dynamic balancing of the interaction information, which not only effectively removes noise information but also alleviates the long-tail effect to a certain extent. Moreover, the relationships between interactions can be balanced accurately, so that interaction information from a long past period can be used for training without concern about introducing negative information. In addition, the first weight parameter and the second weight parameter are calculated from the time-series interaction data, which introduces the time dimension, expands the two-dimensional interaction matrix into a three-dimensional one, and realizes a three-dimensional user-item-time recall strategy, making it possible to capture associations between songs that are tied to a particular time. The association matrix is obtained by aggregating the first weight parameter and the second weight parameter with the aggregation function, so that the shared information is gathered and globally normalized, finally yielding accurate collective intelligence. Collaborative filtering recall based on time-series distribution awareness is thereby realized.
Exemplary devices
Having introduced the data recommendation method of the exemplary embodiment of the present invention, next, a data recommendation apparatus of the exemplary embodiment of the present invention is described with reference to fig. 7.
Referring to fig. 7, a data recommendation apparatus 70 according to an exemplary embodiment of the present invention may include: a time sequence interactive data statistic module 701, a first weight parameter calculation module 702, a second weight parameter calculation module 703 and a database construction module 704, wherein:
the timing sequence interactive data statistics module 701 may be configured to obtain a multimedia object set of a target user in a preset statistics period, and perform interactive behavior statistics on multimedia objects in the multimedia object set to determine timing sequence interactive data based on timing sequence analysis.
The first weight parameter calculation module 702 may be configured to calculate a first weight parameter between any two multimedia objects in the set of multimedia objects using the time-series interaction data.
The second weight parameter calculating module 703 may be configured to calculate a second weight parameter between any two multimedia objects in the multimedia object set according to the time sequence distribution information of the time sequence interactive data.
The database construction module 704 may be configured to perform an aggregation operation on any two multimedia objects in the set of multimedia objects based on the first weight parameter and the second weight parameter to obtain an association matrix, so as to perform data recommendation by using the association matrix.
According to an exemplary embodiment of the present disclosure, the time-series interactive data statistics module may be further configured to count the total number of completed plays of the multimedia object in a preset statistical period, and the distribution of the completed plays along the time axis within the preset statistical period; and to construct time-series interaction data for time-series analysis based on the total number of completed plays and the distribution of the completed plays.
According to an exemplary embodiment of the present disclosure, the first weight parameter calculation module may further include: calculating a first weight parameter between a first multimedia object and a second multimedia object by using the total playing amount of the first multimedia object and the second multimedia object in a preset statistical period; the first multimedia object and the second multimedia object are any two multimedia objects in the multimedia object set.
According to an exemplary embodiment of the present disclosure, the second weight parameter calculation module may be configured to: determine the corresponding marginal distributions according to the time-series distributions of the completed-play counts of the first multimedia object and the second multimedia object; determine the joint distribution corresponding to the first multimedia object and the second multimedia object according to their marginal distributions; calculate the minimum movement matching distance between the time-series distributions of the first multimedia object and the second multimedia object based on the difference between their time-series distributions and on the joint distribution; and determine a second weight parameter between the first multimedia object and the second multimedia object by combining the optimal joint distribution for the minimum movement matching distance with the predefined movement cost between multimedia objects.
According to an exemplary embodiment of the present disclosure, the correlation matrix is an i2i correlation matrix; the database construction module comprises: performing aggregation operation by using the first weight parameter and the second weight parameter of any two multimedia objects in the multimedia object set and combining the play-completed user sets corresponding to the two multimedia objects to determine aggregation weights corresponding to the two multimedia objects; and constructing an i2i incidence matrix according to the aggregation weights corresponding to any two multimedia objects in the multimedia object set.
According to an exemplary embodiment of the present disclosure, the target user belongs to the intersection of the completed-play user sets corresponding to any two multimedia objects in the multimedia object set.
According to an exemplary embodiment of the disclosure, the time series interaction data statistics module is further configured to: collecting multimedia operation behavior historical data of a target user in the preset statistical period; and screening the multimedia operation behavior historical data according to preset conditions to construct the multimedia object set of the target user in a preset statistical period according to screening results.
According to an exemplary embodiment of the present disclosure, the apparatus 70 may further include: and a data recommendation module.
The data recommendation module may be configured to, in response to a data recall request triggered by a currently playing multimedia object, query the association matrix using the currently playing multimedia object to obtain a recalled multimedia object; and pushing the recalled multimedia object.
Since each functional module of the data recommendation apparatus of the embodiment of the present invention corresponds to a step of the data recommendation method of the embodiment of the present invention, the details are not repeated here.
Exemplary storage Medium
Having described the data recommendation method and apparatus according to exemplary embodiments of the present invention, a storage medium according to an exemplary embodiment of the present invention will be described with reference to fig. 9.
Referring to fig. 9, a program product 90 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary electronic device
Having described the storage medium of the exemplary embodiment of the present invention, next, an electronic apparatus of the exemplary embodiment of the present invention will be described with reference to fig. 8.
The electronic device 800 shown in fig. 8 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, a bus 830 connecting different system components (including the memory unit 820 and the processing unit 810), and a display unit 840.
Wherein the storage unit stores program code that is executable by the processing unit 810 to cause the processing unit 810 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 810 may perform steps S1 to S3 as shown in fig. 1, or the processing unit 810 may perform steps S0 to S5 as shown in fig. 2.
The memory unit 820 may include a volatile memory unit such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may include a data bus, an address bus, and a control bus.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), which may be through an input/output (I/O) interface 850. The electronic device 800 further comprises a display unit 840 connected to the input/output (I/O) interface 850 for displaying. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be noted that although several modules or sub-modules of the data recommendation apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module according to embodiments of the invention. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects imply that features in those aspects cannot be combined to advantage; such division is merely for convenience of presentation. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for recommending data, comprising:
acquiring a multimedia object set of a target user in a preset statistical period, and performing interactive behavior statistics on multimedia objects in the multimedia object set to determine time sequence interactive data based on time sequence analysis;
calculating a first weight parameter between any two multimedia objects in the multimedia object set by using the time sequence interactive data;
calculating a second weight parameter between any two multimedia objects in the multimedia object set according to the time sequence distribution information of the time sequence interactive data;
and performing aggregation operation on any two multimedia objects in the multimedia object set based on the first weight parameter and the second weight parameter to obtain an incidence matrix, so as to perform data recommendation by using the incidence matrix.
2. The data recommendation method of claim 1, wherein performing interactive behavior statistics on the multimedia objects in the set of multimedia objects to determine time-series interactive data based on time-series analysis comprises:
counting the total number of completed plays of the multimedia object in a preset statistical period and the distribution of the completed plays along the time axis within the preset statistical period;
and constructing time sequence interactive data based on time sequence analysis based on the total number of completed plays and the distribution of the completed plays.
3. The method according to claim 1 or 2, wherein the calculating a first weight parameter between any two multimedia objects in the set of multimedia objects by using the time-series interaction data comprises:
calculating a first weight parameter between a first multimedia object and a second multimedia object by using the total playing amount of the first multimedia object and the second multimedia object in a preset statistical period; the first multimedia object and the second multimedia object are any two multimedia objects in the multimedia object set.
4. The data recommendation method according to claim 1 or 2, wherein said calculating a second weight parameter between any two multimedia objects in the set of multimedia objects according to the time sequence distribution information of the time sequence interactive data comprises:
determining the corresponding marginal distributions according to the time sequence distributions of the completed-play counts of the first multimedia object and the second multimedia object;
determining the joint distribution corresponding to the first multimedia object and the second multimedia object according to the marginal distributions corresponding to the first multimedia object and the second multimedia object;
calculating the minimum movement matching distance between the time sequence distributions of the first multimedia object and the second multimedia object based on the difference between their time sequence distributions and on the joint distribution;
determining a second weight parameter between the first multimedia object and the second multimedia object by combining the optimal joint distribution for the minimum movement matching distance with the predefined movement cost between multimedia objects.
5. The data recommendation method of claim 1, wherein the correlation matrix is an i2i correlation matrix;
the performing aggregation operation based on the first weight parameter and the second weight parameter to obtain the incidence matrix includes:
performing aggregation operation by using the first weight parameter and the second weight parameter of any two multimedia objects in the multimedia object set and combining the play-completed user sets corresponding to the two multimedia objects to determine aggregation weights corresponding to the two multimedia objects;
and constructing an i2i association matrix according to the aggregation weights corresponding to any two multimedia objects in the multimedia object set.
6. The data recommendation method of claim 1, wherein the target user belongs to the intersection of the completed-play user sets corresponding to any two multimedia objects in the multimedia object set.
7. The data recommendation method according to claim 1, wherein the obtaining of the set of multimedia objects of the target user in a preset statistical period comprises:
collecting the multimedia operation behavior historical data of the target user in the preset statistical period;
and screening the multimedia operation behavior historical data according to preset conditions to construct the multimedia object set of the target user in a preset statistical period according to screening results.
8. A data recommendation device, comprising:
the time sequence interactive data statistics module is used for acquiring a multimedia object set of a target user in a preset statistics period, and performing interactive behavior statistics on multimedia objects in the multimedia object set to determine time sequence interactive data based on time sequence analysis;
the first weight parameter calculation module is used for calculating a first weight parameter between any two multimedia objects in the multimedia object set by utilizing the time sequence interactive data;
the second weight parameter calculation module is used for calculating a second weight parameter between any two multimedia objects in the multimedia object set according to the time sequence distribution information of the time sequence interactive data;
and the database construction module is used for performing aggregation operation on any two multimedia objects in the multimedia object set based on the first weight parameter and the second weight parameter to obtain an incidence matrix so as to recommend data by using the incidence matrix.
9. A storage medium having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, implements the data recommendation method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the data recommendation method of any one of claims 1-7 via execution of the executable instructions.
CN202211086246.6A 2022-09-06 2022-09-06 Data recommendation method and device, storage medium and electronic equipment Pending CN115438201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211086246.6A CN115438201A (en) 2022-09-06 2022-09-06 Data recommendation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211086246.6A CN115438201A (en) 2022-09-06 2022-09-06 Data recommendation method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115438201A true CN115438201A (en) 2022-12-06

Family

ID=84247509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211086246.6A Pending CN115438201A (en) 2022-09-06 2022-09-06 Data recommendation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115438201A (en)

Similar Documents

Publication Publication Date Title
CN108921221B (en) User feature generation method, device, equipment and storage medium
US20170139912A1 (en) Cross media recommendation
CN110704674B (en) Video playing integrity prediction method and device
CN101981574B (en) Distributed media fingerprint repositories
CN111126495B (en) Model training method, information prediction device, storage medium and equipment
CN112052387B (en) Content recommendation method, device and computer readable storage medium
WO2020157283A1 (en) Method for recommending video content
CN111159563A (en) Method, device and equipment for determining user interest point information and storage medium
CN107679186B (en) Method and device for searching entity based on entity library
CN111209469A (en) Personalized recommendation method and device, computer equipment and storage medium
CN111309966A (en) Audio matching method, device, equipment and storage medium
WO2019242453A1 (en) Information processing method and device, storage medium, and electronic device
CN113297486B (en) Click rate prediction method and related device
US20220238087A1 (en) Methods and systems for determining compact semantic representations of digital audio signals
CN111460215B (en) Audio data processing method and device, computer equipment and storage medium
CN110569447B (en) Network resource recommendation method and device and storage medium
CN115618024A (en) Multimedia recommendation method and device and electronic equipment
Yan Audience evaluation and analysis of symphony performance effects based on the genetic neural network algorithm for the multilayer perceptron (ga-mlp-nn)
CN116257758A (en) Model training method, crowd expanding method, medium, device and computing equipment
CN115438201A (en) Data recommendation method and device, storage medium and electronic equipment
CN113377640B (en) Method, medium, device and computing equipment for explaining model under business scene
CN115129922A (en) Search term generation method, model training method, medium, device and equipment
Sakthivelan et al. RETRACTED ARTICLE: A video analysis on user feedback based recommendation using A-FP hybrid algorithm
Korzeniowski et al. Artist Similarity for Everyone: A Graph Neural Network Approach.
CN1662921A (en) Method to compare various initial cluster sets to determine the best initial set for clustering a set of TV shows

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination