CN106250499B - Video pair mining method and device - Google Patents

Video pair mining method and device

Info

Publication number
CN106250499B
Authority
CN
China
Prior art keywords
video
click
pair
identifier
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610624818.XA
Other languages
Chinese (zh)
Other versions
CN106250499A (en
Inventor
吴凯凯
王世强
单明辉
尹玉宗
姚键
潘柏宇
王冀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN201610624818.XA
Publication of CN106250499A
Application granted
Publication of CN106250499B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The application discloses a video pair mining method and device. The method comprises: obtaining sample data of each video pair, the sample data comprising a video identifier of a request video, a video identifier of a click video, and the click rate of the click video when a user watches the request video; determining a probability matrix according to the video identifier of the request video, the video identifier of the click video and the click rate of the click video; calculating the N-step transition matrix corresponding to the probability matrix; and mining video pairs according to the calculated N-step transition matrix and the obtained video pair sample data. With this method, potential video pairs, that is, video pairs with an indirect request-click relationship, can be mined effectively, which enriches the video pairs in the video recommendation system and can improve its recommendation accuracy.

Description

Video pair mining method and device
Technical Field
The application relates to the technical field of computers, in particular to a video pair mining method and device.
Background
With the continuous development of network technology, more and more users can acquire videos through a network.
At present, to serve users better, when the server provides the video that a user has currently requested (that is, the video the user currently wants to watch), a video recommendation system in the server also recommends videos related to the requested video. While watching the requested video, the user may click one of the recommended videos if it is a video the user wants to watch.
Further, to recommend click videos better, a video click probability prediction model is generally trained on training features such as user information and, for each video pair that has appeared historically, the ratio of the number of times the click video in the pair was watched by users to the number of times the request video in the pair was requested. Then, given the user's current request video, the video recommendation system uses the model to predict the click probability of each video to be recommended, sorts the candidates by click probability from high to low, and recommends to the user those whose click probability is greater than a preset threshold.
However, throughout model training and click probability prediction, the feature based on this ratio (defined in this application as the click rate of the click video) covers only video pairs that have appeared historically. In practice, given a request video, there may be a video that has never been recommended to the user for that request video but that forms a video pair with the click video of a pair containing the request video. In other words, the request video and the never-recommended video have a request-click relationship that runs through other video pairs. For example, video pair 1 consists of request video A and click video B, and video pair 2 consists of request video B and click video C, where click video B and request video B are the same video. Request video A and click video C then have a request-click relationship: if the server recommends video C while the user is watching request video A, the user may click video C.
To train a better model and predict the click probability of videos to be recommended more accurately, the relationships between such potential video pairs can be mined. This increases the data covered by the training feature, namely the ratio of the number of times the click video in a video pair is clicked to the number of times the request video in the pair is requested, beyond the video pairs that have appeared historically.
Disclosure of Invention
The embodiments of the application provide a video pair mining method and device to solve the problem that the prior art has no means of mining potential video pairs, that is, video pairs with an indirect request-click relationship.
The video pair mining method provided by the embodiment of the application comprises the following steps:
acquiring sample data of each video pair, wherein the sample data of each video pair comprises a video identifier of a requested video, a video identifier of a clicked video and the click rate of the clicked video under the condition that a user watches the requested video;
determining a probability matrix according to the video identifier of the request video, the video identifier of the click video and the click rate of the click video;
calculating an N-step transition matrix corresponding to the probability matrix, wherein N is an integer greater than 1;
mining video pairs according to the calculated N-step transition matrix and the acquired sample data of each video pair,
wherein a video pair consists of a request video and a click video, and the click video is a video watched by the user among the videos recommended to the user according to the request video.
The video pair mining device provided by the embodiment of the application comprises:
the acquisition module is used for acquiring sample data of each video pair, wherein the sample data of the video pair comprises a video identifier of a request video, a video identifier of a click video and the click rate of the click video when a user watches the request video;
the determining module is used for determining a probability matrix according to the video identifier of the request video, the video identifier of the click video and the click rate of the click video;
the calculation module is used for calculating an N-step transition matrix corresponding to the probability matrix, wherein N is an integer greater than 1;
and the mining module is used for mining video pairs according to the calculated N-step transition matrix and the acquired sample data of each video pair, wherein a video pair consists of a request video and a click video, and the click video is a video watched by the user among the videos recommended to the user according to the request video.
The embodiments of the application provide a video pair mining method and device. The method obtains video pair sample data comprising a video identifier of a request video, a video identifier of a click video, and the click rate of the click video when a user watches the request video; determines a probability matrix according to the video identifier of the request video, the video identifier of the click video and the click rate of the click video; calculates the N-step transition matrix corresponding to the probability matrix; and mines video pairs according to the calculated N-step transition matrix and the obtained video pair sample data. With this method, potential video pairs, that is, video pairs with an indirect request-click relationship, can be mined effectively, which enriches the video pairs in the video recommendation system and can improve its recommendation accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a schematic diagram of a video pair mining process provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a probability matrix result provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a video pair mining device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 shows a video pair mining process provided in an embodiment of the present application, which specifically includes the following steps:
S101: Acquire sample data of each video pair.
In the present application, in order to be able to mine potential video pairs, sample data of each video pair needs to be acquired first.
It should be noted that the acquired video pair sample data is data of video pairs that have appeared historically. Each such video pair consists of a video that the user requested from the server because the user wanted to watch it (the request video) and a video that the user watched among the videos recommended according to that request video (the click video). A video pair contains exactly one request video and one click video.
The video pair data consists of the video identifier of the request video, the video identifier of the click video, and the click rate of the click video when the user watches the request video. The click rate is specific to a video pair: it is the ratio of the number of times the click video in the pair was watched by users, when it was recommended according to the request video in the pair, to the number of times that request video was requested.
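The click rate defined above can be illustrated with a minimal Python sketch. The function name `click_rate`, the dictionary layout and the counts 20 and 100 are assumptions made for illustration; the patent specifies only the ratio itself.

```python
def click_rate(click_views: int, requests: int) -> float:
    """Ratio of views of the click video to requests of the request video."""
    if requests == 0:
        return 0.0  # assumption: a pair whose request video was never requested gets 0
    return click_views / requests

# Hypothetical counts for one video pair: request video 1, click video 9.
sample = {
    "request_id": 1,
    "click_id": 9,
    "click_rate": click_rate(click_views=20, requests=100),
}
```

Each record of this shape corresponds to the sample data of one video pair.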
S102: Determine a probability matrix according to the video identifier of the request video, the video identifier of the click video and the click rate of the click video.
In this application, the video pairs to be mined are determined mainly through the transition matrix, and obtaining the transition matrix requires a probability matrix to be established first. Therefore, after the sample data of each video pair is obtained, the probability matrix is established from that sample data.
Further, the application provides a way of establishing the probability matrix from the sample data of each video pair: the video identifier of each request video is used as a row of the probability matrix, the video identifier of each click video is used as a column, and the click rate of the click video in each video pair's sample data is used as the element value. For example, assume there are three video pairs: video pair A, with request video identifier 1, click video identifier 9 and click rate 0.2; video pair B, with request video identifier 3, click video identifier 10 and click rate 0.5; and video pair C, with request video identifier 5, click video identifier 6 and click rate 0.7. The probability matrix determined in this way is shown in Fig. 2.
It should be noted that any element position in the probability matrix that has no corresponding video pair may be filled with 0 directly.
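The construction just described can be sketched in Python using the three example video pairs. This is a minimal illustration, not the patent's implementation: the tuple layout and the function name `build_probability_matrix` are assumptions, and row 0 and column 0 simply stay unused because the example identifiers start at 1.

```python
# (request video id, click video id, click rate) for the three example pairs
samples = [(1, 9, 0.2), (3, 10, 0.5), (5, 6, 0.7)]

def build_probability_matrix(samples):
    """Use identifiers directly as row/column indices; fill missing pairs with 0."""
    rows = max(request_id for request_id, _, _ in samples) + 1
    cols = max(click_id for _, click_id, _ in samples) + 1
    matrix = [[0.0] * cols for _ in range(rows)]
    for request_id, click_id, rate in samples:
        matrix[request_id][click_id] = rate
    return matrix

P = build_probability_matrix(samples)
```

Even for three pairs the matrix has 6 rows and 11 columns, which shows why indexing directly by identifier scales poorly.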
In general, the identifiers of request videos and click videos may reach tens or hundreds of millions, whereas the number of video pairs formed by a given request video and the click videos connected with it is far smaller. With the method above, the number of rows and columns of the whole matrix may therefore reach tens or hundreds of millions, which places a heavy computational load on the computer later. In this application, the distinct video identifiers of the request videos and of the click videos can instead be extracted from the video pair sample data (that is, no two extracted identifiers are the same) and numbered, generating a correspondence between request video identifiers and numbers and a correspondence between click video identifiers and numbers. The probability matrix is then determined from the number of each request video, the number of each click video and the click rate of the click video. This effectively reduces the number of rows and columns of the probability matrix and the subsequent computational load of processing it. For example, for the three video pairs above, the extracted identifiers are 1 (request video) and 9 (click video), 3 (request video) and 10 (click video), and 5 (request video) and 6 (click video), and they are numbered as follows: 1 (the number of request video identifier 1), 2 (the number of click video identifier 9), 3 (the number of request video identifier 3), 4 (the number of click video identifier 10), 5 (the number of request video identifier 5) and 6 (the number of click video identifier 6).
It should be noted that the identifiers may be numbered consecutively, for example 2, 3, 4, 5, or non-consecutively, for example 2, 4, 6, 8, but such numberings do not minimize the number of rows and columns of the probability matrix. The optimal way is to number the video identifiers of all request videos and click videos consecutively starting from 1, which minimizes the number of rows and columns of the probability matrix and thus effectively reduces the subsequent computational load. Although numbering starts from 1 in theory, in an actual computer implementation it needs to start from 0, where 0 denotes the first row or first column of the probability matrix and is equivalent to the theoretical number 1.
Further, the application provides an embodiment of determining the probability matrix from the numbers: the number of the video identifier of each request video is used as a row of the probability matrix, the number of the video identifier of each click video is used as a column, and the click rate of the click video in each video pair's sample data is used as the element value.
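The renumbering step can be sketched as follows in Python. The sketch assumes a single numbering shared by request and click videos, assigned in order of first appearance and starting from 0, which matches the example numbering in the text shifted by one; the names `number_identifiers`, `id_to_num` and `num_to_id` are hypothetical.

```python
samples = [(1, 9, 0.2), (3, 10, 0.5), (5, 6, 0.7)]  # (request id, click id, click rate)

def number_identifiers(samples):
    """Assign consecutive 0-based numbers to distinct video identifiers."""
    id_to_num = {}
    for request_id, click_id, _ in samples:
        for video_id in (request_id, click_id):
            if video_id not in id_to_num:
                id_to_num[video_id] = len(id_to_num)
    return id_to_num

id_to_num = number_identifiers(samples)
num_to_id = {num: video_id for video_id, num in id_to_num.items()}

# Probability matrix stored by number: {(row number, column number): click rate}
entries = {(id_to_num[r], id_to_num[c]): rate for r, c, rate in samples}
```

Keeping `num_to_id` alongside the matrix is what later allows matrix positions to be translated back into video identifiers.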
S103: Calculate the N-step transition matrix corresponding to the probability matrix.
After the probability matrix is established, the N-step transition matrix corresponding to the probability matrix needs to be calculated, where N is an integer greater than 1.
However, the number of valid rows of the probability matrix established in step S102 usually differs from its number of valid columns (here, the largest row number of any non-zero element is the number of valid rows, and the largest column number of any non-zero element is the number of valid columns), in which case the N-step transition matrix cannot be calculated directly. Therefore, before the N-step transition matrix is calculated, the probability matrix needs to be expanded: if the number of rows is greater than the number of columns, columns are added until the total number of columns equals the total number of rows; if the number of columns is greater than the number of rows, rows are added until the total number of rows equals the total number of columns. All elements in the added rows or columns are filled with 0. In actual computer storage, however, to reduce storage overhead, the 0 elements of the probability matrix are not stored; only the non-zero elements are stored.
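A minimal Python sketch of this expansion, assuming the matrix is kept as a dictionary of non-zero elements with 0-based indices. Because the added rows and columns contain only zeros, expanding the matrix changes only its recorded size, not the stored entries. The names `square_size` and `element` are assumptions for illustration.

```python
def square_size(entries):
    """Side length of the matrix after the shorter dimension is padded with zeros."""
    valid_rows = max(row for row, _ in entries) + 1
    valid_cols = max(col for _, col in entries) + 1
    return max(valid_rows, valid_cols)

def element(entries, size, row, col):
    """Read one element; anything not stored is an implicit 0."""
    if not (0 <= row < size and 0 <= col < size):
        raise IndexError("position outside the padded matrix")
    return entries.get((row, col), 0.0)

entries = {(0, 1): 0.2, (2, 3): 0.5, (4, 5): 0.7}  # only non-zero elements
size = square_size(entries)  # 5 valid rows and 6 valid columns give a 6 x 6 matrix
```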
It should be noted that N may be set according to actual conditions. In general, N may be set to 2 to mine one layer of indirect request-click relationships. For example, request video A and click video B form a video pair, and request video B and click video C form a video pair, where request video B and click video B are the same video, so request video A and click video C have one layer of indirect request-click relationship. To mine n layers of indirect request-click relationships, N may be set to n+1.
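The A, B, C example can be checked with a small Python sketch of the 2-step transition matrix. The numbering A=0, B=1, C=2 and the click rates 0.2 and 0.5 are hypothetical values chosen for illustration, and `multiply` is a plain sparse matrix product written for this sketch.

```python
def multiply(a, b):
    """Product of two sparse matrices stored as {(row, col): value}."""
    result = {}
    for (i, k), x in a.items():
        for (k2, j), y in b.items():
            if k == k2:
                result[(i, j)] = result.get((i, j), 0.0) + x * y
    return result

# Hypothetical numbering: A = 0, B = 1, C = 2.
M = {(0, 1): 0.2,   # request video A, click video B
     (1, 2): 0.5}   # request video B, click video C
M2 = multiply(M, M)  # 2-step transition matrix (N = 2)
```

The only non-zero element of `M2` sits at row A, column C, which is the indirect pair that does not appear in `M` itself.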
Further, multiplying N probability matrices in sequence requires N-1 multiplication operations. To reduce the number of multiplications, the N probability matrices may be divided into groups such that at least two groups contain the same number of probability matrices. Among groups that contain the same number of probability matrices, only one is kept for calculation and the duplicates are removed; for each remaining group, all the probability matrices in the group are multiplied together, and the result is reused for the removed duplicate groups. Finally, the products corresponding to all the groups are multiplied together, and the result is the N-step transition matrix corresponding to the probability matrix. For example, assume that the 7-step transition matrix needs to be calculated and M denotes the probability matrix. The 7 probability matrices can be divided into four groups: the first, second and third groups each contain two probability matrices, and the fourth group contains one. Because the first three groups contain the same number of probability matrices, the matrices in only one of them need to be multiplied, and the product is used directly as the product of the other two groups without repeating the multiplication. The fourth group contains a single probability matrix, which is used as is. Finally, the product of the first group is multiplied by the product of the second group, then by the product of the third group, then by the matrix of the fourth group, and the result is the 7-step transition matrix corresponding to the probability matrix. The 7-step transition matrix is thus obtained with only four multiplication operations instead of six.
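The saving in the 7-step example can be verified with a short Python sketch that counts multiplications. The 2 x 2 matrix `M` and the helper names are assumptions for illustration; the grouping follows the example in the text (three groups of two matrices and one group of one).

```python
calls = {"count": 0}  # counts matrix multiplications performed

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    calls["count"] += 1
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power_naive(m, n):
    """Multiply n copies of m in sequence: n - 1 multiplications."""
    result = m
    for _ in range(n - 1):
        result = matmul(result, m)
    return result

def power_7_grouped(m):
    """Groups (M, M), (M, M), (M, M), (M): the pair product is computed once."""
    pair = matmul(m, m)            # shared by the first three groups
    result = matmul(pair, pair)    # group 1 times group 2
    result = matmul(result, pair)  # times group 3
    return matmul(result, m)       # times group 4

M = [[0.0, 0.5], [0.5, 0.0]]  # hypothetical 2 x 2 probability matrix

calls["count"] = 0
naive = power_naive(M, 7)
naive_count = calls["count"]

calls["count"] = 0
grouped = power_7_grouped(M)
grouped_count = calls["count"]
```

Both routes yield the same matrix; the grouped route reuses the product of one pair instead of recomputing it.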
In addition, the application provides a piece of computer code in which the computer multiplies the probability matrix once, as follows:
// read data
val coorMatrix = MTUtils.loadCoordinateMatrix(sc, args(0))
// convert format: CoordinateMatrix → DenseVecMatrix
val denseVecMatrix = coorMatrixTODenseVecMatrix(coorMatrix, row, cols)
// convert format: DenseVecMatrix → SparseVecMatrix
val sparseVecMatrix = denseVecMatrix.toSparseVecMatrix
val leftMatrix = sparseVecMatrix
val rightMatrix = leftMatrix
// matrix multiplication
val multiplyResult = leftMatrix.multiplySparse(rightMatrix)
In the code above, coorMatrixTODenseVecMatrix is a custom function whose main purpose is to force the number of rows and columns of the converted matrix. This avoids dimension mismatches in the subsequent multiplication, which arise when the actual data has no valid entries at the boundary positions and the default conversion therefore yields fewer rows or columns than expected. The corresponding operations are as follows:
[Figure BDA0001067432640000091: listing of the corresponding operations; image not reproduced]
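Since the original listing survives only as an image, the idea of forcing the matrix dimensions during conversion can be illustrated with a Python sketch. This is not the patent's function: `coordinate_to_dense` and its parameters are hypothetical, and the sketch shows only why an inferred shape can come out too small when boundary rows or columns hold no data.

```python
def coordinate_to_dense(entries, rows=None, cols=None):
    """entries maps (row, col) -> value; rows/cols, if given, force the shape."""
    inferred_rows = max(r for r, _ in entries) + 1
    inferred_cols = max(c for _, c in entries) + 1
    rows = inferred_rows if rows is None else rows
    cols = inferred_cols if cols is None else cols
    if rows < inferred_rows or cols < inferred_cols:
        raise ValueError("forced shape is smaller than the data")
    dense = [[0.0] * cols for _ in range(rows)]
    for (r, c), value in entries.items():
        dense[r][c] = value
    return dense

entries = {(0, 1): 0.2, (1, 2): 0.5}   # no data in row 2 or column 0
inferred = coordinate_to_dense(entries)                 # shape inferred from data
forced = coordinate_to_dense(entries, rows=3, cols=3)   # shape forced to 3 x 3
```

The inferred matrix is 2 x 3 and cannot be multiplied by itself, whereas the forced 3 x 3 matrix can.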
S104: Mine video pairs according to the calculated N-step transition matrix and the acquired sample data of each video pair.
After the N-step transition matrix is obtained, video pair data needs to be restored from it, that is, video pair data is established according to the N-step transition matrix.
Further, if in step S102 the probability matrix was determined by using the video identifier of each request video as a row, the video identifier of each click video as a column, and the click rate of the click video in each video pair's sample data as the element value, video pair data is established from the N-step transition matrix as follows. For each non-zero element of the N-step transition matrix, the video identifier of the corresponding request video is determined from the row of the element, the video identifier of the corresponding click video is determined from the column of the element, and the value of the element is taken as the click rate of that click video. Video pair data is established from the determined request video identifier, click video identifier and click rate, and video pairs are mined according to the established video pair data and the acquired video pair sample data.
If in step S102 the probability matrix was instead determined by using the number of the video identifier of each request video as a row, the number of the video identifier of each click video as a column, and the click rate of the click video in each video pair's sample data as the element value, video pair data is established from the N-step transition matrix as follows. For each non-zero element of the N-step transition matrix, the number of the corresponding request video is determined from the row number of the element, the number of the corresponding click video is determined from the column number of the element, and the click rate of the click video is determined from the value of the element. The video identifier of the request video is then determined from the generated correspondence between request video identifiers and numbers, and the video identifier of the click video is determined from the generated correspondence between click video identifiers and numbers. Video pair data is established from the determined request video identifier, click video identifier and click rate, and video pairs are mined according to the established video pair data and the acquired video pair sample data.
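Restoring video pair data from a numbered N-step transition matrix can be sketched in Python as follows. The mapping `num_to_id`, the matrix contents and the function name `restore_pairs` are hypothetical; the sketch assumes a single number-to-identifier mapping shared by request and click videos.

```python
num_to_id = {0: 1, 1: 9, 2: 10}   # hypothetical number -> video identifier
n_step = {(0, 2): 0.1}            # hypothetical non-zero elements of the N-step matrix

def restore_pairs(matrix, num_to_id):
    """Turn each non-zero element back into (request id, click id, click rate)."""
    pairs = []
    for (row, col), value in sorted(matrix.items()):
        if value != 0:
            pairs.append((num_to_id[row], num_to_id[col], value))
    return pairs

established = restore_pairs(n_step, num_to_id)
```

Here the element at row 0, column 2 becomes the video pair with request video 1, click video 10 and click rate 0.1.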
Further, after video pair data is established from the N-step transition matrix, video pairs are mined according to the established video pair data and the acquired video pair sample data. Specifically, the established video pair data is matched against the acquired video pair sample data, and the established video pair data that is inconsistent with the acquired sample data is determined.
Two cases mainly arise when determining video pair data that is inconsistent with the acquired video pair sample data. In the first case, the click rate of the click video in the established video pair data differs from the click rate of the click video in the acquired video pair sample data. Such established video pair data is determined; then, according to the video identifiers of the request video and the click video in the determined data, the corresponding video pair sample data is looked up in the video recommendation system that contains the video pair sample data, and that sample data is replaced with the determined video pair data.
It should be noted that, when replacing the sample data with the determined video pair data, either all three fields of the sample data (the video identifier of the request video, the video identifier of the click video and the click rate of the click video) may be replaced with those of the determined video pair data, or only the click rate of the click video in the sample data may be replaced with the click rate in the determined video pair data.
In the second case, at least one of the video identifier of the request video and the video identifier of the click video in the established video pair data differs from those in the acquired video pair sample data. Such established video pair data is determined and can then be added to the video recommendation system.
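The two cases can be summarized in a short Python sketch. The dictionaries keyed by `(request_id, click_id)`, the sample values and the function name `mine` are assumptions for illustration, and the sketch treats the video recommendation system simply as a dictionary of video pairs.

```python
def mine(established, samples):
    """Both arguments map (request_id, click_id) -> click rate."""
    updated, added = {}, {}
    for pair, rate in established.items():
        if pair not in samples:
            added[pair] = rate        # second case: a pair not in the sample data
        elif samples[pair] != rate:
            updated[pair] = rate      # first case: same pair, different click rate
    return updated, added

samples = {(1, 9): 0.2, (9, 10): 0.5}        # hypothetical historical pairs
established = {(1, 10): 0.1, (1, 9): 0.3}    # hypothetical restored pairs

updated, added = mine(established, samples)
system = {**samples, **updated, **added}     # recommendation system after mining
```

Pairs in `updated` replace existing entries, while pairs in `added` are the newly mined potential video pairs.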
By this method, potential video pairs, that is, video pairs with an indirect request-click relationship, can be effectively mined, which enriches the number of video pairs in the video recommendation system and can improve its recommendation accuracy.
In addition, in practical applications, once the number of video pairs in the video recommendation system has been enriched and expanded, the result can be used not only to improve the accuracy of the video recommendation system but also to increase the number of related videos recommended to a user who is watching a request video, because the click videos corresponding to a given request video may increase after video pair mining. For example, suppose that a website recommends ten click videos related to the request video whenever a user watches a request video, but before video pair mining only five click videos in the video recommendation system are related to that request video. The video recommendation system then recommends those five click videos and must also find five videos unrelated to the request video to recommend to the user. After video pair mining, the click videos related to the request video may exceed ten, so the video recommendation system can recommend ten click videos that are all related to the request video. This avoids the situation in which the prediction model used by the video recommendation system yields too few recommended videos.
Based on the same idea as the video pair mining method provided above, an embodiment of the present application further provides a video pair mining device, as shown in Fig. 3.
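The effect on the recommendation list can be sketched as follows. This is a hypothetical illustration, not the recommendation system of the application: the function `recommend`, the `fallback` list of unrelated videos, and the list length `k` are assumed names, and a list of three videos stands in for the ten in the example.

```python
def recommend(request_id, pairs, fallback, k=10):
    """Return up to k videos for request_id.

    pairs maps (request_video_id, click_video_id) -> click rate.
    Related click videos come first, ordered by click rate; unrelated
    videos from fallback are used only when fewer than k are related.
    """
    related = [c for (r, c) in pairs if r == request_id]
    related.sort(key=lambda c: pairs[(request_id, c)], reverse=True)
    picks = related[:k]
    for vid in fallback:
        if len(picks) >= k:
            break
        if vid != request_id and vid not in picks:
            picks.append(vid)
    return picks


# Illustrative values only: two related videos before mining, three after.
pairs = {("a", "b"): 0.5, ("a", "c"): 0.3}
fallback = ["x", "y", "z"]
before = recommend("a", pairs, fallback, k=3)  # needs one unrelated video
pairs[("a", "d")] = 0.1                        # pair found by mining
after = recommend("a", pairs, fallback, k=3)   # all three are related
```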
Fig. 3 is a schematic structural diagram of a video pair mining device according to an embodiment of the present application, where the video pair mining device includes:
the obtaining module 201 is configured to obtain sample data of each video pair, where the sample data of the video pair includes a video identifier of a requested video, a video identifier of a clicked video, and a click rate of the clicked video when a user watches the requested video;
a determining module 202, configured to determine a probability matrix according to the video identifier of the request video, the video identifier of the clicked video, and the click rate of the clicked video;
a calculating module 203, configured to calculate an N-step transition matrix corresponding to the probability matrix, where N is an integer greater than 1;
and the mining module 204 is configured to mine a video pair according to the calculated N-step transition matrix and the obtained sample data of each video pair, where the video pair is composed of a request video and a click video, and the click video is a video that is watched by a user among the videos recommended to the user according to the request video.
The determining module 202 is specifically configured to determine the probability matrix by using the video identifier corresponding to each request video as a row of the probability matrix, using the video identifier corresponding to each click video as a column of the probability matrix, and using the click rate of the click video in each video pair sample data as an element value of the probability matrix.
The determining module 202 is specifically configured to extract, from each video pair sample data, the non-duplicate video identifiers of the request videos and the non-duplicate video identifiers of the click videos, number the extracted video identifier corresponding to each request video and the video identifier corresponding to each click video, generate a correspondence between the video identifier corresponding to the request video and the number and a correspondence between the video identifier corresponding to the click video and the number, and determine the probability matrix according to the number corresponding to the video identifier of each request video after numbering, the number corresponding to the video identifier of each click video, and the click rate of the click video.
The determining module 202 is specifically configured to determine the probability matrix by taking the number corresponding to the video identifier corresponding to each request video as a row of the probability matrix, taking the number corresponding to the video identifier corresponding to each click video as a column of the probability matrix, and taking the click rate of the click video in each video pair sample data as an element value of the probability matrix.
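The numbering and matrix construction described for the determining module can be sketched in Python as below. The function name `build_probability_matrix` and the sample triples are hypothetical. The sketch also assumes a single shared numbering over request and click videos so that the matrix is square and can later be multiplied by itself; the text does not state how the two numberings relate.

```python
def build_probability_matrix(samples):
    """Build a probability matrix from video pair sample data.

    samples is a list of (request_video_id, click_video_id, click_rate).
    Returns (matrix, index), where index maps each distinct video
    identifier to its number (row and column position).
    """
    index = {}
    for req, clk, _ in samples:
        for vid in (req, clk):
            if vid not in index:          # keep only non-duplicate identifiers
                index[vid] = len(index)   # assign the next number
    n = len(index)
    matrix = [[0.0] * n for _ in range(n)]
    for req, clk, rate in samples:
        # Row = number of the request video, column = number of the click video.
        matrix[index[req]][index[clk]] = rate
    return matrix, index


# Illustrative values only.
samples = [("A", "B", 0.5), ("B", "C", 0.4)]
matrix, index = build_probability_matrix(samples)
```

Keeping the `index` mapping makes it possible to translate row and column numbers back to video identifiers after the N-step transition matrix has been computed.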
The calculating module 203 is specifically configured to group the N probability matrices, where at least two of the groups obtained by grouping contain the same number of probability matrices; remove the duplicate groups among the groups that contain the same number of probability matrices; for any remaining group, multiply all probability matrices contained in the group; multiply together the matrices corresponding to the products of all the groups obtained by grouping; and use the resulting matrix as the N-step transition matrix corresponding to the probability matrix.
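One possible reading of this grouping is sketched below in Python; it is an assumption, not the procedure fixed by the application. The N copies of the probability matrix are split into two groups, the product for each distinct group size is computed only once and cached, and the group products are then multiplied. The helper names `matmul` and `n_step` are hypothetical, and plain nested lists are used instead of a matrix library.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def n_step(p, n, cache=None):
    """Return the n-step transition matrix of p (p multiplied n times).

    The n factors are split into two groups; when both groups have the
    same size, the product is computed once and reused from the cache.
    """
    if cache is None:
        cache = {}
    if n == 1:
        return p
    if n in cache:
        return cache[n]
    left = n // 2
    right = n - left
    result = matmul(n_step(p, left, cache), n_step(p, right, cache))
    cache[n] = result
    return result


# Illustrative values only: A -> B with rate 0.5 and B -> C with rate 0.4.
p = [[0.0, 0.5, 0.0],
     [0.0, 0.0, 0.4],
     [0.0, 0.0, 0.0]]
two_step = n_step(p, 2)  # only the A -> C entry is non-zero
```

For N = 4, for instance, the two groups each contain two matrices, so the two-matrix product is computed once and multiplied by itself.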
The mining module 204 is specifically configured to, for each element in the N-step transition matrix that is not 0, determine the video identifier of the request video corresponding to the element according to the row of the probability matrix corresponding to the video identifier of each request video, determine the video identifier of the click video corresponding to the element according to the column corresponding to the video identifier of each click video, use the value of the element as the click rate of the click video corresponding to the element, establish video pair data according to the determined video identifier of the request video, the video identifier of the click video, and the click rate of the click video, and mine a video pair according to the established video pair data and the acquired video pair sample data.
The mining module 204 is specifically configured to, for each element in the N-step transition matrix that is not 0, determine, according to the row number corresponding to the element, the number of the request video corresponding to the element, determine, according to the column number corresponding to the element, the number of the click video corresponding to the element, determine, according to the value of the element, the click rate of the click video corresponding to the element, determine, according to the generated correspondence between the video identifier of the request video and the number, the video identifier of the request video corresponding to the number of the request video, determine, according to the generated correspondence between the video identifier of the click video and the number, the video identifier of the click video corresponding to the number of the click video, establish video pair data according to the determined video identifier of the request video, the video identifier of the click video, and the click rate of the click video, and mine the video pairs according to the established video pair data and the acquired video pair sample data.
The mining module 204 is specifically configured to match the established video pair data with the acquired video pair sample data, and determine, among the established video pair data, the video pair data that is inconsistent with the acquired video pair sample data.
The mining module 204 is specifically configured to determine the video pair data in which the click rate of the click video in the established video pair data differs from the click rate of the click video in the acquired video pair sample data, search for the video pair sample data corresponding to the determined video pair data in a video recommendation system containing each video pair sample data according to the video identifier of the request video and the video identifier of the click video in the determined video pair data, and replace the video pair sample data corresponding to the determined video pair data with the determined video pair data.
The mining module 204 is specifically configured to determine the video pair data for which at least one of the video identifier of the request video and the video identifier of the click video in the established video pair data differs from the video identifier of the request video and the video identifier of the click video in the acquired video pair sample data, and add the determined video pair data to the video recommendation system.
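The mapping from non-zero elements back to video identifiers can be sketched as follows. The function name `extract_pairs`, the `index` mapping from video identifier to number, and the matrix values are assumptions used only for illustration.

```python
def extract_pairs(n_step_matrix, index):
    """Establish video pair data from the non-zero elements of the matrix.

    index maps each video identifier to its number; the row number gives
    the request video, the column number gives the click video, and the
    element value is taken as the click rate of the click video.
    """
    id_of = {number: vid for vid, number in index.items()}
    pairs = {}
    for row_no, row in enumerate(n_step_matrix):
        for col_no, value in enumerate(row):
            if value != 0:
                pairs[(id_of[row_no], id_of[col_no])] = value
    return pairs


# Illustrative values only: a two-step matrix with one non-zero element.
index = {"A": 0, "B": 1, "C": 2}
two_step = [[0.0, 0.0, 0.2],
            [0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0]]
pairs = extract_pairs(two_step, index)
```

Here the only established pair is ("A", "C"), an indirect request-click relationship that does not need to appear in the original sample data.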
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store data that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (20)

1. A video pair mining method, the method comprising:
acquiring sample data of each video pair, wherein the sample data of each video pair comprises a video identifier of a request video, a video identifier of a clicked video and a click rate of the clicked video under the condition that a user watches the request video, wherein the click rate of the clicked video is the ratio of the number of times that the clicked video in the video pair is watched by the user to the number of times that the request video in the video pair is requested;
determining a probability matrix according to the video identification of the request video, the video identification of the clicked video and the click rate of the clicked video;
calculating an N-step transition matrix corresponding to the probability matrix, wherein N is an integer greater than 1;
and mining a video pair according to the calculated N-step transition matrix and the sample data of each acquired video pair, wherein the video pair consists of a request video and a click video, and the click video is a video which is watched by the user among the videos recommended to the user according to the request video.
2. The method of claim 1, wherein determining a probability matrix according to the video identifier of the request video, the video identifier of the click video, and the click rate of the click video specifically comprises:
taking the video identification corresponding to each request video as a row of a probability matrix;
taking the video identification corresponding to each click video as a column of a probability matrix;
and determining a probability matrix by taking the click rate of the clicked video in each video pair sample data as the element value of the probability matrix.
3. The method of claim 1, wherein determining a probability matrix according to the video identifier of the request video, the video identifier of the click video, and the click rate of the click video specifically comprises:
extracting video identification of unrepeated request videos and video identification of clicked videos from the sample data of all the video pairs;
the video identifications corresponding to the extracted request videos and the video identifications corresponding to the clicked videos are numbered, and the corresponding relation between the video identifications corresponding to the request videos and the numbers and the corresponding relation between the video identifications corresponding to the clicked videos and the numbers are generated;
and determining a probability matrix according to the serial numbers corresponding to the video identifications corresponding to the numbered request videos, the serial numbers corresponding to the video identifications corresponding to the click videos and the click rate of the click videos.
4. The method of claim 3, wherein determining the probability matrix according to the number corresponding to the video identifier corresponding to each numbered request video, the number corresponding to the video identifier corresponding to the click video, and the click rate of the click video comprises:
taking the number corresponding to the video identifier corresponding to each request video as a row of the probability matrix;
taking the serial number corresponding to the video identification corresponding to each click video as the column of the probability matrix;
and determining the probability matrix by taking the click rate of the clicked video in each video pair sample data as the element value of the probability matrix.
5. The method of claim 1, wherein calculating the N-step transition matrix corresponding to the probability matrix specifically comprises:
grouping the N probability matrixes, wherein at least two groups in each grouped group have the same number of probability matrixes;
removing each group containing repeated number of probability matrixes, and multiplying all probability matrixes contained in the group aiming at any remaining group;
multiplying the matrixes obtained by multiplying all the grouped groups correspondingly;
and taking a matrix obtained by multiplying all the grouped groups as an N-step transition matrix corresponding to the probability matrix.
6. The method according to claim 2, wherein mining video pairs according to the calculated N-step transition matrix and the acquired sample data of each video pair specifically comprises:
for each element which is not 0 in the N-step transition matrix, determining a video identifier corresponding to the request video corresponding to the element according to the row of the probability matrix corresponding to the video identifier corresponding to each determined request video, determining a video identifier corresponding to the clicked video corresponding to the element according to the column corresponding to the video identifier corresponding to each determined clicked video, and taking the value of the element as the click rate of the clicked video corresponding to the element;
establishing video pair data according to the determined video identification corresponding to the request video, the determined video identification corresponding to the clicked video and the click rate of the clicked video;
and mining the video pairs according to the established video pair data and the acquired video pair sample data.
7. The method according to claim 4, wherein mining video pairs according to the calculated N-step transition matrix and the acquired sample data of each video pair specifically comprises:
for each element which is not 0 in the N-step transition matrix, determining the number of the request video corresponding to the element according to the row number corresponding to the element, determining the number of the click video corresponding to the element according to the column number corresponding to the element, and determining the click rate of the click video corresponding to the element according to the value of the element;
determining a video identifier of a request video corresponding to the number of the request video according to the corresponding relation between the video identifier corresponding to the generated request video and the number, and determining a video identifier of a click video corresponding to the number of the click video according to the corresponding relation between the video identifier corresponding to the generated click video and the number;
establishing video pair data according to the determined video identification of the request video, the determined video identification of the clicked video and the click rate of the clicked video;
and mining the video pairs according to the established video pair data and the acquired video pair sample data.
8. The method according to claim 6 or 7, wherein mining video pairs according to the established video pair data and the acquired video pair sample data specifically comprises:
matching the established video pair data with the acquired video pair sample data;
and determining, among the established video pair data, the video pair data inconsistent with the acquired video pair sample data.
9. The method of claim 8, wherein determining video pair data that is inconsistent with the acquired video pair sample data comprises:
determining video pair data with different click rates of click videos in the established video pair data and click videos in the acquired video pair sample data;
according to the video identification of the request video and the video identification of the click video in the determined video pair data, searching video pair sample data corresponding to the determined video pair data in a video recommendation system containing each video pair sample data;
replacing the determined video pair data with the video pair sample data corresponding to the determined video pair data.
10. The method of claim 8, wherein determining video pair data that is inconsistent with the acquired video pair sample data comprises:
determining that at least one of the video identifier of the request video and the video identifier of the click video in the established video pair data is different from the video identifier of the request video and the video identifier of the click video in the acquired video pair sample data;
adding the determined video pair data to a video recommendation system.
11. A video pair mining apparatus, the apparatus comprising:
the acquisition module is used for acquiring sample data of each video pair, wherein the sample data of the video pairs comprises a video identifier of a request video, a video identifier of a click video and a click rate of the click video under the condition that a user watches the request video, and the click rate of the click video is the ratio of the number of times that the click video in a video pair is watched by the user to the number of times that the request video in the video pair is requested;
the determining module is used for determining a probability matrix according to the video identifier of the request video, the video identifier of the click video and the click rate of the click video;
the calculation module is used for calculating an N-step transition matrix corresponding to the probability matrix, wherein N is an integer greater than 1;
and the mining module is used for mining the video pairs according to the calculated N-step transition matrix and the obtained sample data of each video pair, wherein the video pairs consist of request videos and click videos, and the click videos are videos which are recommended to the users according to the request videos and watched by the users.
12. The apparatus according to claim 11, wherein the determining module is specifically configured to determine the probability matrix by using video identifiers corresponding to the requested videos as rows of the probability matrix, using video identifiers corresponding to the clicked videos as columns of the probability matrix, and using click rates of the clicked videos in the sample data of the videos as element values of the probability matrix.
13. The apparatus according to claim 11, wherein the determining module is specifically configured to extract, from each video pair sample data, the non-duplicate video identifiers of the request videos and the non-duplicate video identifiers of the click videos, number the extracted video identifier corresponding to each request video and the video identifier corresponding to each click video, generate a correspondence between the video identifier corresponding to the request video and the number and a correspondence between the video identifier corresponding to the click video and the number, and determine the probability matrix according to the number corresponding to the video identifier corresponding to each request video after numbering, the number corresponding to the video identifier corresponding to the click video, and the click rate of the click video.
14. The apparatus according to claim 13, wherein the determining module is specifically configured to determine the probability matrix by using numbers corresponding to the video identifiers corresponding to the requested videos as rows of the probability matrix, numbers corresponding to the video identifiers corresponding to the click videos as columns of the probability matrix, and click rates of the click videos in the sample data of the respective videos as element values of the probability matrix.
15. The apparatus according to claim 11, wherein the computing module is specifically configured to group N probability matrices, where at least two groups in each group after grouping include the same number of probability matrices, remove groups including repeated numbers of probability matrices, and for any remaining group, multiply all probability matrices included in the group, multiply matrices obtained by multiplying all groups after grouping, and use matrices obtained by multiplying all groups after grouping as N-step transition matrices corresponding to the probability matrices.
16. The apparatus according to claim 12, wherein the mining module is specifically configured to, for each element in the N-step transition matrix that is not 0, determine a video identifier corresponding to the requested video corresponding to the element according to a row of a matrix probability corresponding to a determined video identifier corresponding to each requested video, determine a video identifier corresponding to a clicked video corresponding to the element according to a column corresponding to a determined video identifier corresponding to each clicked video, use a value of the element as a click rate of the clicked video corresponding to the element, create video pair data according to the determined video identifier corresponding to the requested video, the video identifier corresponding to the clicked video, and the click rate of the clicked video, and mine a video pair according to the created video pair data and the obtained video pair sample data.
17. The apparatus according to claim 14, wherein the mining module is specifically configured to, for each element in the N-step transition matrix that is not 0, determine a number of the requested video corresponding to the element according to a row number corresponding to the element, determine a number of the clicked video corresponding to the element according to a column number corresponding to the element, determine a click rate of the clicked video corresponding to the element according to a numerical value of the element, determine a video identifier of the requested video corresponding to the number of the requested video according to a corresponding relationship between the generated video identifier corresponding to the requested video and the number, determine a video identifier of the clicked video corresponding to the number of the clicked video according to a corresponding relationship between the generated video identifier corresponding to the clicked video and the number, establish video pair data according to the determined video identifier of the requested video, the video identifier of the clicked video, and the click rate of the clicked video, and mine video pairs according to the established video pair data and the acquired video pair sample data.
18. The apparatus according to claim 16 or 17, wherein the mining module is specifically configured to match the created video pair data with the acquired video pair sample data, and determine, among the created video pair data, video pair data that is inconsistent with the acquired video pair sample data.
19. The apparatus according to claim 18, wherein the mining module is specifically configured to determine video pair data in which a click rate of a click video in the established video pair data is different from a click rate of a click video in each obtained video pair sample data, find video pair sample data corresponding to the determined video pair data in a video recommendation system including each video pair sample data according to a video identifier of a request video in the determined video pair data and a video identifier of the click video, and replace the determined video pair sample data with the video pair sample data corresponding to the determined video pair data.
20. The apparatus according to claim 18, wherein the mining module is specifically configured to determine that the video identifier of the requested video and the video identifier of the clicked video in the created video pair data have at least one video pair data different from the video identifier of the requested video and the video identifier of the clicked video in the acquired video pair sample data, and add the determined video pair data to the video recommendation system.
CN201610624818.XA 2016-08-02 2016-08-02 Video pair mining method and device Active CN106250499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610624818.XA CN106250499B (en) 2016-08-02 2016-08-02 Video pair mining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610624818.XA CN106250499B (en) 2016-08-02 2016-08-02 Video pair mining method and device

Publications (2)

Publication Number Publication Date
CN106250499A CN106250499A (en) 2016-12-21
CN106250499B true CN106250499B (en) 2020-07-14

Family

ID=57606384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610624818.XA Active CN106250499B (en) 2016-08-02 2016-08-02 Video pair mining method and device

Country Status (1)

Country Link
CN (1) CN106250499B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108614856B (en) * 2018-03-21 2021-01-05 北京奇艺世纪科技有限公司 Video sequencing calibration method and device
CN111050195B (en) * 2018-10-12 2021-11-26 中国电信股份有限公司 Streaming media caching method and device and computer readable storage medium
CN112364202B (en) * 2020-11-06 2023-11-14 上海众源网络有限公司 Video recommendation method and device and electronic equipment
CN112767053A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Information processing method, information processing device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103441902A (en) * 2013-09-03 2013-12-11 重庆邮电大学 Flow generation method based on streaming media user behavior analysis
CN104216886A (en) * 2013-05-29 2014-12-17 酷盛(天津)科技有限公司 Video recommendation device, system and method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5328934B2 (en) * 2009-04-13 2013-10-30 エンサーズ カンパニー リミテッド Method and apparatus for providing moving image related advertisement
CN104978368A (en) * 2014-04-14 2015-10-14 百度在线网络技术(北京)有限公司 Method and device used for providing recommendation information
CN105095279B (en) * 2014-05-13 2019-05-03 深圳市腾讯计算机系统有限公司 File recommendation method and device
CN104935963B (en) * 2015-05-29 2018-03-16 中国科学院信息工程研究所 A kind of video recommendation method based on timing driving

Also Published As

Publication number Publication date
CN106250499A (en) 2016-12-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 A 5 C, block A, China International Steel Plaza, 8 Haidian Avenue, Haidian District, Beijing.

Applicant after: Youku network technology (Beijing) Co., Ltd.

Address before: 100080 A 5 C, block A, China International Steel Plaza, 8 Haidian Avenue, Haidian District, Beijing.

Applicant before: 1Verge Inc.

TA01 Transfer of patent application right

Effective date of registration: 20200528

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Applicant before: Youku network technology (Beijing) Co., Ltd

GR01 Patent grant