CN109889891A

CN109889891A - Method, apparatus and storage medium for obtaining a target media file

Info

Publication number: CN109889891A
Application number: CN201910165332.8A
Authority: CN
Inventors: 张晗
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-03-05
Filing date: 2019-03-05
Publication date: 2019-06-14
Anticipated expiration: 2039-03-05
Also published as: CN109889891B

Abstract

Embodiments of the present invention disclose a method, an apparatus and a storage medium for obtaining a target media file. The method includes: loading media data of multiple media files; performing, based on the media data, feature extraction on each media file to obtain multiple feature values of each media file, the multiple feature values including feature values of attribute features of the media file and feature values of statistical features of the media file; obtaining a score for each media file based on its multiple feature values and on a mapping relationship between feature values and media file scores; and selecting, based on the score of each media file, a preset number of media files from the multiple media files as target media files.

Description

Method, apparatus and storage medium for obtaining a target media file

Technical field

The present invention relates to data processing technology, and in particular to a method, an apparatus and a storage medium for obtaining a target media file.

Background art

With the development of Internet technology, users can view media files such as videos and image-text articles through a client on a mobile terminal. The client can also recommend high-quality media files to users in order to raise the click-through rate of media files. As shown in Fig. 1, a user can watch recommended videos through the main feed of a content distribution platform (Kandian, literally "watching focus").

In the related art, the quality of a media file is evaluated either by rules formulated subjectively by users (for example, the grade of the publishing media) or by posterior statistical information about the media file (for example, click information), and target media files are then obtained accordingly. However, the information these approaches use to evaluate media file quality is one-sided, so high-quality target media files cannot be obtained accurately.

Summary of the invention

Embodiments of the present invention provide a method, an apparatus and a storage medium for obtaining a target media file, which can obtain high-quality target media files quickly and accurately.

The technical solutions of the embodiments of the present invention are implemented as follows:

In a first aspect, an embodiment of the present invention provides a method for obtaining a target media file, including:

loading media data of multiple media files;

performing, based on the media data, feature extraction on each media file to obtain multiple feature values of each media file, the multiple feature values including feature values of attribute features of the media file and feature values of statistical features of the media file;

obtaining a score for each media file based on the multiple feature values of the media file and a mapping relationship between feature values and media file scores; and

selecting, based on the score of each media file, a preset number of media files from the multiple media files as target media files.

In a second aspect, an embodiment of the present invention provides a method for obtaining a target media file, including:

sending an acquisition request for a target media file in response to an acquisition instruction for the target media file;

receiving the returned target media file, where the target media file is selected from multiple media files based on the scores of the media files, the score of a media file is calculated based on multiple feature values of the media file and a mapping relationship between feature values and media file scores, and the multiple feature values include feature values of attribute features of the media file and feature values of statistical features of the media file; and

presenting the target media file through a user interface.

In a third aspect, an embodiment of the present invention provides an apparatus for obtaining a target media file, including:

a loading unit, configured to load media data of multiple media files;

an extraction unit, configured to perform, based on the media data, feature extraction on each media file to obtain multiple feature values of each media file, the multiple feature values including feature values of attribute features of the media file and feature values of statistical features of the media file;

a mapping unit, configured to obtain a score for each media file based on the multiple feature values of the media file and a mapping relationship between feature values and media file scores; and

a selection unit, configured to select, based on the score of each media file, a preset number of media files from the multiple media files as target media files.

In a fourth aspect, an embodiment of the present invention provides an apparatus for obtaining a target media file, including:

a sending unit, configured to send an acquisition request for a target media file in response to an acquisition instruction for the target media file;

a receiving unit, configured to receive the returned target media file, where the target media file is selected from multiple media files based on the scores of the media files, the score of a media file is calculated based on multiple feature values of the media file and a mapping relationship between feature values and media file scores, and the multiple feature values include feature values of attribute features of the media file and feature values of statistical features of the media file; and

a display unit, configured to present the target media file through a user interface.

In a fifth aspect, an embodiment of the present invention provides an apparatus for obtaining a target media file, including:

a memory, configured to store a program for obtaining a target media file; and

a processor, configured to run the program, where the program, when run, performs the method for obtaining a target media file provided by the embodiments of the present invention.

In a sixth aspect, an embodiment of the present invention provides a storage medium storing an executable program which, when executed by a processor, implements the method for obtaining a target media file provided by the embodiments of the present invention.

The above embodiments of the present invention have the following beneficial effects:

With the method, apparatus and storage medium for obtaining a target media file described above, the feature values obtained during feature extraction include both feature values of attribute features of the media file and feature values of statistical features of the media file. That is, the features used cover both the intrinsic attribute features of the media file and its posterior statistical features, so the feature information is comprehensive. Scoring the media file on the basis of these multiple feature values yields a score that accurately and truthfully reflects the quality of the media file, so target media files are obtained with higher accuracy.

Brief description of the drawings

Fig. 1 is a schematic diagram of a client presenting recommended videos, provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of a user portrait provided by an embodiment of the present invention;

Fig. 3 is an architecture diagram of the system 100 for obtaining a target media file provided by an embodiment of the present invention;

Fig. 4 is a hardware structure diagram of the apparatus for obtaining a target media file provided by an embodiment of the present invention;

Fig. 5 is a first schematic flowchart of the method for obtaining a target media file provided by an embodiment of the present invention;

Fig. 6 is a schematic diagram of loading media data based on a user portrait, provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of the composition of a feature value provided by an embodiment of the present invention;

Fig. 8 is a second schematic flowchart of the method for obtaining a target media file provided by an embodiment of the present invention;

Fig. 9 is an architecture diagram of obtaining a target media file provided by an embodiment of the present invention;

Fig. 10 is an architecture diagram of video scoring and ranking provided by an embodiment of the present invention;

Fig. 11 is an architecture diagram of model training provided by an embodiment of the present invention;

Fig. 12 is a first schematic diagram of the composition of the apparatus for obtaining a target media file provided by an embodiment of the present invention;

Fig. 13 is a second schematic diagram of the composition of the apparatus for obtaining a target media file provided by an embodiment of the present invention.

Detailed description of embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments provided herein are only intended to explain the present invention and not to limit it. In addition, the embodiments provided below are some, rather than all, of the embodiments for implementing the present invention. Provided there is no conflict, the technical solutions recorded in the embodiments of the present invention may be implemented in any combination.

It should be noted that, in the embodiments of the present invention, the terms "include", "comprise" and any variant thereof are intended to cover non-exclusive inclusion, so that a method or apparatus that includes a series of elements includes not only the elements expressly recited but also other elements not expressly listed, or elements inherent to implementing the method or apparatus. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other related elements in the method or apparatus that includes the element (such as a step in a method or a unit in an apparatus; a unit may be, for example, part of a circuit, part of a processor, or part of a program or software).

For example, the method for obtaining a target media file provided by the embodiments of the present invention includes a series of steps, but is not limited to the recorded steps. Similarly, the apparatus for obtaining a target media file provided by the embodiments of the present invention includes a series of units, but is not limited to the units expressly recited, and may also include units that need to be provided for obtaining relevant information or for processing based on that information.

In the following description, the terms "first" and "second" are only used to distinguish similar objects and do not denote a particular ordering of the objects. It can be understood that, where permitted, "first" and "second" may be interchanged in a specific order or sequence, so that the embodiments of the present invention described herein can be implemented in an order other than the one illustrated or described herein.

Before the present invention is described in further detail, the nouns and terms involved in the embodiments of the present invention are explained; the following explanations apply to them.

1) User portrait: a virtual representation of a real user, that is, a target user model built on a series of attribute data. Here it refers to a hierarchical interest model of a user, abstracted from the user's historical behavior data and used to indicate the user's interest categories. Fig. 2 is a schematic diagram of a user portrait provided by an embodiment of the present invention.

2) Media file: media in any form available on the Internet (for example, video, audio, or image-text formats), such as the video files and image-text articles presented in a client.

3) Freshness: a reference measure of the timeliness of a media file, reflecting how current the media file is. For example, a news video published today has higher freshness than a news video published a week ago.

4) In response to: indicates the condition or state on which a performed operation depends. When the condition or state is satisfied, the one or more operations may be performed in real time or with a set delay. Unless otherwise specified, there is no restriction on the order in which multiple operations are performed.

In some embodiments, target media files may be obtained as follows: the quality of media files is evaluated using media file attributes, multiple media files are scored based on those attributes and then ranked by score, and target media files are selected based on the ranking result. The attributes of a media file include, for example, the grade of the publishing media, the media file grade, and the number of media file tags. The publishing media grade is a composite score given to the media based on the quality and data performance of the media files it has published in the past. Similarly, the media file grade is a composite score given to an article based on its title length, number of title keywords, text information, picture information, length, and so on. This way of obtaining target media files rests on strong prior knowledge: it evaluates media file quality with subjectively formulated rules (for example, assuming that every media file published by high-quality media is better than media files published by other media), and is therefore highly subjective.

In some embodiments, target media files may also be obtained as follows: the quality of media files is evaluated from their posterior statistical information, using greedy ranking methods based on that information, such as the ε-greedy algorithm and Thompson sampling. The ε-greedy algorithm ranks media files by their current click-through rate each time, treating the current optimum as the global optimum. Thompson sampling models the current click count and impression count of a media file with a gamma distribution and obtains the current payoff value of the media file by sampling each time. This way of obtaining target media files relies heavily on the posterior statistical information of the article for modeling and completely discards the intrinsic attribute information of the media file; moreover, greedy algorithms generally reach a local optimum rather than the global optimum.

Fig. 3 is an optional architecture diagram of the system 100 for obtaining a target media file provided by an embodiment of the present invention. Referring to Fig. 3, to support an exemplary application, terminals 400 (terminal 400-1 and terminal 400-2 are shown as examples) connect to a server 200 through a network 300. The network 300 may be a wide area network, a local area network, or a combination of the two, and uses wireless links for data transmission.

In some embodiments, the terminal 400 is configured to send an acquisition request for a target media file to the server when a user triggers an acquisition instruction for a media file through the client (for example, by opening the Kandian page of mobile QQ);

the server 200 is configured to receive the acquisition request for the target media file sent by the terminal; load media data of multiple media files; perform, based on the media data, feature extraction on each media file to obtain multiple feature values of each media file, the multiple feature values including feature values of attribute features of the media file and feature values of statistical features of the media file; obtain a score for each media file based on the multiple feature values of the media file and a mapping relationship between feature values and media file scores; and select, based on the score of each media file, a preset number of media files from the multiple media files as target media files;

the server 200 is further configured to send the selected target media files to the terminal 400;

the terminal 400 is further configured to display the received target media files on a graphical interface 410 (graphical interface 410-1 and graphical interface 410-2 are shown as examples).

The apparatus for obtaining a target media file provided by the embodiments of the present invention is described next. The apparatus may be implemented as hardware or as a combination of software and hardware; various exemplary implementations of the apparatus are described below.

The hardware structure of the apparatus for obtaining a target media file is described in detail below. It can be understood that Fig. 4 shows only an exemplary structure of the apparatus rather than its entire structure, and that part or all of the structure shown in Fig. 4 may be implemented as needed.

The apparatus 20 for obtaining a target media file provided by an embodiment of the present invention includes at least one processor 201, a memory 202, a user interface 203 and at least one network interface 204. The components of the apparatus 20 are coupled together through a bus system 205. It can be understood that the bus system 205 is used to implement connection and communication between these components. In addition to a data bus, the bus system 205 includes a power bus, a control bus and a status signal bus. For clarity of description, however, the various buses are all labeled as the bus system 205 in Fig. 4.

The user interface 203 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touchpad, a touch screen, or the like.

It can be understood that the memory 202 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.

The memory 202 in the embodiments of the present invention is used to store various types of data to support the operation of the apparatus 20 for obtaining a target media file. Examples of such data include any executable instructions for operating on the apparatus 20; a program implementing the method for obtaining a target media file of the embodiments of the present invention may be contained in the executable instructions.

The method for obtaining a target media file disclosed in the embodiments of the present invention may be applied to, or implemented by, the processor 201. The processor 201 may be an integrated circuit chip with signal processing capability. During implementation, each step of the method may be completed by integrated logic circuits of hardware in the processor 201 or by instructions in software form. The processor 201 may be a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 201 may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium, and the storage medium is located in the memory 202; the processor 201 reads the information in the memory 202 and, in combination with its hardware, completes the steps of the method for obtaining a target media file provided by the embodiments of the present invention.

The method for obtaining a target media file provided by the embodiments of the present invention is described next. In some embodiments, referring to Fig. 5, which is a schematic flowchart of the method for obtaining a target media file provided by an embodiment of the present invention, the method includes:

Step 301: The server loads media data of multiple media files.

In some embodiments, the server may load the media data as follows:

The server obtains the historical behavior data of a target user; determines, based on the historical behavior data, a user portrait indicating the interest categories of the target user; and loads the media data of multiple media files corresponding to the user portrait.

Here, in actual implementation, the server may obtain the historical behavior data of the target user based on the target user identifier, such as the media files viewed or clicked, the corresponding media file types, and the number of views or clicks. Based on the historical behavior data of the target user, the server computes the user portrait of the target user to determine the interest categories of the target user. Fig. 6 is a schematic diagram of loading media data based on a user portrait, provided by an embodiment of the present invention. Referring to Fig. 6, the user portrait includes the tag Bryant, so the server loads video data related to Bryant. The loaded video files serve as candidate video files, and target video files (the video files to be recommended) are then selected from the candidates, so as to recommend to the user the high-quality media files that best match the user's interests.
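The portrait-driven loading described above can be sketched as follows. This is a minimal illustration only: the source does not specify how the user portrait is computed, so counting tag frequency in the viewing history is an assumption, and the names `build_user_portrait`, `load_candidates`, and the dictionary layout are hypothetical.

```python
from collections import Counter


def build_user_portrait(history, top_k=1):
    """Return the top_k most frequent tags in the user's history.

    Assumption: interest categories are approximated by tag frequency;
    the source only says the portrait is computed from historical behavior.
    """
    counts = Counter(tag for item in history for tag in item["tags"])
    return [tag for tag, _ in counts.most_common(top_k)]


def load_candidates(catalog, portrait):
    """Keep the media files that share at least one tag with the portrait."""
    interests = set(portrait)
    return [media for media in catalog if interests & set(media["tags"])]


# Hypothetical history and catalog, for illustration only.
history = [
    {"id": "v1", "tags": ["Bryant", "NBA"]},
    {"id": "v2", "tags": ["Bryant"]},
    {"id": "v3", "tags": ["cooking"]},
]
catalog = [
    {"id": "v10", "tags": ["Bryant", "Lakers"]},
    {"id": "v11", "tags": ["cooking"]},
    {"id": "v12", "tags": ["Bryant"]},
]

portrait = build_user_portrait(history)
candidates = load_candidates(catalog, portrait)
```

With this sample data the portrait contains only the tag Bryant, so the two Bryant-tagged videos become the candidate set from which target files would later be selected.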

In practical applications, the media data loaded by the server may include the attribute information of the media file (forward-index data) and posterior statistical data. Taking a video as the media file, the attribute information of the video file may be its first-level category, second-level category, third-level category, duration, tags, source, topic, cover score, quality score, viral-hit score, freshness, whether it is a group of pictures, whether it is a large picture, video grade, and so on. The posterior statistical data of the video file may be its click count, play count, click-through rate, viewing duration, thumbs-up count, comment count, double-click count, favorite count, share count, and so on. The posterior statistical data may be obtained offline from reported click logs, duration logs, behavior logs, and the like.

Step 302: Based on the media data, perform feature extraction on each media file to obtain multiple feature values of each media file, the multiple feature values including feature values of attribute features of the media file and feature values of statistical features of the media file.

In some embodiments, the server performs the following operations on each media file to extract its features:

The server obtains the original values of at least two features of the media file, the at least two features including attribute features and statistical features of the media file; and obtains the feature value of each feature based on the original value of the feature and the corresponding feature name.

Here, in actual implementation, the attribute features of a media file may include at least one of the following: first-level category; second-level category; third-level category; duration; source; tags; title tags; whether the media file is a group of pictures; whether the media file is a large picture; region grade; content grade; freshness; title byte count; media file identifier; picture attributes; publication time; region information; topic; title; title word count; source quality score; cover score; quality score; viral-hit score; the cross of cover score and category; the cross of cover score and duration; the cross of quality score and category; the cross of quality score and duration; the cross of viral-hit score and category; the cross of topic and publication time; and the cross of category and region.

In actual implementation, the statistical features of a media file may include at least one of the following: click count; click-through rate; thumbs-up count; comment count; favorite count; share count; forward count; popularity; playback duration; viewing completion; like count/like rate; dislike count/dislike rate; channel click count; channel click-through rate; topic click count; topic click-through rate; publishing-media click-through rate; source click count; source click-through rate; the cross of click-through rate and click count; the cross of channel and popularity; the cross of topic and popularity; and the cross of source click count and click-through rate.

To distinguish how a media file performs statistically in different time periods, some of the above statistical features may be further restricted to different time windows, such as the four time windows of hour, day, week and month. Examples include the daily/weekly/monthly click count, click-through rate, thumbs-up count, comment count, favorite count, share count and forward count of the media file.

Feature crossing of a media file is explained here. In the embodiments of the present invention, feature crossing, also called feature combination, forms a composite feature by combining individual features. Taking the cross of media file click-through rate and click count as an example, this feature is the composite feature formed by combining the two independent features click-through rate and click count.
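The source does not say how two features are combined into a cross. One common approach, shown here purely as an assumption, is to bucket each continuous value and join the bucket indices into a single categorical value. The function names, the bucket boundaries, and the `_x_` / `|` separators are all hypothetical.

```python
import bisect


def bucketize(value, boundaries):
    """Map a continuous value to the index of the bucket it falls in."""
    return bisect.bisect_right(boundaries, value)


def cross_feature(name_a, value_a, name_b, value_b):
    """Combine two individual features into one composite (crossed) feature."""
    return f"{name_a}_x_{name_b}", f"{value_a}|{value_b}"


# Hypothetical bucket boundaries for click-through rate and click count.
CTR_BOUNDARIES = [0.01, 0.05, 0.1, 0.2]
CLICK_BOUNDARIES = [100, 1000, 10000]

ctr_bucket = bucketize(0.12, CTR_BOUNDARIES)       # falls in bucket 3
click_bucket = bucketize(1500, CLICK_BOUNDARIES)   # falls in bucket 2
crossed_name, crossed_value = cross_feature("ctr", ctr_bucket, "clicks", click_bucket)
```

The crossed name and value can then be treated like any other feature name and original value in the extraction step.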

In some embodiments, after obtaining the original values of the attribute features and statistical features of the media file, the server may further obtain the feature value of each feature as follows:

The server hashes the original value of each feature to obtain a first hash value of the feature, and hashes the feature name string of each feature to obtain a second hash value of the feature; it then obtains the feature value of each feature based on the first hash value and the second hash value of that feature.

Here, in actual implementation, the server may obtain the first hash value of a feature as follows: the server maps the original value of the feature into a 64-bit hash space, and the resulting 64-bit hash value is the first hash value of the feature. Taking a single feature as an example, the original value of a feature generally has one of three types: uint64, float, or string. For example, features such as click count and thumbs-up count are generally of type uint64; features such as click-through rate and thumbs-up rate are generally of type float; and features such as the publishing media are generally of type string.

In actual implementation, the server may obtain the second hash value of a feature as follows: the server maps the feature name string into a 64-bit hash space, and the resulting 64-bit hash value is the second hash value of the feature.

In actual implementation, the server may obtain the feature value of a feature from its first and second hash values as follows: the server takes the low 16 bits of the second hash value to indicate the feature type, takes the low 48 bits of the first hash value to indicate the feature index (that is, the offset of the feature within its feature category), and combines them into the 64-bit feature value of the feature, in which the first 16 bits represent the feature type and the last 48 bits represent the feature index. Compared with continuous features, this way of hashing features reduces collisions between features and makes features more distinguishable. An example of the resulting feature value is shown in Fig. 7.
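The bit layout described above can be sketched as follows. The source does not name a hash function, so using BLAKE2b truncated to 8 bytes as the 64-bit hash is an assumption, as is hashing the string form of numeric original values; "first 16 bits" is read here as the 16 high-order bits. The function names are hypothetical.

```python
import hashlib

MASK16 = (1 << 16) - 1
MASK48 = (1 << 48) - 1


def hash64(text):
    """Map a string to a 64-bit integer (assumed hash: BLAKE2b, 8-byte digest)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def feature_value(feature_name, original_value):
    """Pack a feature into 64 bits: 16 type bits followed by 48 index bits."""
    first_hash = hash64(str(original_value))   # hash of the original value
    second_hash = hash64(feature_name)         # hash of the feature name string
    type_bits = second_hash & MASK16           # low 16 bits: feature type
    index_bits = first_hash & MASK48           # low 48 bits: feature index
    return (type_bits << 48) | index_bits


clicks_a = feature_value("click_count", 1500)
clicks_b = feature_value("click_count", 1501)
```

Two values of the same feature share their upper 16 bits, so features of different types occupy separate regions of the 64-bit space, which is consistent with the reduced collisions the text describes.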

Step 303: Obtain the score of each media file based on the multiple feature values of the media file and a mapping relationship between feature values and media file scores.

In some embodiments, the server may obtain the score of a media file through a pre-trained machine learning model. In practical applications, the machine learning model may be chosen according to actual needs, for example: a logistic regression (LR, Logistic Regression) model, a factorization machine (FM, Factorization Machine) model, a field-aware factorization machine (FFM, Field-aware Factorization Machine) model, a deep factorization machine (DeepFM, Deep Factorization Machine) model, or a wide & deep model. Taking the LR model as an example, the server inputs the multiple feature values of each media file into a logistic regression model (a classification model) obtained by training in advance, and obtains the score of each media file. The formula used may be: y = w_0 + w_1*x_1 + w_2*x_2 + w_3*x_3 + … + w_n*x_n, where x_n is the n-th feature value of the media file, w_n is the coefficient of x_n, and y is the score of the media file, y ∈ [0,1]. In practice, after the server obtains the n feature values corresponding to the n features of a media file, it inputs the n feature values into the trained LR model and obtains a score between 0 and 1 for that media file.
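The scoring step can be sketched as follows. The source states only the linear combination and the range y ∈ [0,1]; passing the linear sum through the logistic (sigmoid) function, as a standard logistic regression does, is an assumption made here to keep the output in that range. The function name and the sample weights are hypothetical.

```python
import math


def lr_score(bias, weights, features):
    """Score one media file: sigmoid(w_0 + sum_i w_i * x_i)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid, so large |z| does not overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical weights and feature vector, for illustration only.
neutral = lr_score(0.0, [0.0, 0.0], [1.0, 2.0])
positive = lr_score(-1.0, [2.0, 0.5], [1.0, 2.0])
```

With all weights at zero the model is indifferent and returns 0.5; positive evidence pushes the score toward 1.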

Next, model training is described, taking the LR model as an example. In actual implementation, the LR model is trained on positive sample data and negative sample data. During one display of media files, the sample data corresponding to the media files that were requested among the displayed media files is used as positive sample data, and the sample data corresponding to the displayed media files that were not requested is used as negative sample data.

Here, take n = 63 as an example, with 34 attribute features and 29 statistical features extracted from a media file. For positive sample data, the feature values of the 34 attribute features and the 29 statistical features of the media file are the input and a score of 1 is the output; for negative sample data, the same feature values are the input and a score of 0 is the output. The LR model is thereby trained to predict the score of a media file from its n input feature values.

In actual implementation, the LR model can be trained with the online machine learning algorithm FTRL (Follow-the-Regularized-Leader), which performs real-time training of large-scale sparse logistic regression models.
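As an illustration of online training with FTRL, the following is a minimal sketch of the per-coordinate FTRL-Proximal update for sparse logistic regression. It is not taken from the patent: the class name `FTRLProximalLR`, the hyperparameters `alpha`, `beta`, `l1` and `l2`, and the treatment of each hashed feature value as a binary (present or absent) feature are all assumptions made for the example.

```python
import math
from collections import defaultdict

class FTRLProximalLR:
    """Per-coordinate FTRL-Proximal for sparse binary logistic regression."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.0, l2=1.0):
        # Hypothetical hyperparameters; the source does not specify any.
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, key):
        z = self.z[key]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps this weight at zero
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[key])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, keys):
        # Assumption: each key is a feature that is present with value 1.
        s = sum(self._weight(k) for k in keys)
        s = max(min(s, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, keys, label):
        # One online step on a single sample (label is 1 or 0).
        g = self.predict(keys) - label  # log-loss gradient for value-1 features
        for k in keys:
            w = self._weight(k)
            sigma = (math.sqrt(self.n[k] + g * g) - math.sqrt(self.n[k])) / self.alpha
            self.z[k] += g - sigma * w
            self.n[k] += g * g
```

Because the weights are derived lazily from `z` and `n`, only the features that actually occur in a sample are touched, which is what makes this style of update practical for large sparse feature spaces.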

Step 304: Based on the score of each media file, select a preset number of media files from the multiple media files as target media files.

In some embodiments, the server may select the target media files as follows:

Based on the score of each media file, the server sorts the multiple media files by score to obtain a media file sequence, and selects media files starting from the first media file in the sequence until the preset number of media files has been selected as target media files.

For example, suppose the server obtains the scores of 10 media files and the preset number is 3. The 10 scores lie between 0 and 1, and the server sorts them from high to low, for example 0.9, 0.85, 0.83, 0.8, 0.75, 0.7, 0.66, 0.64, 0.6, 0.58, and selects the media files corresponding to the scores 0.9, 0.85 and 0.83 as target media files.
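The selection in this example can be sketched in a few lines of Python. The media file identifiers (`m01` to `m10`) and the function name `select_top` are hypothetical; only the ten scores and the preset number 3 come from the text.

```python
def select_top(scores, preset_count):
    """Sort media files by score, high to low, and keep the first preset_count."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:preset_count]

# The ten scores from the example, attached to hypothetical file IDs.
scores = {
    "m01": 0.66, "m02": 0.9, "m03": 0.58, "m04": 0.83, "m05": 0.7,
    "m06": 0.85, "m07": 0.6, "m08": 0.8, "m09": 0.64, "m10": 0.75,
}
targets = select_top(scores, 3)
```

Here `targets` holds the files scored 0.9, 0.85 and 0.83, in that order.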

In some embodiments, the server may push media files based on the selected target media files: the server sends the target media files to the client of the target user for presentation. Because the pushed target media files are high-quality media files determined on the basis of the user portrait, they better match the target user's needs, which improves the click-through rate and play rate of the media files.

Next, the method for obtaining a target media file according to the embodiment of the present invention is described, taking a video file as the media file. Fig. 8 is a schematic flowchart of the method for obtaining a target media file provided by an embodiment of the present invention. Referring to Fig. 8, the method includes:

Step 401: In response to an acquisition instruction for a target video file, the target user's client sends an acquisition request for the target video file to the server.

Here, in practical applications, the target user's client may be any client with a video file push (recommendation) function, such as the watching focus module in the QQ client. When the target user opens the watching focus page in the QQ client, the QQ client sends an acquisition request for target video files (recommended videos) to the server so that videos can be recommended on the watching focus page.

Step 402: The server determines the user portrait of the target user based on the received acquisition request.

In actual implementation, the server parses the received acquisition request for the target video file to obtain the target user identifier, and obtains the target user's historical behavior data based on that identifier, such as video viewing data, video like data, video comment data and video favorite data. Based on this historical behavior data, the server then computes the user portrait of the target user to determine the target user's interest categories.

Step 403: Based on the user portrait of the target user, the server loads the video data of the corresponding multiple videos.

Here, the user portrait of the target user is a tagged user model abstracted from the target user's historical behavior data; that is, the target user is tagged on the basis of the historical behavior data, and each tag is an identifier that represents one dimension of the target user's characteristics. Therefore, in actual implementation, the video data related to a tag (the forward-index data of the videos) can be obtained based on the tags in the abstracted user model. The video data here includes the video title, video description, category, tags, number of likes, number of comments, and so on.

Step 404: Based on the loaded video data, the server performs feature extraction on each video to obtain n feature values of each video.

Here, in actual implementation, the server performs feature extraction for each video file as follows: the server obtains the original values of n features of the video, comprising attribute features and statistical features; it maps the original value of each feature to a 64-bit hash space and takes the low 48 bits of each resulting 64-bit hash value to indicate the feature index; it then maps the feature name string of each feature to a 64-bit hash space and takes the low 16 bits of each resulting 64-bit hash value to indicate the feature type; finally, it combines the 16-bit value indicating the feature type and the 48-bit value indicating the feature index into the 64-bit feature value of the feature, in which the first 16 bits represent the feature type and the last 48 bits represent the feature index. The server obtains the n feature values of each video in the same way. In practical applications, the value of n can be set according to actual needs.

Specifically, the server may compute the feature value with the following formula:

Y = ((hash(feature_name) & 0xFFFF) << 48) + (hash(feature_value) & 0xFFFFFFFFFFFF).
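The encoding described in Step 404 can be sketched in Python as follows. The patent does not name a hash function, so the sketch assumes BLAKE2b truncated to 8 bytes as a stand-in 64-bit hash; `hash64`, `encode_feature` and `decode_feature` are hypothetical helper names, and converting the original value to a string before hashing is also an assumption.

```python
import hashlib

MASK16 = 0xFFFF
MASK48 = 0xFFFFFFFFFFFF

def hash64(text):
    """64-bit hash of a string (assumption: BLAKE2b with an 8-byte digest)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def encode_feature(feature_name, feature_value):
    """Pack the feature type (16 bits) and feature index (48 bits) into 64 bits."""
    type_bits = hash64(feature_name) & MASK16         # low 16 bits of the name hash
    index_bits = hash64(str(feature_value)) & MASK48  # low 48 bits of the value hash
    return (type_bits << 48) | index_bits

def decode_feature(encoded):
    """Split a 64-bit feature value back into (feature type, feature index)."""
    return encoded >> 48, encoded & MASK48

encoded = encode_feature("duration", 120)
```

Since the two parts occupy disjoint bit ranges, combining them with a bitwise OR gives the same result as the addition in the formula.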

Step 405: The server inputs the n feature values of each video into the trained LR model to obtain the score of each video.

Step 406: Based on the obtained score of each video, the server selects a preset number of target videos from the multiple videos and sends the corresponding video data to the target user's client.

Here, the server may select video files by score. Specifically, it may sort the multiple video files by score to obtain a video file sequence, and select video files starting from the first video file in the sequence until the preset number of video files has been selected as target video files.

In actual implementation, the server may send the video data of the selected videos (such as video cover data) to the target user's client by streaming.

Step 407: The target user's client displays the video data sent by the server.

In practical applications, the target user watches videos based on the video data displayed by the client. For example, when the target user clicks a video to watch, the client requests the video data of that video from the server and records the user operation data for the video. Based on the videos currently displayed, new positive sample data (the sample data corresponding to the videos the user requested during the current video display) and negative sample data (the sample data corresponding to the videos the user did not request during the current video display) are constructed. The trained LR model is then updated with the constructed positive and negative sample data to obtain an LR model with updated parameters, which is used for subsequent video scoring.

Here, one video display by the client is explained. In practical applications, a user's request (click) for a video lags behind the display of the video, so a time window is set: a request (click) for a video that occurs within one time window (which can be set according to the actual situation, for example 15 minutes) is regarded as a video request belonging to the corresponding video display. Taking a 15-minute time window as an example, within the 15 minutes after the videos are displayed, the sample data corresponding to displayed videos that the user requested is positive sample data, and the sample data corresponding to displayed videos that the user did not request is negative sample data.
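The time-window rule can be illustrated with a small Python sketch that labels the videos of one display. The function name `label_display`, the data shapes, and the decision to count a click exactly at the window boundary as inside the window are assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # the example window from the text

def label_display(display_time, shown_ids, clicks):
    """Label each shown video 1 (positive) or 0 (negative).

    clicks is a list of (video_id, click_time) pairs. A click counts for
    this display only if it falls within WINDOW after display_time.
    """
    deadline = display_time + WINDOW
    clicked = {vid for vid, t in clicks if display_time <= t <= deadline}
    return {vid: 1 if vid in clicked else 0 for vid in shown_ids}

shown_at = datetime(2019, 3, 5, 12, 0)
labels = label_display(
    shown_at,
    ["a", "b", "c"],
    [("a", shown_at + timedelta(minutes=5)),    # inside the window
     ("b", shown_at + timedelta(minutes=20))],  # too late, not counted
)
```

In this run, video `a` becomes a positive sample, while `b` (clicked after the window closed) and `c` (never clicked) become negative samples.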

Next, the method for obtaining a target media file according to the embodiment of the present invention is described, taking a video file as the media file and, as the client, the QQ client, which is provided with a video recommendation engine (watching focus). Fig. 9 is a schematic architecture diagram for obtaining a target media file provided by an embodiment of the present invention. Referring to Fig. 9, the method mainly includes an offline part and an online part. The offline part mainly computes the user portrait from the user's historical behavior data; the portrait mainly covers different dimensions such as tag and channel. The online part mainly includes the recall of candidate videos, the scoring and ranking of videos, and the diversified display of videos. Each part is described below.

For the offline part, the user portrait is a long-term accumulation of user interests and is a hierarchical interest model. As shown in Fig. 2, the levels from the top down are first-level category, second-level category and tag. Taking the tag "Bryant" as an example, the videos related to "Bryant" in the video library are recalled as candidate videos. In actual implementation, when the user starts the recommendation service, the server can compute the user portrait for that user based on the user identifier carried in the request sent by the client, so as to recall relevant videos.

For the online part, Fig. 10 is a schematic architecture diagram of video scoring and ranking provided by an embodiment of the present invention. Referring to Fig. 10, it includes a video ranking part and a model training part, each of which is described below.

The video ranking part includes video data loading, feature extraction, and scoring and ranking.

First, video data loading is described.

Here, in actual implementation (when the recommendation service starts), the server may load the forward-index data of the corresponding videos based on the user portrait. The forward-index data contains all attribute feature information of a video, such as its first-level category, second-level category, third-level category, duration, tag, source, topic, cover score, quality score, viral-hit score, freshness, whether it is an image group, whether it has a large image, and video grade. The server may also load the statistical feature data of the corresponding videos (the data performance of a video after exposure), such as the number of clicks, number of plays, click-through rate, duration, number of likes, number of comments, number of double-clicks, number of favorites and number of shares. The statistical feature data can be computed offline from the reported click logs, duration logs, behavior logs and the like, pushed online in the form of offline statistics files, and loaded together with the forward-index data to provide data support for subsequent feature extraction.

Next, feature extraction is described. Feature extraction mainly consists of three parts: feature engineering, feature indexing and feature encoding.

Feature engineering mainly extracts the attribute features and statistical features of a video from the loaded video data to obtain the original value of each feature.

In some embodiments, from the perspective of video content, a video has important attributes such as first-level category, second-level category, third-level category, duration, tag, source, topic, cover score, quality score, viral-hit score, freshness, whether it is an image group, whether it has a large image, and article grade. The embodiment of the present invention extracts single attribute features in a targeted way and, for the important single attribute features among them, such as topic, duration, cover score, quality score and viral-hit score, adds their crosses with the basic video categories (first-level, second-level and third-level category). This finally yields 26 classes of single attribute features and 8 classes of cross attribute features, 34 classes of video attribute features in total.

Taking a news video as an example, the following attribute features can be extracted: first-level category, second-level category, third-level category, duration, source, tag, title tag, whether it is an image group, whether it has a large image, regional news grade, event grade, freshness, token byte count, ID, picture attribute, time since publication, time of occurrence, region, topic, title, token count, source quality score, cover score, quality score and viral-hit score, together with the cross of cover score and video category, the cross of cover score and video duration, the cross of quality score and video category, the cross of quality score and video duration, the cross of viral-hit score and news video category, the cross of viral-hit score and video duration, the cross of topic and time since publication, and the cross of category and video region.

From the perspective of recommendation operations, a video has a series of data performance indicators after exposure, such as number of clicks, number of plays, click-through rate, duration, number of likes, number of comments, number of double-clicks, number of favorites and number of shares. The embodiment of the present invention adds these posterior single statistical features of the video in a targeted way. In addition, to distinguish a video's data performance in different time periods, the embodiment adds, for some of the important single statistical features, statistics over different time windows, such as the four windows hour, day, week and month. This gives 23 classes of single statistical features and 6 classes of cross features, finally yielding 29 classes of video posterior statistical features.

Taking a news video as an example, the following statistical features can be extracted: hourly click-through rate, daily click-through rate, weekly click-through rate, popularity, daily/weekly/monthly number of plays, daily/weekly/monthly number of shares, daily/weekly/monthly number of forwards, daily/weekly/monthly number of favorites, daily/weekly/monthly BIU number, source click-through rate, source number of clicks, reading duration, number of comments, comment rate, reading completeness, number of user likes, like rate, number of user dislikes, dislike rate, hourly number of clicks, channel click-through rate, channel number of clicks, topic number of clicks, topic click-through rate and media click-through rate, together with the cross of click-through rate and number of clicks, the cross of weekly click-through rate and click-through rate, the cross of hourly CTR and number of clicks, the cross of channel and popularity, the cross of topic and video popularity, and the cross of source number of clicks and click-through rate.

When the server performs feature engineering, it obtains, from the loaded video data, the original values of the above 34 classes of video attribute features and of the above 29 classes of video posterior statistical features.

Feature indexing computes the offset of a feature within its feature class. A feature generally has one or more input values (that is, original values) when it is indexed, and the feature index is computed from these one or more input values. In actual implementation, it can be obtained by mapping the original value of the feature to a 64-bit hash space.

Taking a single feature as an example, the input generally has one of three types: uint64, float or string. Features such as number of clicks and number of likes are generally of type uint64, and the feature index is then the input value x. Features such as click-through rate and like rate are generally of type float, and the feature index is then x*10000. Features such as publishing media are generally strings, and the feature index is then hash(x), the hash value computed over the string.

Taking a cross feature as an example, the input generally has multiple parameters. For the cross feature of channel and number of clicks, the two single features correspond to two uint64 values, x1 and x2, and the embodiment of the present invention joins them by multiplying by a prime number, specifically as x1*13131+x2. This can be extended: for input values in any format, the respective index values can be obtained in the manner described above for single features and then joined by the same multiplication. Similarly, the scheme can be extended from 2 input features to multiple input features.
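The indexing rules for single and cross features can be sketched in Python as below. Several details are assumptions rather than statements from the patent: BLAKE2b stands in for the unspecified string hash, float inputs are rounded after scaling, results are wrapped to 64 bits, and inputs beyond the second are folded in by repeating the same multiply-and-add step. The names `hash64`, `single_index` and `cross_index` are hypothetical.

```python
import hashlib

U64 = (1 << 64) - 1  # wrap results to 64 bits (assumption)

def hash64(text):
    """64-bit string hash (assumption: BLAKE2b with an 8-byte digest)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def single_index(x):
    """Feature index of one input value, by input type."""
    if isinstance(x, bool):
        raise TypeError("bool is not a supported input type")
    if isinstance(x, int):    # uint64 inputs such as number of clicks
        return x & U64
    if isinstance(x, float):  # float inputs such as click-through rate
        return round(x * 10000) & U64
    if isinstance(x, str):    # string inputs such as publishing media
        return hash64(x)
    raise TypeError("unsupported input type")

def cross_index(*values):
    """Join the indices of several inputs as x1*13131 + x2, repeated."""
    result = single_index(values[0])
    for value in values[1:]:
        result = (result * 13131 + single_index(value)) & U64
    return result

channel_clicks = cross_index(7, 1500)  # hypothetical channel ID and click count
```

For the two-input case this reduces to the expression given in the text, here 7*13131+1500.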

Feature encoding mainly computes the encoded feature value. To increase the distinctiveness of features while preserving online performance, the embodiment of the present invention maps feature values into a 64-bit hash space. The first 16 bits of the 64-bit space represent the feature type and are obtained by hashing the feature name string and taking the low 16 bits; the last 48 bits represent the feature index and are obtained by hashing the feature value and taking the low 48 bits, that is:

Y = ((hash(feature_name) & 0xFFFF) << 48) + (hash(feature_value) & 0xFFFFFFFFFFFF);

Compared with continuous features, this hashed-feature approach reduces conflicts between features and increases their distinctiveness.

Next, scoring and ranking is described.

The server inputs the multiple feature values of a video obtained by feature extraction into the trained LR model to obtain the score of that video. The forward computation of the LR model is:

y=w₀+w₁*x₁+w₂*x₂+w₃*x₃+…+w_n*x_n;

where x_n is the n-th feature value of the video, w_n is the coefficient of x_n, and y is the score of the video, y ∈ [0,1].

In actual implementation, the STL unordered_map container can be used to store and look up the parameters, but its lookup time is too high; Google's dense_map container can be used instead, trading space for time, which can reduce lookup time by about 2/3.

After the server obtains the score of each video, it sorts the videos by score and, based on the ranking result, selects the preset number of highest-scoring videos for video recommendation.

Next, the model training part is described. Fig. 11 is a schematic architecture diagram of model training provided by an embodiment of the present invention. Referring to Fig. 11, model training mainly includes three parts: log merging, feature extraction and model training, which are described below.

Log merging mainly aggregates all the information of one request from the click log, display log and feature log. Because clicks generally lag considerably behind displays, there is a time window problem; the embodiment of the present invention uses a 15-minute time window and assumes that the click for a display occurs within 15 minutes. For each displayed article in each request, whether it was clicked and its corresponding feature data are looked up, and the merged log data is written to Kafka.

Feature extraction runs on the Spark framework. From the merged log data, the corresponding feature data is extracted and the positive and negative samples for model training are constructed: the sample data corresponding to clicked videos is positive sample data, and the sample data corresponding to videos that were not clicked is negative sample data. The embodiment of the present invention writes the positive sample data and the negative sample data to two separate Kafka topics for the model training side to read. While extracting features, the embodiment also counts core indicators such as the number of training samples, the number of test samples, the average sample length and the positive sample rate in order to monitor the running state of the model. For example, if the positive sample rate of the samples used for training is 0, that is, there are no positive samples, the model obtained by the current training is inaccurate. In practical applications, the open-source framework MXNet can be used for model training.
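The monitoring indicators mentioned above can be illustrated with a short Python sketch that computes the sample count, average sample length and positive sample rate for one batch of training samples. The function name `sample_stats`, the data shape and the `healthy` flag are hypothetical; a production pipeline would compute such statistics inside the Spark job rather than in plain Python.

```python
def sample_stats(samples):
    """Summarize a batch of (feature_values, label) samples, label being 1 or 0."""
    total = len(samples)
    if total == 0:
        return {"count": 0, "avg_length": 0.0, "positive_rate": 0.0, "healthy": False}
    positives = sum(label for _, label in samples)
    avg_length = sum(len(features) for features, _ in samples) / total
    rate = positives / total
    # A positive sample rate of 0 means there are no positive samples,
    # which the text treats as a sign that the trained model is inaccurate.
    return {"count": total, "avg_length": avg_length,
            "positive_rate": rate, "healthy": rate > 0.0}

stats = sample_stats([([1, 2, 3], 1), ([4, 5], 0), ([6], 0), ([7, 8], 0)])
```

For this batch of four samples, one of which is positive, the positive sample rate is 0.25 and the batch is flagged as healthy.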

Fig. 12 is a schematic structural diagram of the apparatus for obtaining a target media file provided by an embodiment of the present invention. The apparatus is located on the server side. Referring to Fig. 12, the apparatus includes:

A loading unit 121, configured to load media data of multiple media files;

An extraction unit 122, configured to perform feature extraction on each media file based on the media data to obtain multiple feature values of each media file, the multiple feature values including the feature values of the attribute features of the media file and the feature values of the statistical features of the media file;

A mapping unit 123, configured to obtain the score of each media file based on the multiple feature values of each media file and the mapping relationship between feature values and media file scores;

A selection unit 124, configured to select, based on the score of each media file, a preset number of media files from the multiple media files as target media files.

In some embodiments, the loading unit is specifically configured to obtain the historical behavior data of a target user;

determine, based on the historical behavior data, a user portrait representing the interest categories of the target user;

and load the media data of the multiple media files corresponding to the user portrait.

In some embodiments, the extraction unit is specifically configured to perform the following operations on each media file:

obtain the original values of at least two features of the media file, the at least two features including attribute features and statistical features of the media file;

and obtain the feature value of each feature based on the original value of the feature and the corresponding feature name.

In some embodiments, the extraction unit is specifically configured to hash the original value of each feature to obtain a first hash value of each feature, and to hash the feature name string of each feature to obtain a second hash value of each feature;

and obtain the feature value of each feature based on the first hash value and the second hash value of the feature.

In some embodiments, the mapping unit is specifically configured to input the multiple feature values of each media file into a logistic regression model to obtain the score of each media file;

The logistic regression model is trained on positive sample data and negative sample data;

During one display of media files, the sample data corresponding to the media files that were requested among the displayed media files is used as positive sample data, and the sample data corresponding to the displayed media files that were not requested is used as negative sample data.

In some embodiments, the selection unit is specifically configured to sort the multiple media files by score, based on the score of each media file, to obtain a media file sequence;

and to select media files starting from the first media file in the media file sequence until the preset number of media files has been selected as target media files.

In some embodiments, the apparatus further includes:

A push unit, configured to send the target media files to the client of the target user for presentation.

Fig. 13 is a schematic structural diagram of the apparatus for obtaining a target media file provided by an embodiment of the present invention. Referring to Fig. 13, the apparatus is located on the client side and includes:

A sending unit 131, configured to send an acquisition request for a target media file in response to an acquisition instruction for the target media file;

A receiving unit 132, configured to receive the returned target media file, where the target media file is selected from multiple media files based on the scores of the media files, the score of a media file is computed from the multiple feature values of the media file and the mapping relationship between feature values and media file scores, and the multiple feature values include the feature values of the attribute features of the media file and the feature values of the statistical features of the media file;

A display unit 133, configured to present the target media file through a user interface.

An embodiment of the present invention further provides an apparatus for obtaining a target media file, the apparatus including:

A memory, configured to store an executable program;

A processor, configured to implement the above method for obtaining a target media file provided by the embodiment of the present invention when executing the executable program stored in the memory.

An embodiment of the present invention further provides a storage medium storing an executable program which, when executed by a processor, implements the above method for obtaining a target media file provided by the embodiment of the present invention.

It should be noted that the above description of the apparatus for obtaining a target media file is similar to the description of the method above and has the same beneficial effects as the method, which are not repeated here. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments of the present invention.

All or part of the steps of the embodiments can be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments described above. The storage medium includes various media that can store program code, such as a removable storage device, a random access memory (RAM, Random Access Memory), a read-only memory (ROM, Read-Only Memory), a magnetic disk or an optical disc.

Alternatively, if the integrated unit of the present invention is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a removable storage device, RAM, ROM, a magnetic disk or an optical disc.

The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for obtaining a target media file, characterized by comprising:

loading media data of multiple media files;

performing feature extraction on each media file based on the media data to obtain multiple feature values of each media file, the multiple feature values including the feature values of the attribute features of the media file and the feature values of the statistical features of the media file;

obtaining the score of each media file based on the multiple feature values of each media file and the mapping relationship between feature values and media file scores;

selecting, based on the score of each media file, a preset number of media files from the multiple media files as target media files.

2. The method according to claim 1, characterized in that loading the media data of multiple media files comprises:

obtaining historical behavior data of a target user;

3. The method according to claim 1, characterized in that performing feature extraction on each media file to obtain multiple feature values of each media file comprises:

performing the following operations on each media file:

obtaining the original values of at least two features of the media file, the at least two features including attribute features and statistical features of the media file;

4. The method according to claim 3, characterized in that obtaining the feature value of each feature based on the original value of the feature and the corresponding feature name comprises:

hashing the original value of each feature to obtain a first hash value of each feature, and hashing the feature name string of each feature to obtain a second hash value of each feature;

5. The method according to claim 1, characterized in that obtaining the score of each media file based on the multiple characteristic values of each media file and the mapping relation between characteristic values and media file scores comprises:

Inputting the multiple characteristic values of each media file into a logistic regression model respectively, to obtain the corresponding score of each media file;

Wherein, during a single presentation of media files, the sample data corresponding to the requested media files among the multiple presented media files serve as positive sample data, and the sample data corresponding to the unrequested media files among the multiple presented media files serve as negative sample data.
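The sample labeling and scoring described in claim 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: each characteristic value is treated as the index of a sparse binary feature, the weights are hypothetical numbers rather than trained parameters, and the names `label_impression` and `lr_score` are invented for the example.

```python
import math


def label_impression(shown_ids, requested_ids):
    """Label the media files of one presentation.

    Requested files become positive samples (label 1); files that were
    shown but not requested become negative samples (label 0).
    """
    requested = set(requested_ids)
    return [(media_id, 1 if media_id in requested else 0) for media_id in shown_ids]


def lr_score(characteristic_values, weights, bias=0.0):
    """Logistic regression score, assuming each characteristic value
    indexes a binary feature with its own weight."""
    z = bias + sum(weights.get(value, 0.0) for value in characteristic_values)
    return 1.0 / (1.0 + math.exp(-z))


# One presentation of three media files, of which only "v2" was requested.
samples = label_impression(["v1", "v2", "v3"], ["v2"])

# Hypothetical weights keyed by characteristic value.
weights = {101: 0.8, 202: -0.3}
score = lr_score([101, 202], weights)
```

The sigmoid keeps every score strictly between 0 and 1, so scores of different media files can be compared directly when ranking.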

6. The method according to claim 1, characterized in that selecting, based on the score of each media file, the preset quantity of media files from the multiple media files as target media files comprises:

Sorting the multiple media files according to their scores, based on the score of each media file, to obtain a media file sequence;

Selecting media files starting from the first media file in the media file sequence, until the preset quantity of media files has been selected as the target media files.
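The sort-then-select procedure of claim 6 can be sketched as below. The sketch assumes the media file sequence is ordered from highest to lowest score, so that selection from the first media file yields the best-scored files; the function name `select_target_media` and the sample scores are hypothetical.

```python
def select_target_media(scores, preset_quantity):
    """Sort media files by score (highest first, an assumed ordering) and
    take files from the head of the sequence until the preset quantity
    is reached."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [media_id for media_id, _ in ranked[:preset_quantity]]


# Hypothetical scores for four candidate media files.
scores = {"v1": 0.42, "v2": 0.91, "v3": 0.67, "v4": 0.15}
targets = select_target_media(scores, 2)
```

If fewer media files are available than the preset quantity, the slice simply returns all of them in ranked order.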

7. The method according to claim 1, characterized in that the method further comprises:

Sending the target media file to a client of the target user for presentation.

8. A method for obtaining a target media file, characterized by comprising:

Receiving the returned target media file, the target media file being selected from multiple media files based on scores of the media files, the score of a media file being calculated based on multiple characteristic values of the media file and a mapping relation between characteristic values and media file scores, the multiple characteristic values comprising: a characteristic value of an attribute feature of the media file and a characteristic value of a statistical feature of the media file;

Presenting the target media file through a user interface.

9. A device for obtaining a target media file, characterized by comprising:

A loading unit, configured to load media data of multiple media files;

An extraction unit, configured to perform feature extraction on each media file based on the media data, to obtain multiple characteristic values of each media file, the multiple characteristic values comprising: a characteristic value of an attribute feature of the media file and a characteristic value of a statistical feature of the media file;

A mapping unit, configured to obtain a score of each media file based on the multiple characteristic values of each media file and a mapping relation between characteristic values and media file scores;

A selection unit, configured to select, based on the score of each media file, a preset quantity of media files from the multiple media files as target media files.

10. The device according to claim 9, characterized in that

The loading unit is specifically configured to obtain historical behavior data of a target user;

11. The device according to claim 9, characterized in that

The extraction unit is specifically configured to perform the following operations on each media file respectively:

12. The device according to claim 11, characterized in that

The extraction unit is specifically configured to hash the original value of each feature to obtain a first hash value of each feature, and to hash the feature name character string of each feature to obtain a second hash value of each feature;

13. The device according to claim 9, characterized in that

The mapping unit is specifically configured to input the multiple characteristic values of each media file into a logistic regression model respectively, to obtain the score of each media file;

14. The device according to claim 9, characterized in that

The selection unit is specifically configured to sort the multiple media files according to their scores, based on the score of each media file, to obtain a media file sequence;

15. A device for obtaining a target media file, characterized by comprising:

A sending unit, configured to send an acquisition request for the target media file in response to an acquisition instruction for the target media file;

A receiving unit, configured to receive the returned target media file, the target media file being selected from multiple media files based on scores of the media files, the score of a media file being calculated based on multiple characteristic values of the media file and a mapping relation between characteristic values and media file scores, the multiple characteristic values comprising: a characteristic value of an attribute feature of the media file and a characteristic value of a statistical feature of the media file;