CN117014685A

CN117014685A - Method and device for distributing multimedia data and computer equipment

Info

Publication number: CN117014685A
Application number: CN202211433690.0A
Authority: CN
Inventors: 刘刚
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-11-16
Filing date: 2022-11-16
Publication date: 2023-11-07

Abstract

The present application relates to a method, an apparatus, a computer device, a storage medium and a computer program product for distributing multimedia data. Embodiments of the application can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving. The method comprises: acquiring candidate multimedia data, and determining data content information and data description information of the candidate multimedia data, wherein the data content information comprises data of one or more modalities; acquiring data content characteristics corresponding to the data content information and data description characteristics corresponding to the data description information; obtaining, based on the data content characteristics and the data description characteristics, a distribution potential corresponding to the candidate multimedia data through a distribution potential prediction model, wherein the distribution potential at least comprises the number of times the candidate multimedia data is browsed; and selecting target multimedia data according to the distribution potential corresponding to the candidate multimedia data, wherein the target multimedia data is to be distributed to users. The efficiency of multimedia data distribution is thereby improved.

Description

Method and device for distributing multimedia data and computer equipment

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, and a computer device for distributing multimedia data.

Background

With the rapid development of internet applications, people often obtain knowledge or information through multimedia data, and a multimedia data distribution platform can distribute suitable multimedia data to users in suitable scenarios. Such a platform can select the multimedia data to be distributed to users either through manual operation or by relying on a machine algorithm.

At present, multimedia data distribution systems mainly select the multimedia data to be distributed to users based on the screening experience of human operators, and then take the access count of the multimedia data into account through data statistics, so that multimedia data with a higher access count is more likely to be distributed to users. However, manual screening is inefficient and human judgment may be inaccurate, so the distributed multimedia data may not match the user's needs and preferences and may not be browsed by the user, which reduces the efficiency of multimedia data distribution. In addition, relying on data statistics may concentrate distribution on multimedia data whose access count is already high but which does not satisfy the user's interests, so that the multimedia data distributed to the user is again not browsed, which also reduces the efficiency of multimedia data distribution. Therefore, how to improve the efficiency of multimedia data distribution is a problem to be solved.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a multimedia data distribution method, apparatus, and computer device that can improve the efficiency of multimedia data distribution.

In a first aspect, the present application provides a method for distributing multimedia data. The method comprises the following steps:

acquiring candidate multimedia data, and determining data content information and data description information of the candidate multimedia data, wherein the data content information comprises data of one or more modalities;

acquiring data content characteristics corresponding to the data content information and data description characteristics corresponding to the data description information;

obtaining, based on the data content characteristics and the data description characteristics, a distribution potential corresponding to the candidate multimedia data through a distribution potential prediction model, wherein the distribution potential at least comprises the number of times the candidate multimedia data is browsed;

and selecting target multimedia data according to the distribution potential corresponding to the candidate multimedia data, wherein the target multimedia data is used for being distributed to users.
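The four steps above can be read as a small scoring-and-selection pipeline. The following Python sketch is only an illustration under stated assumptions: the names `Candidate`, `select_targets`, the encoder callables and the `top_k` cut-off are hypothetical and do not come from the application, and the prediction model is passed in as an opaque callable that returns a predicted browse count.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    item_id: str
    content_info: dict      # data content information (one or more modalities)
    description_info: dict  # data description information


def select_targets(
    candidates: Sequence[Candidate],
    content_encoder: Callable[[dict], List[float]],      # hypothetical feature extractor
    description_encoder: Callable[[dict], List[float]],  # hypothetical feature extractor
    potential_model: Callable[[List[float], List[float]], float],
    top_k: int,
) -> List[Candidate]:
    """Score each candidate and keep the top_k by predicted browse count."""
    scored = []
    for cand in candidates:
        content_feat = content_encoder(cand.content_info)
        desc_feat = description_encoder(cand.description_info)
        predicted_browses = potential_model(content_feat, desc_feat)
        scored.append((predicted_browses, cand))
    # Sort on the score only, so Candidate objects are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:top_k]]
```

Selecting the top `top_k` items is one possible selection rule; a threshold on the predicted distribution potential would fit the same structure.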

In a second aspect, the application further provides a device for distributing multimedia data. The device comprises:

a data information acquisition module, configured to acquire candidate multimedia data and determine data content information and data description information of the candidate multimedia data, wherein the data content information comprises data of one or more modalities;

a feature acquisition module, configured to acquire data content features corresponding to the data content information and data description features corresponding to the data description information;

a distribution potential prediction module, configured to obtain, based on the data content features and the data description features, a distribution potential corresponding to the candidate multimedia data through the distribution potential prediction model, wherein the distribution potential at least comprises the number of times the candidate multimedia data is browsed;

and a multimedia data selection module, configured to select target multimedia data according to the distribution potential corresponding to the candidate multimedia data, wherein the target multimedia data is to be distributed to users.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the steps of:

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:

According to the above method, apparatus, computer device, storage medium and computer program product for distributing multimedia data, candidate multimedia data is acquired, and data content information and data description information of the candidate multimedia data are determined, wherein the data content information comprises data of one or more modalities. Data content characteristics corresponding to the data content information and data description characteristics corresponding to the data description information are acquired, and a distribution potential corresponding to the candidate multimedia data is obtained through a distribution potential prediction model based on these characteristics, the distribution potential at least comprising the number of times the candidate multimedia data is browsed. Target multimedia data to be distributed to users is then selected according to the distribution potential corresponding to the candidate multimedia data. Because the data content characteristics and the data description characteristics describe the candidate multimedia data from the two dimensions of data content and data description, and the distribution potential predicted from them is expressed in particular by the number of times the candidate multimedia data is browsed, that number is taken into account when selecting multimedia data for distribution, which improves the efficiency of multimedia data distribution.

Drawings

FIG. 1 is an application environment diagram of a method of distributing multimedia data in one embodiment;

FIG. 2 is a system configuration diagram of a multimedia data distribution system in one embodiment;

FIG. 3 is a flow chart illustrating a method of distributing multimedia data in one embodiment;

FIG. 4 is a partial architectural diagram of a distribution potential prediction model in one embodiment;

FIG. 5 is a partial flow diagram of a method of obtaining a distribution potential prediction model in one embodiment;

FIG. 6 is a partial flow chart illustrating the selection of a multimedia data sample in one embodiment;

FIG. 7 is a partial flow chart illustrating selecting a multimedia data sample according to another embodiment;

FIG. 8 is a partial flow diagram of adjusting model parameters with different loss weights in one embodiment;

FIG. 9 is a partial flow diagram of adjusting model parameters in one embodiment;

FIG. 10 is a schematic before-and-after comparison of kernel density estimation in one embodiment;

FIG. 11 is a partial flow diagram of adjusting model parameters in another embodiment;

FIG. 12 is a partial flow diagram of determining data content information and data description information in one embodiment;

FIG. 13 is a partial flow diagram of acquiring data content characteristics and data description characteristics in one embodiment;

FIG. 14 is a partial flow diagram of acquiring data content characteristics and data description characteristics in another embodiment;

FIG. 15 is a schematic diagram of a multi-layer perceptron in one embodiment;

FIG. 16 is a complete architectural diagram of a distribution potential prediction model in one embodiment;

FIG. 17 is a partial flow diagram of a training method of a video modality pre-training model in one embodiment;

FIG. 18 is a schematic diagram of an architecture of a video modality pre-training model in one embodiment;

FIG. 19 is a complete flow diagram of a method of distributing multimedia data in one embodiment;

FIG. 20 is a block diagram showing a structure of a multimedia data distribution apparatus in one embodiment;

FIG. 21 is a block diagram showing a structure of a multimedia data distribution apparatus according to another embodiment;

FIG. 22 is an internal structural view of the computer device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

With the rapid development of internet applications, people often obtain knowledge or information through multimedia data distributed to them, and a multimedia data distribution platform can distribute suitable multimedia data to users in suitable scenarios. Such a platform can select the multimedia data to be distributed either through manual operation or by relying on a machine algorithm. The distribution potential of multimedia data reflects how much attention the data may attract and how many times it may be browsed. If multimedia data with higher distribution potential can be identified as early as possible and multimedia data with lower distribution potential can be filtered out, this is of great significance for the distribution of multimedia data, for the cold start of data during multimedia data recommendation, and for scenarios such as the scheduling of audit resources and active pushing during multimedia data auditing; it can also greatly improve the efficiency with which operators screen multimedia data. On this basis, the multimedia data distribution method provided by the embodiments of the application can be applied in particular to scenarios involving multimedia data distribution, such as multimedia data recommendation, information flow recommendation, multimedia data distribution auditing, multimedia data recall analysis, content cold start and content operation.

Taking a content cold start scenario as an example, the method provided by the embodiments of the application describes the distribution potential by the number of times candidate multimedia data is browsed, so that this number is taken into account and the distribution potential is estimated in advance when multimedia data is distributed. On top of the original cold start strategy, the distribution potential corresponding to each candidate multimedia data is obtained from the distribution potential prediction model, and the estimate is used to allocate and optimize traffic for each candidate multimedia data. Taking large-scale content pushing by operators as a second example, the estimated distribution potential reduces the cost of manually separating multimedia data with high distribution potential from multimedia data with low distribution potential and improves operating efficiency. When manpower on the audit link is insufficient, the audit priority of multimedia data with low distribution potential can be lowered, or its audit can be ended early, which saves resources of the recommendation system.
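The audit scenario described above (lowering the audit priority of low-potential data, or ending its audit early when manpower is short) can be sketched with a priority queue. This is a minimal illustration, not the application's scheduler: the function name `plan_audit`, the `capacity` parameter standing in for available audit manpower, and the `drop_below` threshold are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def plan_audit(
    predicted: Dict[str, float],  # item id -> predicted distribution potential
    capacity: int,                # hypothetical: items the auditors can handle now
    drop_below: float,            # hypothetical: potential below which audit may end early
) -> Tuple[List[str], List[str]]:
    """Return (audit_order, skipped) with high-potential items audited first."""
    # heapq is a min-heap, so negate the potential to pop the largest first.
    heap = [(-potential, item_id) for item_id, potential in predicted.items()]
    heapq.heapify(heap)

    audit_order: List[str] = []
    skipped: List[str] = []
    while heap:
        neg_potential, item_id = heapq.heappop(heap)
        if len(audit_order) < capacity:
            audit_order.append(item_id)   # within current manpower
        elif -neg_potential < drop_below:
            skipped.append(item_id)       # low potential: audit ended early
        else:
            audit_order.append(item_id)   # still audited, but later in the queue
    return audit_order, skipped
```

With `capacity` set to the total number of items nothing is skipped, so the threshold only takes effect when manpower is insufficient.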

Since the present application predicts distribution potential by means of the distribution potential prediction model, the technical content related to artificial intelligence (Artificial Intelligence, AI) is described first. AI is the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.

Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs and the like.

Machine learning is a multi-domain interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.

The scheme provided by the embodiments of the application relates to artificial intelligence technologies such as natural language processing and machine learning, and is described in detail through the following embodiments:

The method for distributing multimedia data provided by the embodiments of the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store data that the server 104 needs to process, such as multimedia data. The data storage system may be integrated on the server 104, or may be located on the cloud or on other servers.

Specifically, taking application to the server 104 as an example, the server 104 may obtain candidate multimedia data from the data storage system and determine data content information and data description information of the candidate multimedia data. The server 104 then obtains data content features corresponding to the data content information and data description features corresponding to the data description information and, based on these features, obtains the distribution potential corresponding to the candidate multimedia data through a distribution potential prediction model, where the distribution potential at least includes the number of times the candidate multimedia data is browsed. The server 104 selects target multimedia data to be distributed to users according to the distribution potential corresponding to the candidate multimedia data and transmits the target multimedia data to the terminal 102, so that the terminal 102 recommends and displays the target multimedia data to the user through a display interface.

Next, taking application to the terminal 102 as an example, the terminal 102 may acquire candidate multimedia data through communication with the server 104 or from local storage; the manner of acquiring the candidate multimedia data is not limited here. The terminal 102 then determines the data content information and the data description information of the candidate multimedia data, obtains the corresponding data content features and data description features and, based on these features, obtains the distribution potential corresponding to the candidate multimedia data through the distribution potential prediction model, where the distribution potential at least includes the number of times the candidate multimedia data is browsed. The terminal 102 can thereby select target multimedia data and recommend and display it to the user through the display interface.

The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device or portable wearable device. The internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, aircraft or the like. The portable wearable device may be a smart watch, smart bracelet, headset or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers. Embodiments of the application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation and assisted driving.

Specifically, the method for distributing multimedia data provided by the embodiments of the application can be applied to the multimedia data distribution system shown in FIG. 2. The main functions of each service module are introduced below in connection with the multimedia data distribution method:

1. Content production end 201

Producers of professionally generated content (Professional Generated Content, PGC), user generated content (User Generated Content, UGC) and multi-channel network (Multi-Channel Network, MCN) content provide image-text or video content through a mobile terminal or a back-end application program interface (Application Program Interface, API) system; these are the primary content sources of the content production end 201. The content production end 201 first communicates with the uplink and downlink content interface server 203 to obtain its interface address, and then uploads the multimedia data.

2. Content consumer 202

The content consumption end 202 communicates with the uplink and downlink content interface server 203 and obtains, through distribution or push, index information for accessing content. It then communicates with the content storage server 204 to obtain the corresponding content, including the distributed multimedia data. The content storage server 204 stores content entities, that is, the multimedia data itself, such as video source files and picture source files. The meta-information of the content is stored in the content database 205. This meta-information is the data content information and data description information of the multimedia data, for example title text information, content description information, data creator information, data cover image, data classification information and data tag (Tag) information, all of which are matched with the multimedia data and describe the content entity from different dimensions.

In addition, the content consumption end 202 reports to the back end, for statistical analysis, the behavior data generated when the user browses multimedia data during uploading and downloading, such as stalling, loading time, number of browses, browsing duration, click behavior, sliding behavior, sharing behavior, collection behavior and forwarding behavior. Data from various external channel sources also enters the system through the content consumption end 202 via the uplink and downlink content interface server 203. When the multimedia data is of the video modality, the content consumption end 202 usually browses it in the form of an information flow. In particular, operators can directly place multimedia data with a higher number of browses, and can push it to more users through active push (PUSH).

3. Uplink and downlink content interface server 203

The uplink and downlink content interface server 203 communicates directly with the content production end 201. Content submitted from the front end, typically the title text information, data creator information, abstract text information, data cover image and distribution time of the multimedia data, is stored in the content storage server 204. The uplink and downlink content interface server 203 also writes the data content information and data description information of image-text multimedia data, such as title text information, content description information, data creator information, data cover image, data classification information and data Tag information, into the content database 205. Further, the uplink and downlink content interface server 203 synchronizes the content submitted by the content production end 201 to the dispatch center server 206 for subsequent content processing and circulation.

4. Content storage server 204

The content storage server 204 stores the content entities of the multimedia data, that is, everything other than the data content information and data description information kept as meta-information, such as video source files and picture source files. When video content is consumed, the terminal of the content consumption end 202 accesses the source file directly from the content storage server 204.

5. Content database 205

All meta-information of the content released by the content production end 201 is stored in the content database 205. This is mainly the meta-information of the content itself, such as file size, cover image link, code rate, file format, title, release time, author, video file size, video format and whether the content is marked as original or first-published, and it also includes the classification of the content made during manual review. The classification made during manual review includes the first-level, second-level and third-level classifications and the tag information. For example, for an article about mobile phone A, the first-level classification is technology, the second-level classification is smartphones, the third-level classification is domestic mobile phones, and the tag information is A. When the manual auditing system 207 performs manual review, it reads the information in the content database 205, and the result and status of the manual review are returned to the content database 205.
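As an illustration of the meta-information described above, the record below models a few of the listed fields together with the three-level classification from the mobile phone example. It is a sketch only: the class and field names (`ContentMeta`, `Classification`, `review_status` and so on) are hypothetical, and the application does not specify a storage schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Classification:
    first_level: str
    second_level: str
    third_level: str
    tags: List[str] = field(default_factory=list)


@dataclass
class ContentMeta:
    title: str
    author: str
    file_format: str
    file_size_bytes: int
    is_original: bool
    classification: Classification
    review_status: str = "pending"  # hypothetical status written back after manual review

    def category_path(self) -> str:
        """Join the three classification levels into one readable path."""
        c = self.classification
        return " > ".join([c.first_level, c.second_level, c.third_level])


# The example from the text: an article about mobile phone A.
article = ContentMeta(
    title="Article about mobile phone A",
    author="example author",
    file_format="html",
    file_size_bytes=2048,
    is_original=True,
    classification=Classification(
        "technology", "smartphones", "domestic mobile phones", tags=["A"]
    ),
)
```

Keeping the review status on the same record mirrors the statement that the manual auditing system reads from and writes back to the content database.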

Further, the content processing performed by the dispatch center server mainly includes machine processing and manual review. The core of machine processing is various quality judgments, such as low-quality filtering. Machine processing also invokes the deduplication service and writes the deduplication result into the content database 205, so that identical content is not manually processed a second time.

6. Dispatch center server 206

The dispatch center server 206 is responsible for the entire scheduling process of the multimedia data flow. It receives the multimedia data put into storage through the uplink and downlink content interface server 203 and then obtains the information of the multimedia data from the content database 205. It can also schedule the manual auditing system 207 and the machine processing system, controlling the order and priority of scheduling. In addition, it can communicate with the video recall retrieval service and then with the duplication judgment service to filter out unnecessary duplicate or similar multimedia data. For multimedia data that does not meet the duplicate-filtering criterion, the similarity and the similarity relation chain of the multimedia data are output for the multimedia data recommendation system to use when scattering results. Finally, multimedia data that has passed the manual auditing system 207 is enabled and provided to the content consumers of the various terminals through the content outlet distribution service, typically a recommendation engine, a search engine or operation.

7. Manual auditing system 207

The manual auditing system 207 is the carrier of manual service capability. It is mainly used to audit multimedia data that machines cannot decide on, such as sensitive multimedia data, and thereby perform preliminary filtering of the multimedia data. The manual auditing system 207 can also perform secondary confirmation of the distribution potential of multimedia data samples, ensuring the effect and quality of the labeled sample distribution potential.

8. Distribution potential prediction model 208

Based on the multimedia data distribution method provided by the application, the distribution potential prediction model is obtained as follows. A multimedia data sample and a sample distribution potential matched with the multimedia data sample are acquired, where the sample distribution potential at least includes the number of times the multimedia data sample is browsed and the browsing completeness of the multimedia data sample after it is browsed. Sample data content information of the multimedia data sample and sample data description information matched with the multimedia data sample are determined, and sample data content features corresponding to the sample data content information and sample data description features corresponding to the sample data description information are obtained. Based on the sample data content features and the sample data description features, a predicted distribution potential corresponding to the multimedia data sample is obtained through an initial distribution potential prediction model. The model parameters of the initial distribution potential prediction model are then adjusted based on the sample distribution potential and the predicted distribution potential to obtain the distribution potential prediction model.
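The training procedure above (predict from sample features, compare with the sample distribution potential, adjust parameters) can be illustrated with a deliberately small stand-in. The sketch below assumes a linear model over the concatenated features, a `log1p` transform of the browse count as the target, and plain stochastic gradient descent on squared error; none of these choices is stated in this passage, and browsing completeness is left out for brevity.

```python
import math
from typing import List, Sequence, Tuple

# Each sample: (content features, description features, observed browse count).
Sample = Tuple[Sequence[float], Sequence[float], float]


def predict(weights: List[float], bias: float, features: Sequence[float]) -> float:
    """Predicted log(1 + browse count) for one feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train(samples: Sequence[Sample], epochs: int = 200, lr: float = 0.05):
    """Adjust parameters of a zero-initialised linear model by SGD on squared error."""
    dim = len(samples[0][0]) + len(samples[0][1])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for content_feat, desc_feat, browses in samples:
            x = list(content_feat) + list(desc_feat)
            target = math.log1p(browses)               # assumed target transform
            error = predict(weights, bias, x) - target  # predicted minus sample potential
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def mean_squared_error(weights, bias, samples: Sequence[Sample]) -> float:
    total = 0.0
    for content_feat, desc_feat, browses in samples:
        x = list(content_feat) + list(desc_feat)
        total += (predict(weights, bias, x) - math.log1p(browses)) ** 2
    return total / len(samples)
```

The log transform is a common way to tame heavy-tailed counts; it is used here only so that the toy example converges with a fixed learning rate.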

Based on the method, distribution potential prediction can be performed on each candidate multimedia data to obtain the distribution potential corresponding to each candidate multimedia data, so that the corresponding distribution potential is marked in the content dimension of the multimedia data for subsequent multimedia data recommendation and multimedia data operation.

9. Prediction of multimedia data and distribution service 209

The prediction and distribution service 209 of multimedia data puts the distribution potential prediction model 208 obtained in the present application into engineering use, provides online distribution potential prediction and labeling capability, accepts scheduling by the dispatch center server 206, and then distributes the target multimedia data.

10. Statistical reporting interface service 211

The statistical reporting interface service 211 receives the behavior data generated by each user at the content consumption end 202 during distribution, for example statistics on interaction behaviors such as reading, forwarding, collecting, liking and commenting. The statistical reporting interface service 211 can also accept related data reported by the uplink and downlink content interface server 203. By collecting and aggregating this data, it provides data support for the user analysis and statistics service 212.

11. User analysis and statistics service 212

The user analysis and statistics service 212 receives the data written by the statistical reporting interface service 211 and provides the data needed to build the distribution potential prediction model 208 required by the prediction and distribution service 209 of multimedia data, such as multimedia data samples and the sample distribution potential matched with each multimedia data sample.

12. Content and account feature modeling 213

The content and account feature modeling 213 is used to provide the data meta information of the multimedia data, which it obtains from the content database 205. The data meta information of the multimedia data at least includes: primary classification, secondary classification, multimedia data duration, whether the data is UGC or PGC, and data statistics. The data statistics at least include: statistics of the data creator information, the number of interactions acquired in a historical time period, the number of data pushes in the historical time period, the number of shares in the historical time period, the number of follows in the historical time period, and the like.

13. Video deduplication service 214

Since many multimedia data are distributed at the same time in practical applications, the video deduplication service 214 mainly provides a deduplication service for large volumes of multimedia data, that is, it provides the engineering capability to run deduplication in parallel. Its main purpose is to avoid enabling duplicate multimedia data, and it provides the deduplication result to the dispatch center server 206.

Based on this, in one embodiment, as shown in fig. 3, a method for distributing multimedia data is provided. The method is described by taking its application to the server in fig. 1 as an example. It is understood that the method may also be applied to a terminal, or to a system including the terminal and the server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:

Step 302, candidate multimedia data is acquired, and data content information and data description information of the candidate multimedia data are determined, wherein the data content information comprises data of one or more modalities.

The candidate multimedia data is multimedia data whose distribution potential is to be predicted, that is, multimedia data from which the data to be distributed to users is selected. The candidate multimedia data may include one or more multimedia data. The data content information is the data content in the candidate multimedia data and includes data information of one or more modalities, such as text modality data information, image modality data information, and audio modality data information. Taking a video as an example of the candidate multimedia data, the data content information of the video may include text modality data information, image modality data information, and audio modality data information, and may at least include title text information, content description information, and video frame data. The title text information is the title of the video, the content description information is description information of the video content (such as a content abstract), and the video frame data may be a plurality of video frames obtained by performing frame extraction on the multimedia data and may also include a cover image frame, which is not limited herein.

Next, the data description information is information describing the candidate multimedia data and at least includes data creator information and data meta information. The data meta information may at least include: the data size of the multimedia data, the cover image link of the multimedia data, the bit rate of the multimedia data, the file format of the multimedia data, the title of the multimedia data, the release time of the multimedia data, and whether the multimedia data is original to, or first published by, its creator.

It may be appreciated that the data meta information further includes classification information assigned to the multimedia data during manual auditing and data statistics information over a historical period. The classification assigned during manual auditing includes first-level, second-level, and third-level classifications and tag information. For example, if the multimedia data is a video explaining mobile phone A, the first-level classification is science and technology, the second-level classification is smartphones, the third-level classification is domestic mobile phones, and the tag information is A. The data statistics information at least includes: statistics of the data creator information, the number of interactions acquired in a historical time period, the number of data pushes in the historical time period, the number of shares in the historical time period, the number of follows in the historical time period, and the like.

Specifically, the server acquires the candidate multimedia data and analyzes it to determine its data content information and data description information. That is, the server analyzes the data content in the candidate multimedia data and extracts data content information for one or more modalities. For example, if the candidate multimedia data is video data, the server may extract video frame data from the image modality dimension, where the video frame data may include a plurality of video frames in the video and may also include the video cover image, which is not limited herein. Similarly, the server may extract text-related data from the text modality dimension, such as the text title and text summary of the candidate multimedia data, which is not limited herein.

Next, the server specifically extracts data description information of the candidate multimedia data, that is, description information of various data in the candidate multimedia data, for example, data creator information, the aforementioned data meta information, and the like. The aforementioned acquisition method is not particularly limited here.

Step 304, data content characteristics corresponding to the data content information and data description characteristics corresponding to the data description information are obtained.

Wherein the data content features are feature vectors for the data content information and, as the data content information comprises data of one or more modalities, the data content features may also comprise data of one or more modalities. For example, the data content information includes: text modality data information and image modality data information, then the data content characteristics may include text modality characteristics as well as image modality characteristics. Next, the data description feature is a feature vector for the data description information, that is, at least includes a feature corresponding to the data creator information and a feature corresponding to the data meta information.

Specifically, the server performs feature processing on the data content information to obtain data content features corresponding to the data content information. And similarly, the server performs feature processing on the data description information to obtain data description features corresponding to the data description information. It is understood that the network for performing feature processing on the data content information and the data description information in this embodiment may be the same or different, and is not particularly limited herein.

Step 306, obtaining distribution potentials corresponding to the candidate multimedia data through a distribution potential prediction model based on the data content characteristics and the data description characteristics, wherein the distribution potentials at least comprise: the number of browses of the candidate multimedia data.

The distribution potential at least comprises the number of times the candidate multimedia data is browsed, that is, the total number of times the multimedia data is browsed within a preset time interval, for example, within 1 day, within 3 days, or within 7 days. In this embodiment, browsing of multimedia data is defined as performing an interaction after clicking the multimedia data. The interaction may be: playing the multimedia data, viewing the multimedia data, reading the multimedia data, forwarding the multimedia data, collecting the multimedia data, liking the multimedia data, commenting on the multimedia data, and the like.

It can be appreciated that, in practical applications, the distribution potential may further include the browsing integrity of the candidate multimedia data after being browsed. The browsing integrity is the ratio between the browsed data amount of the multimedia data and the actual (total) data amount of the multimedia data. Taking a video as an example of the candidate multimedia data, browsing means playing the candidate multimedia data: the browsed data amount is the played duration, and the actual data amount is the total duration of the video. If the played duration is 10 seconds and the total duration of the video is 20 seconds, the browsing integrity is 1/2 (that is, 10/20). Each time the candidate multimedia data is browsed, there is a corresponding browsing integrity. The browsing integrity in the distribution potential may therefore be the browsing integrity values corresponding to the respective browses, or the average browsing integrity obtained from those values, which is not limited herein.

Secondly, taking a picture as an example of the candidate multimedia data, the browsed data amount is the size of the portion of the picture that is viewed, and the actual (total) data amount is the total size of the picture. Alternatively, taking an article as an example, the browsed data amount is the number of paragraphs or pages read, and the actual (total) data amount is the total number of paragraphs or pages of the article. The foregoing examples are provided for ease of understanding and are not to be construed as limiting the present application.
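The browsing integrity ratio described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and clamping the ratio to 1.0 is an assumption that the text does not specify.

```python
from statistics import mean


def browsing_integrity(browsed_amount: float, total_amount: float) -> float:
    """Ratio of the amount browsed to the item's total amount.

    Units only need to match: seconds for a video, paragraphs or pages
    for an article.
    """
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    # Assumption: clamp to 1.0 so replays cannot exceed full integrity.
    return min(browsed_amount / total_amount, 1.0)


def average_browsing_integrity(sessions: list[tuple[float, float]]) -> float:
    """Average integrity over several browses of the same item.

    Each session is a (browsed_amount, total_amount) pair.
    """
    return mean(browsing_integrity(b, t) for b, t in sessions)


# The 10-second play of a 20-second video from the text.
ratio = browsing_integrity(10, 20)
```

Reporting the per-browse values or their average corresponds to the two options the embodiment leaves open.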

Specifically, the server outputs the distribution potential corresponding to the predicted candidate multimedia data through a distribution potential prediction model obtained through training based on the data content characteristics and the data description characteristics. The distribution potential at this time at least includes the number of times the candidate multimedia data is browsed, and as can be seen from the foregoing description, the distribution potential may also include browsing integrity after the candidate multimedia data is browsed.

Secondly, the server performs feature splicing on the data content features and the data description features, and obtains the distribution potential corresponding to the candidate multimedia data based on the spliced features. To facilitate understanding of the structure of the distribution potential prediction model, as shown in fig. 4, the data content information 401 is subjected to feature processing to obtain the data content features 402, the data description information 403 is subjected to feature processing to obtain the data description features 404, the data content features 402 and the data description features 404 are subjected to feature splicing 405, and the distribution potential 406 corresponding to the candidate multimedia data is obtained based on the spliced data content features and data description features.
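As a rough illustration of the feature splicing in fig. 4, the sketch below concatenates two feature vectors and passes the result to a prediction head. The linear head, its weights, and the example vectors are all hypothetical stand-ins; the actual model structure is not specified at this level of detail.

```python
def splice_features(content: list[float], description: list[float]) -> list[float]:
    """Feature splicing: append the description features to the content features."""
    return content + description


def linear_head(features: list[float], weights: list[float], bias: float) -> float:
    """Hypothetical prediction head: a weighted sum over the spliced vector."""
    if len(features) != len(weights):
        raise ValueError("weights must match the spliced feature length")
    return sum(f * w for f, w in zip(features, weights)) + bias


content_features = [0.2, 0.7, 0.1]   # e.g. text and image modality features
description_features = [1.0, 0.0]    # e.g. creator and meta-information features
spliced = splice_features(content_features, description_features)
score = linear_head(spliced, weights=[0.5, 0.5, 0.5, 0.1, 0.1], bias=0.0)
```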

Step 308, target multimedia data is selected for distribution to the user according to the distribution potential corresponding to the candidate multimedia data.

Wherein the target multimedia data is for being distributed to the user, i.e. the target multimedia data is selected multimedia data to be distributed to the user, and the target multimedia data may comprise one or more multimedia data.

Specifically, the server selects the target multimedia data according to the distribution potential corresponding to the candidate multimedia data. Because the target multimedia data is to be distributed to users, the server can distribute the selected target multimedia data to the terminals used by the users, so that the users can click, browse, and otherwise interact with the target multimedia data on the terminal side as needed. If each candidate multimedia data is a single multimedia data, the server determines the target multimedia data to be distributed to the user from a plurality of candidate multimedia data based on the distribution potential corresponding to each of them. If the candidate multimedia data comprises a plurality of multimedia data, the server determines the target multimedia data to be distributed to the user from within the candidate multimedia data based on the distribution potential corresponding to the candidate multimedia data.

Based on this, the server preferentially selects multimedia data having a higher distribution potential as target multimedia data in consideration of the efficiency of multimedia data distribution and the effect of recommending multimedia data. Thus, in the case where only the number of browses is considered, the larger the number of browses of the candidate multimedia data is, the larger the probability of selecting as the target multimedia data is, and therefore, the server can specifically determine as the target multimedia data the candidate multimedia data whose number of browses is larger than the maximum number of browses threshold.

Secondly, under the condition that two dimensions of the browsed times and the browsed integrity are considered, the larger the browsed times of the candidate multimedia data are, and the higher the browsed integrity of the candidate multimedia data is, the larger the probability of being selected as target multimedia data is, so that the server can specifically determine the candidate multimedia data with the browsed times of the candidate multimedia data being larger than the maximum browsed times threshold and the browsed integrity of the candidate multimedia data being larger than the maximum browsed integrity threshold as target multimedia data.
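The threshold-based selection described in the two paragraphs above can be sketched as follows. The `Candidate` class, the function name, and the threshold values in the usage lines are hypothetical; only the comparison rule comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    item_id: str
    predicted_views: float
    predicted_integrity: Optional[float] = None


def select_targets(candidates, max_views_threshold, max_integrity_threshold=None):
    """Keep candidates whose predicted views exceed the maximum browsing
    times threshold and, when an integrity threshold is given, whose
    predicted browsing integrity also exceeds it."""
    selected = []
    for c in candidates:
        if c.predicted_views <= max_views_threshold:
            continue
        if max_integrity_threshold is not None:
            if c.predicted_integrity is None:
                continue
            if c.predicted_integrity <= max_integrity_threshold:
                continue
        selected.append(c)
    return selected


pool = [
    Candidate("a", 1500, 0.8),
    Candidate("b", 1500, 0.3),
    Candidate("c", 200, 0.9),
]
views_only = select_targets(pool, max_views_threshold=1000)
both_dims = select_targets(pool, max_views_threshold=1000, max_integrity_threshold=0.5)
```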

Taking application to a multimedia data auditing scenario as an example, when manpower on the auditing link is insufficient, the auditing priority of multimedia data with low distribution potential can be reduced, or the auditing of such multimedia data can be ended early, so as to save the resources of the recommendation system. Multimedia data with low distribution potential may be candidate multimedia data whose number of times browsed is smaller than the minimum browsing times threshold, or candidate multimedia data whose number of times browsed is smaller than the minimum browsing times threshold and whose browsing integrity is smaller than the minimum browsing integrity threshold. Secondly, taking application to an operations scenario with large-scale content pushing as an example, the estimated distribution potential reduces the cost of manually screening high and low distribution potential, so that the target multimedia data distributed to users can be determined more efficiently. This improves the efficiency of multimedia data distribution and the operating efficiency of large-scale content pushing.

In practical applications, the distribution potential can be divided into levels, that is, corresponding distribution level labels are set for the distribution potential, and the distribution level label of each multimedia data is determined accordingly. Considering only the number of times browsed, for example: if the number of times browsed included in the distribution potential falls in a first count range, the multimedia data corresponding to the distribution potential is given a first distribution level label; if it falls in a second count range, the multimedia data is given a second distribution level label; and if it falls in a third count range, the multimedia data is given a third distribution level label. The first count range, the second count range, and the third count range do not overlap.

In this case, the larger the number of times browsed included in the distribution potential, the larger the level label corresponding to the distribution potential. Alternatively, considering both the number of times browsed and the browsing integrity, the larger the number of times browsed and the higher the browsing integrity, the larger the level label corresponding to the distribution potential. The division manner is not limited herein and needs to be determined based on actual requirements and the service scenario.
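A minimal sketch of level labeling over non-overlapping count ranges follows. The boundary values 100 and 1000 are chosen only for illustration, and the function name is hypothetical.

```python
from bisect import bisect_right

# Hypothetical lower bounds of the second and third count ranges:
# [0, 100) -> level 1, [100, 1000) -> level 2, [1000, inf) -> level 3.
LEVEL_BOUNDS = [100, 1000]


def distribution_level(predicted_views: float, bounds=LEVEL_BOUNDS) -> int:
    """Map a predicted number of times browsed to a distribution level label.

    Because the ranges are defined by sorted boundaries, they cannot overlap,
    and a larger count never yields a smaller label.
    """
    return bisect_right(bounds, predicted_views) + 1
```

Adding a fourth level only requires appending another boundary to the sorted list.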

In the method for distributing multimedia data, the candidate multimedia data is described by the characteristic information from the data content dimension and the data description dimension through the data content characteristic and the data description characteristic, and the distribution potential of the multimedia data is predicted according to the characteristic information description, and the distribution potential is described specifically by the browsed times of the candidate multimedia data, so that the browsed times of the candidate multimedia data are considered for selection when the multimedia data is distributed, and the efficiency of the multimedia data distribution is improved.

The foregoing embodiments refer to a distribution potential prediction model, and require prediction of the distribution potential of multimedia data based on the distribution potential prediction model, and a method of how to train the distribution potential prediction model will be described in detail below:

In one embodiment, as shown in fig. 5, the method for obtaining the distribution potential prediction model specifically includes:

Step 502, obtaining a multimedia data sample and a sample distribution potential matched with the multimedia data sample, wherein the sample distribution potential at least comprises: the number of times the multimedia data sample is browsed, and the browsing integrity after the multimedia data sample is browsed.

The sample distribution potential is determined based on historical browsing data of the multimedia data sample within a historical statistical time interval. That is, the number of times the multimedia data sample is browsed is the number of times it was browsed within the historical statistical time interval, and the browsing integrity after the multimedia data sample is browsed is its browsing integrity after being browsed within that interval. The number of times browsed and the browsing integrity are described in detail in the foregoing embodiments and are not repeated here.

Specifically, based on the system architecture of fig. 2, the user analysis and statistics service 212 receives the data written by the statistical reporting interface service 211 and provides the data needed to model the distribution potential prediction model 208 required by the multimedia data prediction and distribution service 209, such as multimedia data samples and the sample distribution potentials matched with them. The server can thus obtain, from the user analysis and statistics service 212, the multimedia data samples needed to train the distribution potential prediction model and the sample distribution potentials matched with those samples.

It is understood that the multimedia data samples obtained by the server may be obtained by screening the initial multimedia data samples for abnormal data samples. Or, after the abnormal data sample screening is performed on the initial multimedia data sample, the manual auditing system in the system of fig. 2 performs secondary screening on the initial multimedia data sample, so as to ensure the effect and quality of the distribution potential marked by the multimedia data sample. The specific manner in which the multimedia data samples are obtained, and the sample distribution potential to which the multimedia data samples are matched, is not limited herein.

Step 504, sample data content information of the multimedia data sample and sample data description information matched with the multimedia data sample are determined.

The sample data content information is the data content in a multimedia data sample and includes data information of one or more modalities. It is similar to the foregoing data content information and is not described here again. Next, the sample data description information is information describing a multimedia data sample and at least includes sample data creator information and sample data meta information. It is similar to the foregoing data description information and is not described here again.

Specifically, the server performs data information analysis on specific multimedia data samples to determine sample data content information of the multimedia data samples and sample data description information matched with the multimedia data samples. The specific implementation steps are similar to those described in step 302, and will not be repeated here.

Step 506, obtaining the sample data content characteristics corresponding to the sample data content information and the sample data description characteristics corresponding to the sample data description information.

Wherein the sample data content features are feature vectors for the sample data content information and, since the sample data content information comprises data of one or more modalities, the sample data content features may also comprise data of one or more modalities. For example, the sample data content information includes: the text modality data information and the image modality data information, then the sample data content characteristics may include text modality characteristics as well as image modality characteristics. Next, the sample data description feature is a feature vector for the sample data description information, that is, at least includes a feature corresponding to the sample data creator information and a feature corresponding to the sample data meta information.

Specifically, the server performs feature processing on the sample data content information to obtain sample data content features corresponding to the sample data content information. And similarly, the server performs feature processing on the sample data description information to obtain sample data description features corresponding to the sample data description information. And the server performs feature processing on the sample data content information through a Deep network in the initial distribution potential prediction model to obtain sample data content features corresponding to the sample data content information, and performs feature processing on the sample data description information through a linear Wide network in the initial distribution potential prediction model to obtain sample data description features corresponding to the sample data description information.

The Deep network is a trained video modality pre-training model, obtained by training on video modality multimedia data samples, where a video modality multimedia data sample at least comprises text content information and visual content information. The linear Wide network is specifically a multilayer perceptron (MLP).

Step 508, obtaining the predicted distribution potential corresponding to the multimedia data sample through the initial distribution potential prediction model based on the sample data content features and the sample data description features.

Specifically, the server performs feature stitching on the obtained sample data content features and sample data description features, and outputs predicted distribution potential corresponding to the multimedia data samples through an initial distribution potential prediction model. Wherein the predicted distribution potential comprises at least: the number of times the multimedia data sample is browsed, and browsing integrity after the multimedia data sample is browsed.

Step 510, based on the sample distribution potential and the predicted distribution potential, model parameters of the initial distribution potential prediction model are adjusted to obtain a distribution potential prediction model.

Specifically, the server calculates loss information by using the sample distribution potential and the predicted distribution potential, and updates model parameters of the initial distribution potential prediction model according to the loss information obtained by calculation, that is, the server judges whether a loss function of the initial distribution potential prediction model reaches a convergence condition according to the loss information, and if the loss function does not reach the convergence condition, the model parameters of the initial distribution potential prediction model are updated by using the loss information. Based on the above, until the loss function of the initial distribution potential prediction model reaches a convergence condition, the distribution potential prediction model is obtained according to the model parameters obtained after the last update of the model parameters, so that the distribution potential of the multimedia data is predicted by the distribution potential prediction model obtained through training in practical application.

The convergence condition of the foregoing loss function may be that the value of the loss function is less than or equal to a first preset threshold, for example, 0.005, 0.01, 0.02, or another value approaching 0. It may also be that the difference between two successive values of the loss function is less than or equal to a second preset threshold, where the second preset threshold may be the same as or different from the first preset threshold, for example, 0.005, 0.01, 0.02, or another value approaching 0. It may also be that the number of model parameter updates of the initial distribution potential prediction model reaches an update iteration threshold. Other convergence conditions can be adopted in practical applications, which is not limited herein.
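The three convergence conditions can be combined into a single check, as in the sketch below. The function names, default thresholds, and the one-parameter toy model are assumptions made for illustration; the toy model is not the distribution potential prediction model.

```python
def has_converged(loss_history, step, loss_threshold=0.01,
                  delta_threshold=0.005, max_steps=10_000):
    """Return True when any of the three conditions holds."""
    if step >= max_steps:                       # update iteration threshold
        return True
    if not loss_history:
        return False
    if loss_history[-1] <= loss_threshold:      # first preset threshold
        return True
    if len(loss_history) >= 2:
        delta = abs(loss_history[-1] - loss_history[-2])
        if delta <= delta_threshold:            # second preset threshold
            return True
    return False


def train_toy_model(samples, lr=0.1):
    """Fit y = w * x by gradient descent on mean squared error (toy stand-in)."""
    w, losses, step = 0.0, [], 0
    while not has_converged(losses, step):
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad                          # update model parameters
        losses.append(sum((w * x - y) ** 2 for x, y in samples) / len(samples))
        step += 1
    return w, losses


w, losses = train_toy_model([(1.0, 2.0), (2.0, 4.0)])
```

The loop stops at the first update after which one of the conditions holds, and the parameters from that last update are kept.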

In this embodiment, when training the distribution potential prediction model, two dimensions of the browsed times of the multimedia data sample and the browsed browsing integrity are specifically considered, that is, information between the two dimensions and the multimedia data can be learned in a multiple iteration updating process of the distribution potential prediction model, and when feature information of the multimedia data is learned, sample data content information and sample data description information are specifically considered, so that the multidimensional and more refined feature information can be learned more completely and accurately, thereby improving the reliability and accuracy of the distribution potential prediction model, that is, improving the reliability and accuracy of the distribution potential prediction for the multimedia data.

Since the present application aims to improve the efficiency of multimedia data distribution, that is, to favor distributing multimedia data with high distribution potential, the number of times browsed and the browsing integrity after browsing are considered when obtaining multimedia data samples for training, so as to distinguish multimedia data samples with high distribution potential from those with low distribution potential. The following describes in detail how to obtain the multimedia data samples and the sample distribution potentials matched with them:

In one embodiment, as shown in fig. 6, obtaining a multimedia data sample and a sample distribution potential matched with the multimedia data sample comprises:

Step 602, acquiring an initial multimedia data sample set and historical browsing data corresponding to each initial multimedia data sample in the initial multimedia data sample set, wherein the historical browsing data comprises the number of times the initial multimedia data sample is browsed in a historical statistical time interval and the browsing integrity of the initial multimedia data sample after it is browsed.

As noted, the historical browsing data comprises the number of times the initial multimedia data sample is browsed in the historical statistical time interval and its browsing integrity after being browsed. The historical statistical time interval may be 1 day, 3 days, 7 days, 15 days, 30 days, or 90 days, and the specific time interval is not limited herein.

Specifically, based on the system shown in fig. 2, the statistical reporting interface service 211 reports the behavior data generated by each user in the process of distributing the multimedia data, for example, statistics on interaction behaviors such as reading, forwarding, collecting, liking, and commenting. It can also receive the related data information reported by the uplink and downlink content interface server 203. By collecting and compiling this data, the statistical reporting interface service 211 provides data support for the user analysis and statistics service 212, and the data it reports is the historical browsing data corresponding to each initial multimedia data sample.

Based on this, the user analysis and statistics service 212, which receives the data from the statistical reporting interface service 211, provides the historical browsing data corresponding to each initial multimedia data sample, so that the server obtains the historical browsing data corresponding to each initial multimedia data sample.
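One way to turn reported browse events into per-sample historical browsing data is sketched below. The event tuple layout, the function name, and the 7-day default window are hypothetical; the actual reporting format of the statistical reporting interface service 211 is not described.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_history(events, window_end, window_days=7):
    """Aggregate browse events inside the historical statistical time interval.

    events: iterable of (item_id, timestamp, browsed_amount, total_amount).
    Returns {item_id: {"views": int, "avg_integrity": float}}.
    """
    window_start = window_end - timedelta(days=window_days)
    views = defaultdict(int)
    ratios = defaultdict(list)
    for item_id, ts, browsed, total in events:
        # Skip events outside the interval or with an unusable total amount.
        if not (window_start <= ts <= window_end) or total <= 0:
            continue
        views[item_id] += 1
        ratios[item_id].append(min(browsed / total, 1.0))
    return {
        item_id: {
            "views": views[item_id],
            "avg_integrity": sum(ratios[item_id]) / len(ratios[item_id]),
        }
        for item_id in views
    }


end = datetime(2022, 11, 16)
history = aggregate_history(
    [
        ("v1", datetime(2022, 11, 15), 10, 20),
        ("v1", datetime(2022, 11, 14), 20, 20),
        ("v1", datetime(2022, 10, 1), 20, 20),   # outside the 7-day window
        ("v2", datetime(2022, 11, 12), 5, 20),
    ],
    window_end=end,
)
```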

Step 604, selecting a multimedia data sample from the initial multimedia data sample set based on the historical browsing data corresponding to each initial multimedia data sample, and determining the sample distribution potential matched by the multimedia data sample.

The multimedia data samples specifically include positive samples, namely initial multimedia data samples with high distribution potential, and negative samples, namely initial multimedia data samples with low distribution potential, wherein the positive samples can carry positive sample tags 1, and the negative samples can carry negative sample tags 0.

Based on this, an initial multimedia data sample with low distribution potential is specifically one whose number of browses is smaller than the minimum browsing times threshold, and an initial multimedia data sample with high distribution potential is specifically one whose number of browses is greater than the maximum browsing times threshold. Specifically, as in formula (1):

$$y = \begin{cases} 1, & v > T_{\max} \\ 0, & v < T_{\min} \end{cases} \tag{1}$$

where y represents the sample tag, and v, T_min and T_max are notation chosen here for the number of browses, the minimum browsing times threshold and the maximum browsing times threshold, respectively. A sample tag of 1 indicates a positive sample and a sample tag of 0 indicates a negative sample.

Specifically, the server determines an initial multimedia data sample whose number of browses is smaller than the minimum browsing times threshold as a negative sample, and determines an initial multimedia data sample whose number of browses is greater than the maximum browsing times threshold as a positive sample. That is, the multimedia data samples include: the initial multimedia data samples in the initial multimedia data sample set whose number of browses is smaller than the minimum browsing times threshold, and the initial multimedia data samples whose number of browses is greater than the maximum browsing times threshold. For example, the minimum browsing times threshold may be 100 and the maximum browsing times threshold may be 1000, in which case the multimedia data samples include the initial multimedia data samples browsed fewer than 100 times and the initial multimedia data samples browsed more than 1000 times.

Secondly, in practical applications, since the historical browsing data includes both the number of times the initial multimedia data sample was browsed in the historical statistical time interval and the browsing integrity of the initial multimedia data sample after being browsed, both dimensions may be considered when selecting the multimedia data samples. That is, a multimedia data sample with low distribution potential may be an initial multimedia data sample whose number of browses is smaller than the minimum browsing times threshold and whose browsing integrity is smaller than the minimum browsing integrity threshold, and a multimedia data sample with high distribution potential may be an initial multimedia data sample whose number of browses is greater than the maximum browsing times threshold and whose browsing integrity is greater than the maximum browsing integrity threshold. In this case the multimedia data samples include: the initial multimedia data samples in the initial multimedia data sample set whose number of browses is smaller than the minimum browsing times threshold and whose browsing integrity is smaller than the minimum browsing integrity threshold, and the initial multimedia data samples whose number of browses is greater than the maximum browsing times threshold and whose browsing integrity is greater than the maximum browsing integrity threshold.
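The selection and labeling rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `label_sample` and the two browsing integrity thresholds are hypothetical, and only the example values 100 and 1000 come from the text.

```python
from typing import Optional

MIN_VIEWS = 100    # example minimum browsing times threshold from the text
MAX_VIEWS = 1000   # example maximum browsing times threshold from the text


def label_sample(views: int,
                 completeness: Optional[float] = None,
                 min_completeness: float = 0.2,   # hypothetical threshold
                 max_completeness: float = 0.8    # hypothetical threshold
                 ) -> Optional[int]:
    """Return 1 (positive), 0 (negative) or None (sample not selected).

    If `completeness` is given, both dimensions must agree; otherwise only
    the number of browses is used.
    """
    high = views > MAX_VIEWS
    low = views < MIN_VIEWS
    if completeness is not None:
        high = high and completeness > max_completeness
        low = low and completeness < min_completeness
    if high:
        return 1
    if low:
        return 0
    return None  # falls between the thresholds, so it is left out
```

Samples that fall between the two thresholds are simply left out, which keeps a clear gap between positive and negative samples.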

In this embodiment, the multimedia data samples, including positive samples and negative samples, are selected according to the number of times each initial multimedia data sample was browsed in the historical statistical time interval and its browsing integrity after being browsed. The selected multimedia data samples therefore differ widely in number of browses and browsing integrity, which makes the subsequently calculated loss information more accurate and reliable, improves the model training effect, and yields a more accurate and reliable distribution potential prediction model.

In one embodiment, as shown in fig. 7, the historical browsing data further includes the actual browsing data amount of the initial multimedia data sample, and the click rate and access amount of the initial multimedia data sample within the historical statistics time interval.

The actual browsing data amount is used to describe the browsing data amount of the initial multimedia data sample that can be browsed, taking the initial multimedia data sample as a video as an example, where the actual browsing data amount of the initial multimedia data sample is the total duration of the video of the initial multimedia data sample, and if the total duration of the video is 20s, the actual browsing data amount of the initial multimedia data sample can be determined to be 20 s. Secondly, taking the initial multimedia data sample as an example of a picture, wherein the actual browsing data amount of the initial multimedia data sample is the total size of the picture of the initial multimedia data sample. Or taking the initial multimedia data sample as an article, wherein the actual browsing data amount of the initial multimedia data sample is the total number of paragraphs or the total number of pages of the initial multimedia data sample. The foregoing examples are provided for the understanding of the present invention and are not to be construed as limiting the present invention.

Secondly, the click rate is: the ratio of the total number of clicks to the total number of pushes of the initial multimedia data sample in the historical statistical time interval is specifically described in terms of percentage, for example, the initial multimedia data sample is clicked 60 times and pushed 100 times in the historical statistical time interval, and then the click rate of the initial multimedia data sample is 60% (60/100×100%). And the access amount is used to describe the total access amount of the initial multimedia data sample over the historical statistical time interval.

It may be understood that, in practical application, the initial multimedia data sample further includes a specific corresponding service type, that is, the server may perform statistical analysis on historical browsing data of the initial multimedia data sample under the same service type to obtain statistical analysis data under the same service type, where the statistical analysis data at least includes: average click-through rate and average access volume of each initial multimedia data sample over a historical statistical time interval.
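The click rate and the per-service-type statistics described above can be illustrated with a short sketch. The record layout and the function names `click_rate` and `averages_by_service_type` are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (service_type, clicks, pushes, access_amount)
Record = Tuple[str, int, int, int]


def click_rate(clicks: int, pushes: int) -> float:
    """Ratio of total clicks to total pushes; 0.0 if nothing was pushed."""
    return clicks / pushes if pushes else 0.0


def averages_by_service_type(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Return {service_type: (average click rate, average access amount)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # rate sum, access sum, count
    for service_type, clicks, pushes, access in records:
        acc = sums[service_type]
        acc[0] += click_rate(clicks, pushes)
        acc[1] += access
        acc[2] += 1
    return {k: (v[0] / v[2], v[1] / v[2]) for k, v in sums.items()}
```

For the example in the text, `click_rate(60, 100)` gives 0.6, that is, 60%.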

Based on this, selecting a multimedia data sample from the initial set of multimedia data samples based on the respective corresponding historical browsing data for each initial multimedia data sample, comprising:

Step 702, based on the actual browsing data amount, click rate and access amount corresponding to each initial multimedia data sample, performing abnormal data sample screening on the initial multimedia data sample set to obtain a candidate multimedia data sample set.

The abnormal data sample screening is used for screening initial multimedia data samples with abnormal data, and the abnormal data is used for describing that data which does not accord with normal interaction behaviors of users, does not accord with actual application of service scenes or is not suitable for model training exists in historical browsing data of the initial multimedia data samples. Based on the above, the candidate multimedia data sample set includes candidate multimedia data samples, and the candidate multimedia data samples are not abnormal data samples, i.e. the candidate multimedia data samples are all normal data samples screened by the abnormal data samples.

For example, the initial multimedia data sample may have a small actual browsing data amount. Taking the initial multimedia data sample as a video, if the total duration of the video is 3s, the video has too little content to achieve the push conversion effect, so the initial multimedia data sample can be determined to be an abnormal data sample. Alternatively, the initial multimedia data sample may be clicked abnormally many times within a short time, for example 10 times within 1 second, which does not conform to the normal interaction behavior of a user, so the initial multimedia data sample can be determined to be an abnormal data sample.

Specifically, the server performs abnormal data sample screening on the initial multimedia data sample set based on the actual browsing data amount, click rate and access amount corresponding to each initial multimedia data sample. The server is configured in advance with a data amount abnormal range, a click rate abnormal range and an access amount abnormal range. When any one of the actual browsing data amount, click rate and access amount of an initial multimedia data sample falls within its corresponding abnormal range, the server determines that the initial multimedia data sample is an abnormal data sample, and the abnormal data sample is not included in the candidate multimedia data sample set. Therefore, in the candidate multimedia data sample set, none of the actual browsing data amount, click rate and access amount of any candidate multimedia data sample falls within the data amount abnormal range, the click rate abnormal range or the access amount abnormal range.
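The screening in step 702 can be sketched as a simple range check. The `Sample` fields and all abnormal ranges below are hypothetical values chosen for illustration; the text does not specify concrete ranges.

```python
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[float, float]  # closed interval [low, high]


@dataclass
class Sample:
    sample_id: str
    data_amount: float    # e.g. video duration in seconds
    click_rate: float     # clicks / pushes, between 0 and 1
    access_amount: int


# Hypothetical pre-configured abnormal ranges.
ABNORMAL_DATA_AMOUNT: Range = (0.0, 3.0)    # videos of 3 s or less
ABNORMAL_CLICK_RATE: Range = (0.95, 1.0)    # implausibly high click rate
ABNORMAL_ACCESS_AMOUNT: Range = (0, 0)      # never accessed


def in_range(value: float, rng: Range) -> bool:
    return rng[0] <= value <= rng[1]


def is_abnormal(s: Sample) -> bool:
    """A sample is abnormal if any one dimension hits its abnormal range."""
    return (in_range(s.data_amount, ABNORMAL_DATA_AMOUNT)
            or in_range(s.click_rate, ABNORMAL_CLICK_RATE)
            or in_range(s.access_amount, ABNORMAL_ACCESS_AMOUNT))


def screen(samples: List[Sample]) -> List[Sample]:
    """Return the candidate set: all samples that are not abnormal."""
    return [s for s in samples if not is_abnormal(s)]
```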

Step 704, selecting a multimedia data sample from the candidate multimedia data sample set based on the browsed times and the browsed integrity of each candidate multimedia data sample.

Specifically, the terminal selects multimedia data samples from the candidate multimedia data sample set based on the number of browses and browsing integrity corresponding to each candidate multimedia data sample. As can be seen from the description of the foregoing embodiments, the multimedia data samples include: the candidate multimedia data samples in the candidate multimedia data sample set whose number of browses is smaller than the minimum browsing times threshold and whose browsing integrity is smaller than the minimum browsing integrity threshold, and the candidate multimedia data samples whose number of browses is greater than the maximum browsing times threshold and whose browsing integrity is greater than the maximum browsing integrity threshold.

It can be appreciated that, in practical application, each initial multimedia data sample also has a corresponding service type and, as described above, the statistical analysis data under the same service type includes at least the average click rate and average access amount of each initial multimedia data sample under that service type in the historical statistical time interval. Therefore, the statistical analysis data under the same service type can be further considered when selecting multimedia data samples from the candidate multimedia data sample set. That is, the server can pre-configure an average click rate threshold and an average access amount threshold for each service type and, in addition to the number of browses and browsing integrity of each candidate multimedia data sample, further consider the statistical analysis data under the service type corresponding to that sample, giving priority to the average click rate and then further considering the average access amount.

Based on the above, the server classifies the candidate multimedia data samples whose average click rate under the corresponding service type is greater than the average click rate threshold as positive samples. Similarly, the server classifies the candidate multimedia data samples whose average click rate under the corresponding service type is smaller than the average click rate threshold as negative samples. Alternatively, the average access amount may be further considered, that is, the candidate multimedia data samples whose average click rate under the corresponding service type is greater than the average click rate threshold and whose average access amount is greater than the average access amount threshold are classified as positive samples, and the candidate multimedia data samples whose average click rate is smaller than the average click rate threshold and whose average access amount is smaller than the average access amount threshold are classified as negative samples. It should be understood that the foregoing division into positive and negative samples should also take into account the number of browses and browsing integrity of each candidate multimedia data sample.

In this embodiment, abnormal data samples are screened out along multiple dimensions, namely the actual browsing data amount, the click rate and the access amount, so that sample information from abnormal data samples does not affect the prediction of distribution potential. Secondly, the statistical analysis data under the same service type is also considered when selecting multimedia data samples, that is, the differences in click rate and access amount between service types are taken into account. This prevents samples whose distribution potential is high only locally, within their own service type, but low globally from being judged as positive samples, that is, it avoids selecting multimedia data samples from a purely local view, and ensures the global validity and reliability of the selected multimedia data samples.

In one embodiment, as shown in fig. 8, the multimedia data samples include positive samples and negative samples, the sample distribution potential to which the positive samples match belongs to the first distribution potential range, and the sample distribution potential to which the negative samples match belongs to the second distribution potential range.

Since the sample distribution potential includes the number of browses and the browsing integrity, a distribution potential range may be defined either by the number of browses alone or by the number of browses together with the browsing integrity. Based on this, the first distribution potential range is: the number of browses is greater than the maximum browsing times threshold; or the number of browses is greater than the maximum browsing times threshold and the browsing integrity is greater than the maximum browsing integrity threshold. Similarly, the second distribution potential range is: the number of browses is smaller than the minimum browsing times threshold; or the number of browses is smaller than the minimum browsing times threshold and the browsing integrity is smaller than the minimum browsing integrity threshold.

Based on this, the multimedia data distribution method further includes:

step 802, configuring a first loss weight for a first distribution potential range and a second loss weight for a second distribution potential range, the first loss weight being greater than the second loss weight.

Wherein the first loss weight is greater than the second loss weight.

Specifically, the server configures a first loss weight for the first distribution potential range and a second loss weight for the second distribution potential range. That is, a sample distribution potential belonging to the first distribution potential range is assigned the first loss weight, and a sample distribution potential belonging to the second distribution potential range is assigned the second loss weight.

Based on this, model parameters of the initial distribution potential prediction model are adjusted based on the sample distribution potential and the predicted distribution potential, including:

step 804, based on the sample distribution potential and the predicted distribution potential, model parameters of the initial distribution potential prediction model are adjusted by the first loss weight and the second loss weight.

Specifically, the server adjusts model parameters of the initial distribution potential prediction model by the first loss weight and the second loss weight based on the sample distribution potential and the predicted distribution potential. That is, when the sample distribution potential belongs to the first distribution potential range, the server adjusts model parameters of the initial distribution potential prediction model through the first loss weight based on the sample distribution potential and the predicted distribution potential. Similarly, when the sample distribution potential belongs to the second distribution potential range, the server adjusts model parameters of the initial distribution potential prediction model through the second loss weight based on the sample distribution potential and the predicted distribution potential.
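As a rough sketch of steps 802 and 804, the loss of each sample can be scaled by the weight of the range its sample distribution potential belongs to. The weight values, the use of a squared error, and the function names are assumptions for illustration; the text only requires that the first loss weight be greater than the second.

```python
from typing import Iterable, Tuple

FIRST_LOSS_WEIGHT = 2.0    # hypothetical weight for the first (high) range
SECOND_LOSS_WEIGHT = 1.0   # hypothetical weight for the second (low) range


def loss_weight(label: int) -> float:
    """label 1 = positive sample (first range), 0 = negative (second range)."""
    return FIRST_LOSS_WEIGHT if label == 1 else SECOND_LOSS_WEIGHT


def weighted_squared_error(sample_potential: float,
                           predicted_potential: float,
                           label: int) -> float:
    """Squared error scaled by the loss weight of the sample's range."""
    return loss_weight(label) * (sample_potential - predicted_potential) ** 2


def batch_loss(batch: Iterable[Tuple[float, float, int]]) -> float:
    """Mean weighted loss over (sample, predicted, label) triples."""
    items = list(batch)
    return sum(weighted_squared_error(y, p, lab) for y, p, lab in items) / len(items)
```

With these weights, the same prediction error on a positive sample contributes twice as much to the loss as on a negative sample, so parameter updates are pulled more strongly toward fitting high-potential samples.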

In this embodiment, different loss weights are used for multimedia data samples with different sample distribution potentials, and the model parameters of the initial distribution potential prediction model are adjusted based on these loss weights. Because the first loss weight, for high distribution potential, is greater than the second loss weight, for low distribution potential, the initial distribution potential prediction model learns the feature information of truly valuable high distribution potential, so that the resulting distribution potential prediction model can predict truly valuable high distribution potential, which ensures the reliability of distribution potential prediction.

In one embodiment, as shown in fig. 9, adjusting model parameters of an initial distribution potential prediction model based on sample distribution potential and predicted distribution potential, includes:

in the process of adjusting the model parameters:

Step 902, performing mean square error calculation on the sample distribution potential and the predicted distribution potential to obtain the mean square error of the multimedia data sample, wherein the mean square error is used for describing the closeness between the sample distribution potential and the predicted distribution potential.

Wherein the mean square error is used for describing the closeness between the sample distribution potential and the predicted distribution potential, that is, the mean square error calculation is used for calculating the closeness between the sample distribution potential and the predicted distribution potential.

Specifically, the server performs mean square error calculation on the sample distribution potential and the predicted distribution potential to obtain the mean square error of the multimedia data samples. Specifically, the mean square error is calculated by formula (2):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \tag{2}$$

wherein MSE represents the mean square error, n represents the number of multimedia data samples, $Y_i$ represents the sample distribution potential of multimedia data sample i, and $\hat{Y}_i$ represents the predicted distribution potential of multimedia data sample i.
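The mean square error between sample and predicted distribution potentials can be computed as in the following minimal sketch, which uses the standard definition (the average of squared differences); the function name is an assumption.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between sample and predicted potentials."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

A smaller value means the predicted distribution potential is closer to the sample distribution potential.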

Step 904, performing kernel density estimation on the sample distribution potential and the predicted distribution potential to obtain a kernel density estimation result of the multimedia data sample, wherein the kernel density estimation result is used for describing a probability density function estimate of the multimedia data sample.

Wherein the kernel density estimation result is used for describing a probability density function estimate of the multimedia data samples, that is, kernel density estimation (Kernel Density Estimation, KDE) is used for estimating the probability density function of the multimedia data samples.

Specifically, the server performs kernel density estimation on the sample distribution potential and the predicted distribution potential to obtain the kernel density estimation result of the multimedia data samples. Kernel density estimation is described in detail below. Kernel density estimation is used in probability theory to estimate an unknown density function and is one of the non-parametric test methods. It can infer the distribution of the overall data from a limited number of multimedia data samples, so the kernel density estimation result is an estimate of the probability density function of the multimedia data samples. From the estimated probability density function, distribution properties of the multimedia data samples can be obtained, for example the regions where the multimedia data samples cluster.

Based on this, the kernel density estimation algorithm takes the idea of a Gaussian mixture to its logical limit: it places one Gaussian mixture component on each point, which yields a density estimator that is essentially non-parametric. The free parameters of kernel density estimation are the kernel type and the kernel bandwidth; the former specifies the shape of the distribution placed at each point, and the latter specifies the size of the kernel at each point. For ease of understanding, as shown in fig. 10, kernel density estimation is a non-parametric estimation. Unlike parametric estimation, it makes no prior assumption about the overall distribution and studies the characteristics of the data distribution entirely from the sampled data. Fig. 10 (A) shows the effect before the multimedia data sample distribution estimation, and fig. 10 (B) shows the effect after the multimedia data sample distribution estimation.
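A one-dimensional Gaussian kernel density estimator can be written in a few lines, which may help make the roles of the kernel and the bandwidth concrete. This is a textbook sketch with a Gaussian kernel and a fixed, hand-picked bandwidth; the text does not state which kernel or bandwidth is actually used.

```python
import math
from typing import Callable, Sequence


def gaussian_kde(points: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a density function built from one Gaussian kernel per point."""
    n = len(points)
    # Normalisation so that the estimated density integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


# Long-tailed toy data: most samples have low potential, one has high potential.
potentials = [0.1] * 9 + [0.9]
density = gaussian_kde(potentials, bandwidth=0.1)
```

On this toy data, `density(0.1)` is much larger than `density(0.9)`: the estimator assigns low density to the rare high-potential region, which is the information a long-tail-aware loss can exploit.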

Step 906, performing loss calculation based on the mean square error and the kernel density estimation result to obtain loss information corresponding to the multimedia data sample.

Specifically, the server performs loss calculation based on the mean square error and the kernel density estimation result to obtain loss information corresponding to the multimedia data sample, that is, the loss information corresponding to the multimedia data sample specifically considers two dimensions of the mean square error and the kernel density estimation result. Specifically, the loss information is calculated by the formula (3):

Wherein $\mathrm{Loss}_i$ represents the loss information of multimedia data sample i, $\mathrm{KDE}(Y_i)$ represents the kernel density estimation result of multimedia data sample i, and the remaining term represents the mean square error of multimedia data sample i.
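Formula (3) itself is not reproduced in this text, so the exact way the two terms are combined is not known here. The sketch below shows one plausible combination, inverse density weighting, in which the squared error of each sample is divided by the estimated density at its sample distribution potential. This choice, the bandwidth, the `eps` stabiliser and all function names are assumptions, not the formula of the application.

```python
import math
from typing import List, Sequence


def kde_at(points: Sequence[float], bandwidth: float, x: float) -> float:
    """Gaussian kernel density estimate of `points`, evaluated at x."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)


def density_weighted_losses(y_true: Sequence[float],
                            y_pred: Sequence[float],
                            bandwidth: float = 0.1,
                            eps: float = 1e-6) -> List[float]:
    """Per-sample loss: squared error divided by KDE(Y_i) (assumed form)."""
    losses = []
    for y, y_hat in zip(y_true, y_pred):
        density = kde_at(y_true, bandwidth, y)
        # Rare potentials have low density and therefore get a larger loss.
        losses.append((y - y_hat) ** 2 / (density + eps))
    return losses
```

Under this assumed form, two samples with the same squared error receive different losses: the one lying in the sparse, high-potential tail is weighted more heavily, which matches the stated goal of counteracting the long-tail distribution.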

Step 908, adjusting model parameters of the initial distribution potential prediction model based on the loss information.

Specifically, the server updates the model parameters of the initial distribution potential prediction model through the loss information. That is, the server judges, according to the loss information, whether the loss function of the initial distribution potential prediction model has reached a convergence condition, and if not, updates the model parameters of the initial distribution potential prediction model through the loss information. This is repeated until the loss function of the initial distribution potential prediction model reaches the convergence condition, at which point the distribution potential prediction model is obtained from the model parameters of the last update, so that in practical application the distribution potential of multimedia data is predicted by the trained distribution potential prediction model. The specific manner of iterative updating is not limited herein.
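The update-until-convergence loop of step 908 can be illustrated with a deliberately small stand-in model. The sketch below trains a one-feature linear model with plain gradient descent and treats "the loss changed by less than `tol`" as the convergence condition; the model, the optimiser and the convergence test are all assumptions, since the text leaves the iterative update manner open.

```python
from typing import Sequence, Tuple


def train(features: Sequence[float],
          targets: Sequence[float],
          lr: float = 0.1,
          tol: float = 1e-10,
          max_steps: int = 10_000) -> Tuple[float, float, float]:
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    n = len(features)
    loss = prev_loss
    for _ in range(max_steps):
        preds = [w * x + b for x in features]
        loss = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n
        if abs(prev_loss - loss) < tol:   # convergence condition reached
            break
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum((p - t) * x for p, t, x in zip(preds, targets, features)) / n
        grad_b = 2.0 * sum(p - t for p, t in zip(preds, targets)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        prev_loss = loss
    return w, b, loss
```

In the setting of the text, the loss would be the loss information of step 906 and the parameters would be those of the initial distribution potential prediction model; the loop structure stays the same.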

In this embodiment, the closeness between the sample distribution potential and the predicted distribution potential is taken into account through the mean square error, and the long-tail distribution problem is addressed through the kernel density estimation result, that is, the loss weight of multimedia data samples with high distribution potential is further increased, so that the initial distribution potential prediction model learns the feature information of truly valuable high distribution potential. The model is iteratively updated using the difference between the sample distribution potential and the predicted distribution potential, so that the trained distribution potential prediction model can predict truly valuable high distribution potential, which ensures the reliability of distribution potential prediction.

In one embodiment, as shown in fig. 11, adjusting model parameters of an initial distribution potential prediction model based on sample distribution potential and predicted distribution potential, includes:

Step 1102, normalizing the sample distribution potential and the predicted distribution potential to obtain the normalized sample distribution potential and the normalized predicted distribution potential.

Wherein the normalization process is used to place the distribution potential between 0 and 1.

Specifically, the terminal normalizes the sample distribution potential and the predicted distribution potential to obtain the normalized sample distribution potential and the normalized predicted distribution potential. The normalized sample distribution potential and the normalized predicted distribution potential both lie between 0 and 1, and the normalization is specifically performed by formula (4):

wherein $\mathrm{Label}_{VV}$ represents the normalized sample distribution potential or the normalized predicted distribution potential, and VV represents the sample distribution potential or the predicted distribution potential. That is, when VV represents the sample distribution potential, $\mathrm{Label}_{VV}$ represents the normalized sample distribution potential. Similarly, when VV represents the predicted distribution potential, $\mathrm{Label}_{VV}$ represents the normalized predicted distribution potential.
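Formula (4) is not reproduced in this text, so the concrete normalization is not known here. As an illustration only, the sketch below assumes min-max scaling, one common way to map values into the interval from 0 to 1; the function name `min_max_normalize` and the optional shared bounds `lo` and `hi` are hypothetical.

```python
from typing import List, Optional, Sequence


def min_max_normalize(values: Sequence[float],
                      lo: Optional[float] = None,
                      hi: Optional[float] = None) -> List[float]:
    """Map values into [0, 1] by min-max scaling (assumed normalization).

    Passing the same `lo` and `hi` for sample and predicted potentials
    puts both on one common scale; out-of-range values are clamped.
    """
    if not values:
        return []
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]
```

For example, with bounds 100 and 1000, a potential of 550 browses maps to 0.5.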

Step 1104, adjusting model parameters of the initial distribution potential prediction model based on the normalized sample distribution potential and the normalized prediction distribution potential.

Specifically, the terminal calculates loss information based on the normalized sample distribution potential and the normalized predicted distribution potential, and updates model parameters of the initial distribution potential prediction model through the loss information. The calculation of the loss information and how to update the model parameters are similar to the foregoing embodiments, and will not be repeated here.

In this embodiment, normalization brings the sample distribution potential and the predicted distribution potential onto a unified statistical scale, which ensures that the loss information is calculated in the same information description dimension and that the calculated loss is accurate and reliable. This improves the reliability of the model parameter adjustment and yields a more reliable model training effect.

In one embodiment, as shown in fig. 12, determining the data content information and the data description information of the candidate multimedia data includes:

Step 1202, title text information and content description information of candidate multimedia data are determined.

Specifically, the server performs data information analysis and extraction on the candidate multimedia data to determine the data content information of the candidate multimedia data. From the text modality dimension, the server may extract text-related data in the candidate multimedia data, such as its text title and text summary, which is not limited here. Thus, the text modality data information includes at least: title text information and content description information.

In step 1204, video frame extraction processing is performed on the multimedia data to obtain video frame data satisfying the number condition.

Specifically, the server performs video frame extraction processing on the multimedia data to obtain video frame data meeting the quantity condition. The server obtains video frame data meeting the quantity condition from the multimedia data through a preset frame extraction frequency, wherein the preset frame extraction frequency can be 1 s/frame, 0.5 s/frame, 2 s/frame and the like, and the method is not limited herein. Thus, the image modality data information includes at least video frame data.

Further, in order to balance data processing efficiency against the complexity of the data information, the number condition is 32 frames, that is, 32 video frames are selected from the multimedia data at the preset frame extraction frequency to form the required video frame data. It can be understood that, in practical application, if the multimedia data is short or contains few video frames, so that 32 video frames cannot be obtained through video frame extraction processing, the number of video frames in the video frame data is determined by the number of frames actually extracted at the preset frame extraction frequency, and may therefore be smaller than 32.
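The frame selection rule can be sketched as computing the timestamps at which frames would be extracted. The function name is hypothetical, and taking the first frames at the preset interval (rather than spreading 32 frames over the whole video) is an assumption; decoding the actual frames is left to whatever video library is in use.

```python
from typing import List

MAX_FRAMES = 32  # number condition from the text


def frame_timestamps(duration_s: float,
                     interval_s: float = 1.0,
                     max_frames: int = MAX_FRAMES) -> List[float]:
    """Timestamps (in seconds) of the frames to extract.

    One frame is taken every `interval_s` seconds, starting at 0, and at
    most `max_frames` frames are kept. Short videos yield fewer frames.
    """
    if duration_s <= 0 or interval_s <= 0:
        return []
    count = min(max_frames, int(duration_s // interval_s))
    return [i * interval_s for i in range(count)]
```

A 60-second video sampled at 1 s/frame gives 32 timestamps, while a 20-second video gives only 20, matching the case where fewer than 32 frames are available.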

It will be appreciated that where the multimedia data is video and includes cover images, the image modality data information may also include cover image frame data.

In step 1206, data creator information and data meta information of the candidate multimedia data are acquired.

Wherein, the data description information at least comprises: data creator information and data meta information.

Specifically, the server extracts the data description information of the candidate multimedia data, that is, the description information of the various types of data in the candidate multimedia data. The data creator information at least includes: detailed information of the data creator, the number of followers of the data creator, the ratings of the data creator on different platforms, and the like. The data meta information is described in detail in the foregoing embodiments and is not described again here.

In this embodiment, data of different modalities is extracted from the data content information in the multimedia data along the text modality dimension and the image modality dimension, which ensures the comprehensiveness and reliability of the data content information. In addition, data description information with more information dimensions is provided from the information related to the data creator and the data meta information describing the data, which ensures the comprehensiveness and reliability of the data description information. As a result, more comprehensive and reliable feature information can be obtained during subsequent feature extraction, the distribution potential prediction model has feature information of more dimensions available when predicting distribution potential, and the obtained distribution potential is more accurate and reliable.

In one embodiment, as shown in fig. 13, the data content information includes at least: text modality data information and image modality data information.

Based on this, acquiring the data content features corresponding to the data content information and the data description features corresponding to the data description information includes:

in step 1302, feature extraction is performed on the text mode data information and the image mode data information, so as to obtain text mode features corresponding to the text mode data information and image mode features corresponding to the image mode data information.

Specifically, the terminal performs feature extraction on the text mode data information and the image mode data information respectively to obtain text mode features corresponding to the text mode data information and image mode features corresponding to the image mode data information. And the text modality data information includes at least: title text information and content description information, so text modality features include at least: title text features and content description features. Similarly, the image modality data information at least includes: video frame data and cover image frames. The image modality features therefore include at least: video frame features and cover image frame features.

In step 1304, feature extraction is performed on the data description information to obtain data description features.

Specifically, feature extraction is performed on the data description information, and data description features are obtained.

In this embodiment, feature extraction is performed on data information of different modalities, so that reliability and universality of feature extraction are achieved.

In one embodiment, as shown in fig. 14, feature extraction is performed on text mode data information and image mode data information to obtain text mode features corresponding to the text mode data information and image mode features corresponding to the image mode data information, including:

Step 1402, extracting features of the text mode data information and the image mode data information through a Deep network in the distribution potential prediction model, and obtaining text mode features and image mode features.

Specifically, the server performs feature extraction on the text modality data information and the image modality data information through the Deep network in the distribution potential prediction model to obtain the text modality features and the image modality features. The Deep network is a trained video modality pre-training model. The video modality pre-training model is obtained by training on video modality multimedia data samples, and a video modality multimedia data sample includes at least: text content information and visual content information.

Based on this, performing feature extraction on the data description information to obtain the data description features includes:

In step 1404, feature extraction is performed on the data description information through a linear Wide network in the distribution potential prediction model, so as to obtain the data description features.

Specifically, the server performs feature extraction on the data description information through a linear Wide network in the distribution potential prediction model to acquire data description features. The linear Wide network is specifically a multi-layer perceptron MLP. The MLP is described in detail below:

The MLP is also called an artificial neural network (ANN, Artificial Neural Network). Besides the input and output layers, an MLP may have a plurality of hidden layers in between. In this embodiment, the MLP includes one hidden layer, that is, the MLP consists of three layers. To facilitate understanding of the structure of the MLP in this embodiment, as shown in fig. 15, adjacent layers of the multi-layer perceptron are fully connected, that is, the input layer 502, the hidden layer 504, and the output layer 506 are fully connected. The input layer 502 passes its input through unchanged: an input X1 is output as X1 and, similarly, an input X2 is output as X2. If the input is an N-dimensional vector, the input layer 502 has N neurons. Based on this, the input layer in this embodiment receives the network vector corresponding to the data description information.

Further, the neurons in the hidden layer 504 are fully connected with the input layer 502. If the input of the input layer 502 is represented by the vector $X$, the corresponding output of the hidden layer 504 is $f(W_1 X + b_1)$, where $W_1$ represents the weights (also referred to as connection coefficients), $b_1$ represents the bias parameters, and $f(\cdot)$ is an activation function, which may be the commonly used sigmoid function or tanh function, the activation function of this embodiment being selected from these. Next, the mapping from the hidden layer 504 to the output layer 506 can be regarded as a multi-class logistic regression, i.e., softmax regression. Based on the foregoing description, the output of the output layer 506 is $\mathrm{softmax}(W_2 X_1 + b_2)$, where $X_1 = f(W_1 X + b_1)$ represents the output of the hidden layer 504, $W_2$ represents the weights (also referred to as connection coefficients), and $b_2$ represents the bias parameters.

Based on this, in this embodiment, the output layer of the MLP outputs the vector of the second hidden layer, and the three-layer MLP shown in fig. 15 outputs a K-dimensional low-dimensional dense vector. Therefore, feature extraction is performed on the data description information through the MLP, and the resulting K-dimensional low-dimensional dense vector is the data description feature.
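The three-layer forward pass described above can be sketched as follows. This is a minimal illustration only: the layer sizes `N`, `H`, `K`, the random weights, the helper names, and the choice of tanh as the default activation are all assumptions made for the example and are not taken from the embodiment.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights W (n_out x n_in) and zero biases b; illustrative values only."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def affine(W, b, x):
    """Compute W x + b for weight matrix W, bias vector b and input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(z):
    """Numerically stable softmax over a list of real numbers."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mlp_forward(x, W1, b1, W2, b2, f=math.tanh):
    """Hidden output X1 = f(W1 X + b1); final output softmax(W2 X1 + b2)."""
    x1 = [f(v) for v in affine(W1, b1, x)]
    return softmax(affine(W2, b2, x1))

rng = random.Random(0)
N, H, K = 6, 8, 4                      # assumed input, hidden and output sizes
W1, b1 = make_layer(N, H, rng)
W2, b2 = make_layer(H, K, rng)
x = [rng.random() for _ in range(N)]   # stand-in for the encoded data description information
y = mlp_forward(x, W1, b1, W2, b2)     # K-dimensional dense vector
```

Here `y` plays the role of the K-dimensional dense vector; in a trained model the weights would come from training rather than from a random generator.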

It can be understood that the distribution potential corresponding to the candidate multimedia data can also be obtained through the output layer of the distribution potential prediction model. In this embodiment, the output layer also includes an MLP: the data content features and the data description features are spliced and used as the input of this MLP, and the output layer of the MLP feeds the resulting low-dimensional dense vector into the Softmax (sigmoid) layer to obtain the distribution potential corresponding to the candidate multimedia data. Specifically, the Softmax function in the Softmax layer is a generalization of the logistic function to multi-class classification problems, so the Softmax layer can ultimately compress an N-dimensional real vector into another N-dimensional real vector that satisfies specific conditions, namely that each component lies in the range (0, 1) and all components sum to 1.
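The relationship between Softmax and the logistic function can be checked numerically. The short sketch below uses arbitrary example inputs (assumptions chosen only for illustration): with two classes and the second logit fixed at zero, the first Softmax component coincides with the logistic (sigmoid) function.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    """Numerically stable softmax over a list of real numbers."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

z = 1.3                                   # arbitrary example logit
two_class = softmax([z, 0.0])             # two-class case
probs = softmax([2.0, -1.0, 0.5, 0.0])    # four-class case

# two_class[0] equals sigmoid(z) up to floating-point rounding,
# and every softmax output lies in (0, 1) with components summing to 1.
```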

In order to facilitate understanding of the specific structure of the distribution potential prediction model, as shown in fig. 16, feature extraction is first performed on the data description information 1601 through the MLP network 1602 in the distribution potential prediction model to obtain the data description features 1603. Next, feature extraction is performed on the data content information 1604 through the trained video modality pre-training model 1605 in the distribution potential prediction model to obtain the data content features 1606. The data description features 1603 and the data content features 1606 are then spliced (concatenated) to obtain the spliced data description features and data content features 1607. For example, if the data content features are a 256-dimensional feature vector and the data description features are a 256-dimensional feature vector, splicing them yields a 512-dimensional feature vector. The spliced data description features and data content features 1607 are then processed through the MLP network 1608, that is, they are compressed into a distribution potential 1609 satisfying specific conditions. The distribution potential 1609 is a 2-dimensional result, that is, the distribution potential specifically includes: the number of times the candidate multimedia data is browsed, and the browsing integrity of the candidate multimedia data after being browsed.
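The splicing step and the 2-dimensional output can be sketched as follows. Only the 256 + 256 = 512 dimensions and the 2-dimensional result come from the description; the hidden width of 64, the ReLU activation, the element-wise sigmoid on the output, the random weights and all helper names are assumptions made for this illustration.

```python
import math
import random

def dense(W, b, x, act):
    """One fully connected layer: act(W x + b), applied element-wise."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init(n_in, n_out, rng):
    """Small random weights and zero biases; stand-ins for trained parameters."""
    W = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

rng = random.Random(0)
description_feature = [rng.gauss(0, 1) for _ in range(256)]  # stand-in for feature 1603
content_feature = [rng.gauss(0, 1) for _ in range(256)]      # stand-in for feature 1606

fused = description_feature + content_feature                # spliced 512-dimensional vector

W_h, b_h = init(512, 64, rng)   # assumed hidden layer of the output MLP
W_o, b_o = init(64, 2, rng)     # two outputs, one per distribution-potential component
hidden = dense(W_h, b_h, fused, relu)
potential = dense(W_o, b_o, hidden, sigmoid)

# Assumed interpretation: normalized scores for browse count and browsing integrity.
browse_count_score, browse_integrity_score = potential
```

Squashing both outputs into (0, 1) matches a setting in which the target potentials are normalized before training; a raw regression head would be an equally plausible reading.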

In this embodiment, the video modality pre-training model can learn semantic information of the video modality, and the video modality can include data of a plurality of modalities such as text, image and audio, so the trained video modality pre-training model can help the distribution potential prediction model better extract the basic features of the data content information. In addition, the multi-dimensional features of the data description information can be extracted more efficiently and accurately through the MLP, so that the obtained data content features and data description features describe the candidate multimedia data accurately and completely. This improves the subsequent distribution potential prediction of the model and the reliability of the determined target multimedia data.

The foregoing describes in detail the feature extraction performed when the Wide network is an MLP. The Deep network is described in detail below, namely how the video modality pre-training model is obtained through training:

In one embodiment, as shown in fig. 17, the training method of the video modality pre-training model includes:

In step 1702, a video modality multimedia data sample is obtained.

Specifically, a server obtains a video modality multimedia data sample. The video modality multimedia data samples include at least: text content information, visual content information, and audio content information.

Step 1704, extracting data content from the video mode multimedia data sample, and obtaining text content information, visual content information, and a data classification label corresponding to the video mode multimedia data sample.

Specifically, the server performs data content extraction on the video modality multimedia data sample to obtain text content information, visual content information and a data classification label corresponding to the video modality multimedia data sample. The text content information includes at least: text title information of the video content, and content description information of the video content. The visual content information includes at least: video frame data obtained by performing frame extraction on the video modality multimedia data sample, and a video cover map of the video modality multimedia data sample.
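The frame extraction mentioned above is not specified further here. One simple possibility, shown purely as an assumption, is to pick a fixed number of evenly spaced frame indices; the function name `sample_frame_indices` and the frame counts are hypothetical.

```python
def sample_frame_indices(total_frames, num_frames):
    """Return up to num_frames evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    if total_frames <= num_frames:
        # Short video: keep every frame.
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the frame at the middle of each of the num_frames equal segments.
    return [int(i * step + step / 2) for i in range(num_frames)]

indices = sample_frame_indices(100, 4)
short_video = sample_frame_indices(3, 8)
```

Each returned index would then be decoded into an image and passed to the visual feature extractor.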

The data classification label is video Tag information extracted, based on EfficientB3, from the visual content information in the video modality multimedia data sample. The video Tag information is a finer-grained textual description of the video content in the video modality multimedia data sample. For example, if the video modality multimedia data sample is a video introducing mobile phone A, the data classification label may include: A, m40, shooting a magic cube.

Step 1706, extracting text features corresponding to the text content information, visual features corresponding to the visual content information, and classification features corresponding to the data classification labels.

Specifically, the server extracts text features corresponding to the text content information, visual features corresponding to the visual content information, and classification features corresponding to the data classification labels. The text features include at least text title features and content description features, the visual features include at least video frame features and video cover map features, and the classification features are data classification label features.

Based on this, in order to facilitate understanding of the structure of the video modality pre-training model, as shown in fig. 18, data content extraction is performed on the video modality multimedia data sample 1801 to obtain visual content information 1802 and text content information 1803, and at this time, video Tag information (i.e., a data classification Tag) is extracted on the basis of the visual content information 1802 in the video modality multimedia data sample 1801 by using the EfficientB3, thereby obtaining classification features 1804 corresponding to the data classification Tag, visual features 1805 corresponding to the visual content information, and text features 1806 corresponding to the text content information.

Step 1708, training the initial video modality pre-training model based on the text features, the visual features and the classification features to obtain the video modality pre-training model.

Specifically, the server trains the initial video mode pre-training model based on text features, visual features and classification features to obtain the video mode pre-training model.

Further, in order to obtain a better video modality pre-training model, as shown in fig. 18, the video modality pre-training model of the present application adopts a Transformer structure and uses a plurality of pre-training tasks together with model optimization methods that include at least a visual Transformer, text Transformer structural feature fusion, and feature-layer cross-modal fusion. The visual Transformer optimizes the model on features of the visual dimension through the video frame features and the video cover map features in the visual features. The text Transformer structural feature fusion optimizes the model on features of the text dimension through the text title features, the content description features and the data classification label features. The feature-layer cross-modal fusion fuses the features of the visual dimension with the features of the text dimension across modalities. Based on the above, the following three pre-training tasks are used in the process of training the initial video modality pre-training model: masked language modeling (Masked Language Modeling, MLM), masked frame modeling (Masked Frame Modeling, MFM) and visual-text modeling (Visual-Text Modeling, VTM).
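The two masked modeling tasks share one preprocessing idea: part of the input is hidden and the model is trained to recover it. The sketch below illustrates only that input-masking step. The 15% mask ratio, the `[MASK]` placeholder token, the zero vector used for masked frames, and the function name `mask_sequence` are assumptions for the example and are not stated in the embodiment.

```python
import random

def mask_sequence(items, mask_value, ratio, rng):
    """Replace a random subset of items with mask_value.

    Returns the masked sequence and a dict mapping each masked position
    to the original item, which serves as the reconstruction target.
    """
    if not items:
        return [], {}
    n_mask = max(1, round(len(items) * ratio))
    positions = sorted(rng.sample(range(len(items)), n_mask))
    masked = list(items)
    for p in positions:
        masked[p] = mask_value
    targets = {p: items[p] for p in positions}
    return masked, targets

rng = random.Random(0)

# MLM-style masking over title tokens (hypothetical tokens).
tokens = ["new", "phone", "camera", "review", "and", "hands", "on", "test"]
masked_tokens, token_targets = mask_sequence(tokens, "[MASK]", 0.15, rng)

# MFM-style masking over per-frame feature vectors (hypothetical 4-dim features).
frames = [[float(i)] * 4 for i in range(10)]
masked_frames, frame_targets = mask_sequence(frames, [0.0] * 4, 0.15, rng)
```

Putting the stored targets back at their positions restores the original sequence, which is exactly what the masked modeling objective asks the model to predict.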

In this embodiment, the video modality pre-training model can learn semantic information of the video modality, and the video modality can include data of a plurality of modalities such as text, image and audio, so the trained video modality pre-training model can help the distribution potential prediction model better extract the basic features of the data content information.

Based on the detailed description of the foregoing embodiments, a complete flow of the method for distributing multimedia data according to the embodiment of the present application is described below. As shown in fig. 19, the method is described by taking its application to the server in fig. 1 as an example. It can be understood that the method may also be applied to a terminal, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:

Step 1901, acquiring an initial multimedia data sample set and historical browsing data corresponding to each initial multimedia data sample in the initial multimedia data sample set.

The historical browsing data comprises browsing times of the initial multimedia data sample in a historical statistical time interval, browsing integrity of the initial multimedia data sample after being browsed, actual browsing data quantity of the initial multimedia data sample, and click rate and access quantity of the initial multimedia data sample in the historical statistical time interval.

Specifically, the server acquires the initial multimedia data sample set and the historical browsing data corresponding to each initial multimedia data sample in the initial multimedia data sample set by the method described in the foregoing embodiment.

Step 1902, based on the browsed data amount, click rate and access amount corresponding to each initial multimedia data sample, screening the abnormal data sample from the initial multimedia data sample set to obtain a candidate multimedia data sample set.

Step 1903, selecting a multimedia data sample from the candidate multimedia data sample set based on the browsing times and browsing completeness corresponding to each candidate multimedia data sample.

The multimedia data samples comprise positive samples and negative samples, the sample distribution potential matched by the positive samples belongs to a first distribution potential range, and the sample distribution potential matched by the negative samples belongs to a second distribution potential range.
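Steps 1902 and 1903 can be pictured as a filter followed by a labeling pass. The sketch below is only an illustration of that shape: the field names, the abnormality rule, and every threshold (`min_visits`, `pos_views`, `pos_integrity`, `neg_views`) are hypothetical and are not specified by this flow.

```python
def screen_and_label(samples, min_visits=1, pos_views=1000,
                     pos_integrity=0.6, neg_views=100):
    """Drop abnormal samples, then split the rest into positive and negative ids.

    All thresholds are hypothetical placeholders for the first and second
    distribution potential ranges.
    """
    positives, negatives = [], []
    for s in samples:
        abnormal = (not 0.0 <= s["click_rate"] <= 1.0
                    or s["visits"] < min_visits
                    or s["browsed_amount"] <= 0)
        if abnormal:
            continue  # step 1902: screen out abnormal data samples
        # Step 1903: select by browse count and browsing integrity.
        if s["views"] >= pos_views and s["integrity"] >= pos_integrity:
            positives.append(s["id"])
        elif s["views"] <= neg_views:
            negatives.append(s["id"])
    return positives, negatives

samples = [
    {"id": "v1", "views": 5000, "integrity": 0.8, "click_rate": 0.12, "visits": 40000, "browsed_amount": 900},
    {"id": "v2", "views": 30, "integrity": 0.2, "click_rate": 0.01, "visits": 3000, "browsed_amount": 15},
    {"id": "v3", "views": 9000, "integrity": 0.9, "click_rate": 1.7, "visits": 10, "browsed_amount": 5},
    {"id": "v4", "views": 400, "integrity": 0.5, "click_rate": 0.05, "visits": 8000, "browsed_amount": 120},
]
positives, negatives = screen_and_label(samples)
```

In this toy data, `v3` is discarded because its click rate is impossible, and `v4` falls between the two assumed ranges and is therefore not selected.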

Step 1904, configuring a first loss weight for the first distribution potential range, and configuring a second loss weight for the second distribution potential range.

Wherein the first loss weight is greater than the second loss weight.

Step 1905, performing mean square error calculation on the sample distribution potential and the predicted distribution potential to obtain the mean square error of the multimedia data sample.

Specifically, the server performs mean square error calculation on the sample distribution potential and the predicted distribution potential to obtain the mean square error of the multimedia data sample. The mean square error is calculated by the aforementioned formula (2).
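For reference, the standard mean square error between two equally long sequences can be computed as below. This sketch uses the textbook definition and made-up numbers; it is not a reproduction of formula (2), and the function name `mean_squared_error` is chosen only for the example.

```python
def mean_squared_error(sample_potential, predicted_potential):
    """Average of squared differences between true and predicted values."""
    if len(sample_potential) != len(predicted_potential):
        raise ValueError("both sequences must have the same length")
    n = len(sample_potential)
    return sum((y - p) ** 2 for y, p in zip(sample_potential, predicted_potential)) / n

mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # (0 + 0 + 4) / 3
```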

In step 1906, kernel density estimation is performed on the sample distribution potential and the predicted distribution potential, so as to obtain the kernel density estimation result of the multimedia data sample.

The kernel density estimation (KDE) result is used to describe an estimate of the probability density function of the multimedia data sample. Specifically, the server performs kernel density estimation on the sample distribution potential and the predicted distribution potential to obtain the kernel density estimation result of the multimedia data sample.
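As background, a one-dimensional Gaussian kernel density estimate can be written in a few lines. The sketch below uses the textbook estimator with an arbitrary bandwidth and made-up potential values; how the embodiment applies KDE to the sample and predicted potentials is not reproduced here.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the probability density of the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observed sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

potentials = [0.1, 0.2, 0.2, 0.9]          # hypothetical normalized potentials
density = gaussian_kde(potentials, bandwidth=0.1)

# Numerically integrate the estimate over [-1, 2]; a valid density integrates to about 1.
step = 0.001
area = sum(density(-1.0 + i * step) for i in range(3001)) * step
```

The estimate is high near 0.2, where the samples cluster, and low in the gap around 0.55, which is the kind of distributional information a plain mean square error does not capture.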

In step 1907, a loss calculation is performed based on the mean square error and the kernel density estimation result, so as to obtain loss information corresponding to the multimedia data sample.

Specifically, the server performs loss calculation based on the mean square error and the kernel density estimation result to obtain loss information corresponding to the multimedia data sample, that is, the loss information corresponding to the multimedia data sample specifically considers two dimensions of the mean square error and the kernel density estimation result. Specifically, the loss information is calculated by the aforementioned formula (3).

In step 1908, model parameters of the initial distribution potential prediction model are adjusted by the first loss weight and the second loss weight based on the loss information.

Specifically, the server adjusts model parameters of the initial distribution potential prediction model by the first loss weight and the second loss weight based on the loss information. That is, when the sample distribution potential belongs to the first distribution potential range, model parameters of the initial distribution potential prediction model are adjusted through loss information and first loss weights. And secondly, when the sample distribution potential belongs to the second distribution potential range, adjusting model parameters of the initial distribution potential prediction model through loss information and second loss weight.
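The range-dependent weighting can be illustrated with a small helper. In this sketch the per-sample squared error stands in for the loss information (formula (3) is not reproduced), and the weights 2.0 and 1.0 and the threshold 0.5 that separates the two ranges are hypothetical values chosen only so that the first weight exceeds the second.

```python
def range_weighted_loss(sample_potential, predicted_potential, in_first_range,
                        first_weight=2.0, second_weight=1.0):
    """Average per-sample loss, scaled by the weight of the range the sample falls in."""
    total = 0.0
    for y, p in zip(sample_potential, predicted_potential):
        weight = first_weight if in_first_range(y) else second_weight
        total += weight * (y - p) ** 2   # squared error as a stand-in loss
    return total / len(sample_potential)

# Hypothetical rule: normalized potentials of at least 0.5 belong to the first range.
loss = range_weighted_loss([0.9, 0.1], [0.7, 0.3], lambda y: y >= 0.5)
```

Both samples have the same squared error of 0.04, but the first one contributes twice as much to the loss, so errors on high-potential samples drive larger parameter updates.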

Step 1909, obtaining candidate multimedia data, and determining data content information and data description information of the candidate multimedia data.

The candidate multimedia data is multimedia data with a distribution potential to be predicted, namely the candidate multimedia data is multimedia data to be selected and distributed to users. And the candidate multimedia data may specifically include one or more multimedia data. Based on this, the data content information is data content in the candidate multimedia data, and the data content information includes data information of one or more modalities, such as text modality data information, image modality data information, audio modality data information, and the like. The data description information is information for describing candidate multimedia data, and at least includes: data creator information and data meta information.

Specifically, the server acquires candidate multimedia data, and specifically performs data information analysis on the candidate multimedia data to determine data content information and data description information of the candidate multimedia data.

In step 1910, the data content features corresponding to the data content information are obtained through the Deep network in the distribution potential prediction model.

The Deep network is a trained video modality pre-training model. The video modality pre-training model is obtained by training on video modality multimedia data samples, and a video modality multimedia data sample includes at least: text content information and visual content information. That is, the server obtains the data content features corresponding to the data content information through the Deep network in the distribution potential prediction model, based on the method described in the previous embodiments.

Step 1911, acquiring the data description features corresponding to the data description information through the linear Wide network in the distribution potential prediction model.

The linear Wide network is specifically a multi-layer perceptron MLP. That is, the server obtains the data description features corresponding to the data description information through the linear Wide network in the distribution potential prediction model, based on the method described in the previous embodiments.

Step 1912, obtaining distribution potential corresponding to the candidate multimedia data through the distribution potential prediction model based on the data content characteristics and the data description characteristics.

Wherein the distribution potential comprises at least: the number of times the candidate multimedia data is browsed, and browsing integrity after the candidate multimedia data is browsed. Specifically, the server outputs the distribution potential corresponding to the predicted candidate multimedia data through a distribution potential prediction model obtained through training based on the data content characteristics and the data description characteristics.

In step 1913, target multimedia data is selected for distribution to the user according to the distribution potential corresponding to the candidate multimedia data.

Specifically, if each candidate multimedia data is a single piece of multimedia data, the terminal determines the target multimedia data to be distributed to the user from among the plurality of candidate multimedia data, based on the distribution potential corresponding to each of them. If the candidate multimedia data comprises a plurality of pieces of multimedia data, the terminal determines the target multimedia data to be distributed to the user from within the candidate multimedia data, based on the distribution potential corresponding to the candidate multimedia data.
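One simple way to turn predicted distribution potentials into a selection is to rank the candidates and keep the top few. The sketch below assumes a hypothetical ranking score, the product of predicted browse count and predicted browsing integrity, together with made-up candidate ids and values; the embodiment does not prescribe this particular rule.

```python
def select_targets(predictions, k):
    """Return the ids of the k candidates with the highest assumed score.

    predictions maps a candidate id to (predicted_views, predicted_integrity).
    """
    def score(item):
        views, integrity = item[1]
        return views * integrity   # hypothetical combination of the two components

    ranked = sorted(predictions.items(), key=score, reverse=True)
    return [candidate_id for candidate_id, _ in ranked[:k]]

predictions = {
    "a": (1000, 0.5),
    "b": (800, 0.9),
    "c": (50, 0.95),
    "d": (1200, 0.2),
}
targets = select_targets(predictions, k=2)
```

Candidate `d` has the highest predicted browse count but is ranked below `a` and `b` because few viewers are predicted to finish it.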

It should be appreciated that the specific implementation of steps 1901 to 1913 is similar to the previous embodiments, and will not be repeated here.

It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the application also provides a multimedia data distribution device for realizing the above-mentioned multimedia data distribution method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the apparatus for distributing multimedia data provided below may refer to the limitation of the method for distributing multimedia data hereinabove, and will not be repeated herein.

In one embodiment, as shown in fig. 20, there is provided a multimedia data distribution apparatus, comprising: a data information acquisition module 2002, a feature acquisition module 2004, a distribution potential prediction module 2006, and a multimedia data selection module 2008, wherein:

the data information acquisition module 2002 is configured to acquire candidate multimedia data, and determine data content information and data description information of the candidate multimedia data, where the data content information includes data of one or more modalities;

a feature acquiring module 2004, configured to acquire a data content feature corresponding to the data content information and a data description feature corresponding to the data description information;

the distribution potential prediction module 2006 is configured to obtain, based on the data content feature and the data description feature, a distribution potential corresponding to the candidate multimedia data through a distribution potential prediction model, where the distribution potential at least includes: the number of browses of the candidate multimedia data;

The multimedia data selecting module 2008 is configured to select target multimedia data according to the distribution potential corresponding to the candidate multimedia data, where the target multimedia data is used for being distributed to the user.

In one embodiment, as shown in fig. 21, the multimedia data distribution apparatus further includes a distribution potential prediction model acquisition module 2102;

a distribution potential prediction model obtaining module 2102, configured to obtain a multimedia data sample, and a sample distribution potential matched with the multimedia data sample, where the sample distribution potential at least includes: the number of times the multimedia data sample is browsed, and browsing integrity after the multimedia data sample is browsed; and determining sample data content information of the multimedia data samples and sample data description information matched with the multimedia data samples; acquiring sample data content characteristics corresponding to the sample data content information and sample data description characteristics corresponding to the sample data description information; obtaining a predicted distribution potential corresponding to the multimedia data sample based on the initial distribution potential prediction model through the sample data content characteristics and the sample data description characteristics; and based on the sample distribution potential and the predicted distribution potential, model parameters of the initial distribution potential prediction model are adjusted to obtain the distribution potential prediction model.

In one embodiment, the distribution potential prediction model obtaining module 2102 is further configured to obtain an initial multimedia data sample set, and historical browsing data corresponding to each initial multimedia data sample in the initial multimedia data sample set, where the historical browsing data includes a number of times the initial multimedia data sample is browsed in a historical statistics time interval, and browsing integrity of the initial multimedia data sample after being browsed; and selecting a multimedia data sample from the initial multimedia data sample set based on the historical browsing data corresponding to each initial multimedia data sample, and determining the sample distribution potential matched with the multimedia data sample.

In one embodiment, the historical browsing data further includes an actual browsing data amount of the initial multimedia data sample, and a click rate and an access amount of the initial multimedia data sample within the historical statistics time interval;

the distribution potential prediction model obtaining module 2102 is further configured to screen abnormal data samples from the initial multimedia data sample set based on the browsed data amount, the click rate and the access amount corresponding to each initial multimedia data sample, so as to obtain a candidate multimedia data sample set; and selecting the multimedia data sample from the candidate multimedia data sample set based on the browsed times and the browsed integrity of each candidate multimedia data sample.

In one embodiment, the multimedia data samples include positive samples and negative samples, the sample distribution potential matched by the positive samples belongs to a first distribution potential range, and the sample distribution potential matched by the negative samples belongs to a second distribution potential range;

the distribution potential prediction model obtaining module 2102 is further configured to configure a first loss weight for the first distribution potential range, and configure a second loss weight for the second distribution potential range, where the first loss weight is greater than the second loss weight; and based on the sample distribution potential and the predicted distribution potential, adjusting model parameters of the initial distribution potential prediction model through the first loss weight and the second loss weight.

In one embodiment, the distribution potential prediction model acquisition module 2102 is further configured to, during adjustment of the model parameters: performing mean square error calculation on the sample distribution potential and the prediction distribution potential to obtain the mean square error of the multimedia data sample, wherein the mean square error is used for describing the closeness between the sample distribution potential and the prediction distribution potential; performing kernel density estimation on the sample distribution potential and the prediction distribution potential to obtain a kernel density estimation result of the multimedia data sample, wherein the kernel density estimation result is used for describing probability density function estimation of the multimedia data sample; performing loss calculation based on the mean square error and the kernel density estimation result to obtain loss information corresponding to the multimedia data sample; and adjusting model parameters of the initial distribution potential prediction model based on the loss information.

In one embodiment, the distribution potential prediction model obtaining module 2102 is further configured to normalize the sample distribution potential and the prediction distribution potential to obtain a normalized sample distribution potential and a normalized prediction distribution potential; and adjusting model parameters of the initial distribution potential prediction model based on the normalized sample distribution potential and the normalized prediction distribution potential.

In one embodiment, the data information obtaining module 2002 is further configured to determine title text information and content description information of the candidate multimedia data; performing video frame extraction processing on the multimedia data to obtain video frame data meeting the quantity condition; acquiring data creator information and data meta information of candidate multimedia data; the text modal data information at least comprises: title text information and content description information; the image modality data information includes at least video frame data; the data description information at least comprises: data creator information and data meta information.

In one embodiment, the data content information includes at least: text modality data information and image modality data information;

The feature obtaining module 2004 is further configured to perform feature extraction on the text mode data information and the image mode data information, and obtain a text mode feature corresponding to the text mode data information and an image mode feature corresponding to the image mode data information; and extracting the characteristics of the data description information to obtain the data description characteristics.

In one embodiment, the feature obtaining module 2004 is further configured to perform feature extraction on the text modality data information and the image modality data information through a Deep network in the distribution potential prediction model, to obtain text modality features and image modality features; and to perform feature extraction on the data description information through a linear Wide network in the distribution potential prediction model, to obtain the data description features.

In one embodiment, the Deep network is a trained video modality pre-training model, the video modality pre-training model being obtained after training based on video modality multimedia data samples, the video modality multimedia data samples comprising at least: text content information and visual content information; the linear Wide network is specifically a multi-layer perceptron MLP.
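For illustration only, the combination of the Deep branch and the Wide branch may be sketched as below. The sketch assumes that the text and image modality features have already been produced by the pre-trained model, that the Wide branch is a two-layer perceptron over the data description features, and that the two branch outputs are concatenated before a linear prediction head. The fusion by concatenation, all names, and all weights are hypothetical and are not taken from the embodiments.

```python
def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def wide_branch(description, w1, b1, w2, b2):
    """Small MLP over data description features (creator and meta information)."""
    return linear(relu(linear(description, w1, b1)), w2, b2)

def predict_potential(text_feat, image_feat, description, params):
    # Stand-in for the Deep branch output: concatenated modality features.
    deep = text_feat + image_feat
    wide = wide_branch(description, *params["wide"])
    fused = deep + wide  # assumed fusion: list concatenation of both branches
    head_w, head_b = params["head"]
    return linear(fused, [head_w], [head_b])[0]

params = {
    "wide": ([[0.5, 0.0], [0.0, -1.0]], [0.0, 0.0], [[1.0, 1.0]], [0.0]),
    "head": ([1.0, 1.0, 1.0, 1.0], 0.0),
}
score = predict_potential([0.2, 0.4], [0.1], [1.0, 2.0], params)
```

In a trained model the weights in `params` would be learned jointly with, or on top of, the pre-trained Deep branch.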

In one embodiment, the multimedia data distribution apparatus further includes a training module 2104 of a video modality pre-training model;

The training module 2104 of the video modality pre-training model is used for acquiring a video modality multimedia data sample; extracting data content from the video modality multimedia data sample to obtain text content information, visual content information and a data classification label corresponding to the video modality multimedia data sample; extracting text features corresponding to the text content information, visual features corresponding to the visual content information and classification features corresponding to the data classification label; and training an initial video modality pre-training model based on the text features, the visual features and the classification features to obtain the video modality pre-training model.

The modules in the multimedia data distribution apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in a processor of the computer device or be independent of the processor, or may be stored, in software form, in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in FIG. 22. The computer device includes a processor, a memory, an input/output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing multimedia data. The input/output interface of the computer device is used to exchange information between the processor and an external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method of distributing multimedia data.

It will be appreciated by those skilled in the art that the structure shown in FIG. 22 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the computer program, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.

The foregoing examples illustrate only a few embodiments of the application. They are described in specific detail, but are not thereby to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims

1. A method of distributing multimedia data, the method comprising:

obtaining distribution potentials corresponding to the candidate multimedia data through a distribution potential prediction model based on the data content characteristics and the data description characteristics, wherein the distribution potentials at least comprise: the browsed times of the candidate multimedia data;

2. The method according to claim 1, characterized in that the method for obtaining the distribution potential prediction model specifically comprises:

acquiring a multimedia data sample and a sample distribution potential matched with the multimedia data sample, wherein the sample distribution potential at least comprises: the browsed times of the multimedia data sample and the browsing integrity of the multimedia data sample after being browsed;

determining sample data content information of the multimedia data sample and sample data description information matched with the multimedia data sample;

acquiring sample data content characteristics corresponding to the sample data content information and sample data description characteristics corresponding to the sample data description information;

obtaining, through an initial distribution potential prediction model and based on the sample data content characteristics and the sample data description characteristics, a predicted distribution potential corresponding to the multimedia data sample;

and adjusting model parameters of the initial distribution potential prediction model based on the sample distribution potential and the prediction distribution potential to obtain the distribution potential prediction model.

3. The method of claim 2, wherein the acquiring the multimedia data sample and the sample distribution potential to which the multimedia data sample matches comprises:

acquiring an initial multimedia data sample set and historical browsing data corresponding to each initial multimedia data sample in the initial multimedia data sample set, wherein the historical browsing data comprises the browsed times of the initial multimedia data sample in a historical statistical time interval and the browsing integrity of the initial multimedia data sample after being browsed;

and selecting the multimedia data samples from the initial multimedia data sample set based on the historical browsing data corresponding to each initial multimedia data sample, and determining sample distribution potential matched with the multimedia data samples.

4. The method of claim 3, wherein the historical browsing data further comprises an actual browsing data amount of the initial multimedia data sample, and a click rate and access amount of the initial multimedia data sample over the historical statistics time interval;

the selecting the multimedia data sample from the initial multimedia data sample set based on the historical browsing data corresponding to each initial multimedia data sample, including:

screening abnormal data samples out of the initial multimedia data sample set based on the actual browsing data amount, the click rate and the access amount corresponding to each initial multimedia data sample, to obtain a candidate multimedia data sample set;

and selecting the multimedia data samples from the candidate multimedia data sample set based on the browsed times and the browsing integrity of each candidate multimedia data sample.

5. The method of claim 2, wherein the multimedia data samples comprise positive samples and negative samples, wherein the sample distribution potential to which the positive samples are matched belongs to a first distribution potential range, and wherein the sample distribution potential to which the negative samples are matched belongs to a second distribution potential range;

the method further comprises:

configuring a first loss weight for the first distribution potential range and a second loss weight for the second distribution potential range, the first loss weight being greater than the second loss weight;

the adjusting model parameters of the initial distribution potential prediction model based on the sample distribution potential and the predicted distribution potential includes:

model parameters of the initial distribution potential prediction model are adjusted by the first loss weight and the second loss weight based on the sample distribution potential and the predicted distribution potential.
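By way of a non-limiting sketch, the range-dependent loss weighting may look as follows. The sketch assumes that the first distribution potential range consists of normalized potentials at or above a threshold, and that the underlying loss is a squared error. The threshold, the weight values and the function name are hypothetical.

```python
def weighted_squared_error(targets, predictions, threshold,
                           first_weight=2.0, second_weight=1.0):
    """Weighted mean squared error with a larger weight on the first range.

    A target at or above `threshold` is treated as belonging to the first
    distribution potential range (assumed definition of the ranges).
    """
    if first_weight <= second_weight:
        raise ValueError("first_weight must be greater than second_weight")
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(targets, predictions):
        w = first_weight if y >= threshold else second_weight
        total += w * (y - p) ** 2
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

# One positive sample predicted poorly, one negative sample predicted exactly.
loss = weighted_squared_error([1.0, 0.0], [0.5, 0.0], threshold=0.5)
```

With these inputs the weighted loss is larger than the unweighted mean squared error of 0.125, so errors on samples in the first range pull harder on the model parameters.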

6. The method of claim 2, wherein adjusting model parameters of the initial distribution potential prediction model based on the sample distribution potential and the predicted distribution potential comprises:

during the process of adjusting the model parameters:

performing mean square error calculation on the sample distribution potential and the prediction distribution potential to obtain a mean square error of the multimedia data sample, wherein the mean square error is used for describing the closeness between the sample distribution potential and the prediction distribution potential;

performing kernel density estimation on the sample distribution potential and the prediction distribution potential to obtain a kernel density estimation result of the multimedia data sample, wherein the kernel density estimation result is used for describing probability density function estimation of the multimedia data sample;

performing loss calculation based on the mean square error and the kernel density estimation result to obtain loss information corresponding to the multimedia data sample;

model parameters of the initial distribution potential prediction model are adjusted based on the loss information.
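For illustration only, one possible way of combining the mean square error with a kernel density estimation result is sketched below. The claim does not specify the kernel, the bandwidth, or how the two terms are combined. The sketch assumes Gaussian kernels, potentials normalized to [0, 1], non-empty inputs, and a loss equal to the mean square error plus a weighted penalty on the gap between the two estimated densities. All names and default values are hypothetical.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function estimating the density of `points` with Gaussian kernels."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
                          for p in points)
    return density

def mse(targets, predictions):
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)

def combined_loss(targets, predictions, bandwidth=0.1, kde_weight=0.5,
                  grid_size=21):
    """MSE plus a penalty on the mismatch between the two estimated densities."""
    target_density = gaussian_kde(targets, bandwidth)
    pred_density = gaussian_kde(predictions, bandwidth)
    # Compare the densities on an evenly spaced grid over [0, 1].
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    gap = sum((target_density(x) - pred_density(x)) ** 2 for x in grid) / grid_size
    return mse(targets, predictions) + kde_weight * gap

targets = [0.1, 0.5, 0.9]
collapsed = [0.5, 0.5, 0.5]   # predictions that ignore the spread of the targets
loss_value = combined_loss(targets, collapsed)
```

Under these assumptions, the density term penalizes predictions that collapse toward a single value even when the pointwise error is moderate.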

7. The method of claim 2, wherein said adjusting model parameters of the initial distribution potential prediction model based on the sample distribution potential and the predicted distribution potential comprises:

normalizing the sample distribution potential and the predicted distribution potential to obtain the normalized sample distribution potential and the normalized predicted distribution potential;

and adjusting model parameters of the initial distribution potential prediction model based on the normalized sample distribution potential and the normalized prediction distribution potential.

8. The method of claim 1, wherein said determining data content information and data description information of the candidate multimedia data comprises:

determining title text information and content description information of the candidate multimedia data;

performing video frame extraction processing on the multimedia data to obtain video frame data meeting a quantity condition;

acquiring data creator information and data meta information of the candidate multimedia data;

wherein the text modality data information at least includes: the title text information and the content description information;

the image modality data information includes at least the video frame data;

the data description information at least comprises: the data creator information and the data meta information.

9. The method according to claim 1, wherein the data content information comprises at least: text modality data information and image modality data information;

the obtaining the data content characteristics corresponding to the data content information and the data description characteristics corresponding to the data description information includes:

extracting features of the text modal data information and the image modal data information to obtain text modal features corresponding to the text modal data information and image modal features corresponding to the image modal data information;

and extracting the characteristics of the data description information to obtain the data description characteristics.

10. The method according to claim 9, wherein the feature extracting the text modality data information and the image modality data information to obtain the text modality feature corresponding to the text modality data information and the image modality feature corresponding to the image modality data information includes:

extracting features of the text modal data information and the image modal data information through a depth network in the distribution potential prediction model to obtain the text modal features and the image modal features;

the step of extracting the characteristics of the data description information to obtain the data description characteristics includes:

and extracting the characteristics of the data description information through a linear network in the distribution potential prediction model to acquire the data description characteristics.

11. The method of claim 10, wherein the depth network is a trained video modality pre-training model, the video modality pre-training model being obtained by training based on video modality multimedia data samples, the video modality multimedia data samples comprising at least: text content information and visual content information;

the linear network is specifically a multi-layer perceptron (MLP).

12. The method of claim 11, wherein the training method of the video modality pre-training model comprises:

acquiring a video mode multimedia data sample;

extracting data content from the video mode multimedia data sample to obtain the text content information, the visual content information and the data classification label corresponding to the video mode multimedia data sample;

extracting text features corresponding to the text content information, visual features corresponding to the visual content information and classification features corresponding to the data classification labels;

and training an initial video mode pre-training model based on the text features, the visual features and the classification features to obtain the video mode pre-training model.

13. A multimedia data distribution apparatus, the apparatus comprising:

the data information acquisition module is used for acquiring candidate multimedia data, and determining data content information and data description information of the candidate multimedia data, wherein the data content information comprises data of one or more modes;

the distribution potential prediction module is configured to obtain, based on the data content feature and the data description feature, a distribution potential corresponding to the candidate multimedia data through a distribution potential prediction model, where the distribution potential at least includes: the browsed times of the candidate multimedia data;

14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 12 when the computer program is executed.

15. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 12.