CN115730111A - Content distribution method, device, equipment and computer readable storage medium - Google Patents

Info

Publication number
CN115730111A
Authority
CN
China
Prior art keywords
content
identifier
distributed
vector
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111670560.4A
Other languages
Chinese (zh)
Other versions
CN115730111B (en)
Inventor
刘刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Publication of CN115730111A
Application granted
Publication of CN115730111B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The application provides a content distribution method, apparatus, device, and computer-readable storage medium. The method includes: acquiring a feature vector of an identifier to be distributed, a content vector of published content under the identifier to be distributed, and interaction information of the published content; determining an identifier vector of the identifier to be distributed based on the feature vector, the content vector, and the interaction information; determining the similarity between different identifiers to be distributed based on their identifier vectors; recalling at least one target distribution identifier from the plurality of identifiers to be distributed based on that similarity; and sorting the at least one target distribution identifier according to a preset sorting target to obtain a sorting result, and performing content distribution for the at least one target distribution identifier based on the sorting result. Because account recall and multimedia content distribution are performed using vector similarity, the cold start of a newly registered identifier can be accelerated.

Description

Content distribution method, device, equipment and computer readable storage medium
Priority claim
This application claims priority to the application with application number 2021110202813, filed on September 1, 2021, and entitled "Content distribution method, device, equipment and computer readable storage medium".
Technical Field
The present application relates to internet technologies, and in particular, to a content distribution method, apparatus, device, and computer-readable storage medium.
Background
With the rapid development of the internet, the media landscape is changing quickly. As mobile terminals become ubiquitous, a new media era of mobile social interaction has emerged. Among new media platforms, those that let users speak out, share, vent, and spread information are called "self-media" platforms; they present media information as an information feed and have greatly advanced a distribution mode in which users can fully interact with the information. High-quality content from the self-media content producers behind these platforms has become what the platforms compete for: it attracts a large number of users and their attention, brings huge traffic, and thereby creates huge commercial value.
For a self-media platform, the question is how to accelerate account cold start and combine it effectively with the recommendation system. The main current approach is for operations staff to configure a whitelist mechanism that nurtures accounts, either granting an account a quota of traffic to attempt a cold start or recalling it through feature information. However, the content of a potential account has not yet been started, and little feature information has accumulated from content distribution. Meanwhile, at this stage, because accurate target users cannot be found quickly, traffic is used very inefficiently; and because of the head effect in the self-media ecosystem, head accounts divert a large amount of traffic, which further reduces content distribution and user awareness for high-quality potential accounts.
Disclosure of Invention
The embodiment of the application provides a content distribution method, a content distribution device and a computer-readable storage medium, wherein account recall and content distribution are performed by utilizing vector similarity, and the cold start speed of a newly registered account can be increased.
The technical scheme of the embodiment of the application is realized as follows:
An embodiment of the present application provides a content distribution method, including:
acquiring a feature vector of an identifier to be distributed, a content vector of published content under the identifier to be distributed, and interaction information of the published content;
determining an identifier vector of the identifier to be distributed based on the feature vector, the content vector, and the interaction information;
determining the similarity between different identifiers to be distributed based on the identifier vectors of the identifiers to be distributed;
recalling at least one target distribution identifier from the plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed;
and sorting the at least one target distribution identifier according to a preset sorting target to obtain a sorting result, and performing content distribution for the at least one target distribution identifier based on the sorting result.
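The similarity step above can be illustrated with a minimal sketch. It assumes cosine similarity as the measure and plain Python lists as identifier vectors; the source does not fix either choice, and the names `cosine` and `pairwise_similarity` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(vectors: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of identifiers, keyed by (id_a, id_b) with id_a < id_b."""
    ids = sorted(vectors)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }


# Hypothetical identifier vectors for three accounts.
sims = pairwise_similarity({"a": [1.0, 0.0], "b": [2.0, 0.0], "c": [0.0, 3.0]})
```

Cosine similarity ignores vector length, so accounts "a" and "b" above count as identical in direction even though their vectors differ in magnitude.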
An embodiment of the present application provides a content distribution apparatus, including:
a first obtaining module, configured to acquire a feature vector of an identifier to be distributed, a content vector of published content under the identifier to be distributed, and interaction information of the published content;
a first determining module, configured to determine an identifier vector of the identifier to be distributed based on the feature vector, the content vector, and the interaction information;
a second determining module, configured to determine the similarity between different identifiers to be distributed based on the identifier vectors of the identifiers to be distributed;
a recall module, configured to recall at least one target distribution identifier from the plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed;
and a content distribution module, configured to sort the at least one target distribution identifier according to a preset sorting target to obtain a sorting result, and to perform content distribution for the at least one target distribution identifier based on the sorting result.
In some embodiments, the first obtaining module is further configured to:
determining vertical class vectors corresponding to the to-be-distributed identifiers based on the vertical class information of the N pieces of published content which are recently published by the to-be-distributed identifiers;
determining M content tags of the identifier to be distributed based on the content tags of the content released by the identifier to be distributed in a preset time period, and vectorizing the M content tags to obtain M tag vectors;
determining the verticality of the identifier to be distributed, and vectorizing the verticality to obtain a verticality vector of the identifier to be distributed;
and determining the vertical class vector, the M tag vectors, and the verticality vector as the feature vector of the identifier to be distributed.
In some embodiments, the first obtaining module is further configured to:
acquiring N pieces of first-level vertical class information of the N pieces of published content most recently published by the identifier to be distributed;
vectorizing the N pieces of first-level vertical class information to obtain N first-level vertical class vectors;
acquiring N pieces of second-level vertical class information of the N pieces of published content, and vectorizing the N pieces of second-level vertical class information to obtain N second-level vertical class vectors;
and determining the N first-level vertical class vectors and the N second-level vertical class vectors as the vertical class vectors corresponding to the identifier to be distributed.
In some embodiments, the first obtaining module is further configured to:
determining the content number corresponding to each first-level vertical class based on the first-level vertical class information of the content published in the preset time period;
determining the vertical class proportion corresponding to each first-level vertical class based on the content number corresponding to each first-level vertical class and the total content number of the content published in the preset time period;
determining the verticality of the identifier to be distributed based on the vertical class proportion corresponding to each first-level vertical class;
and determining the verticality grade corresponding to the verticality, and determining the verticality vector corresponding to the verticality grade.
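A minimal sketch of the verticality computation described above. The source only says that verticality is derived from the per-class proportions, so taking the largest proportion as the verticality is an assumption, as are the grade boundaries (0.4, 0.7), the one-hot grade vector, and all function names.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def class_proportions(first_level_classes: List[str]) -> Dict[str, float]:
    """Share of published content that falls in each first-level vertical class."""
    total = len(first_level_classes)
    if total == 0:
        return {}
    counts = Counter(first_level_classes)
    return {cls: n / total for cls, n in counts.items()}


def verticality(first_level_classes: List[str]) -> float:
    """ASSUMPTION: verticality = largest class proportion (1.0 means a single-topic account)."""
    shares = class_proportions(first_level_classes)
    return max(shares.values()) if shares else 0.0


def verticality_grade(value: float, boundaries: Sequence[float] = (0.4, 0.7)) -> int:
    """Bucket the verticality into grade 0..len(boundaries); boundaries are hypothetical."""
    return sum(value >= b for b in boundaries)


def grade_vector(grade: int, num_grades: int = 3) -> List[float]:
    """One-hot vector for the grade, one simple way to vectorize a discrete grade."""
    return [1.0 if i == grade else 0.0 for i in range(num_grades)]


# Hypothetical account: three sports posts and one food post in the time window.
value = verticality(["sports", "sports", "sports", "food"])
vector = grade_vector(verticality_grade(value))
```

In practice a learned embedding per grade could replace the one-hot vector; the source does not say which representation is used.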
In some embodiments, the first obtaining module is further configured to:
acquiring the N published contents which are recently published by the identifier to be distributed;
determining a video content vector for each piece of video content in the N pieces of published content;
determining a text content vector for each piece of image-text content in the N pieces of published content.
In some embodiments, the first obtaining module is further configured to:
analyzing the video content to obtain a plurality of video frame images corresponding to the video content;
sampling the plurality of video frame images to obtain a plurality of target video frames;
extracting each frame image feature of each target video frame, and performing feature fusion on each frame image feature to obtain a video content vector of the video content.
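The frame-sampling and fusion steps above might look like the following sketch. Uniform sampling and mean pooling are assumptions, since the source does not name a sampling or fusion strategy, and `extract` stands in for whatever image feature extractor is actually used.

```python
from typing import Callable, List, Sequence


def sample_frames(frames: Sequence, num_targets: int) -> List:
    """Pick up to num_targets frames at evenly spaced positions."""
    if num_targets <= 0 or not frames:
        return []
    if num_targets >= len(frames):
        return list(frames)
    step = len(frames) / num_targets
    return [frames[int(i * step)] for i in range(num_targets)]


def fuse_mean(features: List[List[float]]) -> List[float]:
    """Fuse per-frame feature vectors by element-wise averaging (mean pooling)."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def video_content_vector(
    frames: Sequence,
    extract: Callable[[object], List[float]],
    num_targets: int = 8,
) -> List[float]:
    """Sample target frames, extract a feature vector per frame, then fuse them."""
    targets = sample_frames(frames, num_targets)
    return fuse_mean([extract(frame) for frame in targets])


# Toy usage: frames are integers and the hypothetical extractor returns a 2-d feature.
vector = video_content_vector(list(range(10)), lambda f: [float(f), 1.0], num_targets=5)
```

Mean pooling discards frame order; an order-aware fusion (for example a sequence model) would be a drop-in replacement for `fuse_mean`.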
In some embodiments, the first obtaining module is further configured to:
acquiring each text content in each image-text content, and deleting format information in each text content to obtain each processed text;
and extracting semantic features of the processed text to obtain the text content vectors.
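As an illustration of the format-removal step only, the sketch below assumes the format information is HTML-style markup and strips it with regular expressions; the function name `strip_format` is hypothetical. Semantic feature extraction would then be applied to the cleaned text by a text model, which is not shown here.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")      # any HTML/XML-style tag
_WHITESPACE = re.compile(r"\s+")   # runs of whitespace, including newlines


def strip_format(raw: str) -> str:
    """Remove markup tags, decode HTML entities, and collapse whitespace."""
    text = _TAG.sub(" ", raw)
    text = html.unescape(text)
    return _WHITESPACE.sub(" ", text).strip()


# Hypothetical article body as stored by an editor.
clean = strip_format("<p>Tom &amp; Jerry</p>\n<br/><b>Episode 1</b>")
```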
In some embodiments, the first determining module is further configured to:
determining an interaction behavior sequence based on the interaction information;
constructing a weighted directed graph of identifiers to be distributed based on the interaction behavior sequence, the feature vector and the content vector, wherein the weight of a directed edge in the weighted directed graph is determined by the feature vector and the content vector;
and carrying out random walk on the weighted directed graph to obtain the identification vector of the identification to be distributed.
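The graph construction and random walk above can be sketched as follows. The edge-weight function is passed in by the caller because the source only states that weights are determined by the feature and content vectors; the constant weight in the usage line is a placeholder. Turning the walks into identifier vectors (for example with a skip-gram model, as in DeepWalk-style methods) is an assumption about the remaining step and is not shown.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List


def build_graph(
    sequences: List[List[str]],
    edge_weight: Callable[[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Add a directed edge src -> dst for each consecutive pair in an interaction sequence."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for sequence in sequences:
        for src, dst in zip(sequence, sequence[1:]):
            if src != dst:
                graph[src][dst] = edge_weight(src, dst)
    return graph


def random_walk(
    graph: Dict[str, Dict[str, float]],
    start: str,
    length: int,
    rng: random.Random,
) -> List[str]:
    """Weighted random walk: the next node is drawn in proportion to edge weight."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break  # dead end: no outgoing edges
        nodes = list(neighbors)
        weights = [neighbors[node] for node in nodes]  # must be positive
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Hypothetical interaction sequences over account identifiers, with a constant placeholder weight.
graph = build_graph([["a", "b", "c"], ["a", "c"]], lambda src, dst: 1.0)
walk = random_walk(graph, "a", length=5, rng=random.Random(0))
```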
In some embodiments, the recall module is further configured to:
obtaining each identification grade of each identification to be distributed, and determining a first reference identification from the plurality of identifications to be distributed based on each identification grade;
and recalling at least one target distribution identifier with the similarity greater than a similarity threshold value from the plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed.
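A minimal sketch of the recall step. It assumes that the first reference identifier is the one with the highest grade and that similarity is measured against that reference with cosine similarity; neither detail is spelled out in the text above, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pick_reference(grades: Dict[str, int]) -> str:
    """ASSUMPTION: the highest-grade identifier serves as the reference (ties: smallest id)."""
    return max(sorted(grades), key=lambda identifier: grades[identifier])


def recall_similar(
    reference: str,
    vectors: Dict[str, List[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Identifiers whose similarity to the reference exceeds the threshold, best first."""
    ref_vector = vectors[reference]
    scored = [
        (identifier, cosine(ref_vector, vector))
        for identifier, vector in vectors.items()
        if identifier != reference
    ]
    return sorted(
        (pair for pair in scored if pair[1] > threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical data: an established account and two newly registered ones.
grades = {"new": 1, "star": 5, "mid": 3}
vectors = {"star": [1.0, 0.0], "new": [0.9, 0.1], "mid": [0.0, 1.0]}
recalled = recall_similar(pick_reference(grades), vectors, threshold=0.8)
```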
In some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring a plurality of published contents under the target distribution identifier and acquiring each piece of interaction information of each published content;
a third determining module, configured to determine a positive feedback rate for each piece of published content based on its interaction information;
and a fourth determining module, configured to determine target content from the plurality of pieces of published content based on the positive feedback rates, and to store the target content in a preset resource pool.
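The selection of target content by forward (positive) feedback rate could be sketched as below. The source does not define the rate, so the ratio of positive interactions to exposures is an assumption, as are the field names and the minimum-rate cutoff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InteractionInfo:
    """Hypothetical per-content interaction summary."""
    content_id: str
    exposures: int   # how many times the content was shown
    positive: int    # e.g. likes, shares, favorites (assumed definition)


def positive_feedback_rate(info: InteractionInfo) -> float:
    """ASSUMPTION: rate = positive interactions / exposures (0.0 when never shown)."""
    return info.positive / info.exposures if info.exposures > 0 else 0.0


def select_for_pool(infos: List[InteractionInfo], min_rate: float) -> List[str]:
    """Content ids whose positive feedback rate reaches min_rate, best first."""
    ranked = sorted(infos, key=positive_feedback_rate, reverse=True)
    return [info.content_id for info in ranked if positive_feedback_rate(info) >= min_rate]


# Hypothetical published content under one target distribution identifier.
pool = select_for_pool(
    [
        InteractionInfo("video-1", exposures=1000, positive=120),
        InteractionInfo("article-2", exposures=500, positive=10),
        InteractionInfo("video-3", exposures=0, positive=0),
    ],
    min_rate=0.05,
)
```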
In some embodiments, when the sorting target is registration time, the content distribution module is further configured to:
sorting the at least one target distribution identifier by registration time, from the most recently registered to the least recently registered, to obtain a sorting result;
acquiring the target distribution objects corresponding to each target distribution identifier;
and distributing content to the target distribution objects corresponding to each target distribution identifier in the order given by the sorting result.
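When registration time is the sorting target, the ordering reduces to a sort by timestamp with the newest first. The sketch below assumes registration times are available as `datetime` values; the function name is hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def sort_by_registration(registered_at: Dict[str, datetime]) -> List[str]:
    """Identifiers ordered from most recently registered to least recently registered."""
    return sorted(registered_at, key=lambda identifier: registered_at[identifier], reverse=True)


# Hypothetical recalled identifiers and their registration times.
order = sort_by_registration(
    {
        "acct-old": datetime(2021, 1, 5),
        "acct-new": datetime(2021, 8, 30),
        "acct-mid": datetime(2021, 6, 1),
    }
)
```

Putting the newest accounts first means the accounts with the least accumulated exposure are served earliest.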
In some embodiments, the content distribution module is further configured to:
acquiring a first identifier that follows the target distribution identifier;
acquiring other identifiers to be distributed whose similarity to the target distribution identifier is greater than a similarity threshold;
acquiring a second identifier that follows the other identifiers to be distributed;
and determining the first identifier and the second identifier as the target distribution objects corresponding to the target distribution identifier.
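The audience expansion above amounts to a set union: followers of the target identifier plus followers of sufficiently similar identifiers. A minimal sketch, assuming a follower lookup table and a precomputed list of similar identifiers (both hypothetical):

```python
from typing import Dict, Iterable, Set


def target_distribution_objects(
    target: str,
    followers: Dict[str, Set[str]],
    similar_identifiers: Iterable[str],
) -> Set[str]:
    """Followers of the target plus followers of identifiers similar to it."""
    audience = set(followers.get(target, set()))       # first identifiers
    for other in similar_identifiers:
        audience |= followers.get(other, set())         # second identifiers
    return audience


# Hypothetical data: a new account with one follower borrows the audience of a similar account.
followers = {"new-acct": {"u1"}, "similar-acct": {"u1", "u2", "u3"}}
audience = target_distribution_objects("new-acct", followers, ["similar-acct"])
```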
An embodiment of the present application provides a content distribution apparatus, including:
a memory for storing executable instructions;
and the processor is used for realizing the content distribution method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for distributing content provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects:
When content distribution is carried out for an identifier to be distributed in a content distribution system, a feature vector of the identifier to be distributed, a content vector of published content under the identifier to be distributed, and interaction information of the published content are first acquired. An identifier vector of the identifier to be distributed is then determined based on the feature vector, the content vector, and the interaction information, and the similarity between different identifiers to be distributed is determined based on their identifier vectors. At least one target distribution identifier is recalled from the plurality of identifiers to be distributed based on that similarity. Finally, the at least one target distribution identifier is sorted according to a preset sorting target to obtain a sorting result, and content distribution is performed for the at least one target distribution identifier based on the sorting result. Because the identifier vector is constructed from the feature vector, the content vector, and the interaction information, and recall is performed by computing the similarity of identifier vectors, content can be distributed promptly even when a potentially high-quality account has accumulated little feature information during its cold start. This accelerates the content distribution cold start of newly registered identifiers to be distributed and increases their retention rate.
Drawings
Fig. 1A is a schematic diagram of the network architecture of a content distribution system provided in an embodiment of the present application;
Fig. 1B is a schematic structural diagram of a distributed system 201 applied to a blockchain system according to an embodiment of the present application;
Fig. 1C is a schematic diagram of a block structure according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a server 400 provided in an embodiment of the present application;
Fig. 3 is a schematic flow chart of an implementation of a content distribution method provided by an embodiment of the present application;
Fig. 4 is a schematic flow chart of another implementation of the content distribution method provided in an embodiment of the present application;
Fig. 5 is a schematic flow chart of still another implementation of the content distribution method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a content distribution system provided in an embodiment of the present application;
Fig. 7 is a schematic diagram of an implementation process for determining a video content vector according to an embodiment of the present application;
Fig. 8 is a schematic flow chart of an implementation of determining an identifier vector according to an embodiment of the present application;
Fig. 9 is a schematic diagram of fusing the features to construct an identifier vector according to an embodiment of the present application;
Fig. 10 is a schematic diagram of an example of recalling similar accounts by using an identifier vector according to an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", and "third" are used only to distinguish similar objects and do not denote a particular order. Where permitted, the specific order or sequence of "first", "second", and "third" may be interchanged, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained; the following explanations apply to them.
1) Article: an article that a self-media author actively edits and publishes after opening an official account and that is recommended to users for reading; it may include videos or pictures.
2) Video: a video recommended to users by the recommendation platform, including vertical-format small videos and horizontal-format short videos, provided in the form of a Feeds stream.
3) Multi-Channel Network (MCN): a product form of a multi-channel network that brings PGC content together and, with strong capital support, guarantees continuous content output so as to ultimately achieve stable monetization.
4) Professionally Generated Content (PGC): an internet term that generally refers to content personalization, viewpoint diversification, and social relationship virtualization. Also known as PPC (Professionally-Produced Content).
5) User Generated Content (UGC): emerged with the Web 2.0 concept, which advocates personalization as a key feature. It is not a specific service but a new way for users to use the internet, namely a shift from mainly downloading to both downloading and uploading.
6) Message source (Feeds), also translated as source material, feed, information supply, summary, source, news subscription, or web feed (web feed, news feed, syndicated feed): a data format through which websites disseminate up-to-date information to users, usually arranged as a timeline, which is the most primitive and basic presentation form of a feed. A prerequisite for a user to be able to subscribe to a website is that the website provides a message source. Bringing feeds together in one place is called aggregation, and software used for aggregation is called an aggregator.
7) Graph Embedding: represents the nodes in a network with low-dimensional, dense, real-valued vectors. Mathematically, an embedding is a mapping F: X -> Y, that is, a function, with two properties: it is injective and structure-preserving. Injective means that each Y has a unique corresponding X, and vice versa. Structure-preserving means, for example, that if X1 < X2 in the space to which X belongs, then Y1 < Y2 in the space to which Y belongs after mapping. In deep learning, embedding refers to representing an entity by a low-dimensional vector; the entity may be a word (Word2Vec), an item (Item2Vec), or a node in a network relationship (Graph Embedding).
In order to better understand the content distribution method provided by the embodiment of the present application, an implementation manner and existing disadvantages of the related art that processes multimedia data to implement multimedia data distribution will be described first.
High-quality content from the self-media content producers behind these platforms has become what the platforms compete for: it attracts a large number of users and their attention, brings huge traffic, and thereby creates huge commercial value. Self-media content producers fall into two broad categories: original-content accounts and reposting accounts. As user demands and expectations rise, platforms increasingly want original creators and high-quality works. Only when the copyright environment and the content distribution environment for high-quality authors improve will more people be attracted to create high-quality content, forming a positive cycle in the content ecosystem, providing a continuous supply of high-quality content, and supporting the long-term healthy development of the platform. Original and high-quality authors all go through a process of growth and cold start, including the accumulation of fans, which does not happen overnight. For a self-media platform, the question is how to accelerate account cold start and combine it effectively with the recommendation system. The main current approach is for operations staff to configure a whitelist mechanism that nurtures accounts, either granting an account a quota of traffic to attempt a cold start or recalling it through feature information; however, the content of a potential account has not yet been started. The cold start of potential accounts therefore converges very slowly, which hinders the construction of the account ecosystem. In the related art, feature information is accumulated through content exploration, and recalling content through that feature information is very inefficient, so accounts grow slowly and are lost, the account retention rate eventually drops, and the construction of the account ecosystem suffers.
The embodiment of the application provides a content distribution method, a content distribution device, content distribution equipment and a computer-readable storage medium, and the distribution and cold start efficiency of an account can be accelerated by a method of closely combining a self-media potential account and a recommendation system. An exemplary application of the content distribution device provided in the embodiment of the present application is described below, and the device provided in the embodiment of the present application may be implemented as various types of user terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, and a portable game device), an intelligent voice interaction device, an intelligent home appliance, and a vehicle-mounted terminal, and may also be implemented as a server. In the following, an exemplary application will be explained when the device is implemented as a server.
Referring to Fig. 1A, Fig. 1A is a schematic diagram of the network architecture of a content distribution system 100 provided in an embodiment of the present application. The network architecture includes a content production terminal 200, a network 300, a server 400, and a content consumption terminal 500. The content production terminal 200 and the content consumption terminal 500 are each connected to the server 400 through the network 300, and the network 300 may be a wide area network, a local area network, or a combination of the two.
A user registers a distribution account through the content production terminal 200 and determines the content to be distributed. The content to be distributed may be multimedia data, such as a short video or small video recorded by the content production terminal 200, an article edited on the content production terminal 200, or image-text content or a video obtained from local storage. In response to a data distribution instruction, the content production terminal 200 sends the content to be distributed to the server 400 through the network 300. After receiving the content to be distributed, the server 400 stores it in a multimedia database based on the distribution account, and at this point the distribution account may be determined to be an identifier to be distributed. In this embodiment of the application, the server 400 determines the identifier vector of the identifier to be distributed from the feature vector of the identifier to be distributed, the content vector of the published content, and the interaction information of the published content. It then determines the similarity between different identifiers to be distributed, recalls the target distribution identifier from the plurality of identifiers to be distributed according to that similarity, and performs content distribution for the target distribution identifier, that is, sends the content to be published to the content consumption terminal 500.
In some embodiments, the server 400 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The content production terminal 200 and the content consumption terminal may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart television, a smart car-mounted terminal, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
The content distribution system involved in the embodiments of the present application may be a distributed system formed by a client and a plurality of nodes (computing devices of any form in an access network, such as servers and user terminals) connected through network communication.
Taking a blockchain system as an example of the distributed system, refer to Fig. 1B. Fig. 1B is a schematic structural diagram of a distributed system 201 applied to a blockchain system provided in an embodiment of the present application. The system is formed by a plurality of nodes 202 (computing devices of any form in an access network, such as servers and user terminals) and a client 203. A Peer-to-Peer (P2P) network is formed between the nodes, and the P2P protocol is an application layer protocol that runs on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join and become a node, and a node comprises a hardware layer, a middle layer, an operating system layer, and an application layer.
It should be noted that, in the distributed system 201, the node 202 may be a terminal or a server.
Referring to the functions of each node in the blockchain system shown in fig. 1B, the functions involved include:
1) Routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application: deployed in the blockchain to implement specific services according to actual service requirements. It records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block once they have successfully verified its source and integrity.
For example, the services implemented by the application include:
2.1) Wallet: provides functions for conducting electronic money transactions, including initiating a transaction (that is, sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, the record data of the transaction is stored in a temporary block of the blockchain as a response acknowledging that the transaction is valid). The wallet also supports querying the electronic money remaining at an electronic money address.
2.2) Shared ledger: provides functions such as storing, querying, and modifying account data. The record data of an operation on the account data is sent to other nodes in the blockchain system; after the other nodes verify that the record data is valid, the record data is stored in a temporary block as a response acknowledging that the account data is valid, and a confirmation may be sent to the node that initiated the operation.
2.3) Smart contract: a computerized agreement that can enforce the terms of a contract. It is implemented by code deployed on the shared ledger and executed when certain conditions are met, and is used to complete automated transactions according to actual business requirements, for example querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Smart contracts are not limited to executing contracts for trading; they may also execute contracts that process received information.
3) Blockchain: comprises a series of blocks (Block) connected to one another in the chronological order of their generation. Once a new block is added to the blockchain it cannot be removed, and the blocks record the record data submitted by the nodes in the blockchain system.
Referring to fig. 1C, fig. 1C is a Block Structure (Block Structure) diagram provided in this embodiment, each Block includes a hash value of a transaction record stored in the Block (hash value of the Block) and a hash value of a previous Block, and the blocks are connected by the hash value to form a Block chain. The block may include information such as a time stamp at the time of block generation. A block chain (Blockchain), which is essentially a decentralized database, is a string of data blocks associated by using cryptography, and each data block contains related information for verifying the validity (anti-counterfeiting) of the information and generating a next block.
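The hash linkage described above can be illustrated with a minimal sketch. The field names (`records`, `prev_hash`, `timestamp`, `hash`), the use of SHA-256 over a JSON encoding, and the helper names are assumptions made for illustration; the embodiment does not specify a block layout or hash function.

```python
import hashlib
import json


def hash_payload(payload: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(records: list, prev_hash: str, timestamp: float) -> dict:
    """Build a block holding records, a timestamp, and the previous block's hash."""
    body = {"records": records, "prev_hash": prev_hash, "timestamp": timestamp}
    return {**body, "hash": hash_payload(body)}


def chain_is_valid(chain: list) -> bool:
    """Check each block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        body = {k: block[k] for k in ("records", "prev_hash", "timestamp")}
        if block["hash"] != hash_payload(body):
            return False  # block contents were altered after hashing
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


# Hypothetical two-block chain.
genesis = make_block(["genesis"], prev_hash="0" * 64, timestamp=0.0)
second = make_block(["record A", "record B"], genesis["hash"], timestamp=1.0)
chain = [genesis, second]
```

Because each block stores the hash of its predecessor, altering the records of an earlier block invalidates the chain unless every later block is recomputed.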
Referring to fig. 2, fig. 2 is a schematic structural diagram of a server 400 provided in an embodiment of the present application, where the server 400 shown in fig. 2 includes: at least one processor 410, memory 440, at least one network interface 420. The various components in server 400 are coupled together by a bus system 430. It is understood that the bus system 430 is used to enable connected communication between these components. The bus system 430 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are designated as bus system 430 in FIG. 2.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The memory 440 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 440 optionally includes one or more storage devices physically located remote from processor 410.
Memory 440 includes volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 440 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 440 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 441 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 442 for reaching other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software. Fig. 2 shows a content distribution apparatus 443 stored in the memory 440, which may be software in the form of programs, plug-ins, and the like, and which includes the following software modules: a first acquisition module 4431, a first determination module 4432, a second determination module 4433, a recall module 4434, and a content distribution module 4435. These modules are logical, and may therefore be arbitrarily combined or further split depending on the functions implemented.
The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present application may be implemented in hardware. As an example, the apparatus may be a processor in the form of a hardware decoding processor that is programmed to execute the content distribution method provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In order to better understand the method provided by the embodiment of the present application, artificial intelligence, each branch of artificial intelligence, and the application field related to the method provided by the embodiment of the present application are explained first.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning, and the like. The scheme provided by the embodiments of the present application mainly relates to the machine learning technology of artificial intelligence, which is explained below.
Machine Learning (ML) is a multi-domain interdisciplinary subject that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
The artificial intelligence cloud Service is also generally called AI as a Service (AIaaS). The method is a service mode of an artificial intelligence platform, and particularly, the AIaaS platform splits several types of common AI services and provides independent or packaged services at a cloud. This service model is similar to the opening of an AI theme mall: all developers can access one or more artificial intelligence services provided by the platform through an API (application programming interface) interface, and part of the qualified developers can also use the AI framework and the AI infrastructure provided by the platform to deploy and operate and maintain the own dedicated cloud artificial intelligence services.
The content distribution method provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the server provided by the embodiment of the present application.
Referring to fig. 3, fig. 3 is a schematic flow chart of an implementation of the content distribution method provided in the embodiment of the present application, and will be described with reference to the steps shown in fig. 3.
Step S101, obtaining a feature vector of an identifier to be distributed, a content vector of a content published under the identifier to be distributed and interaction information of the published content.
In the embodiment of the application, the identifier to be distributed can be a user account registered on the interactive platform, and the feature vector of the identifier to be distributed can be characterized by the vertical class information, the content tag information, and the verticality of the content published by the user account. Therefore, the feature vector can be obtained by obtaining the vertical class vector, the tag vectors, and the verticality vector of the content published within a period of time.
To obtain the content vector of the content published under the identifier to be distributed, a plurality of contents recently published under the identifier to be distributed may be obtained and vectorized to obtain the corresponding content vectors. The interaction information of the published content can be information about behaviors of other accounts, such as following the identifier to be distributed and commenting on, liking, or disliking the published content.
Step S102, determining the identification vector of the identifier to be distributed based on the feature vector, the content vector and the interaction information.
When step S102 is implemented, an interaction behavior sequence of a user account is first determined based on the interaction information. A weighted directed graph of the identifiers to be distributed is then constructed based on the interaction behavior sequence, the feature vector, and the content vector, wherein the directed edges between nodes are determined by the follow relationship and the weights of the directed edges are determined by the feature vector and the content vector. Finally, random walks are performed on the weighted directed graph to obtain the identification vector of the identifier to be distributed.
Step S103, determining the similarity between different identifications to be distributed based on the identification vectors of the identifications to be distributed.
Here, the "distance" between different identifiers to be distributed may be calculated by identifier vectors of different identifiers to be distributed, and may be, for example, a manhattan distance (L1 distance) or an euclidean distance (L2 distance) between different identifiers to be distributed. The greater the distance between two identification vectors, the lower the similarity, and the smaller the distance, the higher the similarity.
In some embodiments, cosine values between different identifiers to be distributed can also be calculated through identifier vectors of different identifiers to be distributed, and the closer the cosine value is to 1, the closer the included angle between two identifier vectors is to 0 degree, that is, the more similar the two identifier vectors are.
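The three measures mentioned above can be sketched as follows. This is a minimal illustration in plain Python; the function names are hypothetical, and the embodiment does not prescribe a particular implementation.

```python
import math


def manhattan(u, v):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine of the angle between u and v; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Note that the distances are dissimilarity measures (smaller means more similar), whereas the cosine value is a similarity measure (larger means more similar), so a recall threshold has to be applied in the matching direction.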
Step S104, recalling at least one target distribution identifier from the plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed.
When the step S104 is implemented, a part of the to-be-distributed identifiers with a higher identifier level may be determined from the multiple to-be-distributed identifiers based on the identifier level, the part of the to-be-distributed identifiers with the higher identifier level is used as reference identifiers, and then, based on the similarity between different account numbers, a target distribution identifier with a similarity greater than a similarity threshold value with each reference identifier is determined from the multiple to-be-distributed identifiers.
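A minimal sketch of this threshold-based recall is given below. It assumes cosine similarity as the similarity measure, treats identifiers whose level is at least a hypothetical `min_level` as reference identifiers, recalls a candidate if it is similar enough to any one reference identifier, and excludes the reference identifiers themselves from the result. All of these choices, and all names in the code, are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def recall_targets(vectors, levels, min_level, threshold):
    """Recall identifiers whose vector is close to a high-level reference identifier."""
    references = [k for k, level in levels.items() if level >= min_level]
    targets = []
    for candidate, vec in vectors.items():
        if candidate in references:
            continue  # assumption: reference identifiers are not recalled themselves
        if any(cosine(vec, vectors[ref]) > threshold for ref in references):
            targets.append(candidate)
    return sorted(targets)


# Hypothetical identification vectors and identifier levels.
vectors = {"veteran": [1.0, 0.0], "newcomer_a": [0.9, 0.1], "newcomer_b": [0.0, 1.0]}
levels = {"veteran": 5, "newcomer_a": 1, "newcomer_b": 1}
recalled = recall_targets(vectors, levels, min_level=4, threshold=0.8)
```

In this toy data, `newcomer_a` points in nearly the same direction as the high-level `veteran` account and is recalled, while `newcomer_b` is orthogonal to it and is not.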
In some embodiments, after the similarity between the identifiers to be distributed is calculated, identifiers to be distributed that are similar to one another may be grouped into a plurality of subsets based on the similarity between different identifiers to be distributed. The comprehensive identifier level of each subset is then determined, S subsets with the highest comprehensive identifier levels are selected from the plurality of subsets, the identifiers to be distributed in the S subsets are determined as target distribution identifiers, and the target distribution identifiers are recalled.
In implementation, the comprehensive identifier level of a subset may be determined by taking the arithmetic mean of the identifier levels of the identifiers to be distributed in the subset. In some embodiments, considering that the initial level of a newly registered identifier to be distributed may not be high, the comprehensive identifier level of a subset may instead be calculated as a weighted average of the identifier levels based on registration duration, where an identifier to be distributed with a long registration duration has a larger weight than one with a short registration duration.
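The two ways of computing a subset's comprehensive identifier level can be sketched as follows. Making the weight directly proportional to the registration duration in days is an assumption for illustration; the text only requires that longer-registered identifiers receive larger weights.

```python
def mean_level(levels):
    """Arithmetic mean of the identifier levels in a subset."""
    return sum(levels) / len(levels)


def duration_weighted_level(levels, durations_days):
    """Weighted mean of identifier levels, weighted by registration duration."""
    total = sum(durations_days)
    if total == 0:
        return mean_level(levels)  # fall back when no duration information exists
    return sum(l * d for l, d in zip(levels, durations_days)) / total
```

For a subset with levels 5 and 1 and registration durations of 300 and 100 days, the arithmetic mean is 3.0 while the duration-weighted level is 4.0, so the low initial level of the newly registered identifier pulls the subset down less.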
Step S105, sequencing the at least one target distribution identifier by using a preset sequencing target to obtain a sequencing result, and distributing the at least one target distribution identifier based on the sequencing result.
In the embodiment of the present application, the preset sorting target may be a click rate, a usage duration, a follow rate, a like rate, a registration time, and the like. After the sorting target is set, the at least one target distribution identifier can be sorted by the sorting target to obtain a sorting result, and a target distribution object of each target distribution identifier is determined. The target distribution object may be understood as a content consumer in other embodiments. When distributing the at least one target distribution identifier based on the sorting result, the content published under each target distribution identifier may be sent, based on the sorting result, to the target distribution object, that is, to a content consumption end, so that a user of the content consumption end can perform interaction behaviors such as viewing, liking, commenting, and following.
In this embodiment of the application, when the sorting target is the registration time, step S105 may sort by registration time from most recent to oldest, so that newly registered identifiers to be distributed are ranked first and their content is distributed preferentially. This shortens the cold start time of newly registered accounts and improves their retention rate.
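As a small illustration of sorting by registration time from most recent to oldest, consider the following sketch. The record layout (`id`, `registered`) and the sample accounts are hypothetical.

```python
from datetime import date

# Hypothetical recalled target distribution identifiers.
targets = [
    {"id": "acct_old", "registered": date(2020, 1, 5)},
    {"id": "acct_new", "registered": date(2021, 12, 20)},
    {"id": "acct_mid", "registered": date(2021, 3, 14)},
]

# Most recently registered first, so new accounts are distributed preferentially.
ranked = sorted(targets, key=lambda t: t["registered"], reverse=True)
ranked_ids = [t["id"] for t in ranked]
```

Other sorting targets such as click rate could be handled the same way by swapping the key function.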
In the embodiment of the application, when content distribution is performed for an identifier to be distributed in a content distribution system, a feature vector of the identifier to be distributed, a content vector of the content published under the identifier to be distributed, and interaction information of the published content are first obtained. An identification vector of the identifier to be distributed is determined based on the feature vector, the content vector, and the interaction information. The similarity between different identifiers to be distributed is then determined based on the identification vectors of all the identifiers to be distributed, and at least one target distribution identifier is recalled from the plurality of identifiers to be distributed based on that similarity. Finally, the at least one target distribution identifier is sorted by a preset sorting target to obtain a sorting result, and the at least one target distribution identifier is distributed based on the sorting result. Because the identification vector is constructed from the feature vector, the content vector, and the interaction information, and recall is performed by calculating the similarity of identification vectors, content can be distributed in time during the cold start period of a potentially high-quality account, even though little feature information has yet accumulated from its content. This speeds up the content distribution cold start of newly registered identifiers to be distributed and increases their retention rate.
In some embodiments, the above-mentioned "acquiring the feature vector of the identifier to be distributed" in step S101 may be implemented by steps S1011 to S1014 described below, which are described below.
Step S1011, determining the vertical vector corresponding to the identifier to be distributed based on the vertical information of the N published contents most recently published by the identifier to be distributed.
The vertical information of the N published contents may refer to the first-level vertical information of the N published contents, and may also refer to the first-level vertical information and the second-level vertical information of the N published contents. In some embodiments, the vertical class information of the N pieces of published content may further include three-level vertical class information, if necessary.
After the vertical information of the N published contents is obtained, the vertical information can be vectorized, so that a vertical vector corresponding to the identifier to be distributed is obtained.
Step S1012, determining M content tags of the identifier to be distributed based on the content tags of the content issued by the identifier to be distributed within a preset time period, and vectorizing the M content tags to obtain M tag vectors.
Here, the preset time period may be a period of time before the current time, for example, one month before the current time, or 14 days before the current time, or 10 days before the current time. The published content has one or more content tags. When the step S1012 is implemented, first, content tags of each content issued within a preset time period are obtained, then, the number of each content tag is counted, the first M content tags with the largest number are determined based on the number of each content tag, and the first M content tags are determined as the M content tags of the identifier to be distributed. And vectorizing the M content tags, wherein in the implementation, the tag identifications of the M content tags can be obtained, and then the tag identifications are vectorized, so that M tag vectors are obtained. In the embodiment of the present application, M is a positive integer, and may be, for example, 20, 10, or the like.
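The tag counting and top-M selection described above can be sketched with the standard library. The function name and the sample tags are hypothetical, and how ties between equally frequent tags are broken is not specified in the text (here, `Counter.most_common` keeps first-seen order).

```python
from collections import Counter


def top_m_tags(contents_tags, m):
    """Return the m most frequent tags across all contents in the period."""
    counts = Counter(tag for tags in contents_tags for tag in tags)
    return [tag for tag, _ in counts.most_common(m)]


# Hypothetical tags of three contents published in the preset time period.
period_tags = [["nba", "basketball"], ["nba", "finals"], ["nba", "basketball"]]
selected = top_m_tags(period_tags, m=2)
```

Each selected tag identifier would then be vectorized, for example through an embedding lookup, to obtain the M tag vectors.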
Step S1013, determining the verticality of the identifier to be distributed, and vectorizing the verticality to obtain a verticality vector of the identifier to be distributed.
The verticality measures how concentrated the types of content published by an account are. For example, if all the content published under one identifier to be distributed is sports content, while the content published under another identifier to be distributed is sports content mixed with education content, the identifier whose content is all sports content has the more concentrated content.
Step S1014, determining the vertical class vector, the M label vectors and the verticality vector as the feature vector of the identifier to be distributed.
In the above steps S1011 to S1014, a vertical class vector is determined from the vertical class information of the identifier to be distributed, tag vectors are determined from the content tags of the content published within a certain time period, and the verticality reflecting the concentration degree of the published content, together with its verticality vector, is determined. The feature vector is thus characterized by a series of intuitively understandable explicit feature vectors, namely the vertical class vector, the tag vectors, and the verticality vector.
In some embodiments, the step S1011 "determining the vertical class vector corresponding to the identifier to be distributed based on the vertical class information of the N published contents most recently published by the identifier to be distributed" may be implemented by:
step S111, obtaining N first-level vertical information of the N published contents which are recently published by the identifier to be distributed.
The N most recently published contents may be understood as the N contents, among those published under the identifier to be distributed, whose publication times are closest to the current time. N is a positive integer; in the embodiment of the present application, N may be 5 or 10, or may take other values.
A vertical class refers to a vertical domain, an internet industry term for a domain that provides specific services to a defined group. Users in the same vertical domain or the same vertical class have similar needs, hobbies, and the like. In the embodiment of the present application, the first-level vertical information may refer to the broadest classification to which the published content belongs. The higher the level number of the vertical information, the narrower and more precise the scope it covers.
Step S112, vectorizing the N pieces of first-level vertical information to obtain N pieces of first-level vertical vectors.
In this embodiment of the present application, the vertical class information may be implemented by a vertical class identifier, where the vertical class identifier may be a number or a letter, or a combination of the number or the letter, and when the step S112 is implemented, vectorization may be performed on the N primary vertical class identifiers, so as to obtain N primary vertical class vectors correspondingly.
Step S113, obtaining N pieces of secondary vertical information of the N pieces of published content, and vectorizing the N pieces of secondary vertical information to obtain N pieces of secondary vertical vectors.
A secondary vertical class may be a branch of a primary vertical class. For example, the primary vertical class may be science and technology, and the secondary vertical classes under it may include smartphones, smart homes, artificial intelligence, and so on.
Similar to the first-level vertical information, the second-level vertical information may be represented by second-level vertical identifiers. Vectorizing the N pieces of second-level vertical information may therefore be implemented by vectorizing the N second-level vertical identifiers to obtain N second-level vertical vectors.
Step S114, determining the N primary vertical vectors and the N secondary vertical vectors as the vertical vectors corresponding to the identifier to be distributed.
In steps S111 to S114, the primary vertical class vector and the secondary vertical class vector of the N published contents most recently published by the identifier to be distributed are determined as the vertical class vector of the identifier to be distributed, so that the vertical class vector can reasonably reflect the category of the content of the document published by the issuing account. Of course, if the category of the identifier to be distributed needs to be more accurately and finely reflected, the first-level vertical class vector, the second-level vertical class vector, and the third-level vertical class vector of the N pieces of published content most recently published by the identifier to be distributed may also be determined as the vertical class vector of the identifier to be distributed.
In some embodiments, in the step S1013, "determining the verticality of the identifier to be distributed, and vectorizing the verticality to obtain a verticality vector of the identifier to be distributed" may be implemented by:
step S131, determining the number of contents corresponding to each primary vertical class based on the primary vertical class information of the contents published within the preset time period.
In this step, the primary vertical class information of each content published under the identifier to be distributed in the past month may be obtained, so as to determine the number of contents corresponding to each primary vertical class.
For example, a certain to-be-distributed identifier has released 50 contents in the past month, and the first-level vertical classes of the 50 contents have three types: education, sports and entertainment, where education has 30 contents, sports has 10 contents, and entertainment has 10 contents.
Step S132, determining the vertical category proportion corresponding to each primary vertical category based on the content number corresponding to each primary vertical category and the total content number of the released content in the preset time period.
During implementation, the content number corresponding to each first-level vertical class is divided by the total content number to obtain the vertical class proportion corresponding to each first-level vertical class. In accordance with the above example, the ratio of the verticals of education is 0.6, the ratio of the verticals of sports is 0.2, and the ratio of the verticals of entertainment is 0.2.
Step S133, determining the verticality of the identifier to be distributed based on the vertical class proportion corresponding to each primary vertical class.
In practical implementation, the verticality of the identifier to be distributed can be determined by the formula (2-1):
Figure BDA0003452855070000201
where i indexes the i-th primary vertical class, n is the total number of primary vertical classes, and P_i is the vertical class proportion of the i-th primary vertical class.
Taking the above example as a reference, n is 3, and the verticality of the mark to be distributed is determined to be 0.412 by the formula (2-1).
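Formula (2-1) appears only as an image in the source and is not reproduced in this text. As an assumption, the sketch below uses the base-10 Shannon entropy of the vertical class proportions, that is, the negative sum over i of P_i multiplied by log10(P_i). This form is chosen only because it is consistent with the worked example (it yields about 0.4127 for proportions 0.6, 0.2, 0.2, which the text gives as 0.412) and with the statement that a lower value indicates more concentrated content; it should not be read as the patent's definitive formula.

```python
import math
from collections import Counter


def category_proportions(primary_categories):
    """Steps S131-S132: count contents per primary class and divide by the total."""
    counts = Counter(primary_categories)
    total = len(primary_categories)
    return {category: n / total for category, n in counts.items()}


def verticality(proportions):
    """Assumed form of formula (2-1): base-10 entropy of the proportions."""
    return -sum(p * math.log10(p) for p in proportions if p > 0)


# The worked example: 50 contents, 30 education, 10 sports, 10 entertainment.
published = ["education"] * 30 + ["sports"] * 10 + ["entertainment"] * 10
props = category_proportions(published)
score = verticality(props.values())
```

Under this assumed form, an account that publishes in a single primary vertical class scores 0, and the score stays within 0 to 1 as long as there are at most 10 primary vertical classes.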
Step S134, determining the verticality level corresponding to the verticality, and determining the verticality vector corresponding to the verticality level.
The verticality determined by formula (2-1) is a value between 0 and 1, and a lower verticality indicates a higher concentration of the published content. In the embodiment of the present application, the verticality may be divided into 10 levels; the lower the level, the lower the verticality and the higher the concentration of the published content.
For example, [0,0.1) is the first grade, [0.1,0.2) is the second grade, [0.2,0.3) is the third grade, …, [0.9,1] is the tenth grade. Taking the above example as an example, the verticality 0.412 is a fifth level, and in this step, vectorization may be performed on the level 5 to determine the verticality vector of the identifier to be distributed.
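The level assignment above can be sketched as a simple bucketing function. The one-hot encoding used to vectorize the level is an assumption for illustration; the text does not say how the level is vectorized, and an embedding lookup would be another option.

```python
def verticality_level(value, num_levels=10):
    """Map a verticality in [0, 1] to a level: [0, 0.1) -> 1, ..., [0.9, 1] -> 10."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("verticality must lie in [0, 1]")
    return min(int(value * num_levels) + 1, num_levels)


def one_hot(level, num_levels=10):
    """Assumed vectorization: a one-hot vector with a 1 at the level's position."""
    return [1 if i == level - 1 else 0 for i in range(num_levels)]


level = verticality_level(0.412)
level_vector = one_hot(level)
```

The `min` call closes the last interval on the right, so a verticality of exactly 1 falls into the tenth level rather than an eleventh.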
Through the steps S131 to S134, the verticality vector that can reflect the concentration degree of the published content can be determined through the first-level verticality of the published content in a certain time period of the identifier to be distributed, so that a necessary data basis is provided for characterizing the distribution identifier.
In some embodiments, the step S101 of "obtaining the content vector of the content released under the identifier to be distributed" may be implemented by the following steps S1015 to S1017, and the following steps are explained below.
Step S1015, the N published contents that are recently published by the identifier to be distributed are obtained.
The N published contents may be video content or image-text content.
Step S1016, determining each video content vector of each video content of the N published contents;
when the step S1016 is implemented, firstly, the video content is analyzed to obtain a plurality of video frame images corresponding to the video content, and then the plurality of video frame images are subjected to frame extraction to obtain a plurality of extracted target video frames; and then extracting the image characteristics of each frame of each target video frame, and performing characteristic fusion on the image characteristics of each frame to obtain a video content vector of the video content.
When feature fusion is performed on the image features of each frame, the fusion may be performed by average pooling, or by a NetVLAD model or a YouTube-8M NeXtVLAD model.
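The frame extraction and average-pooling fusion can be sketched in plain Python as follows. Uniform sampling with a fixed step is an assumption (the text does not specify the frame extraction strategy), and the per-frame feature extractor itself, typically an image model, is outside the scope of this sketch.

```python
def sample_frames(frames, step):
    """Frame extraction: keep every step-th frame (assumed uniform sampling)."""
    return frames[::step]


def average_pool(frame_features):
    """Fuse per-frame feature vectors into one video vector by element-wise mean."""
    if not frame_features:
        raise ValueError("at least one frame feature is required")
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]
```

Average pooling ignores frame order and weights every sampled frame equally, which is what distinguishes it from the learned aggregation of NetVLAD-style models.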
Step S1017, determining each text content vector of each image-text content in the N published contents.
When step S1017 is implemented, the text content of each image-text content is first obtained, and the format information in each text content is deleted to obtain the processed text; semantic features are then extracted from each processed text to obtain the corresponding text content vector. The text content may be the body text obtained by removing the title and pictures from the image-text content. To delete the format information, formatting markup in the text content, such as the various HyperText Markup Language (HTML) tags, may be removed. The text content is then processed by a Bidirectional Encoder Representations from Transformers (BERT) model to extract the semantic features of the text, that is, to convert the text string into a vector.
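The removal of HTML formatting before semantic feature extraction can be sketched with the standard library parser. The class and function names are hypothetical, joining text fragments with a space is a simplification, and the subsequent BERT encoding is not shown.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_format(html_text):
    """Return the plain text of html_text with tags removed and spaces collapsed."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())


plain = strip_format("<p>Hello <b>world</b></p><br/>")
```

The resulting plain string is what would be passed to the text encoder to obtain the text content vector.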
In some embodiments, the step S102 "determining the identification vector of the identifier to be distributed based on the feature vector, the content vector and the interaction information" shown in fig. 3 may be implemented by:
step S1021, determining an interaction behavior sequence based on the interaction information.
The interaction behavior sequence reflects the order in which interaction behaviors occur. For example, the behavior sequence of user U1 may be: first viewing content A1 of account A, then viewing content B1 of account B, then liking content A2 of account A, and finally commenting on content C1 of account C.
Step S1022, constructing a weighted directed graph of the to-be-distributed identifiers based on the interaction behavior sequence, the feature vector, and the content vector.
The nodes of the weighted directed graph are accounts, the directed edges of the weighted directed graph are determined based on the follow relationship, and the weights of the directed edges are determined by the feature vectors and the content vectors.
Step S1023, performing random walks on the weighted directed graph to obtain the identification vector of the identifier to be distributed.
In implementation, ordered nodes can be sampled from the weighted directed graph by random walk (Random Walk), so that the interaction behaviors are converted into ordered node sequences for learning, and a Skip-gram neural language model is then applied to the random walk sequences to obtain the identification vector of the identifier to be distributed.
In the embodiment of steps S1021 to S1023, the identification vector of the identifier to be distributed is constructed by combining the interaction behavior information (following a self-media account, unfollowing it, the content consumption sequence, and the like) with the implicit content vector (embedding) of the published content itself (the embedding of the body text for image-text content, or the embedding of the video for video content). The identification vector is then applied in the subsequent recall and recommendation processes, which speeds up the cold start of content distribution for novice self-media authors, reduces the churn of novice accounts, and helps human operators discover and operate promising novice accounts.
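The sampling in step S1023 can be illustrated with a weighted random walk. The sketch below assumes the weighted directed graph is given as a nested dictionary (account -> {neighbor account: edge weight}); this representation, the function names and the parameters are hypothetical and are not specified by the embodiment.

```python
import random


def weighted_random_walk(graph, start, walk_length, rng):
    """Sample one walk; the next node is drawn in proportion to edge weight."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph.get(walk[-1], {})
        if not neighbors:  # dead end: the node has no outgoing edges
            break
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def sample_walks(graph, walks_per_node, walk_length, seed=0):
    """Start the given number of walks from every node in the graph."""
    rng = random.Random(seed)
    return [
        weighted_random_walk(graph, node, walk_length, rng)
        for _ in range(walks_per_node)
        for node in graph
    ]


# Hypothetical graph: account -> {next account: edge weight}
graph = {"A": {"B": 3.0, "C": 1.0}, "B": {"A": 2.0}, "C": {}}
walks = sample_walks(graph, walks_per_node=2, walk_length=4)
```

The resulting node sequences play the role of "sentences" for a Skip-gram model, which is not shown here.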
In some embodiments, as shown in fig. 4, step S104 "recalling at least one target distribution identifier from the multiple identifiers to be distributed based on the similarity between the different identifiers to be distributed" shown in fig. 3 may be implemented by steps S1041 to S1042 described below, which is described below with reference to fig. 4.
Step S1041, obtaining each identifier level of each identifier to be distributed, and determining a first reference identifier from the plurality of identifiers to be distributed based on each identifier level.
Step S1042, based on the similarity between different identifiers to be distributed, recalling at least one target distribution identifier, whose similarity with the first reference identifier is greater than the similarity threshold, from the multiple identifiers to be distributed.
Thus, for some newly registered accounts, even if the initial level of the newly registered account is not high, if the similarity between the newly registered account and the identifier to be distributed with the high identifier level is high, the newly registered account may be a potential good account, and at this time, the account is recalled and distributed, so that the recall rate of the newly registered account can be improved.
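Steps S1041 to S1042 can be sketched as follows. The sketch assumes that the first reference identifier is simply the identifier with the highest level and that similarity is the cosine of the identification vectors; both choices, and all names, are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_by_reference(vectors, levels, threshold):
    """vectors: id -> identification vector; levels: id -> identifier level."""
    # Assumption: the highest-level identifier serves as the first reference identifier.
    reference = max(levels, key=levels.get)
    recalled = [
        ident
        for ident, vec in vectors.items()
        if ident != reference and cosine(vec, vectors[reference]) > threshold
    ]
    return reference, recalled


vectors = {"veteran": [1.0, 0.0], "new_1": [0.9, 0.1], "new_2": [0.0, 1.0]}
levels = {"veteran": 5, "new_1": 1, "new_2": 1}
reference, recalled = recall_by_reference(vectors, levels, threshold=0.8)
```

In this toy data the low-level account `new_1` is recalled because its vector is close to that of the high-level account, which is the effect described above for newly registered accounts.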
In some embodiments, as shown in fig. 4, after step S104, or after step S105, the following steps may also be performed:
step S106, a plurality of published contents under the target distribution identification are obtained, and each piece of interaction information of each published content is obtained.
When the step S106 is implemented, the multiple published contents of the target distribution identifier in the preset time period may be obtained, for example, the multiple published contents in the past month may be obtained, or all published contents under the target distribution identifier may be obtained.
The interaction information of the published content may be an approval rate, a comment rate, an attention rate, and the like for the published content.
Step S107, determining respective forward feedback rates for respective published contents based on the respective interaction information.
In the embodiment of the present application, determining the forward feedback rate for the published content may be determining an approval rate, determining an attention rate, or determining the sum of the approval rate and the attention rate. In the following description, the forward feedback rate is taken to be the sum of the approval rate and the attention rate as an example.
Step S108, based on each forward feedback rate, determining target content from the plurality of released contents, and storing the target content to a preset resource pool.
When step S108 is implemented, after the forward feedback rate of each published content is determined, the first P contents that have the highest forward feedback rates and whose forward feedback rates are greater than the forward feedback rate threshold are determined. It is then determined whether each of the first P contents is already stored in the preset resource pool; the contents among the first P contents that are not yet in the preset resource pool are determined as target contents and are stored in the preset resource pool.
In the embodiment of the present application, the preset resource pool stores published content whose forward feedback rate is greater than the forward feedback rate threshold, so it can be regarded as a resource pool of high-quality content. Through steps S105 to S108, high-quality content can be screened from the content under the target distribution identifier and added to the resource pool, which expands the amount of content in the resource pool and increases the supply of high-quality content.
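The screening in steps S107 to S108 can be sketched as follows, taking the forward feedback rate as the sum of the approval rate and the attention rate as in the example above. The data layout (content ID mapped to a pair of rates, resource pool as a set of content IDs) and the function name are assumptions made for illustration.

```python
def select_target_content(contents, pool, top_p, rate_threshold):
    """contents: id -> (approval_rate, attention_rate); pool: set of stored ids."""
    # Forward feedback rate = approval rate + attention rate (the example choice).
    feedback = {cid: approval + attention for cid, (approval, attention) in contents.items()}
    qualified = [cid for cid in feedback if feedback[cid] > rate_threshold]
    top = sorted(qualified, key=feedback.get, reverse=True)[:top_p]
    # Only contents not already in the pool become target contents.
    targets = [cid for cid in top if cid not in pool]
    pool.update(targets)
    return targets


pool = {"c1"}
contents = {"c1": (0.30, 0.10), "c2": (0.20, 0.05), "c3": (0.02, 0.01), "c4": (0.15, 0.05)}
targets = select_target_content(contents, pool, top_p=2, rate_threshold=0.1)
```

Here `c1` and `c2` are the top two contents above the threshold, but only `c2` is new to the pool, so it is the only target content.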
In some embodiments, in order to accelerate the cold start of newly registered accounts, the sorting target may be set as the registration time. In this case, step S105, "sorting the at least one target distribution identifier by using a preset sorting target to obtain a sorting result, and performing content distribution for the at least one target distribution identifier based on the sorting result", may be implemented by:
step S1051, based on the registration time, sorting the at least one target distribution identifier according to a sequence from near to far from the current time, and obtaining a sorting result.
Sorting by registration time from nearest to farthest from the current time is equivalent to sorting by the time interval between the registration time and the current time from smallest to largest. For example, if the three target distribution identifiers are account A, account B and account C, with registration times of October 1, 2020, May 8, 2020 and March 5, 2021 respectively, then sorting from nearest to farthest from the current time gives the sorting result account C, account A, account B.
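The ordering in step S1051 can be illustrated with the example accounts. The sketch reads the three registration dates as October 1, 2020 (account A), May 8, 2020 (account B) and March 5, 2021 (account C), an interpretation that is consistent with the stated result; the "current time" used below is an arbitrary assumed date.

```python
from datetime import date

registration = {
    "account A": date(2020, 10, 1),
    "account B": date(2020, 5, 8),
    "account C": date(2021, 3, 5),
}


def sort_by_recency(registration, today):
    """Order accounts by the interval between registration and today, smallest first."""
    return sorted(registration, key=lambda account: today - registration[account])


# Assumed current date, chosen only for the example.
ranking = sort_by_recency(registration, today=date(2021, 12, 31))
```

The resulting order is account C, account A, account B, matching the example.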
Step S1052, acquiring target distribution objects corresponding to the target distribution identifiers.
The target distribution identifiers may include new accounts that were registered only recently. For such accounts, little feature information has accumulated from content distribution, so the target distribution objects cannot be found accurately. Therefore, in the embodiment of the present application, when step S1052 is implemented, similar accounts may be used to assist in determining the target distribution objects.
In some embodiments, when step S1052 is implemented, first identifiers that follow the target distribution identifier may first be obtained; then other identifiers to be distributed whose similarity with the target distribution identifier is greater than a similarity threshold are obtained, and second identifiers that follow those other identifiers to be distributed are obtained; finally, the first identifiers and the second identifiers are determined as the target distribution objects corresponding to the target distribution identifier.
A first identifier that follows the target distribution identifier is certainly an account interested in the target distribution identifier. In the embodiment of the present application, accounts similar to the target distribution identifier may additionally be determined through the identification vectors, and the second identifiers that follow these similar accounts are obtained; both the second identifiers and the first identifiers are then determined as target distribution objects. This increases the number of target distribution objects while preserving their accuracy.
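The determination of target distribution objects in step S1052 amounts to a union of two follower sets. The sketch below assumes that follower lists and pairwise similarities are already available as dictionaries; these structures and names are hypothetical.

```python
def target_distribution_objects(target, followers, similarity, threshold):
    """followers: account -> set of follower ids; similarity: (a, b) -> score."""
    # First identifiers: accounts that follow the target distribution identifier.
    first = set(followers.get(target, set()))
    # Similar accounts: similarity with the target above the threshold.
    similar = [
        other
        for other in followers
        if other != target and similarity.get((target, other), 0.0) > threshold
    ]
    # Second identifiers: accounts that follow any of the similar accounts.
    second = set().union(*(followers[other] for other in similar))
    return first | second


followers = {"new": {"u1"}, "old": {"u2", "u3"}, "other": {"u4"}}
similarity = {("new", "old"): 0.9, ("new", "other"): 0.2}
objects = target_distribution_objects("new", followers, similarity, threshold=0.8)
```

With this toy data the new account reaches its own follower `u1` plus the followers `u2` and `u3` of the similar account.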
Step S1053, distributing the content to the target distribution objects corresponding to the target distribution identifiers based on the sorting order.
When the step S1053 is implemented, the server may invoke the content export service, obtain the content to be distributed corresponding to the target distribution identifier from the content database, and send the content to be distributed to the target distribution object, thereby completing content distribution.
In steps S1051 to S1053, the sorting target is the registration time: accounts registered more recently are ranked first and accounts registered earlier are ranked later, so that the cold start time of new accounts can be shortened.
Based on the foregoing embodiments, a content distribution method is further provided in an embodiment of the present application, and is applied to the network architecture shown in fig. 1A, fig. 5 is a schematic diagram of a further implementation flow of the content distribution method provided in the embodiment of the present application, and as shown in fig. 5, the flow includes:
step S501, the content production terminal obtains the content to be released.
Here, the content to be distributed may be a video recorded by the user through the content production terminal, an edited article of the public account, a locally stored video, or the like.
In step S502, the content production terminal sends a content distribution request to the server in response to the operation instruction for content distribution.
When the content publishing request is realized, the content to be published is carried in the content publishing request, and a publishing account corresponding to the content production terminal can also be carried in the content publishing request.
Step S503, after receiving the content publishing request, the server obtains the content to be published and the publishing account.
Step S504, the server obtains the meta-information of the content to be released, and stores the meta-information and the content to be released to a content database.
Here, the meta information is information about information, and may also be considered as attribute information of the content to be distributed, and may be, for example, a size of the content to be distributed, a cover page link, a title, a distribution time, an account author, a source channel, and the like.
Step S505, the server determines the issuing account as an identifier to be distributed, and obtains a feature vector of the identifier to be distributed.
In the embodiment of the present application, the feature vector may be determined from the vertical category vector, the tag vector and the verticality vector of the account; that is, the feature vector reflects the category and tags of the identifier to be distributed and how concentrated its published content is.
Step S506, the server obtains the interactive information reported by the content consumption terminal.
The interaction information may include attention behavior information, praise behavior information, comment behavior information, and the like.
Step S507, the server invokes a content vector service to determine a content vector corresponding to the identifier to be distributed.
Step S508, the server determines the identifier vector of the identifier to be distributed based on the feature vector, the content vector and the interaction information.
In step S509, the server determines similarity between different identifiers to be distributed based on the identifier vectors of the identifiers to be distributed.
In implementation, the distance between different to-be-distributed identifiers can be calculated through the identifier vectors, the similarity between different to-be-distributed identifiers is determined through the distance, or the similarity between different to-be-distributed identifiers is determined by calculating a cosine value between two identifier vectors.
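The two options mentioned for step S509 can be sketched as follows. The mapping from Euclidean distance to a similarity score, 1 / (1 + d), is an assumption chosen for illustration; the embodiment only states that similarity is determined from the distance.

```python
import math


def euclidean_similarity(u, v):
    """Map Euclidean distance into (0, 1]; the 1 / (1 + d) mapping is an assumption."""
    return 1.0 / (1.0 + math.dist(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two identification vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


sim_distance = euclidean_similarity([0.0, 0.0], [3.0, 4.0])
sim_cosine = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```

Cosine similarity ignores vector length, while the distance-based score does not, so the two measures can rank the same pair of identifiers differently.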
Step S510, the server recalls at least one target distribution identifier from the multiple identifiers to be distributed based on the similarity between different identifiers to be distributed.
Step S511, the server sorts the at least one target distribution identifier according to a sequence from near to far from the current time based on the registration time, so as to obtain a sorting result.
Step S512, the server acquires the target distribution object corresponding to each target distribution identification.
Step S513, the server performs content distribution to the target distribution objects corresponding to the target distribution identifiers based on the sorting order.
It should be noted that, for steps or technical terms of the embodiment of the present application, which are the same as those of the other embodiments, please refer to the description and implementation processes of the other embodiments.
In some embodiments, the content consumption terminal may click to check after receiving the content issued by the target distribution identifier, and may perform interactive operations such as comment, concern, like, forward, complain, report, and the like after checking.
In the content distribution method provided in the embodiment of the present application, after the content production terminal generates content to be published and sends it to the server, the server stores the content in the content database and determines the publishing account corresponding to the content as an identifier to be distributed. The server then acquires the feature vector, the content vector and the interaction behavior information of the identifier to be distributed, constructs the identification vector from all three, and recalls target distribution identifiers by calculating the similarity between different identifiers to be distributed.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
An embodiment of the present application provides a content distribution method, which is applied to a system structure shown in fig. 6, and as shown in fig. 6, the system includes: the system comprises a content production end 601, an uplink and downlink content interface server 602, a content consumption end 603, a content database 604, a scheduling center service 605, a manual review system 606, a duplication elimination service 607, a statistical reporting interface server 608, a content storage service 609, a content vector service 610, a statistical database 611, a self-media account embedding model and service 612, a self-media account vector recall service 613, a recommendation recall system 614, a recommendation ranking service 615 and a content distribution export service 616, wherein the core part comprises the self-media account embedding model and service 612 and the self-media account vector recall service 613. The self-media account embedding model and service 612 is linked to the recommendation recall system 614 through a self-media account vector recall service 613.
The functions of the service modules in the content distribution system and the content distribution process are described below with reference to fig. 6.
Content producers of PGC, UGC, MCN or PUGC content provide image-text content or upload video content, including short videos and small videos, through the content production end 601 (a mobile end or a back-end interface API system), using a local or web publishing system; these are the main sources of the distributed content. The content production end 601 first obtains the interface address of the upload server by communicating with the uplink and downlink content interface server 602, and then publishes the content.
The uplink and downlink content interface server 602 communicates with the content production end 601 directly, obtains the title, the publisher, the abstract, the cover page, the publishing time and the like of the content after receiving the content submitted by the content production end 601, and stores the content in the content database 604. The uplink and downlink content interface server 602 sets a preliminary review identification level of the account through configuration of operation according to the account source of the publisher, and can mark a part of high-quality accounts during implementation, which is mainly closely related to an operation policy, or can grade the account through an independent model. In the embodiment of the present application, an implementation manner of setting a preliminary review identification level for an account is not limited.
The uplink and downlink content interface server 602 reports the publishing flow information of each account, including the publishing time and the content type, to the statistical reporting interface server 608, and also stores the content annotation information provided by the self-media account, such as category, tags, selected cover picture and title, as extension information in the content database.
The content consumption end 603, acting as the consumer, communicates with the uplink and downlink content interface server 602 to obtain index information of the content to be accessed, and then communicates with the uplink and downlink content interface server 602 and the content distribution export service 616 to consume the content directly. The precondition for consumption is that the index of the content, that is, the entry address for accessing the content, has been obtained through Feeds recommendation and distribution. The content consumption end includes a module for reporting Feeds and user click behavior and environment, which collects the user's current network environment, the user's click operations on the information in the Feeds stream, and the exposure data of the Feeds content, and reports them to the statistical reporting interface server 608. For video content, the playback duration and the buffering time are also reported, together with the various interaction behaviors on the content, such as commenting, forwarding, sharing, collecting, liking and reporting.
A content database 604, which is a core database of content, in which meta-information of all content released by a producer is stored, the meta-information of the content may include size, cover map link, title, release time, account number author, source channel, and storage time, and may also include classification of the content during manual review (including first, second, and third-level classification and tag information, such as an article explaining a brand a mobile phone, the first-level classification is science and technology, the second-level classification is a smart phone, the third-level classification is a domestic mobile phone, the tag information is a brand a, and the model is MX); during the manual review, the information in the content database 604 is read, and the result and status of the manual review are also returned to the content database 604 for storage.
The scheduling center service 605 is responsible for the whole scheduling process of the content flow, receives the content stored in the content database 604 through the uplink and downlink content interface server 602, then obtains the meta-information of the content from the content database 604, and can also schedule the duplication elimination server to mark and filter the content repeatedly put in storage; for contents which cannot be processed by the machine, such as those related to sensitivity and requiring manual review for security problems, a manual review system is called to perform manual review processing.
The manual review system 606, which needs to read the original information of the video content itself in the content database 604, is usually a system developed based on a network (web) database with complex business, and mainly ensures that the pushed content conforms to the access permitted by local laws and policies; the manually checked content comes from active release of a media account and supplement acquired by a web crawler from a public network; the results of the manual review are finally written to the content database 604 by the dispatch center service 605.
In the whole business process, content processing includes machine processing and manual review; the deduplication service 607 is part of the machine processing and communicates with the scheduling center service 605. The deduplication service covers title deduplication, cover picture deduplication, body text deduplication, and video fingerprint and audio fingerprint deduplication. In implementation, the title and body text of image-text content are usually vectorized and deduplicated using similarity hashing (SimHash), BERT text vectors and picture vectors; for video content, video fingerprints and audio fingerprints are extracted to construct vectors, and the distance between vectors, such as the Euclidean distance, is then calculated to determine whether the content is a duplicate.
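Title deduplication by similarity hashing can be sketched as follows. This is a simplified SimHash over whitespace-separated tokens, written under the assumption that this is the kind of similarity hash meant above; the 64-bit fingerprint size, the BLAKE2b token hash and the distance threshold are illustrative choices, not values from the embodiment.

```python
import hashlib


def simhash(text, bits=64):
    """Simplified SimHash: each token votes on every bit of the fingerprint."""
    counts = [0] * bits
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_duplicate(title_a, title_b, max_distance=3):
    # Assumed threshold: titles whose fingerprints differ in few bits count as duplicates.
    return hamming(simhash(title_a), simhash(title_b)) <= max_distance


same = is_duplicate("Brand A phone review", "brand a  phone REVIEW")
```

Near-identical titles share most tokens and therefore most fingerprint bits, so a small Hamming distance flags them as candidates for deduplication.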
The statistics report interface server 608 receives the current network environment of the content consumption end 603 user, the click operation behavior of the user on the Feeds intermediate information, and the report of the exposure data of the Feeds article, writes the reported statistics data result into the statistics database 611, and receives the original stream of the account text reported by the uplink and downlink content interface server 602.
The content storage service 609 is configured to store a content source file uploaded by the uplink and downlink content interface server 602, and transmit the content source file to the content vector service 610.
The content vector service 610 constructs a content vector based on the image-text content and the video content issued by the account number, and the content vector is used as an input dimension of the final identification vector, and the implicit content vector characteristics are mainly used, so that the final vector has better representation capability.
The statistical database 611 receives the statistical data reported for the content consumption end 603 and provides data support for subsequent statistical analysis and mining; it communicates with the self-media account embedding model and service 612 and provides the corresponding information and data; it also receives the publishing flow reported for the content production end 601 and counts the Top 20 tags of the content published by each self-media account.
The self-media account embedding model and service 612 builds the self-media identification vector model jointly from the self-media account features (characterized by vertical categories, tags and verticality), the content consumption behaviors between users and accounts, and the embedding features of the content published by the self-media account. It obtains the statistical data needed to build the model from the statistical database 611 and exposes the model as a service, through which the identification vector of each account can be obtained to represent that account. The identification vector is used for the subsequent identification vector recall and for measuring the similarity between accounts.
The self-media identification vector recall service 613 is an engineering implementation that retrieves similar identification vectors for a given self-media identification vector. The recall computes the similarity of the vectors, usually by metric learning: a vector distance or a measure such as the cosine similarity is calculated, and two accounts are considered similar when it satisfies a certain threshold. The service also acts as a recall subsystem of the recommendation system, providing the results recalled by the separate self-media account channel.
A recommendation recall system 614 for implementing various content recall algorithms in the recommendation system, such as collaborative recall, classification, topic recall, user historical behavior, user long and short term interest point recall, and the like; in the embodiment of the present application, the self-media vector recall service 613 is mainly used for communication, and the self-media vector recall is used as a single recall, for example, to implement the migration of fans and the cold start of similar accounts, and to assist the growth of potential accounts of novices.
A recommendation sorting service 615, which generally includes a rough sorting and a fine sorting, takes the result of the recommendation recall system 614 as an input, and scores the recalled content through click rate estimation according to a recommended core target, such as click rate and user duration, which is taken as an optimization target in combination with user context environment information; and sorting according to the results of scoring calculation, combining with a rule strategy of a certain service, and finally outputting the result of content recommendation and distribution to the content consumption end through the content distribution export service 616.
The content distribution export service 616 communicates with the recommendation sorting service 615 to obtain the recommendation sorting result and, based on that result, delivers the recommended content to the content consumption end 603, where it is displayed in the user's Feeds list; the content distribution export service 616 is typically a set of access services deployed geographically close to users.
The following describes implementation processes for the content vector service 610 to determine content vectors and from the media account embedding model and service 612 to determine embedded (embedding) vectors (hereinafter referred to as identification vectors) from the media account.
When the content is video content, the video content vector has two layers of meanings:
the first layer of meaning is representation learning: a low-dimensional dense feature in the form of a one-dimensional array (for example, a video embedding is 128 float-type values);
the second layer of meaning is metric learning: a vector for similarity measurement, where the "distance" between two vectors represents the "similarity" of the two objects.
Fig. 7 is an implementation process of determining a video content vector according to an embodiment of the present application, and as shown in fig. 7, the implementation process includes:
in step S701, a video is input.
Step S702, frame extraction is performed through a Temporal Segment Network (TSN).
In step S703, image features are extracted.
When implemented, image features may be extracted by Xception.
And step S704, fusing the multi-frame features to obtain a content vector of the video.
In implementation, image feature vectors taken from an intermediate layer of the NeXtVLAD model from the YouTube-8M competition may be used, and the content vector of the video is then obtained by summing and averaging these image feature vectors.
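The fusion in step S704 by summing and averaging can be sketched as an element-wise mean of the per-frame feature vectors. The function name and the toy two-dimensional features are illustrative; real frame features would come from the image model and have far more dimensions.

```python
def fuse_frame_features(frame_features):
    """Element-wise mean of per-frame feature vectors of equal length."""
    if not frame_features:
        raise ValueError("at least one frame feature vector is required")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame feature vectors must have the same length")
    return [sum(column) / len(frame_features) for column in zip(*frame_features)]


video_vector = fuse_frame_features([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Averaging keeps the dimensionality of the video content vector equal to that of a single frame feature, regardless of how many frames were extracted.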
When the content is image-text content, the content vector may be generated using a pre-trained BERT model. For the body text of the image-text content, formatting and style text such as the various HTML tags is removed, and the text is passed through BERT to extract its semantic features, that is, the text string is converted into a vector; the vector of the second-to-last layer of BERT is taken as the text representation vector. The BERT pre-trained model uses a 12-layer Transformer encoder and substantially improved the benchmark performance of NLP tasks. Compared with word2vec, BERT pre-trained on massive text brings more transferred knowledge into the vectorized text representation and provides more accurate text features.
In the embodiment of the present application, the identification vector is constructed from the self-media account features (characterized by vertical categories, tags and verticality), the content consumption behaviors between users and accounts, and the content vectors of the content published by the self-media account. The prerequisite vector features on which the identification vector depends include: the content vectors of the top 5 contents most recently published by the account, the first-level vertical category vector of the account content, the second-level vertical category vector of the account content, the tag vector of the content published by the account, the account verticality vector, the account ID vector, and user behaviors such as following and content consumption. The identification vector can be obtained by weighting these vectors of the various dimensions of the account.
The account verticality (H_profession) measures how stable the categories of the content published by a self-media account are. The main motivation for computing it is that an account whose published content is concentrated in relatively few vertical categories is considered to have a more focused authoring direction and greater potential. Verticality can also be used to filter out some repost accounts that repost and plagiarize content indiscriminately and therefore have disordered categories. The account verticality can be computed by formula (2-1):
[Formula (2-1): image BDA0003452855070000321]
where i denotes the i-th vertical category of published content, n is the total number of vertical categories of published content, and P_i is the proportion of published content that falls in the i-th vertical category. To calculate P_i, the content published by an account within one month can be obtained, and the first-level vertical category results of the published content are used to count the share of each vertical category among all of the account's published content.
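Formula (2-1) itself is not available in this text. The sketch below therefore rests on a loudly stated assumption: that the verticality is the Shannon entropy of the category proportions, H = -sum(P_i * log(P_i)) for i = 1..n, which is suggested by the symbol H and by the definition of P_i but is not confirmed by the source. Under this assumption a lower value means more concentrated publishing.

```python
import math
from collections import Counter


def category_proportions(categories):
    """P_i: share of published contents falling in each first-level category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def verticality_entropy(categories):
    """ASSUMED form of formula (2-1): Shannon entropy of the category distribution."""
    return -sum(p * math.log(p) for p in category_proportions(categories).values())


# Hypothetical one-month publishing histories (first-level category per content).
focused = verticality_entropy(["tech", "tech", "tech", "tech"])
scattered = verticality_entropy(["tech", "food", "travel", "sports"])
```

An account that publishes in a single category gets a value of zero, while an account spread evenly over many categories gets the maximum value for that number of categories.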
The interaction relationships between a consuming user and a self-media account and its published content include, but are not limited to: following the self-media account (that is, the Follow behavior toward a self-media account); and consumption behaviors on the account's content, such as browsing, playing, commenting, collecting, forwarding, sharing and liking, each of which counts as one interaction between the user and the account's content. An account is an aggregation of content, so the similarity between accounts covers not only the similarity of their content but also the similarity of their users' consumption behaviors. Similar accounts are followed or clicked by many of the same users, so a weighted directed graph between accounts can be constructed from the follow/click relationships among accounts, where a node represents an account and an edge weight represents the number of users. Account prior features, including category, tag and content tag, are then added, and the interaction similarity and the prior feature similarity are fused at the model layer.
A user can follow many different self-media authors, and a self-media author can be followed by many different users, so the follow relationships among accounts form a graph (Graph) consisting of nodes and edges. Similarly, the consumption behaviors between users and self-media accounts (such as browsing, playing, commenting, collecting, forwarding, sharing and liking) carry different weights, so a weighted graph is constructed from the consumption behaviors. Sampling is then performed based on the edge weights (weighted walk), so that the walk moves toward popular nodes as much as possible and the sampled sequences have higher confidence.
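One way to build such a weighted directed graph from user behavior sequences is sketched below. It assumes that an edge runs between consecutive accounts in a user's sequence and that each behavior contributes a fixed weight; the edge rule, the weight table and all names are hypothetical, since the text only states that behaviors carry different weights.

```python
from collections import defaultdict

# Hypothetical behavior weights; the text only says that behaviors weigh differently.
BEHAVIOR_WEIGHT = {"view": 1.0, "like": 2.0, "comment": 3.0, "follow": 4.0}


def build_account_graph(user_sequences):
    """user_sequences: user -> ordered list of (account, behavior) events."""
    graph = defaultdict(lambda: defaultdict(float))
    for events in user_sequences.values():
        # Assumption: consecutive accounts in one user's sequence form a directed edge.
        for (src, _), (dst, behavior) in zip(events, events[1:]):
            if src != dst:  # skip self-loops
                graph[src][dst] += BEHAVIOR_WEIGHT.get(behavior, 1.0)
    return {src: dict(dsts) for src, dsts in graph.items()}


sequences = {
    "U1": [("A", "view"), ("B", "view"), ("A", "like"), ("C", "comment")],
    "U2": [("A", "view"), ("B", "follow")],
}
graph = build_account_graph(sequences)
```

Edges traversed by more users, or through stronger behaviors, accumulate larger weights, which is what steers a weighted walk toward popular nodes.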
Fig. 8 is a schematic diagram of an implementation process for determining an identification vector according to an embodiment of the present application, and as shown in fig. 8, the process may be implemented by the following steps:
Step S801, acquiring a user follow/click sequence.
Step S802, constructing an account weighted directed graph based on the user follow/click sequence.
Step S803, determining a random walk sequence based on the account weighted directed graph.
Step S804, determining an identification vector by using the account prior features, the content vector and the random walk sequence.
The account prior features and the content vector together constitute the side information of the self-media account, so this step may also be described as obtaining the identification vector from the side information of the self-media account and the random walk sequence.
In the embodiment of the application, the identification vector can be constructed by fusing the vectors in a DeepWalk and Skip-gram manner. Here, DeepWalk is a combination of random walk and Skip-gram: the random walk samples the graph structure to obtain the adjacency relationships between nodes in the graph, and Skip-gram trains the node vectors from the sampled sequences. DeepWalk belongs to the random-walk class of graph algorithms. Fig. 9 is a schematic diagram of constructing an identification vector by fusing features: a new sequence is generated by random walk over the original interaction sequence data, and side information (SI 0 to SI n in fig. 9) is then added to the network shown in fig. 9 for training to obtain the final identification vector, where the dense embedding in fig. 9 refers to the different input vector layers, such as the content embedding.
To obtain the feature representation of a self-media account, a data set is extracted from the consumption sequences over publishers' account content (for example, watching content A1 of account A, then content B1 of account B, then content A2 of account A, and so on). Ordered nodes are sampled by the random walk method, so that consumption behaviors are converted into ordered node sequences for learning, and a Skip-gram neural language model is applied to the random walk sequences to obtain the identification vector. The tag ID and the classification ID of a self-media account can be understood as prior "summary" features of the content it publishes, and are determined by using the complete sequence of accounts followed by end users in the last 24 hours and the accounts that have published content in the last month.
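To make the Skip-gram half of this combination concrete, the sketch below turns one random walk sequence into (center, context) training pairs, which is the input a Skip-gram model learns from. The window size and the function name are assumptions chosen for illustration; the model training itself is omitted.

```python
def skipgram_pairs(walk, window):
    """Generate (center, context) pairs from one walk using a symmetric window."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# A hypothetical walk over three accounts, with a window of one step.
pairs = skipgram_pairs(["A", "B", "C"], window=1)
# pairs == [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
```

Accounts that co-occur within the window across many walks end up with similar vectors, which is why accounts followed or clicked by the same users are placed close together.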
In an actual implementation, 31 IDs (1 + 5 + 5 + 20) are first input: one account ID, 5 primary class IDs (corresponding to the first-level vertical class information in other embodiments), 5 secondary class IDs (corresponding to the second-level vertical class information in other embodiments), and 20 tag IDs (corresponding to the content tags in other embodiments). Three vector matrices are constructed from the 31 IDs: one account ID matrix, one classification matrix and one tag matrix. The reason is that the vector of the same tag should be identical under different accounts; this also increases the coverage of tag IDs and makes the result independent of the order in which the tag IDs are input. Then, a vector numbered 0, whose values are all 0 and which is never updated, is added to each of the account ID, tag ID and classification ID matrices. It serves as a placeholder for empty inputs, so that the final vector is not affected by null values when most of the input IDs are empty. The account features are static information of the account. In the embodiment of the application, the content published by the account is used to count the primary and secondary classifications, and each classification is used as vector and label information. The 20 tag IDs are obtained by counting the content tags of the content published by the account within the past period (1 month) and taking the 20 most frequent tags.
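The role of the all-zero row numbered 0 can be illustrated with a small lookup table. This is a sketch only: the application does not state how the looked-up vectors are combined, so mean pooling over the non-empty IDs is an assumption, as are the function names and table sizes.

```python
import random


def make_table(num_ids, dim, rng):
    """Embedding table whose row 0 is an all-zero placeholder for empty inputs."""
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids + 1)]
    table[0] = [0.0] * dim  # in training, this row would be excluded from updates
    return table


def pooled_lookup(table, ids):
    """Average the rows of the non-empty IDs (assumed pooling); ID 0 marks an empty slot."""
    dim = len(table[0])
    real = [i for i in ids if i != 0]
    if not real:
        return [0.0] * dim
    return [sum(table[i][d] for i in real) / len(real) for d in range(dim)]


rng = random.Random(0)
tag_table = make_table(num_ids=5, dim=4, rng=rng)  # one table shared by all accounts
vec_a = pooled_lookup(tag_table, [3, 5, 0])
vec_b = pooled_lookup(tag_table, [5, 0, 3])  # same tags, different order and padding
```

Because one tag table is shared by all accounts, the same tag maps to the same vector everywhere, and with an order-independent pooling step the result does not depend on the order or position of the tag IDs.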
The following describes an implementation of vectorization of content vectors and account features.
Account ID: the account ID is vectorized directly by a feature vectorization method, where each row of the lookup table matrix represents one account ID.
Features of the Top 5 videos (or image-text items) most recently published by the account: the multi-modal vectors of the 5 videos most recently published by the account are taken, with the video features obtained through a NeXtVLAD model; for image-text content, full-text vectorized features are extracted through a BERT model.
Category of the account: the most frequent vertical class among the 5 items most recently published by the account is taken as the account vertical class ID, and the vertical class ID is vectorized.
Secondary category of the account: among the second-level vertical classes of the 5 items most recently published by the account, the most frequent one is taken as the account second-level vertical class ID, and the vertical class ID is vectorized.
Account content tags: the 20 most frequent content tags of the content published by the account in the last month are taken, and the tag IDs are vectorized.
Account verticality: the account verticality (a decimal between 0 and 1) is divided into 10 grades (0-10), and the grade ID is vectorized.
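The discrete account features listed above can be derived with simple counting, as in the following sketch. The function names and sample values are hypothetical. The grade mapping is an assumption: it cuts the interval from 0 to 1 into 10 equal-width grades numbered 0 to 9, which is one possible reading of the grading described above.

```python
from collections import Counter


def majority_class(classes):
    """Most frequent vertical class among the most recently published items."""
    return Counter(classes).most_common(1)[0][0]


def top_tags(tags, k=20):
    """The k most frequent content tags published in the period."""
    return [tag for tag, _ in Counter(tags).most_common(k)]


def verticality_grade(verticality, num_grades=10):
    """Map a verticality in [0, 1] to an integer grade (assumed equal-width buckets)."""
    if not 0.0 <= verticality <= 1.0:
        raise ValueError("verticality must lie between 0 and 1")
    return min(int(verticality * num_grades), num_grades - 1)


account_class = majority_class(["edu", "edu", "sports", "edu", "tech"])
account_tags = top_tags(["math", "exam", "math", "tips", "exam", "math"], k=2)
grade = verticality_grade(0.37)
```

Each resulting ID (class ID, tag IDs, grade ID) is then vectorized by a table lookup in the same way as the account ID.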
Fig. 10 is a schematic diagram of an example of recalling similar accounts by using the identification vector according to an embodiment of the present application. Taking a seed account as input, accounts corresponding to different similarity thresholds can be found, shown from right to left. As shown in fig. 10, the content 1001 published by the seed account is a middle school course that explains real junior high school mathematics problems. In order of similarity from high to low, accounts on junior high school mathematics 1002 and high school mathematics 1002 are found in turn, and finally accounts related to primary school mathematics 1003. It can be seen that the styles and the described content of these accounts are very similar.
Classification and labeling are the most widely used forms of content understanding; they are "explicit" features of content and have the advantage of interpretability. Content vectors are "implicit" features: they are not interpretable, but are increasingly important in recommendation. An account can be understood as an aggregate representation of content. For example, the content vector of a video is a low-dimensional vector representing that video, and the "distance" between two content vectors represents how far apart the two videos are.
In the embodiment of the application, graph information at the account dimension is built from users' consumption behaviors, inter-account sequences are built by random walk, the side information features of the accounts are fused in to improve the accuracy of account representation, and the vector representation of each account is finally obtained by combining prior and posterior information. The identification vector can be applied across the account content distribution ecosystem, for example to account cold start and to the supply of high-quality content. It can speed up the cold start of content distribution for new self-media authors, reduce the churn of new accounts, and assist the manual discovery and operation of promising new accounts. Using high-quality accounts as seed input, the identification vector can expand the amount of content in the high-quality content pool and increase content supply. In addition, users can be exposed to more of the content published by promising accounts, which increases the exposure of high-quality content and the utilization efficiency of cold-start traffic. Comparing and matching identification vectors also improves the efficiency of processing content from high-quality accounts, so that the content ecosystem enters a virtuous cycle and remains healthy.
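Recalling accounts whose vectors lie close to a seed account can be sketched as a thresholded similarity search. The application does not name the similarity measure, so cosine similarity is an assumption here, and the account names, vectors and threshold are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_similar(seed, vectors, threshold):
    """Accounts whose similarity to the seed exceeds the threshold, most similar first."""
    scored = [(acc, cosine(vectors[seed], vec)) for acc, vec in vectors.items() if acc != seed]
    hits = [(acc, score) for acc, score in scored if score > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical two-dimensional identification vectors.
vectors = {
    "seed": [1.0, 0.0],
    "junior_math": [0.9, 0.1],
    "senior_math": [0.5, 0.5],
    "cooking": [0.0, 1.0],
}
recalled = recall_similar("seed", vectors, threshold=0.6)
```

Lowering the threshold widens the recalled set, which corresponds to moving toward less similar accounts in the example of fig. 10.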
It is understood that in the embodiment of the present application, the content related to the user information, for example, the data related to the interaction information of the consuming user with the self-media account, needs to obtain user permission or consent when the embodiment of the present application is applied to actual products or technologies, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant countries and regions.
Continuing with the exemplary structure of the content distribution apparatus 443 provided by the embodiments of the present application as software modules, in some embodiments, as shown in fig. 2, the software modules stored in the content distribution apparatus 443 of the memory 440 may include:
a first obtaining module 4431, configured to obtain a feature vector of an identifier to be distributed, a content vector of published content under the identifier to be distributed, and interaction information of the published content;
a first determining module 4432, configured to determine an identification vector of the to-be-distributed identifier based on the feature vector, the content vector, and the interaction information;
a second determining module 4433, configured to determine similarity between different identifiers to be distributed based on the identifier vector of each identifier to be distributed;
a recalling module 4434, configured to recall at least one target distribution identifier from the multiple identifiers to be distributed based on similarity between different identifiers to be distributed;
a content distribution module 4435, configured to sort the at least one target distribution identifier by using a preset sort target to obtain a sort result, and distribute the at least one target distribution identifier based on the sort result.
In some embodiments, the first obtaining module is further configured to:
determining vertical class vectors corresponding to the to-be-distributed identifiers based on the vertical class information of the N pieces of published content which are recently published by the to-be-distributed identifiers;
determining M content tags of the identifier to be distributed based on the content tags of the content released by the identifier to be distributed in a preset time period, and vectorizing the M content tags to obtain M tag vectors;
determining the verticality of the identifier to be distributed, and vectorizing the verticality to obtain a verticality vector of the identifier to be distributed;
and determining the vertical vector, the M label vectors and the verticality vector as the feature vector of the identifier to be distributed.
In some embodiments, the first obtaining module is further configured to:
acquiring N first-level vertical information of N published contents which are recently published by the identifier to be distributed;
vectorizing the N pieces of first-level vertical information to obtain N pieces of first-level vertical vectors;
acquiring N pieces of secondary vertical information of the N pieces of published content; vectorizing the N pieces of secondary vertical information to obtain N pieces of secondary vertical vectors;
and determining the N primary vertical vectors and the N secondary vertical vectors as vertical vectors corresponding to the identifier to be distributed.
In some embodiments, the first obtaining module is further configured to:
determining the number of content items corresponding to each first-level vertical class based on the first-level vertical class information of the content published in the preset time period;
determining the vertical class proportion corresponding to each first-level vertical class based on the number of content items corresponding to each first-level vertical class and the total number of content items published in the preset time period;
determining the verticality of the identifier to be distributed based on the vertical class proportion corresponding to each first-level vertical class;
and determining the verticality grade corresponding to the verticality, and determining the verticality vector corresponding to the verticality grade.
In some embodiments, the first obtaining module is further configured to:
acquiring the N published contents which are recently published by the identifier to be distributed;
determining respective video content vectors for respective video content of the N published content;
determining a text content vector for each item of image-text content among the N published contents.
In some embodiments, the first obtaining module is further configured to:
analyzing the video content to obtain a plurality of video frame images corresponding to the video content;
extracting a plurality of video frame images to obtain a plurality of extracted target video frames;
and extracting the image characteristics of each frame of each target video frame, and performing characteristic fusion on the image characteristics of each frame to obtain a video content vector of the video content.
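The frame extraction and feature fusion steps can be illustrated with a deliberately simple stand-in. Elsewhere the description obtains video features through a NeXtVLAD model; the sketch below replaces that with uniform frame sampling and mean pooling, both of which are assumptions made only to show the data flow, and the per-frame features are made-up numbers.

```python
def sample_frames(frames, step):
    """Keep every step-th frame as a target video frame (assumed sampling rule)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return frames[::step]


def fuse_features(frame_features):
    """Fuse per-frame feature vectors into one video vector by mean pooling."""
    if not frame_features:
        raise ValueError("at least one frame feature is required")
    dim = len(frame_features[0])
    count = len(frame_features)
    return [sum(f[d] for f in frame_features) / count for d in range(dim)]


frame_ids = sample_frames(list(range(10)), step=3)  # frames 0, 3, 6, 9
video_vector = fuse_features([[1.0, 2.0], [3.0, 4.0]])  # hypothetical frame features
```

A learned aggregation such as NeXtVLAD would take the place of `fuse_features` in practice; the sketch only shows where the fusion sits in the pipeline.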
In some embodiments, the first obtaining module is further configured to:
acquiring each text content in each image-text content, and deleting format information in each text content to obtain each processed text;
and extracting semantic features of the processed text to obtain the text content vectors.
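Deleting format information before semantic feature extraction might look like the following sketch. It assumes the format information is HTML markup, which the application does not specify, and it stops at the cleaned text; the semantic feature extraction (a BERT model elsewhere in the description) is not shown.

```python
import html
import re


def strip_format(raw):
    """Remove HTML tags, decode entities and collapse whitespace (assumed cleanup rules)."""
    text = re.sub(r"<[^>]+>", " ", raw)  # drop markup tags
    text = html.unescape(text)  # decode entities such as &amp;
    return re.sub(r"\s+", " ", text).strip()


processed = strip_format("<p>Tom &amp; <b>Jerry</b></p>\n")
# processed == "Tom & Jerry"
```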
In some embodiments, the first determining module is further configured to:
determining an interaction behavior sequence based on the interaction information;
constructing a weighted directed graph of identifiers to be distributed based on the interaction behavior sequence, the feature vector and the content vector, wherein the weight of a directed edge in the weighted directed graph is determined by the feature vector and the content vector;
and carrying out random walk on the weighted directed graph to obtain the identification vector of the identification to be distributed.
In some embodiments, the recall module is further configured to:
acquiring each identification grade of each identification to be distributed, and determining a first reference identification from the plurality of identifications to be distributed based on each identification grade;
and recalling at least one target distribution identifier with the similarity greater than a similarity threshold value from the plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed.
In some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring a plurality of published contents under the target distribution identifier and acquiring each piece of interaction information of each published content;
a third determining module, configured to determine, based on the respective interaction information, respective forward feedback rates for the respective published contents;
and the fourth determining module is used for determining target content from the plurality of released contents based on each forward feedback rate and storing the target content into a preset resource pool.
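Selecting target content by forward feedback rate can be sketched as below. The exact definition of the forward feedback rate is not given in this passage, so the sketch assumes it is the share of interactions that are positive actions; the set of positive actions, the threshold and all names are hypothetical.

```python
# Assumed set of interaction types that count as positive feedback.
POSITIVE_ACTIONS = {"like", "collect", "share", "comment"}


def positive_feedback_rate(events):
    """Share of interaction events that are positive actions (assumed definition)."""
    if not events:
        return 0.0
    return sum(1 for e in events if e in POSITIVE_ACTIONS) / len(events)


def select_for_pool(interactions, min_rate):
    """IDs of published content whose positive feedback rate reaches min_rate."""
    return [cid for cid, events in interactions.items()
            if positive_feedback_rate(events) >= min_rate]


interactions = {
    "c1": ["like", "browse", "share", "browse"],
    "c2": ["browse", "browse"],
    "c3": [],
}
pool = select_for_pool(interactions, min_rate=0.5)
```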
In some embodiments, when the ordering objective is registration time, the content distribution module is further configured to:
sequencing the at least one target distribution identifier according to a sequence from near to far from the current moment based on the registration time to obtain a sequencing result;
acquiring target distribution objects corresponding to the target distribution identifications;
and distributing the content to the target distribution objects corresponding to the target distribution identifications based on the sorting sequence.
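When the ordering target is registration time, the sorting step reduces to ordering accounts from most recently registered to least recently registered, as in this minimal sketch. The account names and dates are made up, and the representation of an account as a (name, registration time) pair is an assumption.

```python
from datetime import datetime


def sort_by_registration(accounts):
    """Order (account, registered_at) pairs from nearest to farthest from now."""
    return sorted(accounts, key=lambda item: item[1], reverse=True)


accounts = [
    ("old", datetime(2020, 1, 1)),
    ("new", datetime(2021, 12, 1)),
    ("mid", datetime(2021, 6, 1)),
]
ordered = [name for name, _ in sort_by_registration(accounts)]
```

Placing the newest accounts first is what gives newly registered accounts priority in distribution and thereby shortens their cold start.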
In some embodiments, the content distribution module is further configured to:
acquiring a first identifier concerning the target distribution identifier;
acquiring other identifiers to be distributed, wherein the similarity between the identifiers and the target distribution identifier is greater than a similarity threshold value;
acquiring a second identifier concerning the other identifiers to be distributed;
and determining the first identifier and the second identifier as a target distribution object corresponding to the target distribution identifier.
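The four steps above amount to taking the union of the target account's own followers and the followers of sufficiently similar accounts. The sketch below assumes follower lists and pairwise similarities are available as plain dictionaries; those data structures and all names are hypothetical.

```python
def target_distribution_objects(target, followers, similarity, threshold):
    """Followers of the target plus followers of accounts similar to the target."""
    audience = set(followers.get(target, ()))  # first identifiers
    for other, score in similarity.get(target, {}).items():
        if other != target and score > threshold:
            audience |= set(followers.get(other, ()))  # second identifiers
    return audience


followers = {"T": ["u1", "u2"], "S": ["u2", "u3"], "X": ["u9"]}
similarity = {"T": {"S": 0.9, "X": 0.2}}
audience = target_distribution_objects("T", followers, similarity, threshold=0.8)
```

For a newly registered account with few followers of its own, most of the audience comes from the second group, which is how similar established accounts lend reach to the new one.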
Here, it should be noted that the above description of the content distribution apparatus embodiment is similar to the above description of the method embodiment, and has the same advantageous effects as the method embodiment. For technical details not disclosed in the embodiments of the content distribution apparatus of the present application, a person skilled in the art should refer to the description of the method embodiments of the present application.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the content distribution method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, the method as illustrated in fig. 3, 4 and 5.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. A content distribution method, comprising:
acquiring a feature vector of an identifier to be distributed, a content vector of published content under the identifier to be distributed and interaction information of the published content;
determining an identification vector of the identifier to be distributed based on the feature vector, the content vector and the interaction information;
determining similarity between different identifiers to be distributed based on the identifier vectors of the identifiers to be distributed;
recalling at least one target distribution identifier from the plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed;
and sequencing the at least one target distribution identifier by using a preset sequencing target to obtain a sequencing result, and distributing the at least one target distribution identifier based on the sequencing result.
2. The method according to claim 1, wherein the obtaining the feature vector of the identifier to be distributed comprises:
determining vertical class vectors corresponding to the to-be-distributed identifiers based on the vertical class information of the N pieces of published content which are recently published by the to-be-distributed identifiers;
determining M content tags of the identifier to be distributed based on the content tags of the content released by the identifier to be distributed in a preset time period, and vectorizing the M content tags to obtain M tag vectors;
determining the verticality of the identifier to be distributed, and vectorizing the verticality to obtain a verticality vector of the identifier to be distributed;
and determining the vertical vector, the M label vectors and the verticality vector as the characteristic vector of the identifier to be distributed.
3. The method according to claim 2, wherein the determining a vertical vector corresponding to the identifier to be distributed based on the vertical information of the N pieces of published content recently published by the identifier to be distributed comprises:
acquiring N first-level vertical information of N published contents which are recently published by the identifier to be distributed;
vectorizing the N pieces of first-level vertical information to obtain N pieces of first-level vertical vectors;
acquiring N pieces of secondary vertical information of the N pieces of published content; vectorizing the N pieces of secondary vertical information to obtain N pieces of secondary vertical vectors;
and determining the N primary vertical vectors and the N secondary vertical vectors as vertical vectors corresponding to the identifier to be distributed.
4. The method according to claim 2, wherein the determining the verticality of the identifier to be distributed and vectorizing the verticality to obtain a verticality vector of the identifier to be distributed comprises:
determining the number of content items corresponding to each first-level vertical class based on the first-level vertical class information of the content published in the preset time period;
determining the vertical class proportion corresponding to each first-level vertical class based on the number of content items corresponding to each first-level vertical class and the total number of content items published in the preset time period;
determining the verticality of the identifier to be distributed based on the vertical class proportion corresponding to each first-level vertical class;
and determining the verticality grade corresponding to the verticality, and determining the verticality vector corresponding to the verticality grade.
5. The method of claim 1, wherein obtaining the content vector of the published content under the identifier to be distributed comprises:
acquiring the N published contents which are recently published by the identifier to be distributed;
determining respective video content vectors for respective video content of the N published content;
determining a text content vector for each item of image-text content among the N published contents.
6. The method of claim 5, wherein determining each video content vector for each of the N published content comprises:
analyzing the video content to obtain a plurality of video frame images corresponding to the video content;
extracting a plurality of video frame images to obtain a plurality of extracted target video frames;
and extracting the image characteristics of each frame of each target video frame, and performing characteristic fusion on the image characteristics of each frame to obtain a video content vector of the video content.
7. The method of claim 5, wherein determining each text content vector for each of the N published contents comprises:
acquiring each text content in each image-text content, and deleting format information in each text content to obtain each processed text;
and extracting semantic features of the processed text to obtain the text content vectors.
8. The method of claim 1, wherein the determining the identification vector of the to-be-distributed identifier based on the feature vector, the content vector, and the interaction information comprises:
determining an interaction behavior sequence based on the interaction information;
constructing a weighted directed graph of identifiers to be distributed based on the interaction behavior sequence, the feature vector and the content vector, wherein the weight of a directed edge in the weighted directed graph is determined by the feature vector and the content vector;
and carrying out random walk on the weighted directed graph to obtain the identification vector of the identification to be distributed.
9. The method according to claim 1, wherein the recalling at least one target distribution identifier from a plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed comprises:
acquiring each identification grade of each identification to be distributed, and determining a first reference identification from the plurality of identifications to be distributed based on each identification grade;
and recalling at least one target distribution identifier with the similarity greater than a similarity threshold value from the plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed.
10. The method of claim 9, further comprising:
acquiring a plurality of published contents under the target distribution identifier, and acquiring each piece of interaction information of each published content;
determining respective forward feedback rates for respective published content based on respective interaction information;
and determining target content from the plurality of published contents based on each forward feedback rate, and storing the target content to a preset resource pool.
11. The method according to any one of claims 1 to 9, wherein when the ordering target is the registration time, the ordering the at least one target distribution identifier by using a preset ordering target to obtain an ordering result, and performing content distribution on the at least one target distribution identifier based on the ordering result includes:
sequencing the at least one target distribution identifier according to the sequence from near to far from the current moment based on the registration time to obtain a sequencing result;
acquiring target distribution objects corresponding to the target distribution identifications;
and distributing the content to the target distribution objects corresponding to the target distribution identifications based on the sorting sequence.
12. The method of claim 11, wherein the obtaining the target distribution object corresponding to the target distribution identifier comprises:
acquiring a first identifier concerning the target distribution identifier;
acquiring other identifiers to be distributed, wherein the similarity between the identifiers and the target distribution identifier is greater than a similarity threshold value;
acquiring a second identifier concerning the other identifiers to be distributed;
and determining the first identifier and the second identifier as target distribution objects corresponding to the target distribution identifiers.
13. A content distribution apparatus, characterized by comprising:
the first acquisition module is used for acquiring a feature vector of an identifier to be distributed, a content vector of published content under the identifier to be distributed and interaction information of the published content;
a first determining module, configured to determine an identifier vector of the identifier to be distributed based on the feature vector, the content vector, and the interaction information;
the second determining module is used for determining the similarity between different identifiers to be distributed based on the identifier vectors of the identifiers to be distributed;
the recall module is used for recalling at least one target distribution identifier from the plurality of identifiers to be distributed based on the similarity between different identifiers to be distributed;
and the content distribution module is used for sequencing the at least one target distribution identifier by using a preset sequencing target to obtain a sequencing result, and distributing the at least one target distribution identifier based on the sequencing result.
14. A content distribution apparatus characterized by comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 12 when executing executable instructions stored in the memory.
15. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the method of any one of claims 1 to 12.
CN202111670560.4A 2021-09-01 2021-12-31 Content distribution method, apparatus, device and computer readable storage medium Active CN115730111B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111020281 2021-09-01
CN2021110202813 2021-09-01

Publications (2)

Publication Number Publication Date
CN115730111A true CN115730111A (en) 2023-03-03
CN115730111B CN115730111B (en) 2024-02-06

Family

ID=85292317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111670560.4A Active CN115730111B (en) 2021-09-01 2021-12-31 Content distribution method, apparatus, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115730111B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009157500A (en) * 2007-12-25 2009-07-16 Ntt Docomo Inc Distribution server and distribution method
CN111008278A (en) * 2019-11-22 2020-04-14 厦门美柚股份有限公司 Content recommendation method and device
US20200125574A1 (en) * 2018-10-18 2020-04-23 Oracle International Corporation Smart content recommendations for content authors
CN111639291A (en) * 2020-05-29 2020-09-08 腾讯科技(武汉)有限公司 Content distribution method, content distribution device, electronic equipment and storage medium
CN111885399A (en) * 2020-06-29 2020-11-03 腾讯科技(武汉)有限公司 Content distribution method, content distribution device, electronic equipment and storage medium
CN112165639A (en) * 2020-09-23 2021-01-01 腾讯科技(深圳)有限公司 Content distribution method, content distribution device, electronic equipment and storage medium
CN112202849A (en) * 2020-09-15 2021-01-08 腾讯科技(深圳)有限公司 Content distribution method, content distribution device, electronic equipment and computer-readable storage medium

Also Published As

Publication number Publication date
CN115730111B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
Yang et al. Mining Chinese social media UGC: a big-data framework for analyzing Douban movie reviews
US20210281569A1 (en) Enhanced access to media, systems and methods
CN111885399A (en) Content distribution method, content distribution device, electronic equipment and storage medium
De Saulles Information 2.0: New models of information production, distribution and consumption
CN111639291A (en) Content distribution method, content distribution device, electronic equipment and storage medium
CN111310041B (en) Image-text publishing method, model training method and device and storage medium
CN112153426A (en) Content account management method and device, computer equipment and storage medium
CN111507097A (en) Title text processing method and device, electronic equipment and storage medium
CN106233325A (en) Generate activity summary
CN114692007B (en) Method, device, equipment and storage medium for determining representation information
CN104021140A (en) Network video processing method and device
US20230134118A1 (en) Decentralized social news network website application (dapplication) on a blockchain including a newsfeed, nft marketplace, and a content moderation process for vetted content providers
CN114996486A (en) Data recommendation method and device, server and storage medium
CN110851622A (en) Text generation method and device
CN113822127A (en) Video processing method, video processing device, video processing equipment and storage medium
CN113011126A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN113656560B (en) Emotion category prediction method and device, storage medium and electronic equipment
CN112989167B (en) Method, device and equipment for identifying transport account and computer readable storage medium
Drott Streaming Music, Streaming Capital
CN115730111B (en) Content distribution method, apparatus, device and computer readable storage medium
CN115114519A (en) Artificial intelligence based recommendation method and device, electronic equipment and storage medium
CN114996435A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence
CN114363660A (en) Video collection determining method and device, electronic equipment and storage medium
CN108053335A (en) A kind of new media public service platform
Nixon How do destinations relate to one another? a study of destination visual branding on Instagram

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK; Ref legal event code: DE; Ref document number: 40084144; Country of ref document: HK

GR01 Patent grant