CN111639291A - Content distribution method, content distribution device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111639291A
Authority
CN
China
Prior art keywords
content
account
distribution
distributed
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010478221.5A
Other languages
Chinese (zh)
Inventor
刘刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Wuhan Co Ltd
Original Assignee
Tencent Technology Wuhan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Wuhan Co Ltd filed Critical Tencent Technology Wuhan Co Ltd
Priority to CN202010478221.5A priority Critical patent/CN111639291A/en
Publication of CN111639291A publication Critical patent/CN111639291A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a content distribution method, a content distribution device, an electronic device and a storage medium, wherein the content distribution method comprises the following steps: acquiring content to be distributed, a distribution account and a content type corresponding to the content issued by the distribution account; determining distribution information of the distribution account on the content based on the content type and the number of the distributed contents of the distribution account; acquiring a reference account list and historical contents issued by each reference account in the reference account list within a historical time period from a content distribution system to which a distribution account belongs; calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the account identification of the distribution account; when the calculated content originality is larger than a preset value, the content to be distributed is distributed, and the scheme can improve the content auditing efficiency.

Description

Content distribution method, content distribution device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a content distribution method and apparatus, an electronic device, and a storage medium.
Background
With the development of modern technology, it has become increasingly convenient for media to publish information. A media outlet can register an account on a network platform and then publish information, such as text, audio, and video, through that account. These media also include self-media, that is, channels through which members of the general public publish their own facts and news over the internet. In recent years, with the growth of content creation, the major internet companies have actively entered the content market, self-media outlets have sprung up like bamboo shoots after spring rain, and anyone can start a self-media outlet by writing. Self-media accounts create a huge number of articles every day. However, some of the content published by a self-media account may have been copied from content already on the self-media platform, so it is necessary to determine whether the content published by a self-media account duplicates historical content.
At present, the content published by self-media accounts is audited manually. However, because the number of self-media accounts is huge, operations staff must review a very large number of posts item by item every day, which is time-consuming, labor-intensive, and inefficient.
Disclosure of Invention
The application provides a content distribution method, a content distribution device, an electronic device and a storage medium, which can improve the content auditing efficiency.
The application provides a content distribution method, which comprises the following steps:
acquiring content to be distributed, a distribution account and a content type corresponding to the content issued by the distribution account;
determining distribution information of the distribution account on the content based on the content type and the number of the distributed contents of the distribution account;
acquiring a reference account list and historical contents issued by each reference account in the reference account list within a historical time period from a content distribution system to which the distribution account belongs;
calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the account identification of the distribution account;
and when the calculated content originality is greater than a preset value, distributing the content to be distributed.
Correspondingly, the application also provides a content distribution device, which comprises:
the acquisition module is used for acquiring the content to be distributed, the distribution account and the content type corresponding to the content issued by the distribution account;
the determining module is used for determining the distribution information of the distribution account on the content based on the content type and the number of the contents published by the distribution account;
the collection module is used for collecting a reference account list and historical contents issued by each reference account in the reference account list within a historical time period from a content distribution system to which the distribution account belongs;
the computing module is used for computing the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the account identification of the distribution account;
and the distribution module is used for distributing the content to be distributed when the calculated content originality is greater than a preset value.
Optionally, in some embodiments of the present application, the calculation module includes:
the extraction submodule is used for respectively extracting the content authentication information of the historical content and the content authentication information of the content to be distributed;
the updating submodule is used for updating the account identification of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed;
and the calculating submodule is used for calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification.
Optionally, in some embodiments of the present application, the update sub-module includes:
the generating unit is used for generating the correlation degree between the distribution account and each reference account in the reference account list according to the content authentication information of the historical content and the content authentication information of the content to be distributed;
and the updating unit is used for updating the account identification of the distribution account based on the correlation degree between the distribution account and each reference account in the reference account list.
Optionally, in some embodiments of the present application, the generating unit includes:
the detection subunit is used for detecting whether the content identification information of the historical content is consistent with the content identification information of the content to be distributed;
the first calculating subunit is used for calculating the similarity between the content title information of each historical content and the content title information of the content to be distributed;
and the generating subunit is configured to generate a correlation degree between the distribution account and the reference account according to the similarity between the content title information of the historical content and the content title information of the content to be distributed and each detection result.
Optionally, in some embodiments of the present application, the generating subunit is specifically configured to:
and performing fusion processing on the similarity between the content title information of the historical content and the content title information of the content to be distributed and each detection result to obtain the correlation between the distribution account and the reference account.
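The fusion of title similarity and identification checks described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method claimed in the application: the dictionary keys `title` and `watermark`, the use of `difflib.SequenceMatcher` as the title similarity, the equal-weight linear fusion (`title_weight`), and averaging over the reference account's historical items are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Character-level similarity of two titles in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def account_correlation(candidate, history, title_weight=0.5):
    """Fuse title similarity and identification checks into one correlation.

    candidate: dict with hypothetical keys 'title' and 'watermark'.
    history:   list of such dicts published by one reference account.
    """
    if not history:
        return 0.0
    scores = []
    for item in history:
        # Detection result: does the identification information match?
        same_mark = 1.0 if item["watermark"] == candidate["watermark"] else 0.0
        sim = title_similarity(item["title"], candidate["title"])
        # Assumed fusion: a weighted sum of the two signals.
        scores.append(title_weight * sim + (1.0 - title_weight) * same_mark)
    # Assumed aggregation: mean over the reference account's history.
    return sum(scores) / len(scores)
```

Other fusion rules (for example, taking the maximum over historical items) would fit the same description; the weighted mean is only one option.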
Optionally, in some embodiments of the present application, the calculation module includes:
an obtaining unit, configured to obtain account authentication levels corresponding to reference accounts in the reference account list in a content distribution system to which the reference accounts belong;
the computing unit is used for computing the similarity between the content to be distributed and each historical content based on the account authentication level corresponding to each reference account in the reference account list;
and the fusion unit is used for fusing the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification to obtain the content originality of the distribution account.
Optionally, in some embodiments of the present application, the fusion unit is specifically configured to:
generating content repetition degrees of the contents to be distributed according to the similarity between the contents to be distributed and each historical content;
and fusing the updated account identification, the distribution information and the content repetition degree of the content to be distributed to obtain the content originality degree of the distribution account.
Optionally, in some embodiments of the present application, the computing unit includes:
the determining subunit is used for determining the weight corresponding to each historical content according to the account authentication level corresponding to each reference account in the reference account list;
and the second calculating subunit is used for calculating the similarity between the content to be distributed and each historical content based on the weight corresponding to each historical content.
Optionally, in some embodiments of the present application, the second calculating subunit is specifically configured to:
vectorizing the content to be distributed and each historical content respectively to obtain a first vector corresponding to the content to be distributed and a second vector corresponding to each historical content;
respectively calculating the distance between the first vector and each second vector;
and determining the similarity between the content to be distributed and each historical content based on the distance between the first vector and each second vector and the weight corresponding to each historical content.
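The three steps above (vectorize, measure distance, apply a per-item weight) can be sketched as follows. This is an illustrative sketch only: the application does not specify a vectorizer, a distance, or a weighting table, so the bag-of-words vectors, the cosine distance, and the `LEVEL_WEIGHTS` mapping from account authentication level to weight are all assumptions.

```python
import math
from collections import Counter

# Hypothetical mapping from account authentication level to weight.
LEVEL_WEIGHTS = {"enterprise": 1.0, "personal": 0.8}
DEFAULT_WEIGHT = 0.5  # assumed weight for any other level


def vectorize(text):
    """Bag-of-words term counts (an assumed, deliberately simple vectorizer)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def weighted_similarities(candidate, history):
    """history: list of (text, authentication_level) pairs."""
    first = vectorize(candidate)  # first vector: content to be distributed
    result = []
    for text, level in history:
        second = vectorize(text)  # second vector: one historical content
        distance = cosine_distance(first, second)
        weight = LEVEL_WEIGHTS.get(level, DEFAULT_WEIGHT)
        # Assumed rule: similarity is (1 - distance) scaled by the weight.
        result.append(weight * (1.0 - distance))
    return result
```

Scaling by the weight means that a match against content from a more highly authenticated reference account counts for more than the same match against a less authenticated one.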
In the present application, after the content to be distributed, the distribution account and the content type corresponding to the content issued by the distribution account are acquired, the distribution information of the distribution account on the content is determined based on the content type. A reference account list and the historical content issued by each reference account in the reference account list within a historical time period are then collected from the content distribution system to which the distribution account belongs. Next, the content authentication information of the historical content and the content authentication information of the content to be distributed are extracted respectively, and the account identification of the distribution account is updated according to them. Finally, the content originality of the distribution account is calculated based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification, and the content to be distributed is distributed when the calculated content originality is greater than a preset value. Therefore, the scheme can improve the efficiency of content auditing.
Drawings
In order to illustrate the technical solutions in the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description illustrate only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1a is a scene schematic diagram of a content distribution method provided in the present application;
Fig. 1b is a schematic flow chart of a content distribution method provided in the present application;
Fig. 1c is a schematic diagram of the verticality of a text in the content distribution method provided in the present application;
Fig. 1d is a schematic diagram illustrating a method for calculating similarity between content to be distributed and reference content in the content distribution method provided in the present application;
Fig. 2a is another schematic flow chart of a content distribution method provided in the present application;
Fig. 2b is a schematic diagram of another scenario of a content distribution method provided in the present application;
Fig. 3 is a schematic structural diagram of a content distribution apparatus provided in the present application;
Fig. 4 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the present application will be described clearly and completely with reference to the accompanying drawings in the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The application provides a content distribution method, a content distribution device, an electronic device and a storage medium.
The content distribution device may be specifically integrated in a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
For example, referring to Fig. 1a, the content distribution device is integrated in a server, and a user logs in to the distribution account through a terminal. The content to be distributed is an article A, and the distribution account is a self-media account H on a self-media platform X. After acquiring the content to be distributed, the distribution account and the content type corresponding to the content already issued by the distribution account, the server determines the distribution information of the self-media account H on the content based on the content type corresponding to the content already issued by the self-media account H. The server then collects, from the content distribution system to which the distribution account belongs (the self-media platform X), a reference account list and the historical content issued by each reference account in the reference account list within a historical time period. Next, the server extracts the content authentication information of the historical content and the content authentication information of the content to be distributed. The content authentication information may include content title information and content identification information, and the content identification information may include a watermark in the article, the author of the article, the release time of the article, and the like. The server then updates the account identification of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed. Finally, the server calculates the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification, and distributes the content to be distributed when the calculated content originality is greater than a preset value.
According to the content distribution method, the account identification of the distribution account can be updated according to the content authentication information of the historical content and the content authentication information of the content to be distributed, and the content originality of the distribution account is calculated based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification. Because the content authentication information of both the historical content and the content to be distributed is taken into account when judging whether the content to be distributed is original, the content originality obtained by the subsequent calculation is more accurate. The whole process requires no manual intervention, which reduces the waste of human resources and improves the efficiency of both content auditing and content distribution.
The following are detailed below. It should be noted that the description sequence of the following embodiments is not intended to limit the priority sequence of the embodiments.
A content distribution method, comprising: acquiring content to be distributed, a distribution account and a content type corresponding to the content already issued by the distribution account; determining distribution information of the distribution account on the content based on the content type and the number of the contents already issued by the distribution account; collecting, from a content distribution system to which the distribution account belongs, a reference account list and historical content issued by each reference account in the reference account list within a historical time period; calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the account identification of the distribution account; and distributing the content to be distributed when the calculated content originality is greater than a preset value.
Referring to fig. 1b, fig. 1b is a schematic flow chart of a content distribution method provided in the present application. The specific flow of the content distribution method may be as follows:
101. Acquiring the content to be distributed, the distribution account and the content type corresponding to the content issued by the distribution account.
For example, the content to be distributed and the distribution account may be obtained by accessing a network interface. The content to be distributed is content that the distribution account intends to publish, and the distribution account is an account having a content distribution function, which may be a self-media account. Self-media (We Media) is a general term for new media in which personalized, autonomous communicators deliver normative and non-normative information to an unspecified majority or to specific individuals by modern, electronic means. A self-media account may be an account that is registered on an independent content distribution platform and can publish content autonomously (such as a microblog account), or an account that is registered on a content distribution platform integrated in a social platform and can publish content autonomously. The content distribution platform integrated in the social platform may be, for example, a content distribution platform integrated in an instant messaging platform.
102. Determining the distribution information of the distribution account on the content based on the content type and the number of the contents released by the distribution account.
For example, the number of contents released by the distribution account is collected. Suppose the distribution account has released 10 articles in total, of which 3 belong to the military type, 2 belong to the life type, and 5 belong to the medicine type. The distribution of the distribution account on the content is then: military, life and medicine, with no content in other fields.
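The 10-article example above can be reproduced with a short sketch. The function name `content_distribution` and the representation of the distribution information as a mapping from content type to share are assumptions made for illustration; the application does not prescribe a data structure.

```python
from collections import Counter


def content_distribution(content_types):
    """Return the share of released items per content type."""
    counts = Counter(content_types)
    total = sum(counts.values())
    return {ctype: n / total for ctype, n in counts.items()}


# The example from the text: 3 military, 2 life and 5 medicine articles.
released = ["military"] * 3 + ["life"] * 2 + ["medicine"] * 5
distribution = content_distribution(released)
```

Content types that the account has never released simply do not appear in the mapping, which matches the statement that the account has no content in other fields.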
It should be noted that a transport account (an account that reposts content created by others) may publish content across a wide range of fields that have nothing to do with one another. For example, if the content of a distribution account is spread over medicine, metal manufacturing, military affairs, automobile manufacturing and sports, the account is likely to be a transport account. An original account, by contrast, tends to publish in a few specific fields, so its content distribution is relatively concentrated and it publishes a large amount of content in those fields. This motivates the concept of the verticality of an original account: the degree to which the content published by a distribution account is concentrated in the fields it specializes in. Verticality can be explained using the normal distribution and kurtosis. Referring to Fig. 1c, which shows the distribution of an account's published articles across verticals, the horizontal axis is the vertical of the published article (which can be represented by the first-level category of the article) and the vertical axis is the proportion of articles in that vertical. Treating the proportions as a normal distribution, the area of the shaded part is 1, because the proportions of all verticals sum to 1. In the first case (left picture), the kurtosis is small (the vertical with the most articles accounts for a small proportion) and, with the area unchanged, the standard deviation is large (the articles are spread over many verticals), that is, the account's publishing is not vertical. In the second case (right picture), the kurtosis is large (the vertical with the most articles accounts for a large proportion) and, with the area unchanged, the standard deviation is small (the articles are concentrated in few verticals), that is, the account's publishing is vertical.
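One way to turn the verticality idea into a number is sketched below. This is not the kurtosis computation of Fig. 1c: as a stand-in, the sketch reports the share of the largest vertical (the "peak") together with a concentration score derived from normalized Shannon entropy, which is 0 when the observed verticals are equally represented and 1 when a single vertical holds everything. Both measures and the function name `verticality` are assumptions.

```python
import math


def verticality(shares):
    """shares: dict mapping vertical -> proportion (proportions sum to 1).

    Returns (top_share, concentration), where concentration is
    1 - normalized entropy over the verticals that actually occur.
    """
    values = [p for p in shares.values() if p > 0]
    top_share = max(values)
    if len(values) == 1:
        return top_share, 1.0
    entropy = -sum(p * math.log(p) for p in values)
    concentration = 1.0 - entropy / math.log(len(values))
    return top_share, concentration


# A focused account versus an account spread evenly over five verticals.
focused = {"military": 0.8, "life": 0.1, "medicine": 0.1}
scattered = {"medicine": 0.2, "metal": 0.2, "military": 0.2,
             "automobile": 0.2, "sports": 0.2}
```

Under this sketch the focused account scores higher on both measures than the scattered one, mirroring the right and left pictures described above.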
103. Collecting a reference account list and historical contents issued by each reference account in the reference account list in a historical time period from a content distribution system to which the distribution account belongs.
A reference account is a distribution account that has been authenticated by the content distribution system (also referred to as a content distribution platform), and may be an enterprise account or a personal account. For example, an enterprise account may be the distribution account of a news media outlet, and a personal account may be the distribution account of a certain writer. The reference account list and the historical content issued by each reference account in a historical time period are collected from the content distribution platform to which the distribution account belongs. The historical time period may be the past month, the past year, or the period from the registration of the reference account to the current time point, and is chosen according to the actual situation.
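The collection step can be sketched as a filter over published items. The record layout (dictionaries with hypothetical keys `account`, `published_at` and `text`), the in-memory list standing in for the content distribution system, and the 30-day default period are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def collect_history(posts, reference_accounts, now, period=timedelta(days=30)):
    """Keep posts by reference accounts published within [now - period, now].

    posts: list of dicts with hypothetical keys 'account', 'published_at', 'text'.
    reference_accounts: set of account names authenticated by the platform.
    """
    start = now - period
    return [
        post for post in posts
        if post["account"] in reference_accounts
        and start <= post["published_at"] <= now
    ]
```

Passing a longer `period`, such as `timedelta(days=365)`, corresponds to the one-year option mentioned above.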
104. Calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the account identification of the distribution account.
Within the same content distribution platform, each distribution account is assigned an account identification, which is used to identify whether the distribution account is a transport account or an original account. The account identification of the distribution account is not fixed: it may be updated according to the content to be distributed and the published historical content. The content originality of the distribution account is then calculated based on the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification. That is, optionally, in some embodiments, the step "calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the account identification of the distribution account" may specifically include:
(11) extracting the content authentication information of the historical content and the content authentication information of the content to be distributed, respectively;
(12) updating the account identification of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed;
(13) calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification.
The content authentication information is protection information embedded in a carrier (the content). It is mainly used for identifying the source of the content, the creator of the content (i.e., the account that distributes the content), the structural representation of the content, and the like. For example, the content authentication information may include content title information and content identification information. The content title information includes the text length information and semantic information of the content title. The content identification information includes watermark information, time information of content distribution, and the like, and is used for marking the content to facilitate subsequent content identification. Specifically, the account identification of the distribution account may be updated according to the content authentication information of the historical content and the content authentication information of the content to be distributed. For example, when the distribution account is determined to be a transport account according to this information, the account identification of the distribution account is updated so that the updated account identification indicates that the distribution account is a transport account. The content originality of the distribution account is then calculated based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification.
Originality refers to independently completed creation: an original work is not one created by falsifying, plagiarizing or pirating the work of others, nor one created by adapting, translating, annotating or arranging the work of others. It can be understood that the content originality is used for measuring the degree to which the content of the distribution account is original. Because the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification are measured in different units, the data need to be normalized before they can be combined in one calculation: their values are mapped to a common value interval through a function transformation. The distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification can therefore be normalized separately, and the results are then weighted and added according to a preset strategy to obtain the content originality of the distribution account.
It should be noted that, before the normalization, values can be assigned to the distribution information and the updated account identification. The share of the distribution account's content that falls in its largest content type can be determined from the distribution information and used as the value of the distribution information. For example, if the content type with the most content for the distribution account is the military type, and military content makes up 80% of the account's content, the value of the distribution information is 80%. Similarly, if the account identification indicates an original account, it is assigned the value 1, and if it indicates a plagiarism account, it is assigned the value 0. The preset strategy can be set by the content distribution system according to actual requirements, which is not described herein again.
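The assign, normalize and weighted-sum procedure described above can be sketched as follows. The application leaves the preset strategy to the content distribution system, so everything concrete here is an assumption: min-max normalization with bounds 0 and 1, the `weights` tuple, the threshold `PRESET_VALUE`, taking the maximum similarity as the repetition degree, and counting high similarity against originality by using one minus the repetition degree.

```python
def min_max(value, low, high):
    """Map value into [0, 1] given assumed lower and upper bounds."""
    if high == low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))


def content_originality(top_share, similarities, is_original_account,
                        weights=(0.3, 0.5, 0.2)):
    """Weighted sum of three normalized signals (hypothetical weights).

    top_share: share of the account's largest content type, e.g. 0.8.
    similarities: similarity to each historical content, each in [0, 1].
    is_original_account: True if the account identification says "original".
    """
    distribution_score = min_max(top_share, 0.0, 1.0)
    # Assumed repetition degree: the closest match among historical contents.
    repetition = max(similarities, default=0.0)
    novelty_score = 1.0 - min_max(repetition, 0.0, 1.0)
    # Assignment from the text: original account -> 1, plagiarism account -> 0.
    account_score = 1.0 if is_original_account else 0.0
    w_dist, w_novel, w_acct = weights
    return (w_dist * distribution_score
            + w_novel * novelty_score
            + w_acct * account_score)


PRESET_VALUE = 0.6  # hypothetical threshold


def should_distribute(top_share, similarities, is_original_account):
    """Distribute only when originality exceeds the preset value."""
    score = content_originality(top_share, similarities, is_original_account)
    return score > PRESET_VALUE
```

With these assumed weights, an original account that is 80% concentrated in one type and whose closest historical match has similarity 0.2 scores about 0.84 and passes the threshold, while a scattered plagiarism account with a 0.95 match scores far below it.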
Optionally, in some embodiments, in order to improve the accuracy of the calculated content originality, a correlation between the distribution account and each reference account in the reference account list may be generated according to the content authentication information of the historical content and the content authentication information of the content to be distributed, and then the account identifier is updated based on the correlation, that is, the step "updating the account identifier of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed" may specifically include:
(21) generating a correlation degree between a distribution account and each reference account in a reference account list according to the content authentication information of the historical content and the content authentication information of the content to be distributed;
(22) and updating the account identification of the distribution account based on the correlation degree between the distribution account and each reference account in the reference account list.
For example, if the similarity between the account name of the distribution account and the account name of a reference account is high, such as 90%, but the content distributed by the distribution account is unrelated to the content already issued by the reference account, then taking the account-name similarity as the correlation between the two accounts would make the subsequently calculated content originality of the distribution account inaccurate. Therefore, optionally, in some embodiments, generating the correlation between the distribution account and each reference account in the reference account list according to the content identification information and the content title information may specifically include:
(31) detecting whether the content identification information of the historical content is consistent with the content identification information of the content to be distributed;
(32) calculating the similarity between the content title information of each historical content and the content title information of the content to be distributed;
(33) and generating the correlation degree between the distribution account and the reference account according to the similarity between the content title information of the historical content and the content title information of the content to be distributed and each detection result.
For example, suppose a distribution account carries video content of "XX sound". Video content distributed by the content distribution platform of "XX sound" is marked with identification information of "XX sound", such as an "XX sound" watermark in the video. Since "XX sound" is in the reference account list but the distribution account corresponding to the content to be distributed is not the "XX sound" account, it may be determined that the distribution account is strongly related to the "XX sound" account. A "1" may then be recorded in the account identifier of the distribution account, indicating that it carries the content of one reference account; if it carries the content of two reference accounts, "1, 1" is recorded. In other words, the number of "1"s recorded equals the number of reference accounts whose content the distribution account carries. In addition, the similarity between the content title information of each historical content and the content title information of the content to be distributed can be calculated by processing the content titles with natural language processing technology. Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, i.e., the language that people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
Text processing in natural language processing is generally implemented with Machine Learning (ML) technology. Machine learning is a multi-disciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other fields. It studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize its existing knowledge structure so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to give computers intelligence, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
In this application, the similarity between the content titles of the historical content and the content title of the content to be distributed is detected through natural language processing technology. For example, the entities in the content title of the historical content and the entities in the content title of the content to be distributed can be extracted respectively, the similarity between the entities is calculated, and the similarity between the two content titles is generated from it. Finally, the title similarity and all detection results are fused to obtain the correlation between the distribution account and the reference account.
For distribution accounts of the same content distribution system, when a distribution account enters the system, the system assigns it an account authentication level, which is usually determined in the early stage by the content distribution system according to its own operation policy. For example, the content distribution system Q presets five account levels, S, A, B, C and D, and determines all S-level distribution accounts as its reference accounts. That is, the reference account list includes all S-level distribution accounts in the content distribution system Q, such as accounts registered in the system by authoritative media such as People's Daily, southern weekend and central news, as well as some well-known large accounts in the industry such as XX, second and visual. In addition, accounts that publish richly within a vertical field can be determined as A-level accounts. It should be noted that account levels are not fixed: reference accounts are usually determined by the operation policy, while for the remaining distribution accounts, such as A-level and B-level accounts, the level of an account that can grow rapidly is determined by the originality of its published content and by how that content fares when distributed on the platform, which includes user complaints and report feedback. Therefore, in some embodiments, the content originality of the distribution account can be calculated based on the account authentication level corresponding to the reference account, the similarity between the content to be distributed and each historical content, the distribution information and the updated account identifier. That is, the step "calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identifier" may specifically include:
(41) acquiring account authentication levels corresponding to all reference accounts in the reference account list in the content distribution system to which the reference accounts belong;
(42) calculating the similarity between the content to be distributed and each historical content based on the account authentication level corresponding to each reference account in the reference account list;
(43) and fusing the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification to obtain the content originality of the distribution account.
In practical application, for account numbers registered in a content distribution system for authoritative media and some account numbers known in the industry, the content of such account numbers is often easy to copy and transport, so for identification of distribution account numbers, the similarity between the content to be distributed and each historical content can be calculated, and since account number authentication levels corresponding to different reference account numbers may be different, in some embodiments, the weight corresponding to each historical content can be determined based on the account number authentication level of the reference account number, and then the similarity between the content to be distributed and each historical content is calculated based on the determined weight, that is, the step "calculating the similarity between the content to be distributed and each historical content based on the account number authentication level corresponding to each reference account number in a reference account number list" may specifically include:
(51) determining the weight corresponding to each historical content according to the account authentication level corresponding to each reference account in the reference account list;
(52) and calculating the similarity between the content to be distributed and each historical content based on the weight corresponding to each historical content.
Referring to fig. 1d, take as an example a distribution account whose level in the content distribution system is B. Each repetition of content (historical content) already issued by an S-level account is recorded as 1, each repetition of content (historical content) already issued by an A-level account is recorded as 0.5, and repetition of content issued by an account of the same level or below is recorded as 0.
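The scoring rule for a B-level distribution account can be sketched as a simple lookup. The names `B_LEVEL_WEIGHTS` and `repetition_score` are hypothetical, and the table covers only the B-level example given in the text; how weights are assigned for distribution accounts of other levels is not stated in the source.

```python
# Weights for a B-level distribution account, taken from the example above:
# a repeat of S-level content counts 1, of A-level content 0.5, and of
# content from the same level or below counts 0.
B_LEVEL_WEIGHTS = {"S": 1.0, "A": 0.5}


def repetition_score(repeated_levels, weights=B_LEVEL_WEIGHTS):
    """Sum the weighted repeats; each entry is the level of the account
    whose historical content was repeated once."""
    return sum(weights.get(level, 0.0) for level in repeated_levels)


# One repeat of S-level content, two of A-level, one each of B and C level.
score = repetition_score(["S", "A", "A", "B", "C"])
```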
It should be noted that content can take many forms, including text, pictures and video. Therefore, in order to reduce the amount of calculation, the content (including the historical content and the content to be distributed) may be subjected to dimension reduction processing. That is, optionally, in some embodiments, the step "calculating the similarity between the content to be distributed and each historical content based on the weight corresponding to each historical content" may specifically include:
(61) vectorizing the content to be distributed and each historical content respectively to obtain a first vector corresponding to the content to be distributed and a second vector corresponding to each historical content;
(62) respectively calculating the distance between the first vector and each second vector;
(63) and determining the similarity between the content to be distributed and each historical content based on the distance between the first vector and each second vector and the weight corresponding to each historical content.
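Steps (61) to (63) can be sketched as follows for plain text. This is a minimal illustration under assumptions not found in the source: a bag-of-words count vector stands in for the vectorization, and the mapping from distance to similarity, `weight / (1 + distance)`, is a hypothetical choice.

```python
import math
from collections import Counter


def vectorize(text):
    """Step (61): turn text into a sparse word-count vector (assumed scheme)."""
    return Counter(text.lower().split())


def euclidean_distance(a, b):
    """Step (62): Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def weighted_similarity(candidate, history, weight):
    """Step (63): combine distance and the historical content's weight.
    The formula weight / (1 + distance) is an illustrative assumption."""
    distance = euclidean_distance(vectorize(candidate), vectorize(history))
    return weight / (1.0 + distance)


# Identical texts have distance 0, so the similarity equals the weight.
same = weighted_similarity("troops move north", "troops move north", 0.5)
```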
For example, for plain text, the text is first segmented into words and converted into a feature vector, and the distance between the vectors to be compared, such as the Euclidean distance, is then calculated. However, an article may contain a very large number of feature words, which makes the dimension of the whole vector high and the calculation too expensive. The feature vector corresponding to the content to be distributed and the feature vector corresponding to the historical content can therefore each be hashed to convert the high-dimensional feature vector into a fingerprint, and the Hamming distance between the two fingerprints is then calculated to determine the similarity between the historical content and the content to be distributed: the smaller the Hamming distance, the higher the similarity.
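The fingerprint approach can be sketched with a small Simhash-style routine. This is an illustrative sketch, not the patented implementation: whitespace tokenization, equal token weights and MD5 as the per-token hash are all assumptions made here for brevity.

```python
import hashlib


def simhash(text, bits=64):
    """Build a `bits`-bit fingerprint from whitespace tokens (assumed scheme)."""
    counts = [0] * bits
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(counts):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


article = "the quick brown fox jumps over the lazy dog"
distance_to_self = hamming_distance(simhash(article), simhash(article))
```

A smaller distance between two fingerprints indicates more similar texts; identical texts always give a distance of 0.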
For video content, video feature vectors and audio feature vectors corresponding to the video content may be extracted, and then, a distance between the vectors is calculated to determine whether the video is repeated.
And finally, fusing the updated account identification, distribution information and the content repetition degree of the content to be distributed to obtain the content originality of the distribution account, wherein the updated account identification, distribution information and the content repetition degree of the content to be distributed can be fused in a norm calculation mode to obtain the content originality of the distribution account.
105. And when the calculated content originality is greater than a preset value, distributing the content to be distributed.
The preset value is preset, and may be specifically set according to a policy in the content distribution system, for example, for a content distribution system with rich original content, the preset value may be set to 1, and for a content distribution system with lack of original content (i.e., a small number of original content), the preset value may be set to 10, and when the content originality of the distribution account is greater than the preset value, the content to be distributed is distributed.
After acquiring content to be distributed, a distribution account and a content type corresponding to the content already issued by the distribution account, determining distribution information of the distribution account on the content based on the content type and the number of the content already issued by the distribution account, then collecting historical content issued by each reference account in a reference account list and the reference account list in a historical time period from a content distribution system to which the distribution account belongs, and finally calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the account identification of the distribution account, and when the calculated content originality is greater than a preset value, distributing the content to be distributed. When judging whether the content to be distributed is original content, not only the similarity among the historical contents and the account identification of the distribution account are considered, but also the distribution information of the distribution account on the content is considered, so that the content originality of the distribution account obtained by subsequent calculation is more accurate, in addition, the whole process does not need manual intervention, the waste of manpower resources is reduced, the content auditing efficiency is improved, and the content distribution efficiency is further improved.
The method according to the examples is further described in detail below by way of example.
In the present embodiment, the content distribution apparatus will be described by taking an example in which the content distribution apparatus is specifically integrated in a server.
Referring to fig. 2a, a content distribution method may specifically include the following processes:
201. the server obtains the content to be distributed, the distribution account and the content type corresponding to the content issued by the distribution account.
For example, the server may obtain, through a network interface, the content to be distributed, the distribution account and the content type corresponding to the content already issued by the distribution account. The content to be distributed is obtained from content issued by the distribution account, and the distribution account is an account having a content issuing function, which may be a self-media account. It is understood that self-media (We Media) is a general term for new media in which personalized and autonomous communicators deliver normative and non-normative information to an unspecified majority or to specific individuals by modern, electronic means. A self-media account can be an account (such as a microblog account) that is registered in an independent content distribution platform and can publish content autonomously, or an account that is registered in a content distribution platform integrated in a social platform and can publish content autonomously. The content distribution platform integrated in the social platform may be one integrated in an instant messaging platform.
202. And the server determines the distribution information of the distribution account on the content based on the content type and the number of the contents released by the distribution account.
For example, the server counts the content released by the distribution account. Suppose the distribution account has released 10 articles in total, of which 3 belong to the military category, 2 belong to the life category and 5 belong to the pharmaceutical category. The content of the distribution account is therefore distributed over the military, life and pharmaceutical fields, and not over other fields.
It should be noted that the content of some transport accounts may be spread very widely, for example over multiple fields that are unrelated to one another. The server may therefore use the first-level classification result of the published content to compute the ratio of the account's most frequent category to all of its published content. Non-vertical publishing is defined as follows: the smaller the proportion of articles in an account's most frequent vertical category relative to its total articles, the less vertical the account. That is, the distribution information $S_p$ can be represented by the following formula:
$$S_p = \frac{U}{T}$$
where U represents the number of articles published by an account in its most frequent vertical category within a period of time, T represents the total number of articles published by the account in that period, and the period may be 1 month.
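As a worked illustration, the ratio of the most frequent category to all published articles can be computed as below, using the 10-article example from step 202 (3 military, 2 life, 5 pharmaceutical). The function name `distribution_info` is hypothetical.

```python
from collections import Counter


def distribution_info(content_types):
    """Return U / T: articles in the most frequent category over all articles."""
    if not content_types:
        raise ValueError("the account has published no content in the period")
    counts = Counter(content_types)
    u = max(counts.values())  # U: size of the most frequent category
    t = len(content_types)    # T: total number of articles
    return u / t


published = ["military"] * 3 + ["life"] * 2 + ["pharmaceutical"] * 5
s_p = distribution_info(published)
```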
203. The server collects a reference account list and historical contents issued by each reference account in the reference account list in a historical time period from a content distribution system to which the distribution account belongs.
204. The server extracts the content authentication information of the history content and the content authentication information of the content to be distributed, respectively.
The content authentication information comprises content title information and content identification information, the content title information comprises text length information and semantic information of a content title, the content identification information comprises watermark information, time information of content release and the like, and the content identification information is used for marking the content and facilitating subsequent content identification.
205. And the server updates the account identification of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed.
For the same content distribution platform, each distribution account belonging to it is given an account identifier, which is used to identify whether the distribution account is a transport account or an original account. For example, when the distribution account is determined to be a transport account according to the content authentication information of the historical content and the content authentication information of the content to be distributed, the account identifier of the distribution account is updated, and the updated account identifier indicates that the distribution account is a transport account. The updated account identifier $S_{account}$ may be represented by the following formula:
$$S_{account} = \lVert X_{tag} + X_{title} \rVert_3$$
where $X_{tag}$ is the number of times a tag of the content published by the account (i.e., the content to be distributed; the tag may be generated by the server or labeled manually) hits an account tag in the white list, each hit being recorded as 1. For example, an account carries video content of XX sound, and XX sound usually marks the content it distributes with the XX mark for copyright purposes; because the publishing account is not the XX sound account itself while the XX sound account is already in the white list, this counts as one hit;
$X_{title}$ is the number of approximate hits between the titles of the content published by the account and the titles of content published by accounts in the white list; each hit is recorded as 0.5, and a miss is recorded as 0.
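The account identifier can be sketched numerically as follows. One interpretation is assumed here and is not confirmed by the source: the expression is read as the 3-norm of the two-component vector $[X_{tag}, X_{title}]$. The function name and its parameters are hypothetical.

```python
def account_identifier(tag_hits, title_hits, p=3):
    """Aggregate tag hits (1 each) and title hits (0.5 each) with a p-norm.

    Assumption: the formula is read as the p-norm of [X_tag, X_title].
    """
    x_tag = 1.0 * tag_hits
    x_title = 0.5 * title_hits
    return (abs(x_tag) ** p + abs(x_title) ** p) ** (1.0 / p)


# Two watermark/tag hits and two near-duplicate title hits:
# X_tag = 2, X_title = 1, so the result is the cube root of 8 + 1 = 9.
s_account = account_identifier(tag_hits=2, title_hits=2)
```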
206. And the server calculates the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identifier.
Specifically, the repetition between the released content, i.e., the content to be distributed, and the historical content may be calculated. This may include calculating the repetition of image-text content (Simhash text de-duplication, picture de-duplication and BERT text de-duplication) and calculating the repetition of videos (video fingerprint vector de-duplication and audio fingerprint de-duplication). Simhash was proposed by Charikar in 2002 and is a hash algorithm that can be used to estimate the similarity of documents; Google uses Simhash for de-duplication of massive amounts of text. Simhash is a locality-sensitive hash. Its main idea is dimension reduction: a high-dimensional feature vector is converted into an f-bit fingerprint, and the similarity of two articles is determined by calculating the Hamming distance between the two fingerprints. The smaller the Hamming distance, the higher the similarity; according to the paper "Detecting Near-Duplicates for Web Crawling", two articles whose fingerprints are within a Hamming distance of 3 are generally regarded as the same. BERT is an abbreviation for Bidirectional Encoder Representations from Transformers, a language model that pre-trains deep bidirectional representations by jointly conditioning on bidirectional context in all layers. It is based on the multi-layer bidirectional Transformer encoder originally implemented as described in Vaswani et al. (2017), i.e., the Transformer architecture published by Google in 2017. An ordinary Transformer uses a set of encoder and decoder networks, while a pre-trained BERT model needs only one additional output layer for fine-tuning to serve a variety of tasks. De-duplication of video fingerprint vectors and audio vectors extracts video and audio features from the video content, vectorizes them, and calculates the distance between the vectors to judge whether the videos are repeated. Finally, the calculation results can be aggregated with a 3-norm, and the repetition $S_{copy}$ of the published content is as follows:
$$S_{copy} = \lVert X_{txt} + X_{pic} + X_{bert} + X_{video} + X_{voice} \rVert_3$$
where $X_{txt}$ is the text de-duplication result, i.e., the result of the text Simhash algorithm; $X_{pic}$ is the picture repetition result, for example, articles are considered repeated when most of the pictures they contain are the same, a common criterion being that more than 50% of the pictures are shared; $X_{bert}$ is the BERT text repetition result; $X_{video}$ is the fingerprint de-duplication result of the video content; and $X_{voice}$ is the audio fingerprint de-duplication result of the video content.
For distribution accounts of the same content distribution system, when a distribution account enters the system, the system assigns it an account authentication level, which is usually determined in the early stage by the content distribution system according to its own operation policy. For example, the content distribution system Q presets five account levels, S, A, B, C and D, and determines all S-level distribution accounts as its reference accounts. That is, the reference account list includes all S-level distribution accounts in the content distribution system Q, such as accounts registered in the system by authoritative media such as People's Daily, south weekend and central news, as well as some well-known large accounts in the industry such as XX and visual. In addition, accounts that publish richly within a vertical field can be determined as A-level accounts. It should be noted that account levels are not fixed: reference accounts are usually determined by the operation policy, while for the other distribution accounts, such as A-level and B-level accounts, the level of an account that can grow rapidly is determined by the originality of its published content and by how that content fares when distributed on the platform, which includes user complaints and report feedback. Therefore, in some embodiments, the server can calculate the content originality of the distribution account based on the account authentication level corresponding to the reference account, the similarity between the content to be distributed and each historical content, the distribution information and the updated account identifier. The updated account identifier, the distribution information and the content repetition of the content to be distributed can be fused by means of a norm to obtain the content originality of the distribution account. A norm is a basic concept in mathematics. In functional analysis it is defined on a normed linear space and satisfies certain conditions, namely non-negativity, homogeneity and the triangle inequality. It is often used to measure the length or size of each vector (or matrix) in a vector space. The most commonly used norm is the p-norm.
The specific definition is as follows:
if $x = [x_1, x_2, \ldots, x_n]^T$, then
$$\lVert x \rVert_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$$
It can be verified that the p-norm does satisfy the definition of a norm. The proof of the triangle inequality is not trivial; this conclusion is commonly referred to as the Minkowski inequality. When p takes 1, 2 or ∞, the simplest cases are obtained:
1-norm: $\lVert x \rVert_1 = |x_1| + |x_2| + \cdots + |x_n|$
2-norm: $\lVert x \rVert_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$
∞-norm: $\lVert x \rVert_\infty = \max(|x_1|, |x_2|, \ldots, |x_n|)$
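The p-norm definition and its special cases can be checked with a few lines of code. This is a plain-Python sketch with a hypothetical function name, included only to make the definitions concrete.

```python
import math


def p_norm(x, p):
    """p-norm of a sequence of numbers; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(v) for v in x)
    if p < 1:
        raise ValueError("p must be at least 1 for a norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


vector = [3, -4]
one_norm = p_norm(vector, 1)         # |3| + |-4|
two_norm = p_norm(vector, 2)         # sqrt(9 + 16)
max_norm = p_norm(vector, math.inf)  # max(|3|, |-4|)
three_norm = p_norm([1, 2, 2], 3)    # cube root of 1 + 8 + 8
```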
Using a 3 norm here, which is equivalent to taking p to 3 here, yields:
$$S = (S_{copy}+1)^{1} \times (S_{account}+1)^{1} \times (S_p+1)^{0.5}$$
where the content originality $S$ of the distribution account combines the repetition of the distributed content $S_{copy}$, the updated account identifier $S_{account}$ and the distribution information $S_p$, each with its corresponding weight. The weight corresponding to each parameter can be set according to actual conditions, which is not described herein.
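The fusion formula and the threshold check of the following step can be sketched as below. The function names are hypothetical; the default weights are the exponents 1, 1 and 0.5 shown in the formula, and the input values in the example are made up for illustration.

```python
def content_originality(s_copy, s_account, s_p, weights=(1.0, 1.0, 0.5)):
    """S = (S_copy + 1)^w1 * (S_account + 1)^w2 * (S_p + 1)^w3."""
    w_copy, w_account, w_p = weights
    return (
        (s_copy + 1) ** w_copy
        * (s_account + 1) ** w_account
        * (s_p + 1) ** w_p
    )


def should_distribute(score, preset_value):
    """Distribute only when the score is greater than the preset value."""
    return score > preset_value


# Illustrative inputs only: (1 + 1) * (2 + 1) * sqrt(3 + 1).
score = content_originality(s_copy=1, s_account=2, s_p=3)
decision = should_distribute(score, preset_value=10)
```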
207. And when the calculated content originality is greater than a preset value, the server distributes the content to be distributed.
For example, for a content distribution system with rich original content, the preset value may be set to 1, and for a content distribution system lacking original content (i.e., with a small number of original content), the preset value may be set to 10, and when the content originality of the distribution account is greater than the preset value, the server distributes the content to be distributed.
To facilitate a further understanding of the content distribution scheme of the present application, please refer to fig. 2b, which illustrates a flowchart of a method and system for modeling self-media transport accounts based on unsupervised machine learning. On the main process link of self-media content production and distribution, the transport account identification service is called to rank accounts by their degree of transport, and different application strategies are then adopted for different scenarios. Different self-media platforms usually have their own target user groups. Each self-media account entering the system is assigned an account level (for example, five levels: S-5, A-4, B-3, C-2 and D-1). The level is usually determined in the early stage by the platform according to its own operation strategy, forming a white list of head accounts; for example, authoritative media accounts such as People's Daily, southern weekend and central watching news are S level. Some well-known original accounts in the industry, such as XX and visual, are also rated S. Accounts such as XX entertainment and X-reading book, which are original and influential in their vertical fields, are positioned as level-4 accounts. The level of an account is not fixed: the well-known head accounts are usually determined by the operation strategy, while the level of fast-growing accounts in the middle is determined by the originality of their published content and its distribution on the platform, including user complaints and report feedback. The existing level results are mainly used to identify whether newly published content is transported and to quantify the degree of transport.
The identification result for transport accounts can be used in the following scenarios: (1) for accounts that are not original, the weight of a transport account is reduced during recommendation and distribution, or its distribution is limited or even cancelled; (2) to protect the interests of original authors, for content identified as issued by a transport account, if the original account subsequently issues the same content after distribution has started, the transport account's content is withdrawn and the traffic is given to the original account; (3) the subsidies and incentives of a transport account are reduced according to its degree of transport, or cancelled altogether according to the platform's operation strategy, and the publishing of the transport account is limited; (4) in the content auditing link, because auditing resources are limited, transport accounts are placed at the end of the audit schedule so that content from original head accounts can be processed and distributed as soon as possible. All of these scenarios require transport accounts to be accurately identified and ranked.
The main functions of the service modules in fig. 2b are described as follows:
First, C-end publishing system or web publishing system (production end) and content consumption end
(1) A content producer (PGC, UGC, MCN or PUGC) provides image-text content or uploads video content through a mobile terminal or a backend interface API system, using a local or web publishing system; the content includes short videos and small videos and is the main source of the content to be distributed;
(2) through communication with the uplink and downlink content interface server, the interface address of the upload server is obtained first, and the content is then published;
(3) as a consumer, the consumption end communicates with the uplink and downlink content interface server to obtain index information of the content to be accessed, and then communicates with the content export service to consume the content directly; consumption presupposes that the content index has been obtained through Feeds recommendation and distribution;
(4) a Feeds and user click behavior and environment reporting module collects the current network environment of the user, the click operations of the user on items in the Feeds stream and the exposure data of Feeds content, and reports these data to the statistics reporting interface server;
(5) for video content, the playback duration, the buffering time and various interactive behaviors on the content, such as forwarding, sharing, collecting and liking, are also reported.
Second, uplink and downlink content interface server and content export service
(1) Directly communicating with the content production end and storing the content submitted from the front end, which usually comprises the title, publisher, abstract, cover picture and publishing time of the content, in the content database;
(2) the content export service communicates with the recommendation and distribution system to obtain the recommendation and distribution result, which is delivered to the consumption end and displayed in the Feeds list of the user;
(3) the content export service is usually a group of access services deployed geographically close to users;
(4) when content is stored and exported, the account source of the publisher is carried along with it, and the initial audit level of the account is set through operation configuration, mainly in close relation to the operation strategy;
(5) at the same time, the publishing flow information of each account, including the publishing time and the content type, is reported to the statistics reporting interface server, and the content annotation information provided by the account owner, such as the classification, labels, selected cover picture and title, is stored in the content database as extension information.
Third, content database
(1) The key data is the meta information of the content, such as the size, cover picture link, title, release time, account author, source channel and warehousing time, together with the classification of the content assigned during manual review (including first-, second- and third-level classifications and label information; for example, for an article about a Huawei mobile phone, the first-level classification is science and technology, the second-level classification is smart phone, the third-level classification is domestic mobile phone, and the label information is Huawei, Mate 30);
(2) the information in the content database is read during manual review, and the result and state of the manual review are written back to the content database for storage; the manual review result is also an important basis for subsequently measuring the effectiveness of the algorithmic filtering model;
(3) content processing in the whole business process mainly comprises machine processing and manual review processing; the content library is divided into different content pools according to content annotations, and the recommendation and distribution server, the rearrangement server and the content feature modeling service all need to acquire content from the content database. For example, the image-text deduplication server loads, according to business requirements, the content that has been warehoused and enabled within a past period (for example, one week), and adds a filtering mark to repeatedly warehoused content so that it is no longer provided to the content recommendation service for output to users;
(4) the deduplication service and the transport account identification service are machine processing procedures, and their processing results are stored in the content database.
Fourth, dispatching center
(1) Responsible for the whole scheduling process of content circulation: warehoused content is received through the uplink and downlink content interface server, and the meta information of the content is then obtained from the content database;
(2) scheduling the deduplication server to mark and filter repeatedly warehoused content, and synchronously sending the deduplication flow information to the transport feature mining model module as input;
(3) scheduling the transport account identification service to evaluate and calculate the transport score ranking of each publishing account (accounts manually marked and authenticated as original accounts may be exempted from this process); the ranking is used in practical application scenes such as subsequent manual review scheduling or down-weighting in the distribution process;
(4) for content that cannot be handled by machine, such as content with political sensitivity or safety problems that requires manual review, calling the manual review system to perform manual review.
Fifth, manual review service system
(1) The system needs to read the original information of the video content itself from the content database. It is usually a complex system developed on the basis of a web database, and mainly ensures that pushed content complies with local laws and policies, for example by performing a round of preliminary filtering on whether the content involves pornography, gambling or politically sensitive features;
(2) the content to be reviewed comes from active publishing by self-media accounts and from web crawlers collecting from the public network;
(3) the review result is written into the content database through the dispatching center.
Sixth, deduplication service
(1) Communicating with the content scheduling server. Deduplication mainly comprises title deduplication, cover picture deduplication, content body deduplication, and video fingerprint and audio fingerprint deduplication. The title and body of image-text content are vectorized using Simhash and BERT text vectors, pictures are deduplicated using picture vectors, and video fingerprints and audio fingerprints are extracted from video content to construct vectors; the distance between vectors, such as the Euclidean distance, is then calculated to determine whether content is duplicated. This method may be described in a separate invention and scheme and is not the focus of the present invention, which mainly uses its judgment results;
(2) communicating with the transport feature model mining module and providing the original deduplication flow information.
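As an illustration of title deduplication, the following is a minimal SimHash sketch. It is not the deduplication service of the present application: the whitespace tokenization, the use of MD5 as the token hash, the Hamming distance comparison (the usual choice for SimHash fingerprints, rather than the Euclidean distance mentioned for dense vectors) and the `max_distance` threshold are all assumptions made for the example.

```python
import hashlib


def simhash(text, bits=64):
    """Compute a SimHash fingerprint over whitespace-separated tokens."""
    counts = [0] * bits
    for token in text.split():
        # Assumption: the first 8 bytes of MD5 serve as a 64-bit token hash.
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_duplicate(title_a, title_b, max_distance=3):
    """Treat two titles as duplicates when their fingerprints are close.

    max_distance is a hypothetical threshold chosen for illustration.
    """
    return hamming_distance(simhash(title_a), simhash(title_b)) <= max_distance
```

Because each token votes on every bit independently, the fingerprint does not depend on token order, so titles that differ only by word reordering map to the same fingerprint.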
Seventh, statistics reporting interface server
(1) Receiving reports from the content consumption end on the current network environment of the user, the click operations of the user on items in the Feeds stream, and the exposure data of Feeds articles;
(2) writing the reported statistical data into the statistics database;
(3) receiving the original publishing flow of accounts reported by the content production entry.
Eighth, transport feature model mining
(1) According to the specific unsupervised model described above, account conflict features, plagiarism features and verticality features are constructed through content processing.
(2) The content data used for modeling is obtained by reading content meta information from the content database, and also comes from the statistics database and the deduplication service.
Ninth, transport account identification service
(1) The feature results mined by the transport feature model are implemented in engineering to perform quantitative evaluation of transport accounts; the core is the fusion of the transport account identification features;
(2) working with the dispatching center service to complete the transport grade identification marking of publishing accounts.
Tenth, statistics database
(1) Receiving the statistical data reported by the content consumption end and providing data support for subsequent statistical analysis and mining;
(2) receiving the publishing flow reported by the content production end.
In the present application, after acquiring the content to be distributed, the distribution account and the content types corresponding to the content already published by the distribution account, the server determines the distribution information of the distribution account over content based on the content types. The server then collects, from the content distribution system to which the distribution account belongs, a reference account list and the historical content published by each reference account in the list within a historical time period, and extracts the content authentication information of the historical content and of the content to be distributed respectively. Next, the server updates the account identification of the distribution account according to the content authentication information of the historical content and of the content to be distributed. Finally, the server calculates the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification, and distributes the content to be distributed when the calculated content originality is greater than a preset value. Because the content authentication information of both the historical content and the content to be distributed is taken into account when judging whether the content to be distributed is original, the calculated content originality of the distribution account is more accurate. The whole process requires no manual intervention, which reduces the waste of human resources, improves content review efficiency and thus improves content distribution efficiency.
In order to better implement the content distribution method of the present application, the present application further provides a content distribution apparatus (distribution apparatus for short) based on the foregoing content distribution method. The terms used have the same meanings as in the content distribution method described above, and for implementation details reference may be made to the description in the method embodiment.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a content distribution apparatus provided in the present application, where the distribution apparatus may include an obtaining module 301, a determining module 302, a collecting module 303, a calculating module 304, and a distributing module 305, and specifically may be as follows:
The obtaining module 301 is configured to obtain the content to be distributed, the distribution account, and the content types corresponding to the content already published by the distribution account.
For example, the obtaining module 301 may obtain, through an access network interface, the content to be distributed, the distribution account, and the content types corresponding to the content already published by the distribution account.
The determining module 302 is configured to determine the distribution information of the distribution account over content based on the content types and the number of content items published by the distribution account.
For example, the determining module 302 counts the content published by the distribution account. Suppose the distribution account has published 10 articles in total, of which 3 belong to the military category, 2 to the life category and 5 to the medicine category; the distribution account is then distributed over the military, life and medicine categories and has no content in other fields.
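The 10-article example can be written out as a short sketch. Representing the distribution information as the share of published items per content type is an assumption made for illustration; the function name `distribution_info` is hypothetical.

```python
from collections import Counter


def distribution_info(content_types):
    """Return the share of published items per content type.

    content_types: one content-type label per published item.
    """
    counts = Counter(content_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ctype: n / total for ctype, n in counts.items()}


# The example from the text: 3 military, 2 life and 5 medicine articles.
published = ["military"] * 3 + ["life"] * 2 + ["medicine"] * 5
info = distribution_info(published)
```

Content types that never appear in the published items are simply absent from the result, which corresponds to the account having no content in other fields.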
The collecting module 303 is configured to collect, from the content distribution system to which the distribution account belongs, the reference account list and the historical content published by each reference account in the reference account list within a historical time period.
The reference account refers to a distribution account authenticated by the content distribution system (also referred to as a content distribution platform), and may include enterprise accounts and private accounts. For example, an enterprise account may be the distribution account of a news media outlet and a private account may be the distribution account of a writer, as determined by the actual situation. The collecting module 303 may collect, from the content distribution platform to which the distribution account belongs, the reference account list and the historical content published by each reference account within a historical time period. The historical time period may be the past month or the past year, or the period from the registration of the reference account to the current time point, as determined by the actual situation.
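Collecting historical content within a historical time period can be sketched as a simple filter. The record layout `(account_id, content_id, published_at)`, the function name `collect_history` and the 30-day default window are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def collect_history(published, reference_accounts, now, window=timedelta(days=30)):
    """Group content published by reference accounts within the time window.

    published: iterable of (account_id, content_id, published_at) tuples
               (a hypothetical record layout).
    Returns a dict mapping each reference account to its content ids.
    """
    start = now - window
    refs = set(reference_accounts)
    history = {}
    for account_id, content_id, published_at in published:
        if account_id in refs and start <= published_at <= now:
            history.setdefault(account_id, []).append(content_id)
    return history


records = [
    ("news_media", "n1", datetime(2020, 5, 20)),
    ("news_media", "n0", datetime(2020, 3, 1)),   # outside the window
    ("writer_x", "w1", datetime(2020, 5, 1)),
    ("other", "o1", datetime(2020, 5, 25)),       # not a reference account
]
history = collect_history(records, ["news_media", "writer_x"], datetime(2020, 5, 29))
```

To cover the period from account registration to the current time point, the `window` argument could be replaced by a per-account start time.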
The calculating module 304 is configured to calculate the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the account identification of the distribution account.
Within a content distribution platform, each distribution account is given an account identification that indicates whether the distribution account is a transport account or an original account, and this account identification is not fixed. The calculating module 304 may update the account identification of the distribution account according to the content to be distributed and the published historical content, and then calculate the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification. That is, optionally, in some embodiments, the calculating module 304 may specifically include:
an extraction submodule, configured to extract the content authentication information of the historical content and the content authentication information of the content to be distributed respectively;
an updating submodule, configured to update the account identification of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed;
and a calculating submodule, configured to calculate the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification.
The content authentication information comprises content title information and content identification information. The content title information comprises the text length information and semantic information of the content title. The content identification information comprises watermark information, the release time of the content and the like, and is used for marking the content to facilitate subsequent content identification.
Within a content distribution platform, each distribution account is given an account identification that indicates whether the distribution account is a transport account or an original account. The updating submodule can therefore update the account identification of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed. For example, when the updating submodule determines from this content authentication information that the distribution account is a transport account, it updates the account identification so that the updated account identification indicates a transport account. Specifically, the updating submodule may generate the correlation between the distribution account and each reference account in the reference account list according to the content authentication information of the historical content and of the content to be distributed, and then update the account identification based on the correlation. That is, optionally, in some embodiments, the updating submodule may specifically include:
a generating unit, configured to generate the correlation between the distribution account and each reference account in the reference account list according to the content authentication information of the historical content and the content authentication information of the content to be distributed;
and an updating unit, configured to update the account identification of the distribution account based on the correlation between the distribution account and each reference account in the reference account list.
Optionally, in some embodiments, the generating unit includes:
a detection subunit, configured to detect whether the content identification information of each historical content is consistent with the content identification information of the content to be distributed;
a first calculating subunit, configured to calculate the similarity between the content title information of each historical content and the content title information of the content to be distributed;
and a generating subunit, configured to generate the correlation between the distribution account and the reference account according to each detection result and the similarity between the content title information of each historical content and the content title information of the content to be distributed.
Optionally, in some embodiments, the generating subunit is specifically configured to: fuse the similarity between the content title information of each historical content and the content title information of the content to be distributed with each detection result, to obtain the correlation between the distribution account and the reference account.
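The detection, title similarity and fusion steps can be sketched as follows. The present application does not specify the fusion formula, so several choices here are assumptions: a character-level `difflib.SequenceMatcher` ratio stands in for title similarity, a weighted sum with a hypothetical `title_weight` performs the fusion, the maximum over a reference account's historical content gives the correlation, and `update_account_id` marks the account as a transport account once any correlation reaches a hypothetical threshold.

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for a semantic measure."""
    return SequenceMatcher(None, a, b).ratio()


def correlation(candidate, reference_history, title_weight=0.5):
    """Fuse title similarity with the identification-consistency check.

    candidate / reference_history items: dicts with 'title' and 'watermark'
    keys (a hypothetical layout). Returns the highest fused score.
    """
    best = 0.0
    for item in reference_history:
        sim = title_similarity(candidate["title"], item["title"])
        consistent = (candidate["watermark"] is not None
                      and candidate["watermark"] == item["watermark"])
        fused = title_weight * sim + (1.0 - title_weight) * (1.0 if consistent else 0.0)
        best = max(best, fused)
    return best


def update_account_id(current_id, correlations, threshold=0.8):
    """Mark the account as 'transport' if any correlation reaches the threshold."""
    if any(c >= threshold for c in correlations):
        return "transport"
    return current_id


candidate = {"title": "abc", "watermark": "wm1"}
same = correlation(candidate, [{"title": "abc", "watermark": "wm1"}])
different = correlation(candidate, [{"title": "xyz", "watermark": "wm2"}])
```

A production system would more plausibly compare titles with a semantic model; the character-level ratio is used here only to keep the sketch self-contained.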
The calculating module 304 may calculate the content originality of the distribution account based on the account authentication level corresponding to each reference account, the similarity between the content to be distributed and each historical content, the distribution information, and the updated account identification. Optionally, in some embodiments, the calculating module 304 may specifically include:
an acquisition unit, configured to acquire the account authentication level, in the content distribution system, of each reference account in the reference account list;
a computing unit, configured to compute the similarity between the content to be distributed and each historical content based on the account authentication level corresponding to each reference account in the reference account list;
and a fusion unit, configured to fuse the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification to obtain the content originality of the distribution account.
Optionally, in some embodiments, the fusion unit is specifically configured to: generate the content repetition degree of the content to be distributed according to the similarity between the content to be distributed and each historical content, and fuse the updated account identification, the distribution information and the content repetition degree of the content to be distributed to obtain the content originality of the distribution account.
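One possible shape of this fusion is sketched below. The formula is entirely an assumption, since the present application does not give one: the repetition degree is taken as the maximum similarity, the distribution information is reduced to a hypothetical "focus" value (the largest per-type share), a transport account identification applies a hypothetical multiplicative penalty, and the resulting score lies in [0, 1].

```python
def content_repetition(similarities):
    """Repetition degree of the content: here, the highest similarity found."""
    return max(similarities, default=0.0)


def content_originality(similarities, distribution, is_transport,
                        transport_penalty=0.5):
    """Fuse repetition degree, distribution information and account identification.

    similarities: similarity of the content to each historical content, in [0, 1].
    distribution: mapping of content type to share of published items.
    is_transport: True if the updated account identification marks a transport account.
    """
    repetition = content_repetition(similarities)
    # Assumption: a more focused (vertical) account is credited more.
    focus = max(distribution.values(), default=0.0)
    score = (1.0 - repetition) * (0.5 + 0.5 * focus)
    if is_transport:
        score *= transport_penalty
    return score


score = content_originality(
    [0.2, 0.6],
    {"military": 0.3, "life": 0.2, "medicine": 0.5},
    is_transport=True,
)
```

Under these assumptions, content that exactly repeats a historical item scores 0 regardless of the other inputs, and a transport account can reach at most half the score of an otherwise identical original account.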
Optionally, in some embodiments, the computing unit may specifically include:
a determining subunit, configured to determine the weight corresponding to each historical content according to the account authentication level corresponding to each reference account in the reference account list;
and a second calculating subunit, configured to calculate the similarity between the content to be distributed and each historical content based on the weight corresponding to each historical content.
Optionally, in some embodiments, the second calculating subunit is specifically configured to: vectorize the content to be distributed and each historical content respectively to obtain a first vector corresponding to the content to be distributed and a second vector corresponding to each historical content; calculate the distance between the first vector and each second vector; and determine the similarity between the content to be distributed and each historical content based on the distance between the first vector and each second vector and the weight corresponding to each historical content.
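The distance and weighting steps can be sketched as follows, taking the vectors as given. The mapping from account level to weight (`LEVEL_WEIGHT`) and the conversion `1 / (1 + distance)` from Euclidean distance to similarity are assumptions for illustration; the present application only states that the weight depends on the account authentication level.

```python
import math

# Hypothetical weights per account authentication level (S highest, D lowest).
LEVEL_WEIGHT = {"S": 1.0, "A": 0.8, "B": 0.6, "C": 0.4, "D": 0.2}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def weighted_similarities(first_vector, history):
    """Similarity of the content to be distributed to each historical content.

    history: list of (second_vector, account_level) pairs.
    """
    result = []
    for second_vector, level in history:
        distance = euclidean(first_vector, second_vector)
        similarity = 1.0 / (1.0 + distance)  # assumption: maps distance to (0, 1]
        result.append(LEVEL_WEIGHT[level] * similarity)
    return result


sims = weighted_similarities(
    [0.0, 0.0],
    [([0.0, 0.0], "S"), ([3.0, 4.0], "A")],
)
```

With this weighting, a match against content from a higher-level reference account counts more strongly than an equally close match against content from a lower-level account.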
A distribution module 305, configured to distribute the content to be distributed when the calculated content originality is greater than a preset value.
The preset value is set in advance, and may be set according to the policy of the content distribution system. For example, for a content distribution system rich in original content the preset value may be set to 1, and for a content distribution system lacking original content (i.e., with little original content) the preset value may be set to 10. When the content originality of the distribution account is greater than the preset value, the distribution module 305 distributes the content to be distributed.
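The distribution decision itself is a threshold comparison, sketched below. The function names are hypothetical, and the two preset values are simply the example values given in the text; the scale on which content originality is measured is not specified here and is left to the caller.

```python
def choose_preset(original_content_is_rich):
    """Pick the preset value by platform policy, using the example values from the text."""
    return 1 if original_content_is_rich else 10


def should_distribute(originality, preset_value):
    """Distribute only when originality is strictly greater than the preset value."""
    return originality > preset_value


decision = should_distribute(originality=2.5, preset_value=choose_preset(True))
```

Note that the comparison is strict, so content whose originality equals the preset value is not distributed.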
In the content distribution apparatus of the present application, after the obtaining module 301 obtains the content to be distributed, the distribution account and the content types corresponding to the content already published by the distribution account, the determining module 302 determines the distribution information of the distribution account over content based on the content types. The collecting module 303 then collects, from the content distribution system to which the distribution account belongs, the reference account list and the historical content published by each reference account in the reference account list within a historical time period. Finally, the calculating module 304 calculates the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the account identification of the distribution account, and the distribution module 305 distributes the content to be distributed when the calculated content originality is greater than a preset value. Because the content originality is calculated from the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification, and because the content authentication information of the historical content and of the content to be distributed is considered when judging whether the content to be distributed is original, the calculated content originality of the distribution account is more accurate. The whole process requires no manual intervention, which reduces the waste of human resources, improves content review efficiency and further improves content distribution efficiency.
In addition, the present application also provides an electronic device. Fig. 4 shows a schematic structural diagram of the electronic device related to the present application, specifically:
the electronic device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 4 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the whole electronic device by various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, and preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, and power consumption are realized through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The electronic device may further include an input unit 404, and the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
acquiring the content to be distributed, the distribution account and the content types corresponding to the content already published by the distribution account; determining the distribution information of the distribution account over content based on the content types and the number of content items published by the distribution account; collecting, from the content distribution system to which the distribution account belongs, a reference account list and the historical content published by each reference account in the reference account list within a historical time period; calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the account identification of the distribution account; and distributing the content to be distributed when the calculated content originality is greater than a preset value.
For the specific implementation of the above operations, reference may be made to the foregoing embodiments; details are not repeated here.
In the present application, after the content to be distributed, the distribution account and the content types corresponding to the content already published by the distribution account are acquired, the distribution information of the distribution account over content is determined based on the content types and the number of content items published by the distribution account. A reference account list and the historical content published by each reference account in the reference account list within a historical time period are then collected from the content distribution system to which the distribution account belongs. Finally, the content originality of the distribution account is calculated based on the distribution information, the similarity between the content to be distributed and each historical content, and the account identification of the distribution account, and the content to be distributed is distributed when the calculated content originality is greater than a preset value. When judging whether the content to be distributed is original, not only the similarity to each historical content and the account identification of the distribution account are considered, but also the distribution information of the distribution account over content, so the calculated content originality of the distribution account is more accurate. In addition, the whole process requires no manual intervention, which reduces the waste of human resources, improves content review efficiency and further improves content distribution efficiency.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium having stored therein a plurality of instructions that can be loaded by a processor to perform the steps of any of the content distribution methods provided herein. For example, the instructions may perform the steps of:
acquiring content to be distributed, a distribution account, and a content type corresponding to content already published by the distribution account; determining distribution information of the distribution account with respect to the content based on the content type and the quantity of content already published by the distribution account; collecting, from the content distribution system to which the distribution account belongs, a reference account list and the historical content published by each reference account in the list within a historical time period; calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each item of historical content, and the account identification of the distribution account; and distributing the content to be distributed when the calculated content originality is greater than a preset value.
For the specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Because the instructions stored in the storage medium can execute the steps of any content distribution method provided by the present application, they can achieve the beneficial effects of any such method. For details, see the foregoing embodiments, which are not repeated here.
The content distribution method, apparatus, electronic device, and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present invention. Meanwhile, those skilled in the art may, in accordance with the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (12)

1. A content distribution method, comprising:
acquiring content to be distributed, a distribution account and a content type corresponding to the content issued by the distribution account;
determining distribution information of the distribution account on the content based on the content type and the number of the distributed contents of the distribution account;
acquiring a reference account list and historical contents issued by each reference account in the reference account list within a historical time period from a content distribution system to which the distribution account belongs;
calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the account identification of the distribution account;
and when the calculated content originality is greater than a preset value, distributing the content to be distributed.
2. The method according to claim 1, wherein the calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the account identification of the distribution account comprises:
respectively extracting content authentication information of the historical content and content authentication information of the content to be distributed;
updating the account identification of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed;
and calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification.
3. The method according to claim 2, wherein the updating the account identification of the distribution account according to the content authentication information of the historical content and the content authentication information of the content to be distributed includes:
generating the correlation degree between the distribution account and each reference account in the reference account list according to the content authentication information of the historical content and the content authentication information of the content to be distributed;
and updating the account identification of the distribution account based on the correlation degree between the distribution account and each reference account in the reference account list.
4. The method according to claim 3, wherein the content authentication information includes content title information and content identification information, and the generating a correlation between the distribution account and each reference account in the reference account list according to the content authentication information of the historical content and the content authentication information of the content to be distributed includes:
detecting whether the content identification information of the historical content is consistent with the content identification information of the content to be distributed;
calculating the similarity between the content title information of each historical content and the content title information of the content to be distributed;
and generating the correlation degree between the distribution account and the reference account according to the similarity between the content title information of the historical content and the content title information of the content to be distributed and each detection result.
5. The method according to claim 4, wherein the generating the correlation between the distribution account and the reference account according to the similarity between the content title information of the historical content and the content title information of the content to be distributed and each detection result comprises:
and performing fusion processing on the similarity between the content title information of the historical content and the content title information of the content to be distributed, and on each detection result, to obtain the correlation between the distribution account and the reference account.
6. The method according to claim 2, wherein the calculating the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification comprises:
acquiring account authentication levels corresponding to all reference accounts in the reference account list in the content distribution system to which the reference accounts belong;
calculating the similarity between the content to be distributed and each historical content based on the account authentication level corresponding to each reference account in the reference account list;
and fusing the distribution information, the similarity between the content to be distributed and each historical content and the updated account identification to obtain the content originality of the distribution account.
7. The method according to claim 6, wherein the fusing the distribution information, the similarity between the content to be distributed and each historical content, and the updated account identification to obtain the content originality of the distribution account includes:
generating content repetition degrees of the contents to be distributed according to the similarity between the contents to be distributed and each historical content;
and fusing the updated account identification, the distribution information and the content repetition degree of the content to be distributed to obtain the content originality degree of the distribution account.
8. The method according to claim 6, wherein the calculating the similarity between the content to be distributed and each historical content based on the account authentication level corresponding to each reference account in the reference account list comprises:
determining the weight corresponding to each historical content according to the account authentication level corresponding to each reference account in the reference account list;
and calculating the similarity between the content to be distributed and each historical content based on the weight corresponding to each historical content.
9. The method according to claim 8, wherein the calculating the similarity between the content to be distributed and each historical content based on the weight corresponding to each historical content comprises:
vectorizing the content to be distributed and each historical content respectively to obtain a first vector corresponding to the content to be distributed and a second vector corresponding to each historical content;
respectively calculating the distance between the first vector and each second vector;
and determining the similarity between the content to be distributed and each historical content based on the distance between the first vector and each second vector and the weight corresponding to each historical content.
10. A content distribution apparatus, characterized by comprising:
the acquisition module is used for acquiring the content to be distributed, the distribution account and the content type corresponding to the content issued by the distribution account;
the determining module is used for determining the distribution information of the distribution account on the content based on the content type and the number of the contents published by the distribution account;
the collection module is used for collecting a reference account list and historical contents issued by each reference account in the reference account list within a historical time period from a content distribution system to which the distribution account belongs;
the computing module is used for computing the content originality of the distribution account based on the distribution information, the similarity between the content to be distributed and each historical content and the account identification of the distribution account;
and the distribution module is used for distributing the content to be distributed when the calculated content originality is greater than a preset value.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the content distribution method according to any of claims 1-9 are implemented when the program is executed by the processor.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the content distribution method according to any one of claims 1 to 9.
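As a non-limiting illustration of the weighted similarity computation recited in claims 8 and 9, the following Python sketch vectorizes the content to be distributed and each historical content, computes a distance between the first vector and each second vector, and scales the result by a weight derived from the account authentication level. The claims do not name a vectorization method, a distance metric, or a level-to-weight mapping, so the bag-of-words vectors, cosine distance, and the `LEVEL_WEIGHTS` table below are assumptions chosen only to keep the example small.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical mapping from account authentication level to a weight.
LEVEL_WEIGHTS = {0: 0.5, 1: 0.8, 2: 1.0}
DEFAULT_WEIGHT = 0.5


def vectorize(text: str) -> Counter:
    """Assumed vectorization: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Cosine distance between two sparse count vectors (1.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def weighted_similarities(
    candidate: str, history: Sequence[Tuple[str, int]]
) -> List[float]:
    """Return one weighted similarity per (historical text, authentication level)."""
    first = vectorize(candidate)  # first vector: content to be distributed
    results = []
    for text, level in history:
        second = vectorize(text)  # second vector: one historical content
        # Convert distance to similarity and clamp away floating-point drift.
        similarity = min(1.0, max(0.0, 1.0 - cosine_distance(first, second)))
        weight = LEVEL_WEIGHTS.get(level, DEFAULT_WEIGHT)
        results.append(weight * similarity)
    return results


if __name__ == "__main__":
    history = [("a b c", 2), ("x y z", 2), ("a b c", 0)]
    print(weighted_similarities("a b c", history))
```

Under these assumptions, a match against a higher-level reference account counts more heavily than the same match against a lower-level one, which is one plausible reading of weighting each historical content by account authentication level.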
CN202010478221.5A 2020-05-29 2020-05-29 Content distribution method, content distribution device, electronic equipment and storage medium Pending CN111639291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010478221.5A CN111639291A (en) 2020-05-29 2020-05-29 Content distribution method, content distribution device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010478221.5A CN111639291A (en) 2020-05-29 2020-05-29 Content distribution method, content distribution device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111639291A true CN111639291A (en) 2020-09-08

Family

ID=72332247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010478221.5A Pending CN111639291A (en) 2020-05-29 2020-05-29 Content distribution method, content distribution device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111639291A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112104642A (en) * 2020-09-11 2020-12-18 腾讯科技(深圳)有限公司 Abnormal account number determination method and related device
CN112153426A (en) * 2020-09-21 2020-12-29 腾讯科技(深圳)有限公司 Content account management method and device, computer equipment and storage medium
CN112989167A (en) * 2021-04-15 2021-06-18 腾讯科技(深圳)有限公司 Method, device and equipment for identifying transport account and computer readable storage medium
CN113010644A (en) * 2021-03-23 2021-06-22 腾讯科技(深圳)有限公司 Method and device for identifying media information, storage medium and electronic equipment
CN113360657A (en) * 2021-06-30 2021-09-07 安徽商信政通信息技术股份有限公司 Intelligent document distribution and handling method and device and computer equipment
CN114124490A (en) * 2021-11-11 2022-03-01 北京搜房科技发展有限公司 Method and device for releasing new media content, storage medium and electronic equipment
CN115730111A (en) * 2021-09-01 2023-03-03 腾讯科技(深圳)有限公司 Content distribution method, device, equipment and computer readable storage medium
CN117891929A (en) * 2024-03-18 2024-04-16 南京华飞数据技术有限公司 Knowledge graph intelligent question-answer information identification method of improved deep learning algorithm

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112104642A (en) * 2020-09-11 2020-12-18 腾讯科技(深圳)有限公司 Abnormal account number determination method and related device
CN112104642B (en) * 2020-09-11 2021-12-28 腾讯科技(深圳)有限公司 Abnormal account number determination method and related device
CN112153426B (en) * 2020-09-21 2023-08-29 腾讯科技(深圳)有限公司 Content account management method and device, computer equipment and storage medium
CN112153426A (en) * 2020-09-21 2020-12-29 腾讯科技(深圳)有限公司 Content account management method and device, computer equipment and storage medium
CN113010644A (en) * 2021-03-23 2021-06-22 腾讯科技(深圳)有限公司 Method and device for identifying media information, storage medium and electronic equipment
CN112989167A (en) * 2021-04-15 2021-06-18 腾讯科技(深圳)有限公司 Method, device and equipment for identifying transport account and computer readable storage medium
CN113360657B (en) * 2021-06-30 2023-10-24 安徽商信政通信息技术股份有限公司 Intelligent document distribution handling method and device and computer equipment
CN113360657A (en) * 2021-06-30 2021-09-07 安徽商信政通信息技术股份有限公司 Intelligent document distribution and handling method and device and computer equipment
CN115730111A (en) * 2021-09-01 2023-03-03 腾讯科技(深圳)有限公司 Content distribution method, device, equipment and computer readable storage medium
CN115730111B (en) * 2021-09-01 2024-02-06 腾讯科技(深圳)有限公司 Content distribution method, apparatus, device and computer readable storage medium
CN114124490A (en) * 2021-11-11 2022-03-01 北京搜房科技发展有限公司 Method and device for releasing new media content, storage medium and electronic equipment
CN114124490B (en) * 2021-11-11 2023-11-24 北京搜房科技发展有限公司 Method and device for publishing new media content, storage medium and electronic equipment
CN117891929A (en) * 2024-03-18 2024-04-16 南京华飞数据技术有限公司 Knowledge graph intelligent question-answer information identification method of improved deep learning algorithm
CN117891929B (en) * 2024-03-18 2024-05-17 南京华飞数据技术有限公司 Knowledge graph intelligent question-answer information identification method of improved deep learning algorithm

Similar Documents

Publication Publication Date Title
CN111639291A (en) Content distribution method, content distribution device, electronic equipment and storage medium
CN111324774B (en) Video duplicate removal method and device
Abdullah et al. Fake news classification bimodal using convolutional neural network and long short-term memory
CN111885399B (en) Content distribution method, device, electronic equipment and storage medium
CN110598070B (en) Application type identification method and device, server and storage medium
CN113011889B (en) Account anomaly identification method, system, device, equipment and medium
KR102144126B1 (en) Apparatus and method for providing information for enterprise
CN112905868A (en) Event extraction method, device, equipment and storage medium
US11886556B2 (en) Systems and methods for providing user validation
CN113111369B (en) Data protection method and system in data annotation
CN112165639B (en) Content distribution method, device, electronic equipment and storage medium
CN111723256A (en) Government affair user portrait construction method and system based on information resource library
CA3167569A1 (en) Systems and methods for determining entity attribute representations
CN112231563A (en) Content recommendation method and device and storage medium
CN110826315B (en) Method for identifying timeliness of short text by using neural network system
CN113011126B (en) Text processing method, text processing device, electronic equipment and computer readable storage medium
CN113986660A (en) Matching method, device, equipment and storage medium of system adjustment strategy
CN112579771B (en) Content title detection method and device
CN113609866A (en) Text marking method, device, equipment and storage medium
CN115131052A (en) Data processing method, computer equipment and storage medium
US11163761B2 (en) Vector embedding models for relational tables with null or equivalent values
CN112989167B (en) Method, device and equipment for identifying transport account and computer readable storage medium
CN116522131A (en) Object representation method, device, electronic equipment and computer readable storage medium
Juan Resource cache sharing system of education information center network based on internet of things
CN112507912A (en) Method and device for identifying illegal picture

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40028899; Country of ref document: HK)
SE01 Entry into force of request for substantive examination