CN115221307A - Article identification method and device, computer equipment and storage medium - Google Patents

Publication number
CN115221307A
Authority
CN
China
Prior art keywords
article
information
target
sample
evaluation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110406839.5A
Other languages
Chinese (zh)
Inventor
曾瞾
马连洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110406839.5A
Publication of CN115221307A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an article identification method, an article identification apparatus, a computer device, and a storage medium, relating to the technical field of artificial intelligence. The method acquires article information of a target article; inputs the article information into an article evaluation model to predict first information and second information of the target article, where the first information represents the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk (that is, the overall pool of articles on the platform), and the second information represents the probability that the single-access reading duration of the target article is higher than the average single-access reading duration of the large disk; and generates a recognition result of the target article based on the first information and the second information. Because the recognition result is generated without relying on the subjective experience of workers for screening, the influence of that experience is avoided, the gap between the article sample labels and the interests of actual users is narrowed, and the accuracy of identifying articles that attract users is improved.

Description

Article identification method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technology, and more particularly, to an article identification method, apparatus, computer device, and storage medium.
Background
In an information flow scenario, a large number of articles are added to the library every day, and their quality varies. In an information flow recommendation scenario, it is desirable to screen out newly added high-quality articles and give them as much exposure as possible, thereby improving the exposure click rate (CTR) and the efficiency of the recommendation system.
In the prior art, articles that attract users are screened out by relying on the subjective experience of workers. That subjective experience often diverges from what actual users are really interested in, so the accuracy of identifying articles that attract users is usually low.
Disclosure of Invention
In view of the above, the present invention provides an article identification method, an article identification apparatus, a computer device, and a storage medium, so as to improve the accuracy of identifying articles that attract users. The technical solution is as follows:
An article identification method, comprising:
acquiring article information of a target article, wherein the article information comprises any one or more of a tag representing the article content of the target article and a title of the target article;
predicting, through an article evaluation model and based on the article information, first information and second information of the target article, wherein the first information represents the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk, and the second information represents the probability that the single-access reading duration of the target article is higher than the average single-access reading duration of the large disk;
generating a recognition result of the target article based on the first information and the second information of the target article, wherein the recognition result indicates that the target article attracts users or that the target article does not attract users;
wherein the article evaluation model is obtained by: determining, based on an article sample, the first information and the second information predicted by an article evaluation model to be trained from the article information of the article sample; and training the article evaluation model to be trained with the training targets that the predicted first information approaches the target first information carried by the article sample and the predicted second information approaches the target second information carried by the article sample.
An article identification apparatus, comprising:
an article information acquisition unit, configured to acquire article information of a target article, wherein the article information comprises any one or more of a tag representing the article content of the target article and a title of the target article;
an information prediction unit, configured to predict, through an article evaluation model and based on the article information, first information and second information of the target article, wherein the first information represents the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk, and the second information represents the probability that the single-access reading duration of the target article is higher than the average single-access reading duration of the large disk;
a recognition result generating unit, configured to generate a recognition result of the target article based on the first information and the second information of the target article, wherein the recognition result indicates that the target article attracts users or that the target article does not attract users;
wherein the article evaluation model is obtained by: determining, based on an article sample, the first information and the second information predicted by an article evaluation model to be trained from the article information of the article sample; and training the article evaluation model to be trained with the training targets that the predicted first information approaches the target first information carried by the article sample and the predicted second information approaches the target second information carried by the article sample.
A computer device, comprising a processor and a memory connected through a communication bus, wherein the processor is configured to call and execute a program stored in the memory, and the memory is configured to store the program, the program being used to implement the article identification method.
A computer-readable storage medium having stored thereon a computer program which, when loaded and executed by a processor, carries out the steps of the article identification method.
The application provides an article identification method, an article identification apparatus, a computer device, and a storage medium. Article information of a target article is acquired, the article information comprising any one or more of a tag representing the article content of the target article and a title of the target article; the article information is input into an article evaluation model to predict first information and second information of the target article, where the first information represents the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk, and the second information represents the probability that the single-access reading duration of the target article is higher than the average single-access reading duration of the large disk; and a recognition result of the target article is generated based on the first information and the second information. Because the recognition result is generated without relying on the subjective experience of workers for screening, the influence of that experience is avoided, the gap between the article sample labels and the interests of actual users is narrowed, and the accuracy of identifying articles that attract users is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of an article identification method provided in an embodiment of the present application;
Fig. 2 is a flowchart of an article evaluation model generation method provided in an embodiment of the present application;
Fig. 3 is a flowchart of a method for determining the target first information carried by an article sample, provided in an embodiment of the present application;
Fig. 4 is a flowchart of a method for determining the target second information carried by an article sample, provided in an embodiment of the present application;
Fig. 5 is a schematic diagram of an article recognition model generation method provided in an embodiment of the present application;
Fig. 6 is a flowchart of an article evaluation model optimization method provided in an embodiment of the present application;
Fig. 7 is a schematic diagram of an article evaluation model optimization method provided in an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an article identification apparatus provided in an embodiment of the present application;
Fig. 9 is a block diagram of the hardware structure of a computer device to which the article identification method provided in an embodiment of the present application is applied.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In an information flow scenario, a large number of articles are added to the library every day, and their quality varies. In an information flow recommendation scenario, it is desirable to screen out newly added high-quality articles and give them as much exposure as possible, thereby improving the exposure click rate (CTR) and the efficiency of the recommendation system.
In the prior art, articles that attract users are screened out by relying on the subjective experience of workers. That subjective experience often diverges from what actual users are really interested in, so the accuracy of identifying articles that attract users is usually low.
In order to improve the accuracy of identifying articles that attract users, the inventor first proposed a high-quality article identification scheme based on article titles: standards for attractive titles are formulated manually, and, guided by this manual prior knowledge, samples are labeled and a classification model is constructed.
However, the inventor found that this scheme has difficulty capturing what makes a title interesting to users. Titles that attract users are hard to screen out with standards summarized from manual experience, and such standards often diverge from what users are really interested in. Moreover, in actual content production, users' interests shift with current hot events, and the title styles that users like change as title writing and expression trends develop. The shift in user interests and the evolution of title styles easily degrade the classification model, so the current method of manually screening titles can hardly improve business-side metrics such as click rate and viewing duration.
Therefore, the inventor of the present application further proposes an article identification method for screening articles with good titles from a massive article library. The key technical points of the method mainly include: 1) an article evaluation model fused with high-conversion keywords is provided, so that multiple posterior indicators can be combined and multiple posterior constraints can be satisfied; 2) a complete pipeline is provided, from training-sample construction through model training to automatic update and iteration of the model.
The article identification method mainly involves the following two points: 1) a method for automatically screening attractive articles based on an article evaluation model is provided; 2) an automatic update and iteration process is designed for the article evaluation model, so that the model does not degrade over time and can continue to learn the article content that users like.
A particular application in the information flow platform is serving push scenarios for high-quality articles. For example, in information flow scenarios such as QQ Viewpoint, Daily Quick Report, and browsers, some high-quality articles are pushed to specific users every day. These articles are usually timely, newly published articles, and screening articles with long reading duration and high click conversion out of a large number of articles is a breakthrough point for improving recommendation efficiency. In the present application, a multi-target model for screening high-quality articles is built and an automatic update process is added for the model, so that the model can continuously and efficiently screen out high-quality titles.
In the embodiments of the present application, a high-quality article can be regarded as an article that attracts users, and an article that attracts users can be regarded as an article with long reading duration and high click conversion. The embodiments of the present application provide an article identification method that relies on a pre-trained article evaluation model to identify whether an article is one that attracts users.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
Fig. 1 is a flowchart of an article identification method according to an embodiment of the present application.
As shown in fig. 1, the method includes:
S101, acquiring article information of a target article, wherein the article information comprises any one or more of a tag representing the article content of the target article and a title of the target article;
In the embodiments of the present application, for ease of distinction, an article whose quality is to be determined is referred to as the target article. Accordingly, when determining whether the target article is a high-quality article (that is, whether it is an article that attracts users), the article information of the target article needs to be acquired, where the article information includes any one or more of a tag representing the article content of the target article and a title of the target article.
Taking the case where the article information includes a tag representing the article content of the target article as an example, acquiring the article information of the target article includes: determining a preset tag whitelist, and determining, from the tag whitelist, the tags that match the article content of the target article.
In the embodiments of the present application, the tag whitelist includes at least one tag. For each tag, it is determined whether the tag matches the article content of the target article; if it does, the tag is determined to be a tag representing the article content of the target article.
Illustratively, the at least one tag relates to entities in the article content; for example, the at least one tag includes a general name, a guest, an author, and so on. These are merely preferred examples of the at least one tag provided in the embodiments of the present application; the specific content of the at least one tag may be set by a person skilled in the art as needed and is not limited herein.
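The whitelist lookup described above can be sketched as follows. The text does not say how a tag is judged to match the article content, so the case-insensitive substring test used here, together with the function name `match_tags` and the sample strings, is an assumption made purely for illustration.

```python
from typing import Iterable, List


def match_tags(article_content: str, tag_whitelist: Iterable[str]) -> List[str]:
    """Return the whitelist tags that match the article content.

    Assumption: a tag matches when it occurs in the content as a
    case-insensitive substring. The source does not define the criterion.
    """
    content = article_content.lower()
    return [tag for tag in tag_whitelist if tag.lower() in content]


# Hypothetical whitelist and article content.
tags = match_tags("Interview with the author of a new novel", ["author", "guest"])
print(tags)
```

A production system would more likely match on recognized entities than on raw substrings; the sketch only shows the per-tag check against a preset whitelist.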
S102, predicting, through an article evaluation model and based on the article information, first information and second information of the target article, wherein the first information represents the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk (that is, the overall pool of articles on the platform), and the second information represents the probability that the single-access reading duration of the target article is higher than the average single-access reading duration of the large disk;
In the embodiments of the present application, the article evaluation model is obtained by: determining, based on an article sample, the first information and the second information predicted by an article evaluation model to be trained from the article information of the article sample; and training the article evaluation model to be trained with the training targets that the predicted first information approaches the target first information carried by the article sample and the predicted second information approaches the target second information carried by the article sample.
For the generation process of the article evaluation model, please refer to the detailed description of fig. 2, which is not repeated herein.
S103, generating an identification result of the target article based on the first information and the second information of the target article, wherein the identification result represents that the target article attracts the user or the target article does not attract the user.
In the embodiments of the present application, the recognition result of the target article is generated based on the first information and the second information of the target article. The recognition result indicates either that the target article attracts users, meaning the target article is an article that attracts users, or that the target article does not attract users, meaning the target article is not such an article.
If the first information of the target article exceeds a preset first threshold value and the second information of the target article exceeds a preset second threshold value, the identification result of the target article represents that the target article attracts a user; and if the first information of the target article does not exceed a preset first threshold value or the second information of the target article does not exceed a preset second threshold value, the identification result of the target article represents that the target article does not attract the user.
In one implementation manner, the manner of generating the recognition result of the target article based on the first information and the second information of the target article may be: judging whether the first information exceeds a preset first threshold value or not; if the first information exceeds a first threshold value, judging whether the second information exceeds a preset second threshold value; if the second information exceeds a second threshold value, determining that the identification result of the target article represents that the target article attracts the user; and if the first information does not exceed the first threshold value or the second information does not exceed the second threshold value, determining that the identification result of the target article represents that the target article does not attract the user.
For example, the first threshold may be 0.5, and the second threshold may be 0.5, which is just one example of the first threshold and the second threshold provided in the embodiments of the present application, and a person skilled in the art may set the first threshold and the second threshold according to his own needs, which is not limited herein.
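The two-threshold decision of step S103 can be sketched as below. The function and parameter names are hypothetical; the default thresholds of 0.5 are the example values given in the text, and "exceeds" is read as strictly greater than.

```python
def identify_article(first_info: float, second_info: float,
                     first_threshold: float = 0.5,
                     second_threshold: float = 0.5) -> bool:
    """Return True if the recognition result is "attracts users".

    first_info and second_info are the probabilities predicted by the
    article evaluation model; both must exceed their thresholds.
    """
    if first_info <= first_threshold:
        # The first information does not exceed the first threshold.
        return False
    # The first threshold is passed, so the second information decides.
    return second_info > second_threshold


print(identify_article(0.7, 0.6))
print(identify_article(0.7, 0.5))
```

Checking the first information before the second mirrors the order described in the implementation above; because both conditions must hold, the result does not depend on that order.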
Fig. 2 is a flowchart of a method for generating an article evaluation model according to an embodiment of the present application. As shown in fig. 2, the method includes:
S201, acquiring article data of the information flow platform in a first historical time period, wherein the article data indicates at least one article pushed to users by the information flow platform in the first historical time period, and each of the at least one article is an article sample;
S202, determining the article information, the target first information, and the target second information of the article sample;
S203, inputting the article information of the article sample into the article evaluation model to be trained, determining the first information and the second information of the article sample predicted by the article evaluation model to be trained from the article information of the article sample, and training the article evaluation model to be trained, with the training targets that the predicted first information of the article sample approaches the target first information of the article sample and the predicted second information of the article sample approaches the target second information of the article sample, to obtain the article evaluation model.
For example, the process of training the article evaluation model to be trained to generate the article evaluation model may be: acquiring an article sample, and determining the article information, the target first information, and the target second information of the article sample; inputting the article information of the article sample into the article evaluation model to be trained; determining the first information and the second information predicted by the article evaluation model to be trained from the article information of the article sample; and training the article evaluation model to be trained, with the training targets that the predicted first information approaches the target first information of the article sample and the predicted second information approaches the target second information of the article sample, to obtain the article evaluation model.
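A minimal sketch of two-target training follows. The source does not specify the model architecture, the loss, or how article information is encoded, so everything concrete here is an assumption: two independent logistic heads stand in for the article evaluation model, binary cross-entropy stands in for "approaching the target", and the numeric feature vectors are a hypothetical encoding of the article information.

```python
import math
from typing import List, Sequence, Tuple

# (features, target first information, target second information)
Sample = Tuple[Sequence[float], int, int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[List[float]], features: Sequence[float]) -> Tuple[float, float]:
    """Return (first information, second information) as probabilities."""
    scores = [sum(w * x for w, x in zip(head, features)) for head in weights]
    return sigmoid(scores[0]), sigmoid(scores[1])


def train(samples: Sequence[Sample], n_features: int,
          epochs: int = 200, lr: float = 0.5) -> List[List[float]]:
    """Fit two logistic heads by stochastic gradient descent.

    The loss is the sum of the binary cross-entropies of the two heads,
    so each prediction is pulled toward its own target.
    """
    weights = [[0.0] * n_features for _ in range(2)]
    for _ in range(epochs):
        for features, target_first, target_second in samples:
            predicted = predict(weights, features)
            targets = (target_first, target_second)
            for head, (p, t) in enumerate(zip(predicted, targets)):
                # Gradient of cross-entropy w.r.t. the head's score is (p - t).
                for j, x in enumerate(features):
                    weights[head][j] -= lr * (p - t) * x
    return weights


# Hypothetical samples: [bias, feature_a, feature_b] with their two targets.
samples = [([1.0, 1.0, 0.0], 1, 0), ([1.0, 0.0, 1.0], 0, 1)]
weights = train(samples, n_features=3)
first, second = predict(weights, [1.0, 1.0, 0.0])
```

After training on these two toy samples, the first head assigns the first sample a probability above 0.5 and the second head assigns it a probability below 0.5, matching its targets of 1 and 0.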
The article samples may be acquired as follows: article data of the information flow platform in a first historical time period is acquired, the article data indicates at least one article pushed to users by the information flow platform in the first historical time period, and each of the at least one article can be regarded as an article sample.
Taking an article sample as an example, a manner of determining the target first information carried by the article sample is shown in Fig. 3. Referring to Fig. 3, the target first information of the article sample may be determined as follows:
S301, determining the exposure click rate of the article sample;
S302, determining the average exposure click rate of the large disk according to the exposure click rates of all articles in the large disk;
Taking one article sample as an example, the target first information of the article sample may be determined as follows: calculate the exposure click rate of the article sample; determine all articles in the large disk and calculate the exposure click rate of each of them; determine the average exposure click rate of the large disk from the exposure click rates of the articles in the large disk; and determine the target first information of the article sample from the exposure click rate of the article sample and the average exposure click rate of the large disk.
For example, the average exposure click rate of the large disk may be determined from the exposure click rates of the articles in the large disk as follows: calculate the sum of the exposure click rates of all articles in the large disk (for ease of distinction, this sum is called the first numerical value), determine the total number of articles in the large disk (for ease of distinction, this total is called the second numerical value), and divide the first numerical value by the second numerical value to obtain the average exposure click rate of all articles in the large disk. The average exposure click rate of all articles in the large disk is regarded as the average exposure click rate of the large disk.
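The averaging just described can be sketched as follows. The text does not define how the exposure click rate of an article is computed, so the standard definition of clicks divided by exposures is assumed here; the function names and the sample statistics are hypothetical.

```python
from typing import Dict, Tuple


def exposure_click_rate(clicks: int, exposures: int) -> float:
    """Assumed definition: clicks divided by exposures (0.0 if never exposed)."""
    return clicks / exposures if exposures else 0.0


def large_disk_average_ctr(stats: Dict[str, Tuple[int, int]]) -> float:
    """Average exposure click rate over all articles in the large disk.

    stats maps an article id to (clicks, exposures).
    """
    if not stats:
        raise ValueError("the large disk contains no articles")
    rates = [exposure_click_rate(c, e) for c, e in stats.values()]
    first_value = sum(rates)    # sum of the per-article exposure click rates
    second_value = len(rates)   # total number of articles in the large disk
    return first_value / second_value


stats = {"a": (10, 100), "b": (30, 100), "c": (20, 100)}
average = large_disk_average_ctr(stats)
```

Note that this is an unweighted mean of per-article rates, as the text describes, rather than total clicks divided by total exposures; the two differ when articles have unequal exposure counts.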
S303, judging whether the exposure click rate of the article sample is not lower than the average exposure click rate of the large disk; if the exposure click rate of the article sample is not lower than the average exposure click rate of the large disk, executing step S304; if the exposure click rate of the article sample is lower than the average exposure click rate of the large disk, executing step S305;
S304, determining the target first information of the article sample to be a first preset value;
S305, determining the target first information of the article sample to be a second preset value.
For example, the first preset value may be 1, the second preset value may be 0, which are merely preferred contents of the first preset value and the second preset value provided in the embodiment of the present application, and specific contents of the first preset value and the second preset value may be set by a person skilled in the art according to his own needs, which is not limited herein.
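Steps S303 to S305 amount to a single comparison, sketched below. The function name is hypothetical; the defaults of 1 and 0 are the preferred preset values given in the text.

```python
def target_first_information(sample_ctr: float, average_ctr: float,
                             first_preset: int = 1, second_preset: int = 0) -> int:
    """Label an article sample by comparing its exposure click rate
    with the average exposure click rate of the large disk."""
    # "Not lower than" the average, so equality maps to the first preset value.
    return first_preset if sample_ctr >= average_ctr else second_preset


print(target_first_information(0.3, 0.2))
print(target_first_information(0.1, 0.2))
```

A sample whose rate exactly equals the average receives the first preset value, since S303 tests "not lower than".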
Taking an article sample as an example, a manner of determining the target second information carried by the article sample is shown in Fig. 4. Referring to Fig. 4, the target second information of the article sample may be determined as follows:
S401, determining the single-access reading duration of the article sample;
S402, determining the average single-access reading duration of the large disk according to the single-access reading durations of all articles in the large disk;
Taking one article sample as an example, the target second information of the article sample may be determined as follows: calculate the single-access reading duration of the article sample; determine all articles in the large disk and calculate the single-access reading duration of each of them; determine the average single-access reading duration of the large disk from the single-access reading durations of the articles in the large disk; and determine the target second information of the article sample from the single-access reading duration of the article sample and the average single-access reading duration of the large disk.
For example, the average single-access reading duration of the large disk may be determined from the single-access reading durations of the articles in the large disk as follows: calculate the sum of the single-access reading durations of all articles in the large disk (for ease of distinction, this sum is called the third numerical value), determine the total number of articles in the large disk (for ease of distinction, this total is called the second numerical value), and divide the third numerical value by the second numerical value to obtain the average single-access reading duration of all articles in the large disk. The average single-access reading duration of all articles in the large disk is regarded as the average single-access reading duration of the large disk.
S403, judging whether the single-access reading duration of the article sample is not lower than the average single-access reading duration of the large disk; if the single-access reading duration of the article sample is not lower than the average single-access reading duration of the large disk, executing step S404; if the single-access reading duration of the article sample is lower than the average single-access reading duration of the large disk, executing step S405;
S404, determining the target second information of the article sample to be a first preset value;
S405, determining the target second information of the article sample to be a second preset value.
For example, the first preset value may be 1 and the second preset value may be 0. These are only preferred values provided in the embodiments of the present application; a person skilled in the art may set the first preset value and the second preset value as needed, which is not limited herein.
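Steps S401 to S405 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `label_read_duration` and the sample durations are hypothetical, and the preset values 1 and 0 follow the example given above.

```python
def label_read_duration(sample_duration, large_disk_durations,
                        first_preset=1, second_preset=0):
    """Return the target second information of one article sample (S401-S405)."""
    if not large_disk_durations:
        raise ValueError("the large disk must contain at least one article")
    # S402: third numerical value (sum) divided by second numerical value (count).
    average = sum(large_disk_durations) / len(large_disk_durations)
    # S403-S405: "not less than the average" maps to the first preset value.
    return first_preset if sample_duration >= average else second_preset


# Hypothetical single-access reading durations in seconds; here the large
# disk is simply the set of article samples itself.
durations = [30.0, 60.0, 90.0]
labels = [label_read_duration(d, durations) for d in durations]
```

A sample whose duration equals the average receives the first preset value, because the comparison in S403 is "not less than".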
In the embodiment of the application, when the article evaluation model is trained, each of the at least one article pushed to the user by the information flow platform in a first historical time period is used as an article sample, and each article sample carries two pieces of information: the target first information and the target second information.
The target first information of an article sample is determined from the relationship between the exposure click rate of the article sample and the average exposure click rate of the large disk. An article evaluation model trained on the target first information can therefore predict, for a target article, the relationship between its exposure click rate and the average exposure click rate of the large disk; specifically, it predicts the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk. Likewise, the target second information of an article sample is determined from the relationship between the single-access reading duration of the article sample and the average single-access reading duration of the large disk. An article evaluation model trained on the target second information can therefore predict the probability that the single-access reading duration of the target article is longer than the average single-access reading duration of the large disk.
One implementation may consider at least one article pushed by an information flow platform to a user within a first historical period of time as a large disk.
Illustratively, when the article evaluation model is trained, at least one article pushed to the user by the information flow platform in the first historical time period is determined, and each of the at least one article is regarded as an article sample. The target first information of an article sample is determined according to the relationship between the exposure click rate of the article sample and the average exposure click rate of the at least one article; the target second information of the article sample is determined according to the relationship between the single-access reading duration of the article sample and the average single-access reading duration of the at least one article; and the article evaluation model to be trained is trained on the article samples carrying the target first information and the target second information to obtain the article evaluation model. In this way, when the article evaluation model is used for prediction, the article information of the target article is input into the article evaluation model to predict first information and second information of the target article, where the first information represents the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk, and the second information represents the probability that the single-access reading duration of the target article is higher than the average single-access reading duration of the large disk.
In another implementation, the articles pushed to the user by the information flow platform in a fourth historical time period can be regarded as the large disk, where the end time of the first historical time period is earlier than the end time of the fourth historical time period, and the end time of the fourth historical time period is earlier than the time at which training of the article evaluation model on the article samples starts.
Illustratively, when the article evaluation model is trained, at least one article pushed to the user by the information flow platform in the first historical time period is determined, and each of the at least one article is regarded as an article sample; the articles pushed to the user by the information flow platform in the fourth historical time period are regarded as the large disk. The end time of the fourth historical time period is earlier than the time at which training of the article evaluation model on the article samples starts, and later than the end time of the first historical time period. The articles in the large disk are therefore closer in time to the training of the article evaluation model and better reflect the latest article preferences of users, which further improves the accuracy of the article identification result of the article evaluation model.
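The time-window constraint on the large disk described above can be sketched as follows. This is only an illustration: the function names, the `(article_id, pushed_at)` record layout and the dates are hypothetical and do not come from the application.

```python
from datetime import datetime


def periods_are_valid(first_end, fourth_end, training_start):
    """Check: end of first period < end of fourth period < start of training."""
    return first_end < fourth_end < training_start


def select_large_disk(pushed, period_start, period_end):
    """Return the ids of articles pushed within [period_start, period_end]."""
    return [article_id for article_id, pushed_at in pushed
            if period_start <= pushed_at <= period_end]


# Hypothetical push log: (article id, push time).
pushed = [
    ("a", datetime(2021, 3, 1)),
    ("b", datetime(2021, 3, 20)),
    ("c", datetime(2021, 4, 2)),
]
ok = periods_are_valid(first_end=datetime(2021, 3, 15),
                       fourth_end=datetime(2021, 3, 31),
                       training_start=datetime(2021, 4, 1))
large_disk = select_large_disk(pushed, datetime(2021, 3, 10), datetime(2021, 3, 31))
```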
The article evaluation model generation method provided in the embodiment of the present application is described in detail below with reference to a schematic diagram of the article evaluation model generation method shown in fig. 5.
For example, the application uses a multitask model to classify whether articles are attractive. Specifically, as shown in fig. 5, the titles of the article samples and the tags corresponding to the articles are encoded with word2vec and input into an article evaluation model to be trained that is built on a Recurrent Neural Network (RNN) with an attention mechanism; multitask modeling is performed, and the article samples are labeled along two posterior data dimensions, label_ctr and label_read_dur (these two kinds of posterior data can be regarded as user reading behavior data).
In fig. 5, label_ctr is a sample label constructed from the exposure click rate (CTR). It is constructed as follows: the average CTR of all article samples is calculated; if the CTR of an article sample is not lower than the average CTR, the label_ctr of the article sample is set to 1; if the CTR of the article sample is lower than the average CTR, the label_ctr of the article sample is set to 0. The label_ctr of an article sample can be regarded as the target first information of the article sample.
In fig. 5, label_read_dur is a sample label constructed from the single-access reading duration. It is constructed as follows: the single-access reading duration of each article sample and the average single-access reading duration of all article samples are calculated; if the single-access reading duration of an article sample is not lower than the average, the label_read_dur of the article sample is 1; if it is lower than the average, the label_read_dur of the article sample is 0. The label_read_dur of an article sample can be regarded as the target second information of the article sample.
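The idea of one shared representation feeding two prediction outputs can be sketched in plain Python. This is a simplified illustration and not the architecture of fig. 5: the word2vec encoding and the RNN are omitted (the hidden states are given directly), dot-product attention against a single query vector is an assumption, and all names and weights are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(hidden_states, query):
    """Weighted sum of hidden states; weights are softmax(dot(h, query))."""
    weights = softmax([dot(h, query) for h in hidden_states])
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states))
            for i in range(dim)]


def two_head_forward(hidden_states, query, w_ctr, w_read_dur):
    """Shared pooled vector, then one sigmoid output per task."""
    shared = attention_pool(hidden_states, query)
    return sigmoid(dot(shared, w_ctr)), sigmoid(dot(shared, w_read_dur))


# Hypothetical RNN hidden states for a three-token title, and made-up weights.
hidden_states = [[0.1, 0.4], [0.3, -0.2], [0.5, 0.0]]
p_ctr, p_read_dur = two_head_forward(
    hidden_states, query=[1.0, 0.0], w_ctr=[0.8, -0.5], w_read_dur=[-0.3, 0.9])
```

Both outputs lie strictly between 0 and 1, so each can be read as the probability for its task.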
Taking an article sample as an example, the single-access reading duration of the article sample can be obtained by dividing the total reading duration of the article sample by the total number of accesses of the article sample.
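The per-sample quantities behind the two labels can be computed as in the sketch below. The function names and numbers are hypothetical; the exposure click rate is assumed here to be clicks divided by exposures, and returning 0.0 for an article with no exposures or no accesses is a choice made for this sketch only.

```python
def exposure_click_rate(clicks, exposures):
    """Assumed definition: clicks divided by exposures (0.0 if never exposed)."""
    return clicks / exposures if exposures else 0.0


def single_access_read_duration(total_read_seconds, total_accesses):
    """Total reading duration divided by total number of accesses."""
    return total_read_seconds / total_accesses if total_accesses else 0.0


# Hypothetical posterior data for one article sample.
ctr = exposure_click_rate(clicks=50, exposures=1000)
read_dur = single_access_read_duration(total_read_seconds=600.0, total_accesses=20)
```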
In the embodiment of the application, each article sample corresponds to two labels, namely the target first information and the target second information. A high-quality article screening model (which may also be referred to as the article evaluation model) is constructed using a Binary Cross Entropy (BCE) loss function, with the goal of screening high-quality articles along both label directions.
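A two-label BCE objective can be sketched as follows. The text above names BCE but does not state how the two task losses are combined; summing them with equal weight is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def bce(prob, label, eps=1e-7):
    """Binary cross entropy for one prediction; prob is clipped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(p_ctr, p_read_dur, label_ctr, label_read_dur):
    # Assumption: the two task losses are added with equal weight.
    return bce(p_ctr, label_ctr) + bce(p_read_dur, label_read_dur)


# Predictions close to the labels give a smaller loss than predictions far away.
good = multitask_loss(0.9, 0.8, label_ctr=1, label_read_dur=1)
bad = multitask_loss(0.1, 0.2, label_ctr=1, label_read_dur=1)
```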
Correspondingly, in the model prediction process, the scores of the two labels, label_ctr and label_read_dur, need to be combined: when the label_ctr score of the target article is higher than the first threshold and the label_read_dur score of the target article is higher than the second threshold, the target article can be considered a high-quality article.
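The combination rule can be written as a short predicate. The function name is hypothetical, and the default thresholds of 0.5 are placeholders chosen for this sketch; the application does not give threshold values.

```python
def is_high_quality(score_ctr, score_read_dur,
                    first_threshold=0.5, second_threshold=0.5):
    """True only if both scores are strictly higher than their thresholds."""
    return score_ctr > first_threshold and score_read_dur > second_threshold


both_high = is_high_quality(0.7, 0.6)
one_low = is_high_quality(0.7, 0.4)
```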
Further, the article identification method provided by the embodiment of the application can optimize the article evaluation model, and the optimization mode is shown in fig. 6.
As shown in fig. 6, the method includes:
S601, training an article evaluation model to be trained by using article data of the information flow platform in a second historical time period to obtain a target article evaluation model, wherein the second historical time period is later than the first historical time period;
S602, predicting the recognition result of each article pushed to the user by the information flow platform in a third history time period by using the article evaluation model to obtain a first recognition result set; the third historical time period is later than the second historical time period;
Illustratively, the information flow platform may be a content push platform. The above is only a preferred form of the information flow platform provided in the embodiment of the present application; the specific form of the information flow platform may be set by a person skilled in the art as needed, which is not limited herein.
In the embodiment of the application, the article evaluation model is used to predict the recognition result of each article pushed to the user by the information flow platform in the third history time period, and the obtained recognition results form the first recognition result set.
S603, predicting the recognition result of each article pushed to the user by the information flow platform in a third history time period by using the target article evaluation model to obtain a second recognition result set;
In the embodiment of the application, the target article evaluation model is used to predict the recognition result of each article pushed to the user by the information flow platform in the third history time period, and the obtained recognition results form the second recognition result set.
S604, determining a standard recognition result of each article pushed to the user by the information flow platform in a third history time period to obtain a third recognition result set;
Illustratively, for an article pushed to the user by the information flow platform in the third history time period, the standard recognition result of the article can be determined according to the user's feedback information on the article. For example, if the user's feedback information indicates dislike, it can be determined that the standard recognition result of the article represents that the article does not attract the user; if the user's feedback information indicates like, it can be determined that the standard recognition result of the article represents that the article attracts the user.
The above is only a preferred way for determining the standard recognition result of the article provided in the embodiment of the present application, and regarding the specific way for determining the standard recognition result of the article, those skilled in the art can set the method according to their own needs, which is not limited herein.
The standard recognition results of the articles pushed to the user by the information flow platform in the third history time period form the third recognition result set.
S605, if the second recognition result set is closer to the third recognition result set than the first recognition result set is, updating the article evaluation model to the target article evaluation model.
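The comparison in S605 can be sketched as follows. The application does not define how "closer" is measured; this sketch assumes it means a higher fraction of articles whose predicted result equals the standard result, and the function names and result encoding (1 for attracts, 0 for does not attract) are hypothetical.

```python
def agreement(predicted, standard):
    """Fraction of articles whose predicted result equals the standard result."""
    if not standard or len(predicted) != len(standard):
        raise ValueError("result sets must be non-empty and of equal length")
    return sum(p == s for p, s in zip(predicted, standard)) / len(standard)


def choose_model(first_set, second_set, third_set):
    """Return "target" if the second set is closer to the third set, else "current"."""
    if agreement(second_set, third_set) > agreement(first_set, third_set):
        return "target"
    return "current"


first_set = [1, 0, 0, 1]   # current article evaluation model
second_set = [1, 1, 0, 1]  # target article evaluation model
third_set = [1, 1, 0, 1]   # standard results from user feedback
decision = choose_model(first_set, second_set, third_set)
```

On a tie the current model is kept, since S605 only triggers an update when the second set is closer.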
Because the third historical time period is later than the second historical time period, which in turn is later than the first historical time period, the article samples used to train the target article evaluation model are more recent than those used to train the article evaluation model currently in use. By using the article evaluation model and the target article evaluation model separately to predict the articles pushed to the user by the information flow platform in the third history time period, it can be determined which of the two models better fits the current situation; when the target article evaluation model fits better, the article evaluation model currently in use is updated to the target article evaluation model. This addresses the loss of accuracy in article recognition results caused by the passage of time, changes in hot events, and changes in the article content that users like.
The automatic update process of the article evaluation model is as follows. As time passes, hot events change, and so does the article content that users like. To prevent the model from degrading as new things emerge, the model is updated and iterated through a hot update process so that its performance stays in a good state. The specific flow of the automatic model update is shown in fig. 7. When a new model is trained, one month of data is taken, and the titles, tags, and posterior data such as click rate and viewing duration are extracted. The data of the first 28 days of the month is selected as training data and labeled in the manner shown in fig. 5, and the article evaluation model to be trained is trained on this data to generate a new article evaluation model. The article evaluation model currently used online is regarded as the old article evaluation model. The data of roughly the last two days of the month is reserved as a test set; the old and new article evaluation models are both tested on it, and indexes such as AUC, recall, and accuracy are compared. If the indexes of the new article evaluation model exceed those of the old article evaluation model, the old article evaluation model online is automatically replaced with the new article evaluation model.
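The split and the replacement decision can be sketched as follows. This illustration compares AUC only, whereas the text also mentions recall and accuracy; the pairwise AUC computation, the function names, and the scores are assumptions of this sketch rather than details of the application.

```python
def split_month(records, train_days=28):
    """records: list of (day_of_month, item). First 28 days train, the rest test."""
    train = [item for day, item in records if day <= train_days]
    test = [item for day, item in records if day > train_days]
    return train, test


def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def should_replace(old_scores, new_scores, labels):
    """Replace the online model only if the new model has a higher test AUC."""
    return auc(new_scores, labels) > auc(old_scores, labels)


records = [(day, f"article-{day}") for day in range(1, 31)]
train, test = split_month(records)

# Hypothetical test-set labels and scores from the old and new models.
labels = [1, 0, 1, 0]
old_scores = [0.6, 0.7, 0.4, 0.3]
new_scores = [0.9, 0.2, 0.8, 0.3]
replace = should_replace(old_scores, new_scores, labels)
```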
Compared with traditional methods that rely on the prior knowledge of product operators, the article identification method provided by the embodiment of the application constructs its data in a way that better matches the content users like, and reduces the dependence on manually labeled data sets to a certain extent. Moreover, the automatic update and iteration process allows the model to be continuously updated and optimized, so that it keeps learning the hot content and classic content preferred by platform users over the long term and model degradation is prevented.
Experiments show that, after the article identification method provided by the embodiment of the application went online in information flow scenarios such as the viewpoint and viewpoint daily newspaper scenarios, applying it to the high-quality content pool in the viewpoint scenario improved the click conversion rate and the user-count conversion rate of the main feeds. In the viewpoint daily newspaper push scenario, the first conversion rate and the user-count conversion rate were greatly improved.
Fig. 8 is a schematic structural diagram of an article recognition apparatus according to an embodiment of the present application.
As shown in fig. 8, the apparatus includes:
an article information obtaining unit 801, configured to obtain article information of a target article, where the article information includes any one or more of a tag representing article content of the target article and a title of the target article;
the information prediction unit 802 is configured to obtain first information and second information of the target article through an article evaluation model based on article information prediction, where the first information represents a probability that an exposure click rate of the target article is higher than an average value of the exposure click rates of the large disks, and the second information represents a probability that a single-access reading time length of the target article is higher than an average value of single-access reading time lengths of the large disks;
a recognition result generating unit 803, configured to generate a recognition result of the target article based on the first information and the second information of the target article, where the recognition result represents that the target article attracts the user or that the target article does not attract the user;
the article evaluation model is obtained by training an article evaluation model to be trained on article samples: first information and second information are predicted by the article evaluation model to be trained according to the article information of an article sample, and the training targets are that the predicted first information approaches the target first information carried by the article sample and the predicted second information approaches the target second information carried by the article sample.
In this embodiment of the present application, preferably, the article information of the target article includes a tag representing article content of the target article; correspondingly, the article information acquiring unit is used for determining a preset tag white list and determining a tag matched with the article content of the target article from the tag white list.
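The whitelist step can be sketched as a simple filter. How candidate tags are derived from the article content is not described here, so they are taken as given; the function name and the tag values are hypothetical.

```python
def match_tags(candidate_tags, tag_whitelist):
    """Keep candidate tags that appear in the whitelist, in order, without repeats."""
    allowed = set(tag_whitelist)
    seen = set()
    matched = []
    for tag in candidate_tags:
        if tag in allowed and tag not in seen:
            matched.append(tag)
            seen.add(tag)
    return matched


# Hypothetical preset whitelist and candidate tags of a target article.
whitelist = ["finance", "sports", "technology"]
tags = match_tags(["sports", "clickbait", "finance", "sports"], whitelist)
```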
In this embodiment of the application, preferably, the identification result generating unit includes:
the first generating unit is used for determining that the identification result of the target article represents the target article to attract the user if the first information of the target article exceeds a preset first threshold value and the second information of the target article exceeds a preset second threshold value;
and the second generating unit is used for determining that the identification result of the target article represents that the target article does not attract the user if the first information of the target article does not exceed the first threshold value or the second information of the target article does not exceed the second threshold value.
Further, an article recognition apparatus provided in an embodiment of the present application further includes a model generation unit, where the model generation unit includes:
the article data acquisition unit is used for acquiring article data of the information flow platform in a first historical time period, the article data indicates at least one article pushed to a user by the information flow platform in the first historical time period, and each article in the at least one article is an article sample;
the information determining unit is used for determining article information, target first information and target second information of the article sample;
the training unit is used for inputting the article information of the article sample into the article evaluation model to be trained, determining first information and second information of the article sample predicted by the article evaluation model to be trained according to the article information of the article sample, and training the article evaluation model to be trained by taking the predicted first information of the article sample approaching to the target first information of the article sample and the predicted second information of the article sample approaching to the target second information of the article sample as a training target to obtain the article evaluation model.
In this embodiment of the present application, preferably, the information determining unit for determining the target first information of the article sample includes:
the first determining unit is used for determining the exposure click rate of the article sample and the average exposure click rate of all articles in the large disk;
the second determining unit is used for determining that the target first information of the article sample is a first preset value if the exposure click rate of the article sample is not lower than the average exposure click rate;
and the third determining unit is used for determining that the target first information of the article sample is a second preset value if the exposure click rate of the article sample is lower than the average exposure click rate.
In this embodiment of the present application, preferably, the information determining unit for determining the target second information of the article sample includes:
the fourth determining unit is used for determining the single-access reading duration of the article sample and the average single-access reading duration of all articles in the large disk;
the fifth determining unit is used for determining that the target second information of the article sample is the first preset value if the single access reading time of the article sample is not lower than the average single access reading time;
and the sixth determining unit is used for determining that the target second information of the article sample is a second preset value if the single access reading time of the article sample is lower than the average single access reading time.
Further, an article recognition apparatus provided in an embodiment of the present application further includes a model optimizing unit, where the model optimizing unit includes:
the target article evaluation model generation unit is used for training the article evaluation model to be trained by utilizing article data of the information flow platform in a second historical time period to obtain the target article evaluation model, wherein the second historical time period is later than the first historical time period;
the first prediction unit is used for predicting the recognition result of each article pushed to the user by the information flow platform in the third history time period by using the article evaluation model to obtain a first recognition result set; the third historical time period is later than the second historical time period;
the second prediction unit is used for predicting the recognition result of each article pushed to the user by the information flow platform in a third history time period by using the target article evaluation model to obtain a second recognition result set;
the standard identification result determining unit is used for determining a standard identification result of each article pushed to the user by the information flow platform in a third history time period to obtain a third identification result set;
and the model optimization subunit is used for updating the article evaluation model to the target article evaluation model if the second recognition result set is closer to the third recognition result set relative to the first recognition result set.
Fig. 9 is a block diagram of an implementation of a computer device provided in an embodiment of the present application. As shown in fig. 9, the computer device includes:
a memory 901 for storing a program;
a processor 902 configured to execute a program, the program specifically configured to:
acquiring article information of a target article, wherein the article information comprises any one or more of a label representing article content of the target article and a title of the target article;
through an article evaluation model, predicting to obtain first information and second information of the target article based on the article information, wherein the first information represents the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk, and the second information represents the probability that the single-access reading duration of the target article is higher than the average single-access reading duration of the large disk;
generating an identification result of the target article based on the first information and the second information of the target article, wherein the identification result represents that the target article attracts a user or the target article does not attract the user;
the article evaluation model is obtained by training an article evaluation model to be trained on article samples: first information and second information are predicted by the article evaluation model to be trained according to the article information of an article sample, and the training targets are that the predicted first information approaches the target first information carried by the article sample and the predicted second information approaches the target second information carried by the article sample.
The processor 902 may be a central processing unit (CPU) or an application-specific integrated circuit (ASIC).
The computer device may further comprise a communication interface 903 and a communication bus 904, wherein the memory 901, the processor 902 and the communication interface 903 communicate with each other via the communication bus 904.
The embodiment of the present application further provides a readable storage medium, where a computer program is stored, and the computer program is loaded and executed by a processor to implement the steps of the article identification method.
The present application also proposes a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and executes the computer instruction, so that the computer device executes the method provided in the various optional implementation manners in the aspect of the article identification method or the aspect of the article identification apparatus.
The application provides an article identification method, an article identification device, computer equipment and a storage medium: article information of a target article is acquired and input into a pre-trained article evaluation model to generate an identification result of the target article. Generating the article evaluation model does not depend on manual prior knowledge to mark whether an article sample attracts users; instead, article samples are labeled according to their exposure click rate and single-access reading duration. This avoids the influence of subjective human experience, narrows the gap between the labels of the article samples and the interests that actual users really have, and improves the accuracy of identifying articles that attract users. In addition, compared with the prior-art single-task modeling that depends only on whether an article attracts users, multitask modeling based on the exposure click rate and the single-access reading duration further improves the accuracy of the article identification result.
The article identification method, the article identification device, the computer device and the storage medium provided by the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation of the invention, and the description of the above embodiment is only used to help understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
It should be noted that, in this specification, each embodiment is described in a progressive manner, and each embodiment focuses on differences from other embodiments, and portions that are the same as and similar to each other in each embodiment may be referred to. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An article identification method, comprising:
acquiring article information of a target article, wherein the article information comprises any one or more of a tag representing article content of the target article and a title of the target article;
through an article evaluation model, predicting to obtain first information and second information of the target article based on the article information, wherein the first information represents the probability that the exposure click rate of the target article is higher than the average value of the exposure click rate of a large disk, and the second information represents the probability that the single-access reading time length of the target article is higher than the average value of the single-access reading time length of the large disk;
generating an identification result of the target article based on the first information and the second information of the target article, wherein the identification result represents that the target article attracts a user or the target article does not attract a user;
wherein the article evaluation model is obtained by determining, based on an article sample, first information and second information predicted by an article evaluation model to be trained according to article information of the article sample, and training the article evaluation model to be trained with, as training targets, the predicted first information approaching target first information carried by the article sample and the predicted second information approaching target second information carried by the article sample.
2. The method of claim 1, wherein the article information of the target article comprises a label representing article content of the target article, and the acquiring article information of the target article comprises:
determining a preset label whitelist, and determining, from the label whitelist, labels that match the article content of the target article.
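The whitelist matching in claim 2 can be sketched as follows. This is a minimal illustration only: the function name `match_whitelist_labels` is hypothetical, and matching by case-insensitive substring is an assumption, since the claim does not specify how a match between a whitelist entry and the article content is determined.

```python
def match_whitelist_labels(article_content: str, label_whitelist: list[str]) -> list[str]:
    """Return the whitelist entries that occur in the article content.

    Assumption: a whitelist entry "matches" when it appears as a
    case-insensitive substring of the content. The claim leaves the
    matching rule open.
    """
    content = article_content.lower()
    return [label for label in label_whitelist if label.lower() in content]
```

A production system would more likely use a classifier or keyword extractor for this step; the substring rule is used here only to keep the sketch self-contained.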
3. The method of claim 1, wherein the generating an identification result of the target article based on the first information and the second information of the target article comprises:
if the first information of the target article exceeds a preset first threshold and the second information of the target article exceeds a preset second threshold, determining that the identification result of the target article represents that the target article attracts a user;
if the first information of the target article does not exceed the first threshold or the second information of the target article does not exceed the second threshold, determining that the identification result of the target article represents that the target article does not attract a user.
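The two-threshold rule of claim 3 can be sketched as a single boolean test. The function name and the default threshold values of 0.5 are assumptions for illustration; the claim only requires that both thresholds be preset.

```python
def identification_result(
    first_info: float,
    second_info: float,
    first_threshold: float = 0.5,   # assumed value; the claim only says "preset"
    second_threshold: float = 0.5,  # assumed value; the claim only says "preset"
) -> bool:
    """Return True if the article is identified as attracting users.

    Both predicted probabilities must strictly exceed their thresholds;
    failing either one yields "does not attract".
    """
    return first_info > first_threshold and second_info > second_threshold
```

Because the comparison is strict, a probability exactly equal to its threshold counts as "does not exceed", which mirrors the wording of the second branch of the claim.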
4. The method of claim 1, further comprising:
acquiring article data of an information flow platform in a first historical time period, wherein the article data indicates at least one article pushed to users by the information flow platform in the first historical time period, and each of the at least one article is an article sample;
determining article information, target first information, and target second information of the article sample;
inputting the article information of the article sample into an article evaluation model to be trained, determining first information and second information of the article sample predicted by the article evaluation model to be trained according to the article information of the article sample, and training the article evaluation model to be trained, with the predicted first information of the article sample approaching the target first information of the article sample and the predicted second information of the article sample approaching the target second information of the article sample as training targets, to obtain the article evaluation model.
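One way to express the training target of claim 4 is as a joint loss over the two predictions. The sketch below assumes binary cross-entropy for each output and an unweighted sum of the two terms; neither choice is stated in the claim, which only requires each prediction to approach its target. The function names are hypothetical.

```python
import math


def binary_cross_entropy(predicted: float, target: float, eps: float = 1e-12) -> float:
    """Cross-entropy between a predicted probability and a 0/1 target."""
    # Clamp the prediction away from 0 and 1 so that log() stays finite.
    p = min(max(predicted, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def joint_loss(
    pred_first: float,
    pred_second: float,
    target_first: float,
    target_second: float,
) -> float:
    """Sum of the losses for the first and second information (assumed equal weights)."""
    return (
        binary_cross_entropy(pred_first, target_first)
        + binary_cross_entropy(pred_second, target_second)
    )
```

Minimizing this quantity over the article samples drives both predicted values toward their targets at the same time, which is the behavior the claim describes.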
5. The method of claim 4, wherein determining the target first information of the article sample comprises:
determining an exposure click rate of the article sample;
determining the average exposure click rate of the large disk according to the exposure click rates of all articles in the large disk;
if the exposure click rate of the article sample is not lower than the average exposure click rate of the large disk, determining that the target first information of the article sample is a first preset value;
if the exposure click rate of the article sample is lower than the average exposure click rate of the large disk, determining that the target first information of the article sample is a second preset value.
6. The method of claim 4, wherein determining the target second information of the article sample comprises:
determining a single-visit reading duration of the article sample;
determining the average single-visit reading duration of the large disk according to the single-visit reading durations of all articles in the large disk;
if the single-visit reading duration of the article sample is not lower than the average single-visit reading duration of the large disk, determining that the target second information of the article sample is a first preset value;
if the single-visit reading duration of the article sample is lower than the average single-visit reading duration of the large disk, determining that the target second information of the article sample is a second preset value.
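Claims 5 and 6 derive each target in the same way: compare the sample's metric with the average over all articles and emit one of two preset values. A minimal sketch follows, assuming the first preset value is 1 and the second is 0 (the claims do not fix these values) and using a hypothetical function name.

```python
def binary_target(
    sample_value: float,
    all_values: list[float],
    first_preset: int = 1,   # assumed value for "not lower than the average"
    second_preset: int = 0,  # assumed value for "lower than the average"
) -> int:
    """Label a sample by comparing its metric with the overall average.

    `all_values` holds the same metric (exposure click rate or
    single-visit reading duration) for every article in the pool.
    """
    average = sum(all_values) / len(all_values)
    return first_preset if sample_value >= average else second_preset
```

The same function serves both targets: pass exposure click rates to obtain the target first information, and single-visit reading durations to obtain the target second information.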
7. The method of claim 1, further comprising:
training an article evaluation model to be trained by using article data of the information flow platform in a second historical time period to obtain a target article evaluation model, wherein the second historical time period is later than the first historical time period;
predicting, by using the article evaluation model, the identification result of each article pushed to users by the information flow platform in a third historical time period to obtain a first identification result set, wherein the third historical time period is later than the second historical time period;
predicting, by using the target article evaluation model, the identification result of each article pushed to users by the information flow platform in the third historical time period to obtain a second identification result set;
determining a standard identification result of each article pushed to users by the information flow platform in the third historical time period to obtain a third identification result set;
if the second identification result set is closer to the third identification result set than the first identification result set is, updating the article evaluation model to the target article evaluation model.
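The model-update check of claim 7 can be sketched by scoring each result set against the standard results. Measuring "closer" as the fraction of articles whose result agrees with the standard result is an assumption; the claim does not define the distance. The function names are hypothetical, and the three lists are assumed to be aligned article by article.

```python
def agreement(predicted: list[bool], standard: list[bool]) -> float:
    """Fraction of articles whose predicted result equals the standard result."""
    matches = sum(p == s for p, s in zip(predicted, standard))
    return matches / len(standard)


def should_update(
    first_set: list[bool],   # results from the current article evaluation model
    second_set: list[bool],  # results from the newly trained target model
    third_set: list[bool],   # standard results for the same articles
) -> bool:
    """Return True if the target model agrees with the standard results more often."""
    return agreement(second_set, third_set) > agreement(first_set, third_set)
```

With this rule, a tie keeps the current model in place, so the deployed model changes only when the newer one is strictly better on the third historical time period.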
8. An article identification apparatus, comprising:
an article information acquisition unit, configured to acquire article information of a target article, wherein the article information comprises any one or more of a label representing article content of the target article and a title of the target article;
an information prediction unit, configured to predict, through an article evaluation model and based on the article information, first information and second information of the target article, wherein the first information represents the probability that the exposure click rate of the target article is higher than the average exposure click rate of the large disk, and the second information represents the probability that the single-visit reading duration of the target article is higher than the average single-visit reading duration of the large disk;
an identification result generating unit, configured to generate an identification result of the target article based on the first information and the second information of the target article, wherein the identification result represents that the target article attracts a user or that the target article does not attract a user;
wherein the article evaluation model is obtained by determining, based on an article sample, first information and second information predicted by an article evaluation model to be trained according to article information of the article sample, and training the article evaluation model to be trained with, as training targets, the predicted first information approaching target first information carried by the article sample and the predicted second information approaching target second information carried by the article sample.
9. A computer device, comprising a processor and a memory, wherein the processor and the memory are connected through a communication bus; the processor is configured to call and execute a program stored in the memory; and the memory is configured to store a program for implementing the article identification method as claimed in any one of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program which, when loaded and executed by a processor, carries out the steps of an article identification method as claimed in any one of claims 1 to 7.
CN202110406839.5A 2021-04-15 2021-04-15 Article identification method and device, computer equipment and storage medium Pending CN115221307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110406839.5A CN115221307A (en) 2021-04-15 2021-04-15 Article identification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110406839.5A CN115221307A (en) 2021-04-15 2021-04-15 Article identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115221307A true CN115221307A (en) 2022-10-21

Family

ID=83604809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110406839.5A Pending CN115221307A (en) 2021-04-15 2021-04-15 Article identification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115221307A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172245A (en) * 2023-05-26 2023-12-05 国家计算机网络与信息安全管理中心 Control method and control system

Similar Documents

Publication Publication Date Title
CN110162698B (en) User portrait data processing method, device and storage medium
CN111729305B (en) Map scene preloading method, model training method, device and storage medium
US20230222341A1 (en) Targeted crowd sourcing for metadata management across data sets
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
CN110795657B (en) Article pushing and model training method and device, storage medium and computer equipment
CN107463701B (en) Method and device for pushing information stream based on artificial intelligence
CN111708876B (en) Method and device for generating information
JP2018526710A (en) Information recommendation method and information recommendation device
WO2016107354A1 (en) Method and apparatus for providing user personalised resource message pushing
CN109471978B (en) Electronic resource recommendation method and device
CN111859872A (en) Text labeling method and device
WO2020258773A1 (en) Method, apparatus, and device for determining pushing user group, and storage medium
CN112699309A (en) Resource recommendation method, device, readable medium and equipment
CN112188312A (en) Method and apparatus for determining video material of news
CN109829063A (en) A kind of data processing method, device and storage medium
CN111680218B (en) User interest identification method and device, electronic equipment and storage medium
CN113032676B (en) Recommendation method and system based on micro-feedback
CN115221307A (en) Article identification method and device, computer equipment and storage medium
CN108563648B (en) Data display method and device, storage medium and electronic device
CN113836388A (en) Information recommendation method and device, server and storage medium
CN111311015A (en) Personalized click rate prediction method, device and readable storage medium
CN113392266B (en) Training and sorting method and device of sorting model, electronic equipment and storage medium
CN113704613B (en) Resource recommendation model training method, resource recommendation device and server
US20230030341A1 (en) Dynamic user interface and machine learning tools for generating digital content and multivariate testing recommendations
CN113837807A (en) Heat prediction method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination