CN109710840B - Article content depth evaluation method and device - Google Patents

Publication number
CN109710840B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811540935.3A
Other languages
Chinese (zh)
Other versions
CN109710840A (en)
Inventor
袁德璋
何径舟
付志宏
杨宇鸿
赖佳伟
陈笑
张小彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811540935.3A
Publication of CN109710840A
Publication of CN109710840B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an article content depth evaluation method and device. The method comprises: acquiring an article to be evaluated; inputting the article into a preset depth scoring model, so that the depth scoring model performs word segmentation and keyword extraction on each paragraph of the article to obtain a vector corresponding to each keyword, determines a vector corresponding to each paragraph from the vectors of the keywords in that paragraph, and determines a vector corresponding to the article and its content depth from the vectors of the paragraphs; and obtaining the content depth output by the depth scoring model. Articles can thus be scored for depth based on their content, which improves the accuracy of article scoring and the efficiency of article recommendation.

Description

Article content depth evaluation method and device
Technical Field
The invention relates to the technical field of data processing, in particular to an article content depth evaluation method and device.
Background
At present, when a search engine, a feed application or the like recommends an article to a user, two factors are mainly considered: first, how well the article matches the user's needs and interests; second, the quality of the article itself. Article quality is currently scored mainly on the basis of prior information about the article. Such prior information, for example the layout, richness and authority of the article, does not involve the content of the article, which reduces the accuracy of article scoring and, in turn, the efficiency of article recommendation.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the first purpose of the present invention is to provide an article content depth evaluation method, which is used for solving the problem in the prior art that the article recommendation efficiency is poor due to low article quality scoring accuracy.
The second purpose of the invention is to provide an article content depth assessment device.
The third purpose of the invention is to provide another article content depth assessment device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for evaluating depth of article content, including:
acquiring an article to be evaluated;
inputting the article into a preset depth scoring model, so that the depth scoring model performs word segmentation and keyword extraction on each paragraph in the article, obtains a vector corresponding to each keyword, determines a vector corresponding to each paragraph according to the vector corresponding to the keyword in each paragraph, and determines a vector and a content depth corresponding to the article according to the vector corresponding to each paragraph;
and obtaining the content depth output by the depth scoring model.
Further, the structure of the depth scoring model is word vector model + convolution pooling model + bidirectional long-short term memory network model + classification model;
the word vector model is used for performing word segmentation and keyword extraction on each paragraph in the article to obtain a vector corresponding to each keyword;
the convolution pooling model is used for determining a vector corresponding to each paragraph according to a vector corresponding to the keyword in each paragraph;
the bidirectional long and short term memory network model is used for determining the vector corresponding to the article according to the vector corresponding to each paragraph;
and the classification model is used for determining the content depth of the article according to the vector corresponding to the article.
Further, before the obtaining of the article to be evaluated, the method further includes:
obtaining a training sample, wherein the training sample comprises: article samples whose word count is greater than a preset word-count threshold and whose number of user reads is greater than a preset read-count threshold;
for each article sample in the training samples, acquiring user feedback data corresponding to the article sample;
calculating and determining the quality score corresponding to the article sample according to the user feedback data;
generating first training data according to each article sample in the training samples and the corresponding quality score;
and training an initial depth scoring model according to the first training data to obtain the depth scoring model.
Further, the user feedback data includes any one or more of the following data: the average article dwell time, the quick-exit percentage, the extra reading time of the user, the number of likes, the number of dislikes, the like ratio, the dislike ratio, the favorite ratio, the number of shares and the share ratio.
Further, after the initial depth scoring model is trained according to the first training data to obtain the depth scoring model, the method further includes:
acquiring authors corresponding to article samples in the first training data;
for each author, determining the high quality rate or the low quality rate of the author according to the quality score corresponding to the article sample of the author in the first training data;
determining whether the author is a high-quality author according to the high-quality rate or the low-quality rate of the author;
acquiring the content depth of an article sample corresponding to each high-quality author in the first training data;
generating second training data according to the content depth of the article sample corresponding to each high-quality author in the first training data;
and training the depth scoring model according to the second training data.
Further, the determining, for each author, a high quality rate or a low quality rate of the author according to a quality score corresponding to the article sample of the author in the first training data includes:
for each author, acquiring a quality score of an article sample corresponding to the author in the first training data;
determining the article samples with corresponding quality scores greater than a first quality score threshold as high-quality articles;
determining the article samples with the corresponding quality scores smaller than a second quality score threshold value as low-quality articles;
and determining the high quality rate or the low quality rate of the author according to the number of the high quality articles and the number of the low quality articles.
In the article content depth evaluation method of the embodiment of the invention, an article to be evaluated is acquired; the article is input into a preset depth scoring model, so that the depth scoring model performs word segmentation and keyword extraction on each paragraph of the article to obtain a vector corresponding to each keyword, determines a vector corresponding to each paragraph from the vectors of the keywords in that paragraph, and determines a vector corresponding to the article and its content depth from the vectors of the paragraphs; and the content depth output by the depth scoring model is obtained. Articles can thus be scored for depth based on their content, which improves the accuracy of article scoring and the efficiency of article recommendation.
To achieve the above object, a second embodiment of the present invention provides an apparatus for evaluating depth of article content, including:
the obtaining module is configured to obtain the article to be evaluated;
the input module is used for inputting the article into a preset depth scoring model so that the depth scoring model performs word segmentation and keyword extraction on each paragraph in the article to obtain a vector corresponding to each keyword, determines a vector corresponding to each paragraph according to the vector corresponding to the keyword in each paragraph, and determines a vector corresponding to the article and content depth according to the vector corresponding to each paragraph;
the obtaining module is further configured to obtain a content depth output by the depth scoring model.
Further, the structure of the depth scoring model is word vector model + convolution pooling model + bidirectional long-short term memory network model + classification model;
the word vector model is used for performing word segmentation and keyword extraction on each paragraph in the article to obtain a vector corresponding to each keyword;
the convolution pooling model is used for determining a vector corresponding to each paragraph according to a vector corresponding to the keyword in each paragraph;
the bidirectional long and short term memory network model is used for determining the vector corresponding to the article according to the vector corresponding to each paragraph;
and the classification model is used for determining the content depth of the article according to the vector corresponding to the article.
Further, the device further comprises: the device comprises a determining module, a generating module and a training module;
the obtaining module is further configured to obtain a training sample, where the training sample includes: article samples whose word count is greater than a preset word-count threshold and whose number of user reads is greater than a preset read-count threshold;
the obtaining module is further configured to obtain, for each article sample in the training samples, user feedback data corresponding to the article sample;
the determining module is used for calculating and determining the quality score corresponding to the article sample according to the user feedback data;
the generating module is used for generating first training data according to each article sample in the training samples and the corresponding quality score;
and the training module is used for training an initial depth scoring model according to the first training data to obtain the depth scoring model.
Further, the user feedback data includes any one or more of the following data: the average article dwell time, the quick-exit percentage, the extra reading time of the user, the number of likes, the number of dislikes, the like ratio, the dislike ratio, the favorite ratio, the number of shares and the share ratio.
Further, the obtaining module is further configured to obtain authors corresponding to article samples in the first training data;
the determining module is further configured to determine, for each author, a high quality rate or a low quality rate of the author according to a quality score corresponding to the article sample of the author in the first training data;
the determining module is further configured to determine whether the author is a high-quality author according to the high-quality rate or the low-quality rate of the author;
the obtaining module is further configured to obtain the content depth of the article sample corresponding to each high-quality author in the first training data;
the generating module is further configured to generate second training data according to the content depth of the article sample corresponding to each high-quality author in the first training data;
the training module is further used for training the depth scoring model according to the second training data.
Further, the determining module is specifically configured to,
for each author, acquiring a quality score of an article sample corresponding to the author in the first training data;
determining the article samples with corresponding quality scores greater than a first quality score threshold as high-quality articles;
determining the article samples with the corresponding quality scores smaller than a second quality score threshold value as low-quality articles;
and determining the high quality rate or the low quality rate of the author according to the number of the high quality articles and the number of the low quality articles.
The article content depth evaluation device of the embodiment of the invention acquires an article to be evaluated; inputs the article into a preset depth scoring model, so that the depth scoring model performs word segmentation and keyword extraction on each paragraph of the article to obtain a vector corresponding to each keyword, determines a vector corresponding to each paragraph from the vectors of the keywords in that paragraph, and determines a vector corresponding to the article and its content depth from the vectors of the paragraphs; and obtains the content depth output by the depth scoring model. Articles can thus be scored for depth based on their content, which improves the accuracy of article scoring and the efficiency of article recommendation.
In order to achieve the above object, an embodiment of a third aspect of the present invention provides another article content depth evaluation device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the article content depth evaluation method described above when executing the program.
In order to achieve the above object, a fourth aspect of the present invention provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for evaluating the depth of article content as described above.
In order to achieve the above object, an embodiment of a fifth aspect of the present invention provides a computer program product, wherein, when instructions in the computer program product are executed by a processor, the article content depth evaluation method described above is implemented.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flowchart of an article content depth evaluation method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the depth scoring model;
FIG. 3 is a schematic flowchart of another article content depth evaluation method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of yet another article content depth evaluation method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an article content depth evaluation device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an article content depth evaluation device according to another embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another article content depth evaluation device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
The following describes an evaluation method and apparatus for depth of article content according to an embodiment of the present invention with reference to the drawings.
Fig. 1 is a flowchart illustrating an evaluation method for depth of article content according to an embodiment of the present invention. As shown in fig. 1, the method for evaluating the depth of the article content includes the following steps:
S101, obtaining an article to be evaluated.
The article content depth evaluation method provided by the invention is executed by an article content depth evaluation device, which may be a hardware device such as a terminal device or a server, or software installed on such a hardware device. Taking a feed application as an example, the article to be evaluated may be an article to be recommended to the user.
S102, inputting the article into a preset depth scoring model, enabling the depth scoring model to perform word segmentation and keyword extraction on each paragraph in the article, obtaining a vector corresponding to each keyword, determining a vector corresponding to each paragraph according to the vector corresponding to the keyword in each paragraph, and determining a vector corresponding to the article and content depth according to the vector corresponding to each paragraph.
In this embodiment, the input of the depth scoring model may be an article, and the output may be the content depth of the article. The structure of the depth scoring model may be, for example, a word vector model (EMB) + a convolution pooling model (CNN + POOL) + a bidirectional long short-term memory network model (LSTM) + a classification model (SOFTMAX). FIG. 2 is a schematic diagram of the depth scoring model.
FIG. 2 includes a plurality of word vector models. Each word vector model takes one paragraph of the article as input and outputs the vectors corresponding to the keywords in that paragraph. The word vector model may process a paragraph as follows: the paragraph is segmented into words, the importance of each resulting word is calculated using the tf-idf method, keywords are extracted from the words according to their importance, and the vector corresponding to each keyword is determined. The vector corresponding to a keyword may represent the semantics of the keyword; for example, when a keyword has multiple meanings, its vector may be determined according to the meaning the keyword has in the current context.
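The tf-idf keyword step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the paragraphs have already been segmented into words, treats each paragraph as one document when computing idf, and uses a smoothed idf formula. The function name `tfidf_keywords` and the parameter `top_k` are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(paragraphs, top_k=3):
    """Return the top_k highest tf-idf words for each tokenized paragraph."""
    n = len(paragraphs)
    # Document frequency: number of paragraphs that contain each word.
    df = Counter(word for p in paragraphs for word in set(p))
    result = []
    for p in paragraphs:
        tf = Counter(p)
        # Smoothed idf (an assumption of this sketch, not from the source).
        scores = {
            w: (count / len(p)) * math.log((1 + n) / (1 + df[w]))
            for w, count in tf.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_k])
    return result

paragraphs = [
    ["model", "training", "model", "data"],
    ["data", "user", "feedback", "data"],
    ["data", "article", "depth", "article"],
]
keywords = tfidf_keywords(paragraphs, top_k=1)
```

In this toy input, "data" appears in every paragraph, so its idf is zero and it is never selected as a keyword even where it is the most frequent word.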
FIG. 2 includes a plurality of convolution pooling models. Each convolution pooling model takes the vectors corresponding to the keywords in one paragraph as input and outputs the vector corresponding to that paragraph. The convolution pooling model consists of N stacked convolution pooling structures, each of which is CNN + POOL. N may be, for example, 3.
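A heavily simplified sketch of N stacked convolution pooling rounds is given below. It is an assumption-laden illustration: a trained model would learn its kernels, whereas here every dimension is convolved independently with the same fixed, hypothetical weights `kernel`, a ReLU follows each convolution, and a final global max pooling (also an assumption) collapses the remaining sequence into one paragraph vector.

```python
def conv1d(seq, kernel):
    """Slide a kernel of per-position weights over a sequence of vectors."""
    k = len(kernel)
    if len(seq) < k:
        return list(seq)  # too short to convolve; pass through unchanged
    out = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        out.append([
            # Weighted sum over the window, followed by ReLU.
            max(0.0, sum(w * vec[d] for w, vec in zip(kernel, window)))
            for d in range(len(seq[0]))
        ])
    return out

def max_pool(seq, size=2):
    """Element-wise max over non-overlapping groups of `size` vectors."""
    return [
        [max(col) for col in zip(*seq[i:i + size])]
        for i in range(0, len(seq), size)
    ]

def paragraph_vector(keyword_vectors, kernel=(0.5, 0.5), rounds=3):
    """Apply `rounds` CNN + POOL stages, then pool to a single vector."""
    seq = [list(v) for v in keyword_vectors]
    for _ in range(rounds):
        seq = max_pool(conv1d(seq, kernel))
    # Global max pooling collapses whatever remains into one vector.
    return [max(col) for col in zip(*seq)]

vec = paragraph_vector([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
```

The output has the same dimensionality as a keyword vector regardless of how many keywords the paragraph contains, which is what lets later stages treat paragraphs uniformly.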
FIG. 2 includes a plurality of bidirectional long short-term memory network models, each corresponding to one paragraph. The forward input of each model is the vector corresponding to its paragraph and the output of the previous bidirectional long short-term memory network model; the backward input is the vector corresponding to its paragraph and the output of the next bidirectional long short-term memory network model. In this way, a forward vector output by the first bidirectional long short-term memory network model and a backward vector output by the last bidirectional long short-term memory network model are obtained, and the forward vector and the backward vector are concatenated to obtain the vector corresponding to the article.
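The bidirectional pass and the concatenation step can be illustrated with the sketch below. It is not an LSTM: a plain tanh recurrent cell with fixed, hypothetical weights `w_in` and `w_rec` stands in for the LSTM cell so that the example stays short. Reading each direction's vector from the final state of its pass is also an assumption of this sketch.

```python
import math

def run_direction(paragraph_vectors, w_in=0.5, w_rec=0.5):
    """Run a simple tanh recurrent cell over the vectors; return the last state."""
    state = [0.0] * len(paragraph_vectors[0])
    for vec in paragraph_vectors:
        # New state mixes the current paragraph vector with the previous state.
        state = [math.tanh(w_in * x + w_rec * h) for x, h in zip(vec, state)]
    return state

def article_vector(paragraph_vectors):
    """Concatenate the forward-pass and backward-pass vectors."""
    forward = run_direction(paragraph_vectors)
    backward = run_direction(paragraph_vectors[::-1])
    return forward + backward  # list concatenation

vectors = [[1.0, 0.0], [0.0, 1.0]]
article = article_vector(vectors)
```

The article vector is twice as long as a paragraph vector, because it carries one summary read in each direction.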
In fig. 2, a classification model is included, the input of the classification model is a vector corresponding to an article, and the output is a content depth of the article. The classification model is used for determining the probability that the article has each content depth according to the vector corresponding to the article; and determining the content depth with the maximum corresponding probability as the content depth of the article.
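The final classification step amounts to a softmax followed by an argmax, as sketched below. The input `logits` would come from a learned layer applied to the article vector, which is omitted here; the depth labels are hypothetical, since the source does not enumerate the content depth classes.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_depth(logits, depth_labels):
    """Return the most probable content depth and the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return depth_labels[best], probs

# Hypothetical labels and scores, for illustration only.
label, probs = predict_depth([0.1, 2.0, 0.5], ["shallow", "medium", "deep"])
```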
S103, obtaining the content depth output by the depth scoring model.
In the article content depth evaluation method of the embodiment of the invention, an article to be evaluated is acquired; the article is input into a preset depth scoring model, so that the depth scoring model performs word segmentation and keyword extraction on each paragraph of the article to obtain a vector corresponding to each keyword, determines a vector corresponding to each paragraph from the vectors of the keywords in that paragraph, and determines a vector corresponding to the article and its content depth from the vectors of the paragraphs; and the content depth output by the depth scoring model is obtained. Articles can thus be scored for depth based on their content, which improves the accuracy of article scoring and the efficiency of article recommendation.
FIG. 3 is a schematic flowchart of another article content depth evaluation method according to an embodiment of the present invention. As shown in FIG. 3, on the basis of the embodiment shown in FIG. 1, the method may further include the following steps before step S101:
S104, obtaining a training sample, wherein the training sample comprises: article samples whose word count is greater than a preset word-count threshold and whose number of user reads is greater than a preset read-count threshold.
In this embodiment, when an article recommended to a user does not match the user's interests, the user's feedback data for that article is easily affected by factors such as the user's mood. The training sample therefore needs to be selected from recommended articles that match the user's interests, so that the quality of the article is the only factor affecting the user's feedback data. Accordingly, the article samples in the training sample may come from an application that recommends articles matching the user's interests, or from the background server corresponding to that application. Such applications include, for example, feed applications.
In this embodiment, since short articles generally have lower depth scores, the article samples in the training sample need to be longer articles whose word count is greater than the preset word-count threshold. The preset word-count threshold may be, for example, 1000 words.
In this embodiment, to ensure that there is sufficient user feedback data, articles whose number of reads is greater than the preset read-count threshold need to be selected as article samples. The preset read-count threshold may be, for example, 100.
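The sample selection rule can be sketched as a simple filter. The `Article` record and its field names are hypothetical; the default thresholds use the example values of 1000 words and 100 reads mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Article:
    article_id: str
    word_count: int
    read_count: int

def select_training_samples(articles, min_words=1000, min_reads=100):
    """Keep articles that are both long enough and read often enough."""
    return [
        a for a in articles
        if a.word_count > min_words and a.read_count > min_reads
    ]

candidates = [
    Article("a1", word_count=1500, read_count=250),  # kept
    Article("a2", word_count=400, read_count=900),   # too short
    Article("a3", word_count=2000, read_count=30),   # too few reads
]
samples = select_training_samples(candidates)
```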
S105, for each article sample in the training sample, obtaining the user feedback data corresponding to the article sample.
In this embodiment, the user feedback data may include any one or more of the following data: the average article dwell time, the quick-exit percentage, the extra reading time of the user, the number of likes, the number of dislikes, the like ratio, the dislike ratio, the favorite ratio, the number of shares and the share ratio. The average article dwell time is the average time users spend reading the article. The quick-exit percentage is the proportion of reads in which the user exits the article within seconds of opening it, relative to the total number of reads of the article. The extra reading time of the user is the length of time by which the reading time exceeds the normal reading time of the article.
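Two of the feedback measures defined above can be computed from per-read dwell times, as in this sketch. The 5-second cutoff for a "quick exit" is an assumption; the source only says the user exits within seconds. The function and key names are hypothetical.

```python
def feedback_features(dwell_seconds, quick_exit_cutoff=5.0):
    """Compute average dwell time and quick-exit percentage for one article."""
    total = len(dwell_seconds)
    if total == 0:
        return {"avg_dwell": 0.0, "quick_exit_pct": 0.0}
    # A read counts as a quick exit if it ended before the cutoff (assumed).
    quick = sum(1 for t in dwell_seconds if t < quick_exit_cutoff)
    return {
        "avg_dwell": sum(dwell_seconds) / total,
        "quick_exit_pct": quick / total,
    }

features = feedback_features([2.0, 60.0, 120.0, 3.0])
```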
S106, calculating the quality score corresponding to the article sample according to the user feedback data.
In this embodiment, the article content depth evaluation device may construct a linear model whose input is the user feedback data corresponding to the article sample and whose output is the quality score corresponding to the article sample. The quality score reflects the content depth of the article sample to a certain extent. The quality score may be, for example, 0.1, 0.5 or 0.8.
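A linear model of this kind might look like the sketch below. The feature names, weights and bias are made up for illustration, and clamping the result to the range 0 to 1 is an assumption; the source does not state how the model is parameterized or fitted.

```python
def quality_score(features, weights, bias=0.0):
    """Weighted sum of feedback features, clamped to [0, 1] (assumed range)."""
    raw = bias + sum(weights[name] * features[name] for name in weights)
    return min(1.0, max(0.0, raw))

# Hypothetical weights: likes raise the score, quick exits lower it.
weights = {"like_ratio": 1.0, "quick_exit_pct": -1.0}
score = quality_score(
    {"like_ratio": 0.5, "quick_exit_pct": 0.25}, weights, bias=0.5
)
```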
S107, generating first training data according to each article sample in the training sample and the corresponding quality score.
S108, training the initial depth scoring model according to the first training data to obtain the depth scoring model.
In this embodiment, in order to further improve the accuracy of the trained depth scoring model, the article content depth evaluation device may execute step S107 as follows: obtain the article samples in the training sample whose quality score is greater than or equal to 0.8 and take them as positive examples; obtain the article samples in the training sample whose quality score is less than or equal to 0.2 and take them as negative examples; and take the positive examples together with the negative examples as the first training data. The values 0.8 and 0.2 are only examples and may be changed to other values as needed.
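The positive and negative example selection can be sketched as follows, using the example thresholds 0.8 and 0.2. Encoding the labels as 1 and 0 is an assumption of this sketch; samples with scores between the two thresholds are left out of the first training data.

```python
def build_first_training_data(scored_samples, pos_threshold=0.8, neg_threshold=0.2):
    """Label clear positives as 1 and clear negatives as 0; drop the rest."""
    data = []
    for sample, score in scored_samples:
        if score >= pos_threshold:
            data.append((sample, 1))
        elif score <= neg_threshold:
            data.append((sample, 0))
    return data

first_training_data = build_first_training_data(
    [("a1", 0.9), ("a2", 0.5), ("a3", 0.2), ("a4", 0.8)]
)
```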
In the article content depth evaluation method of the embodiment of the invention, a training sample is obtained, the training sample comprising article samples whose word count is greater than a preset word-count threshold and whose number of user reads is greater than a preset read-count threshold; for each article sample in the training sample, the corresponding user feedback data is obtained; the quality score corresponding to the article sample is calculated according to the user feedback data; first training data is generated according to each article sample in the training sample and the corresponding quality score; the initial depth scoring model is trained according to the first training data to obtain the depth scoring model; and the content depth of the article to be evaluated is determined based on the depth scoring model. Articles can thus be scored for depth based on their content, which improves the accuracy of article scoring and the efficiency of article recommendation.
FIG. 4 is a schematic flowchart of yet another article content depth evaluation method according to an embodiment of the present invention. As shown in FIG. 4, on the basis of the embodiment shown in FIG. 3, the method may further include the following steps after step S108:
S109, obtaining the authors corresponding to the article samples in the first training data.
S110, for each author, determining the high-quality rate or the low-quality rate of the author according to the quality scores corresponding to the author's article samples in the first training data.
In this embodiment, the article content depth evaluation device may execute step S110 as follows: for each author, obtain the quality scores of the article samples corresponding to the author in the first training data; determine the article samples whose quality score is greater than a first quality score threshold as high-quality articles; determine the article samples whose quality score is less than a second quality score threshold as low-quality articles; and determine the high-quality rate or the low-quality rate of the author according to the number of high-quality articles and the number of low-quality articles. The first quality score threshold may be, for example, 0.8 or 0.9. The second quality score threshold may be, for example, 0.2 or 0.3.
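The per-author rates can be sketched as below. The source says only that the rates are determined from the counts of high-quality and low-quality articles; dividing each count by the author's total number of article samples is an assumption of this sketch.

```python
from collections import defaultdict

def author_quality_rates(samples, high_threshold=0.8, low_threshold=0.2):
    """Map each author to (high_quality_rate, low_quality_rate)."""
    counts = defaultdict(lambda: [0, 0, 0])  # high, low, total per author
    for author, score in samples:
        c = counts[author]
        c[2] += 1
        if score > high_threshold:
            c[0] += 1
        elif score < low_threshold:
            c[1] += 1
    # Assumed definition: each rate is a share of the author's samples.
    return {a: (h / t, l / t) for a, (h, l, t) in counts.items()}

rates = author_quality_rates([
    ("alice", 0.9), ("alice", 0.85), ("alice", 0.5), ("alice", 0.1),
    ("bob", 0.1),
])
```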
S111, determining whether the author is a high-quality author according to the high-quality rate or the low-quality rate of the author.
In this embodiment, if the high-quality rate is greater than a preset high-quality rate threshold, or the low-quality rate is less than a preset low-quality rate threshold, the author is determined to be a high-quality author.
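The decision rule is a single either-or test, sketched below. The threshold values 0.6 and 0.1 are hypothetical; the source gives no example values for them.

```python
def is_high_quality_author(high_rate, low_rate,
                           high_rate_threshold=0.6, low_rate_threshold=0.1):
    """An author qualifies if either condition holds."""
    return high_rate > high_rate_threshold or low_rate < low_rate_threshold

qualifies_by_high_rate = is_high_quality_author(0.7, 0.3)
qualifies_by_low_rate = is_high_quality_author(0.2, 0.05)
does_not_qualify = is_high_quality_author(0.5, 0.25)
```

Because the two conditions are joined by "or", an author with few high-quality articles still qualifies as long as almost none of the author's articles are low quality.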
S112, obtaining the content depth of the article sample corresponding to each high-quality author in the first training data.
In this embodiment, the content depth of the article sample corresponding to each high-quality author may be labeled manually.
S113, generating second training data according to the content depth of the article sample corresponding to each high-quality author in the first training data.
S114, training the depth scoring model according to the second training data.
In this embodiment, because user feedback data reflects the content depth of an article only to a certain extent, the article samples corresponding to high-quality authors in the first training data may additionally be labeled with their content depth, and the depth scoring model may be trained on these article samples and their labeled content depths, which further improves the accuracy of the depth scoring model.
FIG. 5 is a schematic structural diagram of an article content depth evaluation device according to an embodiment of the present invention. As shown in FIG. 5, the device includes an obtaining module 51 and an input module 52.
The obtaining module 51 is configured to obtain an article to be evaluated;
the input module 52 is configured to input the article into a preset depth scoring model, so that the depth scoring model performs word segmentation and keyword extraction on each paragraph in the article, obtains a vector corresponding to each keyword, determines a vector corresponding to each paragraph according to the vector corresponding to the keyword in each paragraph, and determines a vector corresponding to the article and a content depth according to the vector corresponding to each paragraph;
the obtaining module 51 is further configured to obtain a content depth output by the depth scoring model.
The article content depth evaluation apparatus provided by the present invention may be a hardware device such as a terminal device or a server, or software installed on such a hardware device. Taking a feed application as an example, the article to be evaluated may be an article to be recommended to the user.
In this embodiment, the input of the depth scoring model may be an article, and the output may be the content depth of the article. The structure of the depth scoring model may be, for example, a word vector model (EMB) + a convolution pooling model (CNN + POOL) + a bidirectional long short-term memory network model (LSTM) + a classification model (SOFTMAX). Fig. 2 is a schematic diagram of the depth scoring model.
The depth scoring model in Fig. 2 includes a plurality of word vector models. The input of each word vector model is a paragraph of the article, and the output is the vectors corresponding to the keywords in the paragraph. To process a paragraph, the word vector model may segment the paragraph into words, calculate the importance of each resulting word using the tf-idf method, extract keywords according to the importance of each word, and determine the vector corresponding to each keyword. The vector corresponding to a keyword may represent the semantics of the keyword; for example, when a keyword has multiple meanings, its vector may be determined according to the meaning the keyword has in the current context.
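The tf-idf keyword extraction step can be illustrated with a short sketch. It makes several assumptions that the text does not state: whitespace splitting stands in for a real word segmenter (Chinese text would need a dedicated one), each paragraph is treated as one document when computing idf, the idf is smoothed as `ln((1 + n) / (1 + df)) + 1`, and `top_k` is a hypothetical parameter. Mapping the keywords to vectors is not shown.

```python
import math
from collections import Counter


def tokenize(paragraph):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return paragraph.lower().split()


def extract_keywords(paragraphs, top_k=3):
    """Return the top_k words of each paragraph ranked by tf-idf."""
    tokenized = [tokenize(p) for p in paragraphs]
    n_docs = len(tokenized)
    # Document frequency: number of paragraphs that contain each word.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    result = []
    for words in tokenized:
        counts = Counter(words)
        scores = {}
        for word, count in counts.items():
            tf = count / len(words)
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            scores[word] = tf * idf
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_k])
    return result


paragraphs = ["deep learning model learning", "the model reads the article"]
keywords = extract_keywords(paragraphs, top_k=1)
```

In practice a stop-word list would normally be applied before ranking, since frequent function words can otherwise score highly in short paragraphs.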
The depth scoring model in Fig. 2 includes a plurality of convolution pooling models. The input of each convolution pooling model is the vectors corresponding to the keywords in a paragraph, and the output is the vector corresponding to the paragraph. Each convolution pooling model consists of N stacked convolution pooling structures, each of which is CNN + POOL, where N may be, for example, 3.
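The idea of stacking N convolution pooling structures to turn a variable number of keyword vectors into one paragraph vector can be sketched in plain Python. This is a deliberately simplified illustration: a real model would use learned multi-channel kernels, whereas here a single fixed kernel (a hypothetical choice) is applied to every dimension, followed by ReLU, max pooling, and a final global max pooling that is also an assumption.

```python
def conv_same(seq, kernel):
    """1-D convolution along the sequence axis with zero padding, then ReLU."""
    half = len(kernel) // 2
    dim = len(seq[0])
    out = []
    for i in range(len(seq)):
        vec = [0.0] * dim
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(seq):
                for d in range(dim):
                    vec[d] += weight * seq[j][d]
        out.append([max(0.0, v) for v in vec])
    return out


def max_pool(seq, size=2):
    """Max pooling along the sequence axis with stride equal to size."""
    dim = len(seq[0])
    return [
        [max(v[d] for v in seq[i:i + size]) for d in range(dim)]
        for i in range(0, len(seq), size)
    ]


def paragraph_vector(keyword_vectors, kernel=(0.25, 0.5, 0.25), n_layers=3):
    """Apply n_layers of CNN + POOL, then global max pooling (non-empty input)."""
    seq = keyword_vectors
    for _ in range(n_layers):
        seq = max_pool(conv_same(seq, kernel))
    dim = len(seq[0])
    return [max(v[d] for v in seq) for d in range(dim)]


vectors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 0.0]]
para_vec = paragraph_vector(vectors)
```

Whatever the number of keywords, the output has the same dimensionality as a single keyword vector, which is what lets the next stage treat every paragraph uniformly.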
The depth scoring model in Fig. 2 includes a plurality of bidirectional long short-term memory network models, each of which corresponds to one paragraph. The forward input of each model is the vector corresponding to its paragraph and the output of the previous bidirectional long short-term memory network model; the backward input is the vector corresponding to its paragraph and the output of the next bidirectional long short-term memory network model. In this way, a forward vector output by the first bidirectional long short-term memory network model and a backward vector output by the last bidirectional long short-term memory network model are obtained, and the forward vector and the backward vector are concatenated to obtain the vector corresponding to the article.
The depth scoring model in Fig. 2 includes a classification model, whose input is the vector corresponding to the article and whose output is the content depth of the article. The classification model determines, according to the vector corresponding to the article, the probability that the article has each content depth, and takes the content depth with the highest probability as the content depth of the article.
The article content depth evaluation apparatus of this embodiment of the present invention obtains an article to be evaluated; inputs the article into a preset depth scoring model, so that the depth scoring model performs word segmentation and keyword extraction on each paragraph in the article, obtains a vector corresponding to each keyword, determines a vector corresponding to each paragraph according to the vectors corresponding to the keywords in that paragraph, and determines a vector corresponding to the article and a content depth according to the vectors corresponding to the paragraphs; and obtains the content depth output by the depth scoring model. The article can thus be scored for depth according to its content, which improves the accuracy of article scoring and the efficiency of article recommendation.
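The final classification step (a probability for each content depth, then the most probable depth) can be sketched as a softmax followed by an argmax. The logits and the depth labels below are hypothetical; the text does not specify how many depth levels exist or what they are called.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_depth(logits, depth_labels):
    """Return the content depth with the highest probability, plus all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return depth_labels[best], probs


# Hypothetical labels and scores for a single article vector.
label, probs = predict_depth([0.1, 2.0, -1.0], ["shallow", "medium", "deep"])
```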
With reference to fig. 6, on the basis of the embodiment shown in fig. 5, the apparatus may further include: a determination module 53, a generation module 54 and a training module 55;
the obtaining module 51 is further configured to obtain a training sample, where the training sample includes: article samples whose word count is greater than a preset word number threshold and whose number of user reads is greater than a preset number threshold;
the obtaining module 51 is further configured to obtain, for each article sample in the training samples, user feedback data corresponding to the article sample;
the determining module 53 is configured to calculate and determine a quality score corresponding to the article sample according to the user feedback data;
the generating module 54 is configured to generate first training data according to each article sample in the training samples and the corresponding quality score;
the training module 55 is configured to train an initial depth scoring model according to the first training data, so as to obtain the depth scoring model.
In this embodiment, when an article recommended to a user does not fit the user's interests, the user's feedback data for the article is easily affected by factors such as the user's mood. Training samples therefore need to be selected from recommended articles that fit the user's interests, to ensure that article quality is the only factor affecting the user's feedback data. Accordingly, the source of the article samples in the training sample may be an application that recommends articles fitting the user's interests to the user, or a background server corresponding to that application. Such applications include, for example, feed applications.
In this embodiment, since short articles generally have a lower depth score, the article samples in the training sample need to be longer articles whose word count is greater than the preset word number threshold. The preset word number threshold may be, for example, 1000 words.
In this embodiment, to ensure that the user feedback data is sufficient, articles whose number of reads is greater than a preset number threshold need to be selected as article samples. The preset number threshold may be, for example, 100.
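The two selection criteria can be expressed as a simple filter. The sketch below uses the example thresholds from the text (1000 words, 100 reads) as defaults; the `ArticleSample` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArticleSample:
    article_id: str
    word_count: int
    read_count: int


def select_training_samples(articles, min_words=1000, min_reads=100):
    """Keep articles that are long enough and have been read often enough."""
    return [
        a for a in articles
        if a.word_count > min_words and a.read_count > min_reads
    ]


candidates = [
    ArticleSample("a", word_count=1500, read_count=250),
    ArticleSample("b", word_count=800, read_count=5000),   # too short
    ArticleSample("c", word_count=2000, read_count=100),   # not read often enough
]
selected_ids = [a.article_id for a in select_training_samples(candidates)]
```

Both comparisons are strict, matching the "greater than" wording of the embodiment.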
In this embodiment, the user feedback data may include any one or more of the following data: the average article stay time, the second retreat percentage, the extra reading time of the user, the praise number (number of likes), the trample number (number of dislikes), the praise number ratio, the trample number ratio, the collection number ratio, the share number, and the share number ratio. The average article stay time is the average time users spend reading the article. The second retreat percentage is the proportion of reads in which the user exits the article within seconds of opening it to the total number of reads of the article. The extra reading time of the user is the length of time by which the reading time exceeds the normal reading time of the article.
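Three of these metrics can be derived from per-read stay durations, as the following sketch shows. The cut-off for an exit "within seconds" (`quick_exit_seconds`) and the reading speed used to estimate the normal reading time (`words_per_second`) are hypothetical values chosen only for illustration; the text does not define them.

```python
def feedback_metrics(read_durations, word_count,
                     quick_exit_seconds=3.0, words_per_second=5.0):
    """Compute stay-time metrics from a non-empty list of read durations in seconds."""
    total_reads = len(read_durations)
    # Reads that ended almost immediately count as "second retreats".
    quick_exits = sum(1 for d in read_durations if d <= quick_exit_seconds)
    # Assumption: normal reading time is estimated from the article length.
    normal_time = word_count / words_per_second
    extra = [max(0.0, d - normal_time) for d in read_durations]
    return {
        "avg_stay_seconds": sum(read_durations) / total_reads,
        "second_retreat_pct": quick_exits / total_reads,
        "avg_extra_reading_seconds": sum(extra) / total_reads,
    }


metrics = feedback_metrics([2.0, 1.0, 300.0, 97.0], word_count=1000)
```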
In this embodiment, the article content depth evaluation apparatus may construct a linear model whose input is the user feedback data corresponding to an article sample and whose output is the quality score corresponding to the article sample. The quality score can reflect the content depth of the article sample to a certain extent. The quality score may be, for example, 0.1, 0.5, or 0.8.
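A linear model of this kind is a weighted sum of the feedback features. The sketch below is illustrative only: the feature names, weights, and bias are hypothetical, and clipping the result to the range 0 to 1 is an assumption made so that the output matches the example scores in the text.

```python
def quality_score(features, weights, bias=0.0):
    """Weighted sum of feedback features, clipped to the range [0, 1]."""
    raw = bias + sum(weights[name] * features[name] for name in weights)
    return min(1.0, max(0.0, raw))


# Hypothetical weights: likes raise the score, quick exits lower it.
weights = {"praise_ratio": 1.0, "second_retreat_pct": -1.0}
features = {"praise_ratio": 0.5, "second_retreat_pct": 0.25}
score = quality_score(features, weights, bias=0.5)
```

In a real system the weights would be fitted or tuned rather than set by hand.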
In this embodiment, to further improve the accuracy of the trained depth scoring model, the generating module 54 may be specifically configured to: obtain the article samples in the training sample whose quality score is greater than or equal to 0.8 and determine them as positive examples; obtain the article samples in the training sample whose quality score is less than or equal to 0.2 and determine them as negative examples; and determine the positive examples together with the negative examples as the first training data. The values 0.8 and 0.2 are only examples and may be changed to other values according to actual needs.
The article content depth evaluation apparatus of this embodiment of the present invention obtains a training sample, where the training sample includes article samples whose word count is greater than a preset word number threshold and whose number of user reads is greater than a preset number threshold; obtains, for each article sample in the training sample, the user feedback data corresponding to the article sample; calculates the quality score corresponding to the article sample according to the user feedback data; generates first training data according to each article sample in the training sample and its quality score; trains an initial depth scoring model according to the first training data to obtain the depth scoring model; and determines the content depth of the article to be evaluated based on the depth scoring model. The article can thus be scored for depth according to its content, which improves the accuracy of article scoring and the efficiency of article recommendation.
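The selection of positive and negative examples can be sketched as follows, using the example thresholds 0.8 and 0.2 as defaults. The function name and the 1/0 label encoding are hypothetical.

```python
def build_first_training_data(scored_samples, pos_threshold=0.8, neg_threshold=0.2):
    """Label clear-cut samples; samples with middling scores are left out."""
    data = []
    for article_id, score in scored_samples:
        if score >= pos_threshold:
            data.append((article_id, 1))  # positive example
        elif score <= neg_threshold:
            data.append((article_id, 0))  # negative example
    return data


scored = [("a", 0.9), ("b", 0.5), ("c", 0.1), ("d", 0.8)]
first_training_data = build_first_training_data(scored)
```

Dropping the middle band keeps only samples whose feedback signal is unambiguous, at the cost of a smaller training set.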
Further, on the basis of the embodiment shown in fig. 6, the obtaining module 51 is further configured to obtain authors corresponding to article samples in the first training data;
the determining module 53 is further configured to determine, for each author, a high quality rate or a low quality rate of the author according to a quality score corresponding to the article sample of the author in the first training data;
the determining module 53 is further configured to determine whether the author is a high-quality author according to the high quality rate or the low quality rate of the author;
the obtaining module 51 is further configured to obtain a content depth of an article sample corresponding to each high-quality author in the first training data;
the generating module 54 is further configured to generate second training data according to the content depth of the article sample corresponding to each high-quality author in the first training data;
the training module 55 is further configured to train the depth scoring model according to the second training data.
In this embodiment, the determining module 53 may be specifically configured to: obtain, for each author, the quality scores of the article samples corresponding to the author in the first training data; determine the article samples whose quality score is greater than a first quality score threshold as high-quality articles; determine the article samples whose quality score is less than a second quality score threshold as low-quality articles; and determine the high quality rate or the low quality rate of the author according to the number of high-quality articles and the number of low-quality articles. The first quality score threshold may be, for example, 0.8 or 0.9. The second quality score threshold may be, for example, 0.2 or 0.3.
In this embodiment, user feedback data can reflect the content depth of an article only to a certain extent. To further improve the accuracy of the depth scoring model, the article samples corresponding to the high-quality authors in the first training data may therefore also be labeled with a content depth, and the depth scoring model is then trained on these article samples and their labeled content depths.
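The per-author calculation can be sketched as below. The text only says that the rates are determined from the numbers of high-quality and low-quality articles; dividing each count by the author's total number of article samples is an assumption, as are the rate thresholds `min_high_rate` and `max_low_rate`. The "or" in the decision rule follows the embodiment, under which an author qualifies if either condition holds.

```python
from collections import defaultdict


def author_rates(samples, high_threshold=0.8, low_threshold=0.2):
    """Map each author to (high quality rate, low quality rate)."""
    counts = defaultdict(lambda: {"total": 0, "high": 0, "low": 0})
    for author, score in samples:
        c = counts[author]
        c["total"] += 1
        if score > high_threshold:
            c["high"] += 1
        elif score < low_threshold:
            c["low"] += 1
    # Assumption: each rate is the count divided by the author's sample total.
    return {
        author: (c["high"] / c["total"], c["low"] / c["total"])
        for author, c in counts.items()
    }


def high_quality_authors(rates, min_high_rate=0.6, max_low_rate=0.1):
    """Authors whose high rate is above, or low rate is below, the thresholds."""
    return sorted(
        author for author, (high, low) in rates.items()
        if high > min_high_rate or low < max_low_rate
    )


samples = [
    ("alice", 0.9), ("alice", 0.85), ("alice", 0.5), ("alice", 0.95),
    ("bob", 0.1), ("bob", 0.9), ("bob", 0.15), ("bob", 0.5),
]
rates = author_rates(samples)
good_authors = high_quality_authors(rates)
```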
Fig. 7 is a schematic structural diagram of another apparatus for evaluating content depth of an article according to an embodiment of the present invention. The article content depth evaluation device comprises:
memory 1001, processor 1002, and computer programs stored on memory 1001 and executable on processor 1002.
The processor 1002, when executing the program, implements the method of evaluating the depth of article content provided in the above-described embodiment.
Further, the device for evaluating the depth of the article content further comprises:
a communication interface 1003 for communicating between the memory 1001 and the processor 1002.
A memory 1001 for storing computer programs that may be run on the processor 1002.
Memory 1001 may include high-speed RAM memory and may also include non-volatile memory (e.g., at least one disk memory).
The processor 1002 is configured to implement the method for evaluating the depth of article content according to the foregoing embodiment when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on one chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through an internal interface.
The processor 1002 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of assessing the depth of article content as described above.
The present invention also provides a computer program product; when the instructions in the computer program product are executed by a processor, the method for evaluating the depth of article content described above is implemented.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (13)

1. A method for evaluating the depth of article content is characterized by comprising the following steps:
acquiring an article to be evaluated;
inputting the article into a preset depth scoring model, so that the depth scoring model performs word segmentation and keyword extraction on each paragraph in the article, obtains a vector corresponding to each keyword, determines a vector corresponding to each paragraph according to the vector corresponding to the keyword in each paragraph, and determines a vector and a content depth corresponding to the article according to the vector corresponding to each paragraph;
acquiring the content depth output by the depth scoring model;
before the obtaining of the article to be evaluated, the method further includes:
generating first training data, wherein the first training data comprises: article samples whose word count is greater than a preset word number threshold and whose number of user reads is greater than a preset number threshold, and a quality score corresponding to each article sample, wherein the quality score is determined according to user feedback data corresponding to the article sample;
training an initial depth scoring model according to the first training data to obtain the depth scoring model;
acquiring authors corresponding to article samples in the first training data;
for each author, determining the high quality rate or the low quality rate of the author according to the quality score corresponding to the article sample of the author in the first training data;
determining whether the author is a high-quality author according to the high-quality rate or the low-quality rate of the author;
acquiring the content depth of the article sample corresponding to each high-quality author in the first training data, wherein the content depth is determined by manual marking;
generating second training data according to the article sample corresponding to each high-quality author in the first training data and the content depth of the article sample corresponding to each high-quality author;
and training the depth scoring model according to the second training data.
2. The method of claim 1, wherein the depth scoring model has a structure of a word vector model + a convolution pooling model + a bidirectional long short-term memory network model + a classification model;
the word vector model is used for performing word segmentation and keyword extraction on each paragraph in the article to obtain a vector corresponding to each keyword;
the convolution pooling model is used for determining a vector corresponding to each paragraph according to a vector corresponding to the keyword in each paragraph;
the bidirectional long and short term memory network model is used for determining the vector corresponding to the article according to the vector corresponding to each paragraph;
and the classification model is used for determining the content depth of the article according to the vector corresponding to the article.
3. The method of claim 1, wherein generating first training data comprises:
obtaining a training sample, wherein the training sample comprises: article samples whose word count is greater than a preset word number threshold and whose number of user reads is greater than a preset number threshold;
for each article sample in the training samples, acquiring user feedback data corresponding to the article sample;
calculating and determining the quality score corresponding to the article sample according to the user feedback data;
and generating first training data according to each article sample in the training samples and the corresponding quality score.
4. The method of claim 3, wherein the user feedback data comprises any one or more of: the average article stay time, the second retreat percentage, the extra reading time of the user, the praise number, the trample number, the praise number ratio, the trample number ratio, the collection number ratio, the share number and the share number ratio.
5. The method of claim 1, wherein the determining, for each author, a high quality rate or a low quality rate of the author according to a quality score corresponding to the article sample of the author in the first training data comprises:
for each author, acquiring a quality score of an article sample corresponding to the author in the first training data;
determining the article samples with the corresponding quality scores larger than a first quality score threshold value as high-quality articles;
determining the article samples with the corresponding quality scores smaller than a second quality score threshold value as low-quality articles;
and determining the high quality rate or the low quality rate of the author according to the number of the high quality articles and the number of the low quality articles.
6. An article content depth assessment apparatus, comprising:
an obtaining module, configured to obtain an article to be evaluated;
the input module is used for inputting the article into a preset depth scoring model so that the depth scoring model performs word segmentation and keyword extraction on each paragraph in the article to obtain a vector corresponding to each keyword, determines a vector corresponding to each paragraph according to the vector corresponding to the keyword in each paragraph, and determines a vector corresponding to the article and content depth according to the vector corresponding to each paragraph;
the obtaining module is further configured to obtain a content depth output by the depth scoring model;
a generating module, configured to generate first training data, where the first training data includes: article samples whose word count is greater than a preset word number threshold and whose number of user reads is greater than a preset number threshold, and a quality score corresponding to each article sample, wherein the quality score is determined according to user feedback data corresponding to the article sample;
the training module is used for training an initial depth scoring model according to the first training data to obtain the depth scoring model;
the obtaining module is further configured to obtain authors corresponding to article samples in the first training data;
a determining module, configured to determine, for each author, a high quality rate or a low quality rate of the author according to a quality score corresponding to the article sample of the author in the first training data;
the determining module is further configured to determine whether the author is a high-quality author according to the high-quality rate or the low-quality rate of the author;
the obtaining module is further configured to obtain a content depth of the article sample corresponding to each high-quality author in the first training data, where the content depth is determined by manual marking;
the generating module is further configured to generate second training data according to the article sample corresponding to each high-quality author in the first training data and the content depth of the article sample corresponding to each high-quality author;
the training module is further used for training the depth scoring model according to the second training data.
7. The apparatus of claim 6, wherein the depth scoring model has a structure of a word vector model + a convolution pooling model + a bidirectional long short-term memory network model + a classification model;
the word vector model is used for performing word segmentation and keyword extraction on each paragraph in the article to obtain a vector corresponding to each keyword;
the convolution pooling model is used for determining a vector corresponding to each paragraph according to a vector corresponding to the keyword in each paragraph;
the bidirectional long and short term memory network model is used for determining the vector corresponding to the article according to the vector corresponding to each paragraph;
and the classification model is used for determining the content depth of the article according to the vector corresponding to the article.
8. The apparatus of claim 6, wherein the generation module is specifically configured to,
obtaining a training sample, wherein the training sample comprises: article samples whose word count is greater than a preset word number threshold and whose number of user reads is greater than a preset number threshold;
for each article sample in the training samples, acquiring user feedback data corresponding to the article sample;
calculating and determining the quality score corresponding to the article sample according to the user feedback data;
and generating first training data according to each article sample in the training samples and the corresponding quality score.
9. The apparatus of claim 8, wherein the user feedback data comprises any one or more of: the average article stay time, the second retreat percentage, the extra reading time of the user, the praise number, the trample number, the praise number ratio, the trample number ratio, the collection number ratio, the share number and the share number ratio.
10. The apparatus of claim 6, wherein the determining module is specifically configured to,
for each author, acquiring a quality score of an article sample corresponding to the author in the first training data;
determining the article samples with the corresponding quality scores larger than a first quality score threshold value as high-quality articles;
determining the article samples with the corresponding quality scores smaller than a second quality score threshold value as low-quality articles;
and determining the high quality rate or the low quality rate of the author according to the number of the high quality articles and the number of the low quality articles.
11. An article content depth assessment apparatus, comprising:
memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the method for assessing the depth of article content according to any one of claims 1 to 5 when executing the program.
12. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method for assessing the depth of content of an article according to any one of claims 1 to 5.
13. A computer program product, wherein instructions in the computer program product, when executed by a processor, implement the method of assessing the depth of article content as claimed in any one of claims 1 to 5.
CN201811540935.3A 2018-12-17 2018-12-17 Article content depth evaluation method and device Active CN109710840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811540935.3A CN109710840B (en) 2018-12-17 2018-12-17 Article content depth evaluation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811540935.3A CN109710840B (en) 2018-12-17 2018-12-17 Article content depth evaluation method and device

Publications (2)

Publication Number Publication Date
CN109710840A CN109710840A (en) 2019-05-03
CN109710840B true CN109710840B (en) 2020-12-11

Family

ID=66255731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811540935.3A Active CN109710840B (en) 2018-12-17 2018-12-17 Article content depth evaluation method and device

Country Status (1)

Country Link
CN (1) CN109710840B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334356B (en) * 2019-07-15 2023-08-04 腾讯科技(深圳)有限公司 Article quality determining method, article screening method and corresponding device
CN111104486A (en) * 2019-12-25 2020-05-05 郑州师范学院 Modern literature comparison and explanation system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095271A (en) * 2014-05-12 2015-11-25 北京大学 Microblog retrieval method and microblog retrieval apparatus
CN107133211A (en) * 2017-04-26 2017-09-05 中国人民大学 Essay scoring method based on attention mechanism
CN108287821A (en) * 2018-01-23 2018-07-17 北京奇艺世纪科技有限公司 High-quality text screening method and device, and electronic equipment
CN108573411A (en) * 2018-04-17 2018-09-25 重庆理工大学 Hybrid recommendation method based on deep sentiment analysis of user comments and multi-source recommendation view fusion
CN108875821A (en) * 2018-06-08 2018-11-23 Oppo广东移动通信有限公司 Training method and device for classification model, mobile terminal, and readable storage medium

Also Published As

Publication number Publication date
CN109710840A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109710841B (en) Comment recommendation method and device
CN107679033B (en) Text sentence break position identification method and device
CN109726274B (en) Question generation method, device and storage medium
CN108681541B (en) Picture searching method and device and computer equipment
CN109710840B (en) Article content depth evaluation method and device
CN109033244B (en) Search result ordering method and device
CN107368613B (en) Short text sentiment analysis method and device
CN107203265B (en) Information interaction method and device
CN110738046B (en) Viewpoint extraction method and apparatus
CN106599047B (en) Information pushing method and device
CN111460155A (en) Information credibility assessment method and device based on knowledge graph
CN112069316B (en) Emotion recognition method and device
CN113468034A (en) Data quality evaluation method and device, storage medium and electronic equipment
CN104850537A (en) Method and device for screening text content
CN104102662B (en) Method and device for determining user interest preference similarity
CN112446717B (en) Advertisement putting method and device
CN110837732B (en) Method and device for identifying intimacy between target persons, electronic equipment and storage medium
CN110232117B (en) Sentence fluency detection method and device and terminal
CN112163415A (en) User intention identification method and device for feedback content and electronic equipment
CN108829896B (en) Reply information feedback method and device
CN111475409B (en) System test method, device, electronic equipment and storage medium
CN106570116B (en) Search result aggregation method and device based on artificial intelligence
CN111401563A (en) Machine learning model updating method and device
CN109189886A (en) Intelligent video recommender system
CN106815592B (en) Text data processing method and device, and wrongly written character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant