US20180349734A1 - Method and apparatus for evaluating article value based on artificial intelligence, and storage medium - Google Patents

Method and apparatus for evaluating article value based on artificial intelligence, and storage medium

Info

Publication number
US20180349734A1
US20180349734A1
Authority
US
United States
Prior art keywords
article
quality
paragraph
low
articles
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US16/001,111
Other versions
US11481572B2
Inventor
Bo Huang
Daren Li
Qiaoqiao She
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Assigned to BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD. reassignment BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HUANG, BO, LI, DAREN, SHE, Qiaoqiao
Publication of US20180349734A1
Application granted
Publication of US11481572B2
Legal status: Active

Classifications

    • G06K9/623
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • G06F17/30699
    • G06F17/30867
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2113Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2178Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • G06K9/6256
    • G06K9/6263
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method and apparatus for evaluating article value based on artificial intelligence, and a storage medium. The solution of the present disclosure may be employed to pre-mine high-quality articles and low-quality articles as training data, and to train according to the training data to obtain a value-scoring model. As such, when value evaluation needs to be performed for a to-be-evaluated article, it is feasible to first perform feature extraction for the to-be-evaluated article, determine a score of the to-be-evaluated article based on the extracted features and the value-scoring model, and thereby implement effective evaluation of the article value.

Description

  • The present application claims the priority of Chinese Patent Application No. 201710417749X, filed on Jun. 6, 2017, with the title of “Method and apparatus for evaluating article value based on artificial intelligence, and storage medium”. The disclosure of the above application is incorporated herein by reference in its entirety.
  • FIELD OF THE DISCLOSURE
  • The present disclosure relates to computer application technologies, and particularly to a method and apparatus for evaluating article value based on artificial intelligence, and a storage medium.
  • BACKGROUND OF THE DISCLOSURE
  • Artificial intelligence (AI) is a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines capable of responding in a manner similar to human intelligence. Studies in this field include robots, language recognition, image recognition, natural language processing, expert systems and the like.
  • To better satisfy the user during his or her fragmented time, an information distribution platform of the mobile internet tries to recommend to the user new and short articles that cater to the user's interest. To obtain more display opportunities, content producers also cater to this demand and produce more articles that attract the user's clicks but include little content.
  • As such, some articles that are truly valuable and informative cannot be displayed sufficiently, and content producers have no motivation to produce such articles, thereby forming a vicious circle in which high-quality articles become fewer and fewer whereas low-quality articles with little value become more and more numerous.
  • The increase of low-quality articles poses an extremely large threat to the user's trust in information resources on the Internet.
  • Hence, it is necessary to, upon information distribution, minimize the number of low-quality articles and increase the number of high-quality articles, to enable the user to obtain more high-quality resources, encourage the creation of high-quality articles while enhancing the user's experience, and thereby create a healthy ecology of internet content.
  • Correspondingly, it is necessary to evaluate the value of articles and thereby regard articles with higher value as high-quality articles and recommend them to the user. However, there is not yet an effective value-evaluating method in the prior art.
  • SUMMARY OF THE DISCLOSURE
  • In view of the above, the present disclosure provides a method and apparatus for evaluating article value based on artificial intelligence, and a storage medium.
  • Specific technical solutions are as follows:
  • A method for evaluating article value based on artificial intelligence, comprising:
  • mining high-quality articles and low-quality articles as training data, and training according to the training data to obtain a value-scoring model;
  • performing feature extraction for a to-be-evaluated article;
  • determining a score of the to-be-evaluated article based on the extracted features and the value-scoring model.
  • According to a preferred embodiment of the present disclosure, the mining the training data comprises:
  • mining the training data according to manually-annotated information, the user's feedback, and preset mining rules.
  • According to a preferred embodiment of the present disclosure, the mining the training data according to manually-annotated information, the user's feedback, and preset mining rules comprises:
  • regarding articles corresponding to manually-annotated high-quality content sources as high-quality articles, and adding the articles into the training data;
  • adding high-quality articles and low-quality articles determined according to the user's feedback behaviors, into the training data;
  • regarding articles having pre-set low-quality article features as low-quality articles, and adding the articles into the training data.
  • According to a preferred embodiment of the present disclosure, the performing feature extraction for a to-be-evaluated article comprises:
  • extracting one or any combination of the following features respectively with respect to each paragraph in the to-be-evaluated article:
  • relevance between the paragraph and a title of the to-be-evaluated article;
  • relevance between the paragraph and a preceding neighboring paragraph of the paragraph;
  • the number of newly-added words in the paragraph;
  • the total number of words in the paragraph;
  • whether the paragraph begins with a subtitle;
  • the number of pictures in the paragraph;
  • the number of sentences in the paragraph;
  • an average length of the sentences in the paragraph;
  • the number of pronouns in the paragraph.
  • According to a preferred embodiment of the present disclosure, the method further comprises:
  • comparing the score with a preset threshold, and determining whether the to-be-evaluated article is a high-quality article or a low-quality article.
  • According to a preferred embodiment of the present disclosure, the method further comprises:
  • obtaining M preset low-quality article features, M being a positive integer;
  • if the to-be-evaluated article has any low-quality article feature, the to-be-evaluated article is determined as the low-quality article.
  • According to a preferred embodiment of the present disclosure, the low-quality article features include one or any combination of the following:
  • repetitious content in the article exceeds a predetermined threshold;
  • the number of characters in the article is less than a predetermined threshold, and the article does not contain a picture;
  • the article includes a paragraph in which the number of characters exceeds a predetermined threshold;
  • the article includes a case that the expression is incomplete;
  • a typographic error exists in the article.
  • An apparatus for evaluating article value based on artificial intelligence, comprising: a mining unit, a training unit and an evaluating unit;
  • the mining unit is configured to mine high-quality articles and low-quality articles as training data, and send the training data to the training unit;
  • the training unit is configured to train according to the training data to obtain a value-scoring model, and send the value-scoring model to the evaluating unit;
  • the evaluating unit is configured to perform feature extraction for a to-be-evaluated article, and determine a score of the to-be-evaluated article based on the extracted features and the value-scoring model.
  • According to a preferred embodiment of the present disclosure, the mining unit mines the training data according to manually-annotated information, the user's feedback, and preset mining rules.
  • According to a preferred embodiment of the present disclosure, the mining unit regards articles corresponding to manually-annotated high-quality content sources as high-quality articles, and adds the articles into the training data;
  • the mining unit adds high-quality articles and low-quality articles determined according to the user's feedback behaviors, into the training data;
  • the mining unit regards articles having pre-set low-quality article features as low-quality articles, and adds the articles into the training data.
  • According to a preferred embodiment of the present disclosure, the evaluating unit extracts one or any combination of the following features respectively with respect to each paragraph in the to-be-evaluated article:
  • relevance between the paragraph and a title of the to-be-evaluated article;
  • relevance between the paragraph and a preceding neighboring paragraph of the paragraph;
  • the number of newly-added words in the paragraph;
  • the total number of words in the paragraph;
  • whether the paragraph begins with a subtitle;
  • the number of pictures in the paragraph;
  • the number of sentences in the paragraph;
  • an average length of the sentences in the paragraph;
  • the number of pronouns in the paragraph.
  • According to a preferred embodiment of the present disclosure, the evaluating unit is further configured to
  • compare the score with a preset threshold, and determine whether the to-be-evaluated article is a high-quality article or a low-quality article.
  • According to a preferred embodiment of the present disclosure, the evaluating unit is further configured to
  • obtain M preset low-quality article features, M being a positive integer;
  • if the to-be-evaluated article has any low-quality article feature, the to-be-evaluated article is determined as the low-quality article.
  • According to a preferred embodiment of the present disclosure, the low-quality article features include one or any combination of the following:
  • repetitious content in the article exceeds a predetermined threshold;
  • the number of characters in the article is less than a predetermined threshold, and the article does not contain a picture;
  • the article includes a paragraph in which the number of characters exceeds a predetermined threshold;
  • the article includes a case that the expression is incomplete;
  • a typographic error exists in the article.
  • A computer device, comprising a memory, a processor and a computer program which is stored on the memory and runs on the processor, the processor, upon executing the program, implementing the above-mentioned method.
  • A computer-readable storage medium on which a computer program is stored, the program, when executed by the processor, implementing the aforesaid method.
  • As can be seen from the above introduction, the solution of the present disclosure is employed to pre-mine high-quality articles and low-quality articles as training data, and to train according to the training data to obtain a value-scoring model. As such, when value evaluation needs to be performed for a to-be-evaluated article, it is feasible to first perform feature extraction for the to-be-evaluated article, determine a score of the to-be-evaluated article based on the extracted features and the value-scoring model, and thereby implement effective evaluation of the article value.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIG. 1 is a flow chart of an embodiment of a method of evaluating article value based on artificial intelligence according to the present disclosure.
  • FIG. 2 is a structural schematic diagram of an RNN model according to the present disclosure.
  • FIG. 3 is a schematic diagram of an implementation process of a method of evaluating article value based on artificial intelligence according to the present disclosure.
  • FIG. 4 is a block diagram of an embodiment of an apparatus for evaluating article value based on artificial intelligence according to the present disclosure.
  • FIG. 5 illustrates a block diagram of an example computer system/server 12 adapted to implement an implementation mode of the present disclosure.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • Technical solutions of the present disclosure will be described in more detail in conjunction with figures and embodiments to make technical solutions of the present disclosure clear and more apparent.
  • Obviously, the described embodiments are only some embodiments of the present disclosure, not all embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by those having ordinary skill in the art without making inventive efforts fall within the protection scope of the present disclosure.
  • FIG. 1 is a flow chart of an embodiment of a method of evaluating article value based on artificial intelligence according to the present disclosure. As shown in FIG. 1, the embodiment comprises the following specific implementation mode.
  • 101: mining high-quality articles and low-quality articles as training data, and training according to the training data to obtain a value-scoring model.
  • It is necessary to mine a lot of training data to train the value-scoring model. The value-scoring model is obtained by training according to the mined training data, which includes high-quality articles and low-quality articles.
  • In the present embodiment, it is feasible to mine the training data according to manually-annotated information, the user's feedback, preset mining rules and so on, which will be introduced respectively as follows.
  • 1) Manually-Annotated Information
  • For example, it is possible to regard articles corresponding to manually-annotated high-quality content sources as high-quality articles, and add them into the training data.
  • Specifically, it is feasible to screen content sources, such as authors' websites, according to the quantity of issued articles and the activeness of the content sources to obtain a batch of candidate content sources, then manually score the candidate content sources according to the comprehensive quality of the articles they issue, determine content sources whose scores exceed a predetermined threshold as high-quality content sources, and add articles corresponding to the high-quality content sources into the training data as high-quality articles.
  • It can be seen that the above manner is mainly used to mine high-quality articles.
  • 2) The User's Feedback Behaviors
  • For example, it is possible to add high-quality articles and low-quality articles determined according to the user's feedback behaviors, into the training data.
  • In practical application, after viewing an article, the user may perform a series of feedback behaviors such as keeping the article as a favorite, commenting on it and sharing it, so the training data may be mined according to the user's feedback behaviors.
  • For example, if a certain article is commented on by many users as having very low quality, it may be believed that this article is a low-quality article, and it may be added to the training data.
  • As another example, if a certain article is kept by many users as a favorite and read by users for long durations, it may be believed that this article is a high-quality article, and it may be added into the training data.
  • It can be seen that the above manner can be used to mine high-quality articles as well as low-quality articles.
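  • The following is a minimal sketch of how such feedback statistics might be turned into training labels. It is an illustration only: the statistic names and thresholds are assumptions, not values given in the disclosure.

```python
# Hedged sketch: map aggregated user-feedback statistics to a 0/1 training label.
# The field names and thresholds below are assumptions for illustration.
def label_from_feedback(stats: dict):
    """stats: e.g. {"favorites": 120, "negative_comments": 3, "avg_read_seconds": 150}"""
    if stats.get("negative_comments", 0) >= 50:
        return 0  # many users flagged it as very low quality -> low-quality article
    if stats.get("favorites", 0) >= 100 and stats.get("avg_read_seconds", 0) >= 120:
        return 1  # widely kept as a favorite and read at length -> high-quality article
    return None   # feedback not decisive; do not add the article to the training data
```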
  • 3) Mining Rules
  • For example, it is possible to regard articles having pre-set low-quality article features as low-quality articles, and add them into the training data.
  • The low-quality article features may be preset. As such, after a certain article is analyzed, if it is found as having low-quality article features, the article may be regarded as a low-quality article, and added into the training data.
  • It can be seen that low-quality articles are mined mainly through preset rules/policies.
  • After a sufficient amount of training data is obtained, the value-scoring model may be obtained by training according to the training data.
  • When training is performed, it is feasible to respectively perform feature extraction for the high-quality articles and low-quality articles serving as the training data in the manner stated in 102 below, set the score of the high-quality articles to 1 and the score of the low-quality articles to 0, and then train to obtain the value-scoring model. How to perform the training belongs to the prior art.
  • The value-scoring model may be a deep learning model such as a Recurrent Neural Network (RNN).
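  • A minimal sketch of such a value-scoring model is shown below, assuming PyTorch, a GRU as the recurrent unit, and the nine per-paragraph features described in 102; the class and variable names are illustrative, not the disclosure's implementation.

```python
# Hedged sketch of an RNN value-scoring model over per-paragraph feature vectors.
import torch
import torch.nn as nn

class ValueScoringModel(nn.Module):
    def __init__(self, num_features: int = 9, hidden_size: int = 32):
        super().__init__()
        # The article is treated as a sequence of paragraph feature vectors.
        self.rnn = nn.GRU(input_size=num_features, hidden_size=hidden_size,
                          batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, paragraph_features: torch.Tensor) -> torch.Tensor:
        # paragraph_features: (batch, num_paragraphs, num_features)
        _, last_hidden = self.rnn(paragraph_features)
        # Sigmoid keeps the score between 0 and 1 (1 = high quality, 0 = low quality).
        return torch.sigmoid(self.head(last_hidden[-1])).squeeze(-1)

# Training sketch: binary cross-entropy against the mined 0/1 labels.
model = ValueScoringModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
features = torch.rand(4, 6, 9)            # 4 articles, 6 paragraphs, 9 features each
labels = torch.tensor([1., 0., 1., 0.])   # mined high-/low-quality labels
loss = nn.BCELoss()(model(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```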
  • In 102, feature extraction is performed for a to-be-evaluated article.
  • The high-quality article usually has the following features: good typesetting, sufficient arguments, clear logic, definite opinions, professional terms and the like.
  • Based on the above features, it is possible to manually preset a plurality of features to be extracted, and extract these features with respect to the to-be-evaluated article.
  • For example, one or any combination of the following features may be extracted respectively with respect to each paragraph of the to-be-evaluated article.
  • Feature 1: relevance between the paragraph and a title of the to-be-evaluated article;
  • Feature 2: relevance between the paragraph and a preceding neighboring paragraph of the paragraph;
  • Feature 3: the number of newly-added words in the paragraph;
  • Feature 4: the total number of words in the paragraph;
  • Feature 5: whether the paragraph begins with a subtitle;
  • Feature 6: the number of pictures in the paragraph;
  • Feature 7: the number of sentences in the paragraph;
  • Feature 8: an average length of the sentences in the paragraph;
  • Feature 9: the number of pronouns in the paragraph.
  • Table 1 shows roles played by the features upon measuring the value of the article.
    TABLE 1
    Roles played by the features upon measuring the value of the article
    Feature    Role
    Feature 1  Whether opinions are definite
    Feature 2  Whether logic is clear
    Feature 3  Whether arguments are sufficient
    Feature 4  Whether typesetting is excellent
    Feature 5  Whether typesetting is excellent
    Feature 6  Whether typesetting is excellent
    Feature 7  Whether typesetting is excellent
    Feature 8  Whether typesetting is excellent
    Feature 9  Whether terms used are professional
  • The above nine features may be extracted from each paragraph in the to-be-evaluated article.
  • Regarding the first paragraph in the to-be-evaluated article, since it has no preceding neighboring paragraph, the relevance between the paragraph and the title may be regarded as the relevance between the paragraph and its preceding neighboring paragraph, namely, Feature 1 = Feature 2.
  • Regarding a paragraph other than the first paragraph, for example, the second paragraph, feature 1 refers to the relevance between the second paragraph and the title, and feature 2 refers to the relevance between the second paragraph and the first paragraph.
  • In addition, feature 3 usually refers to the number of newly-added words in the paragraph as compared with all content preceding the content of the paragraph. For example, regarding the second paragraph, feature 3 may refer to the number of newly-added words in the second paragraph as compared with the content formed by the first paragraph and the title.
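  • As an illustration of Features 3 through 9, the following sketch computes simple per-paragraph statistics; the subtitle marker, picture marker and pronoun list are simplifying assumptions rather than the disclosure's definitions.

```python
# Hedged sketch of extracting Features 3-9 for a single paragraph.
import re

PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "this", "that", "these", "those"}

def paragraph_features(paragraph: str, preceding_text: str) -> dict:
    words = re.findall(r"\w+", paragraph.lower())
    seen_before = set(re.findall(r"\w+", preceding_text.lower()))
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    return {
        "new_words": len(set(words) - seen_before),                     # Feature 3
        "total_words": len(words),                                      # Feature 4
        "starts_with_subtitle": paragraph.lstrip().startswith("##"),    # Feature 5 (assumed marker)
        "num_pictures": paragraph.count("[IMG]"),                       # Feature 6 (assumed marker)
        "num_sentences": len(sentences),                                # Feature 7
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,  # Feature 8
        "num_pronouns": sum(w in PRONOUNS for w in words),              # Feature 9
    }
```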
  • In the present embodiment, it is feasible to use a deep learning semantic similarity model which is obtained by pre-training and based on a Convolutional Neural Network (CNN), to determine Feature 1 and Feature 2, i.e., Feature 1 and Feature 2 may share one model, and the title is treated as a paragraph.
  • How to train the CNN-based deep learning semantic similarity model belongs to the prior art. For example, it is possible to manually construct a sufficient amount of training data and then train according to the training data to obtain the CNN-based deep learning semantic similarity model: one title and one paragraph may be used to form a pair, namely a training sample, or two paragraphs may be used to form a pair. If the two components in a pair come from the same article, the relevance corresponding to the pair may be set to 1, otherwise to 0.
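  • A minimal sketch of constructing such relevance training pairs is given below; the article representation (a dict with a title and a paragraph list) and the sampling of negative pairs are assumptions for illustration.

```python
# Hedged sketch: build (text_a, text_b, label) pairs for the similarity model.
import random

def build_similarity_pairs(articles, negatives_per_article: int = 1):
    """articles: list of {"title": str, "paragraphs": [str, ...]}"""
    pairs = []
    for i, art in enumerate(articles):
        pieces = [art["title"]] + art["paragraphs"]
        # Positive pairs: adjacent pieces (title/paragraph or paragraph/paragraph)
        # taken from the same article get relevance 1.
        for a in range(len(pieces) - 1):
            pairs.append((pieces[a], pieces[a + 1], 1))
        # Negative pairs: a piece from a different article gets relevance 0.
        for _ in range(negatives_per_article):
            j = random.randrange(len(articles))
            if j != i and articles[j]["paragraphs"]:
                pairs.append((art["title"], random.choice(articles[j]["paragraphs"]), 0))
    return pairs
```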
  • In 103, a score of the to-be-evaluated article is determined based on the extracted features and the value-scoring model.
  • After the features stated in 102 are extracted, the extracted features may be input to the value-scoring model to obtain a score of the to-be-evaluated article output by the value-scoring model.
  • Since paragraphs of the article are in a sequence relationship, the RNN model may be employed as the value-scoring model as stated above.
  • FIG. 2 is a structural schematic diagram of an RNN model according to the present disclosure. As shown in FIG. 2, the model finally outputs a score whose value may be between 0 and 1. The higher the score is, the larger the value of the article is.
  • Regarding the to-be-evaluated article, after its score is obtained, it is feasible to compare the score with a preset threshold, and determine whether the to-be-evaluated article is a high-quality article or a low-quality article according to a comparison result.
  • For example, if the score is larger than the threshold, it may be determined that the to-be-evaluated article is a high-quality article, otherwise it is a low-quality article.
  • It can be seen that effective evaluation of the value of the article may be implemented in the manner of the above embodiment.
  • The value-scoring model is advantageous in strong generalization capability, but there might be a case that some articles apparently having low-quality article features cannot be recognized. To overcome the problem and further improve the accuracy of evaluation result, the following processing manner may be employed.
  • Obtaining M preset low-quality article features, M being a positive integer. If the to-be-evaluated article has any of the low-quality article features, the to-be-evaluated article is determined as the low-quality article.
  • Which specific features are regarded as low-quality article features may depend on actual situations, for example, the low-quality article features may include one or any combination of the following:
  • Feature A: repetitious content in the article exceeds a predetermined threshold;
  • Feature B: the number of characters in the article is less than a predetermined threshold, and the article does not contain a picture;
  • Feature C: the article includes a paragraph in which the number of characters exceeds a predetermined threshold;
  • Feature D: the article includes a case that the expression is incomplete;
  • Feature E: a typographic error exists in the article
  • Regarding Feature A, if a lot of repetitious content exists in the article, for example, the content of the title is repeatedly mentioned in many paragraphs, when the number of repetition reaches a certain degree, the article may be regarded as the low-quality article.
  • Regarding feature B, if the number of characters in the article is too small and there is no picture, the article may be regarded as the low-quality article.
  • Regarding Feature C, if the article includes a case that a paragraph contains too many characters, the article may be regarded as the low-quality article.
  • Regarding Feature D, if the title or text of the article includes a case in which the expression is incomplete, for example, “**star will show up at . . . today” (the original shows the corresponding Chinese expression as an inline image), the article may be regarded as the low-quality article.
  • Regarding Feature E, if a typographic error appears in the title or text of the article, the article may be regarded as the low-quality article.
  • Regarding the to-be-evaluated article, if it has any one of Features A-E, it may be regarded as the low-quality article.
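  • A rough sketch of this rule-based check is given below; the thresholds and the article representation are assumptions, and Features D and E are stubbed out as assumed helper flags since they would require an incomplete-expression detector and a typo detector.

```python
# Hedged sketch of the rule-based check: any matching low-quality feature
# (A-E) marks the article as low quality. Thresholds are illustrative only.
def has_low_quality_feature(article: dict, *, min_chars: int = 200,
                            max_paragraph_chars: int = 2000,
                            max_title_repeats: int = 3) -> bool:
    text = "\n".join(article["paragraphs"])
    checks = [
        # Feature A: the title's content repeated in too many paragraphs.
        sum(article["title"] in p for p in article["paragraphs"]) > max_title_repeats,
        # Feature B: too few characters and no picture.
        len(text) < min_chars and article.get("num_pictures", 0) == 0,
        # Feature C: one paragraph with too many characters.
        any(len(p) > max_paragraph_chars for p in article["paragraphs"]),
        # Features D and E: assumed precomputed flags.
        article.get("has_incomplete_expression", False),
        article.get("has_typographic_error", False),
    ]
    return any(checks)
```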
  • To facilitate expression, the above manner of determining whether the to-be-evaluated article is a high-quality article or low-quality article according to the score is called the first evaluation manner, and the above manner of determining whether the to-be-evaluated article is a high-quality article or low-quality article according to low-quality article features is called the second evaluation manner.
  • In practical application, it is feasible to use the first evaluation manner and the second evaluation manner in combination, namely, to evaluate the value of the article based on two dimensions: expression of the content of the article and content depth. Specific combination manners are not limited. For example, regarding the to-be-evaluated article, if it is determined as the low-quality article in both the first evaluation manner and the second evaluation manner, it is believed that the to-be-evaluated article is the low-quality article. Alternatively, after the to-be-evaluated article is determined as the high-quality article in the first evaluation manner, the second evaluation manner is further employed to determine whether the to-be-evaluated article is the high-quality article or the low-quality article; if the article is determined as the low-quality article, it is believed that the to-be-evaluated article is the low-quality article; otherwise, it is believed to be the high-quality article.
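  • The second combination described above might look like the following sketch, reusing has_low_quality_feature from the previous sketch; the threshold value is an assumption.

```python
# Hedged sketch of one combination: the model score decides first, and a
# high-scoring article is still rejected if any low-quality feature matches.
def evaluate_article(article: dict, model_score: float, threshold: float = 0.5) -> str:
    if model_score <= threshold:           # first evaluation manner
        return "low-quality"
    if has_low_quality_feature(article):   # second evaluation manner
        return "low-quality"
    return "high-quality"
```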
  • Specific values of the thresholds involved in the above introduction all may depend on actual needs.
  • To conclude the above introduction, FIG. 3 is a schematic diagram of an implementation process of a method of evaluating article value based on artificial intelligence according to the present disclosure. As shown in FIG. 3, the implementation process is mainly formed by two portions: training data offline mining and online value evaluation.
  • As compared with the prior art, the above embodiments provide an effective evaluation manner of the value of the article; furthermore, the extracted features can accurately and visually reflect the level of quality of the article, thereby improving the accuracy of the evaluation result, and a better training effect may be obtained by using less training data. In addition, the two evaluation manners may be combined flexibly to facilitate flexible adjustment according to actual needs.
  • Correspondingly, it is necessary to, upon information distribution, minimize the number of low-quality articles and increase the number of high-quality articles, to enable the user to obtain more high-quality resources, encourage the creation of high-quality articles while enhancing the user's experience, and thereby create a healthy ecology of internet content.
  • The above introduces the method embodiments. The solution of the present disclosure will be further described through an apparatus embodiment.
  • FIG. 4 is a block diagram of an embodiment of an apparatus for evaluating article value based on artificial intelligence according to the present disclosure. As shown in FIG. 4, the apparatus comprises: a mining unit 401, a training unit 402 and an evaluating unit 403.
  • The mining unit 401 is configured to mine high-quality articles and low-quality articles as training data, and send the training data to the training unit 402.
  • The training unit 402 is configured to train according to the training data to obtain a value-scoring model, and send the value-scoring model to the evaluating unit 403.
  • The evaluating unit 403 is configured to perform feature extraction for a to-be-evaluated article, and determine a score of the to-be-evaluated article based on the extracted features and the value-scoring model.
  • It is feasible to mine a lot of training data to train the value-scoring model. The value-scoring model is obtained by training according to the mined training data, which includes high-quality articles and low-quality articles.
  • The mining unit 401 may mine the training data according to manually-annotated information, the user's feedback, preset mining rules and so on.
  • For example, the mining unit 401 may regard articles corresponding to manually-annotated high-quality content sources as high-quality articles, and add them into the training data.
  • Specifically, it is feasible to screen content sources, such as authors' websites, according to the quantity of issued articles and the activeness of the content sources to obtain a batch of candidate content sources, then manually score the candidate content sources according to the comprehensive quality of the articles they issue, determine content sources whose scores exceed a predetermined threshold as high-quality content sources, and add articles corresponding to the high-quality content sources into the training data as high-quality articles.
  • The mining unit 401 may further add high-quality articles and low-quality articles determined according to the user's feedback behaviors, into the training data.
  • In practical application, after viewing an article, the user may perform a series of feedback behaviors such as keeping the article as a favorite, commenting on it and sharing it, so the training data may be mined according to the user's feedback behaviors.
  • For example, if a certain article is commented on by many users as having very low quality, it may be believed that this article is a low-quality article, and it may be added to the training data.
  • As another example, if a certain article is kept by many users as a favorite and read by users for long durations, it may be believed that this article is a high-quality article, and it may be added into the training data.
  • The mining unit 401 may further regard articles having pre-set low-quality article features as low-quality articles, and add them into the training data.
  • The low-quality article features may be preset. As such, after a certain article is analyzed, if it is found as having low-quality article features, the article may be regarded as a low-quality article, and added into the training data.
  • After a sufficient amount of training data is obtained, the training unit 402 trains according to the training data to obtain the value-scoring model.
  • The value-scoring model may be a deep learning model such as a Recurrent Neural Network (RNN).
  • After the above processing, upon performing value evaluation for the to-be-evaluated article, the evaluating unit 403 may first perform feature extraction for the to-be-evaluated article, and then determine the score of the to-be-evaluated article according to the extracted features and the value-scoring model.
  • The high-quality article usually has the following features: good typesetting, sufficient arguments, clear logic, definite opinions, professional terms and the like.
  • Based on the above features, a plurality of features to be extracted may be manually preset, and the evaluating unit 403 extracts these features with respect to the to-be-evaluated article.
  • Specifically, the evaluating unit 403 may extract one or any combination of the following features respectively with respect to each paragraph in the to-be-evaluated article:
  • relevance between the paragraph and a title of the to-be-evaluated article;
  • relevance between the paragraph and a preceding neighboring paragraph of the paragraph;
  • the number of newly-added words in the paragraph;
  • the total number of words in the paragraph;
  • whether the paragraph begins with a subtitle;
  • the number of pictures in the paragraph;
  • the number of sentences in the paragraph;
  • an average length of the sentences in the paragraph;
  • the number of pronouns in the paragraph.
  • The above nine features may be extracted from each paragraph in the to-be-evaluated article.
  • Roles played by the features upon measuring the value of the article are shown in Table 1.
  • The evaluating unit 403 may input the extracted features into the value-scoring model to obtain a score of the to-be-evaluated article output by the value-scoring model. The higher the score is, the larger the value of the article is.
  • Then, the evaluating unit 403 may further compare the score with a preset threshold, and determine whether the to-be-evaluated article is a high-quality article or a low-quality article.
  • For example, if the score is larger than the threshold, it may be determined that the to-be-evaluated article is a high-quality article, otherwise it is a low-quality article.
  • In addition, the evaluating unit 403 may further obtain M preset low-quality article features, M being a positive integer. If the to-be-evaluated article has any of the low-quality article features, the to-be-evaluated article is determined as the low-quality article.
  • Which specific features are regarded as low-quality article features may depend on actual situations, for example, the low-quality article features may include one or any combination of the following:
  • Feature A: repetitious content in the article exceeds a predetermined threshold;
  • Feature B: the number of characters in the article is less than a predetermined threshold, and the article does not contain a picture;
  • Feature C: the article includes a paragraph in which the number of characters exceeds a predetermined threshold;
  • Feature D: the article includes a case that the expression is incomplete;
  • Feature E: a typographic error exists in the article
  • Regarding Feature A, if a lot of repetitious content exists in the article, for example, the content of the title is repeatedly mentioned in many paragraphs, when the number of repetition reaches a certain degree, the article may be regarded as the low-quality article.
  • Regarding feature B, if the number of characters in the article is too small and there is no picture, the article may be regarded as the low-quality article.
  • Regarding Feature C, if the article includes a case that a paragraph contains too many characters, the article may be regarded as the low-quality article.
  • Regarding Feature D, if the title or text of the article includes a case in which the expression is incomplete, for example, “** star will show up at . . . today” (the original shows the corresponding Chinese expression as an inline image), the article may be regarded as the low-quality article.
  • Regarding Feature E, if a typographic error appears in the title or text of the article, the article may be regarded as the low-quality article.
  • Regarding the to-be-evaluated article, if it has any one of Features A-E, it may be regarded as the low-quality article.
  • To facilitate expression, the above manner of determining whether the to-be-evaluated article is a high-quality article or low-quality article according to the score is called the first evaluation manner, and the above manner of determining whether the to-be-evaluated article is a high-quality article or low-quality article according to low-quality article features is called the second evaluation manner.
  • In practical application, it is feasible to use the first evaluation manner and the second evaluation manner in combination, namely, to evaluate the value of the article based on two dimensions: expression of the content of the article and content depth. Specific combination manners are not limited. For example, regarding the to-be-evaluated article, if it is determined as the low-quality article in both the first evaluation manner and the second evaluation manner, it is believed that the to-be-evaluated article is the low-quality article. Alternatively, after the to-be-evaluated article is determined as the high-quality article in the first evaluation manner, the second evaluation manner is further employed to determine whether the to-be-evaluated article is the high-quality article or the low-quality article; if the article is determined as the low-quality article, it is believed that the to-be-evaluated article is the low-quality article; otherwise, it is believed to be the high-quality article.
  • Reference may be made to the corresponding depictions in the aforesaid method embodiment for the specific workflow of the apparatus embodiment shown in FIG. 4. The workflow is not detailed here any further.
  • As compared with the prior art, the above embodiment provides an effective evaluation manner of the value of the article; furthermore, the extracted features can accurately and visually reflect the level of quality of the article, thereby improving the accuracy of the evaluation result, and a better training effect may be obtained by using less training data. In addition, the two evaluation manners may be combined flexibly to facilitate flexible adjustment according to actual needs.
  • It is feasible to, upon information distribution, minimize the number of low-quality articles and increase the number of high-quality articles, to enable the user to obtain more high-quality resources, encourage the creation of high-quality articles while enhancing the user's experience, and thereby create a healthy ecology of internet content.
  • FIG. 5 illustrates a block diagram of an example computer system/server 12 adapted to implement an implementation mode of the present disclosure. The computer system/server 12 shown in FIG. 5 is only an example and should not bring about any limitation to the function and scope of use of the embodiments of the present disclosure.
  • As shown in FIG. 5, the computer system/server 12 is shown in the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors (processing units) 16, a memory 28, and a bus 18 that couples various system components including system memory 28 and the processor 16.
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
  • Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and non-removable media.
  • Memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown in FIG. 5 and typically called a “hard drive”). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each drive can be connected to bus 18 by one or more data media interfaces. The memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the present disclosure.
  • Program/utility 40, having a set (at least one) of program modules 42, may be stored in the system memory 28 by way of example, and not limitation, as well as an operating system, one or more disclosure programs, other program modules, and program data. Each of these examples or a certain combination thereof might include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments of the present disclosure.
  • Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; with one or more devices that enable a user to interact with computer system/server 12; and/or with any devices (e.g., network card, modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 20. As depicted in FIG. 5, network adapter 20 communicates with the other communication modules of computer system/server 12 via bus 18. It should be understood that although not shown, other hardware and/or software modules could be used in conjunction with computer system/server 12. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
  • The processor 16 executes various function applications and data processing by running programs stored in the memory 28, for example, implement the method in the embodiments shown in FIG. 1, namely, mine high-quality articles and low-quality articles as training data, obtain a value-scoring model by training according to the training data, perform feature extraction for a to-be-evaluated article, and determine a score of the to-be-evaluated article based on the extracted features and the value-scoring model.
  • Reference may be made to related depictions in the above embodiments for specific implementations, which will not be detailed any more.
  • The present disclosure meanwhile provides a computer-readable storage medium on which a computer program is stored, the program, when executed by the processor, implementing the method stated in the embodiment shown in FIG. 1.
  • The computer-readable medium of the present embodiment may employ any combination of one or more computer-readable media. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the text herein, the computer readable storage medium can be any tangible medium that includes or stores programs for use by an instruction execution system, apparatus or device, or a combination thereof.
  • The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier, which carries computer-readable program code therein. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal or any suitable combination thereof. The computer-readable signal medium may further be any computer-readable medium other than the computer-readable storage medium, and the computer-readable medium may send, propagate or transmit a program for use by an instruction execution system, apparatus or device, or a combination thereof.
  • The program codes included by the computer-readable medium may be transmitted with any suitable medium, including, but not limited to radio, electric wire, optical cable, RF or the like, or any suitable combination thereof.
  • Computer program code for carrying out operations disclosed herein may be written in one or more programming languages or any combination thereof. These programming languages include an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • In the embodiments provided by the present disclosure, it should be understood that the revealed apparatus and method can be implemented in other ways. For example, the above-described embodiments for the apparatus are only exemplary, e.g., the division of the units is merely logical one, and, in reality, they can be divided in other ways upon implementation.
  • The units described as separate parts may be or may not be physically separated, the parts shown as units may be or may not be physical units, i.e., they can be located in one place, or distributed in a plurality of network units. One can select some or all the units to achieve the purpose of the embodiment according to the actual needs.
  • Further, in the embodiments of the present disclosure, functional units can be integrated in one processing unit, or they can be separate physical presences; or two or more units can be integrated in one unit. The integrated unit described above can be implemented in the form of hardware, or they can be implemented with hardware plus software functional units.
  • The aforementioned integrated unit in the form of software function units may be stored in a computer readable storage medium. The aforementioned software function units are stored in a storage medium, including several instructions to instruct a computer device (a personal computer, server, or network equipment, etc.) or processor to perform some steps of the method described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media that may store program codes, such as U disk, removable hard disk, Read-Only Memory (ROM), a Random Access Memory (RAM), magnetic disk, or an optical disk.
  • What are stated above are only preferred embodiments of the present disclosure and not intended to limit the present disclosure. Any modifications, equivalent substitutions and improvements made within the spirit and principle of the present disclosure all should be included in the extent of protection of the present disclosure.

Claims (21)

What is claimed is:
1. A method for evaluating article value based on artificial intelligence, comprising:
mining high-quality articles and low-quality articles as training data, and training according to the training data to obtain a value-scoring model;
performing feature extraction for a to-be-evaluated article;
determining a score of the to-be-evaluated article based on extracted features and the value-scoring model.
2. The method according to claim 1, wherein
mining the training data comprises:
mining the training data according to manually-annotated information, user's feedback behaviors, and preset mining rules.
3. The method according to claim 2, wherein
the mining the training data according to manually-annotated information, user's feedback behaviors, and preset mining rules comprises:
regarding articles corresponding to manually-annotated high-quality content sources as high-quality articles, and adding the articles into the training data;
adding high-quality articles and low-quality articles determined according to user's feedback behaviors, into the training data;
regarding articles having preset low-quality article features as low-quality articles, and adding the articles into the training data.
4. The method according to claim 1, wherein
the performing feature extraction for a to-be-evaluated article comprises:
extracting one or any combination of the following features respectively with respect to each paragraph in the to-be-evaluated article:
relevance between the paragraph and a title of the to-be-evaluated article;
relevance between the paragraph and a preceding neighboring paragraph of the paragraph;
the number of newly-added words in the paragraph;
the total number of words in the paragraph;
whether the paragraph begins with a subtitle;
the number of pictures in the paragraph;
the number of sentences in the paragraph;
an average length of sentences in the paragraph;
the number of pronouns in the paragraph.
5. The method according to claim 1, wherein
the method further comprises:
comparing the score with a preset threshold, and determining whether the to-be-evaluated article is a high-quality article or a low-quality article.
6. The method according to claim 5, wherein
the method further comprises:
obtaining M preset low-quality article features, M being a positive integer;
if the to-be-evaluated article has any low-quality article feature, determining the to-be-evaluated article as a low-quality article.
7. The method according to claim 6, wherein
the low-quality article features include one or any combination of the following:
the amount of repetitious content in the article exceeds a predetermined threshold;
the number of characters in the article is less than a predetermined threshold, and the article does not contain a picture;
the article includes a paragraph in which the number of characters exceeds a predetermined threshold;
the article includes a case that the expression is incomplete;
a typographic error exists in the article.
8. A computer device, comprising a memory, a processor and a computer program which is stored on the memory and runs on the processor, wherein the processor, upon executing the program, implements the following operation:
mining high-quality articles and low-quality articles as training data, and training according to the training data to obtain a value-scoring model;
performing feature extraction for a to-be-evaluated article;
determining a score of the to-be-evaluated article based on the extracted features and the value-scoring model.
9. The computer device according to claim 8, wherein
mining the training data comprises:
mining the training data according to manually-annotated information, user's feedback behaviors, and preset mining rules.
10. The computer device according to claim 9, wherein
the mining of the training data according to manually-annotated information, user's feedback behaviors, and preset mining rules comprises:
regarding articles corresponding to manually-annotated high-quality content sources as high-quality articles, and adding the articles into the training data;
adding high-quality articles and low-quality articles determined according to user's feedback behaviors into the training data;
regarding articles having preset low-quality article features as low-quality articles, and adding the articles into the training data.
11. The computer device according to claim 8, wherein
the performing of feature extraction for the to-be-evaluated article comprises:
extracting, with respect to each paragraph in the to-be-evaluated article, one or any combination of the following features:
relevance between the paragraph and a title of the to-be-evaluated article;
relevance between the paragraph and a preceding neighboring paragraph of the paragraph;
the number of newly-added words in the paragraph;
the total number of words in the paragraph;
whether the paragraph begins with a subtitle;
the number of pictures in the paragraph;
the number of sentences in the paragraph;
an average length of sentences in the paragraph;
the number of pronouns in the paragraph.
12. The computer device according to claim 8, wherein
the operations further comprise:
comparing the score with a preset threshold, and determining whether the to-be-evaluated article is a high-quality article or a low-quality article.
13. The computer device according to claim 12, wherein
the operations further comprise:
obtaining M preset low-quality article features, M being a positive integer;
if the to-be-evaluated article has any low-quality article feature, determining the to-be-evaluated article as a low-quality article.
14. The computer device according to claim 13, wherein
the low-quality article features include one or any combination of the following:
the amount of repetitious content in the article exceeds a predetermined threshold;
the number of characters in the article is less than a predetermined threshold, and the article does not contain a picture;
the article includes a paragraph in which the number of characters exceeds a predetermined threshold;
the article includes a case in which an expression is incomplete;
a typographic error exists in the article.
15. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the following operations:
mining high-quality articles and low-quality articles as training data, and training according to the training data to obtain a value-scoring model;
performing feature extraction for a to-be-evaluated article;
determining a score of the to-be-evaluated article based on the extracted features and the value-scoring model.
16. The computer-readable storage medium according to claim 15, wherein
mining the training data comprises:
mining the training data according to manually-annotated information, user's feedback behaviors, and preset mining rules.
17. The computer-readable storage medium according to claim 16, wherein
the mining of the training data according to manually-annotated information, user's feedback behaviors, and preset mining rules comprises:
regarding articles corresponding to manually-annotated high-quality content sources as high-quality articles, and adding the articles into the training data;
adding high-quality articles and low-quality articles determined according to user's feedback behaviors into the training data;
regarding articles having preset low-quality article features as low-quality articles, and adding the articles into the training data.
18. The computer-readable storage medium according to claim 15, wherein
the performing of feature extraction for the to-be-evaluated article comprises:
extracting, with respect to each paragraph in the to-be-evaluated article, one or any combination of the following features:
relevance between the paragraph and a title of the to-be-evaluated article;
relevance between the paragraph and a preceding neighboring paragraph of the paragraph;
the number of newly-added words in the paragraph;
the total number of words in the paragraph;
whether the paragraph begins with a subtitle;
the number of pictures in the paragraph;
the number of sentences in the paragraph;
an average length of sentences in the paragraph;
the number of pronouns in the paragraph.
19. The computer-readable storage medium according to claim 15, wherein
the operations further comprise:
comparing the score with a preset threshold, and determining whether the to-be-evaluated article is a high-quality article or a low-quality article.
20. The computer-readable storage medium according to claim 19, wherein
the operations further comprise:
obtaining M preset low-quality article features, M being a positive integer;
if the to-be-evaluated article has any low-quality article feature, determining the to-be-evaluated article as a low-quality article.
21. The computer-readable storage medium according to claim 20, wherein
the low-quality article features include one or any combination of the following:
the amount of repetitious content in the article exceeds a predetermined threshold;
the number of characters in the article is less than a predetermined threshold, and the article does not contain a picture;
the article includes a paragraph in which the number of characters exceeds a predetermined threshold;
the article includes a case in which an expression is incomplete;
a typographic error exists in the article.
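
By way of illustration, the method of claim 1 could be realized along the lines of the following Python sketch. It assumes a gradient-boosted regressor from scikit-learn as one possible value-scoring model and assumes hypothetical helpers mine_training_articles() and extract_features(); none of these choices is mandated by the claims.

from sklearn.ensemble import GradientBoostingRegressor

def train_value_scoring_model(mine_training_articles, extract_features):
    # Mine high-quality (label 1.0) and low-quality (label 0.0) articles as training data.
    articles, labels = mine_training_articles()
    # extract_features() is assumed to return a fixed-length numeric feature vector.
    feature_vectors = [extract_features(article) for article in articles]
    # Train the value-scoring model on the mined training data.
    model = GradientBoostingRegressor()
    model.fit(feature_vectors, labels)
    return model

def score_article(model, article, extract_features):
    # Perform feature extraction for the to-be-evaluated article and
    # determine its score with the trained value-scoring model.
    return float(model.predict([extract_features(article)])[0])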
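
The training-data mining of claim 3 might combine the three sources named there as sketched below; the field names (source, read_completion_rate, report_count) and the numeric thresholds are illustrative assumptions rather than values taken from the disclosure.

def mine_training_data(articles, annotated_quality_sources, has_low_quality_feature):
    # Assemble (article, label) pairs: label 1.0 = high quality, 0.0 = low quality.
    training_data = []
    for article in articles:
        # Source 1: articles from manually-annotated high-quality content sources.
        if article.get("source") in annotated_quality_sources:
            training_data.append((article, 1.0))
        # Source 2: articles judged high or low quality from user feedback behaviors
        # (read_completion_rate and report_count are illustrative signals only).
        elif article.get("read_completion_rate", 0.0) > 0.8:
            training_data.append((article, 1.0))
        elif article.get("report_count", 0) > 10:
            training_data.append((article, 0.0))
        # Source 3: articles matching any preset low-quality article feature.
        elif has_low_quality_feature(article):
            training_data.append((article, 0.0))
    return training_data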
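
The paragraph-level features of claim 4 could be computed, for example, as follows. The bag-of-words cosine similarity used for the two relevance features, the reading of "newly-added words" as words not seen in earlier paragraphs, and the paragraph fields text, pictures, and subtitle are all assumptions made purely for illustration.

import math
import re
from collections import Counter

PRONOUNS = {"he", "she", "it", "they", "this", "that", "these", "those", "we", "you", "i"}

def cosine_similarity(text_a, text_b):
    # Bag-of-words cosine similarity, an assumed stand-in for the relevance measure.
    a = Counter(re.findall(r"\w+", text_a.lower()))
    b = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def paragraph_features(paragraph, previous_paragraph_text, title, seen_words):
    words = re.findall(r"\w+", paragraph["text"].lower())
    sentences = [s for s in re.split(r"[.!?]+", paragraph["text"]) if s.strip()]
    return {
        "title_relevance": cosine_similarity(paragraph["text"], title),
        "previous_paragraph_relevance": cosine_similarity(paragraph["text"], previous_paragraph_text),
        "newly_added_words": len(set(words) - seen_words),  # words not seen in earlier paragraphs
        "total_words": len(words),
        "begins_with_subtitle": int(bool(paragraph.get("subtitle"))),
        "picture_count": len(paragraph.get("pictures", [])),
        "sentence_count": len(sentences),
        "average_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "pronoun_count": sum(1 for w in words if w in PRONOUNS),
    }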
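
Claims 5 to 7 describe comparing the score with a preset threshold and treating the article as low quality whenever any of the M preset low-quality article features is present; one possible decision routine, with illustrative rule thresholds that are not taken from the disclosure, is sketched below.

def classify_article(article, score, score_threshold=0.5):
    # Preset low-quality article features (M = 5 here; thresholds are illustrative).
    low_quality_rules = [
        lambda a: a.get("repetitious_content_count", 0) > 3,
        lambda a: a.get("character_count", 0) < 200 and a.get("picture_count", 0) == 0,
        lambda a: a.get("max_paragraph_characters", 0) > 2000,
        lambda a: a.get("has_incomplete_expression", False),
        lambda a: a.get("has_typographic_error", False),
    ]
    # If the article has any preset low-quality article feature, it is a low-quality article.
    if any(rule(article) for rule in low_quality_rules):
        return "low-quality"
    # Otherwise compare the model score with the preset threshold.
    return "high-quality" if score >= score_threshold else "low-quality"
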
US16/001,111 2017-06-06 2018-06-06 Method and apparatus for evaluating article value based on artificial intelligence, and storage medium Active 2041-08-26 US11481572B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201710417749.XA CN107193805B (en) 2017-06-06 2017-06-06 Article value evaluation method and device based on artificial intelligence and storage medium
CN201710417749.X 2017-06-06
CN201710417749X 2017-06-06

Publications (2)

Publication Number Publication Date
US20180349734A1 true US20180349734A1 (en) 2018-12-06
US11481572B2 US11481572B2 (en) 2022-10-25

Family

ID=59877005

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/001,111 Active 2041-08-26 US11481572B2 (en) 2017-06-06 2018-06-06 Method and apparatus for evaluating article value based on artificial intelligence, and storage medium

Country Status (2)

Country Link
US (1) US11481572B2 (en)
CN (1) CN107193805B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107910066A (en) * 2017-11-13 2018-04-13 医渡云(北京)技术有限公司 Case history appraisal procedure, device, electronic equipment and storage medium
CN108090127B (en) * 2017-11-15 2021-02-12 北京百度网讯科技有限公司 Method and device for establishing question and answer text evaluation model and evaluating question and answer text
CN108805332B (en) * 2018-05-07 2022-12-02 北京奇艺世纪科技有限公司 Feature evaluation method and device
CN110555198B (en) * 2018-05-31 2023-05-23 北京百度网讯科技有限公司 Method, apparatus, device and computer readable storage medium for generating articles
CN110555579B (en) * 2018-06-01 2022-09-23 佛山市顺德区美的电热电器制造有限公司 Cooking grading method, intelligent cooking equipment, server and storage medium
CN109543090A (en) * 2018-08-07 2019-03-29 宜人恒业科技发展(北京)有限公司 A kind of method and apparatus for evaluating web documents
CN110889274B (en) * 2018-08-17 2022-02-08 北大方正集团有限公司 Information quality evaluation method, device, equipment and computer readable storage medium
CN109582953B (en) * 2018-11-02 2023-04-07 中国科学院自动化研究所 Data support scoring method and equipment for information and storage medium
CN109614537A (en) * 2018-12-06 2019-04-12 北京百度网讯科技有限公司 For generating the method, apparatus, equipment and storage medium of video
CN109635087A (en) * 2018-12-12 2019-04-16 广东小天才科技有限公司 A kind of composition methods of marking and private tutor's equipment
CN109829165A (en) * 2019-02-11 2019-05-31 杭州乾博科技有限公司 One kind is from media article Valuation Method and system
CN110175774A (en) * 2019-05-24 2019-08-27 中译语通科技股份有限公司 Document value appraisal procedure and device
CN110162797B (en) * 2019-06-21 2023-04-07 北京百度网讯科技有限公司 Article quality detection method and device
CN110378396A (en) * 2019-06-26 2019-10-25 北京百度网讯科技有限公司 Sample data mask method, device, computer equipment and storage medium
CN110334356B (en) * 2019-07-15 2023-08-04 腾讯科技(深圳)有限公司 Article quality determining method, article screening method and corresponding device
CN111192602A (en) * 2019-12-03 2020-05-22 广州荔支网络技术有限公司 White noise audio content value evaluation method based on audio content portrait system
CN111193795B (en) * 2019-12-30 2021-07-02 腾讯科技(深圳)有限公司 Information pushing method and device, electronic equipment and computer readable storage medium
CN111368081A (en) * 2020-03-03 2020-07-03 支付宝(杭州)信息技术有限公司 Method and system for determining selected text content
CN111461785A (en) * 2020-04-01 2020-07-28 支付宝(杭州)信息技术有限公司 Content value attribute evaluation method and device and copyright trading platform
CN111488931B (en) * 2020-04-10 2023-04-07 腾讯科技(深圳)有限公司 Article quality evaluation method, article recommendation method and corresponding devices
CN111858905B (en) * 2020-07-20 2024-05-07 北京百度网讯科技有限公司 Model training method, information identification device, electronic equipment and storage medium
CN112115703B (en) * 2020-09-03 2023-10-17 腾讯科技(深圳)有限公司 Article evaluation method and device
CN113536769A (en) * 2021-07-21 2021-10-22 深圳证券信息有限公司 Text conciseness and clarity evaluation method and related equipment
US11853688B2 (en) * 2022-03-04 2023-12-26 Adobe Inc. Automatic detection and removal of typesetting errors in electronic documents

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6181909B1 (en) * 1997-07-22 2001-01-30 Educational Testing Service System and method for computer-based automatic essay scoring
TW579470B (en) * 2002-07-26 2004-03-11 Inst Information Industry Chinese article evaluation method and system and computer reading medium
US9836455B2 (en) * 2011-02-23 2017-12-05 New York University Apparatus, method and computer-accessible medium for explaining classifications of documents
CN102779220A (en) * 2011-05-10 2012-11-14 李德霞 English test paper scoring system
CN102279844A (en) * 2011-08-31 2011-12-14 中国科学院自动化研究所 Method and system for automatically testing Chinese composition
CN103634473B (en) * 2013-12-05 2016-03-23 南京理工大学连云港研究院 Based on mobile phone method for filtering spam short messages and the system of Naive Bayes Classification
CN104021075A (en) * 2014-05-22 2014-09-03 小米科技有限责任公司 Method and device for evaluating program codes
AU2016102425A4 (en) * 2015-04-28 2019-10-24 Red Marker Pty Ltd Device, process and system for risk mitigation
CN108280065B (en) * 2017-01-05 2021-12-14 广州讯飞易听说网络科技有限公司 Foreign text evaluation method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024036939A1 (en) * 2022-08-17 2024-02-22 东南大学 Method, system and apparatus for evaluating adaptability reuse of existing residential building

Also Published As

Publication number Publication date
CN107193805B (en) 2021-05-14
US11481572B2 (en) 2022-10-25
CN107193805A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
US11481572B2 (en) Method and apparatus for evaluating article value based on artificial intelligence, and storage medium
US11645554B2 (en) Method and apparatus for recognizing a low-quality article based on artificial intelligence, device and medium
US11550998B2 (en) Method and apparatus for generating a competition commentary based on artificial intelligence, and storage medium
CN107076567B (en) Method and device for image question answering
US10891427B2 (en) Machine learning techniques for generating document summaries targeted to affective tone
US20190095758A1 (en) Method and system for obtaining picture annotation data
AU2017408800B2 (en) Method and system of mining information, electronic device and readable storable medium
CN108090127B (en) Method and device for establishing question and answer text evaluation model and evaluating question and answer text
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
CN108090043B (en) Error correction report processing method and device based on artificial intelligence and readable medium
US20190114937A1 (en) Grouping users by problematic objectives
CN109618236B (en) Video comment processing method and device
CN111325020A (en) Event argument extraction method and device and electronic equipment
JP7223056B2 (en) Image screening method, device, electronic device and storage medium
US11397852B2 (en) News interaction method, apparatus, device and computer storage medium
CN113051356B (en) Open relation extraction method and device, electronic equipment and storage medium
US10541884B2 (en) Simulating a user score from input objectives
US9830533B2 (en) Analyzing and exploring images posted on social media
US20190114346A1 (en) Optimizing user time and resources
CN111614986A (en) Bullet screen generation method, system, equipment and storage medium based on online education
US20200038748A1 (en) Providing content
Rony et al. Climate Bot: A Machine Reading Comprehension System for Climate Change Question Answering.
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
CN111079489A (en) Content identification method and electronic equipment
CN110866393B (en) Resume information extraction method and system based on domain knowledge base

Legal Events

Date Code Title Description
AS Assignment

Owner name: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HUANG, BO;LI, DAREN;SHE, QIAOQIAO;REEL/FRAME:045999/0075

Effective date: 20180605

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE