CN112836487A - Automatic comment method and device, computer equipment and storage medium - Google Patents

Publication number
CN112836487A
Authority
CN
China
Prior art keywords
news
article
comment
topic
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110169250.8A
Other languages
Chinese (zh)
Other versions
CN112836487B (en)
Inventor
陈涵宇
高登科
李少博
余伟
徐桢虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Cover Media Co ltd
Original Assignee
Sichuan Cover Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Cover Media Co ltd
Priority to CN202110169250.8A
Publication of CN112836487A
Application granted
Publication of CN112836487B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of information interaction and discloses an automatic comment method, an automatic comment device, computer equipment and a storage medium. In addition, when no matching news exists or the recommendation index value does not meet the requirement, a deep matching search can be performed in the comment template library along the topic and event-triple dimensions. This greatly expands the reusability of comments, so that more news without public comment data can be filled with comments, and a multi-dimensional, high-accuracy comment template library can be built from high-quality comments.

Description

Automatic comment method and device, computer equipment and storage medium
Technical Field
The invention belongs to the technical field of information interaction, and particularly relates to an automatic commenting method and device, computer equipment and a storage medium.
Background
Commenting is currently the most common mode of information interaction on the major Internet platforms. The quality and quantity of comments often determine the overall activity of a product. In particular, high-quality comments improve user stickiness, increase user interaction and create a good atmosphere, which strengthens user retention and builds social relationships. Automatic commenting has therefore attracted attention as an important way to increase the number of comments in the early stage of a product.
Existing automatic comment methods mainly rely on template generation. Although a large number of comments can be generated easily, this approach lacks emotional expression and logical reasoning, so the generated responses are monotonous and rigid, and a large amount of manual work is required. Automatic commenting based on generative algorithms suffers from uncontrollable generated content and low accuracy, so its usability is low.
Disclosure of Invention
The invention aims to provide an automatic comment method, an automatic comment device, computer equipment and a storage medium, in order to solve the problems of existing automatic comment methods: template generation depends too heavily on manual work, emotional and logical expression is insufficient, and generated content is uncontrollable and of low accuracy.
In a first aspect, the present invention provides an automatic comment method, including:
obtaining, through web-wide crawling according to the title of an article to be commented, at least one piece of news and comment information, wherein each piece of news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and the comment like count and comment reply count corresponding to the comment content;
performing semantic coding on the article to be commented in a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
for the at least one piece of news and comment information, performing semantic coding on the news in each piece of news and comment information in the text dimension, the picture dimension and the video dimension to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
for the at least one piece of news and comment information, importing the news text semantic vectors, news picture semantic vectors and news video semantic vectors of all the news, together with the article text semantic vector, article picture semantic vector and article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a first matching detection result between each piece of news and the article to be commented;
if the first matching detection result contains at least one piece of matching news that matches the article to be commented, performing, for each piece of comment content of the at least one piece of matching news, a weighted calculation on the corresponding comment reply count, comment like count and news source weight coefficient to obtain a corresponding recommendation index value;
and determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum of the at least one recommendation index value.
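As an illustrative, non-limiting sketch of the matching step, the following Python fragment computes one similarity per modality and passes the three similarities through a single logistic layer. The weights, bias and threshold are hypothetical hand-set values standing in for the trained first deep learning matching detection model, whose architecture and parameters the source does not give; the names `match_probability` and `matching_news` are likewise assumptions.

```python
import math

MODALITIES = ("text", "picture", "video")

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_probability(article, news, weights=(2.0, 1.0, 1.0), bias=-2.0):
    """Combine the per-modality similarities with one logistic layer.

    `weights` and `bias` are hypothetical; a real system would learn them.
    """
    sims = [cosine(article[m], news[m]) for m in MODALITIES]
    z = bias + sum(w * s for w, s in zip(weights, sims))
    return 1.0 / (1.0 + math.exp(-z))

def matching_news(article, news_items, threshold=0.5):
    """Return the news items whose match probability reaches the threshold."""
    return [n for n in news_items if match_probability(article, n) >= threshold]
```

Here `article` and each news item are dictionaries mapping "text", "picture" and "video" to the corresponding semantic vectors.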
Based on the above, an automatic comment scheme based on web-wide comment data is provided: publicly available comment data is obtained from public web-wide data by real-time crawling, and the best comment content is then selected directly from that data for automatic commenting by means of deep semantic matching. The quality of the output comments can thus be controlled, manual participation is greatly reduced, reliability is high, and the practical requirement that added news comments be topical can be well met.
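The weighted calculation of the recommendation index value and the selection of the maximum can be sketched as follows. This is a minimal example under stated assumptions: the source does not disclose the weighting, so the three weights below are hypothetical, as are the `Comment` type and the function names. The optional threshold mirrors the preset index threshold of the fallback design described below.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    text: str
    replies: int          # comment reply count
    likes: int            # comment like count
    source_weight: float  # weight coefficient of the news source

# Hypothetical weights; the source does not specify them.
W_REPLIES, W_LIKES, W_SOURCE = 0.4, 0.4, 0.2

def recommendation_index(c: Comment) -> float:
    """Weighted sum of reply count, like count and source weight coefficient."""
    return W_REPLIES * c.replies + W_LIKES * c.likes + W_SOURCE * c.source_weight

def pick_auto_comment(comments, threshold=None):
    """Return the comment with the maximum index, or None when there is no
    comment or the best index is below the (optional) threshold."""
    if not comments:
        return None
    best = max(comments, key=recommendation_index)
    if threshold is not None and recommendation_index(best) < threshold:
        return None
    return best
```

A `None` result signals that the template-library fallback should be used instead.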
In one possible design, after obtaining the first matching detection result between each piece of news and the article to be commented, if all the obtained recommendation index values are smaller than a preset index threshold, or the first matching detection result does not contain at least one piece of matching news that matches the article to be commented, the method further includes:
obtaining a topic and/or a news event triple of the article to be commented;
according to the topic and/or news event triples of the article to be commented, at least one existing comment template similar to the article to be commented on in topic similar dimension and/or news event triples similar dimension is searched from a comment template library, wherein the comment template library stores a plurality of existing comment templates, each existing comment template comprises template comment content and the topic and/or news event triples bound with the template comment content, and the template comment content comprises at least one slot to be filled, which corresponds to at least one comment entity candidate word one by one;
performing entity extraction on the article to be commented to obtain at least one article entity candidate word;
for the at least one existing comment template, matching the at least one article entity candidate word pairwise against the at least one comment entity candidate word of each existing comment template and filling the matched slots, to obtain corresponding new comment content;
for at least one new comment content, applying a DNN language model to score each new comment content to obtain a score value of each new comment content;
and determining the new comment content corresponding to the highest score value as the automatic comment content of the article to be commented, wherein the highest score value is the highest value in the score values of all the new comment contents.
Based on this possible design, when no matching news exists or the recommendation index value does not meet the requirement, a deep matching search can be performed in the comment template library along the topic and event-triple dimensions. This greatly expands the reusability of comments, so that more news without public comment data can be filled with comments, achieving automatic news comments that are highly available, highly reusable and highly topical.
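The slot-filling and scoring steps above can be illustrated with the short sketch below. It assumes, purely for illustration, that slots and entity candidate words are paired by entity type (the source says only that they are matched pairwise), and it takes the scoring function as a parameter: in the described method that role is played by a DNN language model, for which any callable can stand in here. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Entity = Tuple[str, str]  # (candidate word, entity type)

def fill_template(template: str, slot_types: Dict[str, str],
                  article_entities: List[Entity]) -> Optional[str]:
    """Fill every slot with an unused article entity of the same type.

    Returns None when some slot cannot be filled.
    """
    used = set()
    values = {}
    for slot, wanted_type in slot_types.items():
        for i, (word, entity_type) in enumerate(article_entities):
            if entity_type == wanted_type and i not in used:
                values[slot] = word
                used.add(i)
                break
        else:
            return None  # no entity of the required type is left
    return template.format(**values)

def best_new_comment(templates, article_entities: List[Entity],
                     score: Callable[[str], float]) -> Optional[str]:
    """Fill all templates and return the highest-scoring new comment."""
    candidates = []
    for template, slot_types in templates:
        filled = fill_template(template, slot_types, article_entities)
        if filled is not None:
            candidates.append(filled)
    return max(candidates, key=score, default=None)
```

For example, with the entities ("Chengdu", LOC) and ("Rongcheng FC", ORG), the template "{team} played really well in {city}!" is filled, while a template that needs a person entity is skipped.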
In one possible design, obtaining topics of the article to be commented includes:
crawling at least one topic and news information of a whole network news source in real time based on a web crawler technology, wherein the topic and news information comprises one topic and at least one topic news belonging to the topic;
performing harmful-content review processing, including negative information filtering, sensitive information filtering and false information filtering, on all the crawled topic and news information to obtain at least one piece of compliant topic and news information;
performing semantic coding on each topic and news information in a text dimension, a picture dimension and a video dimension respectively aiming at the at least one piece of compliant topic and news information to obtain topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of each topic and news information, and then performing de-duplication processing on the at least one piece of compliant topic and news information in a text similar dimension, a picture similar dimension and a video similar dimension according to the topic text semantic vectors, the topic picture semantic vectors and the topic video semantic vectors of all the topic and news information to obtain at least one piece of non-repetitive topic and news information;
aiming at the at least one non-repeated topic and news information, respectively obtaining topic mapping vectors and topic distribution of each topic and news information according to at least one topic news belonging to the topic, then performing semantic clustering fusion on the at least one non-repeated topic and news information according to the topic mapping vectors, the topic distribution, the topic text semantic vector, the topic picture semantic vector and the topic video semantic vector of all the topics and news information to obtain at least one fused topic, at least one topic heat weight value and a one-to-one correspondence relationship between the at least one fused topic and the at least one topic heat weight value, and finally determining at least one hot topic from the at least one fused topic according to the magnitude degree of the at least one topic heat weight value;
for the at least one hot topic, importing the topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of all the hot topics, together with the article text semantic vector, article picture semantic vector and article video semantic vector of the article to be commented, into a second deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a second matching detection result between each hot topic and the article to be commented;
and taking the hot topic that, according to the second matching detection result, matches the article to be commented as the topic of the article to be commented.
Based on this possible design, topics can be deeply fused and clustered from a multi-modal perspective (text, pictures and video), based on the topics and the news belonging to them and by combining a topic model with semantic vectors; deep learning is then used to achieve multi-modal matching between news and topics and between the article to be commented and topics.
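Two of the steps above, de-duplication by semantic similarity and selection of hot topics by heat weight, can be sketched as follows. This is a simplified illustration: it uses one vector per topic instead of separate text, picture and video vectors, a greedy pass instead of the clustering fusion described, and a hypothetical similarity threshold of 0.95.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def deduplicate(items, threshold=0.95):
    """Greedy de-duplication: keep an item only if it is not too similar
    to any item already kept. `items` is a list of (topic, vector)."""
    kept = []
    for topic, vector in items:
        if all(cosine(vector, kept_vector) < threshold for _, kept_vector in kept):
            kept.append((topic, vector))
    return kept

def hot_topics(heat_weights, k):
    """Return the k topics with the largest heat weight values."""
    ranked = sorted(heat_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, _ in ranked[:k]]
```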
In one possible design, obtaining a news event triple of the article to be commented on includes:
performing sentence segmentation on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise an article title sentence and at least one article content sentence;
for the plurality of article sentences, applying a DMCNN event extraction algorithm to perform open-domain event extraction on each article sentence to obtain a sentence event triple of each article sentence;
for the plurality of article sentences, applying a second BERT pre-trained model to map each article sentence into a corresponding sentence semantic vector;
for the plurality of article sentences, applying a TextRank algorithm to model and score each article sentence based on the sentence semantic vectors to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm uses the sentence semantic vector of the article title sentence as the central vector;
and ranking the plurality of sentence event triples, weighted by the sentence importance index values of the corresponding article sentences, and taking the highest-ranked sentence event triple as the news event triple of the article to be commented.
Based on this possible design, the event extraction task can be trained jointly on a deep learning model, which reduces the error propagation caused by the multi-stage tasks of traditional methods, and adding the pre-trained model and an attention layer improves the modeling of long-distance dependencies.
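The sentence scoring and triple selection can be illustrated by the following sketch. The source does not say how the title vector acts as the central vector; this example assumes one plausible reading, in which the random-jump term of a TextRank-style iteration is biased toward sentences similar to the title. Sentence vectors and event triples are assumed to be already available (from the BERT and DMCNN steps respectively), and the damping factor and iteration count are conventional, hypothetical choices.

```python
import math

def similarity(u, v):
    """Cosine similarity clamped to be non-negative."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return max(dot / (norm_u * norm_v), 0.0)

def textrank_scores(vectors, title_index=0, damping=0.85, iterations=50):
    """Score sentences on a similarity graph, biased toward the title."""
    n = len(vectors)
    sim = [[similarity(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    # Assumed role of the central vector: bias the jump distribution
    # toward sentences that are similar to the title sentence.
    bias = [similarity(v, vectors[title_index]) for v in vectors]
    total = sum(bias) or 1.0
    bias = [b / total for b in bias]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) * bias[i] + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores

def pick_event_triple(triples, scores):
    """Return the triple of the highest-scoring sentence that has one."""
    ranked = [(s, t) for s, t in zip(scores, triples) if t is not None]
    return max(ranked, key=lambda pair: pair[0])[1] if ranked else None
```

Sentences unrelated to the title receive a low importance value, so their event triples are unlikely to be selected as the news event triple.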
In one possible design, performing entity extraction on the article to be commented to obtain at least one article entity candidate word, including:
performing sentence segmentation on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise an article title sentence and at least one article content sentence;
for the plurality of article sentences, applying a second BERT pre-trained model to map each article sentence into a corresponding sentence semantic vector;
for the plurality of article sentences, applying a TextRank algorithm to model and score each article sentence based on the sentence semantic vectors to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm uses the sentence semantic vector of the article title sentence as the central vector;
for the plurality of article sentences, performing entity extraction on each article sentence by combining a dictionary with a deep learning model, and scoring the confidence of the extraction results with a weighted scoring method to obtain a plurality of sentence entities and corresponding confidence scores;
and for the plurality of sentence entities, sorting in descending order of a weighted score computed from the corresponding confidence score and the sentence importance index value of the article sentence in which the entity occurs, and taking at least one top-ranked sentence entity as the at least one article entity candidate word.
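The final ranking step can be sketched as follows. The source does not give the weighting formula, so the linear combination and the mixing factor `alpha` are assumptions, as are the function name and the choice to keep each entity's best score across sentences.

```python
def rank_entities(sentence_entities, sentence_importance, top_k=3, alpha=0.5):
    """Rank extracted entities by a weighted score.

    sentence_entities: list of (entity, confidence, sentence_index).
    sentence_importance: importance index value per sentence.
    alpha: hypothetical mixing weight between confidence and importance.
    """
    best = {}
    for entity, confidence, index in sentence_entities:
        score = alpha * confidence + (1 - alpha) * sentence_importance[index]
        # An entity may occur in several sentences; keep its best score.
        best[entity] = max(best.get(entity, 0.0), score)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [entity for entity, _ in ranked[:top_k]]
```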
In one possible design, after determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, the method further includes:
determining the comment content corresponding to the maximum recommendation index value as target comment content, and determining the news corresponding to the maximum recommendation index value as target news;
performing entity extraction on the target comment content to obtain at least one new comment entity candidate word;
converting the target comment content into new template comment content in which the slot to be filled and the at least one new comment entity candidate word are in one-to-one correspondence;
obtaining a topic and/or news event triple of the target news;
binding and storing the new template comment content and the topic of the target news and/or the news event triple in a new comment template;
and adding the new comment template into the comment template library.
Based on the possible design, a comment template can be generated by entity extraction aiming at high-quality comments, and meanwhile, deeper dimensional information of news is mined by utilizing a topic fusion and matching technology and an event discovery technology, so that a multi-dimensional and high-accuracy comment template library is constructed.
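The conversion of a high-quality comment into a reusable template can be illustrated as follows. This is a minimal sketch in which the entities of the target comment are assumed to be already extracted as (word, type) pairs; the slot naming scheme and the dictionary layout of the template are hypothetical.

```python
def build_template(comment, entities, topic=None, event_triple=None):
    """Replace each entity mention in the comment with a slot to be filled
    and bind the result to the topic and/or event triple of the news."""
    content = comment
    slot_types = {}
    # Replace longer mentions first so that a short entity cannot break
    # up a longer one that contains it.
    ordered = sorted(entities, key=lambda e: len(e[0]), reverse=True)
    for i, (word, entity_type) in enumerate(ordered):
        if word in content:
            slot = f"slot{i}"
            content = content.replace(word, "{" + slot + "}")
            slot_types[slot] = entity_type
    return {"content": content, "slots": slot_types,
            "topic": topic, "event_triple": event_triple}

def add_to_library(library, template):
    """Append the new template to the comment template library (a list)."""
    library.append(template)
    return library
```

The resulting `content` string can later be filled with entity candidate words of a new article, for example with `str.format`.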
In one possible design, the method for obtaining at least one piece of news and comment information through web crawling according to the title of an article to be commented comprises the following steps:
carrying out quantitative analysis on comment quality dimension, comment quantity dimension and comment interaction dimension aiming at different news sources in the whole network news sources to obtain weight coefficients of the different news sources;
for each news source in the whole network news sources, dynamically distributing captured resources based on the weight coefficient of the news source, and obtaining at least one piece of original news and comment information from the news source through a dynamic real-time crawling algorithm;
for all the original news and comment information obtained by crawling, applying a first BERT pre-trained model to determine the sentiment polarity of each piece of news and comment information, and then filtering out the information with a negative result to obtain at least one piece of non-negative news and comment information;
for the at least one piece of non-negative news and comment information, applying a sensitive information detection algorithm based on a dictionary, pinyin, variant characters and/or a deep learning model to detect sensitive words in each piece of news and comment information, and then filtering out the information containing sensitive words to obtain at least one piece of insensitive news and comment information;
for the at least one piece of insensitive news and comment information, applying a false information determination algorithm based on rules, a knowledge graph and/or a deep learning model to each piece of news and comment information, and then filtering out the information determined to be false to obtain at least one piece of compliant news and comment information;
and for the at least one piece of compliant news and comment information, sequentially performing dirty data cleaning processing and duplicate removal processing for the same news source to obtain the at least one piece of news and comment information.
Based on this possible design, quantitatively analysing the web-wide news sources while crawling the web-wide comment data ensures the reliability of the subsequent recommendation index values and enables efficient allocation of crawler resources. At the same time, reviewing all news and comment information for harmful content filters out negative, sensitive and false information, which goes a long way toward ensuring the quality of the automatic comment content.
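The resource allocation and the successive filtering stages can be sketched as follows. The predicates below are trivial word-list placeholders standing in for the BERT polarity model, the sensitive information detector and the false information determination algorithm; the word lists, field names and function names are all hypothetical.

```python
NEGATIVE_WORDS = {"terrible"}    # placeholder for the BERT polarity model
SENSITIVE_WORDS = {"forbidden"}  # placeholder for the sensitive-word detector

def allocate_budget(source_weights, total_requests):
    """Split a crawl budget across sources in proportion to their weights."""
    total = sum(source_weights.values())
    return {source: int(total_requests * weight / total)
            for source, weight in source_weights.items()}

def is_non_negative(item):
    return not any(word in item["comment"] for word in NEGATIVE_WORDS)

def is_insensitive(item):
    return not any(word in item["comment"] for word in SENSITIVE_WORDS)

def is_not_false(item):
    # Placeholder: assumes an upstream check has set this flag.
    return not item.get("flagged_false", False)

def run_filters(items, filters):
    """Apply the filters one after another, keeping only passing items."""
    for keep in filters:
        items = [item for item in items if keep(item)]
    return items

def clean_and_dedupe(items):
    """Strip whitespace, drop empty comments and repeats within one source."""
    seen, result = set(), []
    for item in items:
        text = item["comment"].strip()
        key = (item["source"], text)
        if text and key not in seen:
            seen.add(key)
            result.append({**item, "comment": text})
    return result
```

A typical call chains the stages in the order given above: `clean_and_dedupe(run_filters(items, [is_non_negative, is_insensitive, is_not_false]))`.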
In a second aspect, the invention provides an automatic comment device, which comprises a full-network information crawling module, an article semantic coding module, a news semantic coding module, an article matching detection module, a recommendation index calculation module and an automatic comment content determination module;
the full-network information crawling module is used for obtaining, through web-wide crawling according to the title of an article to be commented, at least one piece of news and comment information, wherein each piece of news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and the comment like count and comment reply count corresponding to the comment content;
the article semantic coding module is in communication connection with the full-network information crawling module and is used for performing semantic coding on the article to be commented on a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
the news semantic coding module is used for performing semantic coding on the news in each piece of news and comment information in a text dimension, a picture dimension and a video dimension aiming at the at least one piece of news and comment information respectively to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
the article matching detection module is in communication connection with the article semantic coding module and the news semantic coding module respectively, and is used for importing, for the at least one piece of news and comment information, the news text semantic vectors, news picture semantic vectors and news video semantic vectors of all the news, together with the article text semantic vector, article picture semantic vector and article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a first matching detection result between each piece of news and the article to be commented;
the recommendation index calculation module is in communication connection with the article matching detection module and is used for, when the first matching detection result contains at least one piece of matching news that matches the article to be commented, performing, for each piece of comment content of the at least one piece of matching news, a weighted calculation on the corresponding comment reply count, comment like count and news source weight coefficient to obtain a corresponding recommendation index value;
the automatic comment content determining module is in communication connection with the recommendation index calculating module and is used for determining comment content corresponding to a maximum recommendation index value as automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum value of at least one recommendation index value.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a transceiver communicatively connected in sequence, wherein the memory is used for storing a computer program, the transceiver is used for transmitting and receiving information, and the processor is used for reading the computer program and executing the automatic comment method according to the first aspect or any possible design.
In a fourth aspect, the present invention provides a storage medium having stored thereon instructions which, when run on a computer, perform the automatic comment method according to the first aspect or any possible design thereof.
In a fifth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the automatic comment method according to the first aspect or any possible design thereof.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the automatic comment method provided by the present invention.
Fig. 2 is a schematic structural diagram of an automatic comment device provided by the present invention.
Fig. 3 is a schematic structural diagram of a computer device provided by the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. Specific structural and functional details disclosed herein are merely representative of exemplary embodiments of the invention. This invention may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of exemplary embodiments of the present invention.
It should be understood that the term "and/or", where it appears herein, merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, B exists alone, or A and B exist at the same time. The term "/and", where it appears herein, describes another association and indicates that two relationships may exist; for example, A/and B may mean: A exists alone, or A and B exist at the same time. In addition, the character "/", where it appears herein, generally indicates that the associated objects before and after it are in an "or" relationship.
It will be understood that when an element is referred to herein as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Conversely, if an element is referred to herein as being "directly connected" or "directly coupled" to another element, no intervening elements are present. Other words used to describe the relationship between elements should be interpreted in a similar manner (e.g., "between ..." versus "directly between ...", "adjacent" versus "directly adjacent", etc.).
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used herein, specify the presence of stated features, quantities, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, quantities, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative designs, the functions/acts noted may occur out of the order noted in the figures. For example, two steps shown in succession may in fact be executed substantially concurrently, or the steps may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
It should be understood that specific details are provided in the following description to facilitate a thorough understanding of example embodiments. However, it will be understood by those of ordinary skill in the art that the example embodiments may be practiced without these specific details. For example, systems may be shown in block diagrams in order not to obscure the examples in unnecessary detail. In other instances, well-known processes, structures and techniques may be shown without unnecessary detail in order to avoid obscuring example embodiments.
As shown in Fig. 1, the automatic comment method provided in the first aspect of the present embodiment may be, but is not limited to being, executed by a computer device with certain computing resources, for example by a platform server for publishing news/articles, or by a terminal device for reading news/articles. The automatic comment method may include, but is not limited to, the following steps S101 to S106.
S101, according to the title of an article to be commented, obtaining at least one piece of news and comment information through web-wide crawling, wherein the news and comment information may include, but is not limited to, a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and the comment like count and comment reply count corresponding to the comment content.
In step S101, obtaining at least one piece of news and comment information through web-wide crawling according to the title of the article to be commented may preferably include, but is not limited to, the following steps S1011 to S1016.
S1011, aiming at different news sources in the whole network news source, carrying out quantitative analysis on the comment quality dimension, the comment quantity dimension and the comment interaction dimension to obtain the weight coefficients of the different news sources.
In step S1011, the news sources in the whole network are specifically news portals in the whole network, and when the portals are combed, the news sources may be scored and weighted in the comment quality dimension, the comment quantity dimension, and the comment interaction dimension, and then the weighting coefficients of different news sources are obtained through conventional quantitative analysis.
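As an illustration of step S1011, the following is a minimal sketch of one way the weight coefficients could be derived. It assumes each news source has already been scored in the range 0 to 1 on the three dimensions; the class `SourceScores`, the function name, and the dimension weights 0.4/0.3/0.3 are hypothetical and are not specified in this embodiment.

```python
from dataclasses import dataclass


@dataclass
class SourceScores:
    quality: float      # comment quality dimension, assumed in [0, 1]
    quantity: float     # comment quantity dimension, assumed in [0, 1]
    interaction: float  # comment interaction dimension, assumed in [0, 1]


# Hypothetical dimension weights; the embodiment does not fix these values.
DIMENSION_WEIGHTS = (0.4, 0.3, 0.3)


def source_weight_coefficients(scores: dict[str, SourceScores]) -> dict[str, float]:
    """Combine the three dimension scores and normalize so the weights sum to 1."""
    w_quality, w_quantity, w_interaction = DIMENSION_WEIGHTS
    raw = {
        name: w_quality * s.quality + w_quantity * s.quantity + w_interaction * s.interaction
        for name, s in scores.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No source scored above zero: fall back to equal weights.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: value / total for name, value in raw.items()}
```

Normalizing the coefficients makes them directly usable as shares when resources are distributed among the news sources.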
S1012, for each news source among the whole-network news sources, dynamically allocating crawling resources based on the weight coefficient of the news source, and obtaining at least one piece of original news and comment information from the news source through a dynamic real-time crawling algorithm.
In step S1012, the original news and comment information may include, but is not limited to, one piece of news having the same title as the article to be commented, at least one piece of comment content attached to the news, and the comment like number and comment reply number corresponding to each piece of comment content.
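The weight-based allocation of crawling resources in step S1012 could, for example, be sketched as a proportional split of a request budget. The function name `allocate_crawl_budget` and the notion of a fixed number of requests per crawl cycle are assumptions made for illustration only; the embodiment does not define how the resources are measured.

```python
def allocate_crawl_budget(weights: dict[str, float], total_requests: int) -> dict[str, int]:
    """Split a request budget among news sources in proportion to their weights.

    Uses the largest-remainder method so the integer shares add up exactly
    to total_requests.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one news source must have a positive weight")
    exact = {name: total_requests * w / total_weight for name, w in weights.items()}
    budget = {name: int(share) for name, share in exact.items()}
    leftover = total_requests - sum(budget.values())
    # Hand the remaining requests to the sources with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - budget[name], reverse=True)
    for name in by_remainder[:leftover]:
        budget[name] += 1
    return budget
```

Re-running the allocation whenever the weight coefficients are refreshed would give one simple form of dynamic distribution.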
S1013, for all of the original news and comment information obtained through crawling, applying a first BERT pre-training model to judge the emotional polarity of each piece of news and comment information, and then filtering out the information corresponding to a negative judgment result to obtain at least one piece of non-negative news and comment information.
In step S1013, BERT stands for Bidirectional Encoder Representations from Transformers, a pre-training model proposed by Google in 2018. It is the encoder of a bidirectional Transformer, because a decoder cannot access the information to be predicted. The main innovation of the model lies in its pre-training method, which uses a Masked LM task and a Next Sentence Prediction task to capture word-level and sentence-level representations, respectively. The specific manner of filtering out the information corresponding to the negative judgment result may be, but is not limited to: if a piece of news corresponds to a negative judgment result, filtering out the whole piece of news and comment information; if a piece of comment content corresponds to a negative judgment result, filtering out the comment content together with its comment like number and comment reply number; if all the comment contents of a piece of news correspond to negative judgment results, filtering out the whole piece of news and comment information.
S1014, for the at least one piece of non-negative news and comment information, applying a sensitive information detection algorithm based on a dictionary, pinyin, variant characters and/or a deep learning model to detect sensitive words in each piece of news and comment information, and then filtering out the information containing sensitive words to obtain at least one piece of non-sensitive news and comment information.
In step S1014, the sensitive words may be, but are not limited to, pornographic words, violent words, or advertising words. The specific manner of filtering out the information containing sensitive words may be, but is not limited to: if a piece of news contains sensitive words, filtering out the whole piece of news and comment information; if a piece of comment content contains sensitive words, filtering out the comment content together with its comment like number and comment reply number; if all the comment contents of a piece of news contain sensitive words, filtering out the whole piece of news and comment information.
S1015, for the at least one piece of non-sensitive news and comment information, applying a false information discrimination algorithm based on rules, knowledge graphs and/or deep learning models to judge whether each piece of news and comment information is false, and then filtering out the information corresponding to a false judgment result to obtain at least one piece of compliant news and comment information.
In step S1015, the specific manner of filtering out the information corresponding to the false judgment result may be, but is not limited to: if a piece of news corresponds to a false judgment result, filtering out the whole piece of news and comment information; if a piece of comment content corresponds to a false judgment result, filtering out the comment content together with its comment like number and comment reply number; if all the comment contents of a piece of news correspond to false judgment results, filtering out the whole piece of news and comment information.
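Steps S1013 to S1015 share the same three filtering rules, which can be sketched once as follows. The classes `Comment` and `NewsItem` and the predicate `is_bad` are hypothetical placeholders: `is_bad` stands in for whichever detector is used (negative-sentiment model, sensitive word detector, or false information discriminator), and its implementation is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Comment:
    text: str
    likes: int
    replies: int


@dataclass
class NewsItem:
    title: str
    body: str
    comments: list[Comment] = field(default_factory=list)


def filter_bad(items: list[NewsItem], is_bad: Callable[[str], bool]) -> list[NewsItem]:
    """Apply the three filtering rules shared by steps S1013 to S1015."""
    kept = []
    for item in items:
        # Rule 1: a bad news text removes the whole news-and-comment record.
        if is_bad(item.title + " " + item.body):
            continue
        # Rule 2: a bad comment is dropped together with its like and reply counts.
        good_comments = [c for c in item.comments if not is_bad(c.text)]
        # Rule 3: if no comment survives, the whole record is removed.
        if not good_comments:
            continue
        kept.append(NewsItem(item.title, item.body, good_comments))
    return kept
```

Chaining three calls to `filter_bad` with three different predicates would reproduce the order negative, sensitive, false described above.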
S1016, for the at least one piece of compliant news and comment information, sequentially performing dirty-data cleaning and de-duplication within the same news source to obtain the at least one piece of news and comment information.
S102, performing semantic coding on the article to be commented in the text dimension, the picture dimension and the video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector.
In step S102, specifically, a second BERT pre-training model may be applied to map at least one text in the article to be commented to the article text semantic vector, where the text includes a title, a body, and/or a summary; a first ResNet101 pre-training model (that is, a residual network, ResNet, with 101 layers) may be applied to map at least one picture in the article to be commented into the article picture semantic vector; and a video clustering algorithm may be applied to extract at least one video key frame from the article to be commented, after which a second ResNet101 pre-training model is applied to map the at least one video key frame into the article video semantic vector.
S103, for the at least one piece of news and comment information, performing semantic coding on the news in each piece of news and comment information in the text dimension, the picture dimension and the video dimension to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news.
In step S103, the news is semantically coded in the same manner as the article to be commented, which is not described herein again.
S104, for the at least one piece of news and comment information, importing the news text semantic vector, the news picture semantic vector and the news video semantic vector of all news, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a first matching detection result between each piece of news and the article to be commented.
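To make the structure of the matching detection model in step S104 more concrete, the following sketch computes one cosine similarity per modality and feeds the three similarities into a single fully connected unit with a sigmoid output. This is only a simplified stand-in: the weights and bias are hypothetical fixed values, whereas the model in this embodiment would be a trained deep learning model whose architecture details are not given.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_probability(
    article: dict[str, list[float]],
    news: dict[str, list[float]],
    weights: tuple[float, float, float] = (2.0, 1.0, 1.0),  # hypothetical, not learned
    bias: float = -2.0,                                      # hypothetical, not learned
) -> float:
    """Text, picture and video similarities combined by one fully connected unit."""
    sims = [cosine(article[key], news[key]) for key in ("text", "picture", "video")]
    logit = bias + sum(w * s for w, s in zip(weights, sims))
    return 1.0 / (1.0 + math.exp(-logit))


def is_match(article, news, threshold: float = 0.5) -> bool:
    """Treat a probability at or above the (assumed) threshold as a match."""
    return match_probability(article, news) >= threshold
```

Keeping the three similarity dimensions separate until the final layer lets the model weigh text, picture and video evidence differently.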
S105, if the first matching detection result contains at least one piece of matching news that matches the article to be commented, then for each piece of comment content of the at least one piece of matching news, performing a weighted calculation on the corresponding comment reply number, comment like number and news source weight coefficient to obtain a corresponding recommendation index value.
S106, determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum of the at least one recommendation index value.
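Steps S105 and S106 can be sketched as a weighted sum followed by an arg-max. The weighting factors below and the normalization of the counts by their maxima are assumptions made so that the three terms are on a comparable scale; the embodiment only states that a weighted calculation is performed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str
    likes: int
    replies: int
    source_weight: float  # weight coefficient of the news source (step S1011)


# Hypothetical weighting factors; the embodiment does not specify them.
W_REPLY, W_LIKE, W_SOURCE = 0.5, 0.3, 0.2


def recommendation_index(c: Candidate, max_likes: int, max_replies: int) -> float:
    """Weighted sum of normalized reply count, like count and source weight."""
    reply_term = c.replies / max_replies if max_replies else 0.0
    like_term = c.likes / max_likes if max_likes else 0.0
    return W_REPLY * reply_term + W_LIKE * like_term + W_SOURCE * c.source_weight


def pick_automatic_comment(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the comment with the maximum recommendation index (step S106)."""
    if not candidates:
        return None
    max_likes = max(c.likes for c in candidates)
    max_replies = max(c.replies for c in candidates)
    return max(candidates, key=lambda c: recommendation_index(c, max_likes, max_replies))
```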
Therefore, the automatic comment method described in detail in the foregoing steps S101 to S106 provides an automatic comment scheme based on whole-network comment data. That is, publicly available comment data is obtained from public whole-network data by using a real-time crawling technology, and then, in combination with a deep semantic matching technology, the best comment content is found directly from the comment data for automatic commenting. In this way, the quality of the output comments can be controlled, manual participation is greatly reduced, reliability is high, and the demand in real scenarios for news comments with strong topicality can be well met. In addition, in the process of crawling the whole-network comment data, performing quantitative analysis on the whole-network news sources both supports the reliability of the subsequent recommendation index values and facilitates efficient allocation of crawler resources; meanwhile, performing a bad-information audit on all news and comment information filters out negative, sensitive and false information, which helps ensure the quality of the automatic comment content.
On the basis of the technical solution of the first aspect, the present embodiment further specifically provides a first possible design for automatically commenting based on a comment template library, that is, after the first matching detection result between each piece of news and the article to be commented is obtained, if all obtained recommendation index values are smaller than a preset index threshold value, or the first matching detection result does not include any matching news that matches the article to be commented, the method further includes, but is not limited to, the following steps S201 to S206.
S201, obtaining the topic and/or the news event triple of the article to be commented.
In step S201, a specific manner of acquiring the topic of the article to be commented may include, but is not limited to, the following steps S301 to S306.
S301, at least one topic and news information of a whole network news source are obtained in a real-time crawling mode based on a web crawler technology, wherein the topic and news information comprises one topic and at least one topic news belonging to the topic.
S302, performing bad-information audit processing, including negative information filtering, sensitive information filtering and false information filtering, on all the obtained topic and news information to obtain at least one piece of compliant topic and news information.
In step S302, for the specific manner of performing the bad-information audit processing on the topic and news information, reference may be made to the aforementioned steps S1013 to S1015, which are not described herein again.
S303, carrying out semantic coding on each topic and news information in a text dimension, a picture dimension and a video dimension aiming at the at least one piece of compliant topic and news information respectively to obtain topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of each topic and news information, and then carrying out de-duplication processing on the at least one piece of compliant topic and news information in a text similar dimension, a picture similar dimension and a video similar dimension according to the topic text semantic vectors, the topic picture semantic vectors and the topic video semantic vectors of all topic and news information to obtain at least one piece of non-duplicated topic and news information.
In step S303, the topic and news information is semantically coded in the same manner as the article to be commented, which is not described herein again.
S304, aiming at the at least one non-repeated topic and news information, respectively obtaining topic mapping vectors and topic distribution of each topic and news information according to at least one topic news belonging to the topic, and then performing semantic clustering fusion on the at least one non-repeated topic and news information according to the topic mapping vectors, the topic distribution, the topic text semantic vector, the topic picture semantic vector and the topic video semantic vector of all the topics and news information to obtain at least one fused topic, at least one topic heat weight value and a one-to-one correspondence relationship between the at least one fused topic and the at least one topic heat weight value, and finally determining at least one hot topic from the at least one fused topic according to the magnitude degree of the at least one topic heat weight value.
In the step S304, the specific topic fusion method may include, but is not limited to, the following steps S3041 to S3044.
S3041, for the at least one piece of non-repeated topic and news information, applying the doc2vec paragraph vector method (an unsupervised algorithm that learns fixed-length feature representations from variable-length text; it is trained to predict words in a document, so that each document is represented by a single dense vector) to extract the semantic vector of each piece of topic news belonging to the topic in each piece of topic and news information, and then performing weighted averaging on all the extracted semantic vectors of each piece of topic and news information to obtain the topic mapping vector.
S3042, for the at least one piece of non-repeated topic and news information, applying a PLDA topic model (Partially Labeled Dirichlet Allocation, a model proposed by Ramage et al.) to extract the topic information of each piece of topic news belonging to the topic in each piece of topic and news information, and then performing weighted averaging on all the extracted topic information of each piece of topic and news information to obtain the topic distribution.
S3043, according to the topic mapping vectors, the topic distributions, the topic text semantic vectors, the topic picture semantic vectors, and the topic video semantic vectors of all the topic and news information, applying the DBSCAN clustering algorithm (Density-Based Spatial Clustering of Applications with Noise, a typical density clustering algorithm) to perform density cluster analysis on the at least one piece of non-repeated topic and news information, so as to obtain the at least one fused topic, the at least one topic heat weight value, and the one-to-one correspondence between the at least one fused topic and the at least one topic heat weight value.
S3044, sorting the at least one fused topic in descending order of the corresponding topic heat weight value, and taking the top-ranked fused topics as the hot topics.
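The density clustering of step S3043 and the ranking of step S3044 can be illustrated with a compact pure-Python DBSCAN. In this sketch each topic is represented by a single feature vector (for example, a concatenation of its vectors), and the heat weight of a fused topic is taken to be the size of its cluster; both choices are assumptions, since the embodiment does not state how the features are combined or how the heat weight is computed.

```python
import math
from collections import Counter


def dbscan(points: list[tuple[float, ...]], eps: float, min_pts: int) -> list[int]:
    """Return one cluster label per point; -1 marks noise."""
    labels: list = [None] * len(points)

    def neighbors(i: int) -> list[int]:
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def hot_topics(labels: list[int], top_k: int) -> list[tuple[int, int]]:
    """Rank fused topics (clusters) by heat weight, here assumed to be cluster size."""
    heat = Counter(label for label in labels if label != -1)
    return heat.most_common(top_k)
```

Each cluster plays the role of one fused topic, and points labelled -1 are topics that were not merged with any other.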
S305, for the at least one hot topic, importing the topic text semantic vector, the topic picture semantic vector and the topic video semantic vector of all the hot topics, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a second deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a second matching detection result between each hot topic and the article to be commented.
S306, taking the hot topic that, according to the second matching detection result, matches the article to be commented as the topic of the article to be commented.
In the step S201, a specific manner of obtaining the news event triple of the article to be commented may include, but is not limited to, the following steps S401 to S405.
S401, performing sentence segmentation on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise one article title sentence and at least one article content sentence.
S402, aiming at the article sentences, applying a DMCNN event extraction algorithm to respectively extract open domain events of the article sentences to obtain sentence event triples of the article sentences.
In step S402, the DMCNN Event Extraction algorithm is an Event Extraction method based on a Dynamic Pooling (Dynamic Pooling) Convolutional Neural network model, which is an Event Extraction scheme in a pipeline manner, that is, two tasks of detecting and identifying trigger words and detecting and identifying arguments are performed separately, and the latter depends on the prediction result of the former. The execution task of the DMCNN event extraction algorithm is divided into a trigger word identification subtask and an argument identification subtask. In the argument recognition subtask, a third bert pre-training model after fine tuning can be adopted to carry out semantic coding on the article sentence to obtain an initial value of a sentence semantic vector, and an attention layer is added, so that long-distance dependence can be modeled, and the improvement is particularly obvious for the condition that the article sentence contains a plurality of events.
S403, for the article sentences, applying a second BERT pre-training model to map each article sentence into a corresponding sentence semantic vector.
S404, for the article sentences, applying a TextRank algorithm to model and score each article sentence based on the sentence semantic vectors to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm adopts the sentence semantic vector of the article title sentence as the central vector.
In step S404, the TextRank algorithm is a graph-based ranking algorithm for text: the text is divided into a number of constituent units (sentences), a graph connecting the resulting nodes is constructed with the similarity between sentences as the edge weights, the TextRank value of each sentence is calculated by iterating until convergence, and finally the highest-ranked sentences are extracted and combined into a text summary.
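A small weighted TextRank over sentence vectors could look like the sketch below. How the title sentence vector serves as the "central vector" is not detailed in this embodiment; the sketch adopts one possible reading, in which the random-restart distribution is biased toward sentences similar to the title. That reading, the damping factor 0.85 and the fixed iteration count are all assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(vectors: list[list[float]], title_index: int = 0,
             damping: float = 0.85, iterations: int = 50) -> list[float]:
    """Score sentences on a similarity graph, with restarts biased toward the title."""
    n = len(vectors)
    # Edge weights: non-negative cosine similarity between different sentences.
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    # Restart distribution proportional to similarity with the title sentence.
    bias = [max(cosine(vectors[title_index], v), 0.0) for v in vectors]
    bias_total = sum(bias)
    bias = [b / bias_total for b in bias] if bias_total else [1.0 / n] * n
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) * bias[i]
            + damping * sum(sim[j][i] * scores[j] / out_weight[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores
```

The returned scores can be used directly as sentence importance index values in the later weighted sorting steps.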
S405, for a plurality of sentence event triples, carrying out weighted sequencing according to sentence importance index values of corresponding article sentences, and using the sentence event triplet with the top sequencing as a news event triplet of the article to be commented.
In step S405, since the article sentences include one article title sentence and at least one article content sentence, and the article title sentence is necessarily more important than other article content sentences, the weighting factor corresponding to the article title sentence is greater than that of other article content sentences in the weighted ordering.
S202, according to the topic and/or news event triple of the article to be commented, finding, from a comment template library, at least one existing comment template that is similar to the article to be commented in the topic similarity dimension and/or the news event triple similarity dimension, wherein the comment template library stores a plurality of existing comment templates, each existing comment template comprises template comment content and the topic and/or news event triple bound with the template comment content, and each template comment content comprises at least one slot to be filled, the slots corresponding one-to-one with at least one comment entity candidate word.
S203, performing entity extraction on the article to be commented to obtain at least one article entity candidate word.
In step S203, the specific entity extraction method may include, but is not limited to, the following steps S2031 to S2035.
S2031, performing sentence segmentation on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise one article title sentence and at least one article content sentence.
S2032, aiming at the article sentences, a second bert pre-training model is applied to map the article sentences into corresponding sentence semantic vectors respectively.
S2033, for the article sentences, modeling and scoring are carried out on each article sentence based on the sentence semantic vector by using a TextRank algorithm to obtain a sentence important index value of each article sentence, wherein the TextRank algorithm adopts the sentence semantic vector of the article title sentence as a central vector.
S2034, aiming at the article sentences, entity extraction is carried out on each article sentence by combining a dictionary and a deep learning model, confidence scores are carried out on the extraction results by adopting a weighted score method, and a plurality of sentence entities and corresponding confidence scores are obtained.
In step S2034, fast matching against an industry dictionary may be built on a finite state machine, and a deep learning extraction model for entity extraction may be trained based on a bidirectional LSTM + CRF structure (an existing model structure for entity recognition; under the LSTM + CRF model, the output is no longer a set of mutually independent labels but an optimal label sequence).
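The dictionary half of step S2034 can be illustrated with a character trie, which acts as a simple finite state machine, combined with forward longest matching. The function names and the end-of-word marker are hypothetical, and the deep learning half (bidirectional LSTM + CRF) is not covered by this sketch.

```python
END = "$end"  # multi-character marker, so it cannot collide with a single character key


def build_trie(words: list[str]) -> dict:
    """Build a character trie; each state is a dict of outgoing transitions."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_entities(text: str, trie: dict) -> list[tuple[int, str]]:
    """Scan the text left to right, keeping the longest dictionary match at each position."""
    found = []
    i = 0
    while i < len(text):
        node, j, last = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                last = j  # remember the end of the longest match so far
        if last is not None:
            found.append((i, text[i:last]))
            i = last
        else:
            i += 1
    return found
```

For example, with a dictionary containing both "city" and "city marathon", the text "the city marathon starts" yields the longer entry "city marathon".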
S2035, for the sentence entities, sorting in descending order of a weighted score computed from the corresponding confidence score and the sentence importance index value of the article sentence to which each sentence entity belongs, and taking at least one top-ranked sentence entity as the at least one article entity candidate word.
In step S2035, since the article sentences include one article title sentence and at least one article content sentence, and the article title sentence is necessarily more important than other article content sentences, the weight coefficient corresponding to the article title sentence is greater than that of other article content sentences in the weighted sorting.
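The weighted sorting in step S2035 might be sketched as follows. The product form of the score, the title boost factor 1.5 and the function name are assumptions; the embodiment only requires that the title sentence carry a larger weight coefficient than the content sentences.

```python
def rank_entities(
    entities: list[tuple[str, float, int]],  # (word, confidence, sentence index)
    sentence_importance: list[float],
    title_index: int = 0,
    title_boost: float = 1.5,  # hypothetical extra weight for the title sentence
    top_k: int = 3,
) -> list[str]:
    """Return the top_k entity words in descending order of weighted score."""
    best: dict[str, float] = {}
    for word, confidence, idx in entities:
        weight = sentence_importance[idx] * (title_boost if idx == title_index else 1.0)
        score = confidence * weight
        # An entity seen in several sentences keeps its best score.
        best[word] = max(best.get(word, 0.0), score)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:top_k]]
```

The same pattern applies to the weighted ordering of sentence event triples in step S405, with the triple taking the place of the entity word.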
S204, for each of the at least one existing comment template, matching the at least one article entity candidate word pairwise against the at least one comment entity candidate word of the template and filling the corresponding slots to obtain corresponding new comment content.
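One way to picture the slot filling of step S204 is shown below. The sketch assumes that each slot in the template comment content is written as `{TYPE}` and that a slot is matched to an article entity candidate word of the same entity type; the slot syntax and the type-based matching rule are both hypothetical, as the embodiment does not fix how the pairwise matching is carried out.

```python
import re
from typing import Optional

SLOT = re.compile(r"\{(\w+)\}")


def fill_template(template: str, candidates: list[tuple[str, str]]) -> Optional[str]:
    """Fill each {TYPE} slot with the best unused candidate of that entity type.

    candidates is a list of (word, entity_type), ranked best first.
    Returns None if some slot cannot be filled.
    """
    used: set[int] = set()

    def pick(match: re.Match) -> str:
        slot_type = match.group(1)
        for idx, (word, entity_type) in enumerate(candidates):
            if entity_type == slot_type and idx not in used:
                used.add(idx)
                return word
        raise KeyError(slot_type)

    try:
        return SLOT.sub(pick, template)
    except KeyError:
        return None
```

Templates for which `fill_template` returns None would simply produce no new comment content.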
S205, for the at least one piece of new comment content, applying a DNN language model (an existing model that calculates the probability of a sentence formed by the given words and thereby judges whether the sentence conforms to natural language expression habits) to score each piece of new comment content and obtain its score value.
In step S205, considering that the new comment content may itself be non-compliant, before scoring, the at least one piece of new comment content may be subjected to bad-information audit processing, including negative information filtering, sensitive information filtering and false information filtering, so as to obtain at least one piece of compliant new comment content. For the specific manner of performing the bad-information audit processing on the new comment content, reference may be made to the foregoing steps S1013 to S1015, and details are not repeated here.
S206, determining the new comment content corresponding to the highest score value as the automatic comment content of the article to be commented, wherein the highest score value is the highest value in the score values of all the new comment contents.
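The scoring and selection in steps S205 and S206 can be illustrated with a much simpler stand-in for the DNN language model: a bigram model with add-one smoothing. This swap is made purely to keep the example self-contained; the class `BigramLM`, the whitespace tokenization and the length-normalized log probability are all assumptions and do not represent the model used in this embodiment.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram language model (a stand-in for the DNN language model)."""

    def __init__(self, corpus: list[str]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocabulary = {"</s>"}
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            vocabulary.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocabulary)

    def score(self, sentence: str) -> float:
        """Average log probability per transition; higher means more fluent."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return log_prob / (len(tokens) - 1)


def best_comment(candidates: list[str], lm: BigramLM) -> str:
    """Step S206: keep the new comment content with the highest score value."""
    return max(candidates, key=lm.score)
```

Normalizing by length avoids systematically favouring short comments, which is a design choice of this sketch rather than a requirement of the method.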
Therefore, based on the first possible design described in detail in the foregoing steps S201 to S206, when there is no matching news or no recommendation index value reaches the threshold, a deep matching search can be performed in the comment template library in the topic dimension and the event triple dimension. This greatly expands the reusability of comments, so that more news without public comment data can be filled with comments, and high availability, high reusability and high topicality of automated news comments can be achieved. In addition, based on the topics and the news belonging to them, deep fusion clustering of topics is carried out from the multi-modal perspective of text, pictures and video by combining a topic model with semantic vectors, and deep learning is used to realize multi-modal matching between news and topics and between the article to be commented and topics. Furthermore, the event extraction task can be jointly trained based on a deep learning model, which reduces the error propagation caused by the multi-stage tasks of the traditional method, and the modeling capability for long-distance dependence is improved by adding a pre-training model and an attention layer.
On the basis of the technical solution of the first possible design, the present embodiment further specifically provides a second possible design for automatically enriching the comment template library, that is, after the comment content corresponding to the maximum recommendation index value is determined as the automatic comment content of the article to be commented, the method further includes, but is not limited to, the following steps S501 to S506.
S501, determining the comment content corresponding to the maximum recommendation index value as target comment content, and determining the news corresponding to the maximum recommendation index value as target news.
S502, entity extraction is carried out on the target comment content, and at least one new comment entity candidate word is obtained.
In the step S502, the specific manner of performing entity extraction may refer to the foregoing steps S2031 to S2035, which are not described herein again.
S503, converting the target comment content into new template comment content containing slots to be filled that correspond one-to-one with the at least one new comment entity candidate word.
S504, obtaining the topic and/or news event triples of the target news.
In the step S504, a specific obtaining manner may refer to the step S201, which is consistent with a manner of obtaining the topic and/or news event triples of the article to be commented, and is not described herein again.
S505, binding the new template comment content with the topic and/or news event triple of the target news and storing them in a new comment template.
S506, adding the new comment template into the comment template library.
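Steps S503 to S506 can be sketched as replacing each extracted entity word with a typed slot and storing the result together with its topic and event triple. The `{TYPE}` slot syntax, the dictionary-based template record and the in-memory list used as the template library are hypothetical choices for illustration.

```python
def comment_to_template(comment: str, entities: list[tuple[str, str]]) -> str:
    """Replace each entity word with a {TYPE} slot (step S503).

    entities is a list of (word, entity_type). Longer words are replaced first
    so that an entity contained in a longer entity is not replaced prematurely.
    """
    template = comment
    for word, entity_type in sorted(entities, key=lambda e: len(e[0]), reverse=True):
        template = template.replace(word, "{" + entity_type + "}")
    return template


def add_comment_template(library: list[dict], comment: str,
                         entities: list[tuple[str, str]],
                         topic: str, event_triple: tuple[str, str, str]) -> dict:
    """Bind the template content to the topic and event triple and store it (S505, S506)."""
    record = {
        "template": comment_to_template(comment, entities),
        "topic": topic,
        "event_triple": event_triple,
    }
    library.append(record)
    return record
```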
Therefore, based on the second possible design detailed in the foregoing steps S501 to S506, a comment template can be generated by entity extraction for high-quality comments, and meanwhile, information of deeper dimensions of news is mined by using topic fusion and matching technology and event discovery technology, so that a multidimensional and highly accurate comment template library is constructed.
As shown in fig. 2, a second aspect of this embodiment provides a virtual device for implementing the automatic comment method in the first aspect, the first possible design or the second possible design, where the virtual device includes a full-network information crawling module, an article semantic coding module, a news semantic coding module, an article matching detection module, a recommendation index calculation module, and an automatic comment content determination module;
the full-network information crawling module is used for crawling at least one piece of news and comment information from the whole network according to the title of an article to be commented, wherein the news and comment information comprises one piece of news with the same title as the article to be commented, at least one piece of comment content attached to the news, and the comment like number and comment reply number corresponding to the comment content;
the article semantic coding module is in communication connection with the full-network information crawling module and is used for performing semantic coding on the article to be commented on a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
the news semantic coding module is used for performing semantic coding on the news in each piece of news and comment information in a text dimension, a picture dimension and a video dimension aiming at the at least one piece of news and comment information respectively to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
the article matching detection module is in communication connection with the article semantic coding module and the news semantic coding module respectively, and is used for importing, for the at least one piece of news and comment information, the news text semantic vector, the news picture semantic vector and the news video semantic vector of all news, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, so as to obtain a first matching detection result between each piece of news and the article to be commented;
the recommendation index calculation module is in communication connection with the article matching detection module and is used for, when the first matching detection result includes at least one piece of matching news that matches the article to be commented, performing, for each piece of comment content of the at least one piece of matching news, a weighted calculation on the corresponding comment reply number, comment like number and news source weight coefficient, so as to obtain a corresponding recommendation index value;
the automatic comment content determining module is in communication connection with the recommendation index calculating module and is used for determining comment content corresponding to a maximum recommendation index value as automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum value of at least one recommendation index value.
For the working process, the working details and the technical effects of the foregoing apparatus provided in the second aspect of this embodiment, reference may be made to the automatic comment method described in the first aspect, the first possible design, or the second possible design, which is not described herein again.
As shown in fig. 3, a third aspect of the present embodiment provides a computer device for executing the automatic comment method in the first aspect, the first possible design, or the second possible design, where the computer device includes a memory, a processor, and a transceiver, which are communicatively connected in sequence, where the memory is used for storing a computer program, the transceiver is used for sending and receiving information, and the processor is used for reading the computer program to execute the automatic comment method in the first aspect, the first possible design, or the second possible design. For example, the memory may include, but is not limited to, a Random-Access Memory (RAM), a Read-Only Memory (ROM), a Flash Memory, a First-In First-Out (FIFO) memory, and/or a First-In Last-Out (FILO) memory, and the like; the transceiver may be, but is not limited to, a WiFi (wireless fidelity) wireless transceiver, a Bluetooth wireless transceiver, a GPRS (General Packet Radio Service) wireless transceiver, and/or a ZigBee (a low-power local area network protocol based on the IEEE 802.15.4 standard) wireless transceiver, etc.; the processor may be, but is not limited to, a microprocessor of the STM32F105 family. In addition, the computer device may also include, but is not limited to, a power module, a display screen, and other necessary components.
For the working process, working details and technical effects of the foregoing computer device provided in the third aspect of this embodiment, reference may be made to the automatic comment method described in the first aspect, the first possible design, or the second possible design; they are not repeated here.
A fourth aspect of the present embodiment provides a storage medium storing instructions for the automatic comment method in the first aspect, the first possible design, or the second possible design; that is, the storage medium stores instructions that, when run on a computer, perform the automatic comment method in the first aspect, the first possible design, or the second possible design. The storage medium refers to a carrier for storing data, and may include, but is not limited to, a floppy disk, an optical disk, a hard disk, a flash memory, a flash disk and/or a Memory Stick; the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
For the working process, working details and technical effects of the foregoing storage medium provided in the fourth aspect of this embodiment, reference may be made to the automatic comment method described in the first aspect, the first possible design, or the second possible design; they are not repeated here.
A fifth aspect of the present embodiment provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the automatic comment method described in the first aspect, the first possible design, or the second possible design. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
The embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that modifications may be made to the embodiments described above, or equivalents may be substituted for some of the features described, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Finally, it should be noted that the present invention is not limited to the above alternative embodiments, and that anyone may derive various other forms of products in light of the present invention. The above detailed description should not be taken as limiting the scope of protection of the invention; that scope is defined in the claims, and the description may be used to interpret the claims.

Claims (10)

1. An automatic comment method, comprising:
crawling the whole network according to the title of an article to be commented to obtain at least one piece of news and comment information, wherein each piece of news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and the comment praise number and comment reply number corresponding to the comment content;
performing semantic coding on the article to be commented in a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
for the at least one piece of news and comment information, performing semantic coding on the news in each piece of news and comment information in the text dimension, the picture dimension and the video dimension respectively, to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
for the at least one piece of news and comment information, importing the news text semantic vectors, news picture semantic vectors and news video semantic vectors of all the news, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a first matching detection result between each piece of news and the article to be commented;
if the first matching detection result contains at least one piece of matching news that matches the article to be commented, performing, for each comment content of the at least one piece of matching news, a weighted calculation on the corresponding comment reply number, comment praise number and news source weight coefficient to obtain a corresponding recommendation index value;
and determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum value of at least one recommendation index value.
2. The automatic comment method of claim 1, wherein after the first matching detection result between each piece of news and the article to be commented is obtained, if all obtained recommendation index values are smaller than a preset index threshold value, or the first matching detection result does not include at least one piece of matching news that matches the article to be commented, the method further comprises:
obtaining a topic and/or a news event triple of the article to be commented;
searching a comment template library, according to the topic and/or the news event triple of the article to be commented, for at least one existing comment template that is similar to the article to be commented in a topic similarity dimension and/or a news event triple similarity dimension, wherein the comment template library stores a plurality of existing comment templates, each existing comment template comprises template comment content and a topic and/or a news event triple bound with the template comment content, and the template comment content comprises at least one slot to be filled, which corresponds one-to-one to at least one comment entity candidate word;
performing entity extraction on the article to be commented to obtain at least one article entity candidate word;
for the at least one existing comment template, matching the at least one article entity candidate word with the at least one comment entity candidate word in each existing comment template in pairs and filling the slots accordingly, to obtain corresponding new comment content;
for at least one new comment content, applying a DNN language model to score each new comment content to obtain a score value of each new comment content;
and determining the new comment content corresponding to the highest score value as the automatic comment content of the article to be commented, wherein the highest score value is the highest value in the score values of all the new comment contents.
3. The automatic comment method of claim 2, wherein obtaining the topic of the article to be commented comprises:
crawling at least one piece of topic and news information from whole-network news sources in real time based on a web crawler technology, wherein each piece of topic and news information comprises one topic and at least one piece of topic news belonging to the topic;
performing harmful-information review processing, including negative information filtering, sensitive information filtering and false information filtering, on all the crawled topic and news information to obtain at least one piece of compliant topic and news information;
for the at least one piece of compliant topic and news information, performing semantic coding on each piece of topic and news information in the text dimension, the picture dimension and the video dimension respectively to obtain a topic text semantic vector, a topic picture semantic vector and a topic video semantic vector of each piece of topic and news information, and then performing de-duplication processing on the at least one piece of compliant topic and news information in the text similarity dimension, the picture similarity dimension and the video similarity dimension according to the topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of all the topic and news information, to obtain at least one piece of non-repetitive topic and news information;
for the at least one piece of non-repetitive topic and news information, obtaining a topic mapping vector and a topic distribution of each piece of topic and news information according to the at least one piece of topic news belonging to its topic, then performing semantic clustering fusion on the at least one piece of non-repetitive topic and news information according to the topic mapping vectors, topic distributions, topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of all the topic and news information, to obtain at least one fused topic, at least one topic heat weight value, and a one-to-one correspondence between the at least one fused topic and the at least one topic heat weight value, and finally determining at least one hot topic from the at least one fused topic according to the magnitude of the at least one topic heat weight value;
for the at least one hot topic, importing the topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of all the hot topics, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a second deep learning matching detection model constructed based on the text similarity dimension, the picture similarity dimension, the video similarity dimension and a fully connected layer, to obtain a second matching detection result between each hot topic and the article to be commented;
and taking the hot topic in the second matching detection result that matches the article to be commented as the topic of the article to be commented.
4. The automatic comment method of claim 2, wherein obtaining the news event triple of the article to be commented comprises:
performing sentence division processing on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise an article title sentence and at least one article content sentence;
for the plurality of article sentences, performing open-domain event extraction on each article sentence by applying a DMCNN event extraction algorithm to obtain a sentence event triple of each article sentence;
for the plurality of article sentences, applying a second BERT pre-training model to map each article sentence into a corresponding sentence semantic vector;
for the plurality of article sentences, modeling and scoring each article sentence based on the sentence semantic vectors by applying a TextRank algorithm to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm adopts the sentence semantic vector of the article title sentence as a central vector;
and performing weighted ranking on the plurality of sentence event triples according to the sentence importance index values of the corresponding article sentences, and taking the highest-ranked sentence event triple as the news event triple of the article to be commented.
5. The automatic comment method of claim 2, wherein performing entity extraction on the article to be commented to obtain at least one article entity candidate word comprises:
performing sentence division processing on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise an article title sentence and at least one article content sentence;
for the plurality of article sentences, applying a second BERT pre-training model to map each article sentence into a corresponding sentence semantic vector;
for the plurality of article sentences, modeling and scoring each article sentence based on the sentence semantic vectors by applying a TextRank algorithm to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm adopts the sentence semantic vector of the article title sentence as a central vector;
for the plurality of article sentences, performing entity extraction on each article sentence by combining a dictionary and a deep learning model, and assigning confidence scores to the extraction results by a weighted scoring method, to obtain a plurality of sentence entities and corresponding confidence scores;
and for the plurality of sentence entities, sorting them in descending order of a weighted score computed from the corresponding confidence scores and the sentence importance index values of the article sentences to which they belong, and taking at least one top-ranked sentence entity as the at least one article entity candidate word.
6. The automatic comment method of claim 2, wherein after determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, the method further comprises:
determining the comment content corresponding to the maximum recommendation index value as target comment content, and determining the news corresponding to the maximum recommendation index value as target news;
performing entity extraction on the target comment content to obtain at least one new comment entity candidate word;
converting the target comment content into new template comment content in which the slot to be filled and the at least one new comment entity candidate word are in one-to-one correspondence;
obtaining a topic and/or news event triple of the target news;
binding and storing the new template comment content and the topic of the target news and/or the news event triple in a new comment template;
and adding the new comment template into the comment template library.
7. The automatic comment method of claim 1, wherein crawling the whole network to obtain at least one piece of news and comment information according to the title of the article to be commented comprises:
performing quantitative analysis in a comment quality dimension, a comment quantity dimension and a comment interaction dimension for different news sources among the whole-network news sources to obtain weight coefficients of the different news sources;
for each news source among the whole-network news sources, dynamically allocating crawling resources based on the weight coefficient of the news source, and obtaining at least one piece of original news and comment information from the news source through a dynamic real-time crawling algorithm;
for all the original news and comment information obtained by crawling, applying a first BERT pre-training model to judge the sentiment polarity of each piece of news and comment information, and then filtering out the information corresponding to a negative judgment result to obtain at least one piece of non-negative news and comment information;
for the at least one piece of non-negative news and comment information, performing sensitive word detection on each piece of news and comment information by applying a sensitive information detection algorithm based on a dictionary, pinyin, variant word forms and/or a deep learning model, and then filtering out the information containing sensitive words to obtain at least one piece of non-sensitive news and comment information;
for the at least one piece of non-sensitive news and comment information, performing false information judgment on each piece of news and comment information by applying a false information judgment algorithm based on rules, a knowledge graph and/or a deep learning model, and then filtering out the information corresponding to a false judgment result to obtain at least one piece of compliant news and comment information;
and for the at least one piece of compliant news and comment information, sequentially performing dirty data cleaning processing and de-duplication processing within the same news source to obtain the at least one piece of news and comment information.
8. An automatic comment device, comprising a full-network information crawling module, an article semantic coding module, a news semantic coding module, an article matching detection module, a recommendation index calculation module and an automatic comment content determining module;
the full-network information crawling module is used for crawling the whole network according to the title of an article to be commented to obtain at least one piece of news and comment information, wherein each piece of news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and the comment praise number and comment reply number corresponding to the comment content;
the article semantic coding module is in communication connection with the full-network information crawling module and is used for performing semantic coding on the article to be commented in a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
the news semantic coding module is used for performing, for the at least one piece of news and comment information, semantic coding on the news in each piece of news and comment information in the text dimension, the picture dimension and the video dimension respectively, to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
the article matching detection module is in communication connection with the article semantic coding module and the news semantic coding module respectively, and is used for importing, for the at least one piece of news and comment information, the news text semantic vectors, news picture semantic vectors and news video semantic vectors of all the news, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a first matching detection result between each piece of news and the article to be commented;
the recommendation index calculation module is in communication connection with the article matching detection module and is used for performing, when the first matching detection result includes at least one piece of matching news that matches the article to be commented, a weighted calculation on the corresponding comment reply number, comment praise number and news source weight coefficient for each comment content of the at least one piece of matching news, to obtain a corresponding recommendation index value;
the automatic comment content determining module is in communication connection with the recommendation index calculating module and is used for determining comment content corresponding to a maximum recommendation index value as automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum value of at least one recommendation index value.
9. A computer device, comprising a memory, a processor and a transceiver which are sequentially connected in communication, wherein the memory is used for storing a computer program, the transceiver is used for transmitting and receiving information, and the processor is used for reading the computer program and executing the automatic comment method according to any one of claims 1 to 7.
10. A storage medium having stored thereon instructions for performing the automatic comment method of any one of claims 1 to 7 when the instructions are run on a computer.
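The template-based fallback recited in claim 2 can be sketched in Python as follows, under stated assumptions. The function names, the `{E1}`-style slot markers and the exhaustive assignment of candidate words to slots are hypothetical choices made for illustration, and the `score` argument is a stand-in for the DNN language model named in the claim, which is not implemented here.

```python
from itertools import permutations
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

# A template is (template comment content, list of slot markers); the
# marker syntax such as "{E1}" is an assumption, not taken from the patent.
Template = Tuple[str, List[str]]


def needs_template_fallback(index_values: Sequence[float],
                            threshold: float) -> bool:
    """True when no recommendation index value reaches the threshold.

    An empty sequence (no matching news at all) also returns True.
    """
    return all(v < threshold for v in index_values)


def fill_template(content: str, slots: List[str],
                  candidates: Sequence[str]) -> Iterator[str]:
    """Yield one new comment per ordered assignment of candidates to slots."""
    for combo in permutations(candidates, len(slots)):
        text = content
        for slot, word in zip(slots, combo):
            text = text.replace(slot, word)
        yield text


def best_new_comment(templates: Sequence[Template],
                     candidates: Sequence[str],
                     score: Callable[[str], float]) -> Optional[str]:
    """Fill every template and return the highest-scoring new comment."""
    filled = [text
              for content, slots in templates
              for text in fill_template(content, slots, candidates)]
    return max(filled, key=score) if filled else None
```

A caller would first check `needs_template_fallback` on the recommendation index values and only then call `best_new_comment`, passing a real language-model scorer in place of `score`.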
CN202110169250.8A 2021-02-07 2021-02-07 Automatic comment method and device, computer equipment and storage medium Active CN112836487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110169250.8A CN112836487B (en) 2021-02-07 2021-02-07 Automatic comment method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112836487A true CN112836487A (en) 2021-05-25
CN112836487B CN112836487B (en) 2023-01-24

Family

ID=75932689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110169250.8A Active CN112836487B (en) 2021-02-07 2021-02-07 Automatic comment method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112836487B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751458A (en) * 2009-12-31 2010-06-23 暨南大学 Network public sentiment monitoring system and method
WO2017084267A1 (en) * 2015-11-18 2017-05-26 乐视控股(北京)有限公司 Method and device for keyphrase extraction
US20190050731A1 (en) * 2016-03-01 2019-02-14 Microsoft Technology Licensing, Llc Automated commentary for online content
CN108170773A (en) * 2017-12-26 2018-06-15 百度在线网络技术(北京)有限公司 Media event method for digging, device, computer equipment and storage medium
US20190197122A1 (en) * 2017-12-27 2019-06-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for generating review article of hot news, and terminal device
CN110097419A (en) * 2019-03-29 2019-08-06 努比亚技术有限公司 Commodity data processing method, computer equipment and storage medium
CN110162752A (en) * 2019-05-13 2019-08-23 百度在线网络技术(北京)有限公司 Article sentences weight processing method, device and electronic equipment
CN110516067A (en) * 2019-08-23 2019-11-29 北京工商大学 Public sentiment monitoring method, system and storage medium based on topic detection
CN110569334A (en) * 2019-09-11 2019-12-13 北京搜狐新动力信息技术有限公司 method and device for automatically generating comments
CN110688832A (en) * 2019-10-10 2020-01-14 河北省讯飞人工智能研究院 Comment generation method, device, equipment and storage medium
CN111263238A (en) * 2020-01-17 2020-06-09 腾讯科技(深圳)有限公司 Method and equipment for generating video comments based on artificial intelligence
CN112182335A (en) * 2020-09-28 2021-01-05 四川封面传媒有限责任公司 Hot news capturing method and device and server
CN112203122A (en) * 2020-10-10 2021-01-08 腾讯科技(深圳)有限公司 Artificial intelligence-based similar video processing method and device and electronic equipment

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LIANHUI QIN等: "Automatic Article Commenting: the Task and Dataset", 《ARXIV:1805.03668V2》 *
WENHUAN ZENG等: "Automatic Generation of Personalized Comment Based on User Profile", 《ARXIV》 *
孙腾: "面向社交媒体的评论自动生成系统的设计与实现", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *
林江豪等: "基于 PLSA 的新闻评论情绪类别自动标注方法", 《计算机系统应用》 *
王茹皓等: "融合门控注意力机制的基于生成对抗网络模型的新闻评论自动生成方法研究", 《科教文汇(中旬刊)》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486260A (en) * 2021-07-15 2021-10-08 北京三快在线科技有限公司 Interactive information generation method and device, computer equipment and storage medium
CN115730030A (en) * 2021-08-26 2023-03-03 腾讯科技(深圳)有限公司 Comment information processing method and related device
CN113946681A (en) * 2021-12-20 2022-01-18 军工保密资格审查认证中心 Text data event extraction method and device, electronic equipment and readable medium
CN113946681B (en) * 2021-12-20 2022-03-29 军工保密资格审查认证中心 Text data event extraction method and device, electronic equipment and readable medium
CN114492407A (en) * 2022-01-26 2022-05-13 中国科学技术大学 News comment generation method, system, equipment and storage medium
CN116306514A (en) * 2023-05-22 2023-06-23 北京搜狐新媒体信息技术有限公司 Text processing method and device, electronic equipment and storage medium
CN116306514B (en) * 2023-05-22 2023-09-08 北京搜狐新媒体信息技术有限公司 Text processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112836487B (en) 2023-01-24

Similar Documents

Publication Publication Date Title
CN112836487B (en) Automatic comment method and device, computer equipment and storage medium
CN110717339B (en) Semantic representation model processing method and device, electronic equipment and storage medium
CN110717017B (en) Method for processing corpus
CN109657054B (en) Abstract generation method, device, server and storage medium
WO2022116537A1 (en) News recommendation method and apparatus, and electronic device and storage medium
CN111753060A (en) Information retrieval method, device, equipment and computer readable storage medium
JP2015162244A (en) Methods, programs and computation processing systems for ranking spoken words
CN109325146A (en) A kind of video recommendation method, device, storage medium and server
CN112085120B (en) Multimedia data processing method and device, electronic equipment and storage medium
CN114997288A (en) Design resource association method
CN114078468B (en) Voice multi-language recognition method, device, terminal and storage medium
CN115455171A (en) Method, device, equipment and medium for mutual retrieval and model training of text videos
CN114661951A (en) Video processing method and device, computer equipment and storage medium
CN114942994A (en) Text classification method, text classification device, electronic equipment and storage medium
CN116821307B (en) Content interaction method, device, electronic equipment and storage medium
CN116977992A (en) Text information identification method, apparatus, computer device and storage medium
CN116977701A (en) Video classification model training method, video classification method and device
CN116978028A (en) Video processing method, device, electronic equipment and storage medium
CN115269961A (en) Content search method and related device
CN116090450A (en) Text processing method and computing device
CN115269781A (en) Modal association degree prediction method, device, equipment, storage medium and program product
CN115129885A (en) Entity chain pointing method, device, equipment and storage medium
CN116821781A (en) Classification model training method, text analysis method and related equipment
Hyun et al. An image selection framework for automatic report generation
CN112712056A (en) Video semantic analysis method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant