CN112836487B - Automatic comment method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN112836487B (application CN202110169250.8A)
- Authority
- CN
- China
- Prior art keywords
- news
- article
- comment
- information
- topic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of information interaction, and discloses an automatic comment method, an automatic comment device, computer equipment and a storage medium. When no news is matched or the recommendation index value does not meet the requirement, a deep matching search can additionally be performed in the comment template library along the topic and event-triple dimensions, which greatly expands the reusability of comments, allows more news without public comment data to be supplied with comments, and enables a multi-dimensional, high-accuracy comment template library to be built from high-quality comments.
Description
Technical Field
The invention belongs to the technical field of information interaction, and particularly relates to an automatic commenting method and device, computer equipment and a storage medium.
Background
Commenting is currently the most common form of information interaction on major Internet platforms. The quality and quantity of comments often determine the overall activity of a product; in particular, high-quality comments can improve user stickiness, increase user interaction and create a good atmosphere, thereby strengthening product retention and building social relationships. Automatic commenting has therefore attracted attention as an important way to increase the number of comments in the early stage of a product.
Existing automatic comment methods mainly rely on template generation. Although a large number of comments can be generated easily, such methods lack emotional expression and logical reasoning, so the resulting comments are monotonous and rigid, and a large amount of manual participation is required. Automatic commenting based on generative algorithms suffers from uncontrollable generated content and low accuracy, and therefore has low usability.
Disclosure of Invention
The invention aims to provide an automatic comment method, an automatic comment device, computer equipment and a storage medium, so as to solve the problems of existing automatic comment methods: template generation depends too heavily on manual work and expresses emotion and reasoning poorly, while generated content is uncontrollable and of low accuracy.
In a first aspect, the present invention provides an automatic comment method, including:
obtaining at least one piece of news and comment information through web crawling according to the title of an article to be commented, wherein the news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and a comment like number and a comment reply number corresponding to the comment content;
semantic coding is carried out on the article to be commented on the text dimension, the picture dimension and the video dimension, and an article text semantic vector, an article picture semantic vector and an article video semantic vector are obtained;
for the at least one piece of news and comment information, semantic coding is respectively carried out on news in each piece of news and comment information in a text dimension, a picture dimension and a video dimension to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
for the at least one piece of news and comment information, importing the news text semantic vector, the news picture semantic vector and the news video semantic vector of all news, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a first matching detection result of each piece of news and the article to be commented;
if the first matching detection result contains at least one piece of matched news that matches the article to be commented, performing, for each comment content of the at least one piece of matched news, a weighted calculation on the corresponding comment reply number, comment like number and news source weight coefficient to obtain a corresponding recommendation index value;
and determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum value of at least one recommendation index value.
Based on the above, an automatic comment scheme based on whole-network comment data is provided: publicly available comment data is obtained from public whole-network data by real-time crawling, and the best comment content is then selected directly from that data for automatic commenting by means of deep semantic matching. The quality of the output comments can thus be controlled, manual participation is greatly reduced, reliability is high, and the practical requirement that topical news be supplied with comments can be well met.
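The weighted calculation and the selection of the maximum recommendation index value described in the last two steps can be sketched as follows. This is only a minimal illustration: the text does not give a concrete formula, so the linear form, the weights `w_like`, `w_reply` and `w_source`, and the `Comment` class are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    likes: int            # comment like number
    replies: int          # comment reply number
    source_weight: float  # weight coefficient of the news source


def recommendation_index(c, w_like=0.5, w_reply=0.3, w_source=0.2):
    # Hypothetical linear weighting; the actual weights are not specified.
    return w_like * c.likes + w_reply * c.replies + w_source * c.source_weight


def pick_auto_comment(comments):
    # Return the comment with the maximum recommendation index value,
    # or None when there is no candidate comment at all.
    if not comments:
        return None
    return max(comments, key=recommendation_index)
```

In practice the like and reply counts would probably need to be normalized before weighting, since raw counts differ widely between news sources; that step is left out of the sketch.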
In one possible design, after the first matching detection result of each piece of news and the article to be commented is obtained, if all obtained recommendation index values are smaller than a preset index threshold, or the first matching detection result does not contain any matched news that matches the article to be commented, the method further includes:
obtaining a topic and/or news event triple of the article to be commented;
searching a comment template library, according to the topic and/or news event triple of the article to be commented, for at least one existing comment template that is similar to the article to be commented in a topic similarity dimension and/or a news event triple similarity dimension, wherein the comment template library stores a plurality of existing comment templates, each existing comment template comprises template comment content and the topic and/or news event triple bound to the template comment content, and the template comment content comprises at least one slot to be filled in one-to-one correspondence with at least one comment entity candidate word;
performing entity extraction on the article to be commented to obtain at least one article entity candidate word;
for the at least one existing comment template, performing pairwise matching and slot filling between the at least one article entity candidate word and the at least one comment entity candidate word in each existing comment template to obtain corresponding new comment content;
for at least one new comment content, applying a DNN language model to score each new comment content to obtain a score value of each new comment content;
and determining the new comment content corresponding to the highest score value as the automatic comment content of the article to be commented, wherein the highest score value is the highest value in the score values of all the new comment contents.
Based on this possible design, when no matched news exists or the recommendation index values do not meet the requirement, a deep matching search can be performed in the comment template library along the topic and event-triple dimensions. This greatly expands the reusability of comments, so that more news without public comment data can be supplied with comments, achieving automatic news comments with high availability, high reusability and high topicality.
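The slot filling and scoring steps of this fallback path can be sketched as follows. The `{name}` slot syntax, the function names and the `score` callable are assumptions made for illustration; in particular, `score` merely stands in for the DNN language model, which is not reproduced here.

```python
import re
from itertools import permutations

SLOT = re.compile(r"\{(\w+)\}")  # assumed slot syntax, e.g. "{e1}"


def fill_template(template, entities):
    """Yield every comment obtained by assigning distinct article entity
    candidate words to the slots of one template."""
    slots = list(dict.fromkeys(SLOT.findall(template)))  # unique, in order
    for combo in permutations(entities, len(slots)):
        mapping = dict(zip(slots, combo))
        yield SLOT.sub(lambda m: mapping[m.group(1)], template)


def best_new_comment(templates, entities, score):
    """Fill all templates and return the candidate with the highest score.

    `score` is a stand-in for the language-model scorer (an assumption)."""
    candidates = [c for t in templates for c in fill_template(t, entities)]
    return max(candidates, key=score) if candidates else None
```

Enumerating all assignments grows quickly with the number of slots, so a real system would presumably restrict the pairing, for example by entity type, before scoring.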
In one possible design, obtaining topics of the article to be commented includes:
crawling in real time, based on web crawler technology, at least one piece of topic and news information from news sources across the whole network, wherein the topic and news information comprises one topic and at least one piece of topic news belonging to the topic;
performing bad-information audit processing, including negative information filtering, sensitive information filtering and false information filtering, on all the crawled topic and news information to obtain at least one piece of compliant topic and news information;
for the at least one piece of compliant topic and news information, performing semantic encoding on each piece of topic and news information in a text dimension, a picture dimension and a video dimension to obtain a topic text semantic vector, a topic picture semantic vector and a topic video semantic vector of each piece of topic and news information, and then de-duplicating the at least one piece of compliant topic and news information in a text similarity dimension, a picture similarity dimension and a video similarity dimension according to the topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of all the topic and news information, to obtain at least one piece of non-repetitive topic and news information;
<xnotran> , , , , , , , , , ; </xnotran>
for the at least one hot topic, importing the topic text semantic vector, the topic picture semantic vector and the topic video semantic vector, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a second deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a second matching detection result of each hot topic and the article to be commented;
and taking the hot topic that, according to the second matching detection result, matches the article to be commented as the topic of the article to be commented.
Based on this possible design, topics can be deeply fused and clustered from a multi-modal perspective (text, pictures and video) on the basis of the topics and the news belonging to them, by combining a topic model with semantic vectors, and deep learning is used to realize multi-modal matching between news and topics and between the article to be commented and topics.
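A matching detection model "constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer" could look roughly like the sketch below. It assumes cosine similarity per modality and a single fully connected unit with a sigmoid output; the fixed `weights`, `bias` and `threshold` values are placeholders for parameters that would be learned in training, and none of them come from the text.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_score(article, candidate, weights=(1.0, 1.0, 1.0), bias=-1.5):
    """Combine per-modality similarities with one fully connected unit.

    `article` and `candidate` are dicts holding 'text', 'picture' and
    'video' semantic vectors. Weights and bias are hypothetical."""
    sims = [cosine(article[k], candidate[k]) for k in ("text", "picture", "video")]
    z = sum(w * s for w, s in zip(weights, sims)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> match probability


def is_match(article, candidate, threshold=0.5):
    return match_score(article, candidate) >= threshold
```

The same structure would apply to both the first model (news versus article) and the second model (hot topic versus article); only the trained parameters would differ.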
In one possible design, obtaining a news event triple of the article to be commented on includes:
performing sentence segmentation on the article to be commented to obtain a plurality of article sentences, wherein the plurality of article sentences comprise an article title sentence and at least one article content sentence;
for the plurality of article sentences, applying a DMCNN event extraction algorithm to perform open-domain event extraction on each article sentence, to obtain a sentence event triple of each article sentence;
for the plurality of article sentences, applying a second BERT pre-trained model to map each article sentence into a corresponding sentence semantic vector;
for the plurality of article sentences, applying a TextRank algorithm to score each article sentence based on the sentence semantic vectors, to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm uses the sentence semantic vector of the article title sentence as a central vector;
and performing weighted ranking of the plurality of sentence event triples according to the sentence importance index values of the corresponding article sentences, and taking the highest-ranked sentence event triple as the news event triple of the article to be commented.
Based on this possible design, the event extraction task can be trained jointly on the basis of a deep learning model, which reduces the error propagation caused by the multi-stage pipeline of traditional methods, and the ability to model long-distance dependencies is improved by adding the pre-trained model and an attention layer.
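The sentence scoring and triple selection can be sketched as follows. The text does not say how the title vector acts as the "central vector", so this sketch makes one possible reading explicit: a TextRank-style iteration over the sentence similarity graph whose restart distribution is biased toward sentences similar to the title. The function names, `damping` and `iterations` are assumptions, and the sentence vectors and event triples are taken as given inputs.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sentence_importance(vectors, damping=0.85, iterations=50):
    """Score sentences; vectors[0] is the title sentence vector."""
    n = len(vectors)
    if n == 0:
        return []
    # Edge weights: non-negative cosine similarity, no self loops.
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    # Assumed reading of "central vector": restart mass goes to
    # sentences in proportion to their similarity to the title.
    bias = [max(cosine(v, vectors[0]), 0.0) for v in vectors]
    total = sum(bias) or 1.0
    bias = [b / total for b in bias]
    out_sum = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) * bias[i]
                  + damping * sum(sim[j][i] * scores[j] / out_sum[j]
                                  for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    return scores


def pick_event_triple(triples, scores):
    """triples[i] is the (subject, predicate, object) triple of sentence i,
    or None if no event was extracted from that sentence."""
    ranked = [(s, t) for s, t in zip(scores, triples) if t is not None]
    return max(ranked, key=lambda p: p[0])[1] if ranked else None
```

With this reading, a content sentence that is close to the title receives both more restart mass and stronger incoming edges from the title node, so its event triple is preferred over that of an off-topic sentence.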
In one possible design, performing entity extraction on the article to be commented to obtain at least one article entity candidate word includes:
performing sentence segmentation on the article to be commented to obtain a plurality of article sentences, wherein the plurality of article sentences comprise an article title sentence and at least one article content sentence;
for the plurality of article sentences, applying a second BERT pre-trained model to map each article sentence into a corresponding sentence semantic vector;
for the plurality of article sentences, applying a TextRank algorithm to score each article sentence based on the sentence semantic vectors, to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm uses the sentence semantic vector of the article title sentence as a central vector;
for the plurality of article sentences, performing entity extraction on each article sentence by combining a dictionary with a deep learning model, and scoring the confidence of the extraction results by a weighted scoring method, to obtain a plurality of sentence entities and corresponding confidence scores;
and for the plurality of sentence entities, sorting in descending order of a weighted score computed from the corresponding confidence scores and the sentence importance index values of the article sentences, and taking at least one top-ranked sentence entity as the at least one article entity candidate word.
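The final ranking step can be illustrated with a short sketch. The mixing weight `alpha`, the `top_k` cut-off and the rule of keeping the best score when an entity occurs in several sentences are assumptions made here; the text only states that confidence scores and sentence importance index values are combined into a weighted score and sorted in descending order.

```python
def rank_entity_candidates(entities, alpha=0.6, top_k=3):
    """Rank extracted entities and return the top-k candidate words.

    `entities` is a list of (word, confidence, sentence_importance)
    tuples; `alpha` is a hypothetical weight on the confidence score."""
    best = {}
    for word, confidence, importance in entities:
        score = alpha * confidence + (1 - alpha) * importance
        # An entity seen in several sentences keeps its best score.
        best[word] = max(score, best.get(word, 0.0))
    return sorted(best, key=best.get, reverse=True)[:top_k]
```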
In one possible design, after determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, the method further includes:
determining the comment content corresponding to the maximum recommendation index value as target comment content, and determining the news corresponding to the maximum recommendation index value as target news;
performing entity extraction on the target comment content to obtain at least one new comment entity candidate word;
converting the target comment content into new template comment content whose slots to be filled are in one-to-one correspondence with the at least one new comment entity candidate word;
obtaining a topic and/or news event triple of the target news;
binding and storing the new template comment content and the topic of the target news and/or the news event triple in a new comment template;
and adding the new comment template into the comment template library.
Based on this possible design, comment templates can be generated from high-quality comments by entity extraction, while deeper-dimension information about the news is mined by topic fusion and matching and by event discovery, so that a multi-dimensional, high-accuracy comment template library is constructed.
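Turning a high-quality comment into a stored template can be sketched as follows. The `{e1}`-style slot names, the record layout and both function names are assumptions for illustration; the entity list is assumed to come from a separate entity extraction step.

```python
def comment_to_template(comment, entities):
    """Replace each extracted entity with a numbered slot.

    Returns the template text and a slot -> entity mapping. Longer
    entities are replaced first so that a short entity contained in a
    longer one does not break it up."""
    template, slots = comment, {}
    ordered = sorted(set(entities), key=lambda e: (-len(e), e))
    for i, entity in enumerate(ordered, start=1):
        if entity in template:
            name = f"e{i}"
            template = template.replace(entity, "{" + name + "}")
            slots[name] = entity
    return template, slots


def make_template_record(comment, entities, topic=None, event_triple=None):
    # Bind the template content to the topic and/or news event triple
    # of the target news (hypothetical record layout).
    template, slots = comment_to_template(comment, entities)
    return {"template": template, "slots": slots,
            "topic": topic, "event_triple": event_triple}
```

Because the slots use Python format-field syntax, `template.format(**slots)` reproduces the original comment, which gives a simple sanity check before a record is added to the library.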
In one possible design, obtaining at least one piece of news and comment information through web crawling according to the title of the article to be commented includes:
performing quantitative analysis on comment quality dimensions, comment quantity dimensions and comment interaction dimensions aiming at different news sources in the whole network news sources to obtain weight coefficients of the different news sources;
for each news source among the whole-network news sources, dynamically allocating crawling resources based on the weight coefficient of the news source, and obtaining at least one piece of original news and comment information from the news source by a dynamic real-time crawling algorithm;
for all the original news and comment information obtained by crawling, applying a first BERT pre-trained model to judge the sentiment polarity of each piece of news and comment information, and then filtering out information corresponding to a negative judgment result to obtain at least one piece of non-negative news and comment information;
for the at least one piece of non-negative news and comment information, applying a sensitive information detection algorithm based on a dictionary, pinyin, variant characters and/or a deep learning model to detect sensitive words in each piece of news and comment information, and then filtering out information containing sensitive words to obtain at least one piece of insensitive news and comment information;
for the at least one piece of insensitive news and comment information, applying a false information discrimination algorithm based on rules, a knowledge graph and/or a deep learning model to discriminate false information in each piece of news and comment information, and then filtering out information corresponding to a false discrimination result to obtain at least one piece of compliant news and comment information;
and for the at least one piece of compliant news and comment information, sequentially performing dirty data cleaning processing and duplicate removal processing for the same news source to obtain the at least one piece of news and comment information.
Based on this possible design, quantitatively analyzing the whole-network news sources while crawling whole-network comment data ensures the reliability of the subsequent recommendation index values and facilitates efficient allocation of crawler resources; meanwhile, auditing all news and comment information for bad information filters out negative, sensitive and false information, which greatly helps ensure the high quality of the automatic comment content.
In a second aspect, the invention provides an automatic comment device, which comprises a full-network information crawling module, an article semantic coding module, a news semantic coding module, an article matching detection module, a recommendation index calculation module and an automatic comment content determination module;
the full-network information crawling module is used for crawling at least one piece of news and comment information from the whole network according to the title of an article to be commented, wherein the news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and a comment like number and a comment reply number corresponding to the comment content;
the article semantic coding module is in communication connection with the full-network information crawling module and is used for performing semantic coding on the article to be commented on a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
the news semantic coding module is used for semantic coding the news in each piece of news and comment information on the text dimension, the picture dimension and the video dimension aiming at the at least one piece of news and comment information to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
the article matching detection module is communicatively connected to the article semantic coding module and the news semantic coding module respectively, and is used for importing, for the at least one piece of news and comment information, the news text semantic vector, the news picture semantic vector and the news video semantic vector of all news, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, so as to obtain a first matching detection result of each piece of news and the article to be commented;
the recommendation index calculation module is communicatively connected to the article matching detection module and is used for, when the first matching detection result includes at least one piece of matched news that matches the article to be commented, performing, for each comment content of the at least one piece of matched news, a weighted calculation on the corresponding comment reply number, comment like number and news source weight coefficient to obtain a corresponding recommendation index value;
the automatic comment content determining module is in communication connection with the recommendation index calculating module and is used for determining comment content corresponding to a maximum recommendation index value as automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum value of at least one recommendation index value.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a transceiver communicatively connected in sequence, wherein the memory is used for storing a computer program, the transceiver is used for transmitting and receiving information, and the processor is used for reading the computer program and executing the automatic comment method according to the first aspect or any possible design.
In a fourth aspect, the present invention provides a storage medium having stored thereon instructions which, when run on a computer, perform the automatic comment method as described in the first aspect or any of the possible designs above.
In a fifth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the automatic comment method of the first aspect or any possible design above.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flow chart of the automatic comment method provided by the present invention.
Fig. 2 is a schematic structural diagram of an automatic comment device provided by the present invention.
Fig. 3 is a schematic structural diagram of a computer device provided by the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. Specific structural and functional details disclosed herein are merely representative of exemplary embodiments of the invention. This invention may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments of the present invention.
It should be understood that the term "and/or", where it appears herein, merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, B exists alone, or A and B exist at the same time. The term "/and", where it appears herein, describes another association and indicates that two relationships may exist; for example, A/and B may mean: A exists alone, or A and B exist at the same time. In addition, the character "/", where it appears herein, generally indicates that the associated objects before and after it are in an "or" relationship.
It will be understood that when an element is referred to herein as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Conversely, if an element is referred to herein as being "directly connected" or "directly coupled" to another element, no intervening elements are present. Other words used to describe the relationship between elements should be interpreted in a similar manner (e.g., "between" versus "directly between", "adjacent" versus "directly adjacent", etc.).
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of exemplary embodiments of the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used herein, specify the presence of stated features, quantities, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, quantities, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative designs, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
It should be understood that specific details are provided in the following description to facilitate a thorough understanding of example embodiments. However, it will be understood by those of ordinary skill in the art that the example embodiments may be practiced without these specific details. For example, systems may be shown in block diagrams in order not to obscure the examples in unnecessary detail. In other instances, well-known processes, structures and techniques may be shown without unnecessary detail in order to avoid obscuring the example embodiments.
As shown in Fig. 1, the automatic comment method provided in the first aspect of this embodiment may be, but is not limited to being, executed by a computer device with certain computing resources, for example, by a platform server for publishing news/articles, or by a terminal device for reading news/articles. The automatic comment method may include, but is not limited to, the following steps S101 to S106.
S101, crawling at least one piece of news and comment information from the whole network according to the title of an article to be commented, wherein the news and comment information may include, but is not limited to, one piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and a comment like number and a comment reply number corresponding to the comment content.
In step S101, preferably, crawling at least one piece of news and comment information from the whole network according to the title of the article to be commented includes, but is not limited to, the following steps S1011 to S1016.
S1011, performing quantitative analysis in a comment quality dimension, a comment quantity dimension and a comment interaction dimension for different news sources among the whole-network news sources, to obtain weight coefficients of the different news sources.
In step S1011, the whole-network news sources are the news portal sites across the whole network. When the portal sites are surveyed, each news source may be scored and weighted in the comment quality dimension, the comment quantity dimension and the comment interaction dimension, and the weight coefficients of the different news sources are then obtained through conventional quantitative analysis.
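One simple form of such a quantitative analysis is a weighted sum of the three dimension scores, normalized over all sources. The sketch below assumes each dimension has already been scored on a 0 to 1 scale; the dimension weights `dim_weights` and the function name are hypothetical and not taken from the text.

```python
def source_weights(scores, dim_weights=(0.5, 0.2, 0.3)):
    """Compute a weight coefficient per news source.

    `scores` maps a source name to (quality, quantity, interaction)
    scores, assumed to lie in [0, 1]. The returned weights sum to 1."""
    raw = {src: sum(w * x for w, x in zip(dim_weights, dims))
           for src, dims in scores.items()}
    total = sum(raw.values())
    if total == 0:
        # No source has any score: fall back to equal weights.
        return {src: 1.0 / len(raw) for src in raw}
    return {src: value / total for src, value in raw.items()}
```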
S1012, for each news source among the whole-network news sources, dynamically allocating crawling resources based on the weight coefficient of the news source, and obtaining at least one piece of original news and comment information from the news source by a dynamic real-time crawling algorithm.
In step S1012, the original news and comment information may include, but is not limited to, a piece of news having the same title as the article to be commented, at least one piece of comment content belonging to the news, and a comment like number and a comment reply number corresponding to the comment content.
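The allocation of crawling resources by weight coefficient is not specified in detail. As one possible reading, the sketch below splits a fixed request budget for one crawling round across sources in proportion to their weights, using the largest-remainder method so that the integer shares add up exactly; re-running it each round with updated weights would make the allocation dynamic. The function name and the notion of a per-round request budget are assumptions.

```python
def allocate_crawl_budget(weights, total_requests):
    """Split `total_requests` across sources in proportion to `weights`."""
    total_w = sum(weights.values())
    if total_w <= 0 or total_requests <= 0:
        return {src: 0 for src in weights}
    exact = {src: total_requests * w / total_w for src, w in weights.items()}
    alloc = {src: int(x) for src, x in exact.items()}  # floor of each share
    leftover = total_requests - sum(alloc.values())
    # Hand the remaining requests to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for src in by_remainder[:leftover]:
        alloc[src] += 1
    return alloc
```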
S1013, applying a first bert pre-training model to respectively judge the emotional polarity of each piece of news and comment information according to all the original news and comment information obtained through crawling, and then filtering information corresponding to a negative judgment result to obtain at least one piece of non-negative news and comment information.
In the step S1013, bert stands for Bidirectional Encoder Representations from Transformers, a pre-training model proposed by Google in 2018, that is, the encoder of a bidirectional Transformer (a decoder is not used because it cannot obtain the information to be predicted). The main innovation of the model lies in its pre-training method, in which Masked LM and Next Sentence Prediction are used to capture word-level and sentence-level representations respectively. The specific manner of filtering out the information corresponding to the negative determination result may be, but is not limited to: if a certain news corresponds to a negative judgment result, filtering out the whole news and comment information; if a certain comment content corresponds to a negative judgment result, filtering out the comment content, and the comment praise number and the comment reply number corresponding to the comment content; if all the comment contents of a certain news correspond to negative judgment results, filtering out the whole news and comment information.
S1014, aiming at the at least one piece of non-negative news and comment information, sensitive word detection is carried out on each piece of news and comment information respectively by applying a sensitive information detection algorithm based on a dictionary, a pinyin, a special-shaped character and/or a deep learning model and the like, and then information containing sensitive words is filtered out to obtain at least one piece of insensitive news and comment information.
In the step S1014, the sensitive word may be, but is not limited to, a pornographic word, a violent word, or an advertisement word. The specific way of filtering out the information containing the sensitive words may be, but is not limited to: if a certain news contains sensitive words, filtering out the whole news and comment information; if some comment content contains sensitive words, filtering out the comment content, and the comment praise number and comment reply number corresponding to the comment content; if all the comment contents of a certain news contain sensitive words, filtering out the whole news and comment information.
S1015, aiming at the at least one piece of insensitive news and comment information, applying a false information discrimination algorithm based on rules, knowledge graphs and/or deep learning models to discriminate false information of each piece of news and comment information respectively, and then filtering out information corresponding to a false discrimination result to obtain at least one piece of compliant news and comment information.
In step S1015, the specific manner of filtering out the information corresponding to the false determination result may be, but is not limited to: if a certain news corresponds to a false judgment result, filtering out the whole news and comment information; if a certain comment content corresponds to a false judgment result, filtering the comment content, and the comment praise number and the comment reply number corresponding to the comment content; if all the comment contents of a certain news correspond to the false judgment results, the whole news and comment information are filtered.
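Steps S1013 to S1015 share the same three-part filtering rule and differ only in the detector. The sketch below captures that rule with a pluggable predicate; the dictionary layout and the keyword-based detector are assumptions standing in for the bert classifier, the sensitive word detector and the false information discriminator.

```python
def filter_news_item(item, is_bad):
    """Apply the shared filtering rule to one piece of news and comment information.

    `item` is a dict with "news" (str) and "comments" (a list of dicts with
    "content", "likes" and "replies"). `is_bad` is any detector returning
    True for text that must be removed.
    Returns the filtered item, or None when the whole item is dropped.
    """
    # Rule 1: a bad news text removes the whole item.
    if is_bad(item["news"]):
        return None
    # Rule 2: a bad comment is removed together with its like/reply counts.
    kept = [c for c in item["comments"] if not is_bad(c["content"])]
    # Rule 3: if no comment survives, the whole item is removed.
    if not kept:
        return None
    return {"news": item["news"], "comments": kept}


def filter_all(items, is_bad):
    """Filter a list of items and drop those that were removed entirely."""
    filtered = (filter_news_item(item, is_bad) for item in items)
    return [item for item in filtered if item is not None]


def contains_spam(text):
    """Toy detector used only for this example."""
    return "spam" in text.lower()


items = [
    {"news": "Team wins final", "comments": [
        {"content": "Great match", "likes": 3, "replies": 1},
        {"content": "Buy spam pills", "likes": 0, "replies": 0}]},
    {"news": "Spam offer inside", "comments": [
        {"content": "Nice", "likes": 1, "replies": 0}]},
    {"news": "Weather update", "comments": [
        {"content": "spam spam", "likes": 0, "replies": 0}]},
]
compliant = filter_all(items, contains_spam)
```

Chaining three calls to `filter_all` with three different detectors would reproduce the negative, sensitive and false information stages in order.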
S1016, for the at least one piece of compliant news and comment information, sequentially performing dirty data cleaning processing and duplicate removal processing for the same news source to obtain the at least one piece of news and comment information.
S102, semantic coding is carried out on the article to be commented on the text dimension, the picture dimension and the video dimension, and an article text semantic vector, an article picture semantic vector and an article video semantic vector are obtained.
In step S102, specifically, a second bert pre-training model may be applied to map at least one text in the article to be commented to the article text semantic vector, where the text includes a title, a body, and/or a summary; a first ResNet101 pre-training model (namely a residual network, ResNet, with 101 layers) may be applied to map at least one picture in the article to be commented into the article picture semantic vector; and a video clustering algorithm may be applied to extract at least one video key frame from the article to be commented, after which a second ResNet101 pre-training model is applied to map the at least one video key frame into the article video semantic vector.
S103, semantic coding is respectively carried out on news in each piece of news and comment information on the text dimension, the picture dimension and the video dimension aiming at the at least one piece of news and comment information, and a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news are obtained.
In step S103, the manner of semantic coding for the news is consistent with that for the article to be commented, and is not described herein again.
S104, aiming at the at least one piece of news and comment information, importing the news text semantic vector, the news picture semantic vector and the news video semantic vector of all news and the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented into a first deep learning matching detection model constructed based on a text similar dimension, a picture similar dimension, a video similar dimension and a full connection layer, and obtaining a first matching detection result of each news and the article to be commented.
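The structure of the matching detection model in step S104 (one similarity per modality feeding a fully connected layer) can be sketched as follows. This is only a toy: the cosine similarity, the layer weights, the bias and the 0.5 decision threshold are hand-picked assumptions, whereas the embodiment would learn such parameters from data.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_probability(article, news, weights=(2.0, 1.0, 1.0), bias=-2.0):
    """Score one article/news pair.

    `article` and `news` are dicts holding "text", "picture" and "video"
    semantic vectors. The three similarities are combined by a single
    fully connected unit followed by a sigmoid.
    """
    sims = [cosine(article[m], news[m]) for m in ("text", "picture", "video")]
    logit = bias + sum(w * s for w, s in zip(weights, sims))
    return 1.0 / (1.0 + math.exp(-logit))


article = {"text": (1.0, 0.0), "picture": (0.0, 1.0), "video": (1.0, 1.0)}
same_news = {"text": (1.0, 0.0), "picture": (0.0, 1.0), "video": (1.0, 1.0)}
other_news = {"text": (0.0, 1.0), "picture": (1.0, 0.0), "video": (1.0, -1.0)}

is_match = match_probability(article, same_news) > 0.5
is_other_match = match_probability(article, other_news) > 0.5
```

A missing modality (for example, an article without video) can be represented by a zero vector, in which case that similarity contributes nothing to the score.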
S105, if the first matching detection result contains at least one piece of matching news matched with the article to be commented, performing, for each piece of comment content of the at least one piece of matching news, a weighted calculation on the corresponding comment reply number, comment praise number and news source weight coefficient to obtain a corresponding recommendation index value.
S106, determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum value of at least one recommendation index value.
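Steps S105 and S106 can be sketched as a weighted score followed by a maximum search. The exact weighting formula is not given in the embodiment; the logarithmic damping of the counts, the equal weights and the multiplication by the source weight coefficient below are assumptions chosen for illustration.

```python
import math


def recommendation_index(replies, likes, source_weight, w_reply=0.5, w_like=0.5):
    """Weighted combination of reply count, praise count and source weight."""
    # log1p damps very large counts so one viral comment does not dominate.
    engagement = w_reply * math.log1p(replies) + w_like * math.log1p(likes)
    return source_weight * engagement


def best_comment(matching_news):
    """Return (comment content, index value) with the maximum index value.

    `matching_news` is a list of dicts with "source_weight" and "comments",
    where each comment has "content", "replies" and "likes".
    """
    best, best_score = None, float("-inf")
    for news in matching_news:
        for comment in news["comments"]:
            score = recommendation_index(
                comment["replies"], comment["likes"], news["source_weight"]
            )
            if score > best_score:
                best, best_score = comment["content"], score
    return best, best_score


matching_news = [
    {"source_weight": 0.6, "comments": [
        {"content": "Great analysis", "replies": 10, "likes": 200},
        {"content": "Agreed", "replies": 1, "likes": 5}]},
    {"source_weight": 0.4, "comments": [
        {"content": "Well written", "replies": 50, "likes": 300}]},
]
auto_comment, index_value = best_comment(matching_news)
```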
Therefore, based on the automatic comment method described in detail in the foregoing steps S101 to S106, an automatic comment scheme based on comment data on the whole network is provided, that is, from the public data on the whole network, publicly available comment data is obtained by using a real-time capture technology, and then, in combination with a deep semantic matching technology, the best comment content is directly found from the comment data for automatic comment, so that not only can the quality of output comments be controlled, but also manual participation is greatly reduced, and high reliability is achieved, and the topical demand of news comments needing to be added in real scenes can be well met. In addition, in the process of capturing the comment data of the whole network, the reliability of subsequent recommendation index values can be guaranteed and efficient distribution of crawler resources is facilitated by performing quantitative analysis on news sources of the whole network, and meanwhile, negative, sensitive and false information can be filtered by performing bad information audit on all news and comment information, so that the high quality of the automatic comment content is greatly guaranteed.
On the basis of the technical solution of the first aspect, the present embodiment further specifically provides a possible design for automatically commenting based on a comment template library, that is, after a first matching detection result of each piece of news and the article to be commented is obtained, if all obtained recommendation index values are smaller than a preset index threshold value or the first matching detection result does not include at least one matching news matching the article to be commented, the method further includes, but is not limited to, the following steps S201 to S206.
S201, obtaining a triple of topics and/or news events of the article to be commented.
In the step S201, a specific manner of acquiring the topic of the article to be commented may be, but is not limited to, including the following steps S301 to S306.
S301, at least one topic and news information of a whole network news source are obtained in a real-time crawling mode based on a web crawler technology, wherein the topic and news information comprises one topic and at least one topic news belonging to the topic.
S302, performing bad audit processing including negative information filtering processing, sensitive information filtering processing and false information filtering processing on all the obtained topics and news information to obtain at least one piece of compliant topic and news information.
In step S302, the specific manner of performing the bad review processing on the topic and the news information can refer to the aforementioned steps S1013 to S1015, which are not described herein again.
S303, carrying out semantic coding on each topic and news information in a text dimension, a picture dimension and a video dimension aiming at the at least one piece of compliant topic and news information respectively to obtain topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of each topic and news information, and then carrying out de-duplication processing on the at least one piece of compliant topic and news information in a text similar dimension, a picture similar dimension and a video similar dimension according to the topic text semantic vectors, the topic picture semantic vectors and the topic video semantic vectors of all topic and news information to obtain at least one piece of non-duplicated topic and news information.
In the step S303, the specific manner of semantic coding for the topic and news information is consistent with that for the article to be commented, and is not described herein again.
S304, performing topic fusion processing on the at least one piece of non-repeated topic and news information to obtain at least one hot topic.
In the step S304, the specific topic fusion method may include, but is not limited to, the following steps S3041 to S3044.
S3041, aiming at the at least one piece of non-repeated topic and news information, a doc2vec paragraph vector method is applied (an unsupervised algorithm that learns fixed-length feature representations from variable-length text; it is trained to predict words in a document, so that each document is represented by a single dense vector) to respectively extract semantic vectors of the at least one piece of topic news belonging to the topic in each piece of topic and news information, and then average weighting processing is carried out on all the extracted semantic vectors of each piece of topic and news information respectively, so that a topic mapping vector is obtained.
S3042, for the at least one piece of non-repeated topic and news information, applying a plda topic model (Partially Labeled Dirichlet Allocation, a model proposed by Ramage et al.) to extract topic information of the at least one piece of topic news belonging to the topic from each piece of topic and news information, and then performing average weighting processing on all the extracted topic information of each piece of topic and news information to obtain a topic distribution.
S3043, according to the topic mapping vectors, the topic distribution, the topic text semantic vectors, the topic picture semantic vectors and the topic video semantic vectors of all topics and news information, applying a dbscan Clustering algorithm (Density-Based Spatial Clustering of Applications with Noise, which is a typical Density Clustering algorithm) to perform Density Clustering analysis on the at least one non-repeated topic and news information to obtain the at least one fused topic, the at least one topic heat weight value and the one-to-one correspondence relationship between the at least one fused topic and the at least one topic heat weight value.
S3044, aiming at the at least one fused topic, sorting the at least one fused topic from big to small according to the corresponding topic heat weight value, and taking the fused topic sorted in the front as the hot topic.
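Steps S3043 and S3044 can be illustrated with a compact density clustering followed by a ranking. In this sketch each topic is assumed to be already reduced to one concatenated feature vector, and the heat weight of a fused topic is assumed to be the number of topic items merged into it; the embodiment does not state how the heat weight is computed, so that choice is hypothetical.

```python
import math
from collections import Counter


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one cluster label per point, -1 marks noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:
                queue.extend(more)  # j is a core point: keep expanding
    return labels


def hot_topics(points, eps=0.5, min_pts=2, top_k=1):
    """Cluster topic vectors and return the top_k clusters by heat."""
    labels = dbscan(points, eps, min_pts)
    heat = Counter(label for label in labels if label != -1)
    return [label for label, _ in heat.most_common(top_k)], labels


topic_vectors = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (9, 9)]
top, labels = hot_topics(topic_vectors)
```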
S305, aiming at the at least one hot topic, importing the topic text semantic vector, the topic picture semantic vector and the topic video semantic vector of all the hot topics, and the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented into a second deep learning matching detection model constructed based on a text similar dimension, a picture similar dimension, a video similar dimension and a full connection layer, and obtaining a second matching detection result of each hot topic and the article to be commented.
S306, taking the hot topic which is in the second matching detection result and is matched with the article to be commented as the topic of the article to be commented.
In the step S201, a specific manner of obtaining the news event triple of the article to be reviewed may include, but is not limited to, the following steps S401 to S405.
S401, sentence dividing processing is carried out on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise an article title sentence and at least one article content sentence.
S402, aiming at the article sentences, applying a DMCNN event extraction algorithm to respectively extract open domain events of the article sentences to obtain sentence event triples of the article sentences.
In step S402, the DMCNN Event Extraction algorithm is an Event Extraction method based on a Dynamic Pooling (Dynamic Pooling) Convolutional Neural network model, which is an Event Extraction scheme in a pipeline manner, that is, two tasks of detecting and identifying trigger words and detecting and identifying arguments are performed separately, and the latter depends on the prediction result of the former. The execution task of the DMCNN event extraction algorithm is divided into a trigger word identification subtask and an argument identification subtask. In the argument recognition subtask, a third bert pre-training model after fine tuning can be adopted to carry out semantic coding on the article sentence to obtain an initial value of a sentence semantic vector, and an attention layer is added, so that long-distance dependence can be modeled, and the improvement is particularly obvious for the condition that the article sentence contains a plurality of events.
S403, aiming at the article sentences, a second bert pre-training model is applied to map the article sentences into corresponding sentence semantic vectors respectively.
S404, aiming at the article sentences, modeling and scoring each article sentence based on the sentence semantic vectors by applying a TextRank algorithm to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm adopts the sentence semantic vector of the article title sentence as the central vector.
In step S404, the TextRank algorithm is a graph-based ranking algorithm for the text, and is to divide the text into a plurality of constituent units (sentences), construct a node-connected graph, calculate the TextRank value of the sentence by loop iteration using the similarity between the sentences as the weight of the edge, and finally extract the high-ranked sentences to combine into the text abstract.
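The iteration described above can be sketched as a weighted PageRank over sentence vectors. The damping factor, the iteration count and the use of non-negative cosine similarity as the edge weight are conventional choices assumed here; the use of the title sentence vector as a central vector is not modelled in this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(vectors, damping=0.85, iterations=50):
    """Return one TextRank score per sentence vector."""
    n = len(vectors)
    if n == 0:
        return []
    # Edge weight = non-negative similarity between two distinct sentences.
    w = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion
        # to the share of the neighbour's total edge weight it holds.
        scores = [
            (1 - damping) / n
            + damping * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


sentence_vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
scores = textrank(sentence_vectors)
```

In this toy input the second sentence is similar to both of the others, so it ends up with the highest score, while the third sentence is only weakly connected and scores lowest.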
S405, aiming at the multiple sentence event triples, carrying out weighted sequencing according to sentence importance index values of corresponding article sentences, and taking the sentence event triplet with the highest sequencing as a news event triplet of the article to be commented on.
In step S405, since the article sentences include one article title sentence and at least one article content sentence, and the article title sentence is necessarily more important than other article content sentences, the weighting factor corresponding to the article title sentence is greater than that of other article content sentences in the weighted ordering.
S202, finding at least one existing comment template similar to the article to be commented on in topic similar dimension and/or news event triple similar dimension from a comment template library according to the topic and/or news event triple of the article to be commented on, wherein the comment template library stores a plurality of existing comment templates, the existing comment templates comprise template comment contents and topic and/or news event triples bound with the template comment contents, and the template comment contents comprise at least one slot to be filled, wherein the slot to be filled corresponds to at least one comment entity candidate word one by one;
S203, carrying out entity extraction on the article to be commented to obtain at least one article entity candidate word.
In step S203, the specific entity extraction method may include, but is not limited to, the following steps S2031 to S2035.
S2031, sentence dividing processing is carried out on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise an article title sentence and at least one article content sentence.
S2032, aiming at the article sentences, a second bert pre-training model is applied to map the article sentences into corresponding sentence semantic vectors respectively.
S2033, for the article sentences, modeling and scoring are carried out on each article sentence based on the sentence semantic vector by using a TextRank algorithm to obtain a sentence important index value of each article sentence, wherein the TextRank algorithm adopts the sentence semantic vector of the article title sentence as a central vector.
S2034, aiming at the article sentences, entity extraction is carried out on each article sentence by combining a dictionary and a deep learning model, confidence scores are carried out on the extraction results by adopting a weighted score method, and a plurality of sentence entities and corresponding confidence scores are obtained.
In step S2034, fast matching against an industry dictionary may be built on a finite state machine, and a deep learning extraction model may be trained on a bidirectional LSTM + CRF structure (an existing model structure for entity recognition; under the LSTM + CRF model, the output labels are no longer independent of each other but form an optimal label sequence) to perform entity extraction.
S2035, for the sentence entities, sorting in descending order of a weighted score computed from the corresponding confidence score and the sentence importance index value of the article sentence to which each sentence entity belongs, and taking at least one top-ranked sentence entity as the at least one article entity candidate word.
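A trie is one simple finite state machine that supports fast dictionary matching of the kind mentioned for step S2034. The sketch below performs a longest-match scan; the function names, the "$end" marker and the sample dictionary are assumptions for illustration, and the bidirectional LSTM + CRF model is not covered.

```python
def build_trie(words):
    """Build a character trie; the "$end" key marks a complete entry."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = word
    return root


def match_entities(text, trie):
    """Scan left to right and return (start index, entity) pairs.

    At every position the longest dictionary entry is preferred, and the
    scan resumes after the matched entry so that matches do not overlap.
    """
    found = []
    i = 0
    while i < len(text):
        node, longest = trie, None
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break  # no dictionary entry continues with this character
            if "$end" in node:
                longest = node["$end"]
        if longest:
            found.append((i, longest))
            i += len(longest)
        else:
            i += 1
    return found


trie = build_trie(["New York", "New York Times", "York"])
entities = match_entities("The New York Times reported", trie)
```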
In step S2035, since the article sentences include one article title sentence and at least one article content sentence, and the article title sentence is necessarily more important than other article content sentences, the weighting factor corresponding to the article title sentence is greater than that of other article content sentences in the weighted ranking.
S204, aiming at the at least one existing comment template, pairwise matching slot filling based on the at least one article entity candidate word and the at least one comment entity candidate word is respectively carried out on template comment contents in each existing comment template, and corresponding new comment contents are obtained.
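A minimal sketch of the slot filling in step S204 follows. It assumes that every slot and every article entity candidate word carries an entity type label and that a slot is filled by the first unused candidate of the same type; the embodiment only speaks of pairwise matching, so this matching criterion and the data layout are hypothetical.

```python
def fill_template(template, article_entities):
    """Fill every slot of a template comment, or return None if one cannot be filled.

    `template` has "content" with placeholders such as "{slot0}" and a
    "slots" list of dicts with "name" and "type". `article_entities` is a
    list of dicts with "word" and "type", best candidates first.
    """
    content = template["content"]
    unused = list(article_entities)
    for slot in template["slots"]:
        # Pair the slot with the best remaining candidate of the same type.
        match = next((e for e in unused if e["type"] == slot["type"]), None)
        if match is None:
            return None  # the template does not fit this article
        unused.remove(match)
        content = content.replace("{" + slot["name"] + "}", match["word"])
    return content


template = {
    "content": "{slot0} was outstanding for {slot1} tonight",
    "slots": [{"name": "slot0", "type": "PERSON"},
              {"name": "slot1", "type": "TEAM"}],
}
article_entities = [
    {"word": "Alice Chen", "type": "PERSON"},
    {"word": "Rivertown FC", "type": "TEAM"},
]
new_comment = fill_template(template, article_entities)
```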
S205, aiming at the at least one piece of new comment content, a DNN language model (an existing model that calculates the probability of a sentence formed from given words, and thereby judges whether the sentence conforms to objective language expression habits) is applied to score each piece of new comment content, and the score value of each piece of new comment content is obtained.
In the step S205, considering that the new review content may also have a problem of non-compliance, before scoring, at least one new review content may be subjected to a bad review process including a negative information filtering process, a sensitive information filtering process, and a false information filtering process, so as to obtain at least one piece of compliant new review content, where the specific manner of performing the bad review process on the new review content may refer to the foregoing steps S1013 to S1015, and details are not repeated here.
S206, determining the new comment content corresponding to the highest score value as the automatic comment content of the article to be commented, wherein the highest score value is the highest value in the score values of all the new comment contents.
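The scoring and selection in steps S205 and S206 can be illustrated with a much simpler stand-in for the DNN language model. The sketch below uses an add-one smoothed bigram model trained on a tiny made-up corpus; it is not a neural model and only shows the idea of ranking candidate comments by sentence probability.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def avg_log_prob(sentence, unigrams, bigrams):
    """Average add-one smoothed bigram log probability of a sentence."""
    tokens = ["<s>"] + sentence.lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return float("-inf")
    vocab = len(unigrams) + 1  # one extra slot for unseen words
    total = sum(
        math.log((bigrams[pair] + 1) / (unigrams[pair[0]] + vocab))
        for pair in pairs
    )
    return total / len(pairs)


unigrams, bigrams = train_bigram(
    ["the team played well", "the team won the match"]
)
candidates = ["well the played team", "the team played well"]
best = max(candidates, key=lambda c: avg_log_prob(c, unigrams, bigrams))
```

Averaging over the number of bigrams keeps long and short candidates comparable, which matters when templates of different lengths compete.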
Therefore, based on the first possible design described in detail in the foregoing steps S201 to S206, when there is no matching news or no recommendation index value reaches the threshold, a deep matching search can be performed in the comment template library along the topic and event triple dimensions. This greatly expands the reusability of comments, so that more news without public comment data can be filled with comments, and high availability, high reusability and high topicality of automatic news comments can be achieved. In addition, based on topics and the news belonging to them, deep fusion clustering of topics can be performed from a multi-modal perspective covering text, pictures and video by combining a topic model with semantic vectors, and multi-modal matching between news and topics and between the article to be commented and topics can be achieved by deep learning. Furthermore, the event extraction task can be jointly trained on a deep learning model, which reduces the error propagation caused by the multi-stage tasks of the traditional method, and the modeling of long-distance dependence is improved by adding a pre-training model and an attention layer.
On the basis of the technical solution of the first possible design, the present embodiment further specifically provides a second possible design of an automatic rich comment template library, that is, after the comment content corresponding to the maximum recommendation index value is determined as the automatic comment content of the article to be commented, the method further includes, but is not limited to, the following steps S501 to S506.
S501, determining the comment content corresponding to the maximum recommendation index value as target comment content, and determining the news corresponding to the maximum recommendation index value as target news.
S502, entity extraction is carried out on the target comment content, and at least one new comment entity candidate word is obtained.
In the step S502, the specific manner of performing entity extraction may refer to the foregoing steps S2031 to S2035, which are not described herein again.
S503, converting the target comment content into a new template comment content in which the slot to be filled and the at least one new comment entity candidate word are in one-to-one correspondence.
S504, obtaining the topic and/or news event triples of the target news.
In the step S504, a specific obtaining manner may refer to the step S201, which is consistent with a manner of obtaining the topic and/or news event triples of the article to be commented, and is not described herein again.
S505, binding and storing the new template comment content and the topic and/or news event triple of the target news in a new comment template.
S506, adding the new comment template into the comment template library.
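Steps S502 to S506 can be sketched as replacing extracted entities with named slots and storing the result with its bound topic and event triple. The data layout, the slot naming scheme and the sample values are assumptions; overlapping entity words are not handled in this simplified version.

```python
def make_comment_template(comment, entities, topic=None, event_triple=None):
    """Turn a high-quality comment into a reusable comment template.

    `entities` is a list of dicts with "word" and "type" extracted from
    the comment. Each entity found in the comment becomes one slot.
    """
    content, slots = comment, []
    for index, entity in enumerate(entities):
        if entity["word"] not in content:
            continue  # entity was not found verbatim; skip it
        name = f"slot{index}"
        content = content.replace(entity["word"], "{" + name + "}")
        slots.append(
            {"name": name, "type": entity["type"], "original": entity["word"]}
        )
    return {
        "content": content,
        "slots": slots,
        "topic": topic,
        "event_triple": event_triple,
    }


template_library = []
new_template = make_comment_template(
    "Alice Chen was outstanding for Rivertown FC tonight",
    [{"word": "Alice Chen", "type": "PERSON"},
     {"word": "Rivertown FC", "type": "TEAM"}],
    topic="cup final",
    event_triple=("Rivertown FC", "won", "cup final"),
)
template_library.append(new_template)
```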
Therefore, based on the second possible design detailed in the foregoing steps S501 to S506, a comment template can be generated by entity extraction for high-quality comments, and meanwhile, information of deeper dimensions of news is mined by using topic fusion and matching technology and event discovery technology, so that a multidimensional and highly accurate comment template library is constructed.
As shown in fig. 2, a second aspect of this embodiment provides a virtual device for implementing the automatic comment method in the first aspect, the first possible design or the second possible design, where the virtual device includes a full-web information crawling module, an article semantic coding module, a news semantic coding module, an article matching detection module, a recommendation index calculation module, and an automatic comment content determination module;
the full-network information crawling module is used for crawling the whole network to obtain at least one piece of news and comment information according to the title of an article to be commented, wherein the news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and the comment praise number and comment reply number corresponding to the comment content;
the article semantic coding module is in communication connection with the full-network information crawling module and is used for performing semantic coding on the article to be commented on a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
the news semantic coding module is used for performing semantic coding on the news in each piece of news and comment information in a text dimension, a picture dimension and a video dimension aiming at the at least one piece of news and comment information respectively to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
the article matching detection module is respectively in communication connection with the article semantic coding module and the news semantic coding module, and is used for importing, for the at least one piece of news and comment information, the news text semantic vector, the news picture semantic vector and the news video semantic vector of all news, and the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented into a first deep learning matching detection model constructed based on a text similar dimension, a picture similar dimension, a video similar dimension and a full connection layer, so as to obtain a first matching detection result of each piece of news and the article to be commented;
the recommendation index calculation module is in communication connection with the article matching detection module and is used for, when the first matching detection result includes at least one piece of matching news matched with the article to be commented, performing, for each piece of comment content of the at least one piece of matching news, a weighted calculation on the corresponding comment reply number, comment praise number and news source weight coefficient to obtain a corresponding recommendation index value;
the automatic comment content determining module is in communication connection with the recommendation index calculating module and is used for determining comment content corresponding to a maximum recommendation index value as automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum value of at least one recommendation index value.
For the working process, the working details, and the technical effects of the foregoing apparatus provided in the second aspect of this embodiment, reference may be made to the automatic comment method described in the first aspect, the first possible design, or the second possible design, which are not described herein again.
As shown in fig. 3, a third aspect of the present embodiment provides a computer device for executing the automatic comment method in the first aspect, the first possible design, or the second possible design, where the computer device includes a memory, a processor, and a transceiver, which are communicatively connected in sequence, where the memory is used for storing a computer program, the transceiver is used for sending and receiving information, and the processor is used for reading the computer program to execute the automatic comment method in the first aspect, the first possible design, or the second possible design. For example, the memory may include, but is not limited to, a Random-Access Memory (RAM), a Read-Only Memory (ROM), a Flash Memory, a First-In First-Out (FIFO) memory, and/or a First-In Last-Out (FILO) memory, and the like; the transceiver may be, but is not limited to, a WiFi (wireless fidelity) wireless transceiver, a Bluetooth wireless transceiver, a GPRS (General Packet Radio Service) wireless transceiver, and/or a ZigBee (a low power consumption local area network protocol based on the IEEE 802.15.4 standard) wireless transceiver, etc.; the processor may be, but is not limited to, a microprocessor of the STM32F105 family. In addition, the computer device may also include, but is not limited to, a power module, a display screen, and other necessary components.
For the working process, the working details, and the technical effects of the foregoing computer device provided in the third aspect of this embodiment, reference may be made to the automatic comment method described in the first aspect, the possible design one, or the possible design two, which are not described herein again.
A fourth aspect of the present embodiment provides a storage medium storing instructions for the automatic comment method in the first aspect, the first possible design or the second possible design, that is, the storage medium stores instructions that, when run on a computer, perform the automatic comment method in the first aspect, the first possible design or the second possible design. The storage medium refers to a carrier for storing data, and may include, but is not limited to, a floppy disk, an optical disk, a hard disk, a flash memory, a flash disk and/or a Memory Stick, etc., and the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
For the working process, the working details, and the technical effects of the foregoing storage medium provided in the fourth aspect of this embodiment, reference may be made to the automatic comment method described in the first aspect, the first possible design, or the second possible design, which is not described herein again.
A fifth aspect of the present embodiment provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the automatic comment method described in the first aspect, the first possible design, or the second possible design. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
The embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: modifications may be made to the embodiments described above, or equivalents may be substituted for some of the features described. And such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Finally, it should be noted that the present invention is not limited to the above alternative embodiments, and that various other forms of products can be obtained by anyone in light of the present invention. The above detailed description should not be taken as limiting the scope of the invention, which is defined in the claims, and which the description is intended to be interpreted accordingly.
Claims (9)
1. An automatic comment method, comprising:
obtaining at least one piece of news and comment information through web crawling according to the title of an article to be commented, wherein the news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and a comment praise number and a comment reply number corresponding to the comment content;
performing semantic coding on the article to be commented in a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
for the at least one piece of news and comment information, performing semantic coding on the news in each piece of news and comment information in the text dimension, the picture dimension and the video dimension, respectively, to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
for the at least one piece of news and comment information, importing the news text semantic vectors, news picture semantic vectors and news video semantic vectors of all the news, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a first matching detection result between each piece of news and the article to be commented;
if the first matching detection result contains at least one piece of matching news that matches the article to be commented, performing, for each comment content of the at least one piece of matching news, a weighted calculation on the corresponding comment reply number, comment praise number and news source weight coefficient to obtain a corresponding recommendation index value;
determining the comment content corresponding to a maximum recommendation index value as the automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum of the at least one recommendation index value;
wherein, after the first matching detection result between each piece of news and the article to be commented is obtained, if all the obtained recommendation index values are smaller than a preset index threshold or the first matching detection result does not contain at least one piece of matching news that matches the article to be commented, the method further comprises:
obtaining a topic and/or a news event triple of the article to be commented;
searching a comment template library, according to the topic and/or news event triple of the article to be commented, for at least one existing comment template similar to the article to be commented in a topic similarity dimension and/or a news event triple similarity dimension, wherein the comment template library stores a plurality of existing comment templates, each existing comment template comprises template comment content and a topic and/or news event triple bound to the template comment content, and the template comment content comprises at least one slot to be filled in one-to-one correspondence with at least one comment entity candidate word;
performing entity extraction on the article to be commented to obtain at least one article entity candidate word;
for the at least one existing comment template, performing pairwise-matching slot filling between the at least one article entity candidate word and the at least one comment entity candidate word of each existing comment template to obtain corresponding new comment content;
for the at least one new comment content, applying a DNN language model to score each new comment content to obtain a score value of each new comment content;
and determining the new comment content corresponding to the highest score value as the automatic comment content of the article to be commented, wherein the highest score value is the highest of the score values of all the new comment contents.
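By way of illustration only, the weighted calculation of the recommendation index value and the threshold check that triggers the template-based branch of claim 1 might be sketched as follows. The weights `W_PRAISE`, `W_REPLY`, `W_SOURCE`, the threshold `INDEX_THRESHOLD` and the `Comment` record are assumptions introduced for this sketch; the claim does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    text: str
    praise: int           # comment praise number
    replies: int          # comment reply number
    source_weight: float  # weight coefficient of the news source


# Hypothetical weights and threshold; the claim leaves them unspecified.
W_PRAISE, W_REPLY, W_SOURCE = 0.5, 0.3, 0.2
INDEX_THRESHOLD = 1.0


def recommendation_index(c: Comment) -> float:
    """Weighted sum of praise number, reply number and source weight."""
    return W_PRAISE * c.praise + W_REPLY * c.replies + W_SOURCE * c.source_weight


def pick_comment(comments: List[Comment]) -> Optional[str]:
    """Return the comment with the maximum index value, or None when there
    is no matching news or every index value is below the threshold, which
    signals that the template-based branch should be used instead."""
    if not comments:
        return None
    best = max(comments, key=recommendation_index)
    if recommendation_index(best) < INDEX_THRESHOLD:
        return None
    return best.text
```

A `None` result corresponds to the condition under which the method continues with the comment template library.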
2. The automatic comment method of claim 1, wherein obtaining the topic of the article to be commented comprises:
crawling at least one piece of topic and news information from news sources across the whole network in real time based on web crawler technology, wherein the topic and news information comprises one topic and at least one piece of topic news belonging to the topic;
performing harmful content review processing, including negative information filtering, sensitive information filtering and false information filtering, on all the crawled topic and news information to obtain at least one piece of compliant topic and news information;
for the at least one piece of compliant topic and news information, performing semantic coding on each piece of topic and news information in a text dimension, a picture dimension and a video dimension, respectively, to obtain a topic text semantic vector, a topic picture semantic vector and a topic video semantic vector of each piece of topic and news information, and then de-duplicating the at least one piece of compliant topic and news information in a text similarity dimension, a picture similarity dimension and a video similarity dimension according to the topic text semantic vectors, topic picture semantic vectors and topic video semantic vectors of all the topic and news information to obtain at least one piece of non-repetitive topic and news information;
for the at least one hot topic, importing the topic text semantic vector, the topic picture semantic vector and the topic video semantic vector, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a second deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a second matching detection result between each hot topic and the article to be commented;
and taking the hot topic that, according to the second matching detection result, matches the article to be commented as the topic of the article to be commented.
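As a rough, non-authoritative sketch of a matching detection model built on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, the three per-dimension cosine similarities can be fed into a single linear unit with a sigmoid output. The dictionary layout, the example weights and the 0.5 decision threshold are assumptions for illustration; a real model would learn its parameters from training data.

```python
import math
from typing import Dict, Sequence

DIMENSIONS = ("text", "picture", "video")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_probability(article: Dict[str, Sequence[float]],
                      candidate: Dict[str, Sequence[float]],
                      weights: Sequence[float],
                      bias: float) -> float:
    """One similarity feature per dimension, then a single fully connected
    unit (weighted sum plus bias) squashed by a sigmoid."""
    feats = [cosine(article[d], candidate[d]) for d in DIMENSIONS]
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_match(prob: float, threshold: float = 0.5) -> bool:
    # Hypothetical decision threshold.
    return prob >= threshold
```

The same structure could serve for either the first model (news versus article) or the second model (hot topic versus article), since both are described with the same inputs.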
3. The automatic comment method of claim 1, wherein obtaining the news event triple of the article to be commented comprises:
performing sentence segmentation on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise an article title sentence and at least one article content sentence;
for the plurality of article sentences, applying a DMCNN event extraction algorithm to perform open-domain event extraction on each article sentence to obtain a sentence event triple of each article sentence;
for the plurality of article sentences, applying a second BERT pre-trained model to map each article sentence to a corresponding sentence semantic vector;
for the plurality of article sentences, applying a TextRank algorithm to model and score each article sentence based on the sentence semantic vectors to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm uses the sentence semantic vector of the article title sentence as a central vector;
and for the plurality of sentence event triples, performing weighted ranking according to the sentence importance index values of the corresponding article sentences, and taking the highest-ranked sentence event triple as the news event triple of the article to be commented.
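One possible reading of "TextRank with the title sentence vector as a central vector" is a PageRank-style power iteration over a sentence similarity graph whose restart distribution is biased toward sentences similar to the title. The sketch below follows that reading; it is an assumption, not the patented implementation, and it takes precomputed sentence vectors as input (the BERT encoding and the DMCNN event extraction are outside its scope).

```python
import math
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def title_biased_textrank(vectors: List[Sequence[float]], title_idx: int = 0,
                          damping: float = 0.85, iters: int = 50) -> List[float]:
    """Score sentences by power iteration on a cosine-similarity graph.
    The restart term favours sentences close to the title vector."""
    n = len(vectors)
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    bias = [max(cosine(v, vectors[title_idx]), 0.0) for v in vectors]
    total = sum(bias)
    bias = [b / total for b in bias] if total else [1.0 / n] * n
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) * bias[i]
            + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def pick_news_event_triple(triples: List[Optional[Triple]],
                           scores: List[float]) -> Optional[Triple]:
    """Return the triple of the most important sentence that has one."""
    candidates = [(scores[i], t) for i, t in enumerate(triples) if t is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```

Here `triples[i]` is the event triple extracted from sentence `i`, or `None` when extraction produced nothing for that sentence.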
4. The automatic comment method of claim 1, wherein performing entity extraction on the article to be commented to obtain at least one article entity candidate word comprises:
performing sentence segmentation on the article to be commented to obtain a plurality of article sentences, wherein the article sentences comprise an article title sentence and at least one article content sentence;
for the plurality of article sentences, applying a second BERT pre-trained model to map each article sentence to a corresponding sentence semantic vector;
for the plurality of article sentences, applying a TextRank algorithm to model and score each article sentence based on the sentence semantic vectors to obtain a sentence importance index value of each article sentence, wherein the TextRank algorithm uses the sentence semantic vector of the article title sentence as a central vector;
for the plurality of article sentences, performing entity extraction on each article sentence by combining a dictionary with a deep learning model, and scoring the confidence of the extraction results using a weighted scoring method, to obtain a plurality of sentence entities and corresponding confidence scores;
and for the plurality of sentence entities, sorting in descending order of a weighted score computed from the corresponding confidence score and the sentence importance index value of the article sentence to which each entity belongs, and taking at least one top-ranked sentence entity as the at least one article entity candidate word.
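The final ranking step of claim 4 can be illustrated with a short sketch. It assumes that the weighted score is a linear mix of the entity confidence score and the importance value of the containing sentence, with a hypothetical mixing factor `alpha`; the claim states only that both quantities are weighted.

```python
from typing import Dict, List, Sequence, Tuple

# (entity text, confidence score, index of the sentence it was found in)
SentenceEntity = Tuple[str, float, int]


def rank_entity_candidates(entities: Sequence[SentenceEntity],
                           sentence_scores: Sequence[float],
                           top_k: int = 3,
                           alpha: float = 0.5) -> List[str]:
    """Combine confidence and sentence importance, keep the best score per
    distinct entity, and return the top_k entities in descending order."""
    best: Dict[str, float] = {}
    for text, confidence, sent_idx in entities:
        score = alpha * confidence + (1 - alpha) * sentence_scores[sent_idx]
        if score > best.get(text, float("-inf")):
            best[text] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

For example, with sentence scores `[0.5, 0.4, 0.1]` an entity found with confidence 0.9 in sentence 0 scores 0.70 and outranks one found with confidence 0.8 in sentence 1, which scores 0.60.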
5. The automatic comment method of claim 1, wherein after determining the comment content corresponding to the maximum recommendation index value as the automatic comment content of the article to be commented, the method further comprises:
determining the comment content corresponding to the maximum recommendation index value as target comment content, and determining the news corresponding to the maximum recommendation index value as target news;
performing entity extraction on the target comment content to obtain at least one new comment entity candidate word;
converting the target comment content into new template comment content in which slots to be filled correspond one-to-one to the at least one new comment entity candidate word;
obtaining a topic and/or a news event triple of the target news;
binding the new template comment content with the topic and/or news event triple of the target news and storing them in a new comment template;
and adding the new comment template to the comment template library.
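The conversion of a target comment into template comment content, and the later filling of its slots, might look like the following minimal sketch. The `[SLOT_i]` placeholder syntax and the purely positional filling are assumptions made for illustration; the claims describe pairwise matching between article entity candidate words and comment entity candidate words without fixing a representation.

```python
from typing import List, Sequence, Tuple


def comment_to_template(comment: str,
                        entities: Sequence[str]) -> Tuple[str, List[str]]:
    """Replace each entity found in the comment with a numbered slot.
    Longer entities are replaced first so that a short entity does not
    split a longer one that contains it."""
    template = comment
    slot_entities: List[str] = []
    for ent in sorted(set(entities), key=lambda e: (-len(e), e)):
        if ent and ent in template:
            template = template.replace(ent, f"[SLOT_{len(slot_entities)}]")
            slot_entities.append(ent)
    return template, slot_entities


def fill_template(template: str, fillers: Sequence[str]) -> str:
    """Fill slot i with fillers[i] (positional matching, an assumption)."""
    for i, filler in enumerate(fillers):
        template = template.replace(f"[SLOT_{i}]", filler)
    return template
```

In the method, the resulting template would be stored together with the topic and/or news event triple of the target news, and each filled candidate would then be scored by the language model.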
6. The automatic comment method of claim 1, wherein obtaining at least one piece of news and comment information through web crawling according to the title of the article to be commented comprises:
performing quantitative analysis of different news sources among the whole-network news sources in a comment quality dimension, a comment quantity dimension and a comment interaction dimension to obtain weight coefficients of the different news sources;
for each news source among the whole-network news sources, dynamically allocating crawling resources based on the weight coefficient of the news source, and obtaining at least one piece of original news and comment information from the news source through a dynamic real-time crawling algorithm;
for all the original news and comment information obtained by crawling, applying a first BERT pre-trained model to judge the sentiment polarity of each piece of news and comment information, and then filtering out the information corresponding to a negative judgment result to obtain at least one piece of non-negative news and comment information;
for the at least one piece of non-negative news and comment information, applying a sensitive information detection algorithm based on a dictionary, pinyin, variant word forms and/or a deep learning model to perform sensitive word detection on each piece of news and comment information, and then filtering out the information containing sensitive words to obtain at least one piece of non-sensitive news and comment information;
for the at least one piece of non-sensitive news and comment information, applying a false information discrimination algorithm based on rules, a knowledge graph and/or a deep learning model to perform false information discrimination on each piece of news and comment information, and then filtering out the information corresponding to a false discrimination result to obtain at least one piece of compliant news and comment information;
and for the at least one piece of compliant news and comment information, sequentially performing dirty data cleaning and de-duplication within the same news source to obtain the at least one piece of news and comment information.
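The staged filtering of claim 6 can be sketched as a pipeline of pluggable predicates followed by cleaning and per-source de-duplication. The record layout and the three keyword-based predicates below are simplified stand-ins invented for this sketch; in the claimed method these roles are played by a BERT sentiment classifier, a sensitive information detection algorithm and a false information discrimination algorithm.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]  # hypothetical record: "source", "title", "comment"
Predicate = Callable[[Item], bool]

# Toy stand-ins for the real detectors (assumptions, not the patented models).
NEGATIVE_WORDS = {"terrible"}
SENSITIVE_WORDS = {"badword"}


def is_negative(item: Item) -> bool:
    return any(w in item["comment"].lower() for w in NEGATIVE_WORDS)


def has_sensitive_words(item: Item) -> bool:
    return any(w in item["comment"].lower() for w in SENSITIVE_WORDS)


def is_false(item: Item) -> bool:
    return item.get("flagged_false") == "yes"


def filter_and_dedupe(items: List[Item], negative: Predicate,
                      sensitive: Predicate, false_info: Predicate) -> List[Item]:
    """Apply the three filters in order, then drop dirty records and
    duplicates that share the same source, title and comment."""
    stage = [it for it in items if not negative(it)]
    stage = [it for it in stage if not sensitive(it)]
    stage = [it for it in stage if not false_info(it)]
    seen = set()
    result: List[Item] = []
    for it in stage:
        title, comment = it["title"].strip(), it["comment"].strip()
        if not title or not comment:  # dirty data: empty fields
            continue
        key = (it["source"], title.lower(), comment.lower())
        if key in seen:               # duplicate within the same source
            continue
        seen.add(key)
        result.append(it)
    return result
```

De-duplication keys include the source, so the same comment crawled from two different news sources is kept once per source, in line with the "for the same news source" wording of the claim.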
7. An automatic comment device, characterized by comprising a full-network information crawling module, an article semantic coding module, a news semantic coding module, an article matching detection module, a recommendation index calculation module and an automatic comment content determination module;
the full-network information crawling module is used for crawling the whole network to obtain at least one piece of news and comment information according to the title of an article to be commented, wherein the news and comment information comprises a piece of news with the same title as the article to be commented, at least one piece of comment content belonging to the news, and a comment praise number and a comment reply number corresponding to the comment content;
the article semantic coding module is communicatively connected to the full-network information crawling module and is used for performing semantic coding on the article to be commented in a text dimension, a picture dimension and a video dimension to obtain an article text semantic vector, an article picture semantic vector and an article video semantic vector;
the news semantic coding module is used for performing, for the at least one piece of news and comment information, semantic coding on the news in each piece of news and comment information in the text dimension, the picture dimension and the video dimension, respectively, to obtain a news text semantic vector, a news picture semantic vector and a news video semantic vector of each piece of news;
the article matching detection module is communicatively connected to the article semantic coding module and the news semantic coding module, respectively, and is used for importing, for the at least one piece of news and comment information, the news text semantic vectors, news picture semantic vectors and news video semantic vectors of all the news, together with the article text semantic vector, the article picture semantic vector and the article video semantic vector of the article to be commented, into a first deep learning matching detection model constructed based on a text similarity dimension, a picture similarity dimension, a video similarity dimension and a fully connected layer, to obtain a first matching detection result between each piece of news and the article to be commented;
the recommendation index calculation module is communicatively connected to the article matching detection module and is used for performing, when the first matching detection result contains at least one piece of matching news that matches the article to be commented, a weighted calculation on the corresponding comment reply number, comment praise number and news source weight coefficient for each comment content of the at least one piece of matching news to obtain a corresponding recommendation index value;
the automatic comment content determination module is communicatively connected to the recommendation index calculation module and is used for determining the comment content corresponding to a maximum recommendation index value as the automatic comment content of the article to be commented, wherein the maximum recommendation index value is the maximum of the at least one recommendation index value;
the automatic comment content determination module is further used for, after the first matching detection result between each piece of news and the article to be commented is obtained, if all the obtained recommendation index values are smaller than a preset index threshold or the first matching detection result does not contain at least one piece of matching news that matches the article to be commented:
obtaining a topic and/or a news event triple of the article to be commented;
searching a comment template library, according to the topic and/or news event triple of the article to be commented, for at least one existing comment template similar to the article to be commented in a topic similarity dimension and/or a news event triple similarity dimension, wherein the comment template library stores a plurality of existing comment templates, each existing comment template comprises template comment content and a topic and/or news event triple bound to the template comment content, and the template comment content comprises at least one slot to be filled in one-to-one correspondence with at least one comment entity candidate word;
performing entity extraction on the article to be commented to obtain at least one article entity candidate word;
for the at least one existing comment template, performing pairwise-matching slot filling between the at least one article entity candidate word and the at least one comment entity candidate word of each existing comment template to obtain corresponding new comment content;
for the at least one new comment content, applying a DNN language model to score each new comment content to obtain a score value of each new comment content;
and determining the new comment content corresponding to the highest score value as the automatic comment content of the article to be commented, wherein the highest score value is the highest of the score values of all the new comment contents.
8. A computer device comprising a memory, a processor and a transceiver that are communicatively connected in sequence, wherein the memory is used for storing a computer program, the transceiver is used for transmitting and receiving information, and the processor is used for reading the computer program and executing the automatic comment method according to any one of claims 1 to 6.
9. A storage medium having instructions stored thereon which, when run on a computer, perform the automatic comment method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110169250.8A CN112836487B (en) | 2021-02-07 | 2021-02-07 | Automatic comment method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112836487A (en) | 2021-05-25 |
CN112836487B (en) | 2023-01-24 |
Family
ID=75932689
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110169250.8A Active CN112836487B (en) | 2021-02-07 | 2021-02-07 | Automatic comment method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836487B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486260B (en) * | 2021-07-15 | 2023-01-06 | 北京三快在线科技有限公司 | Method and device for generating interactive information, computer equipment and storage medium |
CN115730030B (en) * | 2021-08-26 | 2024-07-05 | 腾讯科技(深圳)有限公司 | Comment information processing method and related device |
CN113946681B (en) * | 2021-12-20 | 2022-03-29 | 军工保密资格审查认证中心 | Text data event extraction method and device, electronic equipment and readable medium |
CN114548073B (en) * | 2022-01-20 | 2024-09-06 | 浙江大学 | Denoising method based on semantic communication system |
CN114492407B (en) * | 2022-01-26 | 2022-12-30 | 中国科学技术大学 | News comment generation method, system, equipment and storage medium |
CN114896958A (en) * | 2022-05-17 | 2022-08-12 | 北京三快在线科技有限公司 | Method, device, server and storage medium for publishing comment text |
CN114969371A (en) * | 2022-05-31 | 2022-08-30 | 北京智谱华章科技有限公司 | Heat sorting method and device of combined knowledge graph |
CN116306514B (en) * | 2023-05-22 | 2023-09-08 | 北京搜狐新媒体信息技术有限公司 | Text processing method and device, electronic equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101751458A (en) * | 2009-12-31 | 2010-06-23 | 暨南大学 | Network public sentiment monitoring system and method |
CN108170773A (en) * | 2017-12-26 | 2018-06-15 | 百度在线网络技术(北京)有限公司 | Media event method for digging, device, computer equipment and storage medium |
CN110097419A (en) * | 2019-03-29 | 2019-08-06 | 努比亚技术有限公司 | Commodity data processing method, computer equipment and storage medium |
CN110162752A (en) * | 2019-05-13 | 2019-08-23 | 百度在线网络技术(北京)有限公司 | Article sentences weight processing method, device and electronic equipment |
CN110569334A (en) * | 2019-09-11 | 2019-12-13 | 北京搜狐新动力信息技术有限公司 | method and device for automatically generating comments |
CN110688832A (en) * | 2019-10-10 | 2020-01-14 | 河北省讯飞人工智能研究院 | Comment generation method, device, equipment and storage medium |
CN111263238A (en) * | 2020-01-17 | 2020-06-09 | 腾讯科技(深圳)有限公司 | Method and equipment for generating video comments based on artificial intelligence |
CN112203122A (en) * | 2020-10-10 | 2021-01-08 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based similar video processing method and device and electronic equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893410A (en) * | 2015-11-18 | 2016-08-24 | 乐视网信息技术(北京)股份有限公司 | Keyword extraction method and apparatus |
WO2017147785A1 (en) * | 2016-03-01 | 2017-09-08 | Microsoft Technology Licensing, Llc | Automated commentary for online content |
CN108153723B (en) * | 2017-12-27 | 2021-10-19 | 北京百度网讯科技有限公司 | Method and device for generating hotspot information comment article and terminal equipment |
CN110516067B (en) * | 2019-08-23 | 2022-02-11 | 北京工商大学 | Public opinion monitoring method, system and storage medium based on topic detection |
CN112182335A (en) * | 2020-09-28 | 2021-01-05 | 四川封面传媒有限责任公司 | Hot news capturing method and device and server |
Non-Patent Citations (4)
Title |
---|
Automatic Article Commenting: the Task and Dataset;Lianhui Qin等;《arXiv:1805.03668v2》;20180511;1-13 * |
Automatic Generation of Personalized Comment Based on User Profile;Wenhuan Zeng等;《arXiv》;20190714;1-7 * |
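Research on an automatic news comment generation method based on a generative adversarial network model incorporating a gated attention mechanism;Wang Ruhao et al.;《科教文汇(中旬刊)》;20201020;89-90 * |
Design and implementation of an automatic comment generation system for social media;Sun Teng;《中国优秀硕士学位论文全文数据库 (信息科技辑)》;20201015;I138-53 * |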
Also Published As
Publication number | Publication date |
---|---|
CN112836487A (en) | 2021-05-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||