CN117151109A - Public opinion information auditing method, public opinion information auditing device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117151109A
Authority
CN
China
Prior art keywords
public opinion
opinion information
information
preset
public
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310907670.0A
Other languages
Chinese (zh)
Inventor
刘锐钢
李鸣扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Hangzhou Information Technology Co Ltd
Priority to CN202310907670.0A
Publication of CN117151109A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to the technical field of natural language processing and provides a public opinion information auditing method, a public opinion information auditing device, electronic equipment and a storage medium. The method comprises: storing the priority level and the public opinion scene label of each piece of public opinion information in a public opinion information base; obtaining, based on a preset generative abstract model, a public opinion information abstract of each piece of public opinion information according to a preset processing sequence, the preset processing sequence running from high priority level to low priority level; matching, based on a deterministic finite automaton (DFA) algorithm and according to the preset processing sequence, the public opinion information abstract of the public opinion information against keywords in a preset keyword library corresponding to the public opinion scene label of the public opinion information, to obtain successfully matched target keywords; and determining a comprehensive confidence of the public opinion information based on the target keywords, the priority level and the public opinion information abstract, and auditing the public opinion information according to the comprehensive confidence in the preset processing sequence. This improves content auditing efficiency, avoids false positives and similar misjudgments, and reduces the auditing misjudgment rate.

Description

Public opinion information auditing method, public opinion information auditing device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of natural language processing, in particular to a public opinion information auditing method, a public opinion information auditing device, electronic equipment and a storage medium.
Background
With the explosive growth of online internet services, data volumes are expanding rapidly, and the variety and presentation styles of media content are becoming richer. The amount of public opinion information has risen accordingly: newly added public opinion information now exceeds ten million items per day, and public opinion risk keeps increasing. The traditional approach of manually interpreting public opinion data has a high labor cost and low efficiency, and lacks a unified standard.
In the related art, automatic auditing relies on full-text keyword matching. However, full-text keyword matching is slow, takes a long time to audit and consumes a large amount of resources, which hurts content auditing speed. Misjudgments are also frequent in the current auditing process and require subsequent manual review, which wastes auditing resources and lowers auditing efficiency.
Disclosure of Invention
The embodiment of the application provides a public opinion information auditing method, device, electronic equipment and storage medium, which aim to reduce auditing misjudgment rate and improve public opinion content auditing efficiency.
In a first aspect, an embodiment of the present application provides a public opinion information auditing method, including:
determining the priority level and the public opinion scene label of each piece of public opinion information, and storing the public opinion information, the priority level of the public opinion information and the public opinion scene label in a public opinion information base in an associated manner, wherein the priority level of the public opinion information is obtained by carrying out emotion analysis on the public opinion information;
based on a preset generative abstract model, obtaining a public opinion information abstract of each piece of public opinion information in the public opinion information base according to a preset processing sequence, wherein the preset processing sequence runs from high priority level to low priority level;
based on a deterministic finite automaton (DFA) algorithm, matching, according to the preset processing sequence, the public opinion information abstract of the public opinion information against keywords in a preset keyword library corresponding to the public opinion scene label of the public opinion information, to obtain at least one successfully matched target keyword;
and determining the comprehensive confidence of the public opinion information based on at least one target keyword of the public opinion information, the priority level of the public opinion information and the public opinion information abstract of the public opinion information, and auditing the public opinion information according to the comprehensive confidence.
In some embodiments, the public opinion information base further stores a tag of at least one word in the public opinion information;
the labels of the words are obtained based on text word segmentation and part-of-speech tagging of the public opinion information, and different types of labels of the words are provided with different preset weights, wherein the preset weights are used for determining comprehensive confidence of the public opinion information.
In some embodiments, the determining the integrated confidence of the public opinion information based on the at least one target keyword of the public opinion information, the priority level of the public opinion information, and the public opinion information summary of the public opinion information comprises:
determining target labels of all target keywords of the public opinion information, wherein the target labels of the target keywords are labels of words successfully matched with the target keywords in the public opinion information;
determining the keyword number corresponding to each target keyword, the preset weight corresponding to the target label of each target keyword, the preset index corresponding to the priority level of the public opinion information and the total word number of the public opinion information abstract of the public opinion information;
and carrying out weighted calculation on the preset index and the total word number according to the keyword number and the preset weight corresponding to each target keyword to obtain the comprehensive confidence coefficient of the public opinion information.
In some embodiments, the label of at least one word in the public opinion information is obtained by:
acquiring at least one piece of public opinion information crawled by a web crawler;
performing text word segmentation and part-of-speech tagging on each piece of public opinion information through a pre-trained text word segmentation LAC model, and labeling words with part-of-speech tagging to obtain a label of at least one word in the public opinion information;
the pre-trained text word segmentation LAC model is obtained by training an initial text word segmentation LAC model by using a preset public opinion information word segmentation dictionary.
In some embodiments, the priority level of public opinion information is obtained by:
carrying out emotion analysis on public opinion information based on an emotion dictionary to obtain emotion analysis results of the public opinion information;
setting the priority level of the public opinion information as high priority under the condition that the emotion analysis result is negative emotion;
setting the priority level of the public opinion information as a medium priority under the condition that the emotion analysis result is a neutral emotion;
and setting the priority level of the public opinion information as low priority under the condition that the emotion analysis result is positive emotion.
In some embodiments, the public opinion scene label of the public opinion information is obtained by:
and inputting the public opinion information into a pre-trained language representation BERT model to obtain a public opinion scene label of the public opinion information output by the pre-trained language representation BERT model.
In some embodiments, the preset generative abstract model includes a pre-trained language representation BERT model and a pointer generation network model, and the public opinion information abstract of the public opinion information is obtained in the following way:
obtaining word vectors of the public opinion information through the pre-trained language representation BERT model, and obtaining sentence feature scores of the public opinion information by utilizing multidimensional semantic features;
and splicing the word vector and the sentence characteristic score into a target input sequence, and processing the target input sequence through a pointer generation network model and a convergence mechanism to obtain a public opinion information abstract of the public opinion information.
In a second aspect, an embodiment of the present application provides a public opinion information auditing apparatus, including:
the public opinion classification module is used for determining the priority level and the public opinion scene label of each piece of public opinion information, and storing the public opinion information, the priority level of the public opinion information and the public opinion scene label in a public opinion information base in a correlated manner, wherein the priority level of the public opinion information is obtained by carrying out emotion analysis on the public opinion information;
the public opinion abstract module is used for obtaining, based on a preset generative abstract model, a public opinion information abstract of each piece of public opinion information in the public opinion information base according to a preset processing sequence, wherein the preset processing sequence runs from high priority level to low priority level;
the keyword matching module is used for matching, based on a deterministic finite automaton (DFA) algorithm and according to the preset processing sequence, the public opinion information abstract of the public opinion information against keywords in a preset keyword library corresponding to the public opinion scene label of the public opinion information, to obtain at least one successfully matched target keyword;
and the public opinion auditing module is used for determining the comprehensive confidence level of the public opinion information based on at least one target keyword of the public opinion information, the priority level of the public opinion information and the public opinion information abstract of the public opinion information, and auditing the public opinion information according to the comprehensive confidence level.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the public opinion information auditing method of the first aspect is implemented.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer readable storage medium, where the non-transitory computer readable storage medium includes a computer program, where the computer program when executed by a processor implements the public opinion information auditing method according to the first aspect.
According to the public opinion information auditing method, device, electronic equipment and storage medium provided by the embodiments of the application, the priority level and the public opinion scene label of each piece of public opinion information are determined and stored in a public opinion information base, the priority level being obtained by carrying out emotion analysis on the public opinion information. The public opinion information in the public opinion information base is input into a preset generative abstract model in a preset processing sequence, from high priority level to low priority level, to obtain the public opinion information abstract of each piece of public opinion information. Based on a deterministic finite automaton (DFA) algorithm, the public opinion information abstract is matched, in the preset processing sequence, against keywords in the preset keyword library corresponding to the public opinion scene label of the public opinion information, to obtain at least one successfully matched target keyword. The comprehensive confidence of the public opinion information is then determined from the target keyword, the priority level and the public opinion information abstract, and the public opinion information is audited according to the comprehensive confidence.
Because the priority level is determined through emotion analysis and the pieces of public opinion information are subsequently processed from high priority to low priority, content auditing efficiency and speed are improved. During auditing, on the one hand, the public opinion information abstract is produced by generative summarization, which better captures deep semantic information such as obscure hidden meanings, variant words and connotations in the text, and keyword matching is targeted by using the preset keyword library corresponding to the public opinion scene label, so that content auditing efficiency is improved while identification accuracy is maintained. On the other hand, auditing according to a comprehensive confidence that combines the target keywords, the priority level and the public opinion information abstract avoids false positives and similar misjudgments and reduces the misjudgment rate.
Drawings
In order to more clearly illustrate the application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a public opinion information auditing method provided by an embodiment of the present application;
FIG. 2 is a flow chart for determining integrated confidence provided by an embodiment of the present application;
FIG. 3 is a flowchart of classification of public opinion information according to an embodiment of the present application;
FIG. 4 is a flowchart of public opinion information abstract generation provided by an embodiment of the present application;
FIG. 5 is a block diagram of a public opinion information auditing apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Fig. 1 is a flowchart of a public opinion information auditing method according to an embodiment of the present application. Referring to fig. 1, an embodiment of the present application provides a public opinion information auditing method including:
step 101, determining the priority level and the public opinion scene label of each piece of public opinion information, and storing the public opinion information, the priority level of the public opinion information and the public opinion scene label in a public opinion information base in an associated manner, wherein the priority level of the public opinion information is obtained by carrying out emotion analysis on the public opinion information;
in one example, the priority levels of public opinion information include a high priority, a medium priority, and a low priority, and specifically, the priority levels in this embodiment are used to determine the processing order of public opinion information, for example, in the public opinion information auditing process, the processing order of the high priority public opinion information should be located before the processing order of the medium priority public opinion information, and the processing order of the medium priority public opinion information should be located before the processing order of the low priority public opinion information.
In this embodiment, emotion analysis is performed on the public opinion information to determine whether it expresses a positive, negative or neutral opinion, and the priority level of the public opinion information is determined according to the priority level preset for each kind of opinion.
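As an illustration only, the mapping from an emotion analysis result to a priority level (negative to high, neutral to medium, positive to low, as in the embodiments above) might be sketched as follows. The tiny lexicon, its scores and the function names are hypothetical; the application does not specify the emotion dictionary or how scores are combined.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Smaller value means the item is processed earlier."""
    HIGH = 0
    MEDIUM = 1
    LOW = 2


# Hypothetical toy emotion dictionary: word -> polarity score.
SENTIMENT_LEXICON = {
    "scandal": -2,
    "outrage": -2,
    "delay": -1,
    "welcome": 1,
    "praise": 2,
}


def sentiment_score(tokens):
    """Sum the polarity of every token found in the dictionary (assumed rule)."""
    return sum(SENTIMENT_LEXICON.get(token.lower(), 0) for token in tokens)


def priority_from_tokens(tokens):
    """Negative emotion -> HIGH, neutral -> MEDIUM, positive -> LOW."""
    score = sentiment_score(tokens)
    if score < 0:
        return Priority.HIGH
    if score == 0:
        return Priority.MEDIUM
    return Priority.LOW
```

A real system would take its tokens from the word segmentation step and use a far larger dictionary; only the three-way mapping is taken from the text.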
The public opinion scene labels in the embodiment include but are not limited to social, entertainment, education and other scene labels of different categories.
It should be noted that the public opinion information base is used for storing information related to public opinion information, including but not limited to, priority level of public opinion information, public opinion scene label, public opinion title, public opinion information content, source web page link of public opinion information, etc.
In one example, public opinion information disclosed by each media organization is periodically crawled through a web crawler, and related information obtained after the public opinion information is processed is associated and stored in a public opinion information base so as to carry out public opinion information auditing and treatment according to the information stored in the public opinion information base.
Step 102, based on a preset generative abstract model, obtaining a public opinion information abstract of each piece of public opinion information in the public opinion information base according to a preset processing sequence, wherein the preset processing sequence runs from high priority level to low priority level;
It should be noted that the preset generative abstract model in this embodiment is a model that produces the public opinion information abstract by generative (abstractive) summarization.
Generative summarization uses natural language understanding technology to analyze the grammar and semantics of the public opinion information, fuses the information, and then uses natural language generation technology to produce a new public opinion information abstract. A generative abstract is not limited to the wording of the original public opinion information: rather than simply assembling words or phrases taken from the original, it captures the semantic ideas of the original and expresses them in different wording, possibly with new words or phrases. It can therefore express the original semantics more accurately and flexibly, and better capture deep semantic information such as obscure hidden meanings, variant words and connotations in the text, which improves the content auditing effect.
It can be understood that, since a plurality of pieces of public opinion information are stored in the public opinion information base, in order to improve content auditing efficiency the public opinion information in the public opinion information base is input into the preset generative abstract model in order of priority level from high to low, and the model outputs the public opinion information abstract of each piece of public opinion information.
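The processing order described above can be sketched as a stable sort on the stored priority level. This is a minimal sketch: the `OpinionRecord` fields, the `PRIORITY_RANK` table and the `summarize` callback are assumptions made for illustration, and the callback merely stands in for the preset abstract model.

```python
from dataclasses import dataclass

# Assumed ranking: lower rank is processed first.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class OpinionRecord:
    """Hypothetical record as it might be stored in the information base."""
    title: str
    content: str
    priority: str      # "high", "medium" or "low"
    scene_label: str   # e.g. "social", "entertainment", "education"


def in_processing_order(records):
    """Order records from high to low priority.

    sorted() is stable, so records with the same priority keep the order
    in which they were stored.
    """
    return sorted(records, key=lambda record: PRIORITY_RANK[record.priority])


def summarize_all(records, summarize):
    """Apply a summarizer callback to each record in processing order."""
    return [(r.title, summarize(r.content)) for r in in_processing_order(records)]
```

Sorting once up front is enough for a batch; a continuously fed system might prefer a priority queue instead.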
Step 103, based on a deterministic finite automaton (DFA) algorithm, matching, according to the preset processing sequence, the public opinion information abstract of the public opinion information against keywords in a preset keyword library corresponding to the public opinion scene label of the public opinion information, to obtain at least one successfully matched target keyword;
Specifically, the principle of the deterministic finite automaton (DFA) algorithm is to construct a tree-like search structure in advance and then search that structure very efficiently according to the input. The DFA algorithm enables efficient keyword filtering: all keywords present in an input character string can be located, and replaced if required, by traversing the string once.
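A minimal sketch of this kind of keyword matcher, assuming the tree-like structure is a character trie built from nested dictionaries, is shown below. The names `build_trie` and `find_keywords` are hypothetical. Note that this simple variant restarts from the trie root at every text position; a strictly single-pass automaton would additionally need failure links, as in Aho-Corasick.

```python
END = object()  # sentinel key marking that a keyword ends at this node


def build_trie(keywords):
    """Build the tree-like search structure: one nested dict per character."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_keywords(text, trie):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    hits = []
    for start in range(len(text)):
        node = trie
        for pos in range(start, len(text)):
            node = node.get(text[pos])
            if node is None:       # no keyword continues with this character
                break
            if END in node:        # a complete keyword ends here
                hits.append((start, node[END]))
    return hits
```

The trie is built once per keyword library and reused for every abstract, so the per-text cost depends on the text length and the longest keyword rather than on the number of keywords.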
It should be noted that, different preset keyword libraries are set for public opinion information of different public opinion scenes, so that keywords in the preset keyword libraries corresponding to the different public opinion scenes are customized, and keywords in the different preset keyword libraries are adopted for matching for public opinion information of the different public opinion scenes, so that content auditing efficiency and content auditing pertinence are improved.
In one example, the keywords in a preset keyword library may be collected directly from historical public opinion information. For public opinion information related to entertainment, the keywords may be collected from user comments under a large amount of entertainment news and from articles and videos that individual users publish about entertainment events; for public opinion information related to social events, the keywords may be collected from user comments under a large number of social events and from articles and videos that individual users publish about social events. A preset keyword library may include not only the keywords collected directly from historical public opinion information but also variants, shorthand forms and the like of those keywords, which is not limited in this embodiment.
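The per-scene keyword libraries might be organised as a simple lookup keyed by scene label, as in the sketch below. The library contents and function names are invented for illustration, and a plain substring test stands in for the DFA matcher to keep the example short.

```python
# Hypothetical keyword libraries, one per public opinion scene label.
KEYWORD_LIBRARIES = {
    "entertainment": frozenset({"ticket scalping", "fan war"}),
    "social": frozenset({"rumor", "panic buying"}),
}


def keywords_for_scene(scene_label):
    """Return the library for a scene label, or an empty set if none is configured."""
    return KEYWORD_LIBRARIES.get(scene_label, frozenset())


def match_summary(summary, scene_label):
    """Match an abstract only against the library of its own scene.

    Substring search is used here as a stand-in for the DFA matcher.
    """
    text = summary.lower()
    return sorted(k for k in keywords_for_scene(scene_label) if k in text)
```

Restricting the match to one library keeps the search structure small and avoids hits on keywords that are only meaningful in another scene.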
Step 104, determining the comprehensive confidence of the public opinion information based on at least one target keyword of the public opinion information, the priority level of the public opinion information and the public opinion information abstract of the public opinion information, and auditing the public opinion information according to the comprehensive confidence.
Specifically, referring to fig. 2, in this embodiment the public opinion information abstract of the public opinion information is matched against the keywords in the preset keyword library corresponding to its public opinion scene label. When matching succeeds, the comprehensive confidence of the public opinion information is calculated and converted to a percentage scale. Different audit modes are configured by setting a plurality of preset violation thresholds: when the confidence is greater than a preset violation threshold, the information is judged a violation by system audit, and whether the public opinion information is actually in violation is finally decided in combination with manual review.
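The threshold logic can be sketched as follows. The two threshold values, the mode names and the clamping to the range 0 to 100 are all assumptions; the text only states that several preset violation thresholds select different audit modes and that manual review makes the final decision.

```python
def to_percent(confidence):
    """Convert a raw confidence to a percentage, clamped to [0, 100] (assumed)."""
    return max(0.0, min(100.0, confidence * 100.0))


def audit_mode(confidence, violation_threshold=80.0, review_threshold=50.0):
    """Pick an audit mode from hypothetical thresholds on the percentage scale."""
    percent = to_percent(confidence)
    if percent > violation_threshold:
        return "system_violation"   # flagged by system audit, then reviewed manually
    if percent > review_threshold:
        return "manual_review"      # uncertain, sent directly to manual review
    return "pass"
```

With these example thresholds, `audit_mode(0.9)` selects the system-violation mode and `audit_mode(0.2)` lets the item pass.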
Further, in order to reduce the false positive probability, different weights are set for different types of target keywords, and specifically, labels of at least one word in the public opinion information are also stored in the public opinion information base; the labels of the words are obtained based on text word segmentation and part-of-speech tagging of the public opinion information, and different weights are arranged on different types of words and used for determining comprehensive confidence of the public opinion information.
For example, in one example, the part of speech is tagged to extract name information, location information, time information, organization information, and the like. In this embodiment, different weights may be preset for the words with different attributes, for example, the name matching weight is set to 3, the organization name is set to 2, etc., so as to adjust the weight ratio of the words in each category when the comprehensive confidence is calculated later.
It can be understood that a target keyword is a keyword in the preset keyword library that is successfully matched with a word in the public opinion information. For example, the word "A" in the public opinion information is successfully matched with the keyword "A" in the preset keyword library, and the word "B" in the public opinion information is successfully matched with the keyword "B". The words in the public opinion information have different attributes, for example the word "A" is a person name and the word "B" is a place name, and different weights are set for words of different categories, so the target keyword matched with a word of each attribute carries the corresponding weight.
In this embodiment, the comprehensive confidence is calculated by combining the target keywords, the priority level of the public opinion information and the public opinion information abstract of the public opinion information. Different label weights are defined and used to weight the calculation, and different thresholds are set to judge whether the public opinion information is in violation, which reduces the misjudgment rate, avoids false positives and similar misjudgments, and improves public opinion information auditing efficiency.
According to the public opinion information auditing method provided by the embodiment of the application, the priority level and the public opinion scene label of each piece of public opinion information are determined, and the public opinion information, its priority level and its public opinion scene label are stored in a public opinion information base, wherein the priority level is obtained by carrying out emotion analysis on the public opinion information. The public opinion information in the public opinion information base is input into a preset generated abstract model in a preset processing sequence, namely from high priority to low priority, to obtain the public opinion information abstract of each piece of public opinion information output in turn by the model. Based on a deterministic finite automaton (DFA) algorithm, the public opinion information abstract is then matched, in the same preset processing sequence, against the keywords in the preset keyword library corresponding to the public opinion scene label of the public opinion information, to obtain at least one successfully matched target keyword. Because the priority level of each piece of public opinion information is determined through emotion analysis and the pieces are subsequently processed from high to low priority, the efficiency and speed of content auditing are improved. Further, during public opinion information auditing, on the one hand, producing the public opinion information abstract in a generative manner captures deep semantic information, so that implicit semantics, variant words and connotations in the text are better understood, and keyword matching is performed in a targeted manner using the preset keyword library corresponding to the public opinion scene label, which improves content auditing efficiency while maintaining recognition accuracy. On the other hand, the comprehensive confidence of the public opinion information is determined from the target keywords, the priority level and the public opinion information abstract, which helps avoid events such as misjudging positive content and reduces misjudgment.
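The DFA-based matching step described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the automaton is built as a nested-dictionary trie, and the names `build_dfa`, `match_keywords`, `match_for_scene` and the contents of `KEYWORD_LIBRARIES` are hypothetical.

```python
from typing import Dict, Iterable, List

END = "__end__"  # hypothetical marker for an accepting state; never equals a single character

# Hypothetical preset keyword libraries, one per public opinion scene label.
KEYWORD_LIBRARIES: Dict[str, List[str]] = {
    "society": ["riot", "fraud"],
    "entertainment": ["scandal"],
}


def build_dfa(keywords: Iterable[str]) -> Dict:
    """Build a nested-dict trie; each dict is a state keyed by the next character."""
    root: Dict = {}
    for word in keywords:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})
        state[END] = True  # mark the end of a complete keyword
    return root


def match_keywords(text: str, dfa: Dict) -> List[str]:
    """Scan the text once, keeping the longest keyword found at each start position."""
    hits: List[str] = []
    i = 0
    while i < len(text):
        state, j, last_end = dfa, i, -1
        while j < len(text) and text[j] in state:
            state = state[text[j]]
            j += 1
            if END in state:
                last_end = j
        if last_end != -1:
            hits.append(text[i:last_end])
            i = last_end  # continue after the matched keyword
        else:
            i += 1
    return hits


def match_for_scene(abstract: str, scene_label: str) -> List[str]:
    """Match an abstract only against the library of its own scene label."""
    dfa = build_dfa(KEYWORD_LIBRARIES.get(scene_label, []))
    return match_keywords(abstract, dfa)
```

Restricting the automaton to the library of one scene label keeps it small, which is one plausible reading of how the targeted matching saves work.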
In some embodiments, the determining the integrated confidence of the public opinion information based on the at least one target keyword of the public opinion information, the priority level of the public opinion information, and the public opinion information summary of the public opinion information comprises:
determining target labels of all target keywords of the public opinion information, wherein the target labels of the target keywords are labels of words successfully matched with the target keywords in the public opinion information;
determining the keyword number corresponding to each target keyword, the preset weight corresponding to the target label of each target keyword, the preset index corresponding to the priority level of the public opinion information and the total word number of the public opinion information abstract of the public opinion information;
and carrying out weighted calculation on the preset index and the total word number according to the keyword number and the preset weight corresponding to each target keyword to obtain the comprehensive confidence coefficient of the public opinion information.
Specifically, the comprehensive confidence P is weighted and calculated in the present embodiment in the following manner:
where A refers to the word number of a target keyword, B refers to the preset weight corresponding to the target label of that target keyword, M refers to the total word number of the public opinion information abstract, and i refers to the preset index corresponding to the priority level of the public opinion information. For example, the index may be set to 1.3 for high priority, 1.2 for medium priority and 1 for low priority.
For example, suppose the public opinion information abstract of a piece of high-priority public opinion information has a total word number of 30 and the target keywords are a, b, c, d and e, where the word numbers of a, b, c, d and e are 3, 5, 5, 2 and 3 respectively; a belongs to time information, b and d belong to place information, c belongs to organization name information, and e belongs to person name information; and the weights are set to 3 for person names, 2 for organization names, 1 for time and 4 for places. The comprehensive confidence P of this piece of high-priority public opinion information is then calculated by weighting as follows:
According to this method, the probability of misjudgment is reduced through multi-label matching and comprehensive confidence calculation: target keywords with different labels have different weights, public opinion information with different emotions has different indexes, and the comprehensive confidence of the public opinion information is calculated by weighting.
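The quantities in the worked example can be put into code as below. The formula for P is not reproduced in this text, so the combining function `comprehensive_confidence` is an assumption (the priority index multiplied by the weighted keyword sum divided by the total word number) and may differ from the formula of the application. Only the inputs and the weighted sum of A times B come from the example; all function and variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Preset weights per label and preset indexes per priority level, as in the example.
LABEL_WEIGHTS = {"person": 3, "organization": 2, "time": 1, "place": 4}
PRIORITY_INDEX = {"high": 1.3, "medium": 1.2, "low": 1.0}


@dataclass
class TargetKeyword:
    word_count: int  # A: word number of the target keyword
    label: str       # target label, which selects the preset weight B


def weighted_keyword_sum(keywords: List[TargetKeyword]) -> int:
    """Sum of A * B over all target keywords."""
    return sum(k.word_count * LABEL_WEIGHTS[k.label] for k in keywords)


def comprehensive_confidence(keywords: List[TargetKeyword],
                             total_words: int, priority: str) -> float:
    """ASSUMED form of P: i * sum(A * B) / M. The original formula is not shown in the text."""
    return PRIORITY_INDEX[priority] * weighted_keyword_sum(keywords) / total_words


# Target keywords a, b, c, d, e from the worked example.
example = [
    TargetKeyword(3, "time"),
    TargetKeyword(5, "place"),
    TargetKeyword(5, "organization"),
    TargetKeyword(2, "place"),
    TargetKeyword(3, "person"),
]
```

For the example, the weighted sum is 3·1 + 5·4 + 5·2 + 2·4 + 3·3 = 50, with M = 30 and i = 1.3.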
In some embodiments, the label of at least one word in the public opinion information is obtained by:
acquiring at least one piece of public opinion information crawled by a web crawler;
performing text word segmentation and part-of-speech tagging on each piece of public opinion information through a pre-trained text word segmentation LAC model, and labeling words with part-of-speech tagging to obtain a label of at least one word in the public opinion information;
the pre-trained text word segmentation LAC model is obtained by training an initial text word segmentation LAC model by using a preset public opinion information word segmentation dictionary.
The words in the preset public opinion information word segmentation dictionary are words related to the public opinion information.
In this embodiment, a preset public opinion information word segmentation dictionary is used to perform customized training on the text word segmentation LAC (Lexical Analysis of Chinese) model. Establishing this dictionary to intervene in the LAC model can further improve the accuracy of the model.
Specifically, public opinion information disclosed by each media organization crawled by a web crawler is input into a pre-trained text word segmentation LAC model, text word segmentation, part of speech tagging and proper noun recognition are performed through the pre-trained text word segmentation LAC model, and the processed public opinion information is labeled to obtain labels of at least one word, such as a name label, a place name label, an organization name label and the like.
According to this embodiment, the LAC model performs text word segmentation and part-of-speech tagging on the public opinion information, and keywords such as person names and organization names are extracted and labeled, which facilitates the subsequent weighted calculation of the comprehensive confidence. Furthermore, the LAC model is custom-trained with the preset public opinion information word segmentation dictionary, and this dictionary-based intervention can further improve the accuracy of public opinion information word segmentation.
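The labeling step that follows segmentation and tagging can be sketched as below. The sketch does not call the LAC model itself; it assumes the model has already returned parallel lists of words and tags, and that proper nouns carry tags of the form `PER`, `LOC`, `ORG` and `TIME`. The mapping `TAG_TO_LABEL` and the function `label_words` are hypothetical.

```python
from typing import List, Tuple

# Assumed mapping from tagger output to the labels that carry preset weights.
TAG_TO_LABEL = {
    "PER": "person",
    "LOC": "place",
    "ORG": "organization",
    "TIME": "time",
}


def label_words(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Keep only the words whose tag maps to a weighted label."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return [(w, TAG_TO_LABEL[t]) for w, t in zip(words, tags) if t in TAG_TO_LABEL]
```

Words with other tags, such as ordinary verbs, are dropped here because they have no preset weight in the confidence calculation.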
In some embodiments, the priority level of public opinion information is obtained by:
carrying out emotion analysis on public opinion information based on an emotion dictionary to obtain emotion analysis results of the public opinion information;
setting the priority level of the public opinion information as high priority under the condition that the emotion analysis result is negative emotion;
setting the priority level of the public opinion information as a medium priority under the condition that the emotion analysis result is a neutral emotion;
and setting the priority level of the public opinion information as low priority under the condition that the emotion analysis result is positive emotion.
In this embodiment, referring to fig. 3, emotion analysis of the public opinion information is performed using the SnowNLP emotion dictionary, which quickly captures the emotional features in sentences and yields an emotion score for the public opinion information.
In one example, public opinion information whose emotion score is in the range of 0 to 0.4 may be assigned negative emotion, public opinion information whose emotion score is in the range of 0.4 to 0.7 neutral emotion, and public opinion information whose emotion score is in the range of 0.7 to 1.0 positive emotion. Negative public opinion is given high priority, neutral public opinion medium priority and positive public opinion low priority, and in the subsequent auditing task the public opinion information abstracts are generated in order of priority from high to low.
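The score-to-priority mapping in this example can be sketched as follows. The example ranges share the endpoints 0.4 and 0.7, so the sketch assumes half-open intervals (a score of exactly 0.4 counts as neutral and exactly 0.7 as positive); that choice and all names are assumptions. The emotion score is taken as an input here rather than computed.

```python
from typing import List, Tuple

PRIORITY_BY_EMOTION = {"negative": "high", "neutral": "medium", "positive": "low"}
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def emotion_label(score: float) -> str:
    """Map an emotion score in [0, 1] to an emotion class (boundary handling is assumed)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("emotion score must lie in [0, 1]")
    if score < 0.4:
        return "negative"
    if score < 0.7:
        return "neutral"
    return "positive"


def priority_level(score: float) -> str:
    """Negative -> high, neutral -> medium, positive -> low priority."""
    return PRIORITY_BY_EMOTION[emotion_label(score)]


def processing_order(items: List[Tuple[str, float]]) -> List[str]:
    """Return texts sorted from high to low priority, keeping input order within a level."""
    ranked = sorted(items, key=lambda item: PRIORITY_RANK[priority_level(item[1])])
    return [text for text, _ in ranked]
```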
In some embodiments, the public opinion scene label of the public opinion information is obtained by:
and inputting the public opinion information into a pre-trained language representation BERT model to obtain a public opinion scene label of the public opinion information output by the pre-trained language representation BERT model.
In this embodiment, referring to fig. 3, the public opinion information is classified based on a pre-trained language representation BERT model. The BERT model is built from Transformer layers and is a bidirectional encoding model.
In this embodiment, the BERT model is trained on public opinion information collected with category labels according to a public opinion information classification label system, and classifies public opinion information into different label categories such as politics, society and entertainment. Targeted training is performed by combining the public opinion information domain with the public opinion information base, so as to improve the public opinion classification performance of the BERT model.
According to this embodiment, the priority level of the public opinion information is obtained by emotion analysis using the SnowNLP emotion dictionary, and the public opinion information is then classified by its corresponding scene using the BERT model, which greatly improves the running efficiency of the auditing system and reduces the complexity of subsequent processing.
In some embodiments, the preset generation type abstract model includes a pre-trained language representation BERT model and a pointer generation network model, and the public opinion information abstract of the public opinion information is obtained by the following ways:
obtaining word vectors of the public opinion information through the pre-trained language representation BERT model, and obtaining sentence feature scores of the public opinion information by utilizing multidimensional semantic features;
and splicing the word vector and the sentence characteristic score into a target input sequence, and processing the target input sequence through a pointer generation network model and a coverage mechanism to obtain a public opinion information abstract of the public opinion information.
Referring to fig. 4, in this embodiment, the pre-trained language representation BERT model and a pointer generation network (PGN) model are combined to form the preset generated abstract model. The public opinion information is input into this model in order of priority to extract the public opinion information abstract, compressing the public opinion content so that the information is condensed and can be read quickly.
First, the public opinion information stored in the public opinion information base is input in order of priority from high to low. Word vectors of the public opinion information are obtained using the BERT model, the sentences in the public opinion information are scored using multidimensional semantic features to obtain sentence feature scores, and the word vectors and sentence feature scores are spliced to obtain an input sequence. The input sequence is then fed into the pointer generation network model combined with a coverage mechanism, which alleviates the problems of out-of-vocabulary words and repeated phrases during abstract generation while retaining the ability to generate new words, and the public opinion information abstract is finally obtained.
The preset generated abstract model provided by this embodiment differs from a purely extractive abstract model: the pointer generation network (PGN) combines the advantages of the extractive and generative approaches, retaining the ability to generate text while also being able to copy words from the original text. Freed from the constraints of the original wording, the generated sentences are more flexible while remaining semantically natural.
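How a pointer generation network can both copy and generate, and how a coverage mechanism discourages repetition, can be illustrated with the commonly published formulation of these two ideas. The application does not spell out its own equations, so the following is a sketch of that standard formulation under stated assumptions, with hypothetical function names and plain Python lists in place of tensors.

```python
from collections import defaultdict
from typing import Dict, List


def final_distribution(p_gen: float, vocab_dist: Dict[str, float],
                       source_tokens: List[str],
                       attention: List[float]) -> Dict[str, float]:
    """Mix generation and copying: p_gen * P_vocab(w) + (1 - p_gen) * attention on w."""
    out: Dict[str, float] = defaultdict(float)
    for word, prob in vocab_dist.items():
        out[word] += p_gen * prob
    # Source tokens receive copy probability even if they are outside the vocabulary.
    for token, weight in zip(source_tokens, attention):
        out[token] += (1.0 - p_gen) * weight
    return dict(out)


def coverage_loss(attention: List[float], coverage: List[float]) -> float:
    """Penalize attending again to source positions that are already covered."""
    return sum(min(a, c) for a, c in zip(attention, coverage))


def update_coverage(coverage: List[float], attention: List[float]) -> List[float]:
    """Coverage is the running sum of attention over previous decoding steps."""
    return [c + a for c, a in zip(coverage, attention)]
```

In this formulation a source word that is missing from the vocabulary can still be emitted through the copy term, which is how the out-of-vocabulary problem is eased, and the coverage penalty grows when the decoder revisits the same source positions.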
The public opinion information auditing device provided by the embodiment of the application is described below, and the public opinion information auditing device described below and the public opinion information auditing method described above can be correspondingly referred to each other.
Referring to fig. 5, fig. 5 is a block diagram of a public opinion information auditing apparatus according to an embodiment of the present application, where the public opinion information auditing apparatus according to the embodiment of the present application includes:
the public opinion classification module 510 is configured to determine a priority level and a public opinion scene label of each piece of public opinion information, and store the public opinion information, the priority level of the public opinion information, and the public opinion scene label in a public opinion information base in an associated manner, where the priority level of the public opinion information is obtained by performing emotion analysis on the public opinion information;
the public opinion summary module 520 is configured to obtain a public opinion information summary of each piece of public opinion information in the public opinion information base according to a preset processing sequence based on a preset generated summary model, where the preset processing sequence is a sequence from high priority to low priority;
the keyword matching module 530 is configured to match, based on a finite automaton DFA algorithm, the public opinion information abstract of the public opinion information with keywords in a preset keyword library corresponding to public opinion scene tags of the public opinion information according to the preset processing sequence, so as to obtain at least one target keyword that is successfully matched;
and the public opinion auditing module 540 is configured to determine a comprehensive confidence level of the public opinion information based on at least one target keyword of the public opinion information, a priority level of the public opinion information, and a public opinion information abstract of the public opinion information, and audit the public opinion information according to the comprehensive confidence level.
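The cooperation of the four modules can be sketched as a single loop. This is only an outline under stated assumptions: the summarizing, matching and confidence functions are passed in as callables, the class `Record` and function `audit_all` are hypothetical, and treating a confidence at or above a threshold as a violation is an assumption, since the text only states that thresholds are set.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Record:
    """One entry of the public opinion information base."""
    text: str
    priority: str      # stored by the classification module
    scene_label: str   # stored by the classification module


def audit_all(records: List[Record],
              summarize: Callable[[str], str],
              match: Callable[[str, str], List[str]],
              confidence: Callable[[List[str], str, str], float],
              threshold: float) -> List[Tuple[str, float, bool]]:
    """Process records from high to low priority and flag those at or above the threshold."""
    results = []
    for rec in sorted(records, key=lambda r: PRIORITY_RANK[r.priority]):
        abstract = summarize(rec.text)                       # abstract module
        keywords = match(abstract, rec.scene_label)          # keyword matching module
        score = confidence(keywords, rec.priority, abstract) # auditing module
        results.append((rec.text, score, score >= threshold))
    return results
```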
The public opinion information auditing device provided by the embodiment of the application determines the priority level and the public opinion scene label of each piece of public opinion information, and stores the public opinion information, its priority level and its public opinion scene label in a public opinion information base, wherein the priority level is obtained by carrying out emotion analysis on the public opinion information. The public opinion information in the public opinion information base is input into a preset generated abstract model in a preset processing sequence, namely from high priority to low priority, to obtain the public opinion information abstract of each piece of public opinion information output in turn by the model. Based on a deterministic finite automaton (DFA) algorithm, the public opinion information abstract is then matched, in the same preset processing sequence, against the keywords in the preset keyword library corresponding to the public opinion scene label of the public opinion information, to obtain at least one successfully matched target keyword. Because the priority level of each piece of public opinion information is determined through emotion analysis and the pieces are subsequently processed from high to low priority, the efficiency and speed of content auditing are improved. Further, during public opinion information auditing, on the one hand, producing the public opinion information abstract in a generative manner captures deep semantic information, so that implicit semantics, variant words and connotations in the text are better understood, and keyword matching is performed in a targeted manner using the preset keyword library corresponding to the public opinion scene label, which improves content auditing efficiency while maintaining recognition accuracy. On the other hand, the comprehensive confidence of the public opinion information is determined from the target keywords, the priority level and the public opinion information abstract, which helps avoid events such as misjudging positive content and reduces misjudgment.
In one embodiment, the public opinion information base further stores a tag of at least one word in the public opinion information; the labels of the words are obtained based on text word segmentation and part-of-speech tagging of the public opinion information, and different types of labels of the words are provided with different preset weights, wherein the preset weights are used for determining comprehensive confidence of the public opinion information.
In one embodiment, the public opinion auditing module is further configured to determine a target label of each target keyword of the public opinion information, where the target label of the target keyword is a label of a word successfully matched with the target keyword in the public opinion information; determining the keyword number corresponding to each target keyword, the preset weight corresponding to the target label of each target keyword, the preset index corresponding to the priority level of the public opinion information and the total word number of the public opinion information abstract of the public opinion information; and carrying out weighted calculation on the preset index and the total word number according to the keyword number and the preset weight corresponding to each target keyword to obtain the comprehensive confidence coefficient of the public opinion information.
In one embodiment, the public opinion information auditing device is further configured to acquire at least one piece of public opinion information crawled by the web crawler; perform text word segmentation and part-of-speech tagging on each piece of public opinion information through a pre-trained text word segmentation LAC model, and label the part-of-speech-tagged words to obtain a label of at least one word in the public opinion information; wherein the pre-trained text word segmentation LAC model is obtained by training an initial text word segmentation LAC model by using a preset public opinion information word segmentation dictionary.
In one embodiment, the public opinion classification module is further configured to perform emotion analysis on public opinion information based on an emotion dictionary, so as to obtain an emotion analysis result of the public opinion information; setting the priority level of the public opinion information as high priority under the condition that the emotion analysis result is negative emotion; setting the priority level of the public opinion information as a medium priority under the condition that the emotion analysis result is a neutral emotion; and setting the priority level of the public opinion information as low priority under the condition that the emotion analysis result is positive emotion.
In one embodiment, the public opinion classification module is further configured to input the public opinion information into a pre-trained language representation BERT model, and obtain a public opinion scene tag of the public opinion information output by the pre-trained language representation BERT model.
In one embodiment, the preset generated abstract model comprises a pre-trained language representation BERT model and a pointer generation network model, and the public opinion abstract module is further used for obtaining word vectors of the public opinion information through the pre-trained language representation BERT model and obtaining sentence feature scores of the public opinion information by utilizing multidimensional semantic features; and splicing the word vector and the sentence characteristic score into a target input sequence, and processing the target input sequence through a pointer generation network model and a coverage mechanism to obtain a public opinion information abstract of the public opinion information.
Fig. 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630 and a communication bus 640, wherein the processor 610, the communication interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may call a computer program in the memory 630 to perform the steps of the public opinion information auditing method, for example, including:
determining the priority level and the public opinion scene label of each piece of public opinion information, and storing the public opinion information, the priority level of the public opinion information and the public opinion scene label in a public opinion information base in an associated manner, wherein the priority level of the public opinion information is obtained by carrying out emotion analysis on the public opinion information;
based on a preset generated abstract model, obtaining public opinion information abstracts of each piece of public opinion information in the public opinion information base according to a preset processing sequence, wherein the preset processing sequence is the sequence from high priority to low priority;
based on a finite automaton DFA algorithm, matching the public opinion information abstract of the public opinion information with keywords in a preset keyword library corresponding to public opinion scene labels of the public opinion information according to the preset processing sequence to obtain at least one target keyword successfully matched;
and determining the comprehensive confidence of the public opinion information based on at least one target keyword of the public opinion information, the priority level of the public opinion information and the public opinion information abstract of the public opinion information, and auditing the public opinion information according to the comprehensive confidence.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In another aspect, an embodiment of the present application further provides a non-transitory computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the public opinion information auditing method provided in the foregoing embodiments, for example, including:
determining the priority level and the public opinion scene label of each piece of public opinion information, and storing the public opinion information, the priority level of the public opinion information and the public opinion scene label in a public opinion information base in an associated manner, wherein the priority level of the public opinion information is obtained by carrying out emotion analysis on the public opinion information;
based on a preset generated abstract model, obtaining public opinion information abstracts of each piece of public opinion information in the public opinion information base according to a preset processing sequence, wherein the preset processing sequence is the sequence from high priority to low priority;
based on a finite automaton DFA algorithm, matching the public opinion information abstract of the public opinion information with keywords in a preset keyword library corresponding to public opinion scene labels of the public opinion information according to the preset processing sequence to obtain at least one target keyword successfully matched;
and determining the comprehensive confidence of the public opinion information based on at least one target keyword of the public opinion information, the priority level of the public opinion information and the public opinion information abstract of the public opinion information, and auditing the public opinion information according to the comprehensive confidence.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. The public opinion information auditing method is characterized by comprising the following steps:
determining the priority level and the public opinion scene label of each piece of public opinion information, and storing the public opinion information, the priority level of the public opinion information and the public opinion scene label in a public opinion information base in an associated manner, wherein the priority level of the public opinion information is obtained by carrying out emotion analysis on the public opinion information;
based on a preset generated abstract model, obtaining public opinion information abstracts of each piece of public opinion information in the public opinion information base according to a preset processing sequence, wherein the preset processing sequence is the sequence from high priority to low priority;
based on a finite automaton DFA algorithm, matching the public opinion information abstract of the public opinion information with keywords in a preset keyword library corresponding to public opinion scene labels of the public opinion information according to the preset processing sequence to obtain at least one target keyword successfully matched;
and determining the comprehensive confidence of the public opinion information based on at least one target keyword of the public opinion information, the priority level of the public opinion information and the public opinion information abstract of the public opinion information, and auditing the public opinion information according to the comprehensive confidence.
2. The public opinion information auditing method according to claim 1, characterized in that the public opinion information base also stores a tag of at least one word in the public opinion information;
the labels of the words are obtained based on text word segmentation and part-of-speech tagging of the public opinion information, and different types of labels of the words are provided with different preset weights, wherein the preset weights are used for determining comprehensive confidence of the public opinion information.
3. The public opinion information auditing method of claim 2, characterized in that the determining the comprehensive confidence of the public opinion information based on at least one target keyword of the public opinion information, a priority level of the public opinion information, and a public opinion information abstract of the public opinion information comprises:
determining target labels of all target keywords of the public opinion information, wherein the target labels of the target keywords are labels of words successfully matched with the target keywords in the public opinion information;
determining the keyword number corresponding to each target keyword, the preset weight corresponding to the target label of each target keyword, the preset index corresponding to the priority level of the public opinion information and the total word number of the public opinion information abstract of the public opinion information;
and carrying out weighted calculation on the preset index and the total word number according to the keyword number and the preset weight corresponding to each target keyword to obtain the comprehensive confidence coefficient of the public opinion information.
4. The public opinion information auditing method according to claim 2, characterized in that the label of at least one word in the public opinion information is obtained by:
acquiring at least one piece of public opinion information crawled by a web crawler;
performing text word segmentation and part-of-speech tagging on each piece of public opinion information through a pre-trained text word segmentation LAC model, and labeling words with part-of-speech tagging to obtain a label of at least one word in the public opinion information;
the pre-trained text word segmentation LAC model is obtained by training an initial text word segmentation LAC model by using a preset public opinion information word segmentation dictionary.
5. The public opinion information auditing method according to claim 1, characterized in that the priority level of public opinion information is obtained by:
carrying out emotion analysis on public opinion information based on an emotion dictionary to obtain emotion analysis results of the public opinion information;
setting the priority level of the public opinion information as high priority under the condition that the emotion analysis result is negative emotion;
setting the priority level of the public opinion information as a medium priority under the condition that the emotion analysis result is a neutral emotion;
and setting the priority level of the public opinion information as low priority under the condition that the emotion analysis result is positive emotion.
6. The public opinion information auditing method according to claim 1, characterized in that the public opinion scene label of the public opinion information is obtained by:
and inputting the public opinion information into a pre-trained language representation BERT model to obtain a public opinion scene label of the public opinion information output by the pre-trained language representation BERT model.
7. The public opinion information auditing method according to any of claims 1 to 6, characterized in that the preset generation type abstract model comprises a pre-trained language representation BERT model and a pointer-generator network model, and the public opinion information abstract is obtained by:
obtaining word vectors of the public opinion information through the pre-trained language representation BERT model, and obtaining sentence feature scores of the public opinion information by utilizing multidimensional semantic features;
and splicing the word vectors and the sentence feature scores into a target input sequence, and processing the target input sequence through the pointer-generator network model and a convergence mechanism to obtain a public opinion information abstract of the public opinion information.
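For orientation, the core step of a pointer-generator network, as the technique is generally described, mixes a vocabulary distribution with a copy distribution over the source tokens. The sketch below shows only that mixing step in plain Python with made-up numbers. It does not reproduce the BERT encoding, the sentence feature scores or the additional mechanism named in the claim, and the function and parameter names are hypothetical.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy distributions for one decoding step.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary word -> generation probability
    attention     -- attention weights over source positions (sum to 1)
    source_tokens -- source tokens aligned with `attention`
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have the same length")
    # Generation part: scale every vocabulary probability by p_gen.
    mixed = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention mass to each source token,
    # which lets out-of-vocabulary source words receive probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


# Illustrative values: "Hangzhou" is not in the vocabulary but can still be copied.
dist = final_distribution(
    p_gen=0.5,
    vocab_dist={"network": 0.5, "fault": 0.5},
    attention=[0.25, 0.75],
    source_tokens=["Hangzhou", "fault"],
)
```

The copy path is what allows an abstract to contain names and terms from the original public opinion text that a fixed vocabulary would otherwise miss.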
8. A public opinion information auditing device, characterized by comprising:
the public opinion classification module is used for determining the priority level and the public opinion scene label of each piece of public opinion information, and storing the public opinion information, the priority level of the public opinion information and the public opinion scene label in a public opinion information base in a correlated manner, wherein the priority level of the public opinion information is obtained by carrying out emotion analysis on the public opinion information;
the public opinion abstract module is used for acquiring public opinion information abstracts of each piece of public opinion information in the public opinion information base according to a preset processing sequence based on a preset generation type abstract model, wherein the preset processing sequence is the sequence from high priority to low priority;
the keyword matching module is used for matching, based on a deterministic finite automaton (DFA) algorithm and according to the preset processing sequence, the public opinion information abstract of the public opinion information against keywords in a preset keyword library corresponding to the public opinion scene label of the public opinion information, so as to obtain at least one successfully matched target keyword;
and the public opinion auditing module is used for determining the comprehensive confidence level of the public opinion information based on at least one target keyword of the public opinion information, the priority level of the public opinion information and the public opinion information abstract of the public opinion information, and auditing the public opinion information according to the comprehensive confidence level.
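The keyword matching performed by the keyword matching module can be illustrated with a trie-shaped state table, a common way of implementing DFA-style keyword matching. This is a sketch under assumptions: the keyword library contents, the scene label name and the function names are hypothetical, and the patent does not specify this particular data layout.

```python
END = "__end__"  # multi-character marker key; cannot collide with single-character transitions

# Hypothetical keyword libraries, one per public opinion scene label.
KEYWORD_LIBRARIES = {
    "network_quality": ["outage", "out", "refund"],
}


def build_dfa(keywords):
    """Build a nested-dict state table: each character is one deterministic transition."""
    root = {}
    for keyword in keywords:
        node = root
        for ch in keyword:
            node = node.setdefault(ch, {})
        node[END] = keyword  # accepting state records the matched keyword
    return root


def match_keywords(text, dfa):
    """Return every keyword occurrence found in `text`, in order of discovery."""
    hits = []
    for start in range(len(text)):
        node = dfa
        for ch in text[start:]:
            node = node.get(ch)
            if node is None:
                break  # no transition: abandon this start position
            if END in node:
                hits.append(node[END])
    return hits


dfa = build_dfa(KEYWORD_LIBRARIES["network_quality"])
targets = match_keywords("an outage, then a refund", dfa)
```

Building one state table per scene label means each abstract is scanned only against the keyword library selected by its own label.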
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the public opinion information auditing method of any of claims 1-7 when executing the computer program.
10. A non-transitory computer readable storage medium comprising a computer program, characterized in that the computer program when executed by a processor implements the public opinion information auditing method of any of claims 1 to 7.
CN202310907670.0A 2023-07-21 2023-07-21 Public opinion information auditing method, public opinion information auditing device, electronic equipment and storage medium Pending CN117151109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310907670.0A CN117151109A (en) 2023-07-21 2023-07-21 Public opinion information auditing method, public opinion information auditing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310907670.0A CN117151109A (en) 2023-07-21 2023-07-21 Public opinion information auditing method, public opinion information auditing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117151109A true CN117151109A (en) 2023-12-01

Family

ID=88885681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310907670.0A Pending CN117151109A (en) 2023-07-21 2023-07-21 Public opinion information auditing method, public opinion information auditing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117151109A (en)

Similar Documents

Publication Publication Date Title
CN111291195B (en) Data processing method, device, terminal and readable storage medium
CN104881458B (en) A kind of mask method and device of Web page subject
CN110287314B (en) Long text reliability assessment method and system based on unsupervised clustering
CN109271524B (en) Entity linking method in knowledge base question-answering system
CN112069312B (en) Text classification method based on entity recognition and electronic device
US10915756B2 (en) Method and apparatus for determining (raw) video materials for news
CN112989208B (en) Information recommendation method and device, electronic equipment and storage medium
CN112131876A (en) Method and system for determining standard problem based on similarity
CN111522919A (en) Text processing method, electronic equipment and storage medium
CN113590810A (en) Abstract generation model training method, abstract generation device and electronic equipment
CN114997288A (en) Design resource association method
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN109284389A (en) A kind of information processing method of text data, device
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN115759071A (en) Government affair sensitive information identification system and method based on big data
CN113486174B (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN114880496A (en) Multimedia information topic analysis method, device, equipment and storage medium
CN114298021A (en) Rumor detection method based on sentiment value selection comments
CN114579695A (en) Event extraction method, device, equipment and storage medium
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN114970516A (en) Data enhancement method and device, storage medium and electronic equipment
CN114841143A (en) Voice room quality evaluation method and device, equipment, medium and product thereof
CN114255067A (en) Data pricing method and device, electronic equipment and storage medium
CN114298048A (en) Named entity identification method and device
CN117151109A (en) Public opinion information auditing method, public opinion information auditing device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination