CN112581006A - Public opinion engine and method for screening public opinion information and monitoring enterprise main body risk level

Info

Publication number: CN112581006A
Application number: CN202011562957.7A
Authority: CN
Inventors: 吴美娟
Original assignee: Hangzhou Hengtai Software Co ltd
Current assignee: Hangzhou Hengtai Software Co ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2021-03-30

Abstract

The invention relates to a public opinion engine and a method for screening public opinion information and monitoring the risk level of enterprise subjects. The public opinion engine comprises: a subject emotion classification module, which comprises several emotion classification models of different types and classifies the emotional tendency of the acquired public opinion information; a theme classification module, which performs single-theme or multi-theme classification on the acquired public opinion information; a named body recognition module, which recognizes named bodies and calculates the closeness between each named body and the public opinion information; a public opinion risk scoring module, which obtains the risk level of the public opinion information containing a named body; a similarity retrieval module, which calculates the similarity between different pieces of acquired public opinion information and screens online public opinion information; and an enterprise subject risk level monitoring module, which obtains the current risk levels of different enterprise subjects and monitors them in real time. The invention can quickly screen specified relevant information from massive news data in real time and monitor the risk level of an enterprise subject in real time.

Description

Public opinion engine and method for screening public opinion information and monitoring enterprise main body risk level

Technical Field

The invention relates to the technical field of computers, in particular to a public opinion engine and a public opinion method for screening public opinion information and monitoring enterprise subject risk level.

Background

Public opinion information is displayed to remind risk control personnel to pay attention to it; the displayed information includes the subject name, the public opinion content, the degree of verification, the disclosure time of the message, and so on. Existing public opinion engines usually adopt NLP and ML technologies, combine them with financial domain knowledge, and build algorithm models around the pain points of various business scenarios in order to analyze news accurately. At present, however, most public opinion engines on the market value the quantity of news over its quality and blindly push massive amounts of news, so similar news is repeated frequently and early warnings are often delivered inefficiently or incorrectly. Furthermore, users find it difficult to grasp the key points of the news, are heavily disturbed by irrelevant news, and are easily misled by it.

Disclosure of Invention

The invention aims to provide a public opinion engine and a public opinion method for screening public opinion information and monitoring the risk level of an enterprise main body.

In order to achieve the above object, the present invention provides a public opinion engine for screening public opinion information and monitoring the risk level of an enterprise subject, comprising:

the main body emotion classification module comprises a plurality of classified emotion classification models and is used for classifying the emotional tendency of the acquired public opinion information;

the theme classification module is used for carrying out single theme classification or multi-theme classification on the acquired public opinion information;

the named body recognition module is used for recognizing the named body and calculating the closeness of the named body and the public opinion information;

the public opinion risk scoring module is used for acquiring the risk level of the public opinion information containing the named entity;

the similarity retrieval module is used for carrying out similarity calculation on the obtained different public opinion information and carrying out online public opinion information screening;

and the enterprise subject risk level monitoring module is used for acquiring the current risk levels of different enterprise subjects and dynamically monitoring the risk level change of the enterprise subject corresponding to each name body.

According to one aspect of the invention, the main body emotion classification module is obtained by adopting the following steps:

constructing a training sample set, and labeling samples in the sample set with positive, neutral and negative categories;

dividing the sample set, performing an optimal search over the parameter grid of each emotion classification model by means of cross validation, validating the emotion classification models with a validation set, and taking the model with the best-performing parameters as the optimal model;

and the subject emotion classification module applies the majority voting rule to the prediction results of all the optimal emotion classification models and takes the result as the final emotional tendency of the subject.

According to one aspect of the invention, the named body recognition module extracts the named body in the obtained key sentence after syntactic analysis is carried out on the obtained public opinion information, and calculates the closeness between the named body and the public opinion information.

According to one aspect of the invention, the public opinion risk scoring module comprises:

the keyword dictionary is used for extracting keywords and calculating word scores of the keywords in the public opinion information;

the negative event library is used for acquiring negative events related to the named body over past years;

the public opinion risk scoring module scores key sentences in public opinion information based on the named body recognition module, the keyword dictionary and the negative event library to obtain sentence scores, and obtains risk grades of the public opinion information containing the named bodies based on the sentence scores.

According to an aspect of the present invention, the public opinion risk scoring module, based on the named body recognition module, the keyword dictionary and the negative event library, scores the key sentences to obtain sentence scores, includes:

acquiring a named body of a key sentence in the public opinion information and the closeness between the named body and the public opinion information based on the named body identification module;

acquiring keywords, word scores and word frequencies of key sentences in the public opinion information based on the keyword dictionary;

acquiring negative events of key sentences in the public opinion information based on the negative event library;

and scoring key sentences in the public opinion information based on the naming body, the closeness, the keywords, the word score, the word frequency and the negative events to obtain the sentence score.

According to one aspect of the invention, the public opinion risk scoring module scores key sentences in the public opinion information through a sentence scoring formula;

the sentence score formula is:

$$\text{sentence score} = K \times \max\Big(\max\big(\text{keyscore} \times (1 + (\text{word frequency} - 1)/10)\big) \times 0.8,\ \max(\text{scenescore})\Big)$$

where K represents the closeness of the sentence to the named body, keyscore represents a word score, and scenescore represents a negative event score;
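The sentence scoring formula can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name, the argument layout (keywords as pairs of word score and word frequency) and the choice to treat a missing keyword or event list as a score of 0 are assumptions made for the example.

```python
def sentence_score(closeness, keywords, scene_scores):
    """Score one key sentence.

    closeness    -- K, the closeness of the sentence to the named body
    keywords     -- list of (keyscore, word_frequency) pairs found in the sentence
    scene_scores -- list of negative event scores (scenescore) found in the sentence
    Empty lists contribute 0; that fallback is an assumption of this sketch.
    """
    # Best keyword contribution: the word score is boosted by 10% for every
    # repetition beyond the first, then damped by the factor 0.8.
    keyword_part = max(
        (score * (1 + (freq - 1) / 10) for score, freq in keywords),
        default=0.0,
    ) * 0.8
    # Best negative event contribution.
    scene_part = max(scene_scores, default=0.0)
    # The larger of the two parts is weighted by the closeness K.
    return closeness * max(keyword_part, scene_part)


example = sentence_score(1.0, [(0.5, 3), (0.4, 1)], [0.3])
print(round(example, 4))
```

In this example the keyword part dominates, because the boosted and damped word score exceeds the only negative event score.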

according to one aspect of the invention, the public opinion risk scoring module integrates the named body contents of the scored key sentences, scores the integrated named body contents and obtains the risk level of the public opinion information comprising the named body.

According to an aspect of the invention, the public opinion risk scoring module in the process of integrating the named contents of the scored key sentences comprises the following steps:

the public opinion risk scoring module examines each scored key sentence: it judges whether the key sentence is a question; if so, the key sentence is ignored, otherwise it is kept;

it judges whether the key sentence is an example sentence; if so, the key sentence is ignored, otherwise it is kept;

and, based on these judgments, the retained key sentences that relate to the same named body are merged in the order in which they appear in the public opinion information.
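The integration steps above can be sketched as a filter followed by a grouping. The dictionary keys used here (`text`, `named_body`, `is_question`, `is_example`) are hypothetical; the sketch assumes an upstream step has already decided whether each sentence is a question or an example sentence.

```python
def merge_by_named_body(sentences):
    """Drop question and example sentences, then group the remaining
    sentences by named body, preserving document order.

    sentences -- list of dicts in document order, each with the hypothetical
                 keys "text", "named_body", "is_question" and "is_example"
    Returns a dict mapping each named body to its list of retained sentences.
    """
    merged = {}
    for sentence in sentences:
        # Questions and example sentences are ignored outright.
        if sentence["is_question"] or sentence["is_example"]:
            continue
        # Python dicts keep insertion order, so both the named bodies and
        # their sentences stay in the order they appear in the document.
        merged.setdefault(sentence["named_body"], []).append(sentence["text"])
    return merged
```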

According to one aspect of the present invention, in the process of scoring the content of the integrated named body to obtain the risk level of the public opinion information including the named body, the risk score of the named body is obtained through a named body risk score formula and a corresponding risk level is obtained, wherein the named body risk score formula is as follows:

$$\text{named body risk score} = \min\Big(1,\ \max(\text{scores of all sentences under the same named body}) \times \big(1 + \min(1,\ (n - 1)/10)\big) + n \times \text{average extra score of the public opinion information}\Big)$$

where n is the number of sentences under the same named body;

the extra score of the public opinion information is calculated as:

$$\text{extra score} = \max\big(\text{word score} \times (1 + (\text{word frequency} - 1)/10)\big) \times \min\big(2,\ 1 + (\text{word frequency of the high-scoring word} - 1)/10\big) \times 0.8$$

where the word score and the word frequency are obtained from the keywords, extracted with the keyword dictionary, that appear in the remaining sentences, and the high-scoring word is the keyword that attains the maximum in max(word score × (1 + (word frequency − 1)/10)).
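A minimal Python sketch of the two formulas follows. It rests on one reading of the formula text: that the last term of the named body risk score multiplies the number of sentences by the average extra score. That reading, the function names and the zero fallback for empty inputs are assumptions of this example rather than statements from the source.

```python
def extra_score(keywords):
    """Extra score from keywords found in the remaining sentences.

    keywords -- list of (word_score, word_frequency) pairs
    Returns 0.0 when there are no keywords (an assumption of this sketch).
    """
    if not keywords:
        return 0.0
    # Frequency-boosted score of every keyword, paired with its frequency.
    weighted = [(score * (1 + (freq - 1) / 10), freq) for score, freq in keywords]
    # The "high-scoring word" is the keyword with the largest boosted score.
    top_value, top_freq = max(weighted, key=lambda pair: pair[0])
    return top_value * min(2, 1 + (top_freq - 1) / 10) * 0.8


def named_body_risk_score(sentence_scores, extra_scores):
    """Risk score of one named body, capped at 1.

    sentence_scores -- scores of all key sentences under the named body
    extra_scores    -- extra scores whose average enters the formula
    """
    n = len(sentence_scores)
    if n == 0:
        return 0.0
    # Highest sentence score, boosted by up to 100% as more sentences
    # mention the same named body.
    base = max(sentence_scores) * (1 + min(1, (n - 1) / 10))
    average_extra = sum(extra_scores) / len(extra_scores) if extra_scores else 0.0
    # Assumed reading: sentence count times the average extra score.
    return min(1.0, base + n * average_extra)
```

Because the result is capped at 1, a named body that is mentioned in many high-scoring sentences saturates rather than growing without bound.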

According to an aspect of the invention, in the process of calculating the word score of the keyword in the public opinion information, a word score formula is adopted to obtain the word score, wherein the word score formula is as follows:

$$\text{word score} = 1/\text{word rank} + 0.5 \times \text{word emotion} + \text{topic risk}$$

According to an aspect of the invention, the process of calculating the closeness of the named body and the public opinion information comprises the following steps:

judging whether the sentence in which the named body is located expresses a viewpoint; if so, proceeding to the next step, otherwise outputting a preset first closeness value;

judging whether the sentence in which the named body is located is an interrogative sentence, a conditional sentence or an example sentence; if not, proceeding to the next step, otherwise outputting the preset first closeness value;

judging whether the named body carries a suffix word; if not, proceeding to the next step, otherwise outputting the preset first closeness value;

judging whether the named body is the only one in its sentence; if so, judging whether the syntactic structure of the sentence satisfies the subject-predicate relation, and outputting a preset second closeness value if it does and the preset first closeness value otherwise; if the sentence contains several named bodies, judging whether the sentence has a parallel structure; if so, splitting the structure of the sentence and determining whether a main subject exists, and outputting the preset second closeness value if it does and a preset third closeness value otherwise; and if the sentence does not have a parallel structure, outputting the preset second closeness value.
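The decision procedure above can be sketched as a chain of early returns. The preset values 0, 1 and 0.3 are the example values given in the detailed description; the `SentenceFeatures` class and its field names are hypothetical, and the sketch assumes that a syntactic parser has already produced these boolean features.

```python
from dataclasses import dataclass

# Example preset closeness values taken from the embodiment.
FIRST, SECOND, THIRD = 0.0, 1.0, 0.3


@dataclass
class SentenceFeatures:
    """Hypothetical parser output for the sentence containing the named body."""
    has_viewpoint: bool
    is_question_conditional_or_example: bool
    entity_has_suffix_word: bool
    entity_count: int
    subject_predicate_ok: bool = False  # used when there is one named body
    is_parallel: bool = False           # used when there are several
    has_main_subject: bool = False      # used for parallel structures


def closeness(f):
    """Return the closeness between a named body and the text."""
    if not f.has_viewpoint:
        return FIRST
    if f.is_question_conditional_or_example:
        return FIRST
    if f.entity_has_suffix_word:
        return FIRST
    if f.entity_count == 1:
        # Single named body: it must satisfy the subject-predicate relation.
        return SECOND if f.subject_predicate_ok else FIRST
    if f.is_parallel:
        # Several named bodies in a parallel structure: full closeness only
        # if a main subject exists after splitting the structure.
        return SECOND if f.has_main_subject else THIRD
    # Several named bodies, no parallel structure.
    return SECOND
```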

According to one aspect of the invention, the similarity retrieval module is used for calculating the similarity of the public opinion information and carrying out real-time public opinion information screening;

the similarity calculation process of the similarity retrieval module for public opinion information comprises the following steps:

calculating the similarity between any two pieces of public opinion information, wherein if the title similarity or the text similarity is greater than a preset threshold value, the two pieces are defined as similar, and otherwise as not similar;

constructing the pieces of public opinion information that have a similar relation into a similar set;

sorting the public opinion information in the similar set by publication time, retaining the earliest piece as a comparison sample, and deleting the remaining public opinion information in the similar set;

the process by which the similarity retrieval module screens public opinion information in real time comprises:

acquiring online public opinion information, and calculating its similarity against the comparison samples to construct a real-time public opinion set.
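The deduplication and online screening steps can be sketched as follows. The source does not name a similarity measure, so this example substitutes the character-based ratio of `difflib.SequenceMatcher` from the Python standard library; the threshold of 0.8, the dictionary keys and the rule that an online item is kept only when it resembles no comparison sample are likewise assumptions.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Two items are similar if either the titles or the texts exceed the threshold."""
    title = SequenceMatcher(None, a["title"], b["title"]).ratio()
    text = SequenceMatcher(None, a["text"], b["text"]).ratio()
    return title > threshold or text > threshold


def build_comparison_samples(items, threshold=0.8):
    """Group similar items into sets and keep the earliest item of each set."""
    parent = list(range(len(items)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similar(items[i], items[j], threshold):
                parent[find(j)] = find(i)

    earliest = {}
    for i, item in enumerate(items):
        root = find(i)
        if root not in earliest or item["published"] < earliest[root]["published"]:
            earliest[root] = item
    return list(earliest.values())


def screen_online(item, samples, threshold=0.8):
    """Keep an incoming item only if it resembles no comparison sample."""
    if any(similar(item, sample, threshold) for sample in samples):
        return False
    samples.append(item)
    return True
```

The pairwise comparison is quadratic in the number of items, which is acceptable for a sketch; a production system would more likely rely on an index or hashing scheme to find candidate pairs.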

According to one aspect of the invention, the public opinion information in the comparative sample set is grouped according to enterprise subjects based on the subject emotion classification module, the theme classification module, the named body recognition module and the public opinion risk scoring module;

and acquiring the risk score of the enterprise subject corresponding to the current node according to the risk score of the named entity in the corresponding public opinion information, mapping the risk level of the enterprise subject based on the risk score of the enterprise subject, and outputting the risk level for dynamically monitoring the risk level change of the enterprise subject corresponding to each named entity.

In order to achieve the above object, the present invention provides a method for monitoring risk level of an enterprise subject using the public opinion engine, comprising:

S1, acquiring online public opinion information, calculating the label result of each dimension for the public opinion information, screening the public opinion information that meets the requirements according to preset label values for each dimension, and constructing an information set, wherein the dimension label results comprise emotional tendency, theme distribution, named body and risk score;

S2, performing similarity analysis on the information set, calculating the similarity between the pieces of public opinion information in the information set, eliminating similar public opinion information, and constructing a comparative sample set;

and S3, grouping the public opinion information in the comparative sample set by enterprise subject, calculating the risk score of each enterprise subject at the current node from the risk scores of the named bodies in the corresponding public opinion information, and mapping each enterprise subject risk score to a risk level, so as to dynamically monitor changes in the risk level of the enterprise subject corresponding to each named body.

According to an aspect of the present invention, in the step of acquiring online public opinion information and calculating each dimension label result of the public opinion information in step S1, the step of calculating the emotional tendency includes:

respectively identifying the public opinion information through the emotion classification model, and obtaining a prediction result;

and taking the prediction results of all emotion classification models as final emotional tendency through a majority voting rule.

According to an aspect of the present invention, in the step of acquiring online public opinion information and calculating each dimension label result of the public opinion information in step S1, the step of calculating the risk score includes:

acquiring a named body of a key sentence in the public opinion information and the closeness between the named body and the public opinion information based on a named body identification module;

acquiring keywords, word scores and word frequency in the key sentences based on a keyword dictionary in a public opinion risk scoring module;

acquiring negative events in the key sentences based on a negative event library in a public opinion risk scoring module;

scoring the key sentences based on the named object, the closeness, the keywords, the word scores, the word frequency and the negative events to obtain sentence scores of the key sentences.

According to an aspect of the present invention, in step S1, the public opinion risk scoring module performs named body content integration on the scored key sentences, and scores the integrated named body content to obtain a risk level of the public opinion information including the named body.

According to an aspect of the invention, the public opinion risk scoring module performs named body content integration on the scored key sentences, and the method comprises the following steps:

According to an aspect of the present invention, in step S2, performing similarity analysis on the information set, calculating the similarity between the pieces of public opinion information in the information set, eliminating similar public opinion information and constructing a comparison sample set comprises:

calculating the similarity between any two pieces of public opinion information based on the similarity retrieval module, wherein if the title similarity or the text similarity is greater than a preset threshold value, the existence of the similarity between the pieces of public opinion information is defined, otherwise, the similarity does not exist;

and constructing a comparison sample set based on the obtained comparison sample and the public opinion set sample without similar relation.

According to an aspect of the invention, in step S3, the public opinion information in the comparative sample set is grouped by enterprise subject based on the subject emotion classification module, the theme classification module, the named body recognition module and the public opinion risk scoring module;

for the pieces of public opinion information screened out for each enterprise subject, multiplying the named entity risk score of each piece by an attenuation coefficient obtained from the time interval between the current public opinion information and the earliest published public opinion information, sorting the results in ascending order, and taking the value at a certain quantile as candidate one for the enterprise subject risk score;

meanwhile, acquiring the public opinion information within a preset most recent time interval, and taking the maximum of the named entity risk scores multiplied by their corresponding attenuation coefficients as candidate two for the enterprise subject risk score;

taking the larger of the two candidates as the risk score of the enterprise subject at the current node;

and mapping the risk level of the current enterprise subject based on the enterprise subject risk score and outputting the risk level so as to dynamically monitor the risk level change of the enterprise subject corresponding to each name.
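The enterprise subject risk score can be sketched as below. The source leaves the attenuation function, the quantile, the recent time window and the score-to-level mapping unspecified, so the exponential decay with a 30-day half-life, the 0.9 quantile (nearest rank), the 7-day window and the level thresholds are all hypothetical values chosen for illustration.

```python
import math


def decay(age_days, half_life_days=30.0):
    """Hypothetical attenuation coefficient: halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def enterprise_risk_score(items, q=0.9, recent_days=7):
    """Combine decayed named entity risk scores for one enterprise subject.

    items -- list of (named_entity_risk_score, age_days) pairs, where
             age_days is the time interval that drives the attenuation
    """
    decayed = sorted(score * decay(age) for score, age in items)
    if not decayed:
        return 0.0
    # Candidate one: value at the chosen quantile of the ascending scores.
    index = max(0, math.ceil(q * len(decayed)) - 1)
    candidate_one = decayed[index]
    # Candidate two: largest decayed score within the recent window.
    candidate_two = max(
        (score * decay(age) for score, age in items if age <= recent_days),
        default=0.0,
    )
    return max(candidate_one, candidate_two)


def risk_level(score, thresholds=((0.7, "high"), (0.4, "medium"))):
    """Map a score to a level using hypothetical thresholds."""
    for threshold, name in thresholds:
        if score >= threshold:
            return name
    return "low"
```

Taking the larger of the two candidates means that a single severe recent item can raise the subject's level immediately, while the quantile keeps the score from being driven by isolated older items.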

According to the scheme, the public opinion engine can timely grab, reasonably classify and analyze massive negative news, extracts the negative news which is considered by investors to be related to the main body default risk, and greatly improves the efficiency of users for reviewing information news.

According to the scheme of the invention, the problem of quickly screening the specified related information from the explosively-increased mass news information data in real time is solved, and the public opinion information meeting the demand is quickly and efficiently screened by controlling the labels of the public opinion information in different dimensions.

According to one scheme of the invention, the public opinion engine can perform refinement processing on the obtained document and completely analyze the obtained document to obtain the named object and the viewpoint expressed by the whole document, and can obtain the accurate score of the whole document.

According to one scheme of the invention, the public opinion engine divides the acquired documents more comprehensively and, at the same time, builds its named body library, white list, keyword dictionary, negative event library and the like in combination with financial business scenarios, so that the semantic analysis results better meet customers' needs.

According to a scheme of the invention, the public opinion analysis engine has the advantages of high efficiency and high accuracy, the processing processes are carried out concurrently, each piece of information can be completed in a short time, and the analysis efficiency is greatly improved.

Drawings

Fig. 1 is a block diagram schematically illustrating a public opinion engine according to the present invention;

Fig. 2 is a flow chart schematically showing the processing of the subject emotion classification module in a public opinion engine according to the present invention;

Fig. 3 is a schematic diagram illustrating the process of calculating the closeness between a named body and public opinion information in a public opinion engine according to the present invention;

Fig. 4 is a flow diagram schematically illustrating the processing of the public opinion risk scoring module according to the present invention;

Fig. 5 is a block diagram schematically illustrating the steps of a method for monitoring the risk level of an enterprise subject with a public opinion engine according to the present invention.

Detailed Description

The present invention is described in detail below with reference to the drawings and specific embodiments. The embodiments cannot all be enumerated here, and the invention is not limited to the embodiments that follow.

The invention addresses the problem of quickly screening specified relevant information, in real time, from an explosively growing mass of news data, and of monitoring the risk status of the relevant enterprise subjects on the basis of that information. The invention provides a public opinion analysis engine that quickly and efficiently screens out the information that meets requirements by controlling labels of the public opinion information in different dimensions. The labels in the engine fall into two main categories. The first category uses machine learning to build emotional tendency, theme distribution and text similarity models, which yield the emotional tendency and theme labels of news items and the sets of similar items obtained after all items are clustered. The second category combines financial domain knowledge with natural language processing to build a named body recognition model and a quantitative method for information risk scores, which extract the named bodies of news items and the risk scores of enterprise subjects. First, information that does not meet the conditions is removed by controlling the label of each dimension; then, within each set of similar information, the earliest item is kept and the rest are removed; finally, all information that meets the conditions is retained. The engine thus provides an efficient information screening function on the one hand and dynamically tracks changes in the risk level of each subject on the other, providing a basis for risk control management by financial institutions.

As shown in fig. 1, according to an embodiment of the present invention, a public opinion engine for screening public opinion information and monitoring risk level of an enterprise main body includes: the system comprises a main body emotion classification module, a theme classification module, a named body recognition module, a public opinion risk scoring module, a similarity retrieval module and an enterprise main body risk level monitoring module.

In the embodiment, the web page information on the internet is crawled through the information acquisition port, and the original content in the web page information is input to the structural extraction module to perform structural processing (for example, content filtering, automatic duplication elimination, and the like) to obtain public opinion information and store the public opinion information.

In the present embodiment, the public opinion engine receives the acquired public opinion information and processes the received public opinion information respectively. The main body emotion classification module is used for carrying out emotion classification on the acquired public opinion information to acquire emotion tendencies of the public opinion information, and comprises a plurality of classified emotion classification models; the theme classification module is used for carrying out single-theme classification or multi-theme classification on the acquired public sentiment information; the named body recognition module is used for recognizing the named body and calculating the compactness of the named body and public sentiment information; a public opinion risk scoring module acquires a risk level of the public opinion information containing the named body; the similarity retrieval module is used for carrying out similarity calculation on the obtained different public opinion information and carrying out online public opinion information screening; the enterprise main body risk level monitoring module is used for acquiring the current risk level of the enterprise main body and dynamically monitoring the risk level change of the enterprise main body corresponding to each naming body.

Referring to fig. 2, according to an embodiment of the present invention, the subject emotion classification module is based on an ensemble machine learning method: it uses 9 machine learning algorithms with different characteristics as base learners to identify the emotional tendency of news items, and takes the majority vote of the 9 learners as the final emotional tendency.

In this embodiment, the subject emotion classification module is obtained by:

firstly, experts and researchers sampled and analyzed nearly 30,000 news items, finally selected more than 10,000 news reports related to enterprise credit as the training sample set, and labeled them with 3 categories: positive, neutral and negative;

secondly, the sample set is divided; for each emotion classification model, an optimal search over the model's parameter grid is performed using several cross validation schemes, the model is validated with a validation set, and the model with the best-performing parameters is taken as the optimal model; in this embodiment, the base learners are selected from 5 classes of algorithms with different characteristics, namely linear classification algorithms, algorithms based on probability distributions, lazy learning algorithms, algorithms built around decision trees, and neural networks; for example, each emotion classification model is trained independently using at least one machine learning method such as LR, NB, decision tree, KNN or SVM, and its parameters are tuned by grid search to obtain the optimal model of each type;

and finally, the result obtained by applying the majority voting rule to the prediction results of all the classifiers is taken as the final emotional tendency.
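The final voting step can be sketched in a few lines of Python. The function name is hypothetical, and the tie-breaking rule (the label that appears first among the tied labels wins) is an assumption, since the source does not say how ties are resolved; with 9 base learners and 3 categories, ties remain possible.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most base learners.

    predictions -- list of labels, one per base learner
    Ties go to the tied label that appears first in `predictions`
    (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Predictions of 9 base learners for one news item.
votes = ["negative", "neutral", "negative", "positive", "negative",
         "neutral", "negative", "negative", "positive"]
print(majority_vote(votes))  # prints "negative" (5 of 9 votes)
```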

Through practical application verification, the method can remarkably improve the negative information recall rate and the prediction accuracy, improve the prediction performance of the whole learner and achieve the accuracy rate of more than 86%.

According to one embodiment of the invention, the topic classification module is obtained as follows:

in this embodiment, the topic classification module is obtained using the LDA method;

firstly, a named body dictionary and an institution dictionary are added to the original function words, prepositions, pronouns and the like to construct a stop word dictionary;

secondly, an LDA model is trained on all 1.7 million news items published in 2019 on the finance channel of East Wealth, and the model is optimized by adjusting the word frequency thresholds for common words and special words in the news;

finally, the first 70 topics are selected and named according to the probability distribution of the words in each topic, and are then mapped into 7 categories: repayment capacity, repayment willingness, laws and regulations, credit compliance, market conditions, senior management developments, and other credit-related matters. The LDA topic merging mapping relationship is as follows:

According to the invention, the resulting topic classification module achieves a classification accuracy of more than 80% on a single topic and more than 90% on multiple topics.

According to one embodiment of the invention, the named body recognition module is based on syntactic analysis. It not only extracts the list of named bodies mentioned in a text but also calculates how closely each named body is related to the text; that is, it recognizes and extracts the named bodies in the public opinion information and calculates the closeness between each named body and the public opinion information (the closeness affects the weight used when scoring the subject). In this embodiment, the named body recognition module is obtained as follows: firstly, the full name, abbreviated name and former names of the named body associated with an enterprise subject are obtained from industrial and commercial registration data, and the stock codes, stock names, bond codes and bond names issued by that named body are obtained from market data; secondly, with reference to the rules for recognizing the same named body and in combination with word vector similarity results, the fuzzily matched expressions of the named body in the news are collected; and finally, the list of named body expressions is reviewed, and abnormal or ambiguous expressions are removed.

As shown in fig. 3, according to an embodiment of the present invention, the process of calculating closeness between a named body and public sentiment information includes:

judging whether the sentence in which the named body is located expresses a viewpoint; if so, proceeding to the next step, otherwise outputting a preset first closeness value (for example, 0);

judging whether the named body is the only one in its sentence; if so, judging whether the syntactic structure of the sentence satisfies the subject-predicate relation, and outputting a preset second closeness value (for example, 1) if it does and the preset first closeness value otherwise; if the sentence contains several named bodies, judging whether the sentence has a parallel structure; if so, splitting the structure of the sentence and determining whether a main subject exists (a main subject means that the sentence satisfies the subject-predicate relation and an enterprise name serves as the subject), and outputting the preset second closeness value if it does and a preset third closeness value (for example, 0.3) otherwise; and if the sentence does not have a parallel structure, outputting the preset second closeness value.

According to an embodiment of the invention, a public opinion risk scoring module is used for obtaining a risk level of the public opinion information including the named body. In this embodiment, the public opinion risk scoring module includes: keyword dictionary, negative events library.

In this embodiment, the keyword dictionary is used for extracting the keywords of the public opinion information and calculating the word scores of the keywords in the public opinion information. The keyword dictionary is created in a manner similar to that used for the named body recognition module: after unsupervised training on news from financial websites, the dictionary is expanded on the basis of the points of concern about named bodies in the credit risk field, and it is finalized by cross validation after expert review. While the keyword dictionary is generated, the word rank, word emotion and topic risk of each keyword are labeled according to the business scenario. The word rank is predefined by experts; the word emotion is first trained on a corpus and then adjusted by experts; and for the topic risk, the topic is obtained by corpus training while the risk of each topic is predefined by experts.

In this embodiment, the negative event library is used to obtain the negative events related to the named body over past years. The negative event library is generated as follows: all public opinion information related to defaulting named bodies in past years is acquired, and the event types in that information are sorted out; the negative event library is then determined through statistical analysis and correlation analysis between the events and the defaults, and attribute values of each event, such as emotional tendency, level, risk, and type, are labeled in combination with the credit risk scenario. Negative events in the public opinion information are extracted in a supervised learning manner, and the specific extraction steps are as follows: 1) the number of characters shared by the event and the sentence is >= the event length × 0.9; 2) the sentence is scanned with a sliding window whose length is the event length × 1.2. Because the scenario is fixed, the final output combines the events extracted in the two steps, which ensures the accuracy of event extraction.
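The two extraction steps are described only briefly, so the following Python sketch shows one possible reading of them rather than the patented procedure. It assumes that "shared characters" means a multiset intersection of characters, and that step 2 requires the same 0.9 overlap inside at least one window; the names `shared_chars` and `match_event` are hypothetical.

```python
from collections import Counter


def shared_chars(a, b):
    """Number of characters common to both strings (multiset intersection)."""
    return sum((Counter(a) & Counter(b)).values())


def match_event(event, sentence, overlap_ratio=0.9, window_ratio=1.2):
    """Return True if the sentence is taken to mention the library event.

    Assumed reading of the two steps:
      1) coarse check: shared characters >= 0.9 * event length;
      2) local check: some window of length 1.2 * event length
         also reaches that overlap.
    """
    need = overlap_ratio * len(event)
    if shared_chars(event, sentence) < need:
        return False
    # Round to the nearest integer window width (an assumption).
    width = max(1, round(window_ratio * len(event)))
    last_start = max(len(sentence) - width, 0)
    return any(
        shared_chars(event, sentence[i:i + width]) >= need
        for i in range(last_start + 1)
    )
```

Under this reading, a sentence whose matching characters are scattered far apart passes step 1 but fails step 2, which is the kind of false positive the window appears intended to remove.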

Referring to fig. 4, according to an embodiment of the invention, the public opinion engine obtains public opinion information and then splits it into sentences. For example, the public opinion information is divided into independent sentences at punctuation marks that generally represent the end of a sentence, such as periods, semicolons, question marks, and exclamation marks. This splitting process is effective for simple sentences, where there is no ambiguity. In the present embodiment, compound sentences such as conjunctive sentences, comparative sentences, adversative sentences, and sequential sentences are split a second time using semicolons, conjunctions, and the like as separators.

After sentence splitting is completed, spaces are removed from each sentence; if a sentence is longer than 300 characters and contains more than 11 spaces, the sentence is instead broken at the spaces, and semicolons are inserted as separators.
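The splitting and space handling described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the delimiter set is a guess at "periods, semicolons, question marks, exclamation marks" for Chinese text (the ASCII period is left out so that decimals are not split), the secondary split on conjunctions is omitted, and `split_sentences` is a hypothetical name.

```python
import re

# Assumed sentence-ending punctuation (full-width and ASCII variants).
_SENTENCE_END = re.compile(r"[。；;？?！!]")


def split_sentences(text, max_len=300, max_spaces=11):
    """Split text into sentences, then apply the space rule.

    Short sentences have their spaces removed. A sentence longer than
    max_len characters with more than max_spaces spaces is broken at
    the spaces, which is equivalent to replacing each space with a
    semicolon separator and splitting again.
    """
    sentences = []
    for raw in _SENTENCE_END.split(text):
        raw = raw.strip()
        if not raw:
            continue
        if len(raw) > max_len and raw.count(" ") > max_spaces:
            sentences.extend(part for part in raw.split(" ") if part)
        else:
            sentences.append(raw.replace(" ", ""))
    return sentences
```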

In this embodiment, after the public opinion information is split, keywords and key phrases are extracted from its content according to the keyword dictionary and the negative event library, and the key sentences of the text are extracted using an extractive automatic summarization method.

Then, the public opinion risk scoring module scores the key sentences based on the named body recognition module, the keyword dictionary, and the negative event library to obtain sentence scores, and obtains the risk level of the public opinion information containing the named bodies based on the sentence scores.

Referring to fig. 4, according to an embodiment of the present invention, the scoring of the key sentences by the public opinion risk scoring module based on the named body recognition module, the keyword dictionary, and the negative event library to obtain the sentence scores includes:

acquiring the named body in the key sentence and the closeness between the named body and the public opinion information based on the named body recognition module;

acquiring keywords, word scores and word frequencies in the key sentences based on the keyword dictionary;

acquiring negative events in the key sentences based on the negative event library;

and scoring the key sentences based on the named bodies, the closeness, the keywords, the word scores, the word frequencies, and the negative events to obtain the sentence scores.

According to one embodiment of the invention, a public opinion risk scoring module scores key sentences in the public opinion information through a sentence scoring formula;

the sentence score formula is:

$$\text{sentence score} = K \times \max\Big(\max\big(\text{keyscore} \times (1 + (\text{word frequency} - 1)/10)\big) \times 0.8,\ \max(\text{scenescore})\Big)$$

Where K represents the closeness of the sentence to the named body, keyscore represents the word score, and scenescore represents the negative event score.
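As a worked illustration, the sentence score can be computed as in the Python sketch below. It assumes the multiplication signs lost from the printed formula, and it assumes that the inner maxima run over the keywords and the negative events found in the sentence; treating an empty list as a score of 0 is also an assumption, and `sentence_score` is a hypothetical name.

```python
def sentence_score(closeness, keywords, scene_scores):
    """Score one key sentence.

    closeness    : K, closeness of the sentence to the named body
    keywords     : iterable of (keyscore, word_frequency) pairs
    scene_scores : iterable of negative event scores (scenescore)
    """
    # Best keyword contribution, boosted by 10% per repeated occurrence.
    keyword_part = max(
        (score * (1 + (freq - 1) / 10) for score, freq in keywords),
        default=0.0,  # assumption: no keyword means no contribution
    )
    scene_part = max(scene_scores, default=0.0)
    return closeness * max(keyword_part * 0.8, scene_part)
```

For example, with K = 1, keywords (0.5, 3) and (0.8, 1), and one negative event scored 0.6, the keyword part is max(0.6, 0.8) × 0.8 = 0.64, which exceeds 0.6, so the sentence score is 0.64.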

According to one embodiment of the invention, the public opinion risk scoring module integrates the content of the named bodies of the scored key sentences, scores the content of the integrated named bodies to obtain the risk level of the public opinion information containing the named bodies.

According to an embodiment of the invention, the integration of the named body contents of the scored key sentences by the public opinion risk scoring module comprises the following steps:

the public opinion risk scoring module examines each scored key sentence and judges whether it is an interrogative sentence; if so, the key sentence is ignored, otherwise it is retained;

and, based on the judgment result, merging the retained key sentences that relate to the same named body, in the order in which they appear in the public opinion information.

Referring to fig. 4, according to an embodiment of the present invention, in the process of scoring the integrated content of the named entity to obtain a risk level of the public opinion information including the named entity, a named entity risk score is obtained by a named entity risk score formula and a corresponding risk level is obtained, where the named entity risk score formula is:

The word score and the word frequency are obtained from the keywords, extracted using the keyword dictionary, that appear in the remaining sentences, and the word frequency of the high-scoring word is the word frequency of the keyword that attains Max(word score × (1 + (word frequency - 1)/10)). It should be noted that the remaining sentences used in calculating the additional score are, collectively, the sentences of the public opinion information that do not contain the enterprise main body (i.e., the named body).

According to an embodiment of the present invention, in the process of calculating the word score of the keyword in the public opinion information, a word score formula is used to obtain the word score, wherein the word score formula is:

$$\text{word score} = 1/\text{word rank} + 0.5 \times \text{word emotion} + \text{topic risk}$$

In this embodiment, the word score ranges over [0, 1].
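The word score formula is simple enough to state directly in code. The Python sketch below assumes the reading "1/word rank + 0.5 × word emotion + topic risk"; the function name `word_score` and the example values are hypothetical. The text states that the score lies in [0, 1] but does not say how that range is enforced, so the sketch performs no clamping.

```python
def word_score(word_rank, word_emotion, topic_risk):
    """Word score of a keyword from its three dictionary labels.

    word_rank    : expert-defined rank (1 is assumed to be the top rank)
    word_emotion : emotion value of the word
    topic_risk   : risk value of the word's topic
    """
    return 1 / word_rank + 0.5 * word_emotion + topic_risk
```

For instance, a rank-2 word with emotion 0.4 and topic risk 0.1 would score 0.5 + 0.2 + 0.1 = 0.8 under this reading.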

In the present embodiment, the risk level is output by constructing a mapping relationship between the obtained named body risk score and the risk level. Specifically, the risk scores of 3 million historical test records are statistically analyzed, the top 1% and bottom 1% of the data are removed, and the maximum and minimum of the remaining values are taken as the basis for normalizing the risk score. The normalized risk score of the named body in the public opinion information is then mapped to a risk level according to the following table:

Scaler_score	Risk_level
[0, 0.3)	Without risk
[0.3, 0.5)	Low risk
[0.5, 0.8)	Middle risk
[0.8, 1]	High risk
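The normalization and the table lookup can be sketched in Python as follows. This is an illustration under assumptions: min-max scaling is inferred from "the maximum value and the minimum value are taken as the normalization basis", clamping of out-of-range scores to [0, 1] is an added assumption, and the names `fit_bounds`, `normalize`, and `risk_level` are hypothetical.

```python
def fit_bounds(history, trim=0.01):
    """Drop the lowest and highest `trim` share of historical scores
    and return the (min, max) of what remains."""
    ordered = sorted(history)
    cut = round(len(ordered) * trim)
    kept = ordered[cut:len(ordered) - cut]
    return kept[0], kept[-1]


def normalize(score, low, high):
    """Min-max scale a raw risk score; clamp to [0, 1] (assumption)."""
    scaled = (score - low) / (high - low)
    return min(max(scaled, 0.0), 1.0)


# Upper bounds of the half-open intervals in the mapping table.
_LEVELS = [(0.3, "Without risk"), (0.5, "Low risk"), (0.8, "Middle risk")]


def risk_level(scaled_score):
    """Map a normalized score to the risk level of the table."""
    for upper, label in _LEVELS:
        if scaled_score < upper:
            return label
    return "High risk"
```

A typical use would fit the bounds once on the historical data and then call `risk_level(normalize(score, low, high))` for each new named body risk score.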

According to an embodiment of the invention, the similarity retrieval module is used for calculating the similarity of the public opinion information and performing real-time public opinion information screening. In this embodiment, the process of calculating the similarity of the public opinion information by the similarity search module includes:

constructing public sentiment information with similar relations into a public sentiment similar set;

and acquiring online public sentiment information, and calculating the similarity based on the comparison sample to construct a real-time public sentiment set.

According to one embodiment of the invention, the enterprise main body risk level monitoring module is used for acquiring the current risk levels of different enterprise main bodies and dynamically monitoring the change in risk level of the enterprise main body corresponding to each named body.

Specifically, the public opinion information in the comparison sample set is grouped by enterprise main body based on the main body emotion classification module, the theme classification module, the named body recognition module, and the public opinion risk scoring module;

and acquiring the risk score of the enterprise subject corresponding to the current node according to the risk score of the named entity in the corresponding public opinion information, mapping the risk level of the current enterprise subject based on the risk score of the enterprise subject and outputting the risk level so as to dynamically monitor the risk level change of the enterprise subject corresponding to each named entity.

Referring to fig. 5, according to an embodiment of the present invention, a method for performing enterprise agent risk level monitoring based on the public opinion engine of the present invention includes:

S1, obtaining online public opinion information, calculating the label result for each dimension of the public opinion information, screening the public opinion information that meets the requirements according to the preset label values for each dimension, and constructing an information set, wherein the dimension label results comprise an emotional tendency, a theme distribution, a named body, and a risk score, and the preset dimension label values comprise a preset emotional tendency condition, a preset theme distribution condition, a preset named body condition, and a preset risk score condition;

S2, carrying out similarity analysis on the information sets, calculating the similarity between the public opinion information in the information sets, rejecting similar public opinion information and constructing a comparative sample set;

Referring to fig. 5, in step S1, when the online public opinion information is obtained, it is split in the manner described above, and word segmentation is performed with the HMM model of the Jieba Chinese word segmentation library, with a user-defined dictionary forcing dictionary terms to be kept intact. Then, keywords and key phrases in the public opinion information are obtained through a summary generation module, and the key sentences of the text are extracted using an extractive automatic summarization method.

In step S1, in the step of obtaining the online public sentiment information and calculating the dimension label result of the public sentiment information, the step of calculating the emotional tendency includes:

classifying the public opinion information with each emotion classification model separately to obtain a prediction result from each model;

and taking the result obtained by applying a majority voting rule to the prediction results of all the emotion classification models as the final emotional tendency (for example, negative).
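The majority voting rule can be sketched in a few lines of Python. The text does not say how ties are broken; the sketch below returns the label that was seen first among the tied ones, which is an assumption, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most emotion classification models.

    Ties are resolved in favour of the label that appears first in
    `predictions` (an assumption; the source does not specify this).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]
```

For example, predictions of "negative", "neutral", and "negative" from three models yield "negative" as the final emotional tendency.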

In step S1, according to an embodiment of the present invention, in the step of obtaining online public sentiment information and calculating the results of the dimension labels of the public sentiment information, the step of calculating the risk score includes:

acquiring the named body in the key sentence and the closeness between the named body and the public opinion information based on the named body recognition module. In this embodiment, it must first be judged whether a sentence of the public opinion information contains a named body: the sentence, already segmented into words in the foregoing step, is intersected with the named body recognition module, and the list of named bodies appearing in the sentence is extracted. Each named body that appears is checked for a suffix word, and is removed if it carries one. If only one named body remains in the sentence and it is in the white list, a subject-predicate syntactic judgment is made: the named body is retained if it is the subject and removed otherwise. If a plurality of named bodies remain, the sentence has a complex structure: any financial institution among them is removed, and the remaining named bodies are retained and analyzed syntactically; if the sentence is passive, the first named body after the passive-marker character is scored at 100% and the remaining named bodies at 30%. For example, the shares of the Xinhua stock holding race are frozen and the Xinhua stock mortgages the race shares to the China bank.

Acquiring keywords, word scores and word frequencies in the key sentences based on the keyword dictionary; in the present embodiment, the hotspot/sensitive word analysis is implemented by a keyword dictionary. After word segmentation is completed, the public opinion information is intersected with a keyword dictionary, and information such as keywords, word frequency, word distance and the like is extracted.
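The intersection of a segmented sentence with the keyword dictionary can be sketched as below. The dictionary is modelled as a plain mapping from keyword to word score, which is an assumption about its layout; word distance is left out because the text does not define it, and `extract_keywords` is a hypothetical name.

```python
from collections import Counter


def extract_keywords(tokens, keyword_scores):
    """Return (keyword, word_score, word_frequency) for every dictionary
    keyword found among the segmented tokens, in order of first appearance."""
    frequency = Counter(t for t in tokens if t in keyword_scores)
    return [(word, keyword_scores[word], n) for word, n in frequency.items()]
```

The resulting triples carry the word scores and word frequencies that the sentence scoring step consumes.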

According to an embodiment of the present invention, the public opinion risk scoring module scores key sentences in the public opinion information through a sentence scoring formula, where the sentence scoring formula is:

$$\text{sentence score} = K \times \max\Big(\max\big(\text{keyscore} \times (1 + (\text{word frequency} - 1)/10)\big) \times 0.8,\ \max(\text{scenescore})\Big)$$

According to an embodiment of the invention, in step S1, the public opinion risk scoring module performs named body content integration on the scored key sentences, and performs risk scoring on the integrated named body content to obtain a named body risk score.

According to an embodiment of the invention, the step of integrating the named body contents of the scored key sentences by the public opinion risk scoring module comprises the following steps:

According to an embodiment of the present invention, in the process of scoring the integrated content of the named entity to obtain the risk level of the public opinion information including the named entity, the named entity risk score of the named entity is obtained through a named entity risk score formula and a corresponding risk level is obtained, wherein the named entity risk score formula is as follows:

The word score and the word frequency are obtained from the keywords, extracted using the keyword dictionary, that appear in the remaining sentences, and the word frequency of the high-scoring word is the word frequency of the keyword that attains Max(word score × (1 + (word frequency - 1)/10)).

According to an embodiment of the present invention, in the step S1, in the process of calculating the word score of the keyword in the public opinion information, the word score is obtained by using a word score formula, where the word score formula is:

$$\text{word score} = 1/\text{word rank} + 0.5 \times \text{word emotion} + \text{topic risk}$$

judging whether the sentence in which the named body is located expresses a viewpoint; if so, proceeding to the next step, otherwise outputting a preset first closeness (compact density) value;

judging whether the sentence in which the named body is located contains only one named body; if so, judging whether the syntactic structure of the sentence satisfies the subject-predicate relation, and if it does, outputting a preset second closeness value, otherwise outputting the preset first closeness value; if the sentence contains a plurality of named bodies, judging whether the sentence has a parallel structure; if so, splitting the sentence structure and determining whether a main subject exists, and if it does, outputting the preset second closeness value, otherwise outputting a preset third closeness value; if the sentence does not have a parallel structure, outputting the preset second closeness value.
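The closeness decision described in these two steps is a small decision tree, sketched below in Python. The syntactic judgments themselves (viewpoint detection, subject-predicate analysis, parallel structure detection) are taken as precomputed boolean inputs, since the text does not specify how they are made. The defaults 1 and 0.3 are the example values given elsewhere in the description for the second and third values; the first value is not stated in this section, so it is a required argument. `closeness_value` is a hypothetical name.

```python
def closeness_value(*, has_viewpoint, named_body_count, subject_predicate_ok,
                    is_parallel, has_main_subject,
                    first_value, second_value=1.0, third_value=0.3):
    """Return the closeness between a named body and its sentence."""
    if not has_viewpoint:
        return first_value
    if named_body_count == 1:
        # Single named body: it must be in a subject-predicate relation.
        return second_value if subject_predicate_ok else first_value
    if is_parallel:
        # Several named bodies in a parallel structure: split and look
        # for a main subject (enterprise name acting as the subject).
        return second_value if has_main_subject else third_value
    return second_value
```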

Referring to fig. 5, according to an embodiment of the present invention, step S2, in which similarity analysis is performed on the information set, the similarity between the pieces of public opinion information in the information set is calculated, similar public opinion information is removed, and a comparison sample set is constructed, includes:

calculating the similarity between any two pieces of public opinion information based on the similarity retrieval module, wherein if the title similarity or the text similarity is greater than a preset threshold value, a similarity relation is defined to exist between the two pieces of public opinion information, otherwise no similarity relation exists;

and sorting the public opinion information in each public opinion similar set by release time, retaining the earliest piece of public opinion information as a comparison sample, and deleting the remaining public opinion information in the similar set.

According to one embodiment of the invention, the similarity retrieval module is composed of two parts. The first part performs similarity retrieval on news titles, calculating similarity both by a character matching rule and by word embedding. The character matching rule is as follows: first, the title is cleaned and segmented into words; second, the ratio of the word intersection to the word union of the two titles is calculated and recorded as sim_title; finally, the similarity simvalue of the two titles is determined. When sim_title >= 0.8, simvalue = sim_title; otherwise, if the number of words in the intersection is greater than 0.9 times the word length of either title, simvalue = 0.9, and if not, simvalue = sim_title. Word embedding is a technique for converting words expressed in natural language into a vector or matrix form that a computer can process: first, the word embedding is trained on 3 million historical news items with the word2vec machine learning method; then, a low-dimensional word vector is obtained by passing the title through the word embedding; finally, the cosine distance is taken as the similarity value. The title similarity is the maximum of the values given by the two methods.

The second part performs similarity retrieval on news texts, based on simhash, a pure word-frequency statistical method, with a rolling window length of 5 words selected through model verification.
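The character matching rule for titles can be sketched in Python as follows. The sketch works on already segmented titles and compares sets of distinct words, which is an assumption. It also reads the second condition as "intersection size greater than 0.9 times the word count of the shorter title"; that is one interpretation of the wording, not a confirmed detail. The word embedding branch and the simhash text comparison are not shown, and `title_similarity` is a hypothetical name.

```python
def title_similarity(words_a, words_b):
    """Character-matching similarity (simvalue) of two segmented titles."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    sim_title = shared / len(a | b)  # word intersection / word union
    if sim_title >= 0.8:
        return sim_title
    # One title is almost entirely contained in the other.
    if shared > 0.9 * min(len(a), len(b)):
        return 0.9
    return sim_title
```

The second branch lets a short title that is almost wholly contained in a longer one count as similar even though the intersection over union ratio is low.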

In this embodiment, the work flow of the similarity search module is as follows:

firstly, calculating the similarity relation between any two news information, wherein if the similarity of the titles or the similarity of the texts is greater than 0.8, the similarity relation between the texts is defined, otherwise, the similarity relation does not exist;

then, constructing the news with similar relation into a similar set;

and finally, sequencing according to the news release time, reserving the earliest news information (the earliest released public opinion information), and deleting the rest news information in the similar set.
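The three workflow steps can be sketched as a grouping problem. The Python sketch below treats similarity as transitive and builds the similar sets with a union-find structure; the text only says that similar news is "constructed into a similar set", so the transitive grouping is an assumption. The item layout (dicts with `id` and `time`) and the `is_similar` callback, which would apply the 0.8 threshold on title or text similarity, are hypothetical.

```python
def deduplicate(items, is_similar):
    """Keep the earliest item of every similar set.

    items      : list of dicts with at least 'id' and 'time' keys
    is_similar : function(item_a, item_b) -> bool
    """
    parent = list(range(len(items)))

    def find(i):
        # Find the set representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1 and 2: pairwise comparison, merging similar items into sets.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if is_similar(items[i], items[j]):
                parent[find(i)] = find(j)

    # Step 3: keep the earliest released item of each set.
    earliest = {}
    for i, item in enumerate(items):
        root = find(i)
        if root not in earliest or item["time"] < earliest[root]["time"]:
            earliest[root] = item
    return sorted(earliest.values(), key=lambda item: item["time"])
```

The pairwise loop is quadratic in the number of items, which is acceptable for an illustration but would need an index for massive news volumes.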

In this embodiment, the similarity retrieval module can also perform real-time public opinion information screening, that is, screening the information that meets the requirements out of massive news information. According to the invention, based on the specified label for each dimension (emotional tendency, theme distribution, named body recognition, and risk score), the public opinion engine processes the public opinion information and screens out all information meeting the conditions to form an information set. For example, to obtain high-risk negative news about urban enterprises, the named body label is set to all named bodies corresponding to urban enterprises, the closeness label to greater than 0, the emotion label to negative, the theme label to None, and the risk label to high risk.
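The label-based screening in the example can be sketched as a simple filter. The record layout (dicts with `named_body`, `closeness`, `emotion`, `theme`, and `risk_level` keys) is hypothetical, and a theme label of None is read here as "no constraint on the theme", which is an assumption about the example.

```python
def screen(items, named_bodies, emotion="negative", theme=None,
           risk_level="High risk"):
    """Return the items whose dimension labels meet all preset conditions."""
    selected = []
    for item in items:
        if item["named_body"] not in named_bodies:
            continue
        if item["closeness"] <= 0:
            continue
        if item["emotion"] != emotion:
            continue
        if theme is not None and item["theme"] != theme:
            continue
        if item["risk_level"] != risk_level:
            continue
        selected.append(item)
    return selected
```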

Referring to fig. 5, in step S3, the public opinion information in the comparison sample set is classified by enterprise main body, the risk score of each enterprise main body at the current node is calculated from the named body risk scores of the corresponding public opinion information, and the risk level of each enterprise main body is mapped from its risk score, which includes:

based on the main body emotion classification module, the theme classification module, the named body recognition module, and the public opinion risk scoring module, grouping the public opinion information in the comparison sample set by enterprise main body;

then, for the public opinion information screened out for each enterprise main body, the named body risk score of each piece is multiplied by its attenuation coefficient (obtained from the time interval between the current public opinion information and the earliest released public opinion information), the products are sorted in descending order, and their 95th percentile is taken and recorded as s_95; meanwhile, for the public opinion information of the last 3 days, the maximum of the named body risk score multiplied by the corresponding attenuation coefficient is recorded as s3_max; finally, max(s3_max, s_95) is taken as the risk score of the enterprise main body. The risk score of the enterprise main body is mapped to a risk level according to the table below and output as the current risk level of the enterprise main body, so as to dynamically monitor the change in risk level of the enterprise main body corresponding to each named body. The attenuation coefficient is 0.97^T, where T represents the time interval (for example, in days) between the current time and the earliest release time of the public opinion information.

Scaler_score	Risk_level
[0, 0.8)	Without risk
[0.8, 0.9)	Low risk
[0.9, 0.95)	Middle risk
[0.95, 1]	High risk
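The enterprise main body risk score and its mapping can be sketched in Python as follows. Several details are assumptions: T is taken as the number of days since each item was released, the 95th percentile uses the nearest-rank method (the text does not name a method), and an enterprise with no items in the last 3 days gets s3_max = 0. The names `enterprise_risk_score` and `enterprise_risk_level` are hypothetical.

```python
import math


def enterprise_risk_score(items, recent_days=3):
    """Risk score of one enterprise main body.

    items : list of (named_body_risk_score, days_since_release) pairs
    """
    if not items:
        return 0.0
    # Attenuation coefficient 0.97^T applied to each named body risk score.
    decayed = sorted(score * 0.97 ** days for score, days in items)
    rank = math.ceil(0.95 * len(decayed))  # nearest-rank 95th percentile
    s_95 = decayed[rank - 1]
    recent = [score * 0.97 ** days for score, days in items
              if days <= recent_days]
    s3_max = max(recent, default=0.0)
    return max(s3_max, s_95)


# Upper bounds of the half-open intervals in the mapping table.
_LEVELS = [(0.8, "Without risk"), (0.9, "Low risk"), (0.95, "Middle risk")]


def enterprise_risk_level(score):
    """Map an enterprise risk score to the risk level of the table."""
    for upper, label in _LEVELS:
        if score < upper:
            return label
    return "High risk"
```

Taking the maximum of s3_max and s_95 means that a single severe item from the last 3 days can raise the level immediately, while older items fade through the 0.97^T factor.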

The foregoing is merely exemplary of particular aspects of the present invention; devices and structures not specifically described herein are understood to be implemented in the conventional ways known to those of ordinary skill in the art.

The above description is only one embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A public opinion engine for screening public opinion information and monitoring enterprise main body risk level, characterized by comprising:

2. The public opinion engine of claim 1, wherein the main body emotion classification module is obtained by steps comprising:

3. The public opinion engine of claim 2, wherein the named entity recognition module extracts named entities in the obtained key sentences and calculates closeness between the named entities and the public opinion information based on syntactic analysis of the obtained public opinion information;

in the process of calculating the closeness of the named body and the public opinion information, the method comprises the following steps:

4. The public opinion engine of claim 3, wherein the public opinion risk scoring module comprises:

the public opinion risk scoring module scores key sentences in public opinion information based on the named body recognition module, the keyword dictionary and the negative event library to obtain sentence scores, and obtains risk grades of the public opinion information containing the named bodies based on the sentence scores;

in the process of calculating the word score of the keyword in the public opinion information, a word score formula is adopted to obtain the word score, wherein the word score formula is as follows:

$$\text{word score} = 1/\text{word rank} + 0.5 \times \text{word emotion} + \text{topic risk};$$

the public opinion risk scoring module scores the key sentences based on the named body recognition module, the keyword dictionary and the negative event library to obtain sentence scores, and the method comprises the following steps:

scoring key sentences in the public opinion information based on the named body, the closeness, the keywords, the word score, the word frequency, and the negative event to obtain the sentence score;

the public opinion risk scoring module scores key sentences in the public opinion information through a sentence scoring formula;

the sentence score formula is:

$$\text{sentence score} = K \times \max\Big(\max\big(\text{keyscore} \times (1 + (\text{word frequency} - 1)/10)\big) \times 0.8,\ \max(\text{scenescore})\Big)$$

the public opinion risk scoring module integrates the content of the named bodies of the key sentences which are scored, scores the content of the integrated named bodies and obtains the risk level of the public opinion information containing the named bodies;

the integration of the named body contents of the scored key sentences by the public opinion risk scoring module comprises the following steps:

merging, based on the judgment result, the retained key sentences that relate to the same named body, in the order in which they appear in the public opinion information;

in the process of scoring the integrated content of the named body to obtain the risk level of the public opinion information containing the named body, obtaining the risk score of the named body and obtaining the corresponding risk level by a named body risk score formula, wherein the named body risk score formula is as follows:

5. The public opinion engine as claimed in claim 4, wherein the similarity search module is configured to calculate similarity of public opinion information and perform real-time public opinion information filtering;

6. The public opinion engine of claim 5, wherein the public opinion information in the comparison sample set is grouped according to enterprise subjects based on the subject emotion classification module, the topic classification module, the named entity recognition module, and the public opinion risk score module;

7. A method for monitoring risk level of an enterprise subject using the public opinion engine of any one of claims 1 to 6, comprising:

8. The method as claimed in claim 7, wherein in the step of obtaining online public opinion information and calculating dimensional label results of the public opinion information in step S1, the step of calculating the emotional tendency comprises:

taking the result obtained by the prediction results of all emotion classification models through a majority voting rule as the final emotion tendency;

in step S1, in the step of obtaining online public sentiment information and calculating each dimension label result of the public sentiment information, the step of calculating the risk score includes:

scoring the key sentences based on the named bodies, the closeness, the keywords, the word scores, the word frequency and the negative events to obtain sentence scores of the key sentences;

in step S1, the public opinion risk scoring module integrates the named body content of the key sentence with which the score is completed, scores the integrated named body content to obtain the risk level of the public opinion information including the named body;

9. The method of claim 8, wherein in step S2, the steps of performing similarity analysis on the information set, calculating the similarity between the pieces of public opinion information in the information set, removing similar public opinion information, and constructing a comparison sample set include:

10. The method of claim 9, wherein in step S3, the public opinion information in the comparison sample set is grouped by enterprise main body based on the main body emotion classification module, the theme classification module, the named body recognition module, and the public opinion risk scoring module;