CA3138730C - Public-opinion analysis method and system for providing early warning of enterprise risks - Google Patents

Public-opinion analysis method and system for providing early warning of enterprise risks

Info

Publication number
CA3138730C
CA3138730C CA3138730A CA3138730A CA3138730C CA 3138730 C CA3138730 C CA 3138730C CA 3138730 A CA3138730 A CA 3138730A CA 3138730 A CA3138730 A CA 3138730A CA 3138730 C CA3138730 C CA 3138730C
Authority
CA
Canada
Prior art keywords
risk
public
sentiment
label
opinion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CA3138730A
Other languages
French (fr)
Other versions
CA3138730A1 (en
Inventor
Jiaqing LI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The present invention discloses a method and a system of public-opinion analysis for providing early warning of enterprise risks. The method involves: collecting public-opinion text data from any designated website, and constructing a data-source sequence for website sources of the public-opinion text data; matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence; performing classification of sentiment polarities of the public-opinion text data using a sentiment classification model so as to construct a sentiment-polarity sequence, and identifying associated enterprise entity names in the public-opinion text data so as to construct an enterprise-association sequence; and according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence, and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result.

Description

PUBLIC-OPINION ANALYSIS METHOD AND SYSTEM FOR PROVIDING EARLY
WARNING OF ENTERPRISE RISKS
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the technical field of the Internet, and more particularly to a public-opinion analyzing method and a system thereof for providing early warning of enterprise risks.
Description of Related Art
[0002] Currently, enterprise risk early warning increasingly depends on and benefits from technologies such as artificial intelligence and natural language processing. With the emergence of a large volume of online public opinion, negative public opinion about enterprises and enterprise risk events have become critical to the identification and early warning of enterprise risks.
[0003] For users who must pay special attention to enterprise risks, such as loan approval managers or risk control managers, it is an important task to follow enterprise risk events closely, thereby acquiring sufficient information about these events and in turn knowing the risk status of the enterprises. However, this task is labor-intensive and thus costly. When the number of monitored enterprises is large, it is difficult to collect comprehensive information manually. In particular, when massive public-opinion information about enterprises of interest circulates over the Internet, manual reading can take too long to give accurate risk early warning for the relevant enterprises.
SUMMARY OF THE INVENTION
[0004] One objective of the present invention is to provide a method of public-opinion analysis for providing early warning of enterprise risks, which can provide a relevant enterprise with public-opinion analysis service and early warning service accurately and efficiently with reduced human workloads.
[0005] To achieve the foregoing objective, the present invention in a first aspect provides a method of public-opinion analysis for providing early warning of enterprise risks. The method comprises:
[0006] collecting public-opinion text data from any designated website, and constructing a data-source sequence for website sources of the public-opinion text data;
[0007] matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence;
[0008] performing classification of sentiment polarities of the public-opinion text data using a sentiment classification model so as to construct a sentiment-polarity sequence, and identifying associated enterprise entity names in the public-opinion text data so as to construct an enterprise-association sequence; and
[0009] according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result.
[0010] Preferably, the step of constructing a data-source sequence for website sources of the public-opinion text data comprises:
[0011] summing up a total number of the designated websites and configuring a credit weight for each said designated website, so as to construct a data-source sequence set dimensionally consistent with the total number; and
[0012] identifying a location of the source website in the data-source sequence set, constructing the corresponding data-source sequence, and matching a corresponding said credit weight at the same time.
[0013] Preferably, before the step of matching risk labels of the public-opinion text data with a preset risk-label set, the method further comprises:

[0014] constructing the risk-label set in advance, wherein the risk-label set includes plural risk-label classes, and each said risk-label class corresponds to at least one risk keyword; and
[0015] configuring a risk weight for each said risk-label class in the risk-label set.
[0016] More preferably, the step of matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence comprises:
[0017] performing matching of the risk keywords to the public-opinion text data by means of text keyword matching, and searching for corresponding said risk-label class according to matching results; and
[0018] based on locations of the risk-label classes in the risk-label set, constructing the risk-label sequence.
[0019] Preferably, training of the sentiment classification model comprises:
[0020] extracting public opinion corpora of various sentiment polarities respectively from acquired public opinion corpora, so as to construct a tag-corpus set; and
[0021] training the sentiment classification model based on the tag-corpus set using an LSTM or TextCNN model structure;
[0022] classifications of the sentiment polarities include positive sentiment, neutral sentiment, and negative sentiment, and the sentiment-polarity sequence is a sequence representation of one of the three sentiment polarities.
[0023] More preferably, after the step of performing classification of sentiment polarities of the public-opinion text data using a sentiment classification model so as to construct a sentiment-polarity sequence, the method further comprises:
[0024] configuring a corresponding polarity weight for every said kind of sentiment polarity.
[0025] Preferably, the step of identifying associated enterprise entity names in the public-opinion text data so as to construct an enterprise-association sequence comprises:
[0026] constructing a monitored-enterprise list consisting of plural enterprise entities in advance;

[0027] identifying the enterprise entity name associated with the public-opinion text data by means of keyword matching with a Chinese word segmentation tool and/or a named entity recognition (NER) tool; and
[0028] based on a location of the enterprise entity name in the monitored-enterprise list, constructing the enterprise-association sequence.
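The construction of the enterprise-association sequence can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the enterprise names, the function name build_enterprise_sequence, the 0/1 encoding, and the use of plain substring matching in place of a word segmentation or NER tool are all hypothetical and are not taken from the disclosure.

```python
from typing import List

# Hypothetical monitored-enterprise list; the names are placeholders.
MONITORED_ENTERPRISES: List[str] = ["Company A", "Company B", "Company C"]


def build_enterprise_sequence(text: str, monitored: List[str]) -> List[int]:
    """Return a 0/1 sequence aligned with the monitored-enterprise list.

    Position i is 1 when the i-th enterprise name occurs in the text.
    Plain substring matching stands in for segmentation/NER (assumption).
    """
    return [1 if name in text else 0 for name in monitored]


sequence = build_enterprise_sequence(
    "Company B was fined for violation of regulations.",
    MONITORED_ENTERPRISES,
)
```

A real deployment would likely need alias handling (short names, stock tickers) before matching, which this sketch leaves out.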
[0029] Preferably, before the step of according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result, the method further comprises:
[0030] presetting plural kinds of risk-early-warning levels, and defining boundary intervals of each kind of individual risk-early-warning levels.
[0031] More preferably, the step of according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result comprises:
[0032] using a public-opinion-risk-early-warning equation $z = \sum_{i=0}^{n} R_i L_i + \sum_{i=0}^{k} W_i S_i + \sum_{i=0}^{p} Q_i T_i$ to compute a risk value of the public-opinion text data; and
[0033] computing an early-warning value corresponding to the public-opinion text data in view of the enterprise-association sequence, and outputting the risk-early-warning level based on the boundary interval to which the early-warning value belongs;
[0034] wherein Ri denotes the risk weight of the corresponding risk-label class, Li denotes the risk-label sequence, n denotes a total number of the risk-label classes in the risk-label set, Wi denotes the credit weight of the designated website, Si denotes the data-source sequence, k denotes the total number of the designated websites, Qi denotes the polarity weight, Ti denotes sentiment-polarity sequence, and p denotes a total number of the sentiment polarities.
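As a worked illustration of the risk-value computation, the following Python sketch evaluates the three weighted sums and maps the result to a level. All numeric values, the level names, and the boundary intervals are hypothetical. The text does not spell out how the enterprise-association sequence enters the early-warning value, so the sketch simply applies the boundary intervals to z directly.

```python
from typing import List, Sequence, Tuple


def weighted_sum(weights: Sequence[float], indicators: Sequence[int]) -> float:
    """Sum of element-wise products, i.e. one of the three sums in z."""
    if len(weights) != len(indicators):
        raise ValueError("weights and indicators must have the same length")
    return sum(w * x for w, x in zip(weights, indicators))


def risk_value(R, L, W, S, Q, T) -> float:
    """z = sum(Ri*Li) + sum(Wi*Si) + sum(Qi*Ti)."""
    return weighted_sum(R, L) + weighted_sum(W, S) + weighted_sum(Q, T)


# Hypothetical boundary intervals [low, high) for the early-warning levels.
LEVELS: List[Tuple[float, float, str]] = [
    (0, 10, "low"),
    (10, 20, "medium"),
    (20, float("inf"), "high"),
]


def warning_level(z: float) -> str:
    """Return the level whose boundary interval contains z."""
    for low, high, name in LEVELS:
        if low <= z < high:
            return name
    raise ValueError("risk value outside all configured intervals")


# Hypothetical inputs: three risk labels, two websites, three polarities.
z = risk_value(
    R=[10, 5, 7], L=[0, 0, 1],   # third risk label matched
    W=[5, 3], S=[1, 0],          # text came from the first website
    Q=[1, 2, 5], T=[0, 0, 1],    # negative sentiment
)
level = warning_level(z)
```

With these inputs the three sums contribute 7, 5, and 5 respectively.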

[0035] As compared to the prior art, the method of public-opinion analysis for providing early warning of enterprise risks provided by the present invention has the following beneficial effects:
[0036] in the method of public-opinion analysis for providing early warning of enterprise risks of the present invention, public-opinion text data are collected from any designated website, and a data-source sequence is constructed for the website sources of the data. The risk labels for the public-opinion text data are matched based on a preset risk-label set for constructing a risk-label sequence. Sentiment polarities of the public-opinion text data are classified using a sentiment classification model so as to construct a sentiment-polarity sequence. The entity names of enterprises associated with the public-opinion text data are identified and used to construct an enterprise-association sequence. Finally, a public opinion analysis result is computed according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, and then output.
[0037] It is thus clear that the present invention mines potential enterprise risk information in depth through multi-dimensional data processing, so as to form a public-opinion analyzing process, thereby realizing smart early warning of potential risks for enterprises and helping risk business personnel to conduct enterprise risk control and assessment more efficiently.
[0038] In a second aspect, the present invention provides a system of public-opinion analysis for providing early warning of enterprise risks, which is applied to the method of public-opinion analysis for providing early warning of enterprise risks as described in the foregoing technical scheme. The system comprises:
[0039] a public-opinion-collecting module, for collecting public-opinion text data from any designated website, and constructing a data-source sequence for website sources of the public-opinion text data;
[0040] a risk label module, for matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence;
[0041] a sentiment-polarity and entity-name identifying module, for performing classification of sentiment polarities of the public-opinion text data using a sentiment classification model so as to construct a sentiment-polarity sequence, and identifying associated enterprise entity names in the public-opinion text data so as to construct an enterprise-association sequence; and
[0042] an early warning outputting module, for according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result.
[0043] As compared to the prior art, the disclosed public-opinion analyzing apparatus for providing early warning of enterprise risks provides beneficial effects that are similar to those provided by the method of public-opinion analysis for providing early warning of enterprise risks as enumerated above, and thus no repetitions are made herein.
[0044] The present invention in a third aspect provides a computer readable storage medium, storing thereon a computer program. When the computer program is executed by a processor, it implements the steps of the method of public-opinion analysis for providing early warning of enterprise risks as described previously.
[0045] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the method of public-opinion analysis for providing early warning of enterprise risks as enumerated above, and thus no repetitions are made herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0046] The accompanying drawings are provided herein for better understanding of the present invention and form a part of this disclosure. The illustrative embodiments and their descriptions are for explaining the present invention and by no means form any improper limitation to the present invention, wherein:
[0047] FIG. 1 is a schematic flowchart of a method of public-opinion analysis for providing early warning of enterprise risks according to one embodiment of the present invention; and
[0048] FIG. 2 is another schematic flowchart of a method of public-opinion analysis for providing early warning of enterprise risks according to one embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0049] To make the foregoing objectives, features, and advantages of the present invention clearer and more understandable, the following description will be directed to some embodiments as depicted in the accompanying drawings to detail the technical schemes disclosed in these embodiments. It is, however, to be understood that the embodiments referred herein are only a part of all possible embodiments and thus not exhaustive. Based on the embodiments of the present invention, all the other embodiments can be conceived without creative labor by people of ordinary skill in the art, and all these and other embodiments shall be embraced in the scope of the present invention.
[0050] Embodiment 1
[0051] Referring to FIG. 1 and FIG. 2, the present embodiment provides a method of public-opinion analysis for providing early warning of enterprise risks, comprising:
[0052] collecting public-opinion text data from any designated website, and constructing a data-source sequence for website sources of the public-opinion text data; matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence; performing classification of sentiment polarities of the public-opinion text data using a sentiment classification model so as to construct a sentiment-polarity sequence, and identifying associated enterprise entity names in the public-opinion text data so as to construct an enterprise-association sequence; according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result.
[0053] In the method of public-opinion analysis for providing early warning of enterprise risks of the present invention, public-opinion text data are collected from any designated website, and a data-source sequence is constructed for the website sources of the data. The risk labels for the public-opinion text data are matched with a preset risk-label set for constructing a risk-label sequence. Sentiment polarities of the public-opinion text data are classified using a sentiment classification model so as to construct a sentiment-polarity sequence. The entity names of enterprises associated with the public-opinion text data are identified and used to construct an enterprise-association sequence. Finally, a public opinion analysis result is computed according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, and then output.
[0054] It is thus clear that the present invention mines potential enterprise risk information in depth through multi-dimensional data processing, so as to form a public-opinion analyzing process, thereby realizing smart early warning of potential risks for enterprises and helping risk business personnel to conduct enterprise risk control and assessment more efficiently.
[0055] In the embodiment described above, the step of constructing a data-source sequence according to website sources of the public-opinion text data comprises:
[0056] summing up a total number of the designated websites and configuring a credit weight for each said designated website, so as to construct a data-source sequence set dimensionally consistent with the total number; and identifying a location of the source website in the data-source sequence set, constructing the corresponding data-source sequence, and matching a corresponding said credit weight.

[0057] In particular implementations, the public-opinion-collecting module serves to collect public-opinion text data of enterprises and perform structured data extraction. The first step is to set and configure the public-opinion data sources. Sources of public opinions about enterprises primarily include news websites, government websites, forums, WEIBO, and complaint websites. The source sequence is S = {S1, S2, ..., Sk}, where k is the total number of the designated websites. Different credit weights Wi are assigned according to the sources of public opinions. The credit weights may alternatively be configured by users. The step further includes setting the addresses, site sections, data-collecting frequencies, and keywords of the public-opinion data sources. Then an Internet-based data collecting tool is used to acquire public-opinion text data. Afterward, a Python-based or Java-based HTML processing tool is used to denoise the webpages, clean data, and extract fields, so that public-opinion webpage data can be extracted in a structured manner by fields such as title, source, link, release date, text, summary, and author.
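The field-extraction step can be illustrated with Python's standard html.parser module. The sketch below is only one possible approach: the class name FieldExtractor, the helper extract_record, the choice of collecting only title and p elements, and the sample page are assumptions made for illustration, and a production crawler would need site-specific rules for dates, authors, and noise removal.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class FieldExtractor(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: List[str] = []
        self._current: Optional[str] = None  # tag currently being captured

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text outside <title>/<p> (e.g. scripts) is dropped as noise.
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def extract_record(html: str, source: str, url: str) -> Dict[str, str]:
    """Build a structured record with a subset of the fields named above."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    content = " ".join(p.strip() for p in parser.paragraphs if p.strip())
    return {
        "title": parser.title.strip(),
        "content": content,
        "source": source,
        "url": url,
    }


sample = (
    "<html><head><title>Sample headline</title>"
    "<script>var x = 1;</script></head>"
    "<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)
record = extract_record(sample, source="website 1", url="http://example.com/a")
```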
[0058] Exemplarily, collecting the public-opinion text data is realized through the following steps:
[0059] Step 1: using the Python-based or Java-based HTML processing tool to denoise the webpages, clean data, and extract fields, so that public-opinion webpage data can be extracted in a structured manner by fields such as title, source, link, release date, text, summary, and author. In an example, the set list of designated websites is ["website 1", "website 2", "website 3", "website 4", "website 5", "website 6", "website 7", "website 8", "website 9"], and the credit weights assigned to the designated websites are (ranging from 1 to 5): [5, 5, 3, 5, 3, 3, 5, 5, 4].
[0060] Step 2: the extracted structured text data are stored in the form of:
{
"title": "Fake products doing great harm, how to rule special formula milk powder products in a targeted way",
"content": "The powdered protein beverage event in XXXX is about falsely claiming that powdered protein beverage is a kind of special formula milk powder, and led to severe dysplasia among infants and babies. In this event, a series of violating operations including illegal propaganda, sales malpractice, and consumer fraud caused health damage to infants and babies ....",
[0061] "datetime": "2020-06-08 09:40:31",
"source": "certain social media platform",
"url": "http://food.china.com.cn/2020-06/08/content76137776.htm",
"author": "Wang XX",
"summary": ""
}
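Using the website list and credit weights from the example above, the data-source sequence can be sketched as a one-hot vector. The one-hot encoding and the function name build_source_sequence are assumptions made for illustration; the text only states that the sequence is built from the location of the source website in the data-source sequence set.

```python
from typing import List, Tuple

WEBSITES: List[str] = [
    "website 1", "website 2", "website 3", "website 4", "website 5",
    "website 6", "website 7", "website 8", "website 9",
]
CREDIT_WEIGHTS: List[int] = [5, 5, 3, 5, 3, 3, 5, 5, 4]


def build_source_sequence(
    source: str, websites: List[str], weights: List[int]
) -> Tuple[List[int], int]:
    """Return (one-hot data-source sequence, matched credit weight).

    Raises ValueError when the source is not a designated website.
    """
    index = websites.index(source)
    sequence = [0] * len(websites)
    sequence[index] = 1
    return sequence, weights[index]


source_sequence, credit_weight = build_source_sequence(
    "website 3", WEBSITES, CREDIT_WEIGHTS
)
```

Keeping the sequence the same length as the website list means it can be multiplied element-wise with the credit weights when the risk value is computed.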
[0062] In the embodiment described above, before the step of matching risk labels of the public-opinion text data based on a preset risk-label set, the method further comprises:
[0063] constructing the risk-label set in advance, wherein the risk-label set includes plural risk-label classes, and each said risk-label class corresponds to at least one risk keyword; and configuring a risk weight for each said risk-label class in the risk-label set.
[0064] The step of matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence comprises:
[0065] performing matching of the risk keywords to the public-opinion text data by means of text keyword matching, and searching for corresponding said risk-label class according to matching results; and based on locations of the risk-label classes in the risk-label set, constructing the risk-label sequence.
[0066] The risk label module mainly serves to extract risk labels from public opinions by matching risk keywords according to a risk-label set created in advance. First, a risk-label set is constructed for the classes of risk events that are commonly seen in public opinions about enterprises and the classes of risk events that users care about. Every risk label is assigned a corresponding risk weight Ri. The risk weight may alternatively be configured by the users. A keyword set is developed for each of the risk labels, so as to form a "label-keyword dictionary". Then the public opinion text is matched with the risk keywords by means of text keyword matching, and tagging is made according to the matching results, so as to generate a risk-label sequence L = {L1, L2, ..., Ln} of the public opinions, where n is the total number of the risk labels and Li is the 0/1 indicator for the i-th risk label: 1 denotes that the i-th label is present in the public opinions, and 0 denotes that it is absent.
[0067] Exemplarily, the risk label matching process performed on the public-opinion text data is achieved through the following steps:
[0068] Step 1, a risk-label set is created by defining and organizing labels for public-opinion risk classes in light of business requirements from the risk management field, e.g.:
[0069] ["bankruptcy and insolvency", "mortgage and pledge", "loss", "equity change", "default and thunder", "Illegal fundraising", "infringement and plagiarism", "contract dispute", "violation of regulations or laws", "falsity and fraud", "tax evasion", "security events"];
wherein the risk weights (ranging from 1 to 10) corresponding to the risk-label classes are set as: [10, 5, 7, 10, 4, 3, 2, 2, 5, 3, 3].
[0070] Step 2, the risk keyword sets corresponding to the risk-label classes are organized to form a "label-keyword dictionary"; for example:
[0071] {
[0072] bankruptcy and insolvency: bankruptcy and insolvency, bankruptcy, frozen, business closed, business suspend, suspend business for rectification, seized, revoked, detained, non-standard opinion;
[0073] mortgage and pledge: debt collateralizing, collateralizing debt, asset value less than issued debt, asset mortgage, security for loan, pledge of equity;
[0074] loss: loss, aggravation, arrears, performance increase, sales decrease;
[0075] equity change: equity change, pledge of equity, changes in equity, increase holdings, decrease holdings, capital reduction, split-up, merged;
[0076] default and thunder: debt default, thunder, runaway, overdue, dishonest person, uncertainty of cashing, arrears in contribution, P2P, blacklist, executed, risk;
[0077] contract dispute: contract dispute, contract cancellation, labor dispute, labor lawsuit;
[0078] falsity and fraud: financial fraud, suspected fraud, financial scandal, fraud;
[0079] Illegal fundraising: Illegal fundraising, fundraising fraud;
[0080] tax evasion: tax dodging, tax fraud, tax avoiding;
[0081] infringement and plagiarism: infringement, plagiarism;
[0082] security events: incident, information leakage, private data, data leakage, production incident;
[0083] violation of regulations or laws: violation of law, violation of regulation, complaint, right protection, MLM, economic investigation intervention, arbitration, commission, loan shark, criminal case, prosecuted, involved in gangs or vices, official investigation;
[0084] }
[0085] Step 3, through keyword matching, the public opinion text is matched with the risk keywords, and according to the matching results, tagging is made with the labels, so as to obtain a risk-label sequence.
[0086] Assuming that one collected entry of public-opinion text data is "A series of incidents happened in constructions undertaken by XXXX and the company is now forbidden from managing new projects by the Housing and Construction Office due to violation", and the word "violation" in the public-opinion text data matches a risk keyword in the risk label of "violation of regulations or laws", the risk label matching the public-opinion text data is "violation of regulations or laws". Because the other risk labels are all unmatched, "1" is only used to mark the risk-label sequence at the location of the element corresponding to "violation of regulations or laws", and the locations of the other elements in the risk-label sequence are marked with "0". As a result, the risk-label sequence corresponding to the foregoing public-opinion text data is [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1].
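The keyword-matching step can be sketched in Python as follows. For brevity the dictionary below is a shortened, hypothetical subset of the label-keyword dictionary (four labels with a few keywords each), the sample sentence is invented, and case-insensitive substring matching is an assumption; the resulting sequence follows the order of the labels in this subset only.

```python
from typing import Dict, List

# Shortened subset of the label-keyword dictionary (illustrative only).
LABEL_KEYWORDS: Dict[str, List[str]] = {
    "bankruptcy and insolvency": ["bankruptcy", "frozen", "business closed"],
    "loss": ["loss", "arrears", "sales decrease"],
    "tax evasion": ["tax dodging", "tax fraud", "tax avoiding"],
    "violation of regulations or laws": [
        "violation of law", "violation of regulation",
        "prosecuted", "criminal case",
    ],
}


def build_risk_label_sequence(
    text: str, label_keywords: Dict[str, List[str]]
) -> List[int]:
    """Return a 0/1 sequence with one position per risk label.

    A position is 1 when any keyword of that label occurs in the text.
    Dict insertion order defines the label order (Python 3.7+).
    """
    lowered = text.lower()
    return [
        1 if any(keyword.lower() in lowered for keyword in keywords) else 0
        for keywords in label_keywords.values()
    ]


risk_sequence = build_risk_label_sequence(
    "The company was prosecuted for tax fraud.", LABEL_KEYWORDS
)
```

Unlike the single-label example in the text, this sentence matches two labels, which the 0/1 representation accommodates without change.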
[0087] In the embodiment described above, training of the sentiment classification model comprises:
[0088] extracting public opinion corpora of various sentiment polarities from acquired public opinion corpora, so as to construct a tag-corpus set; and training the sentiment classification model based on the tag-corpus set using an LSTM or TextCNN model structure; in which the sentiment polarities include positive sentiment, neutral sentiment, and negative sentiment, and the sentiment-polarity sequence is a sequence representation of one of the three sentiment polarities.
[0089] In particular implementations, the sentiment-polarity and entity-name identifying module extracts public-opinion data sets of three polarities (positive sentiment, neutral sentiment, and negative sentiment) from the acquired public opinion corpora according to a pre-defined positive and negative sentiment dictionary for a certain enterprise, so as to form a tag-corpus set. For example:
[0090] [
[0091] The public opinion corpora of "negative sentiment":
[0092] A loss as high as 1.7 billion CNY, with power stations devalued; Is the case of X tech-company a common suffering of the industry;
[0093] New movies scheduled for February are halted again, 90% film and television stocks hit the limit down and cinema stocks enter the "Glacier Era";
[0094] Takkyubin accused: a network technology company presumed to increase pricing and graft price differences;
[0095] A courier company in Shanghai is so inefficient that couriers quit for other careers;
[0096] ....
[0097] The public opinion corpora of "positive sentiment":
[0098] Challenging "vaccine leader" XXXX! First domestic vaccines launched;

[0099] With a burst of bullish news in the tera-scale plate blasted another harden of hundred-billion leading stocks;
[0100] Bullish news continuously come in the hydrogen energy industry and two sectors are expecting long-term growth;
[0101] certain video platform is still "solid";
[0102] ....
[0103] The public opinion corpora of "neutral sentiment":
[0104] What exactly the "long-termism" advocated by A, B, and C is;
[0105] An image to the quotations in 2020;
[0106] Say goodbye to the getting-ready 2019 and enter the deep transformation in 2020;
[0107] Why did the central bank of a certain country decide to cut the reserve requirement ratio in early January? To provide the market with liquidity;
[0108] ....
[0109]
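The dictionary-based construction of the tag-corpus set described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the embodiment: the function names `label_by_dictionary` and `build_tag_corpus`, the two small word lists, and the rule that a text matching only positive (or only negative) dictionary words takes that polarity while everything else is treated as neutral are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sentiment dictionaries; a real deployment would use the
# pre-defined dictionaries configured for the enterprise of interest.
POSITIVE_WORDS = {"bullish", "growth", "launched"}
NEGATIVE_WORDS = {"loss", "accused", "halted"}


def label_by_dictionary(text):
    """Assign a coarse polarity label by dictionary lookup (assumed rule)."""
    tokens = set(text.lower().replace(";", " ").replace(",", " ").split())
    has_pos = bool(tokens & POSITIVE_WORDS)
    has_neg = bool(tokens & NEGATIVE_WORDS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return "neutral"


def build_tag_corpus(texts):
    """Group texts by their assigned polarity to form a tag-corpus set."""
    corpus = defaultdict(list)
    for text in texts:
        corpus[label_by_dictionary(text)].append(text)
    return dict(corpus)


samples = [
    "A loss as high as 1.7 billion CNY, with power stations devalued",
    "Bullish news continuously come in the hydrogen energy industry",
    "An image to the quotations in 2020",
]
tag_corpus = build_tag_corpus(samples)
```

The resulting `tag_corpus` maps each polarity to its texts and could then serve as labelled input for training a classifier such as the LSTM or TextCNN model mentioned above.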
[0110] After text pre-processing is performed on the public opinion corpora, a word embedding model that has been trained on a large quantity of public opinion text about enterprises of interest is used to produce the text vector representation for model training.
Afterward, the sentiment classification model is trained based on LSTM/TextCNN. As the training of sentiment classification models is known in the art, no detailed description is given and only the results are discussed herein. As demonstrated by the statistics, the sentiment classification model according to the present embodiment, when based on 100,000 entries of data, provided an accuracy rate of 87%, satisfying expectation.
[0111] In the embodiment described above, after the step of classifying sentiment polarities of the public-opinion text data using a sentiment classification model so as to construct a sentiment-polarity sequence, the method further comprises:
[0112] For every sentiment polarity, a corresponding polarity weight Qi is set, wherein Q = (Q1, Q2, Q3). In the sentiment-polarity sequence T = (T1, T2, T3), T1 denotes positive sentiment, T2 denotes neutral sentiment, and T3 denotes negative sentiment. Q1 denotes the polarity weight corresponding to the positive sentiment, Q2 denotes the polarity weight corresponding to the neutral sentiment, and Q3 denotes the polarity weight corresponding to the negative sentiment.
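As a rough sketch of how the sentiment-polarity sequence and its weights fit together, the classifier's label can be encoded as a 0/1 sequence in the order positive, neutral, negative, and the weight of the active polarity recovered as an inner product. The function names and the weight values below are illustrative assumptions.

```python
POLARITIES = ("positive", "neutral", "negative")  # order T1, T2, T3


def polarity_sequence(label):
    """Return the one-hot sequence T for a polarity label."""
    if label not in POLARITIES:
        raise ValueError(f"unknown polarity: {label!r}")
    return [1 if p == label else 0 for p in POLARITIES]


def polarity_score(q, t):
    """Inner product of the polarity weights Q and the one-hot sequence T."""
    return sum(qi * ti for qi, ti in zip(q, t))


Q = (1, 2, 3)  # illustrative weights (Q1, Q2, Q3); assumed to be configurable
T = polarity_sequence("negative")
score = polarity_score(Q, T)
```

Because exactly one entry of T is 1, the inner product simply selects the weight of the detected polarity.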
[0113] In the embodiment described above, the step of identifying associated enterprise entity names in the public-opinion text data so as to construct an enterprise-association sequence comprises:
[0114] constructing a monitored-enterprise list consisting of plural enterprise entities in advance;
using a Chinese word segmentation tool and/or a named-entity recognition (NER) tool to identify the enterprise entity name associated with the public-opinion text data by means of keyword matching; and based on a location of the enterprise entity name in the monitored-enterprise list, constructing the enterprise-association sequence.
[0115] In particular implementations, a public-opinion processing platform is used to identify enterprise entities from the collected public-opinion text data, to extract risk labels from the data, and to analyze sentiment polarities of the data. Meanwhile, a personalized configuring service provides a standardized application configuration interface.
[0116] First, the public-opinion input module performs text pre-processing on the titles, content text, and summary text of public-opinion text data collected in real time from the subscribed public-opinion data sources, so as to remove undesired stop words and conduct Chinese word segmentation. Second, public-opinion labels are processed and sentiment polarities are classified: the pre-processed public-opinion text data are entered into the risk label module to generate risk-label sequences, and into the sentiment polarity analyzing module to generate sentiment polarity labels, such as positive sentiment, neutral sentiment, or negative sentiment. Third, the enterprise entities associated with the public-opinion text data are identified using the combination of the Chinese word segmentation tool, the named-entity recognition (NER) tool, and keyword matching, based on a dictionary of the full names, short names, and aliases of the enterprises in the monitored-enterprise list. The public-opinion text data are associated with the enterprise entities to form an enterprise-association sequence E = (E1, E2, ..., Em), where m is the number of all the monitored enterprises and Ei is a 0/1 label, in which 1 denotes that the public opinion is associated with the ith enterprise and 0 denotes that it is not. The personalized configuring module of the platform supports synchronization of the monitored-enterprise list, updating of the sentiment polarity dictionary, and setting of the public opinion sources and the risk label weights.
[0117] For example, for public-opinion text data saying "A series of incidents happened in constructions undertaken by XXXX and the company is now forbidden from managing new projects by the Housing and Construction Office due to violation .....", this entry of data is classified by its sentiment polarity, confirming that its sentiment label is negative sentiment. Through extraction of the associated enterprise entities, the enterprise-association sequence over the monitored-enterprise list is generated as
[0118] [..., 0, 1, 0, ...]
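The construction of the enterprise-association sequence can be sketched as below. Plain substring matching against an alias dictionary stands in here for the combination of word segmentation, NER, and keyword matching; the function name `enterprise_association` and all enterprise names and aliases are hypothetical.

```python
def enterprise_association(text, monitored):
    """Build the 0/1 sequence E over the monitored-enterprise list.

    `monitored` is an ordered list of (canonical_name, aliases) pairs;
    position i in the result corresponds to the i-th monitored enterprise.
    """
    sequence = []
    for canonical_name, aliases in monitored:
        names = {canonical_name, *aliases}
        # Substring matching is an assumed simplification of the
        # segmentation + NER + keyword matching described in the text.
        sequence.append(1 if any(name in text for name in names) else 0)
    return sequence


# Hypothetical monitored-enterprise list with full names and aliases.
MONITORED = [
    ("Alpha Logistics Co., Ltd.", ["Alpha Logistics", "AlphaLog"]),
    ("Beta Construction Group", ["Beta Construction", "BetaCon"]),
    ("Gamma Video Inc.", ["Gamma Video"]),
]

text = "A series of incidents happened in constructions undertaken by BetaCon"
E = enterprise_association(text, MONITORED)
```

Keeping the list ordered matters, since each position of E is tied to the position of an enterprise in the monitored-enterprise list.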
[0119] In the embodiment described above, before the step of computing and outputting a public opinion analysis result according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, the method further comprises:
[0120] presetting plural kinds of risk-early-warning levels, and defining boundary intervals of each kind of risk-early-warning level.
[0121] In particular implementations, a risk early warning score is computed according to the early warning labels and the monitored-enterprise list J = (J1, J2, ..., Jm) (where Ji is a 0/1 label) subscribed to by the user, and according to the data-source sequences, credit weights, risk-label sequences, risk weights, sentiment-polarity sequences, polarity weights, and enterprise-association sequence of the public-opinion text data. The early warning level is then determined according to the risk threshold values, and enterprise public opinion information that satisfies the requirements is pushed to the user as an early warning.
[0122] For example, for the risk-early-warning levels A = {no early warning, normal, important, serious}, the boundaries of the intervals corresponding to the risk-early-warning levels are H = (H1, H2, H3). In other words, when the score is smaller than H1, the corresponding risk-early-warning level is no early warning. When the score is greater than H1 and smaller than H2, the corresponding risk-early-warning level is normal. When the score is greater than H2 and smaller than H3, the corresponding risk-early-warning level is important. When the score is greater than H3, the corresponding risk-early-warning level is serious. The score corresponding to the sentiment polarity is Q = (Q1, Q2, Q3), and the sentiment-polarity sequence corresponding to the public-opinion text data is T = (T1, T2, T3), where exactly one Ti is 1 and the other two are 0.
[0123] The step of computing and outputting a public opinion analysis result according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data comprises:
[0124] using a public-opinion-risk-early-warning equation $z = \sum_{i=1}^{n} R_i L_i + \sum_{i=1}^{k} W_i S_i + \sum_{i=1}^{p} Q_i T_i$ to compute a risk value of the public-opinion text data; computing an early-warning value corresponding to the public-opinion text data in view of the enterprise-association sequence; and outputting the risk-early-warning level based on the boundary interval to which the early-warning value belongs; wherein Ri denotes the risk weight of the corresponding risk-label class, Li denotes the risk-label sequence, n denotes the total number of the risk-label classes in the risk-label set, Wi denotes the credit weight of the designated website, Si denotes the data-source sequence, k denotes the total number of the designated websites, Qi denotes the polarity weight, Ti denotes the sentiment-polarity sequence, and p denotes the total number of the sentiment polarities.
[0125] In particular implementations, for some user, the risk early warning score of some entry of public-opinion text data can be computed using the equation below:
[0126] $z = \sum_{i=1}^{n} R_i L_i + \sum_{i=1}^{k} W_i S_i + \sum_{i=1}^{p} Q_i T_i$
[0127] With the vector inner product represented by (x, y), the equation above can be rewritten as
[0128] $z = (R, L) + (W, S) + (Q, T)$
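The inner-product form of the score lends itself to a direct implementation. The sketch below is illustrative only: the helper names `inner` and `risk_score` and all numeric values are assumptions chosen for the example.

```python
def inner(u, v):
    """Inner product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    return sum(a * b for a, b in zip(u, v))


def risk_score(R, L, W, S, Q, T):
    """z = (R, L) + (W, S) + (Q, T)."""
    return inner(R, L) + inner(W, S) + inner(Q, T)


# Illustrative values only.
R, L = [5, 2, 4], [1, 0, 0]   # risk weights, risk-label sequence
W, S = [1, 3], [0, 1]         # credit weights, data-source sequence
Q, T = [1, 2, 3], [0, 0, 1]   # polarity weights, sentiment-polarity sequence
z = risk_score(R, L, W, S, Q, T)
```

With 0/1 sequences, each inner product sums the weights of the entries that are switched on, so the score here is 5 + 3 + 3.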
[0129] With the sequence information of the associated enterprise combined, it is obtained that
[0130] $z' = z \cdot \varepsilon((E, J))$
[0131] where $\varepsilon(x)$ is a unit step function,
[0132] $\varepsilon(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$
[0133] It is understandable that when an enterprise entity name appearing in this entry of public-opinion text data exists in the monitored-enterprise list, the value of ε(x) is 1, and the risk early warning score is computed. When no enterprise entity name mentioned in the entry of public-opinion text data exists in the monitored-enterprise list, the value of ε(x) is 0, and no further computation of the risk early warning score is conducted for that entry.
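The gating of the score by the user's subscription can be sketched as follows. The function names are hypothetical, and the step function is taken to return 1 for a positive argument and 0 otherwise, in line with the behaviour described in the paragraph above.

```python
def unit_step(x):
    """Unit step as used here: 1 for x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def gated_score(z, E, J):
    """z' = z * step((E, J)); zero when no subscribed enterprise is mentioned."""
    overlap = sum(e * j for e, j in zip(E, J))
    return z * unit_step(overlap)


# E marks enterprises mentioned in the text, J marks enterprises the user
# subscribed to; both are 0/1 sequences over the same monitored list.
kept = gated_score(11, [0, 1, 0], [1, 1, 0])
dropped = gated_score(11, [0, 1, 0], [1, 0, 1])
```

The inner product of E and J counts the enterprises that are both mentioned and subscribed, so the score survives only when that count is at least one.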
[0134] Further, the early warning mark is Output(z') = (Y(z'), A), where Y(x) = {y1(x), y2(x), y3(x), y4(x)}, and each of the two-valued functions y1(x), y2(x), y3(x), y4(x) takes the value True or False (i.e., 1 or 0):
[0135] $y_1(x) = (0 \le x < H_1)$
[0136] $y_2(x) = (H_1 \le x < H_2)$
[0137] $y_3(x) = (H_2 \le x < H_3)$
[0138] $y_4(x) = (x \ge H_3)$
[0139] Output(z') is output as the early warning mark: no early warning, normal, important, or serious.
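A possible rendering of the indicator vector Y(x) and the resulting early warning mark is sketched below. The function names are hypothetical, the thresholds are assumed to be given in ascending order, and treating each interval as closed on the left and open on the right is an assumption about the boundary cases.

```python
LEVELS = ("no early warning", "normal", "important", "serious")


def indicator_vector(x, thresholds):
    """Y(x): one-hot vector marking the interval that contains x."""
    h1, h2, h3 = thresholds
    return [
        int(0 <= x < h1),
        int(h1 <= x < h2),
        int(h2 <= x < h3),
        int(x >= h3),
    ]


def early_warning_mark(x, thresholds, levels=LEVELS):
    """Output(x): the level selected by the single 1 in Y(x)."""
    y = indicator_vector(x, thresholds)
    for flag, level in zip(y, levels):
        if flag:
            return level
    raise ValueError("score must be non-negative")


H = (5, 10, 30)  # illustrative threshold values
mark = early_warning_mark(11, H)
```

Pairing Y(x) with the level list mirrors the inner product (Y(z'), A): exactly one indicator is 1, and it selects the corresponding level.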

[0140] For example, the risk-early-warning levels are A = {no early warning, normal, important, serious}, with corresponding threshold values H = (H1 = 5, H2 = 10, H3 = 30).
[0141] The scores corresponding to the sentiment polarities (positive sentiment, neutral sentiment, and negative sentiment) are Q = (1, 2, 3), and the sentiment-polarity sequence corresponding to this entry of the public-opinion text data is T = (0, 0, 1).
[0142] Take as input the public-opinion text data "A series of incidents happened in constructions undertaken by XXXX and the company is now forbidden from managing new projects by the Housing and Construction Office due to violation .....". The public-opinion text data came from NetEase, and the corresponding data-source sequence vector is [0, 0, 0, 1, 0, 0, 0, 0, 0]. The early warning label subscribed to by the user is "security incident", and the list of monitored enterprises includes XXXX.
[0143] According to the equation below, the risk early warning score is:
[0144] $z = (R, L) + (W, S) + (Q, T) = 5 + 3 + 3 = 11$
[0145] Since the public-opinion text data contains an associated enterprise (XXXX) that is one of the enterprises monitored by the user, (E, J) > 0, so ε((E, J)) = 1, thereby obtaining z' = z · ε((E, J)) = z = 11.
[0146] Further, because H2 < z' < H3, Y(z') = (0, 0, 1, 0), and therefore the resulting early warning mark is Output(z') = (Y(z'), A) = "important". The early warning outputting module thus outputs the public opinion "A series of incidents happened in constructions undertaken by XXXX and the company is now forbidden from managing new projects by the Housing and Construction Office due to violation ....." to the user as an "important" early warning.
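The data-source sequence used in the worked example above can be produced by marking the position of the source website in the list of designated websites. In the sketch below, only the name NetEase, its position (index 3 of nine), and its credit contribution of 3 follow the example; the function name, the other site names, and the other credit weights are hypothetical placeholders.

```python
def data_source_sequence(source, sites):
    """One-hot sequence S marking the position of `source` in `sites`."""
    if source not in sites:
        raise ValueError(f"{source!r} is not a designated website")
    return [1 if site == source else 0 for site in sites]


# Hypothetical list of nine designated websites; only "NetEase" and its
# position follow the example in the text.
SITES = ["site_a", "site_b", "site_c", "NetEase", "site_e",
         "site_f", "site_g", "site_h", "site_i"]
# Hypothetical credit weights; the weight 3 for NetEase is chosen to agree
# with the credit term (W, S) = 3 of the worked example.
W = [1, 1, 2, 3, 2, 1, 1, 2, 1]

S = data_source_sequence("NetEase", SITES)
credit = sum(w * s for w, s in zip(W, S))
```

Because S has a single 1, the credit term of the score reduces to the credit weight configured for the source website.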
[0147] To sum up, the schemes of the present embodiment are intended to mine potential risk information about enterprises of interest and to provide automated and personalized configuration, so as to form a public-opinion analyzing process and give smart early warning of potential risks to relevant enterprises, thereby helping risk business personnel conduct enterprise risk control and assessment more efficiently.
[0148] Embodiment 2
[0149] The present embodiment provides a system of public-opinion analysis for providing early warning of enterprise risks. The system comprises:
[0150] a public-opinion-collecting module, for collecting public-opinion text data from any designated website, and constructing a data-source sequence according to website sources of the public-opinion text data;
[0151] a risk label module, for matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence;
[0152] a sentiment-polarity and entity-name identifying module, for performing classification of sentiment polarities of the public-opinion text data using a sentiment classification model so as to construct a sentiment-polarity sequence, and identifying associated enterprise entity names in the public-opinion text data so as to construct an enterprise-association sequence; and
[0153] an early warning outputting module, for computing and outputting a public opinion analysis result according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data.
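One way to picture how the modules listed above cooperate is a thin wiring layer in which each module is a plain callable. This is a structural sketch only: the class and field names are hypothetical, the collecting step is reduced to mapping a URL to a data-source sequence, and the sentiment-polarity and entity-name identifying module is split into two callables for readability.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OpinionSequences:
    """The four sequences derived from one entry of public-opinion text."""
    source: List[int]       # data-source sequence S
    risk: List[int]         # risk-label sequence L
    polarity: List[int]     # sentiment-polarity sequence T
    enterprise: List[int]   # enterprise-association sequence E


@dataclass
class EarlyWarningSystem:
    """Wires the modules together; each module is a plain callable here."""
    collect: Callable[[str], List[int]]          # url -> S
    match_risk: Callable[[str], List[int]]       # text -> L
    classify: Callable[[str], List[int]]         # text -> T
    identify: Callable[[str], List[int]]         # text -> E
    output: Callable[[OpinionSequences], str]    # sequences -> result

    def analyse(self, url: str, text: str) -> str:
        """Run every module on one entry and return the analysis result."""
        seqs = OpinionSequences(
            source=self.collect(url),
            risk=self.match_risk(text),
            polarity=self.classify(text),
            enterprise=self.identify(text),
        )
        return self.output(seqs)
```

Passing the modules in as callables keeps the sketch independent of any particular crawler, classifier, or scoring rule.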
[0154] As compared to the prior art, the system of public-opinion analysis for providing early warning of enterprise risks of the present embodiment provides beneficial effects that are similar to those provided by the method of public-opinion analysis for providing early warning of enterprise risks as enumerated in the previous embodiment, and thus no repetitions are made herein.
[0155] Embodiment 3
[0156] The present embodiment provides a computer-readable storage medium, storing thereon a computer program. When the computer program is executed by a processor, it implements the steps of the method of public-opinion analysis for providing early warning of enterprise risks as described previously.
[0157] As compared to the prior art, the computer-readable storage medium of the present embodiment provides beneficial effects that are similar to those provided by the method of public-opinion analysis for providing early warning of enterprise risks as enumerated in the previous embodiment, and thus no repetitions are made herein.
[0158] As will be appreciated by people of ordinary skill in the art, implementation of all or a part of the steps of the method of the present invention as described previously may be realized by having a program instruct related hardware components. The program may be stored in a computer-readable storage medium and, when executed, performs the individual steps of the methods described in the foregoing embodiments.
The storage medium may be a ROM/RAM, a hard drive, an optical disk, a memory card or the like.
[0159] The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims. Hence, the scope of the present invention shall only be defined by the appended claims.


Claims (60)

Claims:
1. A method comprising:
collecting public-opinion text data from any designated website;
constructing a data-source sequence for website sources of the public-opinion text data, wherein a credit weight is assigned for each designated website;
matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence, wherein the preset risk-label set includes risk keywords configured in a risk-label class, wherein each risk-label class has a risk weight;
performing classification of sentiment polarities of the public-opinion text data using a sentiment classification model to construct a sentiment-polarity sequence;
identifying associated enterprise entity names in the public-opinion text data to construct an enterprise-association sequence; and computing and outputting a public opinion analysis result according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data.
2. The method of claim 1, wherein constructing the data-source sequence for the website sources of the public-opinion text data comprises:
summing up a total number of the designated websites;
configuring the credit weight for each designated website, to construct a data-source sequence set dimensionally consistent with the total number;
identifying a location of the source website in the data-source sequence set;
constructing the corresponding data-source sequence; and matching a corresponding credit weight.
3. The method of claim 1, further comprises:
constructing the risk-label set in advance, wherein the risk-label set includes plural risk-label classes, and each risk-label class corresponds to at least one risk keyword; and configuring the risk weight for each risk-label class in the risk-label set.
4. The method of claim 3, wherein matching risk labels of the public-opinion text data based on the preset risk-label set to construct the risk-label sequence comprises:
matching the risk keywords to the public-opinion text data by means of text keyword matching;
searching for corresponding risk-label class according to matching results;
and based on locations of the risk-label classes in the risk-label set, constructing the risk-label sequence.
5. The method of claim 1, wherein training of the sentiment classification model comprises:
extracting public opinion corpora of various sentiment polarities respectively from acquired public opinion corpora, to construct a tag-corpus set;
training the sentiment classification model based on the tag-corpus set using a Long short-term memory (LSTM) or convolutional neural network for text (TextCNN) model structure; and wherein classifications of the sentiment polarities include one or more of positive sentiment, neutral sentiment, and negative sentiment, and the sentiment-polarity sequence is a sequence representation of one of the three sentiment polarities.
6. The method of claim 5, further comprises configuring a corresponding polarity weight for every kind of sentiment polarity.
7. The method of claim 1, wherein identifying the associated enterprise entity names in the public-opinion text data to construct the enterprise-association sequence comprises:

constructing a monitored-enterprise list consisting of plural enterprise entities in advance;
identifying the enterprise entity name associated with the public-opinion text data by means of keyword matching with a Chinese word segmentation tool and a named-entity recognition (NER) naming entity identifying tool; and based on a location of the enterprise entity name in the monitored-enterprise list, constructing the enterprise-association sequence.
8. The method of claim 1, further comprises:
presetting plural kinds of risk-early-warning levels; and defining boundary intervals of each kind of risk-early-warning levels.
9. The method of claim 8, wherein according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result comprises:
using a public-opinion-risk-early-warning equation $z = \sum_{i=1}^{n} R_i L_i + \sum_{i=1}^{k} W_i S_i + \sum_{i=1}^{p} Q_i T_i$ to compute a risk value of the public-opinion text data;
computing an early-warning value corresponding to the public-opinion text data in view of the enterprise-association sequence;
outputting the risk-early-warning level based on the boundary interval to which the early-warning value belongs;
wherein Ri denotes a risk weight of a corresponding risk-label class, Li denotes the risk-label sequence, n denotes a total number of the risk-label classes in the risk-label set, Wi denotes a credit weight of the designated website, Si denotes the data-source sequence, k denotes the total number of the designated websites, Qi denotes a polarity weight, Ti denotes the sentiment-polarity sequence, and p denotes a total number of the sentiment polarities.
10. The method of any one of claims 1 to 9, wherein the public-opinion text data are collected from any designated website, and are processed to construct the website sources, wherein the sources of public opinions about enterprises include one or more of news websites, government websites, forums, microblogging sites, and websites receiving complaints, wherein the source sequence is S = {S1, S2, ..., Sk}, wherein different credit weights Wi are assigned to the sources of public opinions, wherein the credit weights are configured by users, wherein addresses, site sections, data-collecting frequencies, and keywords of the public-opinion data sources are set, and wherein an Internet-based data collecting tool is used to acquire the public-opinion text data.
11. The method of any one of claims 1 to 10, wherein collecting the public-opinion text data comprises:
using a Python-based or Java-based HTML processing tool to denoise webpages, clean data, and extract fields, so that public-opinion webpage data is extracted in a structured manner by fields including titles, sources, links, releasing date, text, summaries, and authors; and
storing the extracted structured text data.
12. The method of any one of claims 1 to 10, wherein the risk-label set is constructed for classes of risk events that are commonly seen in public opinions about enterprises and classes of risk events the users care about, wherein every risk label is assigned a corresponding risk weight Rj, wherein the risk weight may alternatively be configured by the users, wherein a keyword set is developed for each of the risk labels, to form a "label-keyword dictionary", wherein the risk-label sequence of the public opinions is L = {L1, L2, ..., Ln}, wherein n is the total number of the risk labels, Li is the 0/1 identification corresponding to the ith risk label, wherein 1 denotes that there is the ith label in the public opinions, and 0 denotes that there is not the ith label in the public opinions.
13. The method of any one of claims 1 to 12, wherein the sentiment-polarity and entity-name identifying module extracts public-opinion data sets of three polarities, including positive, neutral, and negative sentiment, from the acquired public opinion corpora according to a pre-defined positive and negative sentiment dictionary for a certain enterprise to form the tag-corpus set.
14. The method of any one of claims 1 to 13, wherein for every sentiment polarity, a corresponding polarity weight Qi is set, wherein Qi = {Q1, Q2, Q3}, wherein the sentiment-polarity sequence Ti = {T1, T2,T3}, T1 denotes positive sentiment, T2 denotes neutral sentiment, and T3 denotes negative sentiment, wherein Q1 denotes the polarity weight corresponding to the positive sentiment, Q2 denotes the polarity weight corresponding to the neutral sentiment, and Q3 denotes the polarity weight corresponding to the negative sentiment.
15. The method of any one of claims 1 to 14, wherein a public-opinion processing platform is used to identify enterprise entities from the collected public-opinion text data, to extract the risk labels from the data and to analyze the sentiment polarities of the data, wherein a personalized configuring service provides standardized application configuration interface.
16. The method of any one of claims 1 to 15, wherein the enterprise entities associated with the public-opinion text data are identified based on a dictionary of full names, short names, and aliases of monitored enterprises through a list of enterprises monitored, wherein the public-opinion text data are associated with the enterprise entities to form an enterprise-association sequence E = {E1, E2, ..., Em}, where m is the number of all the monitored enterprises, Ei is the 0/1 label, wherein 1 denotes the public opinion is associated with the ith enterprise, and 0 denotes not associated, wherein synchronization of the monitored-enterprise list, updating of a sentiment polarity dictionary, and setting of the public opinion sources and risk label weights are supported.
17. The method of any one of claims 1 to 16, wherein a risk early warning score is computed according to early warning labels and a list of enterprises monitored J = {J1, J2, ..., Jm}, wherein Ji is a 0/1 label, subscribed by the user, and according to data-source sequences, credit weights, risk-label sequences, risk weights, sentiment-polarity sequences, polarity weights, and the enterprise-association sequence of public-opinion text data, wherein an early warning level is determined according to a risk threshold value, and wherein enterprise public opinion information that satisfies requirements is pushed to the user as early warning.
18. The method of any one of claims 1 to 17, wherein the risk-early-warning level is A = {no early warning, normal, important, serious}, and the boundary intervals corresponding to the risk-early-warning levels are H = {H1, H2, H3}, wherein when the score is smaller than H1, the corresponding risk-early-warning level is no early warning, wherein when the score is greater than H1 and smaller than H2, the corresponding risk-early-warning level is normal, wherein when the score is greater than H2 and smaller than H3, the corresponding risk-early-warning level is important, wherein when the score is greater than H3, the corresponding risk-early-warning level is serious, wherein the score corresponding to sentiment polarity is Q = {Q1, Q2, Q3}, and the sentiment-polarity sequence corresponding to the public-opinion text data is T = {T1, T2, T3}.
19. The method of any one of claims 1 to 18, wherein the risk early warning score of an entry of public-opinion text data is computed by:
$z = \sum_{i=1}^{n} R_i L_i + \sum_{i=1}^{k} W_i S_i + \sum_{i=1}^{p} Q_i T_i$;
wherein, with a vector inner product represented by (x, y), the equation is:
$z = (R, L) + (W, S) + (Q, T)$;
wherein, with the sequence information of the associated enterprise combined:
$z' = z \cdot \varepsilon((E, J))$;
wherein ε(x) is a unit step function:
$\varepsilon(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$;
wherein, when the enterprise entity name shown in this entry of public-opinion text data exists in the monitored-enterprise list, the value of ε(x) is 1 and the risk early warning score is computed, and wherein, when the enterprise entity name mentioned in the entry of public-opinion text data does not exist in the list of enterprises monitored, the value of ε(x) is 0 and no more computation for the risk early warning score is conducted.
20. The method of any one of claims 1 to 19, wherein the early warning mark is:
Output(z') = (Y(z'), A), wherein Y(x) = {y1(x), y2(x), y3(x), y4(x)}, and the value of each of the two-valued functions y1(x), y2(x), y3(x), y4(x) is True or False, 1 or 0, wherein:
$y_1(x) = (0 \le x < H_1)$;
$y_2(x) = (H_1 \le x < H_2)$;
$y_3(x) = (H_2 \le x < H_3)$;
$y_4(x) = (x \ge H_3)$; and wherein Output(z') is output as the early warning mark: no early warning, normal, important, or serious.
21. A system comprising:
a public-opinion-collecting module, configured to:
collect public-opinion text data from any designated website;
construct a data-source sequence for website sources of the public-opinion text data, wherein a credit weight is assigned for each designated website;
a risk label module, for matching risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence, wherein the preset risk-label set includes risk keywords configured in a risk-label class, wherein each risk-label class has a risk weight;

a sentiment-polarity and entity-name identifying module, configured to:
perform classification of sentiment polarities of the public-opinion text data using a sentiment classification model to construct a sentiment-polarity sequence;
identify associated enterprise entity names in the public-opinion text data to construct an enterprise-association sequence; and an early warning outputting module, for computing and outputting a public opinion analysis result according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data.
22. The system of claim 21, wherein constructing the data-source sequence for the website sources of the public-opinion text data comprises:
summing up a total number of the designated websites;
configuring the credit weight for each designated website, to construct a data-source sequence set dimensionally consistent with the total number;
identifying a location of the source website in the data-source sequence set;
constructing the corresponding data-source sequence; and matching a corresponding credit weight.
23. The system of claim 21, further comprises:
constructing the risk-label set in advance, wherein the risk-label set includes plural risk-label classes, and each risk-label class corresponds to at least one risk keyword; and configuring the risk weight for each risk-label class in the risk-label set.
24. The system of claim 23, wherein matching risk labels of the public-opinion text data based on the preset risk-label set to construct the risk-label sequence comprises:

matching the risk keywords to the public-opinion text data by means of text keyword matching;
searching for corresponding risk-label class according to matching results;
and based on locations of the risk-label classes in the risk-label set, constructing the risk-label sequence.
25. The system of claim 21, wherein training of the sentiment classification model comprises:
extracting public opinion corpora of various sentiment polarities respectively from acquired public opinion corpora, to construct a tag-corpus set;
training the sentiment classification model based on the tag-corpus set using a Long short-term memory (LSTM) or convolutional neural network for text (TextCNN) model structure; and wherein classifications of the sentiment polarities include one or more of positive sentiment, neutral sentiment, and negative sentiment, and the sentiment-polarity sequence is a sequence representation of one of the three sentiment polarities.
26. The system of claim 25, further comprises configuring a corresponding polarity weight for every kind of sentiment polarity.
27. The system of claim 21, wherein identifying the associated enterprise entity names in the public-opinion text data to construct the enterprise-association sequence comprises:
constructing a monitored-enterprise list consisting of plural enterprise entities in advance;
identifying the enterprise entity name associated with the public-opinion text data by means of keyword matching with a Chinese word segmentation tool and a named-entity recognition (NER) naming entity identifying tool; and based on a location of the enterprise entity name in the monitored-enterprise list, constructing the enterprise-association sequence.
28. The system of claim 21, further comprises:
presetting plural kinds of risk-early-warning levels; and defining boundary intervals of each kind of risk-early-warning levels.
29. The system of claim 28, wherein according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result comprises:
using a public-opinion-risk-early-warning equation $z = \sum_{i=1}^{n} R_i L_i + \sum_{i=1}^{k} W_i S_i + \sum_{i=1}^{p} Q_i T_i$ to compute a risk value of the public-opinion text data;
computing an early-warning value corresponding to the public-opinion text data in view of the enterprise-association sequence;
outputting the risk-early-warning level based on the boundary interval to which the early-warning value belongs;
wherein Ri denotes a risk weight of a corresponding risk-label class, Li denotes the risk-label sequence, n denotes a total number of the risk-label classes in the risk-label set, Wi denotes a credit weight of the designated website, Si denotes the data-source sequence, k denotes the total number of the designated websites, Qi denotes a polarity weight, Ti denotes the sentiment-polarity sequence, and p denotes a total number of the sentiment polarities.
30. The system of any one of claims 21 to 29, wherein the public-opinion text data are collected from any designated website, and are processed to construct the website sources, wherein the sources of public opinions about enterprises include one or more of news websites, government websites, forums, micro blogging, and websites receiving complaints, the source sequence is S = {S1, S2, ..., Sk}, wherein different credit weights Wi are assigned to the sources of public opinions, wherein the credit weights are configured by users, wherein addresses, site sections, data-collecting frequencies, and keywords of the public-opinion data sources are set, wherein an Internet-based data collecting tool is used to acquire the public-opinion text data.
31. The system of any one of claims 21 to 30, wherein collecting the public-opinion text data comprises:
using a Python-based or Java-based html processing tool to denoise webpages, clean data, and extract fields, so that public-opinion webpage data is extracted in a structured manner by fields including titles, sources, links, releasing date, text, summaries, and authors;
storing the extracted structured text data.
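As a rough illustration of the denoise-and-extract step, the sketch below uses Python's standard-library `html.parser` to drop script and style content and pull a title and body text into a structured record. The class name `ArticleExtractor`, the function `extract_fields`, and the choice of `<title>` and `<p>` as the fields of interest are assumptions made for this example; the claims do not name a specific tool or page layout.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collects <title> and <p> text while ignoring script/style noise."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._stack = []       # open tags we track: title, p, script, style
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS or tag in ("title", "p"):
            self._stack.append(tag)
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return
        current = self._stack[-1]
        if current == "title":
            self.title += data
        elif current == "p":
            self.paragraphs[-1] += data
        # Data inside script/style is treated as noise and dropped.


def extract_fields(html, source, link):
    """Return a structured record (hypothetical field subset) for one page."""
    parser = ArticleExtractor()
    parser.feed(html)
    text = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"title": parser.title.strip(), "source": source,
            "link": link, "text": text}
```

Fields such as releasing date, summary, and author depend on site-specific markup and are left out of this sketch.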
32. The system of any one of claims 21 to 31, wherein the risk-label set is constructed for classes of risk events that are commonly seen in public opinions about enterprises and classes of risk events the users care about, wherein every risk label is assigned with a corresponding risk weight Rj, wherein the risk weight may alternatively be configured by the users, wherein a keyword set is developed for each of the risk labels, to form a "label-keyword dictionary", wherein the risk-label sequence is L = {L1, L2, ..., Ln} of the public opinions, wherein n is the total number of the risk labels, Li corresponds to the 0/1 identification corresponding to the risk label, wherein 1 denotes that there is the ith label in the public opinions, and 0 denotes that there is not the ith label in the public opinions.
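The 0/1 risk-label sequence can be sketched as a lookup over the label-keyword dictionary. The label names, keywords, and the function name `build_risk_label_sequence` below are hypothetical, and plain substring matching stands in for whatever text keyword matching an implementation would actually use.

```python
def build_risk_label_sequence(text, label_keywords):
    """Return L = [L1, ..., Ln]: Li is 1 if any keyword of label i occurs in text."""
    return [
        1 if any(keyword in text for keyword in keywords) else 0
        for keywords in label_keywords.values()  # dict order fixes label positions
    ]


# Hypothetical label-keyword dictionary; real labels and keywords are user-defined.
LABEL_KEYWORDS = {
    "litigation": ["lawsuit", "sued"],
    "product safety": ["recall", "defect"],
    "financial fraud": ["fraud", "embezzlement"],
}

sequence = build_risk_label_sequence(
    "The firm was sued after a product recall.", LABEL_KEYWORDS
)
```

Because the position of each label in the dictionary determines its index in L, the same ordering has to be used for the risk weights.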
33. The system of any one of claims 21 to 32, wherein the sentiment-polarity and entity-name identifying module extracts public-opinion data sets of three polarities, including positive, neutral, and negative sentiment, from the acquired public opinion corpora according to a pre-defined positive and negative sentiment dictionary for a certain enterprise to form the tag-corpus set.
34. The system of any one of claims 21 to 33, wherein for every sentiment polarity, a corresponding polarity weight Qi is set, wherein Qi = {Q1, Q2, Q3}, wherein the sentiment-polarity sequence Ti = {T1, T2, T3}, T1 denotes positive sentiment, T2 denotes neutral sentiment, and T3 denotes negative sentiment, wherein Q1 denotes the polarity weight corresponding to the positive sentiment, Q2 denotes the polarity weight corresponding to the neutral sentiment, and Q3 denotes the polarity weight corresponding to the negative sentiment.
35. The system of any one of claims 21 to 34, wherein a public-opinion processing platform is used to identify enterprise entities from the collected public-opinion text data, to extract the risk labels from the data and to analyze the sentiment polarities of the data, wherein a personalized configuring service provides a standardized application configuration interface.
36. The system of any one of claims 21 to 35, wherein the enterprise entities associated with the public-opinion text data are identified based on a dictionary of full names, short names, and aliases of monitored enterprises through a list of enterprises monitored, wherein the public-opinion text data are associated with the enterprise entities to form an enterprise-association sequence E = {E1, E2, ..., Em}, where m is the number of all the monitored enterprises, Ei is the 0/1 label, wherein 1 denotes the public opinion is associated with the ith enterprise, and 0 denotes not associated, wherein synchronization of the monitored-enterprise list, updating of a sentiment polarity dictionary, and setting of the public opinion sources and risk label weights is supported.
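A minimal sketch of building the enterprise-association sequence from a dictionary of full names, short names, and aliases follows. The enterprise names and the function name `build_enterprise_sequence` are invented for illustration, and simple substring matching is used in place of the word segmentation and NER tools mentioned in the claims.

```python
def build_enterprise_sequence(text, monitored):
    """Return E = [E1, ..., Em] in monitored-list order.

    monitored: list of (full_name, [short names and aliases]) tuples.
    Ei is 1 when any known name of enterprise i appears in the text.
    """
    sequence = []
    for full_name, aliases in monitored:
        names = [full_name, *aliases]
        sequence.append(1 if any(name in text for name in names) else 0)
    return sequence


# Hypothetical monitored-enterprise list.
MONITORED = [
    ("Acme Holdings Ltd", ["Acme"]),
    ("Globex Corporation", ["Globex", "GBX"]),
    ("Initech Inc", ["Initech"]),
]

association = build_enterprise_sequence(
    "Regulators opened a probe into Globex on Monday.", MONITORED
)
```

Substring matching can produce false positives for short aliases, which is one reason a segmentation or NER step would normally precede the lookup.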
37. The system of any one of claims 21 to 36, wherein a risk early warning score is computed according to early warning labels and the list of enterprises monitored J = {J1, J2, ..., Jm}, wherein Ji is a 0/1 label, subscribed by the user, and according to data-source sequences, credit weights, risk-label sequences, risk weights, sentiment-polarity sequences, polarity weights, and the enterprise-association sequence of public-opinion text data, early warning level is determined according to a risk threshold value, wherein enterprise public opinion information that satisfies requirements is pushed to the user as early warning.
38. The system of any one of claims 21 to 37, wherein the risk-early-warning level A = {no early warning, normal, important, serious}, boundary intervals corresponding to every risk-early-warning level are: H = {H1, H2, H3}, wherein the score is smaller than H1, the corresponding risk-early-warning level is not to give early warning, wherein the score is greater than H1 and smaller than H2, the corresponding risk-early-warning level is normal, wherein the score is greater than H2 and smaller than H3, the corresponding risk-early-warning level is important, wherein the score is greater than H3, the corresponding risk-early-warning level is serious, wherein the score corresponding to sentiment polarity is Q = {Q1, Q2, Q3}, and the sentiment-polarity sequence corresponding to the public-opinion text data is T = {T1, T2, T3}.
39. The system of any one of claims 21 to 38, wherein risk early warning score of entry of public-opinion text data is computed by:
wherein a vector inner product represented by (x, y), the equation is:
z = (R, L) + (W, S) + (Q, T);
wherein the sequence information of the associated enterprise combined is:
z' = z · E((E, J));
wherein E(x) is a unit step function,
wherein the enterprise entity name shown in this entry of public-opinion text data exists in monitored-enterprise list, the value of E(x) is 1, wherein the risk early warning score is computed, wherein the enterprise entity name mentioned in the entry of public-opinion text data does not exist in the list of enterprises monitored, the value of E(x) is 0, wherein no more computation for the risk early warning score is conducted.
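The score computation can be sketched as three inner products followed by a gate on the monitored-enterprise match. The sketch assumes that the gate multiplies the raw score z, and that the unit step returns 1 for a positive argument and 0 otherwise; both are readings of the surrounding text rather than forms stated explicitly here. All weights in the example are made up.

```python
def inner(x, y):
    """Vector inner product (x, y) of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a * b for a, b in zip(x, y))


def unit_step(x):
    # Assumption: 1 when at least one subscribed enterprise matches, else 0.
    return 1 if x > 0 else 0


def early_warning_score(R, L, W, S, Q, T, E, J):
    """z = (R, L) + (W, S) + (Q, T), gated by the enterprise match (E, J)."""
    z = inner(R, L) + inner(W, S) + inner(Q, T)
    return z * unit_step(inner(E, J))


# Hypothetical weights and sequences for a single public-opinion entry.
score = early_warning_score(
    R=[3, 2, 5], L=[1, 1, 0],   # risk weights, risk-label sequence
    W=[1, 2],    S=[0, 1],      # credit weights, data-source sequence
    Q=[0, 1, 4], T=[0, 0, 1],   # polarity weights, sentiment-polarity sequence
    E=[0, 1, 0], J=[1, 1, 0],   # enterprise association, user subscription
)
```

With these inputs the raw score is 5 + 2 + 4 = 11, and it is kept because the entry mentions an enterprise the user has subscribed to.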
40. The system of any one of claims 21 to 39, wherein early warning mark is:
Output (z') = (Y(z'), A), wherein Y(x) = {y1(x), y2(x), y3(x), y4(x)}, and the values of the two-value functions y1(x), y2(x), y3(x), y4(x) are True or False, 1 or 0, wherein:
y1(x) = 0 ≤ x < H1;
y2(x) = H1 ≤ x < H2;
y3(x) = H2 ≤ x < H3;
y4(x) = x ≥ H3; and wherein Output (z') is output as the early warning mark: no early warning, normal, important, or serious.
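Mapping a score to an early warning mark can be sketched with four indicator values, exactly one of which is true for ordered thresholds. The threshold values and the function name `warning_level` are hypothetical, and assigning a score that equals a threshold to the higher level is an assumption about how the interval boundaries are meant to read.

```python
LEVELS = ["no early warning", "normal", "important", "serious"]


def warning_level(score, thresholds):
    """Return the level whose interval contains score; requires H1 < H2 < H3."""
    h1, h2, h3 = thresholds
    indicators = [
        score < h1,          # y1: below the first boundary
        h1 <= score < h2,    # y2
        h2 <= score < h3,    # y3
        score >= h3,         # y4
    ]
    # Exactly one indicator is True, so its index selects the level in A.
    return LEVELS[indicators.index(True)]


mark = warning_level(8, thresholds=(3, 6, 9))
```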
41. A computer readable storage medium, storing thereon a computer program that is executed by a processor configured to:
collect public-opinion text data from any designated website;
construct a data-source sequence for website sources of the public-opinion text data, wherein a credit weight is assigned for each designated website;

match risk labels of the public-opinion text data based on a preset risk-label set to construct a risk-label sequence, wherein the preset risk-label set includes risk keywords configured in a risk-label class, wherein each risk-label class has a risk weight;
perform classification of sentiment polarities of the public-opinion text data using a sentiment classification model to construct a sentiment-polarity sequence;
identify associated enterprise entity names in the public-opinion text data to construct an enterprise-association sequence; and compute and output a public opinion analysis result according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data.
42. The storage medium of claim 41, wherein constructing the data-source sequence for the website sources of the public-opinion text data comprises:
summing up a total number of the designated websites;
configuring the credit weight for each designated website, to construct a data-source sequence set dimensionally consistent with the total number;
identifying a location of the source website in the data-source sequence set;
constructing the corresponding data-source sequence; and matching a corresponding credit weight.
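The data-source sequence can be sketched as a one-hot vector over the designated websites, with the credit weight recovered by position. Site names, weights, and function names are hypothetical.

```python
def build_source_sequence(source_site, designated_sites):
    """Return S = [S1, ..., Sk]; Si is 1 at the position of the source website."""
    return [1 if site == source_site else 0 for site in designated_sites]


def matched_credit_weight(sequence, credit_weights):
    """Pick the credit weight at the position marked in the sequence."""
    if len(sequence) != len(credit_weights):
        raise ValueError("sequence and weights must have the same length")
    return sum(w * s for w, s in zip(credit_weights, sequence))


# Hypothetical designated websites and user-configured credit weights.
SITES = ["news.example", "gov.example", "forum.example"]
CREDIT_WEIGHTS = [3, 5, 1]

S = build_source_sequence("gov.example", SITES)
weight = matched_credit_weight(S, CREDIT_WEIGHTS)
```

A source that is not in the designated list yields an all-zero sequence and therefore contributes a weight of 0.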
43. The storage medium of claim 41, further comprising:
constructing the risk-label set in advance, wherein the risk-label set includes plural risk-label classes, and each risk-label class corresponds to at least one risk keyword; and configuring the risk weight for each risk-label class in the risk-label set.
44. The storage medium of claim 43, wherein matching risk labels of the public-opinion text data based on the preset risk-label set to construct the risk-label sequence comprises:

matching the risk keywords to the public-opinion text data by means of text keyword matching;
searching for corresponding risk-label class according to matching results;
and based on locations of the risk-label classes in the risk-label set, constructing the risk-label sequence.
45. The storage medium of claim 41, wherein training of the sentiment classification model comprises:
extracting public opinion corpora of various sentiment polarities respectively from acquired public opinion corpora, to construct a tag-corpus set;
training the sentiment classification model based on the tag-corpus set using a Long short-term memory (LSTM) or convolutional neural network for text (TextCNN) model structure; and wherein classifications of the sentiment polarities include one or more of positive sentiment, neutral sentiment, and negative sentiment, and the sentiment-polarity sequence is a sequence representation of one of the three sentiment polarities.
46. The storage medium of claim 45, further comprising configuring a corresponding polarity weight for every kind of sentiment polarity.
47. The storage medium of claim 41, wherein identifying the associated enterprise entity names in the public-opinion text data to construct the enterprise-association sequence comprises:
constructing a monitored-enterprise list consisting of plural enterprise entities in advance;
identifying the enterprise entity name associated with the public-opinion text data by means of keyword matching with a Chinese word segmentation tool and a named-entity recognition (NER) tool; and based on a location of the enterprise entity name in the monitored-enterprise list, constructing the enterprise-association sequence.
48. The storage medium of claim 41, further comprising:
presetting plural kinds of risk-early-warning levels; and defining boundary intervals of each kind of risk-early-warning level.
49. The storage medium of claim 48, wherein according to the data-source sequence, the risk-label sequence, the sentiment-polarity sequence and the enterprise-association sequence corresponding to the public-opinion text data, computing and outputting a public opinion analysis result comprises:
using a public-opinion-risk-early-warning equation to compute a risk value of the public-opinion text data;
computing an early-warning value corresponding to the public-opinion text data in view of the enterprise-association sequence;
outputting the risk-early-warning level based on the boundary interval to which the early-warning value belongs;
wherein Ri denotes a risk weight of a corresponding risk-label class, Li denotes the risk-label sequence, n denotes a total number of the risk-label classes in the risk-label set, Wi denotes a credit weight of the designated website, Si denotes the data-source sequence, k denotes the total number of the designated websites, Qi denotes a polarity weight, Ti denotes the sentiment-polarity sequence, and p denotes a total number of the sentiment polarities.
50. The storage medium of any one of claims 41 to 49, wherein the public-opinion text data are collected from any designated website, and are processed to construct the website sources, wherein the sources of public opinions about enterprises include one or more of news websites, government websites, forums, micro blogging, and websites receiving complaints, the source sequence is S = {S1, S2, ..., Sk}, wherein different credit weights Wi are assigned to the sources of public opinions, wherein the credit weights are configured by users, wherein addresses, site sections, data-collecting frequencies, and keywords of the public-opinion data sources are set, wherein an Internet-based data collecting tool is used to acquire the public-opinion text data.
51. The storage medium of any one of claims 41 to 50, wherein collecting the public-opinion text data comprises:
using a Python-based or Java-based html processing tool to denoise webpages, clean data, and extract fields, so that public-opinion webpage data is extracted in a structured manner by fields including titles, sources, links, releasing date, text, summaries, and authors;
storing the extracted structured text data.
52. The storage medium of any one of claims 41 to 51, wherein the risk-label set is constructed for classes of risk events that are commonly seen in public opinions about enterprises and classes of risk events the users care about, wherein every risk label is assigned with a corresponding risk weight Rj, wherein the risk weight may alternatively be configured by the users, wherein a keyword set is developed for each of the risk labels, to form a "label-keyword dictionary", wherein the risk-label sequence is L = {L1, L2, ..., Ln} of the public opinions, wherein n is the total number of the risk labels, Li corresponds to the 0/1 identification corresponding to the risk label, wherein 1 denotes that there is the ith label in the public opinions, and 0 denotes that there is not the ith label in the public opinions.
53. The storage medium of any one of claims 41 to 52, wherein the sentiment-polarity and entity-name identifying module extracts public-opinion data sets of three polarities, including positive, neutral, and negative sentiment, from the acquired public opinion corpora according to a pre-defined positive and negative sentiment dictionary for a certain enterprise to form the tag-corpus set.
54. The storage medium of any one of claims 41 to 53, wherein for every sentiment polarity, a corresponding polarity weight Qi is set, wherein Qi = {Q1, Q2, Q3}, wherein the sentiment-polarity sequence Ti = {T1, T2, T3}, T1 denotes positive sentiment, T2 denotes neutral sentiment, and T3 denotes negative sentiment, wherein Q1 denotes the polarity weight corresponding to the positive sentiment, Q2 denotes the polarity weight corresponding to the neutral sentiment, and Q3 denotes the polarity weight corresponding to the negative sentiment.
55. The storage medium of any one of claims 41 to 54, wherein a public-opinion processing platform is used to identify enterprise entities from the collected public-opinion text data, to extract the risk labels from the data and to analyze the sentiment polarities of the data, wherein a personalized configuring service provides a standardized application configuration interface.
56. The storage medium of any one of claims 41 to 55, wherein the enterprise entities associated with the public-opinion text data are identified based on a dictionary of full names, short names, and aliases of monitored enterprises through a list of enterprises monitored, wherein the public-opinion text data are associated with the enterprise entities to form an enterprise-association sequence E = {E1, E2, ..., Em}, where m is the number of all the monitored enterprises, Ei is the 0/1 label, wherein 1 denotes the public opinion is associated with the ith enterprise, and 0 denotes not associated, wherein synchronization of the monitored-enterprise list, updating of a sentiment polarity dictionary, and setting of the public opinion sources and risk label weights is supported.
57. The storage medium of any one of claims 41 to 56, wherein a risk early warning score is computed according to early warning labels and the list of enterprises monitored J = {J1, J2, ..., Jm}, wherein Ji is a 0/1 label, subscribed by the user, and according to data-source sequences, credit weights, risk-label sequences, risk weights, sentiment-polarity sequences, polarity weights, and the enterprise-association sequence of public-opinion text data, early warning level is determined according to a risk threshold value, wherein enterprise public opinion information that satisfies requirements is pushed to the user as early warning.
58. The storage medium of any one of claims 41 to 57, wherein the risk-early-warning level A = {no early warning, normal, important, serious}, boundary intervals corresponding to every risk-early-warning level are: H = {H1, H2, H3}, wherein the score is smaller than H1, the corresponding risk-early-warning level is not to give early warning, wherein the score is greater than H1 and smaller than H2, the corresponding risk-early-warning level is normal, wherein the score is greater than H2 and smaller than H3, the corresponding risk-early-warning level is important, wherein the score is greater than H3, the corresponding risk-early-warning level is serious, wherein the score corresponding to sentiment polarity is Q = {Q1, Q2, Q3}, and the sentiment-polarity sequence corresponding to the public-opinion text data is T = {T1, T2, T3}.
59. The storage medium of any one of claims 41 to 58, wherein risk early warning score of entry of public-opinion text data is computed by:
wherein a vector inner product represented by (x, y), the equation is:
z = (R, L) + (W, S) + (Q, T);
wherein the sequence information of the associated enterprise combined is:
z' = z · E((E, J));
wherein E(x) is a unit step function,

wherein the enterprise entity name shown in this entry of public-opinion text data exists in monitored-enterprise list, the value of E(x) is 1, wherein the risk early warning score is computed, wherein the enterprise entity name mentioned in the entry of public-opinion text data does not exist in the list of enterprises monitored, the value of E(x) is 0, wherein no more computation for the risk early warning score is conducted.
60. The storage medium of any one of claims 41 to 59, wherein early warning mark is:
Output (z') = (Y(z'), A), wherein Y(x) = {y1(x), y2(x), y3(x), y4(x)}, and the values of the two-value functions y1(x), y2(x), y3(x), y4(x) are True or False, 1 or 0, wherein:
y1(x) = 0 ≤ x < H1;
y2(x) = H1 ≤ x < H2;
y3(x) = H2 ≤ x < H3;
y4(x) = x ≥ H3; and wherein Output (z') is output as the early warning mark: no early warning, normal, important, or serious.
CA3138730A 2020-11-12 2021-11-12 Public-opinion analysis method and system for providing early warning of enterprise risks Active CA3138730C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011264306.XA CN113297283A (en) 2020-11-12 2020-11-12 Public opinion analysis method and system for enterprise risk early warning
CN202011264306.X 2020-11-12

Publications (2)

Publication Number Publication Date
CA3138730A1 CA3138730A1 (en) 2022-05-12
CA3138730C CA3138730C (en) 2023-08-01

Family

ID=77318454

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3138730A Active CA3138730C (en) 2020-11-12 2021-11-12 Public-opinion analysis method and system for providing early warning of enterprise risks

Country Status (2)

Country Link
CN (1) CN113297283A (en)
CA (1) CA3138730C (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918794B (en) * 2021-12-13 2022-03-29 宝略科技(浙江)有限公司 Enterprise network public opinion benefit analysis method, system, electronic equipment and storage medium
CN115456793A (en) * 2022-09-06 2022-12-09 山东大学 Intelligent risk control system for user investment decision
CN116069832B (en) * 2023-04-07 2023-06-06 微网优联科技(成都)有限公司 Data mining method and device and electronic equipment
CN116738070A (en) * 2023-08-15 2023-09-12 浙江同信企业征信服务有限公司 Public opinion monitoring method, device, equipment and storage medium
CN116777607B (en) * 2023-08-24 2023-11-07 上海银行股份有限公司 Intelligent auditing method based on NLP technology
CN117131281B (en) * 2023-10-26 2024-02-09 中关村科学城城市大脑股份有限公司 Public opinion event processing method, apparatus, electronic device and computer readable medium
CN117291428B (en) * 2023-11-17 2024-03-08 南京雅利恒互联科技有限公司 Enterprise management APP-based data background management system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704572B (en) * 2019-09-04 2021-03-16 北京航空航天大学 Suspected illegal fundraising risk early warning method, device, equipment and storage medium
CN111695033B (en) * 2020-04-29 2023-06-27 平安科技(深圳)有限公司 Enterprise public opinion analysis method, enterprise public opinion analysis device, electronic equipment and medium

Also Published As

Publication number Publication date
CN113297283A (en) 2021-08-24
CA3138730A1 (en) 2022-05-12

Similar Documents

Publication Publication Date Title
CA3138730C (en) Public-opinion analysis method and system for providing early warning of enterprise risks
Alam et al. Processing social media images by combining human and machine computing during crises
Pacheco et al. Uncovering coordinated networks on social media: methods and case studies
CN107463605B (en) Method and device for identifying low-quality news resource, computer equipment and readable medium
Kabir et al. The Power of Social Media Analytics: Text Analytics Based on Sentiment Analysis and Word Clouds on R.
US10956522B1 (en) Regular expression generation and screening of textual items
CN103064987A (en) Bogus transaction information identification method
CN104536956A (en) A Microblog platform based event visualization method and system
Lago et al. Visual and textual analysis for image trustworthiness assessment within online news
CN107944032B (en) Method and apparatus for generating information
Chatterjee et al. Classifying facts and opinions in Twitter messages: a deep learning-based approach
CN113722433A (en) Information pushing method and device, electronic equipment and computer readable medium
CN107679977A (en) A kind of tax administration platform and implementation method based on semantic analysis
Puri et al. Survey big data analytics, applications and privacy concerns
Bani-Hani et al. A semantic model for context-based fake news detection on social media
Monterrubio et al. Coronavirus fake news detection via MedOSINT check in health care official bulletins with CBR explanation: The way to find the real information source through OSINT, the verifier tool for official journals
CN111027832A (en) Tax risk determination method, apparatus and storage medium
Owda et al. Financial discussion boards irregularities detection system (fdbs-ids) using information extraction
Zhang et al. Investigating the uses of mobile phone evidence in China criminal proceedings
CN109636627B (en) Insurance product management method, device, medium and electronic equipment based on block chain
Qi et al. Social media in state politics: Mining policy agendas topics
CN112989167B (en) Method, device and equipment for identifying transport account and computer readable storage medium
AT&T
Othman et al. Customer opinion summarization based on twitter conversations
KR101614311B1 (en) Apparatus for collecting contents using social relation character and method thereof