CN112860906B

CN112860906B - Mayor's hotline public opinion decision support method and system based on natural language processing

Info

Publication number: CN112860906B
Application number: CN202110440120.3A
Authority: CN
Inventors: 张子成; 曹伟
Original assignee: Nanjing Huiningjie Information Technology Co ltd
Current assignee: Nanjing Huiningjie Information Technology Co ltd
Priority date: 2021-04-23
Filing date: 2021-04-23
Publication date: 2021-07-16
Anticipated expiration: 2041-04-23
Also published as: CN112860906A

Abstract

The invention discloses a mayor's hotline public opinion decision support method and system based on natural language processing, which help the mayor's hotline grasp public opinion trends and make scientific decisions. Daily hotspot events are mined using natural language processing and then simplified and classified; work orders containing the hotspot keywords are retrieved with a hash index method that can match multiple keywords; and an improved TextRank algorithm extracts a summary of the retrieved work orders, feeding back the most important information they contain, so that hotline staff can learn of each day's complaint hotspots and public opinion trends in time and report them to the relevant departments. With the system, mayor's hotline staff move from manually combing through public opinion hotspot information to having it mined and displayed automatically, which greatly improves their working efficiency and allows livelihood problems to be found in time and actively addressed.

Description

Mayor's hotline public opinion decision support method and system based on natural language processing

Technical Field

The invention belongs to the technical field of artificial intelligence and machine learning, and particularly relates to a mayor's hotline public opinion decision support method and system based on natural language processing.

Background

As citizens' awareness of their rights grows, the civic hotline has gradually become an important channel through which the public expresses its interests, vents its emotions, and exchanges ideas. In a complex social environment with conflicting interests, online public opinion incidents around emergencies occur from time to time, and the negative effect of online public opinion after a public emergency is easily amplified, so that individual problems become generalized and complicated and social contradictions are aggravated, triggering a chain reaction of public crises. Traditional public opinion management approaches such as suppressing, blocking, and ignoring not only fail to eradicate a public opinion crisis but may further damage public image. Early prediction and handling of public opinion outbreaks are therefore very important.

With the continuous development of information technology, artificial intelligence and machine learning are increasingly used to help departments make scientific decisions. Natural language processing is an important branch of artificial intelligence that uses computer technology to process and analyze natural language.

In December 2017, the Ministry of Industry and Information Technology published the "Three-Year Action Plan for Promoting the Development of the New Generation Artificial Intelligence Industry (2018-2020)", which specifically mentions "encouraging departments to take the lead in using artificial intelligence to improve business efficiency and the level of management and service". Today the mayor's hotline needs to act as the "general customer service" for the demands of all parties, taking the convenience and satisfaction of the public and of market entities as its measure, improving the quality and efficiency of hotline handling and the intelligence of the system, and forming a closed loop of quick response, efficient handling, tracking and supervision, timely feedback, and analysis-driven improvement. Hotline information resources are a window for learning of public concerns in time and an important input to government work and decision-making.

At present, the mayor's hotline mines, develops, and exploits the value of its data only to a limited degree, and the use and exploration of hotline information resources is at a preliminary stage. The invention uses advanced technologies such as big data mining and machine learning to design a public opinion decision support system for the mayor's hotline, providing a strong guarantee for the early discovery and timely handling of public opinion crises.

Disclosure of Invention

Purpose of the invention: in view of the above problems, the invention provides a mayor's hotline public opinion decision support method and system based on natural language processing. Daily hotspot events are mined using natural language processing and then simplified and classified; work orders containing the hotspot keywords are retrieved with a hash index method that matches multiple keywords; and an improved TextRank algorithm extracts a summary of the retrieved work orders and feeds back the most important information they contain, so that hotline staff can learn of each day's complaint hotspots and public opinion trends in time and report them to the relevant departments.

The technical scheme is as follows: in order to realize the purpose of the invention, the technical scheme adopted by the invention is as follows:

A mayor's hotline public opinion decision support method based on natural language processing comprises the following steps:

(1) mining the daily hotspot events of the mayor's hotline based on natural language processing;

(2) simplifying and classifying the hotspot events based on cosine similarity;

(3) retrieving the complaint work orders that contain the hotspot event keywords with a hash index method that matches multiple keywords;

(4) extracting summaries of the retrieved complaint work orders with an improved TextRank algorithm;

(5) hotline staff learn of the daily public opinion hotspot events from the summary report and report them to the relevant departments.

Further, the step (1) specifically includes the steps of:

(1.1) performing word segmentation and keyword extraction on the complaint work orders, wherein the keyword extraction method is the TF-IDF algorithm;

(1.2) constructing a keyword FpTree;

(1.3) mining the keyword frequent item sets based on the keyword FpTree.

Further, constructing the keyword FpTree in step (1.2) includes the steps of:

1) setting a minimum absolute support, scanning the data records, generating the first-level frequent item set of keywords, and sorting it by number of occurrences from most to least;

2) scanning the data records again and, within each record, sorting the keywords that belong to the first-level frequent item set generated in step 1) in the order established in step 1).

Further, mining the keyword frequent item sets based on the keyword FpTree in step (1.3) includes the steps of:

1) constructing a conditional pattern base, which consists of the prefix paths of the item set to be mined;

2) constructing a conditional FpTree;

3) mining recursively on the conditional FpTree.

Further, the step (2) specifically includes the steps of:

(2.1) setting a plurality of hotspot classification marks, wherein each hotspot classification mark comprises a plurality of keywords;

(2.2) calculating the cosine similarity between all keywords of each hotspot classification mark and the keyword frequent item set;

(2.3) finding the hotspot classification mark with the largest cosine similarity to the keyword frequent item set, and labeling the keyword frequent item set with that hotspot classification mark.

Further, calculating the cosine similarity in step (2.2) comprises the steps of:

1) converting the keyword frequent item set into One-Hot codes;

One-Hot encoding represents categorical variables as binary vectors: the categorical values are mapped to integer values, and each integer value is then represented as a binary vector;

2) performing the cosine similarity calculation on the One-Hot-coded keyword frequent item sets.

Further, in step (3),

the text database of mayor's hotline complaint work orders is preprocessed into a hash table that maps keywords to work order numbers, and under multi-keyword search the work orders are then retrieved by their unique primary key.

Further, the step (4) specifically includes the steps of:

(4.1) processing the work order content into a text containing a plurality of sentences, and converting the sentences into sentence vectors which can be understood by a machine;

(4.2) calculating cosine similarity between sentence vectors to obtain a similarity matrix as edge weight; adopting TF-IDF score as an initial weight value;

(4.3) carrying out TextRank iteration, and calculating the TextRank value of each sentence to obtain a sentence rank; and extracting the automatic abstracts according to the sentence ranking.

A mayor's hotline public opinion decision support system based on natural language processing comprises a base layer, a data layer, a support layer, an application layer, a service layer and a user layer;

the base layer is the hardware setup for project implementation and comprises a computer room and a network environment;

the data layer comprises a basic library and an intelligent library; the basic library holds primary data, namely the original work order text information; the intelligent library holds secondary data and is a processed database;

the support layer provides algorithms and application services, comprising FP-Growth, hotspot problem classification, information retrieval and automatic summary extraction;

the application layer provides specific application services, comprising intelligent public opinion supervision, intelligent perception of public concerns and intelligent decision support;

the service layer comprises a web end and a mobile end;

the user layer comprises leaders, service personnel and operation and maintenance personnel.

Beneficial effects: based on natural language processing, the invention mines the daily hotspot data of the hotline and simplifies and classifies the hotspots; it then retrieves the work orders containing the hotspot keywords with a hash index method that can match multiple keywords, and extracts a summary of the retrieved work orders with an improved TextRank algorithm, feeding back the most important information they contain, so that hotline staff can learn of each day's complaint hotspots and the direction of public opinion in time and report them to the relevant departments.

The invention uses natural language processing technology to develop a public opinion decision support system for the mayor's hotline. The system automatically mines public opinion hotspots every day with machine learning algorithms and raises public opinion alarms according to a set threshold, guiding staff in decision support. Automatic summary extraction helps staff grasp the core and most important problems from a large number of related work orders, making it convenient for them to write daily, weekly, and monthly public opinion reports and push them to the relevant departments.

The decision support system shortens the time staff need to find mass incidents among massive numbers of complaint work orders from the original 2 days to real time, which greatly improves their working efficiency and speeds up the handling of public problems. After the system is deployed, mayor's hotline staff can monitor public opinion information in real time, and a series of livelihood problems, such as chaotic management of residential communities, fraud on the WeChat platform, and market activities disturbing residents, are found in time through the system and actively addressed.

Drawings

Fig. 1 is a flowchart of the mayor's hotline public opinion decision support method based on natural language processing according to the present invention;

Fig. 2 is a diagram of the keyword FpTree construction process;

Fig. 3 is a diagram of the hash index storing work order keyword information;

Fig. 4 is a schematic diagram of the embedding layers of the BERT model;

Fig. 5 is a flowchart of the improved TextRank algorithm;

Fig. 6 is a block diagram of the mayor's hotline public opinion decision support system based on natural language processing according to the present invention.

Detailed Description

The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.

As shown in Fig. 1, the mayor's hotline public opinion decision support method based on natural language processing according to the present invention includes the steps of:

(1) mining the daily hotspot events of the mayor's hotline based on natural language processing;

The object of study of the invention is the complaint work orders of the mayor's hotline. The keywords of each complaint work order are treated as an item set, so mining the daily hotspot events of the hotline amounts to frequent item set mining.

The frequent item set mining comprises the following steps:

(1.1) complaint work orders are stored as Chinese sentences, so their content must be preprocessed: word segmentation and keyword extraction are performed on each complaint work order, with the TF-IDF algorithm as the keyword extraction method;

TF-IDF is a statistical method used to evaluate how important a word is to a document within a corpus.

The TF-IDF calculation formula is as follows:

$$TF_{i,j} = \frac{N_{i,j}}{\sum_{k} N_{k,j}}$$

where $N_{i,j}$ is the number of occurrences of keyword $i$ in work order $j$, and $\sum_{k} N_{k,j}$ is the total number of occurrences of all keywords in work order $j$.

$$IDF_{i} = \log \frac{D}{\mathrm{card}(\{j \mid i \in d_j\})}$$

where $D$ is the total number of work orders, and $\mathrm{card}(\{j \mid i \in d_j\})$ is the number of work orders $d_j$ that contain keyword $i$.

The TF-IDF value of each keyword is:

$$TF\text{-}IDF = TF \times IDF$$
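The TF and IDF definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the work orders are assumed to be already segmented into tokens, the natural logarithm is used, and the names `tf_idf`, `top_keywords` and the sample tokens are hypothetical, not taken from the patent.

```python
import math
from collections import Counter

def tf_idf(work_orders):
    """Return one {term: TF-IDF score} dict per work order.

    work_orders: list of token lists (assumed already word-segmented).
    """
    n_docs = len(work_orders)
    # Document frequency: number of work orders that contain each term.
    doc_freq = Counter()
    for tokens in work_orders:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in work_orders:
        counts = Counter(tokens)
        total = sum(counts.values())
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

def top_keywords(score, k):
    """Pick the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical, already-segmented work orders.
orders = [
    ["noise", "night", "construction", "noise"],
    ["parking", "fee", "community"],
    ["noise", "market", "community"],
]
scores = tf_idf(orders)
print(top_keywords(scores[0], 2))
```

A term that appears in every work order gets an IDF of zero under this unsmoothed formula, which is why smoothed variants are often used in practice.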

Taking a data set as an example, the data set is shown in Table 1.

TABLE 1

OID	Keyword Set
O₁	{k₁, k₂}
O₂	{k₂, k₃, k₄, k₅}
O₃	{k₁, k₃, k₄, k₆}
O₄	{k₂, k₁, k₃, k₄}
O₅	{k₂, k₁, k₃, k₆}

In table 1, OID represents the complaint work order ID, and Keyword Set is a Set of keywords extracted from the work order by the TF-IDF algorithm.

(1.2) constructing a keyword FpTree;

The invention applies the FP-Growth algorithm to the field of natural language processing to mine the hotspot events of the day. The FP-Growth algorithm stores the data in dedicated data structures, mainly an item header table, the FP-tree, and node linked lists, which reduces I/O operations and improves efficiency.

FpTree is a tree structure defined as follows:

FpTree node data structure

class FpNode {
    String idName;            // id number
    List<FpNode> children;    // child nodes
    FpNode parent;            // parent node
    FpNode next;              // the next node with the same id number
    int count;                // number of occurrences
}

As shown in Fig. 2, constructing the keyword FpTree includes the following steps:

Step 1: assuming a minimum absolute support of 3, scan the data records to generate the first-level frequent item set of keywords, and sort the items by number of occurrences from high to low, as shown in Table 2:

TABLE 2

Keyword	Count
k₁	4
k₂	4
k₃	4
k₄	3

As can be seen, k₅ and k₆ do not appear in Table 2: in Table 1, k₅ occurs only once and k₆ only twice, both below the minimum support, so neither is a frequent item. By the Apriori property, no superset of an infrequent item set can be frequent, so they need not be considered further.

Step 2: scan the data records again; for each record, keep only the items that appear in the table generated in Step 1 and sort them in the order of that table. Initially, a new root node is created and marked null;

1) The first record {k₁, k₂}, after being filtered and sorted according to the table from Step 1, is still {k₁, k₂}. A new node with idName k₁ is created and inserted under the root node with its count set to 1; a new k₂ node is then created and inserted under the k₁ node. The result of the insertion is shown in Fig. 2(a).

2) The second record {k₂, k₃, k₄, k₅} becomes {k₂, k₃, k₄} after filtering and sorting. The root node has no child k₂ (k₂ is a grandchild of the root, not a child), so a new k₂ node is created and inserted under the root node, which now has two children. A new k₃ node is then created and inserted under this k₂ node, and a new k₄ node is created and inserted under the k₃ node. The result of the insertion is shown in Fig. 2(b).

3) The third record {k₁, k₃, k₄, k₆} becomes {k₁, k₃, k₄} after filtering and sorting. The root node already has a child k₁, so no new node is needed and the count of the existing k₁ node is simply increased by 1. Looking further down, the k₁ node has no child k₃, so a new k₃ node is created and inserted under the k₁ node, and a new k₄ node is created and inserted under that k₃ node. The result of the insertion is shown in Fig. 2(c).

4) The fourth record {k₂, k₁, k₃, k₄} becomes {k₁, k₂, k₃, k₄} after filtering and sorting. The root node has a child k₁, so no new node is needed and the count of the existing k₁ node is increased by 1. Looking further down, the k₁ node has a child k₂, so no new k₂ node is needed and the count of that k₂ node is increased by 1. This k₂ node has no children, so a new k₃ node is created and inserted under it, and a new k₄ node is created and inserted under the k₃ node. The result of the insertion is shown in Fig. 2(d).

5) The fifth record {k₂, k₁, k₃, k₆} becomes {k₁, k₂, k₃} after filtering and sorting. The root node has a child k₁, the k₁ node has a child k₂, and that k₂ node has a child k₃, so only the counts need to be updated and no new node is created. The result of the insertion is shown in Fig. 2(e).

6) With the above steps the FP-tree (Frequent Pattern tree) is essentially constructed. Each path in the tree represents an item set. Because many item sets share common items, and items that occur more often are more likely to be shared, ordering the items from most to least frequent saves space and achieves compressed storage. In addition, a header table is needed, together with a link threading through all nodes that have the same idName, as shown in Fig. 2(f).
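The two scans of Step 1 and Step 2 can be sketched as follows, using the five records of Table 1 and a minimum absolute support of 3. This is a minimal sketch, not the patented implementation: children are kept in a dict rather than a list for direct lookup, ties between equally frequent keywords are broken by name, and the helper names (`frequent_order`, `build_fp_tree`, `chain_counts`) are hypothetical.

```python
from collections import Counter

class FpNode:
    """One FP-tree node with the fields of the FpNode structure."""
    def __init__(self, id_name, parent=None):
        self.id_name = id_name
        self.parent = parent
        self.children = {}   # id_name -> FpNode (dict instead of list: an assumption)
        self.next = None     # next node with the same id_name
        self.count = 0

def frequent_order(records, min_support):
    """Step 1: rank of each frequent keyword, most frequent first."""
    counts = Counter(k for record in records for k in set(record))
    ranked = sorted((k for k in counts if counts[k] >= min_support),
                    key=lambda k: (-counts[k], k))  # ties broken by name (assumption)
    return {k: rank for rank, k in enumerate(ranked)}

def build_fp_tree(records, min_support):
    """Step 2: filter and sort each record, then insert it as a path from the root."""
    order = frequent_order(records, min_support)
    root, header, tails = FpNode(None), {}, {}
    for record in records:
        node = root
        for item in sorted((k for k in set(record) if k in order), key=order.get):
            child = node.children.get(item)
            if child is None:
                child = FpNode(item, parent=node)
                node.children[item] = child
                # Thread the new node onto the header-table chain for this keyword.
                if item in tails:
                    tails[item].next = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header

def chain_counts(header, item):
    """Counts of all nodes for `item`, following the header-table links."""
    counts, node = [], header[item]
    while node is not None:
        counts.append(node.count)
        node = node.next
    return counts

records = [
    ["k1", "k2"],
    ["k2", "k3", "k4", "k5"],
    ["k1", "k3", "k4", "k6"],
    ["k2", "k1", "k3", "k4"],
    ["k2", "k1", "k3", "k6"],
]
root, header = build_fp_tree(records, min_support=3)
print(root.children["k1"].count, chain_counts(header, "k3"))  # prints: 4 [1, 1, 2]
```

The k₃ chain holds three nodes whose counts add up to 4, the support of k₃, which is the property the mining step relies on when it collects prefix paths.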

(1.3) mining the keyword frequent item sets based on the keyword FPTree;

The FPTree mining process starts from frequent patterns of length 1 and can be divided into 3 steps:

1) construct the Conditional Pattern Base (CPB), which consists of the prefix paths of the item set to be mined;

2) construct the conditional FPTree (Conditional FP-tree) from the CPB;

3) mine recursively on the conditional FPTree.

Keyword frequent item set mining algorithm:

procedure FP_growth(Tree, α) {
    if Tree contains a single path P then
        for each combination β of the nodes in path P
            generate pattern β ∪ α with support = minimum support of the nodes in β
    else {
        for each item ai in the header table of Tree {
            generate pattern β = ai ∪ α with support = ai.support
            construct the conditional pattern base of β and then the conditional FP-tree Treeβ of β
            if Treeβ ≠ ∅ then
                call FP_growth(Treeβ, β)
        }
    }
}
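The recursive mining procedure can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the patent's code: it omits the single-path shortcut and always recurses through conditional pattern bases, which yields the same frequent item sets; each conditional FP-tree is rebuilt from a list of weighted prefix paths; and the names `Node`, `build_tree` and `fp_growth` are hypothetical.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.children, self.count = {}, 0

def build_tree(weighted_records, min_support):
    """Build an FP-tree from (items, count) pairs; return its header table."""
    support = defaultdict(int)
    for items, count in weighted_records:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_records:
        kept = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in kept:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, frequent

def fp_growth(weighted_records, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset that extends `suffix`."""
    header, frequent = build_tree(weighted_records, min_support)
    for item, nodes in header.items():
        pattern = suffix | {item}
        yield pattern, frequent[item]
        # Conditional pattern base: the prefix path of every node holding `item`.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional FP-tree built from that base.
        yield from fp_growth(base, min_support, pattern)

records = [
    ["k1", "k2"],
    ["k2", "k3", "k4", "k5"],
    ["k1", "k3", "k4", "k6"],
    ["k2", "k1", "k3", "k4"],
    ["k2", "k1", "k3", "k6"],
]
result = dict(fp_growth([(r, 1) for r in records], min_support=3))
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

On the Table 1 data with a minimum support of 3, this finds the four single keywords k₁ to k₄ and the pairs {k₁, k₂}, {k₁, k₃}, {k₂, k₃} and {k₃, k₄}; no triple reaches the threshold.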

(2) Simplifying and classifying the hotspot events;

Some of the mined keyword frequent item sets represent the same kind of hotspot problem. For example:

{keywordA,keywordB,keywordC}

{keywordA,keywordB,keywordC,keywordD}

are both keyword frequent item sets, but they are highly similar and represent one class of hotspot problem, so the hotspot events need to be classified.

Cosine similarity is the most common way to evaluate the similarity of word vectors, and it is also suitable for computing the similarity between keyword frequent item sets. The cosine similarity of two vectors $A$ and $B$ is calculated as follows:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \, \sqrt{\sum_{i} B_i^2}}$$

The similarity between a pair of keyword frequent item sets is computed as follows. First, the pair of item sets is converted to One-Hot codes. One-Hot encoding, also known as one-bit-effective encoding, uses an N-bit state register to encode N states; each state has its own register bit, and only one bit is active at any time. One-Hot encoding represents categorical variables as binary vectors: the categorical values are first mapped to integer values, and each integer value is then represented as a binary vector that is all zeros except at the index of that integer, which is set to 1. Cosine similarity is then computed on the One-Hot-coded keyword frequent item sets.

Take the keyword frequent item sets A = {keywordA, keywordB, keywordC} and B = {keywordA, keywordB, keywordC, keywordD} as an example. A is encoded as [1,1,1,0] and B is encoded as [1,1,1,1], so the cosine similarity of keywordsetA and keywordsetB is:

$$\cos(A, B) = \frac{1 \cdot 1 + 1 \cdot 1 + 1 \cdot 1 + 0 \cdot 1}{\sqrt{3} \times \sqrt{4}} = \frac{3}{2\sqrt{3}} \approx 0.866$$

The closer the cosine value is to 1, the more similar the two vectors are; the closer it is to 0, the more dissimilar they are. The keyword frequent item sets A and B are therefore quite similar.
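The encoding and similarity calculation for this example can be checked with a short Python sketch. The function names `encode` and `cosine` and the fixed vocabulary order are assumptions made for illustration; the vector for a whole item set is the binary vector with a 1 for every keyword it contains.

```python
import math

def encode(itemset, vocabulary):
    """Binary vector with a 1 at the position of every keyword in the item set."""
    return [1 if word in itemset else 0 for word in vocabulary]

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocabulary = ["keywordA", "keywordB", "keywordC", "keywordD"]
A = {"keywordA", "keywordB", "keywordC"}
B = {"keywordA", "keywordB", "keywordC", "keywordD"}

similarity = cosine(encode(A, vocabulary), encode(B, vocabulary))
print(round(similarity, 3))  # prints: 0.866
```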

For hotspot event classification based on cosine similarity, it is not known in advance how many hotspot events will occur on a given day when each day's complaint hotspots are mined, so the invention designs a simple and effective method for processing the keyword frequent item sets.

The pseudo-code of the algorithm is as follows:

Keyword frequent item set classification algorithm:

Input: keyword frequent item sets keywordset

Output: keyword frequent item sets with classification labels

hotlist = list of hotspot classification marks
hotlist.add(keywordset(1))
keywordset.remove(keywordset(1))
foreach item in keywordset
    if the cosine similarity between item and every keyword frequent item set in hotlist is less than the threshold ∂
        hotlist.add(item)
    else
        find the index of the hotspot in hotlist with the greatest cosine similarity to item, and label item with that index
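The classification pseudo-code can be turned into a short runnable sketch. The assumptions are labeled here and in the comments: the threshold value 0.5 is chosen only for the demonstration, labels are 0-based indices into the hot list, and for binary-encoded sets the cosine similarity reduces to |A ∩ B| / sqrt(|A| · |B|), which is what the hypothetical `cosine` helper computes.

```python
import math

def cosine(a, b):
    """Cosine similarity of two keyword sets under binary encoding."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def classify(keyword_sets, threshold):
    """Greedily group frequent item sets into hotspot classes.

    Returns (hotlist, labels): hotlist holds one representative item set per
    class, labels[i] is the 0-based class index of keyword_sets[i].
    """
    hotlist, labels = [], []
    for item in keyword_sets:
        sims = [cosine(item, hot) for hot in hotlist]
        if not sims or max(sims) < threshold:
            # Not similar enough to any known hotspot: start a new class.
            hotlist.append(item)
            labels.append(len(hotlist) - 1)
        else:
            # Label with the most similar existing hotspot.
            labels.append(sims.index(max(sims)))
    return hotlist, labels

sets = [
    {"keywordA", "keywordB", "keywordC"},
    {"keywordA", "keywordB", "keywordC", "keywordD"},
    {"keywordX", "keywordY"},
]
hotlist, labels = classify(sets, threshold=0.5)  # threshold is an assumed value
print(labels)  # prints: [0, 0, 1]
```

Because the grouping is greedy, the result can depend on the order in which the item sets are visited; the first item set of each class serves as its representative.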

(3) A hash index method that can match multiple keywords is adopted to retrieve the work orders containing the hotspot event keywords;

Text-based mining of public opinion hotspot events is built on keyword frequent item sets, so the specific work order content must be matched and retrieved by multiple keywords.

In a conventional relational database, searching for keywords in massive data is very time-consuming; with indexing technology, adding a full-text index to the text data can greatly improve search efficiency.

The invention adopts multi-keyword retrieval and borrows the design idea of hashing: the text database of mayor's hotline complaint work orders is preprocessed into a hash table that maps keywords to work order numbers, and work orders are then retrieved by their unique primary key, which greatly improves retrieval efficiency.

A hash table (also called a hash map) is an improvement on direct addressing. With hashing, an element with key k is stored in slot h(k); that is, a hash function h is used to compute the slot position from the key k. The function h maps the key domain to the slots of the hash table T[0...m-1]. The hash function h may map two different keys to the same slot, which is called a collision; in databases, collisions are usually resolved by chaining. In chaining, the elements that hash to the same slot are placed in a linked list, as shown in Fig. 3.

The storage structure takes the form of triples, which are the hash function value, the set of id numbers of the work order, and the pointer used to resolve the conflict, respectively.

For example, for the frequent keyword set {k₁, k₂}, the work order id set found via the hash function value h(k₁) is {id₁, id₂}, and the work order id set corresponding to h(k₂) is {id₂, id₃}, so the work orders that contain both k₁ and k₂ are

{id₁, id₂} ∩ {id₂, id₃} = {id₂}.
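The keyword-to-work-order lookup and the intersection step can be illustrated with Python's built-in dict, which is itself a hash table. This is a sketch under assumptions: the dict handles hashing and collisions internally rather than through the explicit triple structure described above, and the names `build_index` and `search` as well as the sample ids are hypothetical.

```python
from collections import defaultdict

def build_index(work_orders):
    """Preprocess work orders into a keyword -> set of work order ids table."""
    index = defaultdict(set)
    for order_id, keywords in work_orders.items():
        for keyword in keywords:
            index[keyword].add(order_id)
    return index

def search(index, keywords):
    """Return the ids of work orders that contain every given keyword."""
    id_sets = [index.get(keyword, set()) for keyword in keywords]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)

# Sample data matching the example: k1 -> {id1, id2}, k2 -> {id2, id3}.
work_orders = {
    "id1": ["k1"],
    "id2": ["k1", "k2"],
    "id3": ["k2"],
}
index = build_index(work_orders)
print(search(index, ["k1", "k2"]))  # prints: {'id2'}
```

The returned ids would then be used as primary keys to fetch the full work order text from the database.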

(4) Extracting summaries of the retrieved work orders with an improved TextRank algorithm, and feeding back the most important information in the work orders;

After the hotspot event work orders are retrieved, automatic summary extraction is applied to their content, so that the key direction of the hotspot event can be grasped effectively and intelligent decisions can be made.

Automatic summarization is an important branch of natural language processing. The current mainstream is graph-model-based automatic text summarization, of which the TextRank algorithm is the most representative.

The idea of the TextRank algorithm is to give each sentence a positive real number representing its importance: the higher the TextRank value, the more important the sentence and the higher it is ranked when the automatic summary is extracted.

A text containing several sentences is treated as a directed graph in which the nodes are sentences and each edge carries a transition probability, namely the similarity between the 2 sentences. A sentence jumps to the next sentence with the transition probability, and such random jumps between sentences form a first-order Markov chain. After continued jumping, the Markov chain reaches a stationary distribution; the TextRank is this stationary distribution, and the TextRank value of each sentence is its stationary probability.

The formula for TextRank is as follows:

$$TR(V_i) = \frac{1-d}{n} + d \sum_{V_j \in In(V_i)} \frac{TR(V_j)}{|Out(V_j)|}$$

where $TR(V_i)$ is the rank value of node $V_i$, $In(V_i)$ is the set of predecessor nodes of $V_i$, $Out(V_j)$ is the set of successor nodes of $V_j$, $n$ is the number of sentences, and $d$ is the damping coefficient.

The core of the TextRank algorithm is the similarity calculation that produces the edge weights of the graph. Sentence similarity is handled by converting sentences into sentence vectors that a machine can process and then computing the similarity between them. Because the original TextRank algorithm handles neither the edge-weight similarity calculation nor the assignment of initial node weights well when constructing the graph model, the invention uses the TF-IDF score as the initial weight of each node, uses the embedding layer of a BERT model (shown in Fig. 4) to turn each sentence into a 768-dimensional numeric sentence vector, uses the cosine similarity between sentence vectors as the edge weights, and finally performs the TextRank iteration; the algorithm steps are shown in Fig. 5. The method specifically comprises the following steps:

(4.1) processing the work order content into a text containing a plurality of sentences, and converting the sentences into sentence vectors that a machine can process;

(4.2) calculating the cosine similarity between sentence vectors to obtain a similarity matrix used as the edge weights, and adopting the TF-IDF score as the initial weight value;

(4.3) carrying out the TextRank iteration, calculating the TextRank value of each sentence to obtain a sentence ranking, and extracting the automatic summary according to the sentence ranking.
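The improved TextRank steps can be sketched in pure Python as follows. Several assumptions are made and should be read as such: the sentence vectors are tiny hand-written stand-ins for the 768-dimensional BERT vectors, the initial weights stand in for TF-IDF scores, the iteration uses a weighted update in which each edge contributes in proportion to its cosine-similarity weight, similarities are assumed non-negative, and the names `text_rank` and `summarize` are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def text_rank(vectors, initial, d=0.85, tol=1e-8, max_iter=200):
    """Weighted TextRank over sentence vectors, started from `initial` weights."""
    n = len(vectors)
    # Edge weights: pairwise cosine similarity, no self-loops.
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    total = sum(initial)
    tr = [x / total for x in initial] if total else [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1 - d) / n + d * sum(
                w[j][i] / out_sum[j] * tr[j] for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new, tr)) < tol
        tr = new
        if converged:
            break
    return tr

def summarize(sentences, vectors, initial, k=1):
    """Return the k top-ranked sentences in their original order."""
    scores = text_rank(vectors, initial)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]

# Toy data: three similar sentences and one outlier (vectors are assumptions).
sentences = ["s0", "s1", "s2", "s3"]
vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.0, 1.0]]
initial = [0.4, 0.3, 0.2, 0.1]   # stand-ins for per-sentence TF-IDF scores
scores = text_rank(vectors, initial)
summary = summarize(sentences, vectors, initial, k=2)
```

In this toy example the outlier sentence receives the lowest score because it is only weakly connected to the rest of the graph, so it is left out of the two-sentence summary.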

(5) Hotline staff learn of each day's public opinion hotspot events in time and report them to the relevant departments;

The work order data are stored in MySQL. The decision support system mainly uses the title and complaint content of each work order: the data are fed into a model center, hotspot events are obtained through analysis and stored in a database, and hotspots can be dragged and dropped in a self-service manner from a report center to form daily, weekly, and monthly reports. Staff can define their own report templates based on the summary information retrieved by hotspot keywords and push the reports to the relevant functional departments, which, after receiving a report, carry out rectification or follow-up investigation of the problems reflected in public opinion.

As shown in Fig. 6, the mayor's hotline public opinion decision support system based on natural language processing according to the present invention includes 6 layers: a base layer, a data layer, a support layer, an application layer, a service layer and a user layer.

The base layer provides the hardware setup for project implementation, such as the computer room and the network environment.

The data layer is divided into a basic library and an intelligent library. The basic library holds primary data, including the original work order text information, geographical position information and the like; the intelligent library holds secondary data and consists of processed databases such as a hotline dictionary library and a statistical word library.

The support layer provides algorithms and application services; the algorithms used by the invention comprise FP-Growth, hotspot problem classification, information retrieval and automatic summary extraction.

The application layer provides the specific application services, including intelligent public opinion supervision, intelligent perception of public concerns, and intelligent decision support.

The service layer is divided into 2 terminals: a web end and a mobile end.

Claims

1. A mayor's hotline public opinion decision support method based on natural language processing, characterized by comprising the following steps:

(1) performing word segmentation and keyword extraction on the complaint work orders to construct a keyword FpTree; constructing a conditional pattern base, which consists of the prefix paths of the item set to be mined; constructing a conditional FpTree, and recursively mining on the conditional FpTree the keyword frequent item sets, namely the hotspot events;

(2) simplifying and classifying the hotspot events based on cosine similarity;

(4) processing the work order content into a text containing a plurality of sentences, and converting the sentences into sentence vectors that a machine can process; calculating the cosine similarity between sentence vectors to obtain a similarity matrix used as the edge weights; adopting the TF-IDF score as the initial weight value; carrying out the TextRank iteration and calculating the TextRank value of each sentence to obtain a sentence ranking; automatically extracting the summary according to the sentence ranking;

(5) the staff learn of the daily hotspot events from the summary report and report them to the relevant departments.

2. The mayor's hotline public opinion decision support method based on natural language processing as claimed in claim 1, wherein in the step (1),

the keyword extraction method is the TF-IDF algorithm.

3. The mayor's hotline public opinion decision support method based on natural language processing as claimed in claim 1, wherein in the step (1), constructing the keyword FpTree comprises the steps of:

(1.1) setting a minimum absolute support, scanning the data records, generating the first-level frequent item set of keywords, and sorting it by number of occurrences from high to low;

(1.2) scanning the data records again and, within each record, sorting the keywords that belong to the first-level frequent item set generated in step (1.1) in the order established in step (1.1).

4. The mayor's hotline public opinion decision support method based on natural language processing as claimed in claim 1, wherein the step (2) specifically comprises the steps of:

(2.2) calculating the cosine similarity between all keywords of each hotspot classification mark and the hotspot event;

(2.3) finding the hotspot classification mark with the largest cosine similarity to the hotspot event, and labeling the hotspot event with that hotspot classification mark.

5. The mayor's hotline public opinion decision support method based on natural language processing according to claim 4, wherein the cosine similarity calculation comprises the steps of:

1) converting the keyword frequent item set into One-Hot codes;

6. The mayor's hotline public opinion decision support method based on natural language processing as claimed in claim 1, wherein in the step (3),

7. A mayor's hotline public opinion decision support system based on natural language processing, which adopts the mayor's hotline public opinion decision support method based on natural language processing according to any one of claims 1 to 6, characterized by comprising a base layer, a data layer, a support layer, an application layer, a service layer and a user layer;

the base layer comprises a computer room and a network environment;

the application layer comprises intelligent public opinion supervision, intelligent perception of public concerns and intelligent decision support;

the service layer comprises a web end and a mobile end;

the user layer comprises government leaders, service personnel and operation and maintenance personnel.