CN109933656B - Public opinion polarity prediction method, public opinion polarity prediction device, computer equipment and storage medium - Google Patents

Info

Publication number
CN109933656B
Authority
CN
China
Prior art keywords
emotion
public opinion
data
dictionary
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910199451.5A
Other languages
Chinese (zh)
Other versions
CN109933656A (en)
Inventor
耿伟
谷国栋
周起如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sunwin Intelligent Co Ltd
Original Assignee
Shenzhen Sunwin Intelligent Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sunwin Intelligent Co Ltd filed Critical Shenzhen Sunwin Intelligent Co Ltd
Priority to CN201910199451.5A priority Critical patent/CN109933656B/en
Priority to PCT/CN2019/089224 priority patent/WO2020186627A1/en
Publication of CN109933656A publication Critical patent/CN109933656A/en
Application granted granted Critical
Publication of CN109933656B publication Critical patent/CN109933656B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a public opinion polarity prediction method, a public opinion polarity prediction device, computer equipment and a storage medium. The method comprises: obtaining public opinion data; extracting emotion feature information from the data to be analyzed with an AC automaton based on a double-array dictionary tree to obtain feature data; performing polarity prediction on the feature data through a public opinion polarity prediction model to obtain a prediction result; and outputting the prediction result. The emotion dictionary is stored as a double-array dictionary tree, which reduces the number of disk I/O reads and writes and the physical storage space occupied. The AC automaton based on the double-array dictionary tree extracts emotion feature information from the public opinion data by matching it against the emotion dictionary; because character comparison is converted into state transitions, no backtracking is needed when the data to be analyzed is scanned, which avoids repeated rollback scans. Polarity prediction on the feature data through the public opinion polarity prediction model effectively improves the efficiency and accuracy of public opinion polarity prediction analysis.

Description

Public opinion polarity prediction method, public opinion polarity prediction device, computer equipment and storage medium
Technical Field
The present invention relates to an information processing method, and more particularly, to a public opinion polarity prediction method, apparatus, computer device, and storage medium.
Background
With the rapid development of applications such as WeChat and microblogs, more and more netizens express their views through the Internet. The integration of network information and social information has a growing impact on society and even bears on national information security and long-term stability. Because the volume of information on the Internet is enormous, public opinion data cannot be processed manually; to obtain a comprehensive and complete picture of public opinion, it must be monitored and analyzed automatically by means of emotion polarity analysis technology.
Existing public opinion analysis systems generally adopt keyword analysis, which is inefficient and inaccurate. Methods based on traditional Chinese word segmentation must scan the text backwards multiple times for pattern matching, so performance is low; existing systems calculate emotion polarity with coarse statistical methods, and accuracy is limited by the available feature information and the influence of context; and the public opinion emotion dictionary occupies a relatively large storage space, which causes performance loss.
Therefore, a new method is needed to solve the problems of slow Chinese word segmentation, low polarity prediction accuracy and high performance loss.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a public opinion polarity prediction method, a public opinion polarity prediction device, computer equipment and a storage medium.
In order to achieve the above purpose, the present invention adopts the following technical scheme: the public opinion polarity prediction method comprises the following steps:
obtaining public opinion data;
extracting emotion feature information from the data to be analyzed with an AC automaton based on a double-array dictionary tree to obtain feature data;
performing polarity prediction on the feature data through a public opinion polarity prediction model to obtain a prediction result;
and outputting the prediction result.
The further technical scheme is as follows: the AC automaton based on the double-array dictionary tree is a multi-pattern matching algorithm that extracts emotion feature information from the data to be analyzed based on an emotion dictionary, and the emotion dictionary is constructed based on the double-array dictionary tree.
The further technical scheme is as follows: extracting emotion feature information from the data to be analyzed with the AC automaton based on the double-array dictionary tree to obtain feature data comprises the following steps:
performing pattern matching on the data to be analyzed by using the AC automaton based on the double-array dictionary tree to obtain an output result;
and extracting emotion feature information from the output result to obtain feature data.
The further technical scheme is as follows: performing pattern matching on the data to be analyzed by using the AC automaton based on the double-array dictionary tree to obtain an output result comprises the following steps:
splitting the data to be analyzed into a plurality of characters;
searching the emotion dictionary according to the characters;
judging whether the characters are matched;
if so, outputting the matched characters to a set to form an output result;
judging whether the current character is the last character;
if so, extracting emotion feature information from the output result to obtain feature data;
if not, acquiring the next character;
returning to the step of searching the emotion dictionary according to the characters;
if the characters are not matched, turning to the character pointed to by the failure function;
judging whether the character pointed to by the failure function is empty;
if not, outputting the character pointed to by the failure function to the set to form an output result;
returning to the step of judging whether the current character is the last character;
if so, going to the end step.
The further technical scheme is as follows: extracting emotion feature information from the output result to obtain feature data comprises the following steps:
dividing the output result into a plurality of atomic words;
establishing an adjacency list for storing the array graph;
determining the position of each atomic word by using the offset of the atomic word;
adding the atomic words to the corresponding positions of the array in the adjacency list;
calculating the distance between the atomic words of two nodes in the array based on the Viterbi algorithm;
scoring the whole array graph stored in the adjacency list;
and adding the atomic words, positions and attribute information on the shortest path to the emotion feature data set to form feature data.
The further technical scheme is as follows: and performing polarity prediction on the feature data through a public opinion polarity prediction model to obtain a prediction result, wherein the public opinion polarity prediction model is a model obtained by inputting an emotion feature data set extracted through an emotion dictionary into an XGBoost model to obtain classification features and inputting the classification features into a logistic regression model to train.
The further technical scheme is as follows: the public opinion polarity prediction model is obtained by inputting an emotion feature data set extracted by an emotion dictionary into an XGBoost model to obtain classification features, inputting the classification features into a logistic regression model to train, and comprises the following steps:
constructing a decision tree according to the emotion feature data set extracted by the emotion dictionary;
inputting the decision tree into the XGBoost model to obtain the residual between the actual output of the XGBoost model and the emotion feature data set extracted by the emotion dictionary;
constructing a new decision tree according to the residual error;
iterating the decision tree by using a new decision tree to obtain emotion characteristic information combinations;
inputting the emotion characteristic information combination into a logistic regression model, and training the logistic regression model;
and performing model persistence processing on the trained logistic regression model to obtain a public opinion polarity prediction model.
The invention also provides a public opinion polarity prediction device, which comprises:
the public opinion data acquisition unit is used for acquiring public opinion data;
the extraction unit is used for extracting emotion feature information from the data to be analyzed with the AC automaton based on the double-array dictionary tree to obtain feature data;
the prediction unit is used for performing polarity prediction on the feature data through a public opinion polarity prediction model to obtain a prediction result;
and the output unit is used for outputting the prediction result.
The invention also provides a computer device which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the method when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, performs the above-described method.
Compared with the prior art, the invention has the following beneficial effects: the emotion dictionary is stored as a double-array dictionary tree, which reduces the number of disk I/O reads and writes and the physical storage space occupied; the AC automaton based on the double-array dictionary tree extracts emotion feature information from the public opinion data by matching it against the emotion dictionary, and because character comparison is converted into state transitions, no backtracking is needed when the data to be analyzed is scanned, which avoids repeated rollback scans; and polarity prediction on the feature data through the public opinion polarity prediction model effectively improves the efficiency and accuracy of public opinion polarity prediction analysis.
The invention is further described below with reference to the drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application scenario of a public opinion polarity prediction method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a public opinion polarity prediction method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a public opinion polarity prediction method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a public opinion polarity prediction method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a public opinion polarity prediction method according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart of a public opinion polarity prediction method according to an embodiment of the present invention;
FIG. 7 is a state transition diagram provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of a failure function provided by an embodiment of the present invention;
FIG. 9 is a schematic diagram of public opinion polarity prediction according to an embodiment of the present invention;
FIG. 10 is a schematic block diagram of a public opinion polarity prediction apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of an application scenario of a public opinion polarity prediction method according to an embodiment of the present invention, and FIG. 2 is a schematic flow chart of the method. The public opinion polarity prediction method is applied to a server. The server applies a preprocessing operation, analysis by the AC automaton based on the double-array dictionary tree, and prediction by the public opinion polarity prediction model to the crawled content of the target public opinion website to obtain a public opinion polarity result, and outputs the result to a terminal for display.
FIG. 2 is a flowchart of a public opinion polarity prediction method according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps S110 to S140.
S110, public opinion data is obtained.
In this embodiment, the public opinion data refers to data representing the emotion of the reviewer.
In an embodiment, the step S110 may include the following steps:
crawling target public opinion website content;
In this embodiment, the target public opinion website content refers to content derived from a website. The target public opinion website content is crawled with crawler technology.
performing preprocessing, web page parsing and denoising on the target public opinion website content to obtain public opinion data.
In this embodiment, the target public opinion website content is first processed to remove unnecessary data and obtain the public opinion data.
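The parsing and denoising step can be illustrated with a minimal sketch. It assumes the crawled page is already available as an HTML string and that "denoising" means dropping markup, scripts and styles; the names `TextExtractor` and `clean_page` are hypothetical and do not come from the patent.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_page(html):
    """Return the page text with markup removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
```

A real crawler would also need to handle character encodings, navigation menus and advertisements, which this sketch leaves out.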
S120, extracting emotion feature information from the data to be analyzed with the AC automaton based on the double-array dictionary tree to obtain feature data.
In this embodiment, the AC automaton based on the double-array dictionary tree is a multi-pattern matching algorithm that extracts emotion feature information from the data to be analyzed based on an emotion dictionary.
The emotion dictionary is built based on a double-array dictionary tree.
In this embodiment, the emotion dictionary refers to the set of all words that carry emotional coloring.
The dictionary is stored as a double-array dictionary tree. The states of the words and the steering (goto) function are determined first, the failure function is then calculated, and the output function is computed in the course of these steps. The double-array dictionary tree is a compressed dictionary tree (trie) in which the whole tree is represented by two one-dimensional arrays, BASE and CHECK.
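To make the BASE/CHECK representation concrete, the following is a minimal sketch of a double-array dictionary tree. It is not the patent's implementation: the character codes, the reserved end-of-word code `END`, the brute-force search for a free base value and the English stand-in words are all assumptions chosen for brevity. The transition rule is the usual one: from state `s` on a character with code `c`, the next state is `t = base[s] + c`, and the transition is valid only if `check[t] == s`.

```python
from collections import deque

END = 0  # code reserved for the end-of-word marker (an assumption of this sketch)


def build_double_array(words):
    """Return (base, check, code) encoding the trie of `words`."""
    alphabet = sorted({ch for w in words for ch in w})
    code = {ch: i + 1 for i, ch in enumerate(alphabet)}  # character codes start at 1

    # Step 1: an ordinary trie keyed by character code.
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(code[ch], {})
        node[END] = {}

    # Step 2: pack the trie into two flat arrays; check[t] == -1 marks a free slot.
    base, check = [0], [0]  # slot 0 is the root

    def grow(n):
        while len(base) <= n:
            base.append(0)
            check.append(-1)

    queue = deque([(0, trie)])
    while queue:
        state, node = queue.popleft()
        if not node:
            continue
        b = 1
        while True:  # brute-force search for a base where every child slot is free
            grow(b + max(node))
            if all(check[b + c] == -1 for c in node):
                break
            b += 1
        base[state] = b
        for c, child in node.items():
            check[b + c] = state  # slot b + c now belongs to `state`
            queue.append((b + c, child))
    return base, check, code


def contains(word, base, check, code):
    """Look up `word` using only the two arrays."""
    state = 0
    for ch in word:
        c = code.get(ch)
        if c is None:
            return False
        t = base[state] + c
        if t >= len(check) or check[t] != state:
            return False
        state = t
    t = base[state] + END
    return t < len(check) and check[t] == state
```

Because a lookup touches only two flat integer arrays, the structure is compact and can be persisted or memory-mapped as a whole; production implementations use much faster base-search strategies than the linear probe shown here.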
For example, to construct an emotion dictionary consisting of { national chinese national team, national team }, a state transition diagram must first be built in order to construct the steering function. Initially the state transition diagram contains only the start state 0. Each keyword p is then entered into the diagram in turn by adding a path that begins at the start state; new vertices and edges are added as needed so that the path spells out the keyword p. To complete the steering function, a loop from state 0 back to state 0 is added for every character that does not begin a keyword. The result is the state transition diagram shown in FIG. 7, which represents the steering function.
The failure function is built from the steering function. The failure values of the states at depth 1 are calculated first, then those of the states at depth 2, and so on, until the failure values of all states except state 0 have been calculated (the failure function is not defined for state 0). For i = 1, 2, 3, 4, 5, 6, 7, 8, 9 the resulting failure values are 0, 0, 0, 1, 2, 0, 3, 0, 3; the resulting failure function is shown in FIG. 8.
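The construction of the steering, failure and output functions can be sketched as follows. This sketch uses plain Python dictionaries rather than a double-array dictionary tree, and it uses the English keywords he, she, his and hers as a stand-in dictionary (an assumption; the dictionary in the patent is Chinese). With these four keywords inserted in that order, the sketch yields nine states whose failure values are the same list quoted above.

```python
from collections import deque


def build_automaton(keywords):
    """Return (goto, fail, output) tables for an Aho-Corasick automaton."""
    goto = [{}]        # goto[state][char] -> next state (the steering function)
    output = [set()]   # output[state] -> keywords recognized on entering state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)

    # Breadth-first pass: depth-1 states fail to state 0, deeper states are
    # derived from the failure value of their parent.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        parent = queue.popleft()
        for ch, child in goto[parent].items():
            queue.append(child)
            f = fail[parent]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            output[child] |= output[fail[child]]  # inherit matches ending here
    return goto, fail, output


goto, fail, output = build_automaton(["he", "she", "his", "hers"])
print(fail[1:])  # [0, 0, 0, 1, 2, 0, 3, 0, 3]
```

The loop on state 0 for characters that begin no keyword is implicit here: `goto[0].get(ch, 0)` returns state 0 when no edge exists.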
In addition, the emotion dictionary must be loaded into memory the first time the AC automaton runs. The model object of the AC automaton's emotion dictionary is designed with the singleton design pattern: the persisted model is loaded into memory on the first run, and subsequent calls do not need to compile or load it again. Compiling and loading once and running many times makes full use of fast memory access and improves the efficiency of emotion feature information extraction. The double-array dictionary tree also compresses storage, which reduces the number of disk I/O reads and writes and the storage space occupied, further improving memory access efficiency.
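The load-once behaviour can be sketched with a small thread-safe singleton. The class name `EmotionDictionary`, the `get` method and the `loader` callback are hypothetical; the patent does not specify how the persisted model is read.

```python
import threading


class EmotionDictionary:
    """Holds the compiled dictionary; at most one instance is ever created."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, entries):
        self.entries = entries

    @classmethod
    def get(cls, loader):
        """Return the shared instance, calling `loader` only on first use."""
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:  # re-check after acquiring the lock
                    cls._instance = cls(loader())
        return cls._instance
```

Every later call to `EmotionDictionary.get(...)` returns the same in-memory object, so the cost of reading and compiling the dictionary is paid once per process.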
Feature data refers to data with affective feature information, i.e., words representing the reviewer's emotions.
In one embodiment, referring to fig. 3, the step S120 may include steps S121 to S122.
S121, performing pattern matching on data to be analyzed by using an AC automaton based on a double-array dictionary tree to obtain an output result;
The output result refers to the set of words matched against the emotion words.
In one embodiment, referring to fig. 4, the step S121 may include steps S121a to S121i.
S121a, splitting the data to be analyzed into a plurality of characters;
S121b, searching the emotion dictionary according to the characters.
The characters are looked up in the emotion dictionary, which is constructed from a steering function and a failure function. When the AC automaton extracts emotion feature information, character comparison is converted into state transitions in order to match the characters against the emotion dictionary, so no backtracking is needed while the data to be analyzed is scanned and repeated rollback scans are avoided.
S121c, judging whether the characters are matched;
S121d, if the characters are matched, outputting the matched characters to the set to form an output result.
When the characters are matched, the output function of the emotion dictionary is not empty; the AC automaton outputs the matched pattern, and the matched characters are output to the set to form the output result.
S121e, judging whether the current character is the last character;
if so, going to step S122;
S121f, if not, acquiring the next character;
returning to step S121b;
S121g, if the characters are not matched, turning to the character pointed to by the failure function.
When the current character is not matched, the match at the current character has failed, and the AC automaton turns to the character pointed to by the failure function.
S121h, judging whether the character pointed to by the failure function is empty;
S121i, if not, outputting the character pointed to by the failure function to the set to form an output result.
When the character pointed to by the failure function is not empty, it is output to the set to form the output result.
Returning to step S121e;
if so, going to the end step.
These steps are repeated until all characters in the data to be analyzed have been matched, giving the complete output result.
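The scanning loop of steps S121a to S121i can be sketched as a single left-to-right pass. To keep the example short, the steering, failure and output tables are hard-coded for the English stand-in keywords he, she, his and hers (an assumption, not the patent's dictionary); in practice they would be built from the emotion dictionary.

```python
# Tables for the stand-in keywords; state 0 is the start state.
GOTO = [
    {"h": 1, "s": 3}, {"e": 2, "i": 6}, {"r": 8}, {"h": 4}, {"e": 5},
    {}, {"s": 7}, {}, {"s": 9}, {},
]
FAIL = [0, 0, 0, 0, 1, 2, 0, 3, 0, 3]
OUTPUT = {2: ["he"], 5: ["she", "he"], 7: ["his"], 9: ["hers"]}


def scan(text):
    """Return (start_offset, keyword) for every match, reading each character once."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        # On a mismatch, follow failure links instead of re-reading the text.
        while state and ch not in GOTO[state]:
            state = FAIL[state]
        state = GOTO[state].get(ch, 0)
        for word in OUTPUT.get(state, ()):
            hits.append((i - len(word) + 1, word))
    return hits


print(scan("ushers"))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text index `i` only ever moves forward, which is the "no backtracking" property the description relies on; overlapping matches such as she and hers are still reported.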
S122, extracting emotion characteristic information of the output result to obtain characteristic data.
The emotion dictionary provides prior knowledge of a word's emotion: its emotion polarity, intensity and other information in most contexts. Extracting emotion feature information based on the emotion dictionary pulls valuable emotion information out of the public opinion text and converts irregular, unstructured text into structured feature information that a computer can understand and recognize. The resulting emotion feature information, that is, the feature data, is represented in the format {emotion word, part of speech, position in sentence, emotion tendency, emotion intensity}.
In one embodiment, referring to fig. 5, the step S122 may include steps S1221 to S1227.
S1221, dividing the output result into a plurality of atomic words.
Atomic words are words of the smallest unit. The AC automaton-based implementation breaks a sentence down into all possible atomic words.
S1222, establishing an adjacency list for storing the array graph.
The adjacency list is used to store the graph.
S1223, determining the position of each atomic word by using the offset of the atomic word;
S1224, adding the atomic words to the corresponding positions of the array in the adjacency list;
The offset of each atomic word determines its position, and the atomic word is added to the adjacency list array at term[offset].
S1225, calculating the distance between the atomic words of two nodes in the array based on the Viterbi algorithm;
S1226, scoring the whole array graph stored in the adjacency list;
The distance between the atomic words (terms) of two nodes is calculated based on the Viterbi algorithm. Each node is assigned a distance that represents the length of the cumulative shortest path from the root node to that node. The whole graph is then traversed depth-first for scoring, and each score adds the distance from the root node to the current node.
S1227, adding the atomic words, positions and attribute information on the shortest path to the emotion feature data set to form feature data.
The emotion words on the shortest path, together with their positions, attributes and other information, are added to the emotion feature data set. In this embodiment, attribute information refers to information such as part of speech, position in sentence, emotion tendency and emotion intensity.
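Steps S1221 to S1227 amount to a shortest-path search over a word lattice indexed by offset. The sketch below is one way to realize that idea, under stated assumptions: each atomic word arrives as a hypothetical `(offset, word, cost)` triple, the per-word costs are invented for illustration, and a forward dynamic-programming pass stands in for the Viterbi scoring described above.

```python
def best_path(text_len, terms):
    """Return (path, distance) for the cheapest chain of words covering the text.

    terms: iterable of (offset, word, cost); the costs are illustrative only.
    """
    # Adjacency list: lattice[offset] holds the words that start at that offset.
    lattice = [[] for _ in range(text_len)]
    for offset, word, cost in terms:
        lattice[offset].append((word, cost))

    inf = float("inf")
    dist = [inf] * (text_len + 1)   # dist[i]: cheapest cost to reach offset i
    back = [None] * (text_len + 1)  # back[i]: (previous offset, word) on that path
    dist[0] = 0
    for i in range(text_len):
        if dist[i] == inf:
            continue
        for word, cost in lattice[i]:
            j = i + len(word)
            if j <= text_len and dist[i] + cost < dist[j]:
                dist[j] = dist[i] + cost
                back[j] = (i, word)

    # Walk the back-pointers from the end of the text to recover the path.
    path, j = [], text_len
    while j > 0 and back[j] is not None:
        i, word = back[j]
        path.append((i, word))
        j = i
    return path[::-1], dist[text_len]
```

For the text "ushers" with single-character fallbacks costing 10 and the dictionary words she, he and hers costing 5, 6 and 4, the cheapest cover is u, s, hers with total distance 24; the words on that path, with their offsets, are what would be written to the emotion feature data set.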
S130, performing polarity prediction on the feature data through a public opinion polarity prediction model to obtain a prediction result;
In this embodiment, the prediction result refers to the polarity value of the public opinion data. The public opinion polarity prediction model is obtained by inputting the emotion feature data set extracted by the emotion dictionary into an XGBoost model to obtain classification features, and then inputting the classification features into a logistic regression model for training.
The XGBoost model is used to construct new features from the input feature data. The new feature vector takes values 0/1, and each element corresponds to a leaf node of a tree in the XGBoost model. When a sample point passes through a tree and lands on one of its leaf nodes, the element corresponding to that leaf node is 1 and the elements corresponding to the other leaf nodes of that tree are 0. The length of the new feature vector equals the total number of leaf nodes across all trees in the XGBoost model. Finally, the new features are added to the original features to train the model and obtain the public opinion polarity prediction model. The output of each individual tree is treated as a categorical input feature of a sparse linear classifier. As shown in FIG. 9, the input passes through two trees; the upper tree has two leaf nodes and the lower tree has three, so the final feature is a five-dimensional vector. If input x falls on the second leaf node of the upper tree, that tree is encoded as [0,1]; if it falls on the first leaf node of the lower tree, that tree is encoded as [1,0,0]. The final code is therefore [0,1,1,0,0], which is fed to the logistic regression model as the input feature for prediction.
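The leaf encoding in this example can be reproduced with a few lines. The function name `encode_leaves` and its arguments are hypothetical: `leaf_index_per_tree` gives the zero-based leaf each tree routed the sample to, and `leaves_per_tree` gives the number of leaves in each tree.

```python
def encode_leaves(leaf_index_per_tree, leaves_per_tree):
    """Concatenate one one-hot block per tree into a single 0/1 feature vector."""
    vector = []
    for leaf, n_leaves in zip(leaf_index_per_tree, leaves_per_tree):
        one_hot = [0] * n_leaves
        one_hot[leaf] = 1
        vector.extend(one_hot)
    return vector


# Upper tree: 2 leaves, sample lands on the second; lower tree: 3 leaves, first.
print(encode_leaves([1, 0], [2, 3]))  # [0, 1, 1, 0, 0]
```

The vector length is the sum of the leaf counts, which matches the statement that the new feature vector is as long as the total number of leaf nodes in the model.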
In an embodiment, referring to fig. 6, the above-mentioned public opinion polarity prediction model is a model obtained by inputting emotion feature data set extracted by emotion dictionary into XGBoost model to obtain classification features, and inputting the classification features into logistic regression model for training, and includes steps S131-S136.
S131, constructing a decision tree according to the emotion feature data set extracted by the emotion dictionary;
S132, inputting the decision tree into the XGBoost model to obtain the residual between the actual output of the XGBoost model and the emotion feature data set extracted by the emotion dictionary.
S133, constructing a new decision tree according to the residual;
S134, iterating the decision tree with the new decision tree to obtain emotion feature information combinations.
The XGBoost (eXtreme Gradient Boosting) model is a massively parallel boosted tree tool, regarded here as the fastest and best open-source boosted tree toolkit at present; it is an ensemble of multiple CART regression trees.
A decision tree is constructed on the residuals between the output of the existing model and the actual samples, and this is iterated continuously. Each iteration produces a classification feature with larger gain, and multiple discriminative emotion feature information combinations are obtained through multiple trees.
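The residual-fitting loop of steps S131 to S134 can be illustrated with a deliberately small sketch. It is plain gradient boosting with squared loss and one-split trees (stumps) on a single numeric feature, not XGBoost itself; the function names, the learning rate and the toy data are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the threshold split that best fits the residuals in least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right) if right else 0.0
        error = sum(
            (r - (left_value if x <= threshold else right_value)) ** 2
            for x, r in zip(xs, residuals)
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def boost(xs, ys, rounds=5, learning_rate=0.5):
    """Each round fits a new stump to what the current ensemble still gets wrong."""
    predictions = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return stumps, predictions
```

On the toy data `xs = [1, 2, 3, 4, 5, 6]`, `ys = [0, 0, 0, 1, 1, 1]`, every round halves the remaining residual, which shows why adding trees fitted to residuals steadily reduces the training error.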
S135, inputting the emotion feature information combinations into a logistic regression model, and training the logistic regression model;
S136, performing model persistence processing on the trained logistic regression model to obtain the public opinion polarity prediction model.
The emotion feature information combinations are used as the input of the logistic regression model; the logistic regression model is trained and the model is persisted.
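Steps S135 and S136 can be sketched with a small logistic regression trained by stochastic gradient descent on 0/1 leaf-code vectors and then persisted with `pickle`. The training data, hyperparameters and function names are illustrative assumptions; a production system would more likely use an established library.

```python
import math
import pickle


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, epochs=200, learning_rate=0.5):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            p = sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
            gradient = p - label
            weights = [w - learning_rate * gradient * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * gradient
    return {"weights": weights, "bias": bias}


def predict_positive(model, features):
    """Probability that the sample has positive polarity."""
    z = sum(w * f for w, f in zip(model["weights"], features)) + model["bias"]
    return sigmoid(z)


# Hypothetical leaf-code vectors (two trees: 2 + 3 leaves) and polarity labels.
X = [[0, 1, 1, 0, 0], [1, 0, 0, 0, 1], [0, 1, 0, 1, 0], [1, 0, 0, 1, 0]]
y = [1, 0, 1, 0]
model = train_logistic_regression(X, y)

blob = pickle.dumps(model)      # persistence: serialize the trained model
restored = pickle.loads(blob)   # later runs load it instead of retraining
```

Writing `blob` to disk and reading it back at start-up is what lets the prediction service skip training on every run.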
XGBoost is an efficient implementation of the GBDT algorithm. It supports parallel processing, its base learner is the CART regression tree, and its regularization term depends on the number of leaf nodes of a tree and on the values of the leaf nodes. XGBoost approximates the objective function with a Taylor expansion and, when computing the pseudo-residuals to learn the function F_m(x), uses the second derivative as well as the first. A regularization term is also added to the model cost function to control model complexity, so that the learned model is simpler.
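For reference, the second-order approximation described above is usually written as follows. The notation is the standard one from the XGBoost documentation and is an assumption here, since the patent gives no formula: $l$ is the loss, $f_m$ the tree added in round $m$, $T$ the number of leaves, $w_j$ the value of leaf $j$, and $\gamma$, $\lambda$ the regularization weights.

```latex
\mathrm{obj}^{(m)} \approx \sum_{i=1}^{n} \left[ g_i\, f_m(x_i) + \tfrac{1}{2}\, h_i\, f_m(x_i)^2 \right] + \Omega(f_m),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\, \lambda \sum_{j=1}^{T} w_j^2,

g_i = \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(m-1)}\right)}{\partial\, \hat{y}_i^{(m-1)}},
\qquad
h_i = \frac{\partial^2 l\!\left(y_i, \hat{y}_i^{(m-1)}\right)}{\partial \left(\hat{y}_i^{(m-1)}\right)^2}.
```

The terms $g_i$ and $h_i$ are the first and second derivatives mentioned in the text, and $\Omega$ is the regularization term that depends on the number of leaves and on the leaf values.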
The public opinion polarity prediction model is used to predict the polarity of online public opinion text, and the final classification result is evaluated with the F-Score, defined as follows:
F-Score = (2 × Precision × Recall) / (Precision + Recall), where Precision denotes the precision and Recall denotes the recall.
Precision = number of instances of a class that are correctly classified / total number of instances that the public opinion polarity prediction model predicts as that class.
Recall = number of instances of a class that are correctly classified / total number of instances of that class in the test data.
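These three definitions translate directly into code. The function name and the counts in the usage example are hypothetical.

```python
def f_score(correct, predicted_total, actual_total):
    """Return (precision, recall, f_score) for one class.

    correct:         instances of the class that were classified correctly
    predicted_total: instances the model predicted as this class
    actual_total:    instances of this class in the test data
    """
    precision = correct / predicted_total if predicted_total else 0.0
    recall = correct / actual_total if actual_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, with 8 correct out of 10 predicted and 16 actual instances, precision is 0.8, recall is 0.5 and the F-Score is 0.8 / 1.3, roughly 0.615.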
S140, outputting the prediction result.
The prediction result is output as a JSON-formatted string in the following format: {"sendtrend": "front", "sendneg": 0.278, "sendpos": 0.722}.
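A string of this shape can be produced with the standard `json` module. The key names are taken from the format above; the function name `format_result` is hypothetical, and the trend label is passed in by the caller because the source shows only one example value.

```python
import json


def format_result(trend, negative_score, positive_score):
    """Serialize one prediction as a JSON string with the keys used above."""
    return json.dumps(
        {"sendtrend": trend, "sendneg": negative_score, "sendpos": positive_score}
    )


print(format_result("front", 0.278, 0.722))
# {"sendtrend": "front", "sendneg": 0.278, "sendpos": 0.722}
```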
Using 200,000 (20w, where w denotes 10,000) microblog posts captured by a crawler as test data, the feature extraction speed and the prediction accuracy of different algorithms are compared in Tables 1 and 2.
Table 1. Comparison of feature data extraction speed
Algorithm | Dictionary size | Extraction speed
IK word segmentation | 35w | 80w/s
Ansj word segmentation | 35w | 210w/s
Fnlp word segmentation | 35w | 120w/s
Double-array AC automaton | 35w | 1600w/s
Table 2. Comparison of accuracy
Prediction algorithm | Accuracy | F1
Keyword statistics method | 0.703 | 0.633
Logistic regression algorithm | 0.718 | 0.646
GBDT+LR algorithm | 0.803 | 0.725
XGBoost+LR algorithm | 0.812 | 0.736
In the above public opinion polarity prediction method, the emotion dictionary is constructed with a double-array dictionary tree storage structure, which reduces the number of disk I/O reads and writes and the physical storage space occupied. An AC automaton based on the double-array dictionary tree extracts emotion feature information from the public opinion data according to the emotion dictionary, converting character comparison into state transitions; no backtracking is needed while the data to be analyzed are scanned, which avoids repeated rollback scanning. The feature data then undergo polarity prediction by the public opinion polarity prediction model, which effectively improves the efficiency and accuracy of public opinion polarity prediction analysis.
Fig. 10 is a schematic block diagram of a public opinion polarity prediction apparatus according to an embodiment of the present invention. As shown in fig. 10, the present invention further provides a public opinion polarity prediction device corresponding to the above public opinion polarity prediction method. The public opinion polarity prediction apparatus includes a unit for performing the above public opinion polarity prediction method, and the apparatus may be configured in a server.
Specifically, referring to fig. 10, the public opinion polarity prediction apparatus includes:
a public opinion data acquisition unit 301, configured to acquire public opinion data;
the extracting unit 302 is configured to extract emotion feature information of data to be analyzed based on an AC automaton of the double-array dictionary tree, so as to obtain feature data;
a prediction unit 303, configured to perform polarity prediction on the feature data through a public opinion polarity prediction model, so as to obtain a prediction result;
and an output unit 304, configured to output the prediction result.
In one embodiment, the extracting unit 302 includes:
the matching subunit is used for carrying out pattern matching on the data to be analyzed by using the AC automaton based on the double-array dictionary tree so as to obtain an output result;
and the characteristic data forming subunit is used for extracting emotion characteristic information of the output result so as to obtain characteristic data.
In an embodiment, the matching subunit includes:
the splitting module is used for splitting the data to be analyzed into a plurality of characters;
the searching module is used for searching the emotion dictionary according to the characters;
the character judging module is used for judging whether the characters match;
the first output module is used for outputting the matched characters to a set if the characters match, so as to form an output result;
the last character judging module is used for judging whether the current character is the last character and, if so, extracting emotion feature information from the output result to obtain feature data;
the character acquisition module is used for acquiring the next character if the current character is not the last character, and returning to the step of searching the emotion dictionary according to the characters;
the steering module is used for turning to the character pointed to by the failure function if the characters do not match;
the pointing judgment module is used for judging whether the character pointed to by the failure function is empty and, if so, entering an ending step;
the second output module is used for outputting the character pointed to by the failure function to the set if that character is not empty, so as to form an output result, and returning to the step of judging whether the current character is the last character.
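The matching flow carried out by these modules corresponds to Aho-Corasick multi-pattern matching, which can be sketched as follows. For brevity the sketch stores the dictionary tree in Python dictionaries rather than in the double-array (base/check) layout, so it illustrates only the state transitions and failure links, not the storage structure; the function names and the sample patterns are assumptions for illustration.

```python
from collections import deque

def build_automaton(patterns):
    """Build the goto table, failure links, and outputs for a set of patterns."""
    goto = [{}]      # goto[state][char] -> next state; state 0 is the root
    output = [[]]    # output[state] -> patterns recognized on reaching the state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # states of depth 1 fail to the root
    while queue:                      # breadth-first: depth 1, then depth 2, ...
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def find_matches(text, automaton):
    """Scan the text once and return (start offset, pattern) for every match."""
    goto, fail, output = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]       # follow the failure function instead of rescanning
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
print(find_matches("ushers", automaton))
```

Each character of the text is consumed exactly once; a mismatch moves the state along the failure links, which is why no rollback of the scan position is needed.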
In an embodiment, the feature data forming subunit includes:
The dividing module is used for dividing the output result into a plurality of atomic words;
an adjacency list establishing module for establishing an adjacency list for storing the array graph;
the position determining module is used for determining the position of the atomic word by utilizing the offset of the atomic word;
the adding module is used for adding the atomic words to the corresponding positions of the arrays in the adjacency list;
the distance calculation module is used for calculating the distance between the atomic words of the two nodes in the array based on the Viterbi algorithm;
the scoring module is used for scoring the whole array diagram stored in the adjacency list;
and the integration module is used for adding the atomic words, the positions and the attribute information with the shortest distance into a set emotion characteristic data set to form characteristic data.
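One possible reading of these modules is a shortest-path search over a word graph, sketched here. The matched words are placed in an adjacency list indexed by their offsets, single characters act as fallback atomic words, and a Viterbi-style dynamic program selects the lowest-cost path. The cost values, the single-character fallback, and the function name are assumptions for illustration and are not specified in this description.

```python
def shortest_segmentation(text, matches, unknown_cost=2.0):
    """Pick the lowest-cost sequence of words covering the text.

    matches: list of (offset, word, cost) tuples, e.g. dictionary hits.
    Returns (path, total_cost), where path is a list of (offset, word).
    """
    n = len(text)
    # adjacency[i] holds the edges (end offset, word, cost) that start at offset i
    adjacency = [[] for _ in range(n)]
    for i, ch in enumerate(text):
        adjacency[i].append((i + 1, ch, unknown_cost))   # fallback: single character
    for offset, word, cost in matches:
        adjacency[offset].append((offset + len(word), word, cost))

    best = [float("inf")] * (n + 1)   # best[i]: lowest cost of covering text[:i]
    back = [None] * (n + 1)           # back[i]: (start offset, word) of the last edge
    best[0] = 0.0
    for i in range(n):
        for end, word, cost in adjacency[i]:
            if best[i] + cost < best[end]:
                best[end] = best[i] + cost
                back[end] = (i, word)

    path, pos = [], n
    while pos > 0:                    # walk the back-pointers from the end
        start, word = back[pos]
        path.append((start, word))
        pos = start
    path.reverse()
    return path, best[n]

hits = [(1, "she", 1.0), (2, "he", 1.0), (2, "hers", 1.0)]
print(shortest_segmentation("ushers", hits))
```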
In an embodiment, the apparatus further includes:
the model training unit is used for inputting the emotion feature data set extracted by the emotion dictionary into the XGBoost model to obtain classification features, and inputting the classification features into the logistic regression model for training to obtain the public opinion polarity prediction model.
In an embodiment, the model training unit includes:
a first construction subunit, configured to construct a decision tree according to the emotion feature dataset extracted by the emotion dictionary;
the first input subunit is used for inputting the decision tree into the XGBoost model to obtain the residual between the actual output of the XGBoost model and the emotion feature data set extracted by the emotion dictionary;
a second construction subunit, configured to construct a new decision tree according to the residual error;
the iteration subunit is used for iterating the decision tree by utilizing the new decision tree so as to obtain emotion characteristic information combination;
the combined input subunit is used for inputting the emotion characteristic information combination into a logistic regression model and training the logistic regression model;
and the processing subunit is used for performing model persistence processing on the trained logistic regression model so as to obtain a public opinion polarity prediction model.
It should be noted that, as those skilled in the art can clearly understand, the specific implementation process of the above public opinion polarity prediction device and each unit may refer to the corresponding descriptions in the foregoing method embodiments, and for convenience and brevity of description, the description is omitted herein.
The above-described public opinion polarity prediction apparatus may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 11.
Referring to fig. 11, fig. 11 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 is a server.
With reference to FIG. 11, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform a public opinion polarity prediction method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a public opinion polarity prediction method.
The network interface 505 is used for network communication with other devices. It will be appreciated by those skilled in the art that the structure shown in FIG. 11 is merely a block diagram of some of the structures associated with the present inventive arrangements and does not constitute a limitation of the computer device 500 to which the present inventive arrangements may be applied, and that a particular computer device 500 may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to implement the steps of:
obtaining public opinion data;
the AC automaton based on the double-array dictionary tree performs emotion feature information extraction on the data to be analyzed to obtain feature data;
performing polarity prediction on the characteristic data through a public opinion polarity prediction model to obtain a prediction result;
and outputting the prediction result.
The AC automaton based on the double-array dictionary tree is a multi-pattern matching algorithm that extracts emotion feature information from the data to be analyzed based on an emotion dictionary, and the emotion dictionary is constructed based on the double-array dictionary tree.
In an embodiment, when implementing the step of extracting emotion feature information of the data to be analyzed by the AC automaton based on the dual-array dictionary tree to obtain feature data, the processor 502 specifically implements the following steps:
performing pattern matching on the data to be analyzed by using an AC automaton based on the double-array dictionary tree to obtain an output result;
and extracting emotion characteristic information of the output result to obtain characteristic data.
In one embodiment, when implementing the step of performing pattern matching on the data to be analyzed by using the AC automaton based on the double-array dictionary tree to obtain the output result, the processor 502 specifically implements the following steps:
Splitting the data to be analyzed into a plurality of characters;
searching an emotion dictionary according to the characters;
judging whether the characters are matched;
if so, outputting the matched characters to a set to form an output result;
judging whether the current character is the last character or not;
if yes, carrying out emotion feature information extraction on the output result to obtain feature data;
if not, acquiring the next character;
returning to the emotion dictionary searching according to the characters;
if the characters are not matched, turning to the character pointed to by the failure function;
judging whether the character pointed to by the failure function is empty;
if it is not empty, outputting the character pointed to by the failure function to the set to form an output result, and returning to the step of judging whether the current character is the last character;
if it is empty, entering an ending step.
The public opinion polarity prediction model is a model obtained by inputting the emotion feature data set extracted by the emotion dictionary into the XGBoost model to obtain classification features, and then inputting the classification features into a logistic regression model for training.
In an embodiment, when implementing the step in which the public opinion polarity prediction model is obtained by inputting the emotion feature data set extracted by the emotion dictionary into the XGBoost model to obtain classification features and then inputting the classification features into the logistic regression model for training, the processor 502 specifically implements the following steps:
constructing a decision tree according to the emotion feature data set extracted by the emotion dictionary;
inputting the decision tree into the XGBoost model to obtain the residual between the actual output of the XGBoost model and the emotion feature data set extracted by the emotion dictionary;
constructing a new decision tree according to the residual error;
iterating the decision tree by using a new decision tree to obtain emotion characteristic information combinations;
inputting the emotion characteristic information combination into a logistic regression model, and training the logistic regression model;
and performing model persistence processing on the trained logistic regression model to obtain a public opinion polarity prediction model.
It should be understood that, in an embodiment of the present application, the processor 502 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Those skilled in the art will appreciate that all or part of the flow in a method embodying the above described embodiments may be accomplished by computer programs instructing the relevant hardware. The computer program comprises program instructions, and the computer program can be stored in a storage medium, which is a computer readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of:
obtaining public opinion data;
the AC automaton based on the double-array dictionary tree performs emotion feature information extraction on the data to be analyzed to obtain feature data;
performing polarity prediction on the characteristic data through a public opinion polarity prediction model to obtain a prediction result;
and outputting the prediction result.
The AC automaton based on the double-array dictionary tree is a multi-pattern matching algorithm that extracts emotion feature information from the data to be analyzed based on an emotion dictionary, and the emotion dictionary is constructed based on the double-array dictionary tree.
In an embodiment, when the processor executes the computer program to implement the step of extracting emotion feature information of the data to be analyzed by the AC automaton based on the double-array dictionary tree to obtain feature data, the following steps are specifically implemented:
performing pattern matching on the data to be analyzed by using an AC automaton based on the double-array dictionary tree to obtain an output result;
and extracting emotion characteristic information of the output result to obtain characteristic data.
In one embodiment, when the processor executes the computer program to implement the step of performing pattern matching on the AC automaton based on the double-array dictionary tree to obtain an output result, the method specifically includes the following steps:
splitting the data to be analyzed into a plurality of characters;
searching an emotion dictionary according to the characters;
judging whether the characters are matched;
if so, outputting the matched characters to a set to form an output result;
judging whether the current character is the last character or not;
if yes, carrying out emotion feature information extraction on the output result to obtain feature data;
if not, acquiring the next character;
returning to the emotion dictionary searching according to the characters;
if the characters are not matched, turning to the character pointed to by the failure function;
judging whether the character pointed to by the failure function is empty;
if it is not empty, outputting the character pointed to by the failure function to the set to form an output result, and returning to the step of judging whether the current character is the last character;
if it is empty, entering an ending step.
In one embodiment, when the processor executes the computer program to implement the step of extracting emotion feature information from the output result to obtain feature data, the following steps are specifically implemented:
dividing an output result into a plurality of atomic words;
establishing an adjacency list for storing the array graph;
determining the position of the atomic word by using the offset of the atomic word;
adding the atomic words to the corresponding positions of the arrays in the adjacency list;
calculating the distance between the atomic words of two nodes in the array based on a Viterbi algorithm;
scoring the whole array diagram stored in the adjacency list;
and adding the atomic words, the positions and the attribute information with the shortest distance into a set emotion characteristic data set to form characteristic data.
The public opinion polarity prediction model is a model obtained by inputting the emotion feature data set extracted by the emotion dictionary into the XGBoost model to obtain classification features, and then inputting the classification features into a logistic regression model for training.
In one embodiment, when the processor executes the computer program to implement the step in which the public opinion polarity prediction model is obtained by inputting the emotion feature data set extracted by the emotion dictionary into the XGBoost model to obtain classification features and then inputting the classification features into the logistic regression model for training, the following steps are specifically implemented:
constructing a decision tree according to the emotion feature data set extracted by the emotion dictionary;
inputting the decision tree into the XGBoost model to obtain the residual between the actual output of the XGBoost model and the emotion feature data set extracted by the emotion dictionary;
constructing a new decision tree according to the residual error;
iterating the decision tree by using a new decision tree to obtain emotion characteristic information combinations;
inputting the emotion characteristic information combination into a logistic regression model, and training the logistic regression model;
and performing model persistence processing on the trained logistic regression model to obtain a public opinion polarity prediction model.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be combined, divided and deleted according to actual needs. In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, the software product comprising several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various equivalent changes and substitutions can readily be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (8)

1. A public opinion polarity prediction method, characterized by comprising the following steps:
obtaining public opinion data;
the AC automaton based on the double-array dictionary tree performs emotion feature information extraction on the data to be analyzed to obtain feature data;
performing polarity prediction on the characteristic data through a public opinion polarity prediction model to obtain a prediction result;
outputting the prediction result;
the AC automaton based on the double-array dictionary tree performs emotion characteristic information extraction on data to be analyzed to obtain characteristic data, and the method comprises the following steps:
performing pattern matching on the data to be analyzed by using an AC automaton based on the double-array dictionary tree to obtain an output result;
extracting emotion characteristic information of the output result to obtain characteristic data;
the method for performing pattern matching on the data to be analyzed by using the AC automaton based on the double-array dictionary tree to obtain an output result comprises the following steps:
Splitting the data to be analyzed into a plurality of characters;
searching an emotion dictionary according to the characters;
judging whether the characters are matched;
if so, outputting the matched characters to a set to form an output result;
judging whether the current character is the last character or not;
if yes, carrying out emotion feature information extraction on the output result to obtain feature data;
if not, acquiring the next character;
returning to the emotion dictionary searching according to the characters;
if the characters are not matched, turning to the character pointed by the failure function;
judging whether the character pointed by the failure function is empty or not;
if not, outputting the character pointed by the failure function to a set to form an output result;
returning to the judgment of whether the current character is the last character;
if yes, entering an ending step;
the failure function is established according to the steering function: the failure function values of the states with depth 1 are calculated first, then those of the states with depth 2, and so on, until the failure function values of all states other than state 0 have been calculated, thereby obtaining the failure function.
2. The public opinion polarity prediction method according to claim 1, wherein the AC automaton based on the double-array dictionary tree is a multi-pattern matching algorithm that extracts emotion feature information from the data to be analyzed based on an emotion dictionary, and the emotion dictionary is constructed based on the double-array dictionary tree.
3. The public opinion polarity prediction method according to claim 1, wherein the extracting emotion feature information from the output result to obtain feature data includes:
dividing an output result into a plurality of atomic words;
establishing an adjacency list for storing the array graph;
determining the position of the atomic word by using the offset of the atomic word;
adding the atomic words to the corresponding positions of the arrays in the adjacency list;
calculating the distance between the atomic words of two nodes in the array based on a Viterbi algorithm;
scoring the whole array diagram stored in the adjacency list;
and adding the atomic words, the positions and the attribute information with the shortest distance into a set emotion characteristic data set to form characteristic data.
4. The public opinion polarity prediction method according to claim 2, wherein the characteristic data is subjected to polarity prediction by a public opinion polarity prediction model to obtain a prediction result, and the public opinion polarity prediction model is a model obtained by inputting emotion characteristic data sets extracted by an emotion dictionary into an XGBoost model to obtain classification characteristics and inputting the classification characteristics into a logistic regression model to train.
5. The public opinion polarity prediction method according to claim 4, wherein obtaining the public opinion polarity prediction model by inputting the emotion feature data set extracted by the emotion dictionary into the XGBoost model to obtain classification features and then inputting the classification features into the logistic regression model for training comprises the following steps:
Constructing a decision tree according to the emotion feature data set extracted by the emotion dictionary;
inputting the decision tree into the XGBoost model to obtain the residual between the actual output of the XGBoost model and the emotion feature data set extracted by the emotion dictionary;
constructing a new decision tree according to the residual error;
iterating the decision tree by using a new decision tree to obtain emotion characteristic information combinations;
inputting the emotion characteristic information combination into a logistic regression model, and training the logistic regression model;
and performing model persistence processing on the trained logistic regression model to obtain a public opinion polarity prediction model.
6. A public opinion polarity prediction device, characterized by comprising:
the public opinion data acquisition unit is used for acquiring public opinion data;
the extraction unit is used for extracting emotion characteristic information of the data to be analyzed based on the AC automaton of the double-array dictionary tree so as to obtain characteristic data;
the prediction unit is used for carrying out polarity prediction on the characteristic data through a public opinion polarity prediction model so as to obtain a prediction result;
the output unit is used for outputting the prediction result;
the extraction unit includes:
the matching subunit is used for carrying out pattern matching on the data to be analyzed by using the AC automaton based on the double-array dictionary tree so as to obtain an output result;
The characteristic data forming subunit is used for extracting emotion characteristic information of the output result to obtain characteristic data;
the matching subunit includes:
the splitting module is used for splitting the data to be analyzed into a plurality of characters;
the searching module is used for searching the emotion dictionary according to the characters;
the character judging module is used for judging whether the characters are matched or not;
the first output module is used for outputting the matched characters to the set if the characters are matched, so as to form an output result;
the last character judging module is used for judging whether the current character is the last character or not; if yes, carrying out emotion feature information extraction on the output result to obtain feature data;
the character acquisition module is used for acquiring the next character if not; returning to the emotion dictionary searching according to the characters;
the steering module is used for steering the character pointed by the failure function if the characters are not matched;
the pointing judgment module is used for judging whether the character pointed by the failure function is empty or not; if yes, entering an ending step;
the second output module is used for outputting the character pointed by the failure function to a set if not so as to form an output result; returning to the judgment of whether the current character is the last character;
the failure function is established according to the steering function: the failure function values of the states with depth 1 are calculated first, then those of the states with depth 2, and so on, until the failure function values of all states other than state 0 have been calculated, thereby obtaining the failure function.
7. A computer device, characterized in that it comprises a memory on which a computer program is stored and a processor which, when executing the computer program, implements the method according to any of claims 1-5.
8. A storage medium storing a computer program which, when executed by a processor, performs the method of any one of claims 1 to 5.
CN201910199451.5A 2019-03-15 2019-03-15 Public opinion polarity prediction method, public opinion polarity prediction device, computer equipment and storage medium Active CN109933656B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910199451.5A CN109933656B (en) 2019-03-15 2019-03-15 Public opinion polarity prediction method, public opinion polarity prediction device, computer equipment and storage medium
PCT/CN2019/089224 WO2020186627A1 (en) 2019-03-15 2019-05-30 Public opinion polarity prediction method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910199451.5A CN109933656B (en) 2019-03-15 2019-03-15 Public opinion polarity prediction method, public opinion polarity prediction device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109933656A CN109933656A (en) 2019-06-25
CN109933656B true CN109933656B (en) 2023-08-15

Family

ID=66987288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910199451.5A Active CN109933656B (en) 2019-03-15 2019-03-15 Public opinion polarity prediction method, public opinion polarity prediction device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN109933656B (en)
WO (1) WO2020186627A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362669B (en) * 2019-07-18 2022-07-01 中科信息安全共性技术国家工程研究中心有限公司 Method suitable for fast keyword retrieval
CN110674297B (en) * 2019-09-24 2022-04-29 支付宝(杭州)信息技术有限公司 Public opinion text classification model construction method, public opinion text classification device and public opinion text classification equipment
CN113051925A (en) * 2019-12-26 2021-06-29 中国移动通信集团有限公司 Time identification method, device, equipment and computer storage medium
CN111831824B (en) * 2020-07-16 2024-02-09 民生科技有限责任公司 Public opinion positive and negative classification method
CN111859074B (en) * 2020-07-29 2023-12-29 东北大学 Network public opinion information source influence evaluation method and system based on deep learning
CN113643060A (en) * 2021-08-12 2021-11-12 工银科技有限公司 Product price prediction method and device
CN114701870B (en) * 2022-02-11 2024-03-29 国能黄骅港务有限责任公司 Feeding system of dumper and high material level detection method and device thereof
CN114861027A (en) * 2022-04-29 2022-08-05 深圳市东晟数据有限公司 Multi-dimensional public opinion recommendation method based on big data and natural language processing
CN117640259A (en) * 2024-01-25 2024-03-01 武汉思普崚技术有限公司 Script step-by-step detection method and device, electronic equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779174A (en) * 2012-06-26 2012-11-14 北京奇虎科技有限公司 Public opinion information display system and method
CN103365991A (en) * 2013-07-03 2013-10-23 深圳市华傲数据技术有限公司 Method for realizing dictionary memory management of Trie tree based on one-dimensional linear space
CN106294326A (en) * 2016-08-23 2017-01-04 成都科来软件有限公司 News report sentiment orientation analysis method
CN108021569A (en) * 2016-11-01 2018-05-11 中国移动通信有限公司研究院 AC automaton construction method, Chinese multi-pattern matching method, and related apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102200969A (en) * 2010-03-25 2011-09-28 日电(中国)有限公司 Text sentiment polarity classification system and method based on sentence sequence
CN105512687A (en) * 2015-12-15 2016-04-20 北京锐安科技有限公司 Emotion classification model training and textual emotion polarity analysis method and system

Also Published As

Publication number Publication date
WO2020186627A1 (en) 2020-09-24
CN109933656A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109933656B (en) Public opinion polarity prediction method, public opinion polarity prediction device, computer equipment and storage medium
CN107085581B (en) Short text classification method and device
JP6955580B2 (en) Document summary automatic extraction method, equipment, computer equipment and storage media
CN108647205B (en) Fine-grained emotion analysis model construction method and device and readable storage medium
CN106844368B (en) Method for man-machine conversation, neural network system and user equipment
CN109670191B (en) Calibration optimization method and device for machine translation and electronic equipment
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
US20150095017A1 (en) System and method for learning word embeddings using neural language models
CN113239186B (en) Graph convolution network relation extraction method based on multi-dependency relation representation mechanism
US20180053107A1 (en) Aspect-based sentiment analysis
US20170154077A1 (en) Method for comment tag extraction and electronic device
JP2005158010A (en) Apparatus, method and program for classification evaluation
CN111191002A (en) Neural code searching method and device based on hierarchical embedding
KR20180094664A (en) Method for information extraction from text data and apparatus therefor
CN112906392A (en) Text enhancement method, text classification method and related device
CN112100374A (en) Text clustering method and device, electronic equipment and storage medium
CN110781673B (en) Document acceptance method and device, computer equipment and storage medium
CN113986950A (en) SQL statement processing method, device, equipment and storage medium
CN111241271B (en) Text emotion classification method and device and electronic equipment
CN111506726A (en) Short text clustering method and device based on part-of-speech coding and computer equipment
TW202032534A (en) Voice recognition method and device, electronic device and storage medium
CN113919424A (en) Training of text processing model, text processing method, device, equipment and medium
CN113505583A (en) Sentiment reason clause pair extraction method based on semantic decision diagram neural network
CN115035890A (en) Training method and device of voice recognition model, electronic equipment and storage medium
CN103744830A (en) Semantic analysis based identification method of identity information in EXCEL document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant