CN111984793A - Text emotion classification model training method and device, computer equipment and medium - Google Patents


Info

Publication number
CN111984793A
CN111984793A (application CN202010917934.7A)
Authority
CN
China
Prior art keywords
text
emotion
training
speech
long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010917934.7A
Other languages
Chinese (zh)
Inventor
宋威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiante Technology Service Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202010917934.7A
Publication of CN111984793A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/242: Dictionaries
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30: Semantic analysis

Abstract

The invention relates to the technical field of artificial intelligence, and provides a text emotion classification model training method, a device, computer equipment and a medium, wherein the method comprises the following steps: acquiring a plurality of long texts, and segmenting each long text to obtain a plurality of text sentences; calculating the TextRank value of each text sentence in each long text, and generating a text abstract for each long text according to the TextRank value; calculating the emotion score of each text sentence in each text abstract; sorting the plurality of text sentences in each text abstract according to the emotion scores, and generating a text data set according to the sorted plurality of text sentences; and training on a plurality of text data sets based on a pre-training model to obtain a text emotion classification model. The invention can accurately classify the emotion of a long text without losing the position information, the time-sequence information and the semantic information of the original long text.

Description

Text emotion classification model training method and device, computer equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text emotion classification model training method and device, computer equipment and a medium.
Background
Emotion analysis of long medical texts is an important component of online public opinion monitoring: it can effectively identify negative information, so that administrators can review and respond to it in a timely and effective manner, and can continuously monitor for outbreaks of online public opinion.
At present, texts are mostly classified by encoding them with the word2vec method or with the pre-training model BERT. However, in the process of implementing the present invention, the inventor found that word2vec cannot handle polysemy (one word with multiple meanings) or syntax, while the pre-training model BERT, although it solves both problems, can only classify texts shorter than 512 tokens. It can be seen that current text classification methods classify short texts well but are not suitable for classifying long medical texts.
Disclosure of Invention
In view of the above, there is a need to provide a text emotion classification model training method, apparatus, computer device, and medium, which can accurately classify the emotion of a long text without losing position information, timing information, and semantic information of the original long text.
The invention provides a text emotion classification model training method in a first aspect, which comprises the following steps:
acquiring a plurality of long texts, and segmenting each long text to obtain a plurality of text sentences;
calculating the TextRank value of each text statement in each long text, and generating a text abstract for each long text according to the TextRank value;
calculating the emotion score of each text statement in each text abstract;
sequencing the plurality of text sentences in each text abstract according to the emotion scores, and generating a text data set according to the sequenced plurality of text sentences;
and training a plurality of text data sets based on the pre-training model to obtain a text emotion classification model.
According to an alternative embodiment of the present invention, the calculating the TextRank value of each text sentence in each long text, and generating the text summary for each long text according to the TextRank value comprises:
sentence embedding is carried out on each text sentence based on a preset language model to obtain a sentence vector;
calculating the similarity between the statement vectors, and generating a similarity matrix according to the similarity;
generating a text graph structure according to the similarity matrix;
calculating the text graph structure by adopting a text ranking TextRank algorithm to obtain a TextRank value of each text statement;
sequencing the TextRank values and acquiring a plurality of target text sentences corresponding to a plurality of TextRank values sequenced at the front;
generating a text summary based on the plurality of target text sentences.
According to an alternative embodiment of the present invention, the calculating the emotion score of each text sentence in each text abstract comprises:
performing word segmentation on each text sentence to obtain a plurality of word segments;
identifying a first emotion part of speech of each participle, wherein the first emotion part of speech comprises a positive part of speech, a negative part of speech and a negation part of speech;
when the first emotion part of speech of the word segmentation is identified to be the positive part of speech, identifying second emotion part of speech of the word segmentation before and after the word segmentation, and generating a first emotion weight according to the second emotion part of speech of the word segmentation before and after the word segmentation;
when the first emotion part of speech of the word segmentation is identified to be a negative part of speech, identifying a third emotion part of speech of a word segmentation before the word segmentation, and generating a second emotion weight according to the third emotion part of speech of the word segmentation before the word segmentation;
when the first emotion part of speech of the recognized word segmentation is a negation part of speech, determining the preset emotion weight as a third emotion weight;
and calculating the emotion score of the text sentence according to the first emotion weight, the second emotion weight and the third emotion weight corresponding to all the participles in the text sentence.
According to an alternative embodiment of the present invention, the training of the plurality of text data sets based on the pre-training model to obtain the text emotion classification model includes:
calculating the character length of each text statement in each text data set;
starting from a first text statement, performing character accumulation on the text statements after the first text statement, stopping the character accumulation when the character length obtained by accumulation exceeds a preset character length, and splicing the accumulated text statements to obtain text data;
generating emotion category labels for the text data according to the emotion scores corresponding to the text data;
taking the positive emotion label and the corresponding text data as positive samples, and taking the negative emotion label and the corresponding text data as negative samples;
and training a pre-training model based on the positive sample and the negative sample to obtain a text emotion classification model.
According to an alternative embodiment of the present invention, the obtaining the plurality of long texts comprises:
setting a plurality of search keywords;
and crawling a plurality of texts from a plurality of search engines according to the plurality of search keywords.
According to an optional embodiment of the present invention, after training a plurality of text data sets based on a pre-training model to obtain a text emotion classification model, the method further comprises:
acquiring a link address of a hospital to be monitored;
crawling out a plurality of long texts according to the link addresses;
inputting the long texts into the text emotion classification model for emotion classification to obtain a plurality of emotion category labels, wherein the emotion category labels are positive emotion labels or negative emotion labels;
calculating the proportion of negative emotion labels among the plurality of emotion category labels;
comparing the ratio with a preset ratio threshold;
and generating an early warning instruction in response to the ratio being greater than the preset ratio threshold.
According to an optional embodiment of the present invention, after training a plurality of text data sets based on a pre-training model to obtain a text emotion classification model, the method further comprises:
acquiring a long text to be classified;
inputting the long text to be classified into the text emotion classification model for emotion classification to obtain an emotion classification label;
and marking the emotion category label on the long text to be classified.
The second aspect of the invention provides a text emotion classification model training device, which comprises:
the segmentation module is used for acquiring a plurality of long texts and segmenting each long text to obtain a plurality of text sentences;
the generating module is used for calculating the TextRank value of each text statement in each long text and generating a text abstract for each long text according to the TextRank value;
the calculation module is used for calculating the emotion score of each text statement in each text abstract;
the sequencing module is used for sequencing the text sentences in each text abstract according to the emotion scores and generating a text data set according to the sequenced text sentences;
and the training module is used for training a plurality of text data sets based on the pre-training model to obtain a text emotion classification model.
A third aspect of the present invention provides a computer apparatus comprising:
a memory for storing a computer program;
and the processor is used for realizing the text emotion classification model training method when the computer program is executed.
A fourth aspect of the present invention provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the text emotion classification model training method.
In summary, the text emotion classification model training method, device, computer equipment and medium use a text summarization method to extract a summary of each article, extracting the important information and removing the large amount of interference information in long medical news texts; because the text abstract is much shorter than the whole text, the efficiency of model training is improved. An emotion dictionary method is used to calculate the emotion score of each text sentence in each text abstract, providing a basis for the secondary sorting and splicing of sentences, and the re-sorted sentences are arranged according to their order of appearance in the original text to preserve the time-sequence of the original information. Finally, the excellent semantic feature extraction of the BERT network resolves the problems of polysemy, syntax and context dependence in Chinese, and the word-count limitation of the BERT model on long medical news texts is overcome. This provides a feasible scheme for classifying long medical texts and has high practical application value for public opinion monitoring.
Drawings
FIG. 1 is a flowchart of a text emotion classification model training method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a text emotion classification model training apparatus according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
FIG. 1 is a flowchart of a text emotion classification model training method according to an embodiment of the present invention. The text emotion classification model training method specifically comprises the following steps, and according to different requirements, the sequence of the steps in the flowchart can be changed, and some steps can be omitted.
The text emotion classification model training method is used for training a text emotion classification model for accurately classifying the emotion of a long text, so that the emotion of the long text is accurately classified without losing position information, time sequence information and semantic information of the original long text.
And S11, acquiring a plurality of long texts, and segmenting each long text to obtain a plurality of text sentences.
The computer device crawls a plurality of long texts from each search engine before training the text emotion classification model, and splits each long text into a plurality of text sentences according to punctuation marks (such as commas, periods, exclamation marks and question marks).
Wherein the long text may be articles, news, etc. in the medical field.
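The punctuation-based segmentation of step S11 can be sketched as follows. This is a minimal illustration: the exact delimiter set (the ASCII and full-width Chinese forms of the comma, period, exclamation mark and question mark) is an assumption based on the examples named above.

```python
import re

def split_sentences(long_text: str) -> list[str]:
    """Split a long text into text sentences on punctuation marks
    (commas, periods, exclamation marks, question marks), covering
    both ASCII and full-width Chinese forms."""
    parts = re.split(r"[,.!?，。！？]+", long_text)
    # Drop empty fragments left by trailing or repeated punctuation.
    return [p.strip() for p in parts if p.strip()]

sentences = split_sentences(
    "The clinic opened today. Patients waited, some complained! Was it understaffed?")
print(sentences)
# ['The clinic opened today', 'Patients waited', 'some complained', 'Was it understaffed']
```

A real implementation would likely also normalize whitespace and handle semicolons or ellipses, but the split-on-delimiters core is as shown.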
In an alternative embodiment, the obtaining the plurality of long texts comprises:
setting a plurality of search keywords;
and crawling a plurality of texts from a plurality of search engines according to the plurality of search keywords.
Since there is no medical long text database with emotion classification tags, if a plurality of medical long texts are to be crawled out, a web crawler needs to be customized for each search engine. In order to quickly and accurately crawl out a plurality of medical long texts, the computer device defines search keywords related to medical treatment in advance, and then crawls out the medical long texts from a corresponding search engine by utilizing a customized web crawler.
Illustratively, the search keyword may be a doctor, a medicine, or the like.
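One small piece of this crawling step, enumerating a query URL for each (search engine, keyword) pair before handing them to the per-engine customized crawlers, might look like the sketch below. The engine endpoints and the `q` parameter name are hypothetical placeholders, not real search-engine APIs.

```python
from urllib.parse import urlencode

# Hypothetical search-engine endpoints; as the description notes, a
# real system would still need one customized crawler per engine.
SEARCH_ENGINES = {
    "engine_a": "https://engine-a.example.com/search",
    "engine_b": "https://engine-b.example.com/search",
}

# Pre-defined medical search keywords, as in the description.
KEYWORDS = ["doctor", "medicine"]

def build_crawl_urls(engines: dict[str, str], keywords: list[str]) -> list[str]:
    """Enumerate one query URL per (engine, keyword) pair."""
    return [f"{base}?{urlencode({'q': kw})}"
            for base in engines.values() for kw in keywords]

urls = build_crawl_urls(SEARCH_ENGINES, KEYWORDS)
print(urls[0])  # https://engine-a.example.com/search?q=doctor
```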
And S12, calculating the TextRank value of each text statement in each long text, and generating a text abstract for each long text according to the TextRank value.
Wherein the text abstract is used for summarizing and summarizing the long text as concisely as possible.
Because a long text may contain much content irrelevant to emotion classification, and the large number of text sentences in a long text is not conducive to training the text emotion classification model, a text abstract is generated for each long text so that the content most representative of the long text's emotion class is placed in the abstract. This effectively removes content irrelevant to emotion classification and reduces the number of text sentences; reducing the number of text sentences improves the efficiency of training the text emotion classification model, and thereby the efficiency of text emotion classification.
In an optional embodiment, the calculating the TextRank value of each text sentence in each long text, and generating the text summary for each long text according to the TextRank value includes:
sentence embedding is carried out on each text sentence based on a preset language model to obtain a sentence vector;
calculating the similarity between the statement vectors, and generating a similarity matrix according to the similarity;
generating a text graph structure according to the similarity matrix;
calculating the text graph structure by adopting a text ranking TextRank algorithm to obtain a TextRank value of each text statement;
sequencing the TextRank values and acquiring a plurality of target text sentences corresponding to a plurality of TextRank values sequenced at the front;
generating a text summary based on the plurality of target text sentences.
Wherein the text summary comprises: text sentences, weight ordering of the text sentences and position information of the text sentences in the long text. The location information may include: a position index.
The language model may be ELMo (Embeddings from Language Models), which is used to generate sentence vectors of a preset dimension. The preset dimension may be set to 512.
The computer equipment obtains a sentence vector for each text sentence, calculates the similarity between the sentence vectors through matrix operations, and finally obtains an N x N similarity matrix, wherein N represents the number of text sentences. The similarity matrix is then converted into a text graph structure, with text sentences as nodes and similarities as edges, which is used to calculate the TextRank value of each text sentence. The TextRank values are sorted from large to small, and the text sentences corresponding to the top-ranked (for example, top 20) TextRank values are used as target text sentences, so that the text abstract is generated based on these target text sentences.
The similarity between text sentences can be computed in different ways depending on the situation, such as the cosine of the angle between sentence vectors or the Euclidean distance. The similarity between two text sentences can also be calculated as the ratio of the number of words they share to the sum of the logarithms of their lengths. The text ranking TextRank algorithm is a graph-based ranking algorithm for texts; the ranking of the text sentences in the text graph structure is obtained through the TextRank algorithm.
In the optional embodiment, sentence embedding is performed on the text sentences through the preset language model, sentence vectors with preset dimensions can be obtained, the operation dimensions are controlled based on the sentence vectors with the preset dimensions, and the extraction efficiency of the text abstract is improved.
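The pipeline above (sentence similarity, graph construction, TextRank ranking, top-K selection) can be sketched in miniature. This version uses the word-overlap similarity described above rather than embedding cosine similarity, and plain power iteration in place of a graph library; the damping factor 0.85, the iteration count, and the toy tokenized sentences are assumptions for illustration.

```python
import math

def similarity(s1: list[str], s2: list[str]) -> float:
    """Overlap similarity: number of shared words over the sum of
    the logarithms of the two sentence lengths (guarding log(1)=0)."""
    common = len(set(s1) & set(s2))
    denom = math.log(len(s1)) + math.log(len(s2))
    return common / denom if denom > 0 else 0.0

def textrank(sentences: list[list[str]], d: float = 0.85, iters: int = 50) -> list[float]:
    """Power-iteration TextRank over the sentence-similarity graph:
    nodes are sentences, edge weights are pairwise similarities."""
    n = len(sentences)
    w = [[similarity(a, b) if i != j else 0.0
          for j, b in enumerate(sentences)] for i, a in enumerate(sentences)]
    out_sum = [sum(row) or 1.0 for row in w]  # avoid divide-by-zero on isolated nodes
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                    for j in range(n))
                  for i in range(n)]
    return scores

# Toy tokenized sentences; the top-ranked ones would form the abstract.
sents = [["the", "hospital", "opened"],
         ["the", "hospital", "expanded"],
         ["flowers", "bloomed", "nearby"]]
ranks = textrank(sents)
summary_idx = sorted(range(len(sents)), key=lambda i: -ranks[i])[:2]
print(summary_idx)
```

The two mutually similar sentences outrank the unrelated one, so the summary keeps sentences 0 and 1.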
And S13, calculating the emotion score of each text sentence in each text abstract.
Since the text abstract does not take the subsequent task of text emotion classification into consideration, the emotion score of the text sentences in each text abstract needs to be calculated by using an emotion dictionary, and the text sentences are subjected to secondary sorting according to the emotion scores.
In an alternative embodiment, the calculating the emotion score of each text sentence in each text abstract comprises:
performing word segmentation on each text sentence to obtain a plurality of word segments;
identifying a first emotion part of speech of each participle, wherein the first emotion part of speech comprises a positive part of speech, a negative part of speech and a negation part of speech;
when the first emotion part of speech of the word segmentation is identified to be the positive part of speech, identifying second emotion part of speech of the word segmentation before and after the word segmentation, and generating a first emotion weight according to the second emotion part of speech of the word segmentation before and after the word segmentation;
when the first emotion part of speech of the word segmentation is identified to be a negative part of speech, identifying a third emotion part of speech of a word segmentation before the word segmentation, and generating a second emotion weight according to the third emotion part of speech of the word segmentation before the word segmentation;
when the first emotion part of speech of the recognized word segmentation is a negation part of speech, determining the preset emotion weight as a third emotion weight;
and calculating the emotion score of the text sentence according to the first emotion weight, the second emotion weight and the third emotion weight corresponding to all the participles in the text sentence.
Wherein the second emotion part of speech may include: degree adverbs, negation words, and others. A degree adverb is an adverb that qualifies or modifies the degree of an adjective or adverb and precedes the adjective or adverb it modifies. Illustratively, degree adverbs include: "quite", "a bit", "obviously", and the like.
In this optional embodiment, the computer device may perform word segmentation on each text sentence by using a word segmentation tool to obtain a plurality of segmented words, generate a vector phrase for each text sentence according to the plurality of segmented words, and then perform word-by-word part-of-speech recognition on each vector phrase.
The computer equipment may pre-store a positive emotion word list, a negative emotion word list and a negation word list, wherein phrases with positive emotional part-of-speech are stored in the positive emotion word list, phrases with negative emotional part-of-speech are stored in the negative emotion word list, and negation phrases are stored in the negation word list. The phrases in the three lists do not overlap; that is, any given phrase is stored in only one of the positive emotion word list, the negative emotion word list and the negation word list. The computer equipment determines the emotional part-of-speech of each participle of each text sentence by word-by-word matching. When a participle matches any phrase in the positive emotion word list, its emotional part-of-speech is identified as positive; when a participle matches any phrase in the negative emotion word list, its emotional part-of-speech is identified as negative; and when a participle matches any phrase in the negation word list, its emotional part-of-speech is identified as a negation.
The computer equipment stores an emotion weight dictionary that records the correspondence between emotional parts-of-speech and weights. Specifically, the emotion weight of each participle with positive emotion is set to +1, the emotion weight of each participle with negative emotion is set to -1, and the emotion values are assumed to satisfy the linear superposition principle; whenever the segmented text sentence contains a matching participle, the corresponding weight is accumulated. Negation words and degree adverbs have special judgment rules: a negation word flips the sign of the emotion weight, and a degree adverb doubles the emotion weight. Finally, the emotion of the text sentence is judged according to the sign of the total emotion weight.
For example, assume that the computer equipment segments a certain text sentence into participle A, participle B, participle C and participle D. If the first emotional part-of-speech of participle B is identified as positive, the second emotional part-of-speech of the preceding participle A and of the following participle C is further identified. If the second emotional part-of-speech of the preceding participle A is a degree adverb, a first emotion weight of +2 is generated; if the second emotional part-of-speech of the preceding participle A is a negation word, a first emotion weight of -1 is generated; and if the second emotional part-of-speech of the following participle C is a negation word, a first emotion weight of -1 is generated.
For another example, assume that the first emotional part-of-speech of participle B is identified as negative; the second emotional part-of-speech of the preceding participle A is then further identified. If it is a degree adverb, a second emotion weight of -2 is generated; if it is a negation word, a second emotion weight of +1 is generated; and if it is another part-of-speech, a second emotion weight of -1 is generated.
For another example, assume that the first emotional part-of-speech of participle B is identified as a negation; the preset emotion weight (e.g., -0.5) is then determined as the third emotion weight.
And finally, the computer equipment adds the emotion weights corresponding to each participle in the text sentence to obtain the emotion score of the text sentence.
In this optional embodiment, degree adverbs are introduced into the identification of the second emotional part-of-speech, so that expressions such as "good" versus "very good" are treated differently and assigned different emotion weights according to the degree adverb, making the calculation of the emotion score of a text sentence more accurate and reasonable.
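The scoring rules above can be sketched as follows. The word lists are toy English stand-ins for the pre-stored emotion word lists; as a simplification, only the preceding participle is inspected (the description also checks the following participle for positive words), and a negation word immediately before a sentiment word is treated as consumed by that word's rule rather than contributing the preset weight.

```python
# Toy stand-ins for the pre-stored positive / negative / negation
# word lists and the degree-adverb list.
POSITIVE = {"good", "excellent", "recover"}
NEGATIVE = {"bad", "painful", "worse"}
NEGATION = {"not", "never"}
DEGREE_ADVERBS = {"very", "quite", "obviously"}

PRESET_NEGATION_WEIGHT = -0.5  # the "third emotion weight" in the description

def word_weight(words: list[str], i: int) -> float:
    """Weight of words[i]: +1 / -1 base for positive / negative words,
    doubled by a preceding degree adverb, sign-flipped by a preceding
    negation word; a bare negation word contributes the preset weight."""
    w = words[i]
    prev = words[i - 1] if i > 0 else None
    if w in POSITIVE:
        if prev in DEGREE_ADVERBS:
            return 2.0
        if prev in NEGATION:
            return -1.0
        return 1.0
    if w in NEGATIVE:
        if prev in DEGREE_ADVERBS:
            return -2.0
        if prev in NEGATION:
            return 1.0
        return -1.0
    if w in NEGATION and (i + 1 >= len(words)
                          or words[i + 1] not in POSITIVE | NEGATIVE):
        return PRESET_NEGATION_WEIGHT
    return 0.0

def sentence_score(words: list[str]) -> float:
    """Emotion score: linear superposition of the per-participle weights."""
    return sum(word_weight(words, i) for i in range(len(words)))

print(sentence_score(["the", "care", "was", "very", "good"]))   # 2.0
print(sentence_score(["the", "doctor", "was", "not", "good"]))  # -1.0
```

The sign of the total score then gives the sentence's emotion judgment, as in the description.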
And S14, sequencing the text sentences in each text abstract according to the emotion scores, and generating a text data set according to the sequenced text sentences.
And for each text abstract, the computer equipment sorts the emotion scores of the text abstract from large to small, so that the corresponding text sentences are sorted, and the sorted text sentences are used as text data sets.
In the embodiment, the sequence information of the text abstract can be effectively ensured by sequencing the text sentences in the text abstract, and a data set is provided for the subsequent text emotion classification model training based on the pre-training model.
And S15, training a plurality of text data sets based on the pre-training model to obtain a text emotion classification model.
The computer equipment acquires the pre-training model BERT and fine-tunes its parameters based on the plurality of text data sets to obtain the text emotion classification model.
BERT (Bidirectional Encoder Representations from Transformers) is a pre-training model that, when processing a word, takes into account the words before and after it, thereby capturing context semantics. BERT-base, one of the pre-trained BERT models provided by Google, may be selected as the pre-training model. Since the BERT model expects its input data in a particular format, the input is marked with special tokens: [CLS] for the beginning of a sequence and [SEP] for the separation/end of sentences. The input further comprises tokens conforming to the fixed vocabulary used by BERT, token IDs provided by BERT's tokenizer, mask IDs indicating which elements in the sequence are real tokens and which are padding, segment IDs distinguishing different sentences, and position embeddings indicating the position of each token in the sequence.
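The input format just described can be illustrated with a toy, self-contained encoder. The [CLS]/[SEP] markers and the three aligned fields (token IDs, mask IDs, segment IDs) follow the description; the vocabulary is a made-up stand-in for BERT's real WordPiece vocabulary, with only the special-token IDs (101, 102, 0) matching the common BERT convention.

```python
# Toy vocabulary standing in for BERT's fixed WordPiece vocabulary.
# Only [CLS]=101, [SEP]=102 and [PAD]=0 mirror the usual BERT IDs;
# the word IDs are illustrative.
VOCAB = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102,
         "the": 1, "clinic": 2, "is": 3, "good": 4}

def encode(sentence: list[str], max_len: int = 8):
    """Build the three aligned inputs BERT expects: token IDs with
    [CLS]/[SEP] markers, a mask separating real tokens from padding,
    and segment IDs (all 0 here, since there is a single sentence)."""
    tokens = ["[CLS]"] + sentence + ["[SEP]"]
    ids = [VOCAB[t] for t in tokens] + [VOCAB["[PAD]"]] * (max_len - len(tokens))
    mask = [1] * len(tokens) + [0] * (max_len - len(tokens))
    segment = [0] * max_len
    return ids, mask, segment

ids, mask, segment = encode(["the", "clinic", "is", "good"])
print(ids)   # [101, 1, 2, 3, 4, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

In practice these fields would come from BERT's own tokenizer; position embeddings are added inside the model from each token's index.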
In an optional embodiment, the training a plurality of text data sets based on a pre-training model to obtain a text emotion classification model includes:
calculating the character length of each text statement in each text data set;
starting from a first text statement, performing character accumulation on the text statements after the first text statement, stopping the character accumulation when the character length obtained by accumulation exceeds a preset character length, and splicing the accumulated text statements to obtain text data;
generating emotion category labels for the text data according to the emotion scores corresponding to the text data;
taking the positive emotion label and the corresponding text data as positive samples, and taking the negative emotion label and the corresponding text data as negative samples;
and training a pre-training model based on the positive sample and the negative sample to obtain a text emotion classification model.
The preset character length may be 510, leaving room for the [CLS] and [SEP] tokens within BERT's 512-token input limit.
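The accumulation-and-splicing step might look like the following sketch. It stops before a sentence would push the total past the limit, which is one reading of "stopping the character accumulation when the accumulated length exceeds the preset character length"; the toy limit in the example is for illustration only.

```python
MAX_CHARS = 510  # preset length: 510 + [CLS] + [SEP] = BERT's 512 limit

def splice(sentences: list[str], max_chars: int = MAX_CHARS) -> str:
    """Accumulate sentences from the first one until adding the next
    sentence would exceed max_chars, then splice the kept ones into
    one piece of text data."""
    kept, total = [], 0
    for s in sentences:
        if total + len(s) > max_chars:
            break
        kept.append(s)
        total += len(s)
    return "".join(kept)

# With a toy limit of 10 characters, only the first two sentences fit.
print(splice(["abcd", "efg", "hijkl"], max_chars=10))  # abcdefg
```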
The emotion category labels are divided into positive emotion labels and negative emotion labels, which makes this a binary classification problem; a logistic regression layer is added after the BERT model for classification, so that the classification result is obtained.
In an optional embodiment, after training the plurality of text data sets based on the pre-training model to obtain the text emotion classification model, the method further includes:
acquiring a link address of a hospital to be monitored;
crawling out a plurality of long texts according to the link addresses;
inputting the long texts into the text emotion classification model for emotion classification to obtain a plurality of emotion category labels, wherein the emotion category labels are positive emotion labels or negative emotion labels;
calculating the proportion of the negative emotion labels among the plurality of emotion category labels;
comparing the ratio with a preset ratio threshold;
and generating an early warning instruction in response to the ratio being greater than the preset ratio threshold.
In this optional embodiment, if public opinion monitoring needs to be performed on a certain hospital, the link address (URL) of the hospital can be acquired and a plurality of long texts crawled according to the URL; alternatively, information about the hospital's doctors (such as doctor name and practice number) and patients (such as patient name and medical insurance number) can be acquired, and a plurality of long texts crawled according to this information. The text emotion classification model is then called to perform emotion classification on each crawled long text to obtain an emotion category label.
When the proportion of negative emotion labels is large (greater than the preset ratio threshold), it indicates that a problem has arisen in the monitored hospital, for example a doctor-patient dispute, and an early warning instruction needs to be generated so that the hospital can be reviewed.
In this optional embodiment, the text emotion classification model is called to perform emotion classification on a plurality of long texts related to the hospital, so that directed monitoring of hospital public opinion can be realized, promoting the sound development of the hospital; this has high practical and economic value.
For example, after a patient submits a text evaluation of a certain hospital, the emotion category label (positive or negative) of the evaluation can be determined, thereby determining the user's emotional tendency toward the hospital and, in turn, the hospital's service quality, which helps the hospital improve its service level.
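The monitoring logic above (compute the share of negative labels and warn when it passes the threshold) can be sketched as follows; the label strings and threshold value are illustrative assumptions:

```python
def should_alert(labels, threshold=0.3):
    """Return True when the proportion of "negative" labels among all
    predicted emotion category labels exceeds the threshold."""
    if not labels:
        return False
    ratio = sum(1 for label in labels if label == "negative") / len(labels)
    return ratio > threshold
```

When `should_alert` returns True, the early warning instruction described above would be generated.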
In an optional embodiment, after training the plurality of text data sets based on the pre-training model to obtain the text emotion classification model, the method further includes:
acquiring a long text to be classified;
inputting the long text to be classified into the text emotion classification model for emotion classification to obtain an emotion classification label;
and marking the emotion category label on the long text to be classified.
Illustratively, the text emotion classification model is used for carrying out emotion classification on the long text to be classified, and emotion category labels output by the text emotion classification model are marked on the long text to be classified, so that the query and retrieval efficiency of medical texts can be improved.
In conclusion, because a long text contains rich semantic information, the method of the invention extracts an abstract of the article using a text summarization method, which extracts the important information in the text and removes a large amount of interference information from long medical news texts. Compared with the full text, the text abstract is shorter, which improves the efficiency of model training. An emotion dictionary method is used to calculate the emotion score of each text sentence in each text abstract, providing a basis for the secondary sorting and splicing of sentences; the secondarily sorted sentences are then arranged according to their order of appearance in the original text, preserving the temporal order of the original information. Finally, by exploiting the excellent semantic feature extraction of the BERT network, the method handles polysemy, grammar, and context dependence in Chinese well, and overcomes the word-count limit that the BERT model imposes on long medical news texts. This provides a feasible scheme for the emotion classification of long medical texts and has high practical application value in public opinion monitoring.
It is emphasized that, to further ensure the privacy and security of the text emotion classification model, the text emotion classification model can be stored in the nodes of the blockchain.
Fig. 2 is a structural diagram of a text emotion classification model training apparatus according to a second embodiment of the present invention.
In some embodiments, the text emotion classification model training apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the text emotion classification model training apparatus 20 can be stored in the memory of the computer device and executed by the at least one processor to perform the functions of text emotion classification model training (described in detail with reference to fig. 1). The text emotion classification model training apparatus 20 is used for training a text emotion classification model that accurately classifies the emotion of a long text without losing the position information, time-sequence information, and semantic information of the original long text.
In this embodiment, the text emotion classification model training apparatus 20 may be divided into a plurality of functional modules according to the functions executed by the apparatus. The functional module may include: a segmentation module 201, a generation module 202, a calculation module 203, a ranking module 204, a training module 205, and a classification module 206. The module referred to herein is a series of computer program segments capable of being executed by at least one processor and capable of performing a fixed function and is stored in memory. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The segmentation module 201 is configured to obtain a plurality of long texts, and segment each long text to obtain a plurality of text sentences.
The computer device crawls a plurality of long texts from each search engine before training the text emotion classification model, and divides each long text into a plurality of text sentences according to punctuation marks (such as commas, periods, exclamation marks, and question marks).
Wherein the long text may be articles, news, etc. in the medical field.
In an alternative embodiment, the segmentation module 201 obtaining a plurality of long texts comprises:
setting a plurality of search keywords;
and crawling a plurality of texts from a plurality of search engines according to the plurality of search keywords.
Since there is no long medical text database with emotion category labels, a web crawler needs to be customized for each search engine in order to crawl a plurality of long medical texts. To crawl them quickly and accurately, the computer device defines search keywords related to medical treatment in advance, and then uses the customized web crawlers to crawl long medical texts from the corresponding search engines.
Illustratively, the search keyword may be a doctor, a medicine, or the like.
The generating module 202 is configured to calculate a TextRank value of each text statement in each long text, and generate a text summary for each long text according to the TextRank value.
Wherein the text abstract is used for summarizing and summarizing the long text as concisely as possible.
Because a long text may contain much content irrelevant to emotion classification, and the large number of text sentences in a long text is unfavorable for training a text emotion classification model, a text abstract is generated for each long text so that the content most representative of the long text's emotion category is placed in the abstract; this effectively removes content irrelevant to emotion classification and reduces the number of text sentences. Reducing the number of text sentences improves the efficiency of training the text emotion classification model, and thereby the efficiency of text emotion classification.
In an alternative embodiment, the generating module 202 calculates a TextRank value of each text sentence in each long text, and generating the text summary for each long text according to the TextRank value includes:
sentence embedding is carried out on each text sentence based on a preset language model to obtain a sentence vector;
calculating the similarity between the statement vectors, and generating a similarity matrix according to the similarity;
generating a text graph structure according to the similarity matrix;
calculating the text graph structure by adopting a text ranking TextRank algorithm to obtain a TextRank value of each text statement;
sequencing the TextRank values and acquiring a plurality of target text sentences corresponding to a plurality of TextRank values sequenced at the front;
generating a text summary based on the plurality of target text sentences.
Wherein the text summary comprises: text sentences, weight ordering of the text sentences and position information of the text sentences in the long text. The location information may include: a position index.
The language model may be ELMo (Embeddings from Language Models), which generates sentence vectors of a preset dimension. The preset dimension may be set to 512.
The computer device obtains a sentence vector for each text sentence, computes the similarity between sentence vectors through matrix operations, and finally obtains an N×N matrix, where N is the number of text sentences. The similarity matrix is then converted into a text graph structure, with text sentences as nodes and similarities as edges, which is used to compute the TextRank value of each text sentence. The TextRank values are sorted from large to small, the text sentences corresponding to the top-ranked (for example, top 20) TextRank values are taken as target text sentences, and a text abstract is generated based on these target text sentences.
The similarity between text sentences can be computed in different ways according to the situation, such as the cosine of the angle between sentence vectors or the Euclidean distance. It can also be computed as the ratio of the number of words two text sentences have in common to the sum of the logarithms of the two sentences' lengths. The TextRank algorithm is a graph-based ranking algorithm for text; running it on the text graph structure yields a ranking of the text sentences.
In the optional embodiment, sentence embedding is performed on the text sentences through the preset language model, sentence vectors with preset dimensions can be obtained, the operation dimensions are controlled based on the sentence vectors with the preset dimensions, and the extraction efficiency of the text abstract is improved.
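A minimal sketch of the TextRank ranking described above, assuming the word-overlap similarity mentioned earlier (common words over the sum of log sentence lengths, with +1 inside the logarithm so single-word sentences avoid log 0) in place of ELMo sentence vectors:

```python
import math

def overlap_sim(a, b):
    """Word-overlap similarity: shared words over the sum of log lengths
    (+1 inside the log to avoid a zero denominator for one-word sentences)."""
    wa, wb = set(a.split()), set(b.split())
    denom = math.log(len(wa) + 1) + math.log(len(wb) + 1)
    return len(wa & wb) / denom if denom else 0.0

def textrank(sentences, d=0.85, iters=50):
    """Power-iteration PageRank over the sentence similarity graph."""
    n = len(sentences)
    sim = [[overlap_sim(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Standard TextRank update: damping plus weighted votes from neighbors.
        scores = [(1 - d) / n + d * sum(
            sim[j][i] / (sum(sim[j]) or 1.0) * scores[j]
            for j in range(n) if j != i)
            for i in range(n)]
    return scores
```

Sentences that share more words with the rest of the text receive higher scores, and the top-ranked ones become the target sentences of the abstract.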
The calculating module 203 is configured to calculate an emotion score of each text sentence in each text abstract.
Since the text abstract is generated without considering the subsequent text emotion classification task, the emotion score of each text sentence in each text abstract needs to be calculated using an emotion dictionary, and the text sentences re-sorted according to their emotion scores.
In an alternative embodiment, the calculating module 203 calculates the emotion score of each text sentence in each text abstract comprises:
performing word segmentation on each text sentence to obtain a plurality of word segments;
identifying a first emotion part of speech of each participle, wherein the first emotion part of speech comprises a positive part of speech, a negative part of speech, and a negation part of speech;
when the first emotion part of speech of the word segmentation is identified to be the positive part of speech, identifying second emotion part of speech of the word segmentation before and after the word segmentation, and generating a first emotion weight according to the second emotion part of speech of the word segmentation before and after the word segmentation;
when the first emotion part of speech of the word segmentation is identified to be a negative part of speech, identifying a third emotion part of speech of a word segmentation before the word segmentation, and generating a second emotion weight according to the third emotion part of speech of the word segmentation before the word segmentation;
when the first emotion part of speech of the participle is identified as a negation part of speech, determining a preset emotion weight as a third emotion weight;
and calculating the emotion score of the text sentence according to the first emotion weight, the second emotion weight and the third emotion weight corresponding to all the participles in the text sentence.
Wherein the second emotion part of speech may include: degree adverbs, negation words, and others. A degree adverb is an adverb that qualifies or modifies the degree of an adjective or adverb and precedes the adjective or adverb it modifies. Illustratively, degree adverbs include: quite, a bit, obviously, and the like.
In this optional embodiment, the computer device may perform word segmentation on each text sentence by using a word segmentation tool to obtain a plurality of segmented words, generate a vector phrase for each text sentence according to the plurality of segmented words, and then perform word-by-word part-of-speech recognition on each vector phrase.
The computer device may pre-store a positive emotion word list, a negative emotion word list, and a negation word list, in which phrases with positive, negative, and negation parts of speech are stored, respectively. The phrases in the three word lists are disjoint, i.e. any given phrase is stored in only one of the positive emotion word list, the negative emotion word list, and the negation word list. The computer device determines the emotion part of speech of each participle of each text sentence by word-by-word matching. When a participle is successfully matched with any phrase in the positive emotion word list, its emotion part of speech is identified as a positive part of speech; when a participle is successfully matched with any phrase in the negative emotion word list, its emotion part of speech is identified as a negative part of speech; and when a participle is successfully matched with any phrase in the negation word list, its emotion part of speech is identified as a negation part of speech.
The computer device stores an emotion weight dictionary recording the correspondence between emotion parts of speech and weights. Specifically, the emotion weight of each positive-emotion participle is set to 1 and that of each negative-emotion participle to -1, and emotion values are assumed to satisfy the principle of linear superposition: each matching participle contained in the word vector of a segmented text sentence contributes its weight, where negation words and degree adverbs follow special judgment rules, namely a negation word flips the sign of the emotion weight and a degree adverb doubles it. Finally, the emotion of the text sentence is judged by the sign of the total emotion weight.
For example, suppose the computer device segments a certain text sentence into participle A, participle B, participle C, and participle D. If the first emotion part of speech of participle B is identified as a positive part of speech, the second emotion parts of speech of the preceding participle A and the following participle C are further identified. If the second emotion part of speech of the preceding participle A is a degree adverb, a first emotion weight of +2 is generated; if the second emotion part of speech of the preceding participle A is a negation word or a negative word, a first emotion weight of -1 is generated; and if the second emotion part of speech of the following participle C is a negation word, a first emotion weight of -1 is generated.
For another example, suppose the first emotion part of speech of participle B is identified as a negative part of speech; the second emotion part of speech of the preceding participle A is then further identified. If the second emotion part of speech of the preceding participle A is a degree adverb, a second emotion weight of -2 is generated; if it is a negation word, a second emotion weight of +1 is generated; and if it is another part of speech, a second emotion weight of -1 is generated.
For another example, if the first emotion part of speech of participle B is identified as a negation part of speech, the preset emotion weight (e.g., -0.5) is determined as the third emotion weight.
Finally, the computer device adds the emotion weights corresponding to all participles in the text sentence to obtain the emotion score of the text sentence.
In this optional embodiment, degree adverbs are introduced into the identification of the second emotion part of speech, so that expressions such as "good" versus "very good" receive special consideration and are given different emotion weights according to the degree adverb, making the calculation of the emotion score of a text sentence more accurate and reasonable.
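A simplified sketch of the dictionary-based scoring rules above: positive and negative participles contribute +1 and -1, a preceding degree adverb doubles the weight, and a preceding negation word flips the sign. The word lists are toy English examples; a real system would load full Chinese sentiment lexicons and also apply the following-word and negation-weight rules described earlier.

```python
# Toy lexicons (illustrative assumptions, not real sentiment word lists).
POSITIVE = {"good", "excellent"}
NEGATIVE = {"bad", "slow"}
NEGATION = {"not", "never"}
DEGREE = {"very", "quite"}

def emotion_score(tokens):
    """Sum per-participle weights: +1 for positive words, -1 for negative
    words; a preceding degree adverb doubles the weight, and a preceding
    negation word flips its sign."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE or tok in NEGATIVE:
            weight = 1 if tok in POSITIVE else -1
            prev = tokens[i - 1] if i > 0 else None
            if prev in DEGREE:
                weight *= 2
            elif prev in NEGATION:
                weight *= -1
            score += weight
    return score
```

Under these rules "very good" scores +2, "not good" scores -1, and "very slow" scores -2, matching the weighting examples above.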
The sorting module 204 is configured to sort the plurality of text sentences in each text abstract according to the emotion scores, and generate a text data set according to the plurality of text sentences after sorting.
For each text abstract, the computer device sorts the emotion scores from large to small, thereby ordering the corresponding text sentences, and takes the sorted text sentences as a text data set.
In this embodiment, sorting the text sentences in the text abstract effectively preserves the order information of the text abstract and provides a data set for the subsequent training of the text emotion classification model based on the pre-training model.
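One way to sketch the data-set construction: rank the abstract's sentences by emotion score, keep the top-scoring ones, and (as the concluding summary notes) restore their original order of appearance to preserve time-sequence information. The `top_k` cut-off is an illustrative assumption:

```python
def build_dataset(sentences, scores, positions, top_k=3):
    """Rank sentences by emotion score, keep the top_k, then restore the
    original order of appearance to preserve time-sequence information."""
    ranked = sorted(zip(scores, positions, sentences), reverse=True)[:top_k]
    ranked.sort(key=lambda t: t[1])  # back to original appearance order
    return [s for _, _, s in ranked]
```

The selected sentences can then be spliced into training samples for the pre-training model.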
The training module 205 is configured to train a plurality of text data sets based on a pre-training model to obtain a text emotion classification model.
The computer device acquires the pre-trained BERT model and fine-tunes its parameters based on the plurality of text data sets to obtain the text emotion classification model.
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained model that considers the words both before and after a given word when processing it, thereby capturing context semantics. BERT-base may be selected from the various pre-trained BERT models provided by Google as the pre-training model for text matching. Since the BERT model expects its input data in a particular format, sentences are marked with special tokens: [CLS] marks the beginning of a sentence and [SEP] marks the separation or end of sentences. The input further comprises tokens conforming to the fixed vocabulary used in BERT, token IDs provided by BERT's tokenizer, mask IDs indicating which elements in the sequence are real tokens and which are padding, segment IDs distinguishing different sentences, and position embeddings recording the position of each token in the sequence.
In an alternative embodiment, the training module 205 training a plurality of text data sets based on a pre-training model to obtain a text emotion classification model includes:
calculating the character length of each text statement in each text data set;
starting from the first text sentence, performing character accumulation over the following text sentences, stopping the accumulation when the accumulated character length exceeds a preset character length, and splicing the accumulated text sentences to obtain text data;
generating emotion category labels for the text data according to the emotion scores corresponding to the text data;
taking the positive emotion label and the corresponding text data as positive samples, and taking the negative emotion label and the corresponding text data as negative samples;
and training a pre-training model based on the positive sample and the negative sample to obtain a text emotion classification model.
The preset character length may be 510, i.e. BERT's maximum sequence length of 512 minus the two special tokens [CLS] and [SEP].
The emotion category labels are divided into positive emotion labels and negative emotion labels, which makes this a binary classification problem; a logistic regression layer is added on top of the BERT model to perform the classification and obtain the classification result.
The classification module 206 is configured to classify the long text.
In an alternative embodiment, the classification module 206 classifies the long text including:
acquiring a link address of a hospital to be monitored;
crawling out a plurality of long texts according to the link addresses; inputting the long texts into the text emotion classification model for emotion classification to obtain a plurality of emotion category labels, wherein the emotion category labels are positive emotion labels or negative emotion labels;
calculating the proportion of the negative emotion labels among the plurality of emotion category labels;
comparing the ratio with a preset ratio threshold;
and generating an early warning instruction in response to the ratio being greater than the preset ratio threshold.
In this optional embodiment, if public opinion monitoring needs to be performed on a certain hospital, the link address (URL) of the hospital can be acquired and a plurality of long texts crawled according to the URL; alternatively, information about the hospital's doctors (such as doctor name and practice number) and patients (such as patient name and medical insurance number) can be acquired, and a plurality of long texts crawled according to this information. The text emotion classification model is then called to perform emotion classification on each crawled long text to obtain an emotion category label.
When the proportion of negative emotion labels is large (greater than the preset ratio threshold), it indicates that a problem has arisen in the monitored hospital, for example a doctor-patient dispute, and an early warning instruction needs to be generated so that the hospital can be reviewed.
In this optional embodiment, the text emotion classification model is called to perform emotion classification on a plurality of long texts related to the hospital, so that directed monitoring of hospital public opinion can be realized, promoting the sound development of the hospital; this has high practical and economic value.
For example, after a patient submits a text evaluation of a certain hospital, the emotion category label (positive or negative) of the evaluation can be determined, thereby determining the user's emotional tendency toward the hospital and, in turn, the hospital's service quality, which helps the hospital improve its service level.
In an optional embodiment, the classifying module 206 further classifies the long text by:
acquiring a long text to be classified;
inputting the long text to be classified into the text emotion classification model for emotion classification to obtain an emotion classification label;
and marking the emotion category label on the long text to be classified.
Illustratively, the text emotion classification model is used for carrying out emotion classification on the long text to be classified, and emotion category labels output by the text emotion classification model are marked on the long text to be classified, so that the query and retrieval efficiency of medical texts can be improved.
In conclusion, because a long text contains rich semantic information, the method of the invention extracts an abstract of the article using a text summarization method, which extracts the important information in the text and removes a large amount of interference information from long medical news texts. Compared with the full text, the text abstract is shorter, which improves the efficiency of model training. An emotion dictionary method is used to calculate the emotion score of each text sentence in each text abstract, providing a basis for the secondary sorting and splicing of sentences; the secondarily sorted sentences are then arranged according to their order of appearance in the original text, preserving the temporal order of the original information. Finally, by exploiting the excellent semantic feature extraction of the BERT network, the method handles polysemy, grammar, and context dependence in Chinese well, and overcomes the word-count limit that the BERT model imposes on long medical news texts. This provides a feasible scheme for the emotion classification of long medical texts and has high practical application value in public opinion monitoring.
It is emphasized that, to further ensure the privacy and security of the text emotion classification model, the text emotion classification model can be stored in the nodes of the blockchain.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the computer device shown in fig. 3 does not constitute a limitation of the embodiments of the present invention, and may be a bus-type configuration or a star-type configuration, and that the computer device 3 may include more or less hardware or software than those shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a computer device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, etc.
It should be noted that the computer device 3 is only an example, and other electronic products that are currently available or may come into existence in the future, such as electronic products that can be adapted to the present invention, should also be included in the scope of the present invention, and are included herein by reference.
In some embodiments, the memory 31 stores a computer program which, when executed by the at least one processor 32, implements all or part of the steps of the text emotion classification model training method. The memory 31 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium capable of carrying or storing data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing the information of a batch of network transactions, used to verify the validity (anti-tampering) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects various components of the entire computer device 3 by using various interfaces and lines, and executes various functions and processes data of the computer device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the text emotion classification model training method described in the embodiments of the present invention; or all or part of functions of the text emotion classification model training device are realized. The at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The computer device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a logical functional division, and other divisions may be used in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. denote names and do not imply any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A text emotion classification model training method is characterized by comprising the following steps:
acquiring a plurality of long texts, and segmenting each long text to obtain a plurality of text sentences;
calculating the TextRank value of each text statement in each long text, and generating a text abstract for each long text according to the TextRank value;
calculating the emotion score of each text statement in each text abstract;
sequencing the plurality of text sentences in each text abstract according to the emotion scores, and generating a text data set according to the sequenced plurality of text sentences;
and training a plurality of text data sets based on the pre-training model to obtain a text emotion classification model.
2. The method for training the text emotion classification model as claimed in claim 1, wherein the calculating the TextRank value of each text sentence in each long text, and generating the text abstract for each long text according to the TextRank value comprises:
sentence embedding is carried out on each text sentence based on a preset language model to obtain a sentence vector;
calculating the similarity between the statement vectors, and generating a similarity matrix according to the similarity;
generating a text graph structure according to the similarity matrix;
calculating the text graph structure by adopting a text ranking TextRank algorithm to obtain a TextRank value of each text statement;
sequencing the TextRank values and acquiring a plurality of target text sentences corresponding to a plurality of TextRank values sequenced at the front;
generating a text summary based on the plurality of target text sentences.
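The summarization steps recited in claim 2 can be illustrated with a short sketch. The embedding function, the use of cosine similarity, the damping factor, the iteration count, and the top-k cutoff below are illustrative assumptions, not values fixed by the disclosure:

```python
import numpy as np

def build_summary(sentences, embed, top_k=3, d=0.85, iters=50):
    """Rank sentences with TextRank over a cosine-similarity graph
    and keep the top_k highest-ranked ones as the summary (sketch)."""
    # Sentence embedding: embed() is assumed to map a sentence to a vector.
    vecs = np.array([embed(s) for s in sentences], dtype=float)
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                      # similarity matrix
    np.fill_diagonal(sim, 0.0)               # no self-loops in the text graph
    # Column-normalize so each node distributes its score to its neighbours.
    col = sim.sum(axis=0, keepdims=True)
    trans = np.divide(sim, col, out=np.zeros_like(sim), where=col > 0)
    # Iterate the TextRank update: r = (1 - d) + d * T r
    rank = np.ones(len(sentences))
    for _ in range(iters):
        rank = (1 - d) + d * trans @ rank
    # Keep the highest-ranked sentences, restored to their original order.
    top = sorted(np.argsort(rank)[::-1][:top_k])
    return [sentences[i] for i in top]
```

In practice the sentence vectors would come from the preset language model mentioned in the claim; any embedding that yields comparable vectors fits this sketch.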
3. The method for training the text emotion classification model as claimed in claim 1, wherein said calculating the emotion score of each text sentence in each text abstract comprises:
performing word segmentation on each text sentence to obtain a plurality of participles;
identifying a first emotion part of speech of each participle, wherein the first emotion part of speech comprises a positive part of speech, a negative part of speech, and a negation part of speech;
when the first emotion part of speech of a participle is identified as a positive part of speech, identifying the second emotion parts of speech of the participles before and after that participle, and generating a first emotion weight according to those second emotion parts of speech;
when the first emotion part of speech of a participle is identified as a negative part of speech, identifying the third emotion part of speech of the participle before that participle, and generating a second emotion weight according to that third emotion part of speech;
when the first emotion part of speech of a participle is identified as a negation part of speech, determining a preset emotion weight as a third emotion weight;
and calculating the emotion score of the text sentence according to the first emotion weights, the second emotion weights, and the third emotion weights corresponding to all the participles in the text sentence.
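One possible reading of the scoring rules in claim 3 is sketched below. The lexicons and the numeric weights are placeholders for illustration; the actual dictionaries and weight values are not specified in the claim:

```python
# Illustrative scoring pass over one tokenized sentence. The three sets and
# all numeric weights below are assumed placeholders, not disclosed values.
POSITIVE = {"good", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "sad"}
NEGATION = {"not", "never", "no"}

def sentence_score(tokens):
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            # First weight: boosted if a neighbouring token is also positive.
            prev_pos = i > 0 and tokens[i - 1] in POSITIVE
            next_pos = i + 1 < len(tokens) and tokens[i + 1] in POSITIVE
            score += 2.0 if (prev_pos or next_pos) else 1.0
        elif tok in NEGATIVE:
            # Second weight: adjusted when the preceding token is also negative.
            prev_neg = i > 0 and tokens[i - 1] in NEGATIVE
            score -= 1.5 if prev_neg else 1.0
        elif tok in NEGATION:
            # Third weight: a fixed preset value for negation words.
            score -= 0.5
    return score
```

The sentence-level score is then the sum over all participles, matching the final step of the claim.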
4. The method for training the text emotion classification model according to claim 1, wherein the training of the plurality of text data sets based on the pre-trained model to obtain the text emotion classification model comprises:
calculating the character length of each text statement in each text data set;
starting from the first text sentence, accumulating the character lengths of the subsequent text sentences, stopping the accumulation when the accumulated character length exceeds a preset character length, and splicing the accumulated text sentences to obtain text data;
generating emotion category labels for the text data according to the emotion scores corresponding to the text data;
taking the positive emotion label and the corresponding text data as positive samples, and taking the negative emotion label and the corresponding text data as negative samples;
and training a pre-training model based on the positive sample and the negative sample to obtain a text emotion classification model.
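The character-accumulation and labeling steps of claim 4 can be sketched as follows. The 512-character budget, the choice to exclude the sentence that would overflow it, and the sign-based labeling rule are assumptions for illustration:

```python
def splice_sentences(sentences, max_chars=512):
    """Accumulate sentences from the first one until adding the next sentence
    would exceed max_chars, then join the accumulated sentences into one
    training text (budget value assumed)."""
    picked, total = [], 0
    for sent in sentences:
        if total + len(sent) > max_chars:
            break
        picked.append(sent)
        total += len(sent)
    return "".join(picked)

def label_text(score):
    # Assumed rule: non-negative emotion score -> positive sample, else negative.
    return "positive" if score >= 0 else "negative"
```

The labeled texts would then serve as the positive and negative samples used to fine-tune the pre-training model.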
5. The method for training the text emotion classification model according to any one of claims 1 to 4, wherein the obtaining of the plurality of long texts comprises:
setting a plurality of search keywords;
and crawling a plurality of texts from a plurality of search engines according to the plurality of search keywords.
6. The method for training the text emotion classification model as recited in claim 5, wherein after training the plurality of text data sets based on the pre-trained model to obtain the text emotion classification model, the method further comprises:
acquiring a link address of a hospital to be monitored;
crawling out a plurality of long texts according to the link addresses;
inputting the long texts into the text emotion classification model for emotion classification to obtain a plurality of emotion category labels, wherein the emotion category labels are positive emotion labels or negative emotion labels;
calculating the proportion of negative emotion labels in the plurality of emotion category labels;
comparing the ratio with a preset ratio threshold;
and generating an early warning instruction in response to the ratio being greater than the preset ratio threshold.
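The early-warning check of claim 6 reduces to a ratio comparison; the threshold value below is an assumed placeholder:

```python
def should_warn(labels, threshold=0.3):
    """Raise an early-warning flag when the share of negative labels among
    the classifier's outputs exceeds the preset threshold (value assumed)."""
    if not labels:
        return False
    ratio = sum(1 for label in labels if label == "negative") / len(labels)
    return ratio > threshold
```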
7. The method for training the text emotion classification model as recited in claim 5, wherein after training the plurality of text data sets based on the pre-trained model to obtain the text emotion classification model, the method further comprises:
acquiring a long text to be classified;
inputting the long text to be classified into the text emotion classification model for emotion classification to obtain an emotion classification label;
and marking the emotion category label on the long text to be classified.
8. An apparatus for training a text emotion classification model, the apparatus comprising:
the segmentation module is used for acquiring a plurality of long texts and segmenting each long text to obtain a plurality of text sentences;
the generating module is used for calculating the TextRank value of each text statement in each long text and generating a text abstract for each long text according to the TextRank value;
the calculation module is used for calculating the emotion score of each text statement in each text abstract;
the sequencing module is used for sequencing the text sentences in each text abstract according to the emotion scores and generating a text data set according to the sequenced text sentences;
and the training module is used for training a plurality of text data sets based on the pre-training model to obtain a text emotion classification model.
9. A computer device, characterized in that the computer device comprises:
a memory for storing a computer program;
a processor for implementing the method of training the text emotion classification model according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the method for training a text emotion classification model according to any one of claims 1 to 7.
CN202010917934.7A 2020-09-03 2020-09-03 Text emotion classification model training method and device, computer equipment and medium Pending CN111984793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010917934.7A CN111984793A (en) 2020-09-03 2020-09-03 Text emotion classification model training method and device, computer equipment and medium


Publications (1)

Publication Number Publication Date
CN111984793A true CN111984793A (en) 2020-11-24

Family

ID=73447442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010917934.7A Pending CN111984793A (en) 2020-09-03 2020-09-03 Text emotion classification model training method and device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN111984793A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977399A (en) * 2019-03-05 2019-07-05 国网青海省电力公司 A kind of data analysing method and device based on NLP technology
CN110399484A (en) * 2019-06-25 2019-11-01 平安科技(深圳)有限公司 Sentiment analysis method, apparatus, computer equipment and the storage medium of long text
CN111428024A (en) * 2020-03-18 2020-07-17 北京明略软件系统有限公司 Method and device for extracting text abstract, computer storage medium and terminal
CN111475640A (en) * 2020-04-03 2020-07-31 支付宝(杭州)信息技术有限公司 Text emotion recognition method and device based on emotion abstract


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463963A (en) * 2020-11-30 2021-03-09 深圳前海微众银行股份有限公司 Method for identifying target public sentiment, model training method and device
CN112836049A (en) * 2021-01-28 2021-05-25 网易(杭州)网络有限公司 Text classification method, device, medium and computing equipment
CN112906384A (en) * 2021-03-10 2021-06-04 平安科技(深圳)有限公司 Data processing method, device and equipment based on BERT model and readable storage medium
CN112906384B (en) * 2021-03-10 2024-02-02 平安科技(深圳)有限公司 BERT model-based data processing method, BERT model-based data processing device, BERT model-based data processing equipment and readable storage medium
CN113326374B (en) * 2021-05-25 2022-12-20 成都信息工程大学 Short text emotion classification method and system based on feature enhancement
CN113326374A (en) * 2021-05-25 2021-08-31 成都信息工程大学 Short text emotion classification method and system based on feature enhancement
CN113407722A (en) * 2021-07-09 2021-09-17 平安国际智慧城市科技股份有限公司 Text classification method and device based on text abstract, electronic equipment and medium
WO2023284327A1 (en) * 2021-07-12 2023-01-19 北京百度网讯科技有限公司 Method for training text quality assessment model and method for determining text quality
CN113420138A (en) * 2021-07-15 2021-09-21 上海明略人工智能(集团)有限公司 Method and device for text classification, electronic equipment and storage medium
CN113420138B (en) * 2021-07-15 2024-02-13 上海明略人工智能(集团)有限公司 Method and device for text classification, electronic equipment and storage medium
CN114065742A (en) * 2021-11-19 2022-02-18 马上消费金融股份有限公司 Text detection method and device
CN114065742B (en) * 2021-11-19 2023-08-25 马上消费金融股份有限公司 Text detection method and device
CN113971407A (en) * 2021-12-23 2022-01-25 深圳佑驾创新科技有限公司 Semantic feature extraction method and computer-readable storage medium
WO2023173537A1 (en) * 2022-03-17 2023-09-21 平安科技(深圳)有限公司 Text sentiment analysis method and apparatus, device and storage medium

Similar Documents

Publication Publication Date Title
CN111984793A (en) Text emotion classification model training method and device, computer equipment and medium
CN104050256B (en) Initiative study-based questioning and answering method and questioning and answering system adopting initiative study-based questioning and answering method
Cao et al. Web-based traffic sentiment analysis: Methods and applications
CN108959566B (en) A kind of medical text based on Stacking integrated study goes privacy methods and system
CN111950273A (en) Network public opinion emergency automatic identification method based on emotion information extraction analysis
Nejadgholi et al. A Semi-Supervised Training Method for Semantic Search of Legal Facts in Canadian Immigration Cases.
CN112149409B (en) Medical word cloud generation method and device, computer equipment and storage medium
CN111639486A (en) Paragraph searching method and device, electronic equipment and storage medium
CN110909122A (en) Information processing method and related equipment
CN112860848B (en) Information retrieval method, device, equipment and medium
Altheneyan et al. Big data ML-based fake news detection using distributed learning
CN113919336A (en) Article generation method and device based on deep learning and related equipment
CN113722483A (en) Topic classification method, device, equipment and storage medium
CN113870974A (en) Risk prediction method and device based on artificial intelligence, electronic equipment and medium
CN116956896A (en) Text analysis method, system, electronic equipment and medium based on artificial intelligence
CN110929518A (en) Text sequence labeling algorithm using overlapping splitting rule
CN114416939A (en) Intelligent question and answer method, device, equipment and storage medium
CN114020892A (en) Answer selection method and device based on artificial intelligence, electronic equipment and medium
CN113111159A (en) Question and answer record generation method and device, electronic equipment and storage medium
Kužina et al. Methods for Automatic Sensitive Data Detection in Large Datasets: a Review
CN112579781A (en) Text classification method and device, electronic equipment and medium
CN114492446B (en) Legal document processing method and device, electronic equipment and storage medium
CN113065355B (en) Professional encyclopedia named entity identification method, system and electronic equipment
CN115221323A (en) Cold start processing method, device, equipment and medium based on intention recognition model
CN114742061A (en) Text processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210202

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen saiante Technology Service Co.,Ltd.

Address before: 1-34 / F, Qianhai free trade building, 3048 Xinghai Avenue, Mawan, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong 518000

Applicant before: Ping An International Smart City Technology Co.,Ltd.