WO2022267460A1 - Event-based sentiment analysis method and apparatus, and computer device and storage medium - Google Patents

Info

Publication number
WO2022267460A1
WO2022267460A1 PCT/CN2022/072045 CN2022072045W WO2022267460A1 WO 2022267460 A1 WO2022267460 A1 WO 2022267460A1 CN 2022072045 W CN2022072045 W CN 2022072045W WO 2022267460 A1 WO2022267460 A1 WO 2022267460A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
trained
preset
short
original
Prior art date
Application number
PCT/CN2022/072045
Other languages
French (fr)
Chinese (zh)
Inventor
周骏红
彭琛
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2022267460A1 publication Critical patent/WO2022267460A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of data processing, and in particular to an event-based sentiment analysis method, device, computer equipment and storage medium.
  • the original unsupervised learning method extracts the emotional trigger words in a sentence and combines syntax and grammar rules to judge the sentiment score.
  • the accuracy and generalization ability of this method are limited.
  • with the emergence of word embedding, the information contained in a text can be represented by a word vector matrix, which makes end-to-end supervised learning possible, for example by training a neural network model with the word vector matrix of the text as the input and the sentiment score as the output.
  • the effect of this method was initially limited by the ability of the word vector matrix to capture text semantics; with the emergence in recent years of models with a strong ability to extract semantic information, such as ELMo, GPT and BERT, supervised learning has become the mainstream of sentiment analysis.
  • the initial sentiment analysis task refers to judging the sentiment score of a piece of text, but for texts involving multiple subjects and multiple events, the sentiment toward different events of different subjects may differ.
  • therefore, the fine-grained sentiment analysis task (ABSA) was proposed. One way to realize this task is to input a piece of text and output the subject, event and corresponding sentiment score at the same time, but the accuracy of this kind of analysis is not high; another way is to define an event system in advance, input the text, and output the sentiment score of the text's subject on the different events in the event system.
  • Embodiments of the present application provide an event-based sentiment analysis method, device, computer equipment, and storage medium, which can effectively extract arguments in text to accurately determine sentiment tags and improve user experience.
  • an event-based sentiment analysis method which includes:
  • the determined emotional trigger words, subjects and events of each short text to be analyzed are input into a preset emotion determination model to obtain an emotional label corresponding to the subject of each short text to be analyzed.
  • an event-based sentiment analysis device which includes:
  • a request parsing unit, configured to parse the sentiment analysis request to obtain the initial text if the sentiment analysis request is received;
  • a preprocessing unit, configured to preprocess the acquired initial text to obtain a plurality of short texts to be analyzed including event keywords, wherein different event keywords are associated with corresponding events;
  • an argument extraction unit, configured to input all the short texts to be analyzed into the preset argument extraction model to determine the emotional trigger words, subjects and events of each short text to be analyzed, wherein different events are associated with different emotional trigger words;
  • an emotion determination unit, configured to input the determined emotional trigger words, subjects and events of each short text to be analyzed into a preset emotion determination model, so as to obtain an emotion label corresponding to the subject of each short text to be analyzed.
  • the embodiment of the present application also provides a computer device, which includes a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
  • the determined emotional trigger words, subjects and events of each short text to be analyzed are input into a preset emotion determination model to obtain an emotional label corresponding to the subject of each short text to be analyzed.
  • the embodiment of the present application also provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program can implement the following steps when executed by a processor:
  • the determined emotional trigger words, subjects and events of each short text to be analyzed are input into a preset emotion determination model to obtain an emotional label corresponding to the subject of each short text to be analyzed.
  • the embodiments of the present application provide an event-based sentiment analysis method, device, computer equipment, and storage medium.
  • the embodiments of the present application determine the emotional trigger words, subjects and events through the argument extraction model, which can improve the accuracy of determining the emotion labels of the events of different subjects under the corresponding emotional trigger words, and thereby improve the user experience.
  • the method of this embodiment can also be applied to scenarios such as smart government affairs, thereby promoting the effect of smart city construction.
  • FIG. 1 is a schematic flow diagram of an event-based sentiment analysis method provided in an embodiment of the present application
  • Fig. 1a is a schematic diagram of an application scenario of an event-based sentiment analysis method provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of an event-based sentiment analysis method provided by another embodiment of the present application.
  • FIG. 3 is a schematic sub-flow diagram of an event-based sentiment analysis method provided by another embodiment of the present application.
  • FIG. 4 is a schematic sub-flow diagram of an event-based sentiment analysis method provided by another embodiment of the present application.
  • FIG. 5 is a schematic sub-flow diagram of an event-based sentiment analysis method provided by another embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an event-based sentiment analysis method provided by another embodiment of the present application.
  • FIG. 7 is a schematic block diagram of an event-based sentiment analysis device provided in an embodiment of the present application.
  • Fig. 8 is a schematic block diagram of an event-based sentiment analysis device provided by another embodiment of the present application.
  • Fig. 9 is a schematic block diagram of a data acquisition unit of an event-based sentiment analysis device provided by another embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a text segmentation unit of an event-based sentiment analysis device provided by another embodiment of the present application.
  • Fig. 11 is a schematic block diagram of a first training unit of an event-based sentiment analysis device provided by another embodiment of the present application.
  • Fig. 12 is a schematic block diagram of an event-based sentiment analysis device provided by another embodiment of the present application.
  • FIG. 13 is a schematic diagram of the structure and composition of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of an event-based sentiment analysis method provided by an embodiment of the present application
  • FIG. 1a is a schematic diagram of a scenario of an event-based sentiment analysis method in an embodiment of the present application.
  • the event-based sentiment analysis method is applied in the management server 20 .
  • the management server 20 preprocesses the initial text according to the event-based sentiment analysis method to obtain a plurality of short texts to be analyzed that include event keywords, inputs the short texts to be analyzed into the argument extraction model 10 to determine the emotional trigger words, subject and event of each short text to be analyzed, and then inputs the determined emotional trigger words, subject and event into the emotion determination model 30 to obtain the emotion label corresponding to the text to be analyzed. The method of this embodiment greatly improves the accuracy of determining emotion labels for text. The following describes the steps of the event-based sentiment analysis method in detail from the perspective of the management server 20.
  • the steps of the event-based sentiment analysis method may specifically include steps S101-S104.
  • Step S101 if a sentiment analysis request is received, parse the sentiment analysis request to obtain initial text.
  • the management server may parse the sentiment analysis request to obtain the initial text.
  • the initial text may be pre-stored in the database, or crawled from an external server by a web crawler.
  • the initial text can be the public opinion information of an enterprise, and the public opinion information can include related subjects, events, and emotional trigger words. By analyzing the different subjects, events, and emotional trigger words, the different emotions that the same emotional trigger word corresponds to can be accurately distinguished.
  • the management server may define the events.
  • the public opinion events of an enterprise are generally closely related to its business operation logic, and thus form a complete event system. Based on the similarity and correlation of events in corporate public opinion, an event system related to public opinion and emotion can be designed.
  • for example, a three-level event system can cover eight aspects, such as finance, personnel, operation, capital events, compliance, and credit. Each aspect contains a series of specific events that form second-level labels. After the second-level labels are further subdivided, a total of 110 specific events can be formed.
  • for the emotion label system, emotion labels such as major negative, general negative, neutral, general positive, and major positive can be included.
  • for each event, the relevant emotional trigger words can be summarized, and the mapping relationship between emotional trigger words in different directions and the emotion label system can be designed. For example, for the event "individual stock market performance", the emotional trigger word set {"rising", "skyrocketing", "falling", "slumping", ...} can be summarized, and the related emotion labels can be mapped as {"rising": general positive, "slumping": major negative, ...}.
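  • the mapping between events, emotional trigger words and emotion labels described above can be pictured as a nested dictionary. The following is a minimal sketch, not the implementation of the application: the names `TRIGGER_LABELS` and `lookup_label` are hypothetical, and only the two trigger-to-label pairs stated in the text are filled in.

```python
from typing import Optional

# Hypothetical structure: event -> emotional trigger word -> emotion label.
# Only the two pairs given in the description are included here.
TRIGGER_LABELS = {
    "individual stock market performance": {
        "rising": "general positive",
        "slumping": "major negative",
    },
}


def lookup_label(event: str, trigger: str) -> Optional[str]:
    """Return the label mapped to a trigger word for an event, or None if unmapped."""
    return TRIGGER_LABELS.get(event, {}).get(trigger)


print(lookup_label("individual stock market performance", "slumping"))
```

    Keying the mapping by event first keeps each trigger word scoped to its own event, so the same word can carry a different label under another event.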
  • Step S102 preprocessing the acquired initial text to obtain a plurality of short texts to be analyzed including event keywords, wherein different event keywords are associated with corresponding events.
  • the management server can also preprocess the acquired initial text. The initial text obtained by the management server can be a news release related to corporate public opinion that covers multiple events; such a release can be very long and often contains a large number of sentences unrelated to the target event. The initial text therefore needs to be preprocessed, for example by dividing it into several short texts and screening them, so that short texts carrying only redundant information unrelated to any event are excluded and only the short texts that include event keywords are determined to be short texts to be analyzed. That is, every determined short text to be analyzed includes event keywords.
  • different events can be associated with different event keywords.
  • the event keywords of each event can be summarized according to the text information in current network news.
  • an event can be associated with one event keyword or with multiple event keywords.
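  • the association between events and event keywords, and the screening of short texts by keyword, can be sketched as follows. This is an illustrative sketch only: the event names, the keywords, and the helper names `events_in` and `select_for_analysis` are assumptions, and plain substring matching stands in for whatever matching rule the application actually uses.

```python
from typing import Dict, List, Set

# Hypothetical event system: each event lists one or more event keywords.
EVENT_KEYWORDS: Dict[str, List[str]] = {
    "stock price": ["stock price", "share price"],
    "annual loss": ["annual loss"],
}


def events_in(short_text: str) -> Set[str]:
    """Return the events whose keywords occur in the short text."""
    return {
        event
        for event, keywords in EVENT_KEYWORDS.items()
        if any(keyword in short_text for keyword in keywords)
    }


def select_for_analysis(short_texts: List[str]) -> List[str]:
    """Keep only the short texts that mention at least one event keyword."""
    return [text for text in short_texts if events_in(text)]


candidates = ["the stock price has risen sharply", "the weather was fine"]
print(select_for_analysis(candidates))
```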
  • Step S103 input all the short texts to be analyzed into the preset argument extraction model to determine the emotional trigger words, subjects and events of each short text to be analyzed, wherein different events are associated with different emotional trigger words.
  • the management server can input all the short texts to be analyzed into the preset argument extraction model, and the preset argument extraction model can accurately identify the emotional trigger words, subjects and events in the text to be analyzed. That is, through the analysis of the argument extraction model, the relevant key information in the short text to be analyzed can be determined.
  • through the argument extraction model, elements such as the time and place included in the short text to be analyzed can also be distinguished and determined, so as to determine the emotion label of the short text to be analyzed more accurately.
  • the preset argument extraction model can be a model obtained by training a neural network with training data, which can improve the efficiency and accuracy of determining the emotional trigger words, subjects and events of the text to be analyzed, and can more effectively analyze texts that involve one subject and multiple events.
  • Step S104 input the determined emotional trigger words, subjects and events of each short text to be analyzed into the preset emotion determination model, so as to obtain the corresponding emotion tags of each subject of the short text to be analyzed.
  • the management server can input the short text to be analyzed after determining the emotion trigger word, the subject and the event into the emotion determination model, so as to obtain the emotion tag corresponding to the subject of each short text to be analyzed.
  • for example, for the text "the current stock price of company A is ** yuan, and the stock price has risen sharply", the subject is company A, the event is the stock price, and the emotional trigger word is "risen", so the emotion label corresponding to the text can be general positive.
  • for the text "the current annual loss of company B is ** yuan, and the annual loss decreases", the subject is company B, the event is the annual loss, and the emotional trigger word is "decreases", so the emotion label can also be general positive.
  • the preset emotion determination model may be a model obtained by training a neural network with training data, which can more accurately determine the emotion label of the text to be analyzed.
  • the text to be analyzed can be accurately and efficiently marked with emotional tags, which improves the user experience.
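  • the flow of steps S101 to S104 can be summarized as a small pipeline skeleton. The sketch below is an assumption-laden illustration: the request field `"text"`, the function `analyze`, and the lambda stand-ins are all hypothetical, and the two trained models are passed in as plain callables rather than implemented.

```python
from typing import Callable, Dict, List, Tuple

# (subject, event, emotional trigger word) extracted from one short text.
Arguments = Tuple[str, str, str]


def analyze(
    request: Dict[str, str],
    preprocess: Callable[[str], List[str]],
    extract: Callable[[str], Arguments],
    determine: Callable[[Arguments], str],
) -> List[Tuple[str, str]]:
    """Run steps S101-S104 and return (subject, emotion label) pairs."""
    initial_text = request["text"]            # S101: parse the request
    short_texts = preprocess(initial_text)    # S102: short texts to be analyzed
    results = []
    for short_text in short_texts:
        arguments = extract(short_text)       # S103: argument extraction model
        label = determine(arguments)          # S104: emotion determination model
        results.append((arguments[0], label))
    return results


# Toy stand-ins for the preprocessing step and the two trained models.
demo = analyze(
    {"text": "the stock price of company A has risen sharply"},
    preprocess=lambda text: [text],
    extract=lambda text: ("company A", "stock price", "risen"),
    determine=lambda arguments: "general positive",
)
print(demo)
```

    Passing the models in as callables keeps the skeleton independent of how the argument extraction model and the emotion determination model are trained.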
  • the method further includes steps S201 to S204 before step S101 .
  • Step S201 crawling the original text through a web crawler.
  • the management server can obtain and process relevant training data to train related neural networks to obtain related models for tagging emotional tags.
  • the management server can extract a large amount of news related to corporate public opinion from relevant news websites through web crawlers, that is, the obtained news can be the original text to be processed.
  • step S202 the acquired original text is preprocessed to obtain a plurality of short texts to be trained including event keywords, and the obtained short texts to be trained are stored in a preset database as a training set.
  • the management server can also preprocess the obtained original text, that is, divide the original text into multiple short texts and determine, by checking whether each short text contains event keywords, whether it is a short text to be trained.
  • the short text to be trained can be stored in a preset database as a training set for easy recall.
  • the step S202 may include: steps S301-S302.
  • Step S301 Segment the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained including event keywords.
  • the preset text segmentation function can be cut_text.
  • the original text can be segmented so that relevant information can be divided into the same short text as much as possible.
  • Short texts including preset event keywords are used as short texts to be trained.
  • the step S301 may include steps S401-S403.
  • Step S401 Segment the original text according to a preset text segmentation function to obtain multiple original subtexts.
  • the management server can segment the original text according to a preset text segmentation function, so as to obtain multiple shorter original subtexts. Since the original text may be longer and contain more useless information, it is necessary to distinguish and identify the original sub-text.
  • Step S402 judging whether the original subtext includes preset event keywords.
  • the management server can judge whether the original subtext includes preset event keywords, that is, to realize screening and determination of the original subtext.
  • Step S403 if the original subtext includes preset event keywords, determine the original subtext as the short text to be trained.
  • the management server can determine the original subtext as the short text to be trained when the original subtext includes preset event keywords, so as to realize the classification and segmentation of the original text.
  • management server can also determine the event described in the short text to be trained according to the event keyword, so as to facilitate subsequent processing.
  • step S301 also includes the following steps:
  • Step S404 if the original subtext does not include preset event keywords, delete the original subtext.
  • the original subtext can be deleted, so as to determine more accurate training data, that is, determine a more reasonable short text to be trained.
  • Step S302 storing the obtained short text to be trained as a training set in a preset database.
  • the management server may store the obtained short text to be trained as a training set in a preset database.
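  • steps S401 to S404 and S302 can be sketched together as follows. The source names the segmentation function `cut_text` but does not describe its behavior, so the punctuation-based split below is an assumption; the keyword list, the table name `training_set`, and the use of an in-memory SQLite database as the "preset database" are likewise hypothetical.

```python
import re
import sqlite3
from typing import List

EVENT_KEYWORDS = ["stock price", "annual loss"]  # hypothetical preset keywords


def cut_text(text: str) -> List[str]:
    """Assumed behavior: split on sentence-ending punctuation (S401)."""
    parts = re.split(r"[.!?。！？]+\s*", text)
    return [part.strip() for part in parts if part.strip()]


def build_training_set(original_text: str) -> List[str]:
    """Keep subtexts containing an event keyword (S402, S403); drop the rest (S404)."""
    return [
        subtext
        for subtext in cut_text(original_text)
        if any(keyword in subtext for keyword in EVENT_KEYWORDS)
    ]


def store_training_set(conn: sqlite3.Connection, short_texts: List[str]) -> None:
    """Store the short texts to be trained in the database (S302)."""
    conn.execute("CREATE TABLE IF NOT EXISTS training_set (short_text TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO training_set (short_text) VALUES (?)",
        [(text,) for text in short_texts],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
kept = build_training_set(
    "The stock price of company A has risen sharply. "
    "The weather was fine. "
    "The annual loss of company B decreases."
)
store_training_set(conn, kept)
stored = conn.execute("SELECT COUNT(*) FROM training_set").fetchone()[0]
print(kept, stored)
```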
  • Step S203 if a model training instruction is received, call the short text to be trained from the preset database for labeling, so as to determine the emotional trigger words, subjects and events included in each text to be trained.
  • the management server can call the short text to be trained from the preset database for labeling, so as to determine the emotional trigger words, subjects and events included in each text to be trained.
  • the above labeling can be performed manually, or automatically according to relevant labeling instructions, which is not specifically limited in this embodiment.
  • Step S204 train the preset first neural network through the marked text to be trained to obtain an argument extraction model.
  • the management server can obtain the marked text to be trained, and use the marked text to be trained to realize the training of the first neural network, thereby obtaining a trained argument extraction model.
  • the step S204 may include steps S501 - S502 .
  • Step S501 using BERT encoding to obtain the vector of the marked text to be trained.
  • the management server can also use BERT encoding to obtain the vector of the marked text to be trained, so as to perform subsequent training steps.
  • Step S502 input the vector of the marked text to be trained and the emotional trigger words, subjects and events included in the marked text to be trained into the preset first neural network for training to obtain an argument extraction model.
  • the management server can input the vector of the marked text to be trained and the emotional trigger words, subjects and events included in the marked text to be trained into the preset first neural network to train the first neural network, so that an argument extraction model is obtained.
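  • the source does not specify how the marked emotional trigger words, subjects and events are encoded as training targets for the first neural network. One common choice for extraction tasks, assumed here purely for illustration, is BIO sequence tagging, where each token is tagged as the beginning (B), inside (I) or outside (O) of a labeled span. The function `to_bio_tags`, the role names, and the word-level tokens are all hypothetical.

```python
from typing import Dict, List, Tuple


def to_bio_tags(num_tokens: int, spans: Dict[str, Tuple[int, int]]) -> List[str]:
    """Convert role -> (start, end) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * num_tokens
    for role, (start, end) in spans.items():
        tags[start] = f"B-{role}"
        for index in range(start + 1, end):
            tags[index] = f"I-{role}"
    return tags


# Hypothetical annotation of one short text to be trained.
tokens = ["company", "A", "stock", "price", "has", "risen", "sharply"]
spans = {"SUBJECT": (0, 2), "EVENT": (2, 4), "TRIGGER": (5, 6)}

for token, tag in zip(tokens, to_bio_tags(len(tokens), spans)):
    print(token, tag)
```

    Under this assumption, the encoder output for each token would be paired with its tag as the supervision signal; the actual target format used by the application may differ.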
  • the method further includes steps S601 - S603 before step S101 .
  • Step S601 obtaining the emotional trigger words, subjects and events included in the tagged text to be trained.
  • the management server can obtain the emotional trigger words, subjects and events included in the tagged text to be trained.
  • Step S602 determine the emotion label of the text to be trained according to the acquired emotion trigger words and events.
  • each event is pre-associated with emotional trigger words, and for different events, the emotion labels mapped to the emotional trigger words differ. Therefore, the management server can determine the emotion labels of the text to be trained according to the acquired emotional trigger words and events.
  • Step S603 train the second neural network by using the determined emotional labels and the labeled text to be trained to obtain an emotion determination model.
  • the management server can realize the training of the second neural network through the determined emotion label and the labeled text to be trained, so as to obtain the emotion determination model.
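  • steps S601 and S602 amount to turning each annotated text into a (text, emotion label) pair that can then be used to train the second neural network in step S603. The sketch below is illustrative only: the record fields, the mapping `EVENT_TRIGGER_LABELS`, and the choice to skip records whose (event, trigger word) pair has no mapped label are assumptions, and the training itself is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-associated mapping: (event, trigger word) -> emotion label.
EVENT_TRIGGER_LABELS: Dict[Tuple[str, str], str] = {
    ("individual stock market performance", "rising"): "general positive",
    ("individual stock market performance", "slumping"): "major negative",
}


def label_training_texts(annotated: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Attach an emotion label to each annotated text; skip unmapped pairs."""
    pairs = []
    for record in annotated:
        label = EVENT_TRIGGER_LABELS.get((record["event"], record["trigger"]))
        if label is not None:
            pairs.append((record["text"], label))
    return pairs


EVENT = "individual stock market performance"
annotated = [
    {"text": "shares of company A are rising", "event": EVENT, "trigger": "rising"},
    {"text": "shares of company C are slumping", "event": EVENT, "trigger": "slumping"},
    {"text": "shares of company D are flat", "event": EVENT, "trigger": "flat"},
]
pairs = label_training_texts(annotated)
print(pairs)
```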
  • the embodiment of the present application can effectively extract the arguments in the text to accurately determine the emotional tags, improve the user experience, and can also be applied to scenarios such as smart government affairs, thereby promoting the construction of smart cities.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), etc.
  • the embodiment of the present application also proposes an event-based sentiment analysis device. The device 100 includes: a request parsing unit 101, a preprocessing unit 102, an argument extraction unit 103 and an emotion determining unit 104.
  • the request parsing unit 101 is configured to, if a sentiment analysis request is received, parse the sentiment analysis request to obtain an initial text.
  • the management server may parse the sentiment analysis request to obtain the initial text.
  • the initial text may be pre-stored in the database, or crawled from an external server by a web crawler.
  • the management server may define the events.
  • the public opinion events of an enterprise are generally closely related to its business operation logic, and thus form a complete event system. Based on the similarity and correlation of events in corporate public opinion, an event system related to public opinion and emotion can be designed.
  • for example, a three-level event system can cover eight aspects, such as finance, personnel, operation, capital events, compliance, and credit. Each aspect contains a series of specific events that form second-level labels. After the second-level labels are further subdivided, a total of 110 specific events can be formed.
  • for the emotion label system, emotion labels such as major negative, general negative, neutral, general positive, and major positive can be included.
  • for each event, the relevant emotional trigger words can be summarized, and the mapping relationship between emotional trigger words in different directions and the emotion label system can be designed. For example, for the event "individual stock market performance", the emotional trigger word set {"rising", "skyrocketing", "falling", "slumping", ...} can be summarized, and the related emotion labels can be mapped as {"rising": general positive, "slumping": major negative, ...}.
  • the preprocessing unit 102 is configured to preprocess the acquired initial text to obtain a plurality of short texts to be analyzed including event keywords, wherein different event keywords are associated with corresponding events.
  • the management server can also preprocess the acquired initial text. The initial text obtained by the management server can be a news release related to corporate public opinion that covers multiple events; such a release can be very long and often contains a large number of sentences unrelated to the target event. The initial text therefore needs to be preprocessed, for example by dividing it into several short texts and screening them, so that short texts carrying only redundant information unrelated to any event are excluded and only the short texts that include event keywords are determined to be short texts to be analyzed. That is, every determined short text to be analyzed includes event keywords.
  • different events can be associated with different event keywords.
  • the event keywords of each event can be summarized according to the text information in current network news.
  • an event can be associated with one event keyword or with multiple event keywords.
  • the argument extraction unit 103 is configured to input all short texts to be analyzed into a preset argument extraction model to determine the emotional trigger words, subjects and events of each short text to be analyzed, wherein different events are associated with different emotional trigger words.
  • the management server can input all the short texts to be analyzed into the preset argument extraction model, and the preset argument extraction model can accurately identify the emotional trigger words, subjects and events in the text to be analyzed. That is, through the analysis of the argument extraction model, the relevant key information in the short text to be analyzed can be determined.
  • through the argument extraction model, elements such as the time and place included in the short text to be analyzed can also be distinguished and determined, so as to determine the emotion label of the short text to be analyzed more accurately.
  • the preset argument extraction model can be a model obtained by training a neural network with training data, which can improve the efficiency and accuracy of determining the emotional trigger words, subjects and events of the text to be analyzed, and can more effectively analyze texts that involve one subject and multiple events.
  • the emotion determination unit 104 is configured to input the determined emotion trigger words, subjects and events of each short text to be analyzed into a preset emotion determination model, so as to obtain an emotion tag corresponding to the subject of each short text to be analyzed.
  • the management server can input the short text to be analyzed after determining the emotion trigger word, the subject and the event into the emotion determination model, so as to obtain the emotion tag corresponding to the subject of each short text to be analyzed.
  • for example, for the text "the current stock price of company A is ** yuan, and the stock price has risen sharply", the subject is company A, the event is the stock price, and the emotional trigger word is "risen", so the emotion label corresponding to the text can be general positive.
  • for the text "the current annual loss of company B is ** yuan, and the annual loss decreases", the subject is company B, the event is the annual loss, and the emotional trigger word is "decreases", so the emotion label can also be general positive.
  • the preset emotion determination model may be a model obtained by training a neural network with training data, which can more accurately determine the emotion label of the text to be analyzed.
  • the text to be analyzed can be accurately and efficiently marked with emotional tags, which improves the user experience.
  • the device 100 before the request parsing unit 101 , the device 100 further includes a text crawling unit 201 , a data acquisition unit 202 , a data labeling unit 203 and a first training unit 204 .
  • the text crawling unit 201 is configured to crawl the original text through a web crawler.
  • the management server can obtain and process relevant training data to train related neural networks to obtain related models for tagging emotional tags.
  • the management server can extract a large amount of news related to corporate public opinion from relevant news websites through web crawlers, that is, the obtained news can be the original text to be processed.
  • the data acquisition unit 202 is configured to preprocess the acquired original text to obtain a plurality of short texts to be trained including event keywords, and store the obtained short texts to be trained as a training set in a preset database.
  • the management server can also preprocess the obtained original text, that is, divide the original text into multiple short texts and determine, by checking whether each short text contains event keywords, whether it is a short text to be trained.
  • the short text to be trained can be stored in a preset database as a training set for easy recall.
  • the data acquisition unit 202 may include: a text segmentation unit 301 and a text storage unit 302 .
  • the text segmentation unit 301 is configured to segment the original text according to a preset text segmentation function and preset event keywords, so as to obtain a plurality of short texts to be trained including event keywords.
  • the preset text segmentation function can be cut_text.
  • the original text can be segmented so that relevant information can be divided into the same short text as much as possible.
  • Short texts including preset event keywords are used as short texts to be trained.
  • the text segmentation unit 301 may include a text processing unit 401, a text judgment unit 402 and a text determination unit 403.
  • the text processing unit 401 is configured to segment the original text according to a preset text segmentation function to obtain multiple original subtexts.
  • the management server can segment the original text according to a preset text segmentation function, so as to obtain multiple shorter original subtexts. Since the original text may be long and contain much useless information, the original subtexts need to be screened.
  • the text judgment unit 402 is configured to judge whether the original subtext includes preset event keywords.
  • the management server can judge whether the original subtext includes preset event keywords, so as to screen the original subtexts.
  • the text determination unit 403 is configured to determine the original subtext as the short text to be trained if the original subtext includes preset event keywords.
  • the management server can determine the original subtext as the short text to be trained when the original subtext includes preset event keywords, so as to realize the classification and segmentation of the original text.
  • the management server can also determine the event described in the short text to be trained according to the event keyword, so as to facilitate subsequent processing.
  • the text segmentation unit 301 also includes the following units:
  • a text deletion unit 404 configured to delete the original subtext if the original subtext does not include preset event keywords.
  • if the original subtext does not include preset event keywords, it can be deleted, which yields more accurate training data, that is, more suitable short texts to be trained.
  • the text storage unit 302 is configured to store the obtained short text to be trained as a training set in a preset database.
  • the management server may store the obtained short text to be trained as a training set in a preset database.
  • the data labeling unit 203 is configured to call the short text to be trained from the preset database for labeling if a model training instruction is received, so as to determine the emotional trigger words, subjects and events included in each text to be trained.
  • the management server can call the short text to be trained from the preset database for labeling, so as to determine the emotional trigger words, subjects and events included in each text to be trained.
  • the above labeling can be performed manually, or automatically according to relevant labeling instructions, which is not specifically limited in this embodiment.
  • the first training unit 204 is configured to train a preset first neural network through the marked text to be trained to obtain an argument extraction model.
  • the management server can obtain the marked text to be trained, and use the marked text to be trained to realize the training of the first neural network, thereby obtaining a trained argument extraction model.
  • the first training unit 204 may include a vector determination unit 501 and a first model training unit 502.
  • the vector determination unit 501 is configured to obtain the vector of the marked text to be trained by using BERT encoding.
  • the management server can also use BERT encoding to obtain the vector of the marked text to be trained, so as to perform the subsequent training steps.
  • the first model training unit 502 is configured to input the vector of the marked text to be trained, together with the emotional trigger words, subjects and events included in the marked text to be trained, into the preset first neural network for training to obtain an argument extraction model.
  • the management server can input the vector of the marked text to be trained, together with the emotional trigger words, subjects and events included in it, into the preset first neural network to train the first neural network.
  • An argument extraction model can be obtained by training the first neural network.
  • each event is mapped in advance to a corresponding emotion label through its associated emotional trigger words; before the request parsing unit 101, the device 100 further includes a feature acquisition unit 601, a tag determination unit 602 and a second training unit 603.
  • the feature acquisition unit 601 is configured to acquire the emotional trigger words, subjects and events included in the tagged text to be trained.
  • the management server can obtain the emotional trigger words, subjects and events included in the tagged text to be trained.
  • the tag determination unit 602 is configured to determine the emotion label of the text to be trained according to the acquired emotional trigger words and events.
  • each event is associated in advance with emotional trigger words, and the emotion label mapped to a given emotional trigger word differs from event to event. Therefore, the management server can determine the emotion label of the text to be trained according to the acquired emotional trigger words and events.
  • the second training unit 603 is configured to train the second neural network by using the determined emotional label and the labeled text to be trained to obtain an emotion determination model.
  • the management server can realize the training of the second neural network through the determined emotion label and the labeled text to be trained, so as to obtain the emotion determination model.
  • the above request parsing unit 101, preprocessing unit 102, argument extraction unit 103 and sentiment determination unit 104 can be embedded in, or be independent of, the event-based sentiment analysis device in the form of hardware, or can be stored in the memory of the event-based sentiment analysis device in the form of software, so that the processor can invoke and execute the operations corresponding to the above units.
  • the processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, and the like.
  • the above event-based sentiment analysis apparatus can be implemented in the form of a computer program, and the computer program can be run on the computer device shown in FIG. 13.
  • FIG. 13 is a schematic diagram of the structural composition of a computer device of the present application.
  • the device may be a server, where the server may be an independent server or a server cluster composed of multiple servers.
  • the computer device 700 includes a processor 702, a memory and a network interface 705 connected through a system bus 701, wherein the memory may include a non-volatile storage medium 703 and an internal memory 704.
  • the non-volatile storage medium 703 can store an operating system 7031 and a computer program 7032.
  • when the computer program 7032 is executed, the processor 702 can execute an event-based sentiment analysis method.
  • the processor 702 is used to provide calculation and control capabilities and support the operation of the entire computer device 700.
  • the internal memory 704 provides an environment for running the computer program 7032 in the non-volatile storage medium 703.
  • the network interface 705 is used for network communication with other devices.
  • the structure shown in FIG. 13 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer device 700 on which the solution of this application is applied.
  • the specific computer device 700 may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
  • the processor 702 is configured to execute the computer program 7032 stored in the memory, so as to realize the steps in the event-based sentiment analysis method described above.
  • the processor 702 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the computer program can be stored in a storage medium, which is a computer-readable storage medium.
  • the computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
  • the present application also provides a storage medium.
  • the storage medium may be a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • the storage medium stores a computer program, and when the computer program is executed by the processor, the processor executes the steps in the event-based sentiment analysis method described above.
  • the storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or another physical storage medium that can store program code.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are illustrative only.
  • the division of each unit is only a logical function division, and there may be another division method in actual implementation.
  • several units or components may be combined or integrated into another system, or some features may be omitted, or not implemented.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, and the computer software product is stored in a storage medium.
  • the software product includes several instructions to make a computer device (which may be a personal computer, a terminal, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
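As an illustration of the training-data preparation described earlier in this list (the screening done by units 402 to 404, the storage done by unit 302, and the recall of stored short texts by unit 203), a minimal Python sketch follows. It is not the applicant's implementation: the table name `training_set`, the `labeled` column, the function names and the use of SQLite as the "preset database" are all assumptions made for this example.

```python
import sqlite3
from typing import Iterable, List


def keep_keyword_subtexts(subtexts: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep subtexts containing at least one event keyword; the rest are dropped."""
    kws = list(keywords)
    return [s for s in subtexts if any(k in s for k in kws)]


def store_training_set(conn: sqlite3.Connection, short_texts: Iterable[str]) -> None:
    """Store short texts to be trained in a (hypothetical) training_set table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS training_set ("
        "id INTEGER PRIMARY KEY, "
        "text TEXT NOT NULL, "
        "labeled INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO training_set (text) VALUES (?)",
        [(t,) for t in short_texts],
    )
    conn.commit()


def fetch_unlabeled(conn: sqlite3.Connection) -> List[str]:
    """Recall stored short texts that still need labeling, in insertion order."""
    rows = conn.execute(
        "SELECT text FROM training_set WHERE labeled = 0 ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]
```

A caller would open a connection with `sqlite3.connect(...)`, store the screened short texts once, and call `fetch_unlabeled` when a model training instruction arrives; the keyword match here is a plain case-sensitive substring test, which a real system would likely refine.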

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the embodiments of the present application are an event-based sentiment analysis method and apparatus, and a computer device and a storage medium. The method comprises: if a sentiment analysis request is received, parsing the sentiment analysis request to acquire initial text; pre-processing the acquired initial text to obtain a plurality of short pieces of text to be analyzed, which comprise event keywords; inputting all of said short pieces of text into a preset argument extraction model, so as to determine a sentiment trigger word, a main body and an event of each said short piece of text; and inputting the determined sentiment trigger word, the main body and the event of each said short piece of text into a preset sentiment determination model, so as to obtain a sentiment label corresponding to the main body of each said short piece of text. By means of the present application, an argument in text can be effectively extracted, so as to accurately determine a sentiment label, thereby improving the usage experience of a user; and the present application can also be applied to scenarios such as smart government affairs, thereby promoting the construction of a smart city.

Description

Event-based sentiment analysis method, apparatus, computer device and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on June 25, 2021, with application number 202110712428.9 and entitled "Event-based sentiment analysis method, apparatus, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of data processing, and in particular to an event-based sentiment analysis method, apparatus, computer device and storage medium.
Background
Many techniques have been developed to judge the sentiment orientation of a piece of text. The earliest were unsupervised methods, which extract the emotional trigger words in a sentence and combine them with syntax and grammar to compute a sentiment score; because of the complexity of language, the accuracy and generalization ability of such methods are limited. With the introduction of word embeddings, the information contained in a text can be represented by a word-vector matrix, which makes end-to-end supervised learning possible, for example a neural network model that takes the word-vector matrix of the text as input and outputs a sentiment score. The effectiveness of this approach was initially limited by how well the word-vector matrix captures the semantics of the text, but with the emergence in recent years of models with a strong ability to extract semantic information, such as ELMo, GPT and BERT, supervised learning has become the mainstream of sentiment analysis.
The original sentiment analysis task was to assign a sentiment score to a piece of text. For text involving multiple subjects and multiple events, however, the sentiment of different events for different subjects may differ, so the fine-grained sentiment analysis task (ABSA), which targets different events of different subjects, was proposed. One approach to this task takes a piece of text as input and simultaneously outputs the subject, the event and the corresponding sentiment score; because it must both identify subjects and events and perform sentiment analysis, its accuracy is low. Another approach fixes an event system in advance, takes text as input, and outputs the sentiment score of the subject of the text on each event in the event system. However, the inventors realized that both approaches have two problems. First, they do not reflect the differences in how sentiment is expressed for different events: in "profit rising" and "liabilities rising", the emotional trigger word is "rising" in both cases, yet the sentiment orientation is completely different. Second, in the ABSA task the correspondence among subject, event and emotional trigger word is not identified clearly enough, so the correspondences output by both approaches are rather crude. Overall, existing sentiment analysis methods do not yet perform well enough in practice and need to be improved.
Summary of the invention
Embodiments of the present application provide an event-based sentiment analysis method, apparatus, computer device and storage medium, which can effectively extract the arguments in a text so as to accurately determine emotion labels and improve the user experience.
In a first aspect, an embodiment of the present application provides an event-based sentiment analysis method, the method comprising:
if a sentiment analysis request is received, parsing the sentiment analysis request to obtain initial text;
preprocessing the obtained initial text to obtain a plurality of short texts to be analyzed that include event keywords, wherein different event keywords are associated with corresponding events;
inputting all the short texts to be analyzed into a preset argument extraction model to determine the emotional trigger word, subject and event of each short text to be analyzed, wherein different events are associated with different emotional trigger words;
inputting the determined emotional trigger word, subject and event of each short text to be analyzed into a preset emotion determination model to obtain an emotion label corresponding to the subject of each short text to be analyzed.
In a second aspect, an embodiment of the present application further provides an event-based sentiment analysis apparatus, the apparatus comprising:
a request parsing unit, configured to, if a sentiment analysis request is received, parse the sentiment analysis request to obtain initial text;
a preprocessing unit, configured to preprocess the obtained initial text to obtain a plurality of short texts to be analyzed that include event keywords, wherein different event keywords are associated with corresponding events;
an argument extraction unit, configured to input all the short texts to be analyzed into a preset argument extraction model to determine the emotional trigger word, subject and event of each short text to be analyzed, wherein different events are associated with different emotional trigger words;
an emotion determination unit, configured to input the determined emotional trigger word, subject and event of each short text to be analyzed into a preset emotion determination model to obtain an emotion label corresponding to the subject of each short text to be analyzed.
In a third aspect, an embodiment of the present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the following steps when executing the computer program:
if a sentiment analysis request is received, parsing the sentiment analysis request to obtain initial text;
preprocessing the obtained initial text to obtain a plurality of short texts to be analyzed that include event keywords, wherein different event keywords are associated with corresponding events;
inputting all the short texts to be analyzed into a preset argument extraction model to determine the emotional trigger word, subject and event of each short text to be analyzed, wherein different events are associated with different emotional trigger words;
inputting the determined emotional trigger word, subject and event of each short text to be analyzed into a preset emotion determination model to obtain an emotion label corresponding to the subject of each short text to be analyzed.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
if a sentiment analysis request is received, parsing the sentiment analysis request to obtain initial text;
preprocessing the obtained initial text to obtain a plurality of short texts to be analyzed that include event keywords, wherein different event keywords are associated with corresponding events;
inputting all the short texts to be analyzed into a preset argument extraction model to determine the emotional trigger word, subject and event of each short text to be analyzed, wherein different events are associated with different emotional trigger words;
inputting the determined emotional trigger word, subject and event of each short text to be analyzed into a preset emotion determination model to obtain an emotion label corresponding to the subject of each short text to be analyzed.
The embodiments of the present application provide an event-based sentiment analysis method, apparatus, computer device and storage medium. Because the emotional trigger words, subjects and events are determined by an argument extraction model, the embodiments can improve the accuracy of the emotion label determined for the event of each subject under the corresponding emotional trigger word, and improve the user experience. The method of this embodiment can also be applied to scenarios such as smart government affairs, thereby promoting the construction of smart cities.
Description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an event-based sentiment analysis method provided by an embodiment of the present application;
FIG. 1a is a schematic diagram of an application scenario of an event-based sentiment analysis method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of an event-based sentiment analysis method provided by another embodiment of the present application;
FIG. 3 is a schematic sub-flowchart of an event-based sentiment analysis method provided by another embodiment of the present application;
FIG. 4 is a schematic sub-flowchart of an event-based sentiment analysis method provided by another embodiment of the present application;
FIG. 5 is a schematic sub-flowchart of an event-based sentiment analysis method provided by another embodiment of the present application;
FIG. 6 is a schematic flowchart of an event-based sentiment analysis method provided by another embodiment of the present application;
FIG. 7 is a schematic block diagram of an event-based sentiment analysis apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of an event-based sentiment analysis apparatus provided by another embodiment of the present application;
FIG. 9 is a schematic block diagram of the data acquisition unit of an event-based sentiment analysis apparatus provided by another embodiment of the present application;
FIG. 10 is a schematic block diagram of the text segmentation unit of an event-based sentiment analysis apparatus provided by another embodiment of the present application;
FIG. 11 is a schematic block diagram of the first training unit of an event-based sentiment analysis apparatus provided by another embodiment of the present application;
FIG. 12 is a schematic block diagram of an event-based sentiment analysis apparatus provided by another embodiment of the present application;
FIG. 13 is a schematic diagram of the structural composition of a computer device provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application without creative effort fall within the scope of protection of this application.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
Please refer to FIG. 1 and FIG. 1a. FIG. 1 is a schematic flowchart of an event-based sentiment analysis method provided by an embodiment of the present application, and FIG. 1a is a schematic diagram of a scenario of the event-based sentiment analysis method in an embodiment of the present application. The event-based sentiment analysis method is applied in the management server 20. Following the method, the management server 20 preprocesses the initial text to obtain a plurality of short texts to be analyzed that include event keywords, inputs the short texts to be analyzed into the argument extraction model 10 to determine the emotional trigger word, subject and event of each short text to be analyzed, and then inputs the determined emotional trigger words, subjects and events into the emotion determination model 30 to obtain the emotion label corresponding to the text to be analyzed. The method of this embodiment greatly improves the accuracy of determining the emotion label of a text. The steps of the event-based sentiment analysis method are described in detail below from the perspective of the management server 20.
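The data flow just described (preprocessing, then the argument extraction model 10, then the emotion determination model 30) can be sketched in Python as follows. This is only an illustrative sketch: the names `Arguments` and `analyze` are assumptions introduced here, and the two models are passed in as plain callables rather than trained networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Arguments:
    """Arguments extracted from one short text (hypothetical container)."""
    trigger: str   # emotional trigger word
    subject: str   # subject the event concerns
    event: str     # event the short text describes


def analyze(
    initial_text: str,
    preprocess: Callable[[str], List[str]],
    extract_arguments: Callable[[str], Arguments],
    determine_label: Callable[[Arguments], str],
) -> List[Tuple[str, str]]:
    """Run the three stages and return (subject, emotion label) pairs."""
    results = []
    for short_text in preprocess(initial_text):
        # Stage 2: stand-in for the argument extraction model.
        args = extract_arguments(short_text)
        # Stage 3: stand-in for the emotion determination model.
        results.append((args.subject, determine_label(args)))
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular model library; each stage can be replaced by a real model without changing the flow.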
As shown in FIG. 1, the event-based sentiment analysis method may specifically include steps S101 to S104.
Step S101: if a sentiment analysis request is received, parse the sentiment analysis request to obtain initial text.
In this embodiment, if the management server receives a sentiment analysis request, it can parse the request to obtain the initial text. The initial text may be text stored in a database in advance, or text crawled from an external server by a web crawler. For example, the initial text may be public opinion information about an enterprise, which may include the related subjects, events and emotional trigger words. By analyzing the different subjects, events, emotion keywords and so on, the different sentiments that the same emotional trigger word corresponds to can be accurately distinguished.
To perform sentiment analysis, it is usually necessary to determine emotion keywords and events, and the management server can define the events. As for the event system, the public opinion events of an enterprise are generally closely tied to the enterprise's operating logic and thus form a complete event system; based on the similarity and correlation of events in enterprise public opinion, a three-level event system related to public opinion sentiment can be designed. The first level may cover eight aspects in total: finance, personnel, operation, capital events, personnel, compliance, credit and other. Each aspect contains a series of specific events that form the second-level labels, and further subdividing the second-level labels yields 110 specific events in total.
对于情感体系而言,可以包括重大负面、一般负面、中性、一般正面、重大正面等的情感标签。进一步针对具体的事件,可以总结相关的情感触发词,并设计不同方向的情感触发词到情感标签体系的映射关系,例如“个股市场表现”这一事件,可以总结出情感触发词集合{“上涨”、“暴涨”、“下跌”、“暴跌”…},并同时相关的情感标签体系可以映射为{“上涨”:一般正面、“暴跌”:重大负面…}。For the sentiment system, sentiment labels such as major negative, general negative, neutral, general positive, and major positive can be included. Further targeting specific events, we can summarize the relevant emotional trigger words, and design the mapping relationship between emotional trigger words in different directions and the emotional label system. For example, for the event of "individual stock market performance", we can summarize the emotional trigger word set ", "skyrocketing", "falling", "slumping"...}, and at the same time, the related emotional label system can be mapped as {"rising": generally positive, "slumping": major negative...}.
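The event-specific mapping from sentiment trigger words to sentiment labels can be pictured as a nested lookup table. The following is a minimal sketch, not the implementation of the application: the nested-dictionary layout and the names `TRIGGER_LABELS` and `lookup_label` are assumptions made for illustration, and only the entries that this embodiment states explicitly are filled in (the "annual loss" entry mirrors the Company B example given under step S104).

```python
from typing import Optional

# Illustrative only: event names, trigger words and labels are the examples
# stated in the text; the data layout and function name are assumptions.
TRIGGER_LABELS = {
    "individual stock market performance": {
        "rise": "general positive",
        "plunge": "major negative",
    },
    "annual loss": {
        "decrease": "general positive",
    },
}


def lookup_label(event: str, trigger: str) -> Optional[str]:
    """Return the sentiment label mapped to (event, trigger), or None if unmapped."""
    return TRIGGER_LABELS.get(event, {}).get(trigger)
```

Keying the table by event first is what lets the same trigger word carry a different label under a different event, which is the distinction the embodiment relies on.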
Step S102: preprocess the acquired initial text to obtain a plurality of short texts to be analyzed that include event keywords, where different event keywords are associated with corresponding events.
In this embodiment, the management server can also preprocess the acquired initial text. The initial text may be a news article related to corporate public opinion that covers multiple events; such an article can be very long and often contains many sentences unrelated to the target event, so the initial text needs to be preprocessed accordingly. For example, the initial text is split into several short texts and the short texts are screened: short texts that contain no event-related information are excluded, and only short texts that include an event keyword are determined to be short texts to be analyzed. That is, every short text to be analyzed includes an event keyword.
Generally, different events may be associated with different event keywords. Specifically, the event keywords of each event can be summarized from text in current online news. An event may be associated with one event keyword or with multiple event keywords.
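As a rough illustration of the association between events and event keywords, the sketch below looks up which events a short text refers to. It is a sketch under stated assumptions: the keyword lists, the case-insensitive substring matching, and the names `EVENT_KEYWORDS` and `match_events` are all hypothetical and are not taken from the application.

```python
from typing import Dict, List

# Hypothetical keyword lists; an event may have one keyword or several.
EVENT_KEYWORDS: Dict[str, List[str]] = {
    "individual stock market performance": ["stock price", "share price"],
    "annual loss": ["annual loss"],
}


def match_events(short_text: str,
                 event_keywords: Dict[str, List[str]] = EVENT_KEYWORDS) -> List[str]:
    """Return the events whose keywords occur in short_text, sorted by name."""
    lowered = short_text.lower()
    return sorted(
        event
        for event, keywords in event_keywords.items()
        if any(keyword in lowered for keyword in keywords)
    )
```

A short text for which `match_events` returns an empty list contains no event keyword and would be excluded during screening.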
Step S103: input all the short texts to be analyzed into a preset argument extraction model to determine the sentiment trigger word, subject, and event of each short text to be analyzed, where different events are associated with different sentiment trigger words.
In this embodiment, the management server can input all the short texts to be analyzed into the preset argument extraction model, which can accurately identify the sentiment trigger words, subjects, and events in each short text; that is, the model's analysis determines the key information in the short text to be analyzed. As an optional embodiment, the argument extraction model can also identify elements such as time and place contained in the short text, so that the sentiment label of the short text can be determined more precisely.
The preset argument extraction model may be a model obtained by training a neural network on training data. It can improve the efficiency and accuracy with which the sentiment trigger words, subjects, and events of the text to be analyzed are determined, and can more effectively analyze text that involves multiple subjects and multiple events.
Step S104: input the determined sentiment trigger word, subject, and event of each short text to be analyzed into a preset sentiment determination model to obtain the sentiment label corresponding to the subject of each short text to be analyzed.
In this embodiment, the management server can input each short text to be analyzed, with its sentiment trigger word, subject, and event determined, into the sentiment determination model to obtain the sentiment label corresponding to the subject of that short text. For example, in the text "Company A's current stock price is ** yuan; the stock price has surged", the subject is Company A, the event is the stock price, and "surged" is the sentiment trigger word, so the sentiment label for this text may be general positive. As another example, in the text "Company B's current annual loss is ** yuan; the annual loss has decreased", the subject is Company B, the event is the annual loss, and "decreased" is the sentiment trigger word, so the label is general positive. The preset sentiment determination model may be a model obtained by training a neural network on training data, and can determine the sentiment label of the text to be analyzed more precisely.
It follows that the preset argument extraction model and the preset sentiment determination model together allow the text to be analyzed to be labeled with sentiment labels accurately and efficiently, improving the user experience.
As shown in FIG. 2, in an embodiment, the method further includes steps S201 to S204 before step S101.
Step S201: crawl original text with a web crawler.
To enable fast sentiment labeling of text, the management server can acquire and process training data to train the relevant neural networks and thereby obtain the models used for sentiment labeling. Typically, the management server can use a web crawler to extract a large amount of news related to corporate public opinion from relevant news websites; the news obtained is the original text to be processed.
Step S202: preprocess the acquired original text to obtain a plurality of short texts to be trained that include event keywords, and store the obtained short texts to be trained in a preset database as a training set.
The management server can also preprocess the acquired original text, that is, split the original text into multiple short texts and check whether each short text includes an event keyword; the short texts that include an event keyword are the short texts to be trained. So that the short texts to be trained can be managed and used, they can be stored in the preset database as a training set for later retrieval.
As shown in FIG. 3, in an embodiment, step S202 may include steps S301 to S302.
Step S301: segment the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained that include event keywords.
The preset text segmentation function may be cut_text. Using this function together with the preset event keywords, the original text can be segmented so that related information is, as far as possible, placed in the same short text, and the short texts that include a preset event keyword are taken as the short texts to be trained.
As shown in FIG. 4, in an embodiment, step S301 may include steps S401 to S403.
Step S401: segment the original text according to the preset text segmentation function to obtain a plurality of original subtexts.
Specifically, the management server can segment the original text according to the preset text segmentation function to obtain multiple shorter original subtexts. Because the original text may be long and contain much useless information, the original subtexts need to be distinguished and identified.
Step S402: determine whether the original subtext includes a preset event keyword.
The management server can determine whether the original subtext includes a preset event keyword, thereby screening the original subtexts.
Step S403: if the original subtext includes a preset event keyword, determine the original subtext to be a short text to be trained.
When the original subtext includes a preset event keyword, the management server can determine that original subtext to be a short text to be trained, thereby classifying and segmenting the original text.
Furthermore, the management server can determine, from the event keyword, the event described in the short text to be trained, to facilitate subsequent processing.
In a further embodiment, step S301 further includes the following step:
Step S404: if the original subtext does not include a preset event keyword, delete the original subtext.
If the original subtext does not include a preset event keyword, it can be deleted, which yields more accurate training data, that is, more suitable short texts to be trained.
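Steps S401 to S404 can be sketched as a split-then-filter routine. The application names the segmentation function cut_text but does not describe its behavior, so the `cut_text` below is a hypothetical stand-in that simply splits at sentence-ending punctuation; `select_short_texts` and the plain substring keyword test are likewise assumptions made for illustration.

```python
import re
from typing import Iterable, List

# Split right after Chinese sentence-ending punctuation, or after Western
# punctuation that is followed by whitespace (so "1.5" is not split).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？；])|(?<=[.!?;])\s+")


def cut_text(text: str) -> List[str]:
    """Hypothetical stand-in for the preset segmentation function (S401)."""
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]


def select_short_texts(original_text: str, event_keywords: Iterable[str]) -> List[str]:
    """Keep only the subtexts that contain a preset event keyword."""
    keywords = list(event_keywords)
    kept = []
    for subtext in cut_text(original_text):                  # S401
        if any(keyword in subtext for keyword in keywords):  # S402
            kept.append(subtext)                             # S403
        # S404: a subtext without any keyword is dropped
    return kept
```

For instance, calling `select_short_texts` on a three-sentence article in which only the first and last sentences mention a keyword keeps those two sentences and discards the middle one. A segmentation function that groups related sentences together, as the embodiment prefers, would need more than punctuation-based splitting.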
Step S302: store the obtained short texts to be trained in the preset database as a training set.
For ease of lookup and use, the management server can store the obtained short texts to be trained in the preset database as a training set.
Step S203: if a model training instruction is received, retrieve the short texts to be trained from the preset database and annotate them to determine the sentiment trigger word, subject, and event included in each text to be trained.
If the management server receives a model training instruction sent by a user, it can retrieve the short texts to be trained from the preset database and annotate them, thereby determining the sentiment trigger word, subject, and event included in each text to be trained. The annotation may be performed manually or automatically according to relevant annotation instructions; this embodiment places no limitation on this.
Step S204: train a preset first neural network with the annotated texts to be trained to obtain the argument extraction model.
The management server can obtain the annotated texts to be trained and use them to train the first neural network, thereby obtaining a trained argument extraction model.
As shown in FIG. 5, in an embodiment, step S204 may include steps S501 to S502.
Step S501: obtain vectors of the annotated texts to be trained using BERT encoding.
The management server can also use BERT encoding to obtain vectors of the annotated texts to be trained, for use in the subsequent training steps.
Step S502: input the vectors of the annotated texts to be trained, together with the sentiment trigger words, subjects, and events included in the annotated texts, into the preset first neural network for training to obtain the argument extraction model.
The management server can input the vectors of the annotated texts to be trained and the sentiment trigger words, subjects, and events included in those texts into the preset first neural network to train it; training the first neural network yields the argument extraction model.
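The application does not say how the annotated sentiment trigger words, subjects, and events are presented to the first neural network as training targets. One common choice for span extraction, assumed here purely for illustration, is a per-token BIO tag sequence aligned with the encoder's tokens. The sketch below uses whitespace tokens in place of a BERT tokenizer, and the role names `SUBJ`, `EVENT`, `TRIG` and the function `bio_tags` are hypothetical.

```python
from typing import Dict, List


def bio_tags(tokens: List[str], spans: Dict[str, List[str]]) -> List[str]:
    """Tag tokens as B-ROLE / I-ROLE / O from annotated role phrases.

    `spans` maps a role name to the annotated phrase as a token list.
    Only the first occurrence of each phrase is tagged.
    """
    tags = ["O"] * len(tokens)
    for role, phrase in spans.items():
        n = len(phrase)
        if n == 0:
            continue  # nothing annotated for this role
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase:
                tags[i] = f"B-{role}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{role}"
                break
    return tags
```

With a real BERT tokenizer the tags would have to be aligned to subword tokens rather than whitespace tokens; that alignment is omitted here.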
As shown in FIG. 6, if each event has been mapped in advance to corresponding sentiment labels through its associated sentiment trigger words, the method further includes steps S601 to S603 before step S101.
Step S601: obtain the sentiment trigger words, subjects, and events included in the annotated texts to be trained.
The management server can obtain the sentiment trigger words, subjects, and events included in the annotated texts to be trained.
Step S602: determine the sentiment label of the text to be trained according to the obtained sentiment trigger word and event.
Each event is associated with sentiment trigger words in advance, and the sentiment label to which a sentiment trigger word maps differs from event to event. The management server can therefore determine the sentiment label of the text to be trained from the obtained sentiment trigger word and event.
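Steps S601 and S602 amount to deriving a training label for each annotated text from its (event, trigger word) pair. The sketch below shows one way this could be done; the record layout, the tuple-keyed `LABEL_MAP`, the function `label_samples`, and the decision to skip records with no mapped label are all assumptions, while the two mapped pairs follow the Company A and Company B examples of this embodiment.

```python
from typing import Dict, List, Tuple

# (event, trigger word) -> sentiment label; entries follow the examples in the text.
LABEL_MAP: Dict[Tuple[str, str], str] = {
    ("stock price", "surged"): "general positive",
    ("annual loss", "decreased"): "general positive",
}


def label_samples(annotated: List[dict],
                  label_map: Dict[Tuple[str, str], str] = LABEL_MAP) -> List[dict]:
    """Attach a sentiment label to each annotated record (S601 to S602).

    Records whose (event, trigger) pair has no mapped label are skipped;
    this skipping behaviour is an assumption, not stated in the text.
    """
    labeled = []
    for record in annotated:
        label = label_map.get((record["event"], record["trigger"]))
        if label is not None:
            labeled.append({**record, "label": label})
    return labeled
```

The labeled records produced this way would then serve as the supervised examples for training the second neural network.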
Step S603: train a second neural network with the determined sentiment labels and the annotated texts to be trained to obtain the sentiment determination model.
The management server can train the second neural network with the determined sentiment labels and the annotated texts to be trained, thereby obtaining the sentiment determination model.
In summary, the embodiments of the present application can effectively extract the arguments in a text and thereby accurately determine sentiment labels, which improves the user experience. They can also be applied in scenarios such as smart government affairs, thereby promoting the construction of smart cities.
Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
Referring to FIG. 7, corresponding to the event-based sentiment analysis method described above, an embodiment of the present application further provides an event-based sentiment analysis apparatus. The apparatus 100 includes a request parsing unit 101, a preprocessing unit 102, an argument extraction unit 103, and a sentiment determination unit 104.
The request parsing unit 101 is configured to, if a sentiment analysis request is received, parse the sentiment analysis request to obtain an initial text.
In this embodiment, if the management server receives a sentiment analysis request, it may parse the request to obtain the initial text. The initial text may be text pre-stored in a database, or text crawled from an external server by a web crawler.
To perform sentiment analysis, sentiment keywords and events usually need to be determined, and the management server may define the events. For the event system, for example, an enterprise's public opinion events are generally closely tied to its business operation logic and thus form a complete event system. Based on the similarity and correlation of events in corporate public opinion, a three-level event system related to public opinion sentiment can be designed. The first level may include eight aspects in total: finance, personnel, operations, capital events, personnel, compliance, credit, and others. Each aspect contains a series of specific events that form the second-level labels, and further subdividing the second-level labels yields a total of 110 specific events.
The sentiment system may include sentiment labels such as major negative, general negative, neutral, general positive, and major positive. For each specific event, the relevant sentiment trigger words can be summarized, and a mapping from sentiment trigger words of different directions to the sentiment label system can be designed. For example, for the event "individual stock market performance", the sentiment trigger word set {"rise", "surge", "fall", "plunge", …} can be summarized, and the associated sentiment labels can be mapped as {"rise": general positive, "plunge": major negative, …}.
The preprocessing unit 102 is configured to preprocess the acquired initial text to obtain a plurality of short texts to be analyzed that include event keywords, where different event keywords are associated with corresponding events.
In this embodiment, the management server can also preprocess the acquired initial text. The initial text may be a news article related to corporate public opinion that covers multiple events; such an article can be very long and often contains many sentences unrelated to the target event, so the initial text needs to be preprocessed accordingly. For example, the initial text is split into several short texts and the short texts are screened: short texts that contain no event-related information are excluded, and only short texts that include an event keyword are determined to be short texts to be analyzed. That is, every short text to be analyzed includes an event keyword.
Generally, different events may be associated with different event keywords. Specifically, the event keywords of each event can be summarized from text in current online news. An event may be associated with one event keyword or with multiple event keywords.
The argument extraction unit 103 is configured to input all the short texts to be analyzed into a preset argument extraction model to determine the sentiment trigger word, subject, and event of each short text to be analyzed, where different events are associated with different sentiment trigger words.
In this embodiment, the management server can input all the short texts to be analyzed into the preset argument extraction model, which can accurately identify the sentiment trigger words, subjects, and events in each short text; that is, the model's analysis determines the key information in the short text to be analyzed. As an optional embodiment, the argument extraction model can also identify elements such as time and place contained in the short text, so that the sentiment label of the short text can be determined more precisely.
The preset argument extraction model may be a model obtained by training a neural network on training data. It can improve the efficiency and accuracy with which the sentiment trigger words, subjects, and events of the text to be analyzed are determined, and can more effectively analyze text that involves multiple subjects and multiple events.
The sentiment determination unit 104 is configured to input the determined sentiment trigger word, subject, and event of each short text to be analyzed into a preset sentiment determination model to obtain the sentiment label corresponding to the subject of each short text to be analyzed.
In this embodiment, the management server can input each short text to be analyzed, with its sentiment trigger word, subject, and event determined, into the sentiment determination model to obtain the sentiment label corresponding to the subject of that short text. For example, in the text "Company A's current stock price is ** yuan; the stock price has surged", the subject is Company A, the event is the stock price, and "surged" is the sentiment trigger word, so the sentiment label for this text may be general positive. As another example, in the text "Company B's current annual loss is ** yuan; the annual loss has decreased", the subject is Company B, the event is the annual loss, and "decreased" is the sentiment trigger word, so the label is general positive. The preset sentiment determination model may be a model obtained by training a neural network on training data, and can determine the sentiment label of the text to be analyzed more precisely.
It follows that the preset argument extraction model and the preset sentiment determination model together allow the text to be analyzed to be labeled with sentiment labels accurately and efficiently, improving the user experience.
As shown in FIG. 8, in an embodiment, the apparatus 100 further includes, before the request parsing unit 101, a text crawling unit 201, a data acquisition unit 202, a data annotation unit 203, and a first training unit 204.
The text crawling unit 201 is configured to crawl original text with a web crawler.
To enable fast sentiment labeling of text, the management server can acquire and process training data to train the relevant neural networks and thereby obtain the models used for sentiment labeling. Typically, the management server can use a web crawler to extract a large amount of news related to corporate public opinion from relevant news websites; the news obtained is the original text to be processed.
The data acquisition unit 202 is configured to preprocess the acquired original text to obtain a plurality of short texts to be trained that include event keywords, and to store the obtained short texts to be trained in a preset database as a training set.
The management server can also preprocess the acquired original text, that is, split the original text into multiple short texts and check whether each short text includes an event keyword; the short texts that include an event keyword are the short texts to be trained. So that the short texts to be trained can be managed and used, they can be stored in the preset database as a training set for later retrieval.
As shown in FIG. 9, in an embodiment, the data acquisition unit 202 may include a text segmentation unit 301 and a text storage unit 302.
The text segmentation unit 301 is configured to segment the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained that include event keywords.
The preset text segmentation function may be cut_text. Using this function together with the preset event keywords, the original text can be segmented so that related information is, as far as possible, placed in the same short text, and the short texts that include a preset event keyword are taken as the short texts to be trained.
As shown in FIG. 10, in an embodiment, the text segmentation unit 301 may include a text processing unit 401, a text judgment unit 402, and a text determination unit 403.
The text processing unit 401 is configured to segment the original text according to the preset text segmentation function to obtain a plurality of original subtexts.
Specifically, the management server can segment the original text according to the preset text segmentation function to obtain multiple shorter original subtexts. Because the original text may be long and contain much useless information, the original subtexts need to be distinguished and identified.
The text judgment unit 402 is configured to determine whether the original subtext includes a preset event keyword.
The management server can determine whether the original subtext includes a preset event keyword, thereby screening the original subtexts.
The text determination unit 403 is configured to, if the original subtext includes a preset event keyword, determine the original subtext to be a short text to be trained.
When the original subtext includes a preset event keyword, the management server can determine that original subtext to be a short text to be trained, thereby classifying and segmenting the original text.
Furthermore, the management server can determine, from the event keyword, the event described in the short text to be trained, to facilitate subsequent processing.
In a further embodiment, the text segmentation unit 301 further includes the following unit:
The text deletion unit 404 is configured to, if the original subtext does not include a preset event keyword, delete the original subtext.
If the original subtext does not include a preset event keyword, it can be deleted, which yields more accurate training data, that is, more suitable short texts to be trained.
The text storage unit 302 is configured to store the obtained short texts to be trained in the preset database as a training set.
For ease of lookup and use, the management server can store the obtained short texts to be trained in the preset database as a training set.
数据标注单元203,用于若接收到模型训练指令,从所述预设数据库中调取待训练短文本进行标注,以确定每个待训练文本所包括的情感触发词、主体和事件。The data labeling unit 203 is configured to call the short text to be trained from the preset database for labeling if a model training instruction is received, so as to determine the emotional trigger words, subjects and events included in each text to be trained.
其中,管理服务器若接收到用户发送的模型训练指令,可以从预设数据库中调取待训练短文本来进行标注,从而确定每个待训练文本所包括的情感触发词、主体和事件。上述标注可以通过人工标注进行,也可以根据相关的标注指令来进行自动标注,具体的在本实施例中并不作限定。Among them, if the management server receives the model training instruction sent by the user, it can call the short text to be trained from the preset database for labeling, so as to determine the emotional trigger words, subjects and events included in each text to be trained. The above labeling can be performed manually, or automatically according to relevant labeling instructions, which is not specifically limited in this embodiment.
The first training unit 204 is configured to train a preset first neural network with the labeled texts to be trained to obtain an argument extraction model.
The management server can obtain the labeled texts to be trained and use them to train the first neural network, thereby obtaining a trained argument extraction model.
As shown in FIG. 11, in an embodiment, the first training unit 204 may include a vector determination unit 501 and a first model training unit 502.
The vector determination unit 501 is configured to obtain vectors of the labeled texts to be trained by using BERT encoding.
The management server can use BERT encoding to obtain the vectors of the labeled texts to be trained, for use in the subsequent training steps.
The first model training unit 502 is configured to input the vectors of the labeled texts to be trained, together with the sentiment trigger words, subjects, and events included in those texts, into the preset first neural network for training to obtain the argument extraction model.
The management server can input the vectors of the labeled texts to be trained, together with the sentiment trigger words, subjects, and events included in those texts, into the preset first neural network in order to train it. Training the first neural network yields the argument extraction model.
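The application does not state how the labeled trigger words and subjects are presented to the first neural network as training targets. One common option for span extraction, shown here purely as an assumption, is to convert each labeled span into per-token BIO tags that a token-classification layer on top of the BERT vectors could be trained to predict. The function name `bio_tags` and the role names `SUBJ` and `TRIG` are hypothetical, and the BERT encoding itself is omitted because it requires a pretrained model.

```python
def bio_tags(tokens: list[str], spans: dict[str, list[str]]) -> list[str]:
    """Convert labeled spans into BIO tags, one tag per token.

    `spans` maps a role name (for example "SUBJ" or "TRIG") to the token
    sequence labeled with that role. Only the first non-overlapping
    occurrence of each span is tagged; everything else stays "O".
    """
    tags = ["O"] * len(tokens)
    for role, span in spans.items():
        n = len(span)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(t == "O" for t in tags[i:i + n])
            if tokens[i:i + n] == span and window_free:
                tags[i] = f"B-{role}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{role}"
                break
    return tags
```

Under this scheme the event would typically be handled as a separate sentence-level label rather than as a tagged span; that choice is likewise an assumption of the sketch.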
As shown in FIG. 12, if each event is mapped in advance to a corresponding sentiment label through its associated sentiment trigger words, the apparatus 100 further includes, before the request parsing unit 101, a feature acquisition unit 601, a label determination unit 602, and a second training unit 603.
The feature acquisition unit 601 is configured to acquire the sentiment trigger words, subjects, and events included in the labeled texts to be trained.
The management server can acquire the sentiment trigger words, subjects, and events included in the labeled texts to be trained.
The label determination unit 602 is configured to determine the sentiment label of each text to be trained according to the acquired sentiment trigger words and events.
Each event is associated in advance with sentiment trigger words, and, for different events, different sentiment trigger words map to different sentiment labels. The management server can therefore determine the sentiment label of a text to be trained according to the acquired sentiment trigger word and event.
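The pre-established mapping from an event and its sentiment trigger word to a sentiment label can be pictured as a lookup table keyed on the pair. The following Python sketch is illustrative only: the table contents, the label names, and the function name `sentiment_label` are hypothetical and do not come from the application.

```python
from typing import Optional

# Hypothetical preset mapping: (event, sentiment trigger word) -> sentiment label.
# The same trigger word may map to different labels under different events.
LABEL_TABLE = {
    ("layoffs", "cut"): "negative",
    ("cost_reduction", "cut"): "positive",
    ("litigation", "sued"): "negative",
    ("litigation", "won"): "positive",
}


def sentiment_label(event: str, trigger: str) -> Optional[str]:
    """Look up the sentiment label for a labeled text.

    Returns None when the pair has no preset mapping, leaving the caller
    to decide how to handle unlabeled cases.
    """
    return LABEL_TABLE.get((event, trigger))
```

Keying on the pair rather than on the trigger word alone is what lets the label depend on the event in which the trigger word appears.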
The second training unit 603 is configured to train a second neural network with the determined sentiment labels and the labeled texts to be trained to obtain a sentiment determination model.
The management server can train the second neural network with the determined sentiment labels and the labeled texts to be trained, thereby obtaining the sentiment determination model.
It should be noted that those skilled in the art can clearly understand that, for the specific implementation of the event-based sentiment analysis apparatus 100 and of each unit, reference may be made to the corresponding descriptions in the foregoing method embodiments. For convenience and brevity, these are not repeated here.
As can be seen from the above, in terms of hardware implementation, the request parsing unit 101, the preprocessing unit 102, the argument extraction unit 103, the sentiment determination unit 104, and the other units may be embedded in, or be independent of, the event-based sentiment analysis apparatus in hardware form, or may be stored in software form in the memory of the event-based sentiment analysis apparatus so that a processor can invoke them to perform the operations corresponding to each unit. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, or the like.
The event-based sentiment analysis apparatus described above may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 13.
FIG. 13 is a schematic structural diagram of a computer device of the present application. The device may be a server, where the server may be an independent server or a server cluster composed of multiple servers.
Referring to FIG. 13, the computer device 700 includes a processor 702, a memory, and a network interface 705 connected through a system bus 701, where the memory may include a non-volatile storage medium 703 and an internal memory 704.
The non-volatile storage medium 703 can store an operating system 7031 and a computer program 7032. When executed, the computer program 7032 causes the processor 702 to perform an event-based sentiment analysis method.
The processor 702 provides computing and control capabilities and supports the operation of the entire computer device 700.
The internal memory 704 provides an environment for running the computer program 7032 stored in the non-volatile storage medium 703. When the computer program 7032 is executed by the processor 702, it causes the processor 702 to perform an event-based sentiment analysis method.
The network interface 705 is used for network communication with other devices. Those skilled in the art can understand that the structure shown in FIG. 13 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 700 to which the solution is applied. A specific computer device 700 may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
The processor 702 is configured to run the computer program 7032 stored in the memory to implement the steps of the event-based sentiment analysis method described above.
It should be understood that, in the embodiments of the present application, the processor 702 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a storage medium, and the storage medium is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Accordingly, the present application further provides a storage medium. The storage medium may be a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of the event-based sentiment analysis method described above.
The storage medium is a physical, non-transitory storage medium, for example a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other physical storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The steps in the methods of the embodiments of the present application may be reordered, combined, or deleted according to actual needs. The units in the apparatus of the embodiments of the present application may be combined, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
The above are only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and all such modifications or replacements shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (20)

  1. An event-based sentiment analysis method, comprising:
    if a sentiment analysis request is received, parsing the sentiment analysis request to obtain an initial text;
    preprocessing the obtained initial text to obtain a plurality of short texts to be analyzed that include event keywords, wherein different event keywords are associated with corresponding events;
    inputting all of the short texts to be analyzed into a preset argument extraction model to determine the sentiment trigger word, subject, and event of each short text to be analyzed, wherein different events are associated with different sentiment trigger words;
    inputting the determined sentiment trigger word, subject, and event of each short text to be analyzed into a preset sentiment determination model to obtain a sentiment label corresponding to the subject of each short text to be analyzed.
  2. The method according to claim 1, wherein, before the step of parsing the sentiment analysis request to obtain the initial text if the sentiment analysis request is received, the method further comprises:
    crawling original text by means of a web crawler;
    preprocessing the obtained original text to obtain a plurality of short texts to be trained that include event keywords, and storing the obtained short texts to be trained in a preset database as a training set;
    if a model training instruction is received, retrieving the short texts to be trained from the preset database for labeling, so as to determine the sentiment trigger word, subject, and event included in each text to be trained;
    training a preset first neural network with the labeled texts to be trained to obtain the argument extraction model.
  3. The method according to claim 2, wherein the step of preprocessing the obtained original text to obtain a plurality of short texts to be trained that include event keywords, and storing the obtained short texts to be trained in a preset database as a training set, comprises:
    segmenting the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained that include event keywords;
    storing the obtained short texts to be trained in the preset database as a training set.
  4. The method according to claim 3, wherein the step of segmenting the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained that include event keywords comprises:
    segmenting the original text according to the preset text segmentation function to obtain a plurality of original sub-texts;
    determining whether each original sub-text includes a preset event keyword;
    if the original sub-text includes a preset event keyword, determining the original sub-text to be a short text to be trained.
  5. The method according to claim 4, wherein the method further comprises:
    if the original sub-text does not include a preset event keyword, deleting the original sub-text.
  6. The method according to claim 2, wherein the step of training a preset first neural network with the labeled texts to be trained to obtain the argument extraction model comprises:
    obtaining vectors of the labeled texts to be trained by using BERT encoding;
    inputting the vectors of the labeled texts to be trained, together with the sentiment trigger words, subjects, and events included in the labeled texts to be trained, into the preset first neural network for training to obtain the argument extraction model.
  7. The method according to claim 2, wherein, if each event is mapped in advance to a corresponding sentiment label through its associated sentiment trigger words, the method further comprises:
    acquiring the sentiment trigger words, subjects, and events included in the labeled texts to be trained;
    determining the sentiment label of each text to be trained according to the acquired sentiment trigger words and events;
    training a second neural network with the determined sentiment labels and the labeled texts to be trained to obtain the sentiment determination model.
  8. An event-based sentiment analysis apparatus, comprising:
    a request parsing unit, configured to, if a sentiment analysis request is received, parse the sentiment analysis request to obtain an initial text;
    a preprocessing unit, configured to preprocess the obtained initial text to obtain a plurality of short texts to be analyzed that include event keywords, wherein different event keywords are associated with corresponding events;
    an argument extraction unit, configured to input all of the short texts to be analyzed into a preset argument extraction model to determine the sentiment trigger word, subject, and event of each short text to be analyzed, wherein different events are associated with different sentiment trigger words;
    a sentiment determination unit, configured to input the determined sentiment trigger word, subject, and event of each short text to be analyzed into a preset sentiment determination model to obtain a sentiment label corresponding to the subject of each short text to be analyzed.
  9. A computer device, wherein the computer device comprises a memory and a processor, a computer program is stored in the memory, and the processor implements the following steps when executing the computer program:
    if a sentiment analysis request is received, parsing the sentiment analysis request to obtain an initial text;
    preprocessing the obtained initial text to obtain a plurality of short texts to be analyzed that include event keywords, wherein different event keywords are associated with corresponding events;
    inputting all of the short texts to be analyzed into a preset argument extraction model to determine the sentiment trigger word, subject, and event of each short text to be analyzed, wherein different events are associated with different sentiment trigger words;
    inputting the determined sentiment trigger word, subject, and event of each short text to be analyzed into a preset sentiment determination model to obtain a sentiment label corresponding to the subject of each short text to be analyzed.
  10. The computer device according to claim 9, wherein, before the step of parsing the sentiment analysis request to obtain the initial text if the sentiment analysis request is received, the steps further comprise:
    crawling original text by means of a web crawler;
    preprocessing the obtained original text to obtain a plurality of short texts to be trained that include event keywords, and storing the obtained short texts to be trained in a preset database as a training set;
    if a model training instruction is received, retrieving the short texts to be trained from the preset database for labeling, so as to determine the sentiment trigger word, subject, and event included in each text to be trained;
    training a preset first neural network with the labeled texts to be trained to obtain the argument extraction model.
  11. The computer device according to claim 10, wherein the step of preprocessing the obtained original text to obtain a plurality of short texts to be trained that include event keywords, and storing the obtained short texts to be trained in a preset database as a training set, comprises:
    segmenting the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained that include event keywords;
    storing the obtained short texts to be trained in the preset database as a training set.
  12. The computer device according to claim 11, wherein the step of segmenting the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained that include event keywords comprises:
    segmenting the original text according to the preset text segmentation function to obtain a plurality of original sub-texts;
    determining whether each original sub-text includes a preset event keyword;
    if the original sub-text includes a preset event keyword, determining the original sub-text to be a short text to be trained.
  13. The computer device according to claim 12, wherein the method further comprises:
    if the original sub-text does not include a preset event keyword, deleting the original sub-text.
  14. The computer device according to claim 10, wherein the step of training a preset first neural network with the labeled texts to be trained to obtain the argument extraction model comprises:
    obtaining vectors of the labeled texts to be trained by using BERT encoding;
    inputting the vectors of the labeled texts to be trained, together with the sentiment trigger words, subjects, and events included in the labeled texts to be trained, into the preset first neural network for training to obtain the argument extraction model.
  15. The computer device according to claim 10, wherein, if each event is mapped in advance to a corresponding sentiment label through its associated sentiment trigger words, the processor further implements:
    acquiring the sentiment trigger words, subjects, and events included in the labeled texts to be trained;
    determining the sentiment label of each text to be trained according to the acquired sentiment trigger words and events;
    training a second neural network with the determined sentiment labels and the labeled texts to be trained to obtain the sentiment determination model.
  16. A computer-readable storage medium, wherein the storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following steps:
    if a sentiment analysis request is received, parsing the sentiment analysis request to obtain an initial text;
    preprocessing the obtained initial text to obtain a plurality of short texts to be analyzed that include event keywords, wherein different event keywords are associated with corresponding events;
    inputting all of the short texts to be analyzed into a preset argument extraction model to determine the sentiment trigger word, subject, and event of each short text to be analyzed, wherein different events are associated with different sentiment trigger words;
    inputting the determined sentiment trigger word, subject, and event of each short text to be analyzed into a preset sentiment determination model to obtain a sentiment label corresponding to the subject of each short text to be analyzed.
  17. The computer-readable storage medium according to claim 16, wherein, before the step of parsing the sentiment analysis request to obtain the initial text if the sentiment analysis request is received, the steps further comprise:
    crawling original text by means of a web crawler;
    preprocessing the obtained original text to obtain a plurality of short texts to be trained that include event keywords, and storing the obtained short texts to be trained in a preset database as a training set;
    if a model training instruction is received, retrieving the short texts to be trained from the preset database for labeling, so as to determine the sentiment trigger word, subject, and event included in each text to be trained;
    training a preset first neural network with the labeled texts to be trained to obtain the argument extraction model.
  18. The computer-readable storage medium according to claim 17, wherein the step of preprocessing the obtained original text to obtain a plurality of short texts to be trained that include event keywords, and storing the obtained short texts to be trained in a preset database as a training set, comprises:
    segmenting the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained that include event keywords;
    storing the obtained short texts to be trained in the preset database as a training set.
  19. The computer-readable storage medium according to claim 18, wherein the step of segmenting the original text according to a preset text segmentation function and preset event keywords to obtain a plurality of short texts to be trained that include event keywords comprises:
    segmenting the original text according to the preset text segmentation function to obtain a plurality of original sub-texts;
    determining whether each original sub-text includes a preset event keyword;
    if the original sub-text includes a preset event keyword, determining the original sub-text to be a short text to be trained.
  20. The computer-readable storage medium according to claim 19, wherein the processor further performs the following step:
    if the original sub-text does not include a preset event keyword, deleting the original sub-text.
PCT/CN2022/072045 2021-06-25 2022-01-14 Event-based sentiment analysis method and apparatus, and computer device and storage medium WO2022267460A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110712428.9 2021-06-25
CN202110712428.9A CN113434631B (en) 2021-06-25 2021-06-25 Emotion analysis method and device based on event, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022267460A1 true WO2022267460A1 (en) 2022-12-29

Family

ID=77754534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072045 WO2022267460A1 (en) 2021-06-25 2022-01-14 Event-based sentiment analysis method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN113434631B (en)
WO (1) WO2022267460A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434631B (en) * 2021-06-25 2023-10-13 平安科技(深圳)有限公司 Emotion analysis method and device based on event, computer equipment and storage medium
CN114065763A (en) * 2021-11-24 2022-02-18 深圳前海环融联易信息科技服务有限公司 Event extraction-based public opinion analysis method and device and related components

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200065383A1 (en) * 2018-08-24 2020-02-27 S&P Global Inc. Sentiment Analysis
CN112784580A (en) * 2021-01-25 2021-05-11 中国工商银行股份有限公司 Financial data analysis method and device based on event extraction
CN112860852A (en) * 2021-01-26 2021-05-28 北京金堤科技有限公司 Information analysis method and device, electronic equipment and computer readable storage medium
CN113434631A (en) * 2021-06-25 2021-09-24 平安科技(深圳)有限公司 Emotion analysis method and device based on event, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN110442857B (en) * 2019-06-18 2024-05-10 平安科技(深圳)有限公司 Emotion intelligent judging method and device and computer readable storage medium
CN110705300A (en) * 2019-09-27 2020-01-17 上海烨睿信息科技有限公司 Emotion analysis method, emotion analysis system, computer terminal and storage medium
CN112632225B (en) * 2020-12-29 2022-08-30 天津汇智星源信息技术有限公司 Semantic searching method and device based on case and event knowledge graph and electronic equipment

Also Published As

Publication number Publication date
CN113434631A (en) 2021-09-24
CN113434631B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
US20200184307A1 (en) Utilizing recurrent neural networks to recognize and extract open intent from text inputs
WO2022267460A1 (en) Event-based sentiment analysis method and apparatus, and computer device and storage medium
WO2021051560A1 (en) Text classification method and apparatus, electronic device, and computer non-volatile readable storage medium
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
WO2021003810A1 (en) Service system update method, electronic device and readable storage medium
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
CN111460787A (en) Topic extraction method and device, terminal device and storage medium
CN108664595B (en) Domain knowledge base construction method and device, computer equipment and storage medium
WO2018201600A1 (en) Information mining method and system, electronic device and readable storage medium
TW202020691A (en) Feature word determination method and device and server
CN111444723A (en) Information extraction model training method and device, computer equipment and storage medium
WO2023040493A1 (en) Event detection
WO2021143206A1 (en) Single-statement natural language processing method and apparatus, computer device, and readable storage medium
WO2021159656A1 (en) Method, device, and equipment for semantic completion in a multi-round dialogue, and storage medium
JPWO2012147428A1 (en) Text clustering apparatus, text clustering method, and program
US20210073257A1 (en) Logical document structure identification
CN110991163A (en) Document comparison analysis method and device, electronic equipment and storage medium
WO2021063089A1 (en) Rule matching method, rule matching apparatus, storage medium and electronic device
CN110738055A (en) Text entity identification method, text entity identification equipment and storage medium
WO2023124647A1 (en) Summary determination method and related device thereof
US11397756B2 (en) Data archiving method and computing device implementing same
CN113704420A (en) Method and device for identifying role in text, electronic equipment and storage medium
CN110263345B (en) Keyword extraction method, keyword extraction device and storage medium
CN114329112A (en) Content auditing method and device, electronic equipment and storage medium
WO2022073341A1 (en) Disease entity matching method and apparatus based on voice semantics, and computer device

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22826971
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 22826971
Country of ref document: EP
Kind code of ref document: A1