WO2022141861A1

WO2022141861A1 - Emotion classification method and apparatus, electronic device, and storage medium

Info

Publication number: WO2022141861A1
Application number: PCT/CN2021/083713
Authority: WO
Inventors: 何友鑫; 彭琛; 汪伟
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-12-31
Filing date: 2021-03-30
Publication date: 2022-07-07
Also published as: CN112732915A

Abstract

An emotion classification method, comprising: acquiring original text data, and performing text preprocessing on the original text data to obtain an initial word set (S1); performing encoding processing on the initial word set to obtain an integer code, and performing vectorization processing on the initial word set according to the integer code to obtain a standard word vector set (S2); performing bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set (S3); performing screening processing on the semantic word vector set using a preset long short-term memory network to obtain a target text sequence, performing probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyzing the probability value to obtain an emotion classification result (S4). In addition, the present invention further relates to blockchain technology, and the initial word set can be stored in a node of a blockchain. Further provided are an emotion classification apparatus, an electronic device, and a computer-readable storage medium, capable of solving the problem of low accuracy in emotion classification.

Description

Emotion classification method, device, electronic device and storage medium

This application claims priority to the Chinese patent application No. CN202011640369.0, titled "Emotion Classification Method, Device, Electronic Device and Storage Medium", filed with the China Patent Office on December 31, 2020, the entire content of which is incorporated herein by reference.

Technical Field

The present application relates to the technical field of intelligent decision-making, and in particular, to an emotion classification method, apparatus, electronic device, and computer-readable storage medium.

Background

With the continuous rise of social networks, the Internet has become not only a source of daily information, but also an indispensable platform for people to express their opinions. People commenting on hot events, posting movie reviews, and describing product experiences in online communities generate a large amount of textual information with emotional color, and the analysis of such text has drawn an increasing level of attention.

The inventor realizes that existing sentiment classification methods are based on traditional machine learning methods, which cannot extract deeper contextual semantics and structural features. This results in incomplete keyword extraction and thereby reduces the accuracy of sentiment classification.

SUMMARY OF THE INVENTION

A sentiment classification method including:

Obtain original text data, and perform text preprocessing on the original text data to obtain an initial word set;

Encoding the initial word set to obtain an integer code, and performing vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set;

Use a preset text training model to perform bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set;

Use a preset long short-term memory network to screen the semantic word vector set to obtain a target text sequence, perform probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyze the probability value to obtain a sentiment classification result.

An emotion classification device, the device includes:

a text preprocessing module, used to obtain original text data, perform text preprocessing on the original text data, and obtain an initial word set;

A vectorization module, configured to perform encoding processing on the initial word set to obtain an integer code, and perform vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set;

a bidirectional semantic module, used for performing bidirectional semantic processing on the standard word vector set by using a preset text training model to obtain a semantic word vector set;

a classification module, used for screening the semantic word vector set by using a preset long short-term memory network to obtain a target text sequence, performing probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyzing the probability value to obtain a sentiment classification result.

An electronic device comprising:

a memory that stores at least one instruction; and

A processor that executes the instructions stored in the memory to achieve the following steps:

A computer-readable storage medium having at least one instruction stored in the computer-readable storage medium, the at least one instruction being executed by a processor in an electronic device to implement the following steps:

This application can solve the problem of low accuracy of sentiment classification.

Description of Drawings

FIG. 1 is a schematic flowchart of an emotion classification method provided by an embodiment of the present application;

FIG. 2 is a functional block diagram of an emotion classification device provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an electronic device implementing the emotion classification method according to an embodiment of the present application.

The realization, functional characteristics and advantages of the purpose of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

The embodiment of the present application provides an emotion classification method. The execution subject of the emotion classification method includes, but is not limited to, at least one of electronic devices that can be configured to execute the method provided by the embodiments of the present application, such as a server, a terminal, and the like. In other words, the emotion classification method can be executed by software or hardware installed in a terminal device or a server device, and the software can be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.

Referring to FIG. 1 , a schematic flowchart of an emotion classification method provided by an embodiment of the present application is shown. In this embodiment, the emotion classification method includes:

S1. Obtain original text data, and perform text preprocessing on the original text data to obtain an initial word set.

In this embodiment of the present application, the original text data may be chapter-level text.

For example, the original text data is real estate-related news articles. Specifically, the news articles can be crawled from real estate-related news sites using Python.

In other embodiments of the present application, the original text data may also be news articles in other fields, for example, news articles related to e-commerce, news articles related to the medical field.

Specifically, performing text preprocessing on the original text data to obtain an initial set of words, including:

Extracting key sentences in the original text data to obtain a key sentence set;

Performing stop word removal processing on the key sentence set to obtain a stop-removing sentence set;

A word segmentation process is performed on the stop-removing sentence set to obtain an initial word set.

Preferably, the key sentences in the original text data include at least two of the title, the first sentence, the last sentence, and the middle key sentence in the original text data.

The middle key sentence may be the sentence following a conjunction; for example, if the conjunction "then" is detected, the sentence after the conjunction is used as a key sentence. For example, the key sentences in the original text data are the first sentence, the last sentence, and the middle key sentences of real estate-related news articles.

Specifically, the process of removing stop words is to use a preset stop word table to remove words that have no actual meaning in the key sentences in the key sentence set. For example, words such as "ah" and "de" in each key sentence in the key sentence set are deleted.

Wherein, the stop word table may be the obtained "HIT stop word database" and "Sichuan University machine learning intelligent laboratory stop word database", or the stop word table may also be preset.

Further, in one embodiment of the present application, the Jieba tool may be used to perform word segmentation on each sentence in the stop-removing sentence set, splitting each sentence into multiple words to obtain the initial word set.
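The preprocessing of step S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the stop word table, the conjunction list, the sentence splitter, and the regex-based `segment` function (a stand-in for a real segmenter such as Jieba) are all hypothetical, and English text is used for readability.

```python
import re

STOP_WORDS = {"ah", "the", "a", "of"}   # hypothetical stop word table
CONJUNCTIONS = ("then",)                # hypothetical conjunction list

def split_sentences(text):
    """Split text into sentences on terminal punctuation (simplified)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def extract_key_sentences(title, body):
    """Collect the title, first sentence, middle key sentences, and last sentence."""
    sentences = split_sentences(body)
    keys = [title]
    if sentences:
        keys.append(sentences[0])
    for sentence in sentences[1:-1]:
        for conj in CONJUNCTIONS:
            # Keep the text that follows the conjunction as a middle key sentence.
            match = re.search(rf"\b{re.escape(conj)}\b(.*)", sentence, flags=re.IGNORECASE)
            if match and match.group(1).strip():
                keys.append(match.group(1).strip())
                break
    if len(sentences) > 1:
        keys.append(sentences[-1])
    return keys

def remove_stop_words(sentence):
    """Drop words that appear in the stop word table."""
    return " ".join(w for w in sentence.split() if w.lower() not in STOP_WORDS)

def segment(sentence):
    """Stand-in word segmenter; a real system might call a tool such as Jieba here."""
    return re.findall(r"\w+", sentence.lower())

def preprocess(title, body):
    """Key sentence extraction, stop word removal, then word segmentation."""
    words = []
    for sentence in extract_key_sentences(title, body):
        words.extend(segment(remove_stop_words(sentence)))
    return words

initial_word_set = preprocess(
    "House prices rise",
    "The market opened. Demand grew, then the prices rose. Analysts expect a pause.",
)
```

The order of operations follows the text: key sentences are extracted first, stop words are removed from each key sentence, and only then is each sentence split into words.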

S2. Perform encoding processing on the initial word set to obtain an integer code, and perform vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set.

In the embodiment of the present application, the encoding process performed on the initial word set to obtain an integer code includes:

determining a categorical variable for each initial word in the initial word set;

Encoding and naming processing is performed on the categorical variables to obtain integer codes.

Wherein, the classification variable of the initial word refers to the category to which the initial word belongs, and determining the classification variable of the initial word set is to analyze the category to which the initial word in the initial word set belongs.

Specifically, in the embodiment of the present application, the encoding of the categorical variables assigns an integer identifier to each distinct category. For example, the first categorical variable is identified as 0, the second categorical variable is identified as 1, and the third categorical variable is identified as 2.

For example, the initial word set includes ["house price", "rising", "falling", "rising"], and it is determined that the categorical variables in the initial word set are house price, rising and falling, a total of three categories. Integer encoding is performed on the categorical variables, so that house price is 0, rising is 1, and falling is 2, and the standard word vector set can be obtained as [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0].
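The integer encoding in this example can be sketched as below. The function name `integer_encode` is hypothetical, and assigning codes in order of first appearance is an assumption that happens to reproduce the numbering in the example.

```python
def integer_encode(words):
    """Map each distinct word (category) to an integer, in order of first appearance."""
    codes = {}
    for word in words:
        # setdefault only assigns a new code the first time a word is seen.
        codes.setdefault(word, len(codes))
    return codes

initial_word_set = ["house price", "rising", "falling", "rising"]
codes = integer_encode(initial_word_set)
encoded = [codes[word] for word in initial_word_set]
```

Here the repeated word "rising" receives the same code both times, which is what makes the later vectors for the second and fourth words identical.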

Further, the initial word set is vectorized according to the integer code to obtain a standard word vector set, including:

Select any target point in the two-dimensional Cartesian coordinate system;

The initial words in the initial word set are arranged vertically with the target point as the benchmark, and the categorical variables are arranged horizontally in the order of the integer coding based on the target point;

If, at an intersection, the word corresponding to the horizontally arranged categorical variable is the same as the vertically arranged initial word, set the intersection to a first value; if the two words are not the same, set the intersection to a second value, thereby obtaining the result matrix;

A vector formed by the first numerical value and the second numerical value in the result matrix is extracted to obtain a standard word vector set.

For example, the result matrix is

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Values are then extracted from the result matrix in row or column order to obtain multiple vectors [1, 0, 0], [0, 1, 0], [0, 0, 1], and the standard word vector set is composed of these vectors.
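The construction of the result matrix can be sketched as follows, assuming the first value is 1 and the second value is 0 (the text does not fix these values). The function name `result_matrix` is hypothetical; rows correspond to the vertically arranged initial words and columns to the categorical variables in integer-code order.

```python
def result_matrix(initial_words, categories, first_value=1, second_value=0):
    """Compare every initial word (row) with every category (column)."""
    return [
        [first_value if word == category else second_value for category in categories]
        for word in initial_words
    ]

initial_words = ["house price", "rising", "falling", "rising"]
categories = ["house price", "rising", "falling"]  # ordered by integer code 0, 1, 2

matrix = result_matrix(initial_words, categories)
# Extracting the matrix row by row gives one vector per initial word.
standard_word_vectors = [list(row) for row in matrix]
```

Each row contains exactly one first value, at the column whose index equals the integer code of that word.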

S3. Use a preset text training model to perform bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set.

In the embodiment of the present application, a preset text training model is used to perform bidirectional semantic processing on the standard word vector set, wherein the structure of the text training model consists of a three-layer Bi-LSTM (Bi-directional Long Short-Term Memory, bidirectional long short-term memory) network.

Specifically, performing bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set, including:

obtaining multiple target vectors in the standard word vector set;

calculating a plurality of forward vectors and a plurality of backward vectors of the plurality of target vectors;

Calculate by using a preset bidirectional semantic calculation formula, the plurality of forward vectors and the plurality of backward vectors, to obtain a plurality of semantic word vectors of the plurality of target vectors;

Summarize the plurality of semantic word vectors to obtain the semantic word vector set.

In detail, the forward vector formula and the backward vector formula involve the following quantities: the forward vector; the backward vector; the first variables in the Bi-LSTM network; the previous word of the forward vector; and the previous word of the backward vector.

Specifically, the bidirectional semantic calculation formula yields the semantic word vector $h_t$ from the forward vector and the backward vector, where $U$ is the second variable in the Bi-LSTM network and $c$ is the preset parameter.

In detail, since the text training model contains three layers of Bi-LSTM networks, the same standard word vector can be extracted layer by layer in the text training model into three word vectors, which are added as new features to subsequent tasks to participate in training, so as to realize the dynamic update of the word vector. The input of the first Bi-LSTM layer is a standard word vector, and the input of the second and third Bi-LSTM layers is the word vector output at the corresponding position of the previous layer. As the network depth increases, the syntactic and semantic information contained in the word vector becomes richer.
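The data flow of a three-layer bidirectional model can be sketched in plain Python as below. This is only an illustration of the stacking and of combining forward and backward passes: a simple element-wise tanh recurrence stands in for a full LSTM cell, and the weights `w_in`, `w_rec`, the combination `u * (forward + backward) + c`, and all function names are assumptions rather than the formulas of the application.

```python
import math

def recurrent_pass(xs, w_in, w_rec, reverse=False):
    """Run a simple tanh recurrence over the sequence in one direction."""
    order = reversed(range(len(xs))) if reverse else range(len(xs))
    h = [0.0] * len(xs[0])
    outputs = [None] * len(xs)
    for t in order:
        h = [math.tanh(w_in * x + w_rec * hp) for x, hp in zip(xs[t], h)]
        outputs[t] = h
    return outputs

def bidirectional_layer(xs, w_in=0.5, w_rec=0.5, u=1.0, c=0.0):
    """Combine forward and backward states position by position."""
    forward = recurrent_pass(xs, w_in, w_rec)
    backward = recurrent_pass(xs, w_in, w_rec, reverse=True)
    return [
        [u * (f + b) + c for f, b in zip(fv, bv)]
        for fv, bv in zip(forward, backward)
    ]

def stacked_bidirectional(xs, num_layers=3):
    """Feed each layer's output to the next and keep every layer's output."""
    layer_outputs = []
    current = xs
    for _ in range(num_layers):
        current = bidirectional_layer(current)
        layer_outputs.append(current)
    return layer_outputs

layers = stacked_bidirectional([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

Each entry of `layers` holds one vector per input position, so a given standard word vector yields three vectors, one per layer, matching the layer-by-layer extraction described above.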

S4. Use a preset long short-term memory network to screen the semantic word vector set to obtain a target text sequence, perform probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyze the probability value to obtain the sentiment classification result.

In the embodiments of the present application, the LSTM network (Long Short-Term Memory, long short-term memory network) is a time recurrent neural network, including: an input gate, a forget gate, and an output gate.

Specifically, using the preset long short-term memory network to screen the semantic word vector set to obtain the target text sequence includes:

Step A: Calculate the state value of the semantic word vector in the semantic word vector set through the input gate;

Step B: Calculate the activation value of the semantic word vector in the semantic word vector set through the forget gate;

Step C: Calculate the state update value of the semantic word vector according to the state value and the activation value;

Step D: using the output gate to calculate the initial text sequence corresponding to the state update value;

Step E: Calculate the loss value between the initial text sequence and the preset real label according to a preset loss function, and when the loss value is less than a preset threshold, determine that the initial text sequence is the target text sequence of the semantic word vector.

In an optional embodiment, the calculation of the state value involves the following quantities: $i_t$ represents the state value; $w_i$ represents the activation factor of the input gate; $h_{t-1}$ represents the peak value of the semantic word vector at time $t-1$ of the input gate; $x_t$ represents the semantic word vector at time $t$; $b_i$ represents the weight of the cell unit in the input gate; and a further term represents the bias of the cell unit in the input gate.

In an optional embodiment, the calculation of the activation value involves the following quantities: $f_t$ represents the activation value; $w_f$ represents the activation factor of the forget gate; $x_t$ represents the semantic word vector input at time $t$; $b_f$ represents the weight of the cell unit in the forget gate; and further terms represent the bias of the cell unit in the forget gate and the peak value of the semantic word vector at time $t-1$ of the forget gate.

In an optional embodiment, the calculation of the state update value involves the following quantities: $c_t$ represents the state update value; $h_{t-1}$ represents the peak value of the semantic word vector at time $t-1$ of the input gate; and a further term represents the peak value of the semantic word vector at time $t-1$ of the forget gate.

In an optional embodiment, the calculating the initial text sequence corresponding to the state update value by using the output gate includes: calculating the initial text sequence by using the following formula:

$$o_t = \tanh(c_t)$$

where $o_t$ represents the initial text sequence, $\tanh$ represents the activation function of the output gate, and $c_t$ represents the state update value.
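Steps A to D can be illustrated with a single scalar LSTM step. The input gate and forget gate below use the textbook sigmoid form, and the candidate term and the parameter names (`w_i`, `b_i`, `w_f`, `b_f`, `w_c`, `b_c`) are assumptions for illustration; only the output `tanh(c_t)` is taken directly from the formula above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step: input gate, forget gate, state update, output."""
    # Step A: state value from the input gate (textbook form, assumed).
    i_t = sigmoid(params["w_i"][0] * h_prev + params["w_i"][1] * x_t + params["b_i"])
    # Step B: activation value from the forget gate (textbook form, assumed).
    f_t = sigmoid(params["w_f"][0] * h_prev + params["w_f"][1] * x_t + params["b_f"])
    # Step C: state update from the state value and the activation value.
    candidate = math.tanh(params["w_c"][0] * h_prev + params["w_c"][1] * x_t + params["b_c"])
    c_t = f_t * c_prev + i_t * candidate
    # Step D: output computed as tanh of the state update value.
    o_t = math.tanh(c_t)
    return i_t, f_t, c_t, o_t

zero_params = {
    "w_i": (0.0, 0.0), "b_i": 0.0,
    "w_f": (0.0, 0.0), "b_f": 0.0,
    "w_c": (0.0, 0.0), "b_c": 0.0,
}
i_t, f_t, c_t, o_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
```

With all parameters set to zero, both gates evaluate to 0.5 and the candidate is 0, so the state update keeps half of the previous state. This makes the role of the forget gate easy to check by hand.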

Further, performing probability calculation on the target text sequence according to the preset attention mechanism to obtain a probability value, and analyzing the probability value to obtain a sentiment classification result, includes:

Calculate the weight coefficient of the target text sequence according to a preset weight coefficient formula;

Calculate the context sequence of the target text sequence by using the weight coefficient;

Calculate the probability value corresponding to the target text sequence according to the context sequence and a preset probability calculation formula;

If the probability value is greater than a preset first probability value, determine that the emotion classification result is a positive emotion;

If the probability value is less than a preset first probability value and greater than a preset second probability value, determine that the emotion classification result is a negative emotion;

If the probability value is less than a preset second probability value, it is determined that the emotion classification result is a neutral emotion.
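The three threshold rules can be sketched as follows. The threshold values 0.7 and 0.3 and the function name `classify` are hypothetical. The rules as stated do not cover a probability exactly equal to either threshold, so the sketch reports that case separately rather than guessing.

```python
def classify(probability, first_threshold=0.7, second_threshold=0.3):
    """Map a probability value to an emotion label using two preset thresholds."""
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must be greater than second_threshold")
    if probability > first_threshold:
        return "positive"
    if second_threshold < probability < first_threshold:
        return "negative"
    if probability < second_threshold:
        return "neutral"
    # Equality with a threshold is not covered by the three rules as stated.
    return "undetermined"

labels = [classify(p) for p in (0.9, 0.5, 0.1)]
```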

Specifically, the preset weight coefficient formula involves the following quantities: $a_t$ is the weight coefficient, $h_t$ is the hidden unit in the LSTM network, $W$, $V$ and $U$ are the variables in the LSTM, $\tanh$ is the activation function, $\exp$ is the exponential function, $t$ is the number of texts in the target sequence, $s$ is the preset parameter in the LSTM network, and $o_t$ is the target text sequence.

Further, the context sequence of the target text sequence is calculated using the weight coefficient, where $c_t$ is the context sequence, $a_{ts}$ is the weight coefficient, $o_t$ is the target text sequence, and $s$ is the preset parameter in the LSTM network.

Specifically, calculating the probability value corresponding to the target text sequence according to the context sequence and a preset probability calculation formula includes:

The probability calculation formula is:

$$y_t = f(c_t, o_t) = \sigma(W_c[c_t; o_t])$$

Among them, $y_t$ is the probability value, and $c_t$ is the state update value.
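The attention stage can be sketched with scalars as below. The additive scoring `v * tanh(w * h_t + u * o_s)` followed by an exponential normalization is an assumed form chosen because the text lists $W$, $V$, $U$, $\tanh$ and $\exp$. The function names and weight values are hypothetical; only the final sigmoid over the concatenation of `c_t` and `o_t` follows the probability formula above.

```python
import math

def attention_weights(h_t, outputs, w=1.0, u=1.0, v=1.0):
    """Score each output against h_t and normalize the scores with a softmax."""
    scores = [v * math.tanh(w * h_t + u * o_s) for o_s in outputs]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context(weights, outputs):
    """Weighted sum of the outputs using the attention weights."""
    return sum(a * o for a, o in zip(weights, outputs))

def probability(c_t, o_t, w_c=(1.0, 1.0)):
    """Sigmoid of a linear map applied to the pair [c_t; o_t]."""
    z = w_c[0] * c_t + w_c[1] * o_t
    return 1.0 / (1.0 + math.exp(-z))

outputs = [0.2, -0.4, 0.9]           # hypothetical target text sequence values
weights = attention_weights(outputs[-1], outputs)
c_t = context(weights, outputs)
y_t = probability(c_t, outputs[-1])
```

The weights sum to one, so the context value is a convex combination of the outputs, and the sigmoid keeps the resulting probability strictly between 0 and 1.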

The present application obtains a standard word vector set by preprocessing and vectorizing the original text data, and then performs bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set. The bidirectional semantic processing captures both the forward information and the backward information of each standard word vector, so that the resulting semantic word vector contains the semantic information of its context. This enhances the comprehensiveness and richness of the extracted semantic information, which helps improve the accuracy of text sentiment classification. Therefore, the emotion classification method proposed in this application can solve the problem of low accuracy of emotion classification.

As shown in FIG. 2 , it is a functional block diagram of an emotion classification apparatus provided by an embodiment of the present application.

The emotion classification apparatus 100 described in this application may be installed in an electronic device. According to the implemented functions, the emotion classification apparatus 100 may include a text preprocessing module 101 , a vectorization module 102 , a bidirectional semantic module 103 and a classification module 104 . The modules described in this application may also be referred to as units, which refer to a series of computer program segments that can be executed by the processor of the electronic device and can perform fixed functions, and are stored in the memory of the electronic device.

In this embodiment, the functions of each module/unit are as follows:

The text preprocessing module 101 is used for acquiring original text data, and performing text preprocessing on the original text data to obtain an initial word set;

The vectorization module 102 is configured to perform encoding processing on the initial word set to obtain an integer code, and perform vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set;

The bidirectional semantic module 103 is configured to perform bidirectional semantic processing on the standard word vector set by using a preset text training model to obtain a semantic word vector set;

The classification module 104 is used for screening the semantic word vector set by using a preset long short-term memory network to obtain a target text sequence, performing probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyzing the probability value to obtain a sentiment classification result.

The text preprocessing module 101 is used for acquiring original text data, and performing text preprocessing on the original text data to obtain an initial word set.

Specifically, the text preprocessing module 101 is specifically used for:

Get raw text data;

The vectorization module 102 is configured to perform encoding processing on the initial word set to obtain an integer code, and perform vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set.

In this embodiment of the present application, the vectorization module 102 is specifically configured to:

Specifically, in the embodiment of the present application, the encoding of the categorical variables assigns an integer identifier to each distinct category; for example, the first categorical variable is identified as 0, the second categorical variable is identified as 1, and the third categorical variable is identified as 2.

Further, performing vectorization on the initial word set according to the integer code to obtain a standard word vector set, including:

Select any target point in the two-dimensional Cartesian coordinate system;

The initial words in the initial word set are arranged vertically with the target point as a benchmark, and the categorical variables are arranged horizontally according to the order of the integer coding on the basis of the target point;

For example, the resulting matrix is

The bidirectional semantic module 103 is configured to perform bidirectional semantic processing on the standard word vector set by using a preset text training model to obtain a semantic word vector set.

Specifically, the bidirectional semantic module 103 is specifically used for:

obtaining multiple target vectors in the standard word vector set;

In detail, the forward vector formula and the backward vector formula involve the following quantities: the forward vector; the backward vector; the first variables in the Bi-LSTM network; the previous word of the forward vector; and the previous word of the backward vector.

Specifically, the bidirectional semantic calculation formula yields the semantic word vector $h_t$ from the forward vector and the backward vector.

The classification module 104 is configured to perform screening processing on the semantic word vector set by using a preset long short-term memory network to obtain a target text sequence, perform probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyze the probability value to obtain a sentiment classification result.

In the embodiment of the present application, the LSTM (Long Short-Term Memory, long short-term memory network) network is a time recurrent neural network, including: an input gate, a forget gate, and an output gate.

Specifically, the classification module 104 is specifically used for:

Calculate the state value of the semantic word vector in the semantic word vector set through the input gate;

Calculate the activation value of the semantic word vector in the semantic word vector set through the forget gate;

Calculate the state update value of the semantic word vector according to the state value and the activation value;

Calculate the initial text sequence corresponding to the state update value by using the output gate;

According to the preset loss function, the loss value of the initial text sequence and the preset real label is calculated, and when the loss value is less than the preset threshold, it is determined that the initial text sequence is the target text sequence of the semantic word vector.

In an optional embodiment, $i_t$ represents the state value and $f_t$ represents the activation value, and the initial text sequence is calculated as:

$$o_t = \tanh(c_t)$$

Specifically, the preset weight coefficient formula includes:

The probability calculation formula is:

$$y_t = f(c_t, o_t) = \sigma(W_c[c_t; o_t])$$

Among them, $y_t$ is the probability value, and $c_t$ is the state update value.

The present application obtains a standard word vector set by preprocessing and vectorizing the original text data, and then performs bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set. The bidirectional semantic processing captures both the forward information and the backward information of each standard word vector, so that the resulting semantic word vector contains the semantic information of its context. This enhances the comprehensiveness and richness of the extracted semantic information, which helps improve the accuracy of text sentiment classification. Therefore, the emotion classification device proposed in this application can solve the problem of low accuracy of emotion classification.

As shown in FIG. 3 , it is a schematic structural diagram of an electronic device for implementing an emotion classification method provided by an embodiment of the present application.

The electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as an emotion classification program 12.

The memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disc, etc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. Further, the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 can be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the emotion classification program 12, but also to temporarily store data that has been output or will be output.

In some embodiments, the processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple packaged integrated circuits with the same function or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control core (Control Unit) of the electronic device. It uses various interfaces and lines to connect the components of the entire electronic device, and performs various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (such as the emotion classification program) and calling data stored in the memory 11.

The bus may be a peripheral component interconnect (PCI for short) bus or an extended industry standard architecture (Extended industry standard architecture, EISA for short) bus or the like. The bus can be divided into address bus, data bus, control bus and so on. The bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.

FIG. 3 only shows an electronic device with certain components. Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, which may include fewer or more components than those shown in the figure, or a combination of certain components, or a different arrangement of components.

For example, although not shown, the electronic device 1 may also include a power supply (such as a battery) for powering the various components. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.

Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further include a user interface, and the user interface may be a display (Display), an input unit (eg, a keyboard (Keyboard)), optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like. Among them, the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.

It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.

The emotion classification program 12 stored in the memory 11 in the electronic device 1 is a combination of multiple instructions, and when running in the processor 10, it can realize:

Specifically, for the implementation of the above instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiments corresponding to FIG. 1 to FIG. 3, which will not be repeated here.

Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).

The present application also provides a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. The readable storage medium stores a computer program, and when the computer program is executed by the processor of the electronic device, it can achieve:

In the several embodiments provided in this application, it should be understood that the disclosed device, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative: the division of the modules is only a logical function division, and there may be other division manners in actual implementation.

The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.

It will be apparent to those skilled in the art that the present application is not limited to the details of the above-described exemplary embodiments, but that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application.

Accordingly, the embodiments are to be regarded in all respects as illustrative and not restrictive. The scope of the application is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in this application. Any reference signs in the claims shall not be construed as limiting the claim concerned.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
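The linkage between data blocks described above can be illustrated with a minimal sketch. It is not the storage scheme of this application, which does not specify one; the function names, the block fields, and the choice of SHA-256 are assumptions made only for illustration. Each block stores a batch of records (for example, words from an initial word set) together with the hash of the previous block, so that altering any stored record invalidates the chain.

```python
import hashlib
import json


def block_hash(index, records, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "records": records, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Build a list of blocks, each linked to its predecessor by hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first block (assumption)
    for index, records in enumerate(batches):
        block = {
            "index": index,
            "records": records,
            "prev_hash": prev_hash,
            "hash": block_hash(index, records, prev_hash),
        }
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def is_valid(chain):
    """Recompute every hash and check every link to the previous block."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["records"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

In this sketch, tampering with the records of any block changes its recomputed hash, so `is_valid` returns `False` for the modified chain.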

Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Several units or means recited in the system claims can also be realized by one unit or means through software or hardware. Terms such as "first" and "second" are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present application without departing from the spirit and scope of those technical solutions.

Claims

A sentiment classification method, wherein the method comprises:

Obtain original text data, and perform text preprocessing on the original text data to obtain an initial word set;

Encoding the initial word set to obtain an integer code, and performing vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set;

Use a preset text training model to perform bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set;

Use a preset long short-term memory network to screen the semantic word vector set to obtain a target text sequence, perform probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyze the probability value to obtain an emotion classification result.
The emotion classification method according to claim 1, wherein the encoding process on the initial word set to obtain an integer code, comprising:

determining a categorical variable for each initial word in the initial word set;

Encoding and naming processing is performed on the categorical variables to obtain integer codes.
The sentiment classification method according to claim 1, wherein the vectorizing the initial word set according to the integer code to obtain a standard word vector set, comprising:

Select any target point in the two-dimensional Cartesian coordinate system;

The initial words in the initial word set are arranged vertically with the target point as the benchmark, and the categorical variables are arranged horizontally in the order of the integer coding based on the target point;

If, at an intersection, the word corresponding to the horizontally arranged categorical variable and the vertically arranged initial word are the same, let the intersection be a first value; if the two words are different, let the intersection be a second value, and obtain a result matrix composed of the first value and the second value;

The first numerical value or the second numerical value is extracted from the result matrix to form a plurality of vectors to obtain a standard word vector set.
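The integer coding and matrix construction recited in the two preceding claims resemble one-hot encoding, and can be sketched as follows. This is an illustrative reading rather than the claimed implementation: the function names are hypothetical, codes are assigned in first-seen order, and the first and second values are assumed to be 1 and 0.

```python
def integer_encode(words):
    """Give each distinct word (categorical variable) an integer code,
    assigned in the order the word is first seen (an assumption)."""
    codes = {}
    for word in words:
        if word not in codes:
            codes[word] = len(codes)
    return codes


def one_hot_matrix(words, codes, first_value=1, second_value=0):
    """Rows follow the initial words (vertical axis); columns follow the
    categorical variables in integer-code order (horizontal axis)."""
    categories = sorted(codes, key=codes.get)
    return [
        [first_value if word == category else second_value for category in categories]
        for word in words
    ]


words = ["service", "is", "good", "service"]
codes = integer_encode(words)
vectors = one_hot_matrix(words, codes)
```

Each row of `vectors` is the word vector of one initial word; a repeated word such as "service" yields identical rows.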
The sentiment classification method according to claim 1, wherein the bidirectional semantic processing is performed on the standard word vector set by using a preset text training model to obtain a semantic word vector set, comprising:

obtaining multiple target vectors in the standard word vector set;

calculating a plurality of forward vectors and a plurality of backward vectors of the plurality of target vectors;

Calculate by using a preset bidirectional semantic calculation formula, the plurality of forward vectors and the plurality of backward vectors, to obtain a plurality of semantic word vectors of the plurality of target vectors;

Summarize the plurality of semantic word vectors to obtain the semantic word vector set.
The emotion classification method according to claim 1, wherein the screening of the semantic word vector set by using a preset long short-term memory network to obtain a target text sequence comprises:

Calculate the state value of the semantic word vector in the semantic word vector set through the input gate;

Calculate the activation value of the semantic word vector in the semantic word vector set through the forget gate;

Calculate the state update value of the semantic word vector according to the state value and the activation value;

Calculate the initial text sequence corresponding to the state update value by using the output gate;

The loss value between the initial text sequence and the preset real label is calculated according to a preset loss function, and when the loss value is less than a preset threshold, it is determined that the initial text sequence is the target text sequence of the semantic word vector.
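The gate computations in the preceding claim can be illustrated with a textbook long short-term memory step. The sketch below uses scalar inputs and the standard gate equations, since the preset formulas are not reproduced in this section; the parameter dictionary `p`, the mean squared error loss, and the function names are all assumptions, and the mapping of "state value" and "activation value" onto the input and forget gates is only one possible reading.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One standard LSTM step on scalar values."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    g = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # candidate state
    c = f * c_prev + i * g                                   # state update
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    h = o * math.tanh(c)                                     # output value
    return h, c


def run_lstm(xs, p):
    """Run the cell over a sequence, starting from zero states."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def accept_sequence(outputs, labels, threshold):
    """Accept the sequence when its loss is below the threshold.
    Mean squared error is an assumed stand-in for the preset loss function."""
    loss = sum((h - y) ** 2 for h, y in zip(outputs, labels)) / len(outputs)
    return loss < threshold, loss
```

In practice the loss comparison would be part of training a vector-valued network; the scalar form is used here only to keep each gate visible.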
The emotion classification method according to claim 5, wherein the probability calculation is performed on the target text sequence according to a preset attention mechanism to obtain a probability value, and the emotion classification result is obtained by analyzing the probability value, comprising:

Calculate the weight coefficient of the target text sequence according to a preset weight coefficient formula;

Calculate the context sequence of the target text sequence by using the weight coefficient;

Calculate the probability value corresponding to the target text sequence according to the context sequence and a preset probability calculation formula;

If the probability value is greater than a preset first probability value, determine that the emotion classification result is a positive emotion;

If the probability value is less than a preset first probability value and greater than a preset second probability value, determine that the emotion classification result is a negative emotion;

If the probability value is less than a preset second probability value, it is determined that the emotion classification result is a neutral emotion.
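The attention and threshold steps of the preceding claim can be sketched as follows. The weight coefficient and probability formulas are not given in this section, so the sketch substitutes a common choice: softmax attention weights over scalar sequence values and a sigmoid on the resulting context value. The parameters `w`, `b`, `v`, the thresholds 0.7 and 0.3, and the handling of values exactly equal to a threshold are assumptions. The mapping of probability ranges to labels follows the claim as written.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probability(sequence, w=1.0, b=0.0, v=1.0):
    """Return (weights, context, probability) for a scalar sequence."""
    scores = [v * math.tanh(w * h + b) for h in sequence]
    weights = softmax(scores)                       # weight coefficients
    context = sum(a * h for a, h in zip(weights, sequence))
    probability = 1.0 / (1.0 + math.exp(-context))  # assumed probability formula
    return weights, context, probability


def classify(probability, first=0.7, second=0.3):
    """Map a probability to a label using the ranges stated in the claim."""
    if probability > first:
        return "positive"
    if probability > second:
        return "negative"
    return "neutral"
```

A probability exactly equal to a threshold is not covered by the claim; in this sketch it falls into the lower range.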
The sentiment classification method according to any one of claims 1 to 6, wherein the text preprocessing is performed on the original text data to obtain an initial word set, comprising:

Extracting key sentences in the original text data to obtain a key sentence set;

Performing stopword removal processing on the key sentence set to obtain a stopword-removed sentence set;

Performing word segmentation processing on the stopword-removed sentence set to obtain an initial word set.
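The three preprocessing steps of the preceding claim can be sketched on English text as follows. The criterion for a key sentence is not specified in this section, so the sketch assumes a sentence is "key" when it contains a word from a hypothetical keyword list; the stopword list and the whitespace-based segmentation are likewise assumptions (segmenting Chinese text would require a dedicated word segmenter).

```python
import re

STOPWORDS = {"the", "is", "a", "and", "was"}  # hypothetical stopword list
KEYWORDS = {"service", "delivery"}            # hypothetical key-sentence cue words


def extract_key_sentences(text, keywords=KEYWORDS):
    """Split text into sentences and keep those containing a keyword."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


def remove_stopwords(sentences, stopwords=STOPWORDS):
    """Drop stopword tokens from each sentence."""
    cleaned = []
    for sentence in sentences:
        kept = [t for t in sentence.split() if t.lower() not in stopwords]
        cleaned.append(" ".join(kept))
    return cleaned


def segment(sentences):
    """Split the cleaned sentences into lowercase word tokens."""
    return [w.lower() for s in sentences for w in re.findall(r"\w+", s)]


def preprocess(text):
    """Key sentence extraction, then stopword removal, then segmentation."""
    return segment(remove_stopwords(extract_key_sentences(text)))


initial_words = preprocess("The service is good. I went home. The delivery was slow!")
```

Here `initial_words` plays the role of the initial word set that is passed on to the encoding step.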
An emotion classification device, wherein the device comprises:

a text preprocessing module, used to obtain original text data, perform text preprocessing on the original text data, and obtain an initial word set;

a vectorization module, configured to perform encoding processing on the initial word set to obtain an integer code, and perform vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set;

a bidirectional semantic module, used for performing bidirectional semantic processing on the standard word vector set by using a preset text training model to obtain a semantic word vector set;

a classification module, used for screening the semantic word vector set by using a preset long short-term memory network to obtain a target text sequence, performing probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyzing the probability value to obtain an emotion classification result.
An electronic device, wherein the electronic device comprises:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the steps of:

Obtain original text data, and perform text preprocessing on the original text data to obtain an initial word set;

Encoding the initial word set to obtain an integer code, and performing vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set;

Use a preset text training model to perform bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set;

Use a preset long short-term memory network to screen the semantic word vector set to obtain a target text sequence, perform probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyze the probability value to obtain an emotion classification result.
The electronic device according to claim 9, wherein the encoding process on the initial word set to obtain an integer code comprises:

determining a categorical variable for each initial word in the initial word set;

Encoding and naming processing is performed on the categorical variables to obtain integer codes.
The electronic device according to claim 9, wherein the vectorizing the initial word set according to the integer code to obtain a standard word vector set, comprising:

Select any target point in the two-dimensional Cartesian coordinate system;

The initial words in the initial word set are arranged vertically with the target point as the benchmark, and the categorical variables are arranged horizontally in the order of the integer coding based on the target point;

If, at an intersection, the word corresponding to the horizontally arranged categorical variable and the vertically arranged initial word are the same, let the intersection be a first value; if the two words are different, let the intersection be a second value, and obtain a result matrix composed of the first value and the second value;

The first numerical value or the second numerical value is extracted from the result matrix to form a plurality of vectors to obtain a standard word vector set.
The electronic device according to claim 9, wherein the bidirectional semantic processing is performed on the standard word vector set by using a preset text training model to obtain a semantic word vector set, comprising:

obtaining multiple target vectors in the standard word vector set;

calculating a plurality of forward vectors and a plurality of backward vectors of the plurality of target vectors;

Calculate by using a preset bidirectional semantic calculation formula, the plurality of forward vectors and the plurality of backward vectors, to obtain a plurality of semantic word vectors of the plurality of target vectors;

Summarize the plurality of semantic word vectors to obtain the semantic word vector set.
The electronic device according to claim 9, wherein the screening of the semantic word vector set by using a preset long short-term memory network to obtain a target text sequence comprises:

Calculate the state value of the semantic word vector in the semantic word vector set through the input gate;

Calculate the activation value of the semantic word vector in the semantic word vector set through the forget gate;

Calculate the state update value of the semantic word vector according to the state value and the activation value;

Calculate the initial text sequence corresponding to the state update value by using the output gate;

The loss value between the initial text sequence and the preset real label is calculated according to a preset loss function, and when the loss value is less than a preset threshold, it is determined that the initial text sequence is the target text sequence of the semantic word vector.
The electronic device according to claim 13, wherein the performing probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyzing the probability value to obtain a sentiment classification result, comprising:

Calculate the weight coefficient of the target text sequence according to a preset weight coefficient formula;

Calculate the context sequence of the target text sequence by using the weight coefficient;

Calculate the probability value corresponding to the target text sequence according to the context sequence and a preset probability calculation formula;

If the probability value is greater than a preset first probability value, determine that the emotion classification result is a positive emotion;

If the probability value is less than a preset first probability value and greater than a preset second probability value, determine that the emotion classification result is a negative emotion;

If the probability value is less than a preset second probability value, it is determined that the emotion classification result is a neutral emotion.
The electronic device according to any one of claims 9 to 14, wherein, performing text preprocessing on the original text data to obtain an initial word set, comprising:

Extracting key sentences in the original text data to obtain a key sentence set;

Performing stopword removal processing on the key sentence set to obtain a stopword-removed sentence set;

Performing word segmentation processing on the stopword-removed sentence set to obtain an initial word set.
A computer-readable storage medium storing a computer program, wherein the computer program implements the following steps when executed by a processor:

Obtain original text data, and perform text preprocessing on the original text data to obtain an initial word set;

Encoding the initial word set to obtain an integer code, and performing vectorization processing on the initial word set according to the integer encoding to obtain a standard word vector set;

Use a preset text training model to perform bidirectional semantic processing on the standard word vector set to obtain a semantic word vector set;

Use a preset long short-term memory network to screen the semantic word vector set to obtain a target text sequence, perform probability calculation on the target text sequence according to a preset attention mechanism to obtain a probability value, and analyze the probability value to obtain an emotion classification result.
The computer-readable storage medium according to claim 16, wherein the encoding process on the initial word set to obtain an integer code comprises:

determining a categorical variable for each initial word in the initial word set;

Encoding and naming processing is performed on the categorical variables to obtain integer codes.
The computer-readable storage medium according to claim 16, wherein the vectorizing the initial word set according to the integer code to obtain a standard word vector set, comprising:

Select any target point in the two-dimensional Cartesian coordinate system;

The initial words in the initial word set are arranged vertically with the target point as the benchmark, and the categorical variables are arranged horizontally in the order of the integer coding based on the target point;

If, at an intersection, the word corresponding to the horizontally arranged categorical variable and the vertically arranged initial word are the same, let the intersection be a first value; if the two words are different, let the intersection be a second value, and obtain a result matrix composed of the first value and the second value;

The first numerical value or the second numerical value is extracted from the result matrix to form a plurality of vectors to obtain a standard word vector set.
The computer-readable storage medium according to claim 16, wherein the bidirectional semantic processing is performed on the standard word vector set by using a preset text training model to obtain a semantic word vector set, comprising:

obtaining multiple target vectors in the standard word vector set;

calculating a plurality of forward vectors and a plurality of backward vectors of the plurality of target vectors;

Calculate by using a preset bidirectional semantic calculation formula, the plurality of forward vectors and the plurality of backward vectors, to obtain a plurality of semantic word vectors of the plurality of target vectors;

Summarize the plurality of semantic word vectors to obtain the semantic word vector set.
The computer-readable storage medium according to claim 16, wherein the screening of the semantic word vector set by using a preset long short-term memory network to obtain a target text sequence comprises:

Calculate the state value of the semantic word vector in the semantic word vector set through the input gate;

Calculate the activation value of the semantic word vector in the semantic word vector set through the forget gate;

Calculate the state update value of the semantic word vector according to the state value and the activation value;

Calculate the initial text sequence corresponding to the state update value by using the output gate;

The loss value between the initial text sequence and the preset real label is calculated according to a preset loss function, and when the loss value is less than a preset threshold, it is determined that the initial text sequence is the target text sequence of the semantic word vector.