CN113590767A - Multilingual alarm information category judgment method, system, equipment and storage medium

Info

Publication number: CN113590767A
Application number: CN202111145028.0A
Authority: CN
Inventors: 曾卫东; 王鑫; 陈翔; 梁法光; 王宾; 管磊; 文继锋; 程国栋; 陈修迪
Original assignee: NR Electric Co Ltd; Xian Thermal Power Research Institute Co Ltd
Current assignee: NR Electric Co Ltd; Xian Thermal Power Research Institute Co Ltd
Priority date: 2021-09-28
Filing date: 2021-09-28
Publication date: 2021-11-02
Anticipated expiration: 2041-09-28
Also published as: CN113590767B

Abstract

The invention belongs to the field of natural language processing and discloses a multilingual alarm information category judgment method, system, equipment and storage medium. The method comprises the following steps: acquiring multilingual alarm information to be classified; encoding the multilingual alarm information to be classified into a coding sequence through a preset word stock; inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector; and inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified. The method eliminates the differences in form among the Chinese, English and numbers in the multilingual alarm information and retains only their semantic relevance, so that semantic features can be extracted by a single high-dimensional semantic feature extraction model and the multilingual alarm information can be classified by the alarm type classification model, which effectively improves the accuracy of judging the category of multilingual alarm information.

Description

Multilingual alarm information category judgment method, system, equipment and storage medium

Technical Field

The invention belongs to the field of natural language processing, and relates to a multilingual alarm information category judgment method, system, equipment and storage medium.

Background

The distributed control system serves as the brain of a thermal power plant. It must monitor in real time the running state of the plant's power generation equipment and of the control system's lower-computer cards and controllers, and if the equipment, cards or controllers become abnormal, the upper computer of the control system must record alarm information. In practice, because many devices can generate alarms, a large amount of alarm information is recorded during operation, and to make reasonable use of it, the alarm information needs to be divided according to the actual alarm type.

Information classification is a problem in the field of natural language processing and is generally performed on English sentences in units of words. However, most alarm information contains English, Chinese and numbers at the same time, which makes its classification a multilingual problem and results in poor classification accuracy with existing classification methods.

Disclosure of Invention

The present invention aims to overcome the above disadvantage of the prior art, namely the poor accuracy of multilingual alarm information classification, and provides a multilingual alarm information category judgment method, system, equipment and storage medium.

To achieve this purpose, the invention adopts the following technical scheme:

In a first aspect of the present invention, a multilingual alarm information category judgment method includes the following steps: acquiring multilingual alarm information to be classified; encoding the multilingual alarm information to be classified into a coding sequence through a preset word stock; inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector; and inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified.

The multilingual alarm information category judgment method of the invention is further improved in that:

The word stock is constructed in the following way: acquiring historical multilingual alarm information; counting the Chinese characters appearing in the historical multilingual alarm information and the occurrence frequency of each Chinese character, as well as the English words appearing and the occurrence frequency of each English word; taking the Chinese characters whose occurrence frequency is greater than a preset frequency, the English words whose occurrence frequency is greater than the preset frequency, the 10 Arabic numerals, and UNK, BOS and EOS as word stock elements; uniformly coding the word stock elements starting from 1 to obtain the code of each word stock element; and combining the word stock elements and their codes to obtain the word stock.

The specific method for encoding the multilingual alarm information to be classified into a coding sequence through the preset word stock comprises the following steps: replacing the Chinese characters, English words and Arabic numerals in the multilingual alarm information to be classified that are contained in the word stock with the codes of the corresponding word stock elements; replacing the Chinese characters or English words in the multilingual alarm information to be classified that are not contained in the word stock with the UNK code in the word stock; and taking the BOS code in the word stock as the beginning of the coding sequence, and cutting or padding the coding sequence to a fixed length with the EOS code in the word stock, to obtain the coding sequence.

The specific method for inputting the coding sequence into the preset high-dimensional semantic feature extraction model to obtain the high-dimensional semantic feature vector comprises the following steps: acquiring the one-hot vector of each code in the coding sequence, multiplying the one-hot vector of each code by a preset word embedding matrix, and inputting the products in sequence into the preset high-dimensional semantic feature extraction model; and taking the cell state of the high-dimensional semantic feature extraction model at the last time step as the high-dimensional semantic feature vector.

The high-dimensional semantic feature extraction model is constructed in the following way: acquiring historical multilingual alarm information; encoding the historical multilingual alarm information into historical coding sequences through the preset word stock; and training a preset long short-term memory (LSTM) network model through the historical coding sequences to obtain the high-dimensional semantic feature extraction model.

The alarm type classification model is constructed in the following way: acquiring historical multilingual alarm information; encoding the historical multilingual alarm information into historical coding sequences through the preset word stock; inputting the historical coding sequences into the preset high-dimensional semantic feature extraction model to obtain historical high-dimensional semantic feature vectors, and taking the alarm type of the historical multilingual alarm information corresponding to each historical high-dimensional semantic feature vector as the label of that vector; and training a preset three-layer feedforward neural network model through the historical high-dimensional semantic feature vectors and their labels to obtain the alarm type classification model.

When the preset long short-term memory network model is trained through the historical coding sequences, the loss function of the long short-term memory network model is given by:

$$L(S) = -\sum_{t=1}^{N} \log P_t(S_t)$$

where $L(S)$ is the loss function value of the long short-term memory network model for the historical coding sequence $S$; $P_t(S_t)$ is the probability, in the probability vector $P_t$ output by the long short-term memory network model, of the one-hot vector $S_t$ corresponding to the $t$-th code $s_t$ of the historical coding sequence $S$; and $N$ is the number of codes in the historical coding sequence $S$. The long short-term memory network model is optimized by a batch gradient descent method, with minimizing its loss function value as the optimization target.

When the preset three-layer feedforward neural network model is trained through the historical high-dimensional semantic feature vectors and their labels, the loss function of the three-layer feedforward neural network model is given by:

$$L(c) = -\log P(c)$$

where $L(c)$ is the loss function of the three-layer feedforward neural network model; $P(c)$ is the probability, in the output probability vector $P$, of the true category to which the current input belongs; and $c$ is the true category to which the input high-dimensional semantic feature vector belongs. The three-layer feedforward neural network model is optimized by a batch gradient descent method, with minimizing its loss function as the optimization target.

In a second aspect of the present invention, a multilingual alarm information category judgment system includes: an acquisition module for acquiring multilingual alarm information to be classified; a coding module for encoding the multilingual alarm information to be classified into a coding sequence through a preset word stock; a feature extraction module for inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector; and a category judgment module for inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified.

In a third aspect of the present invention, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above multilingual alarm information category judgment method when executing the computer program.

In a fourth aspect of the present invention, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above multilingual alarm information category judgment method.

Compared with the prior art, the invention has the following beneficial effects:

In the multilingual alarm information category judgment method of the invention, the multilingual alarm information to be classified is encoded into a coding sequence through a preset word stock, and a preset high-dimensional semantic feature extraction model extracts a high-dimensional semantic feature vector from the coding sequence. This eliminates the differences in form among the Chinese, English and numbers in the multilingual alarm information and retains only their semantic relevance, with their respective semantic information stored in the high-dimensional semantic feature vector. Finally, the high-dimensional semantic feature vector is classified by the alarm type classification model to determine the category of the corresponding multilingual alarm information to be classified, which effectively improves the accuracy of judging the category of multilingual alarm information.

Drawings

FIG. 1 is a flow chart of the multilingual alarm information category judgment method according to the present invention;

FIG. 2 is a block diagram of a high-dimensional semantic feature extraction model training process according to the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The invention is described in further detail below with reference to the accompanying drawings:

Referring to FIG. 1, an embodiment of the present invention provides a multilingual alarm information category judgment method, which extracts semantic features from multilingual alarm information through a single high-dimensional semantic feature extraction model and classifies the multilingual alarm information by means of an alarm type classification model. Specifically, the multilingual alarm information category judgment method includes the following steps.

S1: Acquire the multilingual alarm information to be classified.

Preferably, the word stock is constructed in the following manner: acquiring historical multilingual alarm information; counting the Chinese characters appearing in the historical multilingual alarm information and the occurrence frequency of each Chinese character, as well as the English words appearing and the occurrence frequency of each English word; taking the Chinese characters whose occurrence frequency is greater than a preset frequency, the English words whose occurrence frequency is greater than the preset frequency, the 10 Arabic numerals, and UNK, BOS and EOS as word stock elements; uniformly coding the word stock elements starting from 1 to obtain the code of each word stock element; and combining the word stock elements and their codes to obtain the word stock.

Specifically, in this embodiment, historical multilingual alarm information generated by the distributed control systems of different manufacturers is collected, including various equipment fault alarms, equipment action alarms, sequence-of-events alarms, various system process alarms, and the like. Professional technicians classify the collected multilingual alarm information by alarm type, and the result is used as the training data set for the high-dimensional semantic feature extraction model and the alarm type classification model.

The words appearing in the training data set are then counted, using a different counting unit for each language. Chinese is counted in units of Chinese characters: the Chinese characters appearing in the training data set and the number of occurrences of each are counted. English is counted in units of English words: the English words appearing in the training data set and the number of occurrences of each are counted. The Chinese characters and English words that occur more than 10 times are included in the word stock as word stock elements; all 10 Arabic numerals and the special characters UNK, BOS and EOS are also included; and all elements in the word stock are uniformly coded starting from 1.
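The counting and coding procedure described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the regular expression used to split a message into Chinese characters, English words and single digits, the function names, and the alphabetical ordering of the retained elements are all assumptions made for the example. Only the threshold of more than 10 occurrences and the coding from 1 come from the text.

```python
import re
from collections import Counter

# Assumed tokenizer: one token per Chinese character, per English word, per digit.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|[0-9]")
SPECIALS = ["UNK", "BOS", "EOS"]


def tokenize(message):
    """Split an alarm message into Chinese characters, English words and digits."""
    return TOKEN_RE.findall(message)


def build_word_stock(messages, min_count=10):
    """Map each word stock element to an integer code, starting from 1."""
    counts = Counter()
    for message in messages:
        for token in tokenize(message):
            if not token.isdigit():  # digits are always included, so they are not counted
                counts[token] += 1
    # Keep characters and words that occur more than `min_count` times.
    kept = sorted(tok for tok, n in counts.items()
                  if n > min_count and tok not in SPECIALS)
    elements = kept + [str(d) for d in range(10)] + SPECIALS
    return {element: code for code, element in enumerate(elements, start=1)}
```

Calling `build_word_stock(messages)` on a list of historical alarm strings returns a dictionary from element to code; rare characters and words are simply absent from it and are later represented by the UNK code.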

S2: Encode the multilingual alarm information to be classified into a coding sequence through the preset word stock.

Preferably, in step S2, the specific method for encoding the multilingual alarm information to be classified into a coding sequence through the preset word stock comprises the following steps: replacing the Chinese characters, English words and Arabic numerals in the multilingual alarm information to be classified that are contained in the word stock with the codes of the corresponding word stock elements; replacing the Chinese characters or English words in the multilingual alarm information to be classified that are not contained in the word stock with the UNK code in the word stock; and taking the BOS code in the word stock as the beginning of the coding sequence, and cutting or padding the coding sequence to a fixed length with the EOS code in the word stock, to obtain the coding sequence.
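As an illustration of step S2, the sketch below encodes one message into a fixed-length coding sequence. The tokenizing regular expression, the function name `encode` and the toy word stock are assumptions for the example. The text does not say whether a truncated sequence must still end with EOS, so this sketch simply drops the trailing codes when cutting.

```python
import re

# Assumed tokenizer: one token per Chinese character, per English word, per digit.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|[0-9]")


def encode(message, word_stock, length):
    """Encode one alarm message as a fixed-length list of integer codes."""
    codes = [word_stock["BOS"]]  # the BOS code starts the sequence
    for token in TOKEN_RE.findall(message):
        # Elements missing from the word stock are replaced by the UNK code.
        codes.append(word_stock.get(token, word_stock["UNK"]))
    codes = codes[:length]  # cut sequences that are too long
    codes += [word_stock["EOS"]] * (length - len(codes))  # pad short ones with EOS
    return codes


# Toy word stock for illustration only; real codes come from the training data.
toy_stock = {"泵": 1, "pump": 2, "1": 3, "UNK": 4, "BOS": 5, "EOS": 6}
padded = encode("泵 pump 1 阀", toy_stock, length=8)
```

Here the character 阀 is not in the toy word stock, so it is encoded as UNK, and the sequence is padded with EOS codes up to the requested length.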

S3: Input the coding sequence into the preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector.

Preferably, in step S3, the specific method for inputting the coding sequence into the preset high-dimensional semantic feature extraction model to obtain the high-dimensional semantic feature vector comprises the following steps: acquiring the one-hot vector of each code in the coding sequence, multiplying the one-hot vector of each code by a preset word embedding matrix, and inputting the products in sequence into the preset high-dimensional semantic feature extraction model; and taking the cell state of the high-dimensional semantic feature extraction model at the last time step as the high-dimensional semantic feature vector.

The dimension of the one-hot vector of each code equals the size of the multilingual word stock; only the value at position $s_{t-1}$ is 1, and the values at all other positions are 0. Here $s_{t-1}$ denotes the $(t-1)$-th code of the coding sequence, which is input at time $t$ during training.

The preset high-dimensional semantic feature extraction model is constructed in the following manner: acquiring historical multilingual alarm information; encoding the historical multilingual alarm information into historical coding sequences through the preset word stock; and training a preset long short-term memory (LSTM) network model through the historical coding sequences to obtain the high-dimensional semantic feature extraction model.

Following the word stock construction process above, a training data set consisting of historical multilingual alarm information is obtained. Training is performed in a self-encoding manner: the historical multilingual alarm information is encoded into historical coding sequences through the preset word stock, and the one-hot vector of each code in each historical coding sequence is then obtained.

Specifically, referring to FIG. 2, the initial high-dimensional semantic feature extraction model may adopt a single-layer long short-term memory (LSTM) network model. During training, the input at time $t$ is the one-hot vector $S_{t-1}$ corresponding to the $(t-1)$-th code $s_{t-1}$ of the historical coding sequence. The dimension of this vector equals the size of the multilingual word stock; only the value at position $s_{t-1}$ is 1, and the values at all other positions are 0. The preset word embedding matrix $W_e$ is multiplied by $S_{t-1}$, and the product is used as the input of the long short-term memory network model, with the expectation that, in the output probability vector $P_t$, the probability $P_t(S_t)$ of the $t$-th code of the historical coding sequence is the largest.

The preset word embedding matrix $W_e$ is randomly initialized, and its parameters are trained together with the long short-term memory network. The word embedding matrix converts a one-hot vector into a word embedding vector:

$$E_t = W_e S_t$$
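A small worked example may help show why multiplying the word embedding matrix by a one-hot vector is simply a column lookup. The sketch assumes that the matrix is stored as a list of rows with shape (embedding dimension × word stock size) and that a code $s$, which is coded from 1, maps to the 0-based index $s - 1$. The matrix values are arbitrary toy numbers.

```python
def one_hot(code, vocab_size):
    """One-hot vector for a 1-based code: position `code` is 1, all others 0."""
    vector = [0.0] * vocab_size
    vector[code - 1] = 1.0
    return vector


def embed(w_e, s_t):
    """Compute E_t = W_e S_t, with W_e stored as a list of rows."""
    return [sum(w * s for w, s in zip(row, s_t)) for row in w_e]


# Toy 2 x 4 embedding matrix (embedding dimension 2, word stock size 4).
W_e = [[0.1, 0.2, 0.3, 0.4],
       [0.5, 0.6, 0.7, 0.8]]
E_t = embed(W_e, one_hot(3, 4))  # selects the third column of W_e
```

Because the product only selects one column, practical implementations usually replace the multiplication with an indexed lookup; the explicit product is shown here only to mirror the formula.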

When the preset long short-term memory network model is trained through the historical coding sequences, its loss function is given by:

$$L(S) = -\sum_{t=1}^{N} \log P_t(S_t)$$

where $L(S)$ is the loss function value of the long short-term memory network model for the historical coding sequence $S$; $P_t(S_t)$ is the probability, in the probability vector $P_t$ output by the long short-term memory network model, of the one-hot vector $S_t$ corresponding to the $t$-th code $s_t$ of the historical coding sequence $S$; and $N$ is the number of codes in the historical coding sequence $S$.
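The sequence loss can be illustrated numerically. The sketch below assumes that the loss is the negative sum of the log probabilities that the model assigns to the correct code at each time step, which matches the symbols defined above; the function name and the toy probability vectors are hypothetical, and codes are taken to be 1-based as in the word stock.

```python
import math


def sequence_loss(prob_vectors, codes):
    """Negative sum over t of log P_t(S_t), for 1-based integer codes.

    prob_vectors[t] is the probability vector output for the t-th target code.
    """
    return -sum(math.log(p[c - 1]) for p, c in zip(prob_vectors, codes))


# Two time steps over a word stock of size 2 (toy numbers).
toy_probs = [[0.5, 0.5], [0.25, 0.75]]
toy_codes = [1, 2]
toy_loss = sequence_loss(toy_probs, toy_codes)  # -(log 0.5 + log 0.75)
```

The loss shrinks towards zero as the probability assigned to each correct code approaches 1, which is why minimizing it corresponds to maximizing $P_t(S_t)$.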

With minimizing the loss function value of the long short-term memory network model as the optimization target, the model is optimized by a batch gradient descent method: its parameters are adjusted until the reduction in the loss function value is smaller than a preset value. The preset high-dimensional semantic feature extraction model is thereby obtained.
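The stopping rule, optimizing until the reduction in the loss falls below a preset value, can be sketched on a toy problem. The example below applies plain gradient descent to a one-parameter quadratic rather than to an LSTM; the function name, learning rate, threshold and step cap are all assumptions chosen for illustration.

```python
def minimize(loss_fn, grad_fn, params, learning_rate=0.1,
             min_decrease=1e-6, max_steps=10_000):
    """Gradient descent that stops once the loss decrease is below `min_decrease`."""
    previous = loss_fn(params)
    current = previous
    for _ in range(max_steps):
        grads = grad_fn(params)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
        current = loss_fn(params)
        if previous - current < min_decrease:
            break  # the reduction in the loss is smaller than the preset value
        previous = current
    return params, current


# Toy objective (x - 3)^2 with gradient 2 (x - 3), starting from x = 0.
final_params, final_loss = minimize(
    lambda p: (p[0] - 3.0) ** 2,
    lambda p: [2.0 * (p[0] - 3.0)],
    [0.0],
)
```

In a real training run the loss and gradient would be computed over a batch of historical coding sequences, but the termination test has the same shape.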

When features of a coding sequence are extracted through the high-dimensional semantic feature extraction model, the cell state of the model at time $N$ is taken as the extracted high-dimensional semantic feature vector.
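To make "the cell state at time $N$" concrete, the sketch below runs a standard single-layer LSTM over a sequence of embedding vectors and returns the final cell state. It uses the textbook LSTM gate equations with untrained random weights; the parameter layout, the function names and the sizes are assumptions for illustration, not details taken from the patent.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lstm_step(params, x, h, c):
    """One standard LSTM step; returns the new hidden state and cell state."""
    z = x + h  # concatenation of the input and the previous hidden state
    gates = {name: [a + b for a, b in zip(matvec(weights, z), bias)]
             for name, (weights, bias) in params.items()}
    i = [sigmoid(v) for v in gates["input"]]
    f = [sigmoid(v) for v in gates["forget"]]
    o = [sigmoid(v) for v in gates["output"]]
    g = [math.tanh(v) for v in gates["candidate"]]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def extract_features(params, embedded_sequence, hidden_size):
    """Run the LSTM over the sequence and return the cell state at the last step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in embedded_sequence:
        h, c = lstm_step(params, x, h, c)
    return c


def random_params(input_size, hidden_size, seed=0):
    """Untrained toy weights; a real model would use trained parameters."""
    rng = random.Random(seed)

    def block():
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                   for _ in range(hidden_size)]
        return weights, [0.0] * hidden_size

    return {name: block() for name in ("input", "forget", "output", "candidate")}


toy_params = random_params(input_size=2, hidden_size=3)
toy_features = extract_features(toy_params, [[0.3, 0.7], [0.1, 0.5]], hidden_size=3)
```

The returned list has one value per hidden unit, so the dimension of the feature vector equals the hidden size of the network.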

S4: Input the high-dimensional semantic feature vector into the preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified.

Preferably, the alarm type classification model in step S4 is constructed as follows: acquiring historical multilingual alarm information; encoding the historical multilingual alarm information into historical coding sequences through the preset word stock; inputting the historical coding sequences into the preset high-dimensional semantic feature extraction model to obtain historical high-dimensional semantic feature vectors, and taking the alarm type of the historical multilingual alarm information corresponding to each historical high-dimensional semantic feature vector as the label of that vector; and training a preset three-layer feedforward neural network model through the historical high-dimensional semantic feature vectors and their labels to obtain the alarm type classification model.

Following the word stock construction process and the high-dimensional semantic feature extraction model above, the historical high-dimensional semantic feature vectors of the historical multilingual alarm information are obtained through the high-dimensional semantic feature extraction model. Each historical high-dimensional semantic feature vector is matched one-to-one with its alarm type, which serves as its label in training the alarm type classification model.

The initial alarm type classification model may be a three-layer feedforward neural network model, whose output is taken as the alarm type corresponding to the input high-dimensional semantic feature vector.

The loss function of the three-layer feedforward neural network model is given by:

$$L(c) = -\log P(c)$$

where $L(c)$ is the loss function of the three-layer feedforward neural network model; $P(c)$ is the probability, in the output probability vector $P$, of the true category to which the current input belongs; and $c$ is the true category to which the input high-dimensional semantic feature vector belongs. The three-layer feedforward neural network model is optimized by a batch gradient descent method, with minimizing its loss function as the optimization target, that is, maximizing the probability of the alarm type to which the input high-dimensional semantic feature vector belongs, until the reduction in the loss function value is smaller than a preset value. The preset alarm type classification model is thereby obtained.
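The classifier's forward pass and its loss can be sketched as below. The text does not specify the layer sizes or activation functions, so this example assumes that "three-layer" means an input layer, one hidden layer and an output layer, with a tanh hidden activation and a softmax output. All weights are arbitrary toy values, and the true class is given as a 0-based index.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, bias, inputs):
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def classify(features, w1, b1, w2, b2):
    """Input layer -> tanh hidden layer -> softmax output layer."""
    hidden = [math.tanh(v) for v in dense(w1, b1, features)]
    return softmax(dense(w2, b2, hidden))


def classification_loss(probabilities, true_class):
    """L(c) = -log P(c), with `true_class` a 0-based class index."""
    return -math.log(probabilities[true_class])


# Toy network: 3 input features, 2 hidden units, 3 alarm categories.
toy_features = [0.5, -0.2, 0.1]
w1 = [[0.1, -0.3, 0.2], [0.4, 0.1, -0.5]]
b1 = [0.0, 0.1]
w2 = [[0.3, -0.2], [-0.1, 0.4], [0.2, 0.2]]
b2 = [0.0, 0.0, 0.0]
toy_probabilities = classify(toy_features, w1, b1, w2, b2)
toy_loss = classification_loss(toy_probabilities, true_class=1)
```

At inference time the predicted alarm type would be the index of the largest entry of the probability vector; during training only the entry for the true category enters the loss.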

Then, on the basis of obtaining the alarm type classification model, the high-dimensional semantic feature vector of the multi-language alarm information to be classified obtained in step S3 is input into the alarm type classification model, and the alarm type of the multi-language alarm information to be classified is obtained according to the output of the alarm type classification model.

When the multilingual alarm information category judgment method is used in practice, the multilingual alarm information to be classified is first converted into a coding sequence of fixed length: the Chinese characters, English words and numbers in the multilingual alarm information to be classified are replaced with their codes in the word stock constructed in the training stage; Chinese characters or English words that do not exist in the word stock are replaced with the code of the special character UNK; the code of the special character BOS is used as the beginning of the coding sequence; and the code of the special character EOS is used to cut or pad the coding sequence to the fixed length. Next, the high-dimensional semantic feature vector of the coding sequence is extracted: the codes of the coding sequence are input in order into the trained high-dimensional semantic feature extraction model, and, assuming the length of the coding sequence is $N$, the cell state of the model at time $N$, that is, the last time step, is taken as the high-dimensional semantic feature vector of the whole coding sequence. Finally, the high-dimensional semantic feature vector is classified by the trained alarm type classification model to obtain the category of the multilingual alarm information corresponding to it.

In summary, the multilingual alarm information category judgment method of the present invention uniformly codes the multilingual alarm information to be classified, with Chinese in units of Chinese characters, English in units of English words, and numbers in units of single digits. Through the joint training of the word embedding matrix and the long short-term memory network, the word embedding matrix then projects the codes into the same high-dimensional semantic space according to their semantic information. This eliminates the differences in form among Chinese characters, English words and numbers and retains only their semantic relevance, with their respective semantic information stored in high-dimensional semantic feature vectors. Moreover, because a piece of multilingual alarm information consists of one or more of Chinese, English and numbers arranged in sequence, inputting its coding sequence into the word embedding matrix yields a sequence of high-dimensional semantic feature vectors.

Because the high-dimensional semantic feature extraction model is trained in a self-encoding manner, the trained model can model an input sequence of high-dimensional semantic feature vectors well and encode the important semantic features of that sequence into its cell state. After the sequence of high-dimensional semantic feature vectors is input into the model in order, the semantic information of the sentence formed by the sequence is encoded into the cell state of the model. The cell state is itself, in essence, a high-dimensional semantic feature vector, and feeding it into the alarm type classification model yields the category of the corresponding multilingual alarm information to be classified.

Generally speaking, the multilingual alarm information category judgment method extracts semantic features from multilingual alarm information to be classified through a single high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector, and classifies the high-dimensional semantic feature vector by means of the alarm information classification model to further determine the type of the multilingual alarm information to be classified corresponding to the high-dimensional semantic feature vector.

The following are apparatus embodiments of the present invention, which may be used to carry out the method embodiments of the present invention. For details not described in the apparatus embodiments, please refer to the method embodiments of the present invention.

In another embodiment of the present invention, a multilingual alarm information category judgment system is provided, which can be used to implement the above multilingual alarm information category judgment method. Specifically, the system includes an acquisition module, a coding module, a feature extraction module and a category judgment module.

The acquisition module is used for acquiring multi-language alarm information to be classified; the coding module is used for coding the multilingual alarm information to be classified into a coding sequence through a preset word stock; the feature extraction module is used for inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector; the category judgment module is used for inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multi-language alarm information to be classified.
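The way the four modules hand data to one another can be sketched as a small container class. This is only a structural illustration: the class name, the field names, the stand-in lambdas and the category strings are hypothetical, and the stand-ins are not trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AlarmCategorySystem:
    """Four modules wired in the order: acquire -> encode -> extract -> judge."""
    acquire: Callable[[], str]
    encode: Callable[[str], List[int]]
    extract_features: Callable[[Sequence[int]], List[float]]
    judge_category: Callable[[Sequence[float]], str]

    def run(self) -> str:
        message = self.acquire()                  # acquisition module
        codes = self.encode(message)              # coding module
        features = self.extract_features(codes)   # feature extraction module
        return self.judge_category(features)      # category judgment module


# Stand-in callables for illustration only.
system = AlarmCategorySystem(
    acquire=lambda: "泵 pump 1 故障",
    encode=lambda message: [2, 1, 1, 3],
    extract_features=lambda codes: [float(sum(codes)), float(len(codes))],
    judge_category=lambda features: "equipment fault" if features[0] > 5 else "other",
)
category = system.run()
```

Keeping each module behind a plain callable makes it straightforward to swap the stand-ins for the trained feature extraction and classification models.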

In yet another embodiment of the present invention, a computer device is provided that includes a processor and a memory, the memory being used for storing a computer program comprising program instructions, and the processor being used for executing the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. It is the computing core and control core of the terminal and is adapted to load and execute one or more instructions in a computer storage medium to implement a corresponding method flow or function. The processor provided by the embodiment of the invention can be used to carry out the multilingual alarm information category judgment method.

In yet another embodiment of the present invention, a storage medium is further provided, specifically a computer-readable storage medium (memory), which is a memory device in a computer device and is used for storing programs and data. It is understood that the computer-readable storage medium herein can include both built-in storage media in the computer device and extended storage media supported by the computer device. The computer-readable storage medium provides a storage space that stores the operating system of the terminal. One or more instructions, which may be one or more computer programs (including program code), are also stored in the storage space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. One or more instructions stored in the computer-readable storage medium may be loaded and executed by the processor to implement the corresponding steps of the multilingual alarm information category judgment method in the above embodiments.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A multilingual alarm information category judgment method is characterized by comprising the following steps:

acquiring multi-language alarm information to be classified;

coding the multilingual alarm information to be classified into a coding sequence through a preset word stock;

inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector;

and inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multi-language alarm information to be classified.

2. The multilingual alarm information category judgment method of claim 1, wherein the word stock is constructed by:

acquiring historical multilingual alarm information;

counting Chinese characters appearing in the historical multilingual alarm information, the occurrence frequency of each Chinese character, English words appearing and the occurrence frequency of each English word;

taking the Chinese characters whose occurrence frequency is greater than a preset frequency, the English words whose occurrence frequency is greater than the preset frequency, the 10 Arabic numerals, UNK, BOS and EOS as word stock elements;

numbering the word stock elements consecutively starting from 1 to obtain the code of each word stock element;

and combining the word stock elements and the codes of the word stock elements to obtain a word stock.
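As an illustration of the word stock construction recited in claim 2, a minimal Python sketch follows. Several details are assumptions not stated in the claim: Chinese characters are matched by the Unicode range U+4E00 to U+9FFF, English words are runs of ASCII letters, elements within each group are sorted to make the numbering reproducible, and the special elements are written `<UNK>`, `<BOS>` and `<EOS>` so they cannot collide with an English word. The function name `build_word_stock` is hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

HAN = re.compile(r"[\u4e00-\u9fff]")   # one Chinese character (assumed range)
WORD = re.compile(r"[A-Za-z]+")        # one English word (assumed definition)


def build_word_stock(messages: Iterable[str], min_freq: int) -> Dict[str, int]:
    """Map each word stock element to an integer code starting from 1."""
    han_counts: Counter = Counter()
    word_counts: Counter = Counter()
    for message in messages:
        han_counts.update(HAN.findall(message))
        word_counts.update(WORD.findall(message))

    # Keep only elements whose frequency exceeds the preset frequency.
    elements = sorted(ch for ch, n in han_counts.items() if n > min_freq)
    elements += sorted(w for w, n in word_counts.items() if n > min_freq)
    elements += [str(d) for d in range(10)]   # the 10 Arabic numerals
    elements += ["<UNK>", "<BOS>", "<EOS>"]   # special elements

    # Unified coding from 1.
    return {element: code for code, element in enumerate(elements, start=1)}


history = ["泵 A 故障 pump fault 1", "泵 B 故障 pump trip 2", "阀 fault"]
stock = build_word_stock(history, min_freq=1)
```

The numerals are always included regardless of frequency, so any number in an alarm message can be encoded digit by digit.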

3. The multilingual alarm information category judgment method of claim 2, wherein the specific method for coding the multilingual alarm information to be classified into a coding sequence through the preset word stock is:

replacing the Chinese characters, English words and Arabic numerals in the multilingual alarm information to be classified that are contained in the word stock with the codes of the corresponding word stock elements;

replacing the Chinese characters or English words in the multilingual alarm information to be classified that are not contained in the word stock with the UNK code in the word stock;

and taking the BOS code in the word stock as the beginning of the coding sequence, and truncating the coding sequence, or padding it with the EOS code in the word stock, to a fixed length to obtain the coding sequence.
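The coding steps of claim 3 can be sketched as below. This is a hedged illustration, not the claimed implementation: the tokenizer (one Chinese character, one run of ASCII letters, or one digit per token, with all other characters skipped), the marker spellings `<UNK>`, `<BOS>`, `<EOS>`, and the choice that a truncated sequence is simply cut without a trailing EOS are all assumptions. The name `encode_message` is hypothetical.

```python
import re
from typing import Dict, List

# Assumed tokenization: a Chinese character, an English word, or a single digit.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|[0-9]")


def encode_message(message: str, stock: Dict[str, int], length: int) -> List[int]:
    """Encode one alarm message into a fixed-length coding sequence."""
    unk, bos, eos = stock["<UNK>"], stock["<BOS>"], stock["<EOS>"]
    codes = [bos]                               # BOS opens the sequence
    for token in TOKEN.findall(message):
        codes.append(stock.get(token, unk))     # unknown tokens become UNK
    codes = codes[:length]                      # truncate if too long
    codes += [eos] * (length - len(codes))      # pad with EOS if too short
    return codes


# Small hand-made word stock for demonstration only.
demo_stock = {"泵": 1, "pump": 2}
demo_stock.update({str(d): 3 + d for d in range(10)})
demo_stock.update({"<UNK>": 13, "<BOS>": 14, "<EOS>": 15})

sequence = encode_message("泵 pump 7 阀", demo_stock, length=8)
```

A fixed length lets sequences of different alarm messages be batched together when they are fed to the feature extraction model.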

4. The multilingual alarm information category judgment method of claim 1, wherein the specific method for inputting the coding sequence into the preset high-dimensional semantic feature extraction model to obtain the high-dimensional semantic feature vector is:

acquiring the one-hot vector of each code in the coding sequence, multiplying the one-hot vector of each code by a preset word embedding matrix in turn, and inputting the resulting products into the preset high-dimensional semantic feature extraction model; and taking the cell state of the high-dimensional semantic feature extraction model at the last time step as the high-dimensional semantic feature vector.
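The one-hot and word embedding step of claim 4 can be illustrated with plain Python lists. The sketch assumes that code k (codes start from 1) sets position k-1 of the one-hot vector and that the embedding matrix has one row per word stock element; under these assumptions the product of a one-hot vector and the matrix is just the matching row. The function names and the toy matrix are hypothetical, and the subsequent feature extraction model is not shown.

```python
from typing import List, Sequence


def one_hot(code: int, vocab_size: int) -> List[float]:
    """One-hot row vector for a code in 1..vocab_size (assumed indexing)."""
    if not 1 <= code <= vocab_size:
        raise ValueError(f"code {code} outside 1..{vocab_size}")
    vec = [0.0] * vocab_size
    vec[code - 1] = 1.0
    return vec


def embed(codes: Sequence[int],
          embedding: Sequence[Sequence[float]]) -> List[List[float]]:
    """Multiply each code's one-hot vector by the embedding matrix in turn."""
    vocab_size, dim = len(embedding), len(embedding[0])
    vectors = []
    for code in codes:
        hot = one_hot(code, vocab_size)
        # Row vector times matrix: sum over the vocabulary axis.
        vectors.append([
            sum(hot[i] * embedding[i][j] for i in range(vocab_size))
            for j in range(dim)
        ])
    return vectors


toy_embedding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 elements, 2 dimensions
embedded = embed([3, 1], toy_embedding)
```

Because the product only selects a row, practical implementations usually replace the multiplication with a direct table lookup; the explicit product is kept here to mirror the wording of the claim.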

5. The multilingual alarm information category judgment method of claim 1, wherein the high-dimensional semantic feature extraction model is constructed by:

acquiring historical multilingual alarm information;

coding the historical multilingual alarm information into a historical coding sequence through the preset word stock;

and training a preset long short-term memory (LSTM) network model through the historical coding sequence to obtain the high-dimensional semantic feature extraction model.

6. The multilingual alarm information category judgment method of claim 5, wherein the alarm type classification model is constructed by:

acquiring historical multilingual alarm information;

inputting the historical coding sequence into a preset high-dimensional semantic feature extraction model to obtain historical high-dimensional semantic feature vectors, and determining the alarm types of historical multi-language alarm information corresponding to the historical high-dimensional semantic feature vectors as labels of the historical high-dimensional semantic feature vectors;

and training a preset three-layer feedforward neural network model through each historical high-dimensional semantic feature vector and the label of each historical high-dimensional semantic feature vector to obtain an alarm type classification model.

7. The multilingual alarm information category judgment method of claim 6, wherein, when the preset long short-term memory network model is trained through the historical coding sequence, the loss function of the long short-term memory network model is given by the following formula:

wherein L(S) is the loss function value of the long short-term memory network model for the historical coding sequence S; P_t(S_t) is the probability, in the probability vector P_t output by the long short-term memory network model, of the one-hot vector S_t corresponding to the t-th code s_t of the historical coding sequence S; and N is the number of codes in the historical coding sequence S; the long short-term memory network model is optimized by a batch gradient descent method, taking minimization of the loss function value of the long short-term memory network model as the optimization target;

and when the preset three-layer feedforward neural network model is trained through the historical high-dimensional semantic feature vectors and the labels of the historical high-dimensional semantic feature vectors, the loss function of the three-layer feedforward neural network model is given by:

L(c) = -log P(c)
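A small numerical illustration of the classification loss L(c) = -log P(c) follows. It assumes that P(c) denotes the probability the classification model assigns to the labelled alarm type c, and that the probabilities come from a softmax over the output layer; neither point is stated in the claim, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw output-layer scores into probabilities (assumed output layer)."""
    peak = max(logits)                           # subtract max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(probs: Sequence[float], label: int) -> float:
    """L(c) = -log P(c) for the labelled alarm type index c."""
    return -math.log(probs[label])


probs = softmax([0.0, 0.0])                      # two equally likely alarm types
loss = classification_loss(probs, label=0)
```

The loss is zero when the model assigns probability 1 to the labelled type and grows without bound as that probability approaches 0, which is why minimizing it pushes probability mass toward the correct alarm type.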

8. A multilingual alarm information category judgment system, comprising:

the acquisition module is used for acquiring multi-language alarm information to be classified;

the coding module is used for coding the multilingual alarm information to be classified into a coding sequence through a preset word stock;

the feature extraction module is used for inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector;

and the category judgment module is used for inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multi-language alarm information to be classified.

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, carries out the steps of the multilingual alarm information category judgment method according to any one of claims 1 to 7.

10. A computer-readable storage medium in which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the steps of the multilingual alarm information category judgment method according to any one of claims 1 to 7.