CN113590767A - Multilingual alarm information category judgment method, system, equipment and storage medium - Google Patents

Publication number: CN113590767A
Authority: CN (China)
Legal status: Granted
Application number: CN202111145028.0A
Other languages: Chinese (zh)
Other versions: CN113590767B (en)
Inventor
曾卫东
王鑫
陈翔
梁法光
王宾
管磊
文继锋
程国栋
陈修迪
Current Assignee
NR Electric Co Ltd
Xian Thermal Power Research Institute Co Ltd
Original Assignee
NR Electric Co Ltd
Xian Thermal Power Research Institute Co Ltd
Application filed by NR Electric Co Ltd and Xian Thermal Power Research Institute Co Ltd
Priority to CN202111145028.0A
Publication of CN113590767A
Application granted
Publication of CN113590767B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods

Abstract

The invention belongs to the field of natural language processing and discloses a multilingual alarm information category judgment method, system, device and storage medium. The method comprises: acquiring the multilingual alarm information to be classified; encoding it into a code sequence through a preset word stock; inputting the code sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector; and inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified. The method eliminates the differences in form between the Chinese, English and numeric parts of multilingual alarm information while retaining only their semantic relationships, extracts semantic features of the multilingual alarm information with a single high-dimensional semantic feature extraction model, classifies the information with an alarm type classification model, and thereby effectively improves the accuracy of multilingual alarm information category judgment.

Description

Multilingual alarm information category judgment method, system, equipment and storage medium
Technical Field
The invention belongs to the field of natural language processing, and relates to a multilingual alarm information category judgment method, system, equipment and storage medium.
Background
The distributed control system (DCS) acts as the brain of a thermal power plant. It must monitor in real time the operating state of the plant's power generation equipment as well as that of the control system's lower-level cards and controllers; whenever a piece of equipment, a card or a controller becomes abnormal, the upper-level computer of the control system records an alarm message. Because many devices can raise alarms, a large volume of alarm messages accumulates in operation, and to make sensible use of them the messages must be grouped by their actual alarm type.
Text classification is a standard problem in natural language processing and is usually performed on English sentences with the word as the unit. Most alarm messages, however, contain English, Chinese and numbers at the same time, which makes alarm classification a multilingual problem and leads to poor accuracy with existing classification methods.
Disclosure of Invention
The present invention aims to overcome the above shortcoming of the prior art, namely the poor accuracy of classifying multilingual alarm information, and provides a method, system, device and storage medium for judging the category of multilingual alarm information.
To achieve this purpose, the invention adopts the following technical solution:
In a first aspect of the present invention, a multilingual alarm information category judgment method comprises the following steps: acquiring the multilingual alarm information to be classified; encoding it into a code sequence through a preset word stock; inputting the code sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector; and inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified.
The multilingual alarm information category judgment method of the invention is further improved as follows:
The word stock is constructed as follows: acquire historical multilingual alarm information; count the Chinese characters appearing in it and the number of occurrences of each, and the English words appearing in it and the number of occurrences of each; take as word stock elements the Chinese characters and English words whose occurrence count exceeds a preset frequency, the 10 Arabic numerals, and the special symbols UNK, BOS and EOS; assign every word stock element a unified code starting from 1; and combine the elements with their codes to obtain the word stock.
Encoding the multilingual alarm information to be classified into a code sequence through the preset word stock specifically comprises: replacing each Chinese character, English word and Arabic numeral in the information that is present in the word stock with the code of the corresponding word stock element; replacing each Chinese character or English word that is not present in the word stock with the code of UNK; and using the code of BOS as the start of the code sequence, then truncating the sequence to a fixed length or padding it to that length with the code of EOS, to obtain the code sequence.
Inputting the code sequence into the preset high-dimensional semantic feature extraction model to obtain the high-dimensional semantic feature vector specifically comprises: obtaining the one-hot vector of each code in the code sequence, multiplying each one-hot vector in turn by a preset word embedding matrix, and inputting the products into the preset high-dimensional semantic feature extraction model; and taking the cell state of the model at the last time step as the high-dimensional semantic feature vector.
The high-dimensional semantic feature extraction model is constructed as follows: acquire historical multilingual alarm information; encode it into historical code sequences through the preset word stock; and train a preset long short-term memory (LSTM) network model with the historical code sequences to obtain the high-dimensional semantic feature extraction model.
The alarm type classification model is constructed as follows: acquire historical multilingual alarm information; encode it into historical code sequences through the preset word stock; input the historical code sequences into the preset high-dimensional semantic feature extraction model to obtain historical high-dimensional semantic feature vectors, and take the alarm type of the historical multilingual alarm information corresponding to each vector as that vector's label; and train a preset three-layer feedforward neural network model with the historical high-dimensional semantic feature vectors and their labels to obtain the alarm type classification model.
When the preset LSTM network model is trained with the historical code sequences, its loss function is:
$$L(S) = -\sum_{t=1}^{N} \log P_t(S_t)$$
where $L(S)$ is the loss value of the LSTM model for the historical code sequence $S$; $P_t(S_t)$ is the probability, within the probability vector $P_t$ output by the LSTM model at step $t$, of the one-hot vector $S_t$ corresponding to the $t$-th code $s_t$ of $S$; and $N$ is the number of codes in $S$. The LSTM model is optimized by batch gradient descent with minimization of this loss value as the objective. When the preset three-layer feedforward neural network model is trained with the historical high-dimensional semantic feature vectors and their labels, its loss function is:
$$L(c) = -\log P(c)$$
where $L(c)$ is the loss of the three-layer feedforward neural network model, $P(c)$ is the probability of the true class of the current input in the output probability vector $P$, and $c$ is the true class of the input high-dimensional semantic feature vector. The three-layer feedforward neural network model is optimized by batch gradient descent with minimization of this loss as the objective.
In a second aspect of the present invention, a multilingual alarm information category judgment system comprises: an acquisition module for acquiring the multilingual alarm information to be classified; an encoding module for encoding it into a code sequence through the preset word stock; a feature extraction module for inputting the code sequence into the preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector; and a category judgment module for inputting the high-dimensional semantic feature vector into the preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified.
In a third aspect of the present invention, a computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the multilingual alarm information category judgment method described above.
In a fourth aspect of the present invention, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the multilingual alarm information category judgment method described above.
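The four modules can be wired together as plain functions. The sketch below is a hypothetical illustration, not the patented implementation: the class name `AlarmCategorySystem`, its field names and the toy stand-in callables are assumptions. A real deployment would plug in the word stock encoder, the trained LSTM feature extractor and the trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AlarmCategorySystem:
    """Hypothetical wiring of the acquisition, encoding, feature extraction
    and category judgment modules; each module is an injected callable."""

    acquire: Callable[[], str]                       # acquisition module
    encode: Callable[[str], List[int]]               # encoding module (word stock lookup)
    extract: Callable[[List[int]], Sequence[float]]  # feature extraction module
    classify: Callable[[Sequence[float]], str]       # category judgment module

    def run(self) -> str:
        text = self.acquire()           # S1: acquire the alarm message
        codes = self.encode(text)       # S2: encode it into a code sequence
        features = self.extract(codes)  # S3: high-dimensional semantic features
        return self.classify(features)  # S4: alarm type


# Toy stand-ins only, so that the wiring can be run end to end.
system = AlarmCategorySystem(
    acquire=lambda: "DPU01 通讯故障",
    encode=lambda text: [ord(ch) % 100 for ch in text],
    extract=lambda codes: [float(len(codes))],
    classify=lambda feats: "communication" if feats[0] > 5 else "other",
)
print(system.run())
```

Injecting each module as a callable keeps the pipeline order (S1 to S4) fixed while letting the models be retrained or swapped independently.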
Compared with the prior art, the invention has the following beneficial effects:
In the multilingual alarm information category judgment method, the multilingual alarm information to be classified is encoded into a code sequence through a preset word stock, and a preset high-dimensional semantic feature extraction model extracts a high-dimensional semantic feature vector from that sequence. This removes the differences in form between the Chinese, English and numeric parts of the alarm information while preserving only their semantic relationships, with the semantic information of each part stored in the high-dimensional semantic feature vector. Finally, an alarm type classification model classifies the high-dimensional semantic feature vector, which determines the category of the corresponding multilingual alarm information and effectively improves the accuracy of multilingual alarm information category judgment.
Drawings
FIG. 1 is a flow chart of the multilingual alarm information category judgment method according to the present invention;
FIG. 2 is a block diagram of the training process of the high-dimensional semantic feature extraction model according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
Referring to FIG. 1, an embodiment of the present invention provides a multilingual alarm information category judgment method that extracts semantic features from multilingual alarm information with a single high-dimensional semantic feature extraction model and classifies the information with an alarm type classification model. Specifically, the method comprises the following steps.
S1: Acquire the multilingual alarm information to be classified.
Preferably, the word stock is constructed as follows: acquire historical multilingual alarm information; count the Chinese characters appearing in it and the number of occurrences of each, and the English words appearing in it and the number of occurrences of each; take as word stock elements the Chinese characters and English words whose occurrence count exceeds a preset frequency, the 10 Arabic numerals, and the special symbols UNK, BOS and EOS; assign every word stock element a unified code starting from 1; and combine the elements with their codes to obtain the word stock.
Specifically, in this embodiment, historical multilingual alarm information generated by the distributed control systems of different manufacturers is collected, including device fault alarms, device action alarms, sequence-of-events alarms, system process alarms and the like. Professional technicians label the collected information by alarm type, and the result serves as the training data set for both the high-dimensional semantic feature extraction model and the alarm type classification model.
The tokens in the training data set are then counted, with a different counting unit for each language. Chinese is counted character by character: every Chinese character appearing in the training data set and its number of occurrences are recorded. English is counted word by word: every English word appearing in the training data set and its number of occurrences are recorded. Chinese characters and English words that occur more than 10 times are added to the word stock as elements, all 10 Arabic numerals are added, and the special symbols UNK, BOS and EOS are added; all elements of the word stock are then coded uniformly starting from 1.
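A minimal Python sketch of this construction follows. It assumes a simple regular-expression tokenizer (one token per CJK character, per run of ASCII letters, and per digit) and that all other characters, such as spaces and punctuation, are discarded. The patent does not specify tokenization details, case handling, or the order in which elements receive codes, so those choices (digits, then UNK/BOS/EOS, then sorted tokens) and the function names are illustrative.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Assumed tokenizer: one CJK character, one run of ASCII letters, or one digit.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|[0-9]")
SPECIALS = ["UNK", "BOS", "EOS"]


def tokenize(text: str) -> List[str]:
    """Split an alarm message; characters outside the three classes are dropped."""
    return TOKEN_RE.findall(text)


def build_word_stock(history: Iterable[str], min_count: int = 10) -> Dict[str, int]:
    """Build the word stock from historical alarm messages.

    Chinese characters and English words occurring more than `min_count`
    times are kept; all 10 digits and the special symbols are always kept.
    """
    counts = Counter(tok for msg in history for tok in tokenize(msg) if not tok.isdigit())
    # Exclude literal "UNK"/"BOS"/"EOS" words so they cannot collide with the specials.
    kept = sorted(t for t, n in counts.items() if n > min_count and t not in SPECIALS)
    elements = list("0123456789") + SPECIALS + kept
    # Unified coding starting from 1.
    return {elem: code for code, elem in enumerate(elements, start=1)}


history = ["汽包水位高 HIGH"] * 11 + ["偶发 rare"]
word_stock = build_word_stock(history)
print(word_stock["HIGH"], "rare" in word_stock)
```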
S2: Encode the multilingual alarm information to be classified into a code sequence through the preset word stock.
Preferably, in step S2, encoding the multilingual alarm information to be classified into a code sequence through the preset word stock specifically comprises: replacing each Chinese character, English word and Arabic numeral in the information that is present in the word stock with the code of the corresponding word stock element; replacing each Chinese character or English word that is not present in the word stock with the code of UNK; and using the code of BOS as the start of the code sequence, then truncating the sequence to a fixed length or padding it to that length with the code of EOS, to obtain the code sequence.
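The encoding step can be sketched as follows. The fixed length (`length`), the toy word stock and the tokenizer are assumptions for illustration; only the UNK replacement, the leading BOS code, and truncation or EOS padding come from the text.

```python
import re
from typing import Dict, List

# Assumed tokenizer, matching the one used to build the word stock.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|[0-9]")


def encode(text: str, word_stock: Dict[str, int], length: int) -> List[int]:
    """Encode one alarm message into a fixed-length code sequence."""
    unk, bos, eos = word_stock["UNK"], word_stock["BOS"], word_stock["EOS"]
    codes = [bos]  # the BOS code starts every sequence
    # Known tokens get their own code; unknown Chinese characters/words get UNK.
    codes += [word_stock.get(tok, unk) for tok in TOKEN_RE.findall(text)]
    codes = codes[:length]                  # cut to the fixed length ...
    codes += [eos] * (length - len(codes))  # ... or pad with EOS codes
    return codes


# Toy word stock: digits 0-9 -> 1-10, then the special symbols, then two tokens.
toy_stock = {str(d): d + 1 for d in range(10)}
toy_stock.update({"UNK": 11, "BOS": 12, "EOS": 13, "高": 14, "HIGH": 15})
print(encode("高 HIGH 3 低", toy_stock, 8))
```

Here "低" is not in the toy word stock, so it is replaced by the UNK code 11, and the short sequence is padded with EOS codes up to length 8.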
S3: Input the code sequence into the preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector.
Preferably, in step S3, inputting the code sequence into the preset high-dimensional semantic feature extraction model to obtain the high-dimensional semantic feature vector specifically comprises: obtaining the one-hot vector of each code in the code sequence, multiplying each one-hot vector in turn by a preset word embedding matrix, and inputting the products into the preset high-dimensional semantic feature extraction model; and taking the cell state of the model at the last time step as the high-dimensional semantic feature vector.
The dimension of each code's one-hot vector equals the size of the multilingual word stock; only the position corresponding to the code takes the value 1, and all other positions take 0. During training, $s_{t-1}$ denotes the $(t-1)$-th code of the code sequence input at step $t$.
The preset high-dimensional semantic feature extraction model is constructed as follows: acquire historical multilingual alarm information; encode it into historical code sequences through the preset word stock; and train a preset LSTM network model with the historical code sequences to obtain the high-dimensional semantic feature extraction model.
Following the word stock construction process above, a training data set of historical multilingual alarm information is obtained. Training is performed in a self-coding manner: the historical multilingual alarm information is encoded into historical code sequences through the preset word stock, and the one-hot vector of every code in each historical code sequence is then obtained.
Specifically, referring to FIG. 2, the initial high-dimensional semantic feature extraction model may be a single-layer long short-term memory (LSTM) network. During training, the input at step $t$ is the one-hot vector $S_{t-1}$ corresponding to the $(t-1)$-th code $s_{t-1}$ of the historical code sequence; its dimension equals the size of the multilingual word stock, with the value 1 at the position of $s_{t-1}$ and 0 elsewhere. The product of the preset word embedding matrix $W_e$ and $S_{t-1}$ is fed to the LSTM model, and the training goal is for the probability $P_t(S_t)$ of the $t$-th code of the historical code sequence, within the output probability vector $P_t$, to be maximal.
The preset word embedding matrix $W_e$ is randomly initialized, and its parameters are trained together with the LSTM network. The word embedding matrix converts a one-hot vector into a word embedding vector:
$$E_t = W_e S_t$$
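Multiplying a one-hot vector by $W_e$ simply selects one column of the matrix, which is why implementations usually use a table lookup instead of a real product. The pure-Python sketch below checks this equivalence, assuming $W_e$ has shape (embedding dimension × word stock size) and that code $k$ maps to column $k-1$ because codes start from 1; the sizes, random values and function names are arbitrary.

```python
import random
from typing import List

Matrix = List[List[float]]


def one_hot(code: int, vocab_size: int) -> List[float]:
    """One-hot vector S_t; codes start from 1, so position code-1 is set."""
    vec = [0.0] * vocab_size
    vec[code - 1] = 1.0
    return vec


def embed_by_product(code: int, W_e: Matrix) -> List[float]:
    """E_t = W_e S_t computed literally as a matrix-vector product."""
    s = one_hot(code, len(W_e[0]))
    return [sum(w * x for w, x in zip(row, s)) for row in W_e]


def embed_by_lookup(code: int, W_e: Matrix) -> List[float]:
    """Equivalent shortcut: read column code-1 of W_e."""
    return [row[code - 1] for row in W_e]


rng = random.Random(0)
vocab_size, embed_dim = 6, 3
# Random initialisation, as in the text; training would then update these values.
W_e = [[rng.uniform(-0.1, 0.1) for _ in range(vocab_size)] for _ in range(embed_dim)]
print(embed_by_product(4, W_e) == embed_by_lookup(4, W_e))
```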
When the preset LSTM network model is trained with the historical code sequences, its loss function is:
$$L(S) = -\sum_{t=1}^{N} \log P_t(S_t)$$
where $L(S)$ is the loss value of the LSTM model for the historical code sequence $S$; $P_t(S_t)$ is the probability, within the probability vector $P_t$ output by the LSTM model at step $t$, of the one-hot vector $S_t$ corresponding to the $t$-th code $s_t$ of $S$; and $N$ is the number of codes in $S$.
With minimization of the LSTM loss value as the objective, the LSTM model is optimized by batch gradient descent, adjusting its parameters until the decrease in the loss value falls below a preset value. This yields the preset high-dimensional semantic feature extraction model.
When features are extracted from a code sequence with the high-dimensional semantic feature extraction model, the cell state of the model at step $N$ is taken as the extracted high-dimensional semantic feature vector.
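The sketch below assumes the loss takes the standard negative log-likelihood form $L(S) = -\sum_{t=1}^{N} \log P_t(S_t)$, consistent with the definitions above. It also assumes teacher forcing for the self-coding objective (the input at step $t$ is code $s_{t-1}$ and the target is $s_t$) and that code $k$ occupies position $k-1$ of each probability vector. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw LSTM output scores into a probability vector P_t."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_code_pairs(codes: Sequence[int]) -> List[Tuple[int, int]]:
    """Self-coding training pairs: input s_{t-1}, target s_t."""
    return list(zip(codes[:-1], codes[1:]))


def sequence_loss(prob_vectors: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """L(S) = -sum_t log P_t(S_t), reading P_t at position target-1."""
    if len(prob_vectors) != len(targets):
        raise ValueError("need one probability vector per target code")
    return -sum(math.log(p[c - 1]) for p, c in zip(prob_vectors, targets))


seq = [12, 14, 15, 13]          # e.g. BOS, two tokens, EOS
pairs = next_code_pairs(seq)    # [(12, 14), (14, 15), (15, 13)]
uniform = softmax([0.0] * 16)   # an untrained model: equal scores over 16 codes
loss = sequence_loss([uniform] * len(pairs), [t for _, t in pairs])
print(round(loss, 4))
```

An untrained model that spreads probability evenly over 16 codes pays $\log 16$ per step; training drives each $P_t(S_t)$ toward 1 and the loss toward 0.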
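To make the feature extraction step concrete, here is a forward pass of a single-layer LSTM in pure Python that returns the cell state after the last input, i.e. at step $N$. It uses the textbook LSTM gate equations; the gate layout, initialisation range, zero initial states and all names are assumptions, and training (performed in the patent by batch gradient descent on the self-coding loss) is omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]  # gate -> (W, U, b)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_params(input_dim: int, hidden_dim: int, seed: int = 0) -> Params:
    """Small random W (input weights) and U (recurrent weights), zero biases."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # i: input gate, f: forget gate, o: output gate, g: candidate cell value
    return {name: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for name in "ifog"}


def gate(params: Params, name: str, act, x: Sequence[float], h: Sequence[float]) -> Vector:
    """act(W x + U h + b) for one gate."""
    W, U, b = params[name]
    return [act(sum(w * xi for w, xi in zip(W[k], x))
                + sum(u * hj for u, hj in zip(U[k], h)) + b[k])
            for k in range(len(b))]


def final_cell_state(inputs: Sequence[Sequence[float]], params: Params) -> Vector:
    """Run the LSTM over the embedded codes and return the last cell state c_N."""
    hidden_dim = len(params["i"][2])
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in inputs:
        i = gate(params, "i", sigmoid, x, h)
        f = gate(params, "f", sigmoid, x, h)
        o = gate(params, "o", sigmoid, x, h)
        g = gate(params, "g", math.tanh, x, h)
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # new cell state
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                # new hidden state
    return c


params = init_params(input_dim=3, hidden_dim=4)
embedded = [[0.1, -0.2, 0.05], [0.0, 0.3, -0.1], [0.2, 0.1, 0.0]]  # e.g. W_e S_t vectors
feature_vector = final_cell_state(embedded, params)
print(len(feature_vector))
```

The returned cell state has the hidden size of the LSTM regardless of the sequence length, which is what lets a variable-length alarm message be summarised as one fixed-size feature vector.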
S4: Input the high-dimensional semantic feature vector into the preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified.
Preferably, the alarm type classification model in step S4 is constructed as follows: acquire historical multilingual alarm information; encode it into historical code sequences through the preset word stock; input the historical code sequences into the preset high-dimensional semantic feature extraction model to obtain historical high-dimensional semantic feature vectors, and take the alarm type of the historical multilingual alarm information corresponding to each vector as that vector's label; and train a preset three-layer feedforward neural network model with the historical high-dimensional semantic feature vectors and their labels to obtain the alarm type classification model.
Using the word stock construction process and the high-dimensional semantic feature extraction model, historical high-dimensional semantic feature vectors of the historical multilingual alarm information are obtained. Each historical high-dimensional semantic feature vector is paired one-to-one with its alarm type, which serves as its label in training the alarm type classification model.
The initial alarm type classification model may be a three-layer feedforward neural network model, whose output is taken as the alarm type corresponding to the input high-dimensional semantic feature vector.
The loss function of the three-layer feedforward neural network model is:
$$L(c) = -\log P(c)$$
where $L(c)$ is the loss of the three-layer feedforward neural network model, $P(c)$ is the probability of the true class of the current input in the output probability vector $P$, and $c$ is the true class of the input high-dimensional semantic feature vector. With minimization of this loss as the objective, the three-layer feedforward neural network model is optimized by batch gradient descent, that is, the probability of the alarm type to which the input high-dimensional semantic feature vector belongs is maximized, until the decrease in the loss value falls below a preset value. This yields the preset alarm type classification model.
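The classifier and its training loop can be sketched in pure Python as follows. The patent does not define "three-layer" further; this sketch assumes an input layer, one tanh hidden layer and a softmax output layer, averages $L(c) = -\log P(c)$ over the batch, and stops when the per-step decrease of the batch loss falls below a preset `tol` (or after `max_epochs`, an added safeguard). Class names, sizes, learning rate and the toy two-dimensional data are hypothetical; real inputs would be LSTM cell-state vectors.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (high-dimensional semantic feature vector, class index)


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ThreeLayerClassifier:
    """Input layer -> tanh hidden layer -> softmax output layer."""

    def __init__(self, n_in: int, n_hidden: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_classes)]
        self.b2 = [0.0] * n_classes

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        a1 = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        p = softmax([sum(w * a for w, a in zip(row, a1)) + b for row, b in zip(self.W2, self.b2)])
        return a1, p

    def predict(self, x: Sequence[float]) -> int:
        p = self.forward(x)[1]
        return max(range(len(p)), key=p.__getitem__)

    def batch_loss(self, data: Sequence[Sample]) -> float:
        """Mean over the batch of L(c) = -log P(c)."""
        return sum(-math.log(self.forward(x)[1][c]) for x, c in data) / len(data)

    def gd_step(self, data: Sequence[Sample], lr: float) -> None:
        """One batch gradient descent step on the mean loss (manual backprop)."""
        gW1 = [[0.0] * len(r) for r in self.W1]
        gb1 = [0.0] * len(self.b1)
        gW2 = [[0.0] * len(r) for r in self.W2]
        gb2 = [0.0] * len(self.b2)
        for x, c in data:
            a1, p = self.forward(x)
            dz2 = [pk - (1.0 if k == c else 0.0) for k, pk in enumerate(p)]  # dL/dlogits
            for k, d in enumerate(dz2):
                gb2[k] += d
                for j, a in enumerate(a1):
                    gW2[k][j] += d * a
            for j, a in enumerate(a1):
                # Back through W2 and the tanh derivative (1 - a^2).
                d = (1.0 - a * a) * sum(self.W2[k][j] * dz2[k] for k in range(len(dz2)))
                gb1[j] += d
                for m, xm in enumerate(x):
                    gW1[j][m] += d * xm
        scale = lr / len(data)  # average the accumulated gradients over the batch
        for W, G in ((self.W1, gW1), (self.W2, gW2)):
            for row, grow in zip(W, G):
                for idx, g in enumerate(grow):
                    row[idx] -= scale * g
        for b, G in ((self.b1, gb1), (self.b2, gb2)):
            for idx, g in enumerate(G):
                b[idx] -= scale * g


def train(model: ThreeLayerClassifier, data: Sequence[Sample], lr: float = 0.2,
          tol: float = 1e-6, max_epochs: int = 3000) -> float:
    """Repeat batch steps until the loss decrease drops below `tol` (the preset value)."""
    loss = prev = model.batch_loss(data)
    for _ in range(max_epochs):
        model.gd_step(data, lr)
        loss = model.batch_loss(data)
        if prev - loss < tol:
            break
        prev = loss
    return loss


data = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
model = ThreeLayerClassifier(n_in=2, n_hidden=3, n_classes=2)
initial_loss = model.batch_loss(data)
final_loss = train(model, data)
print([model.predict(x) for x, _ in data])
```

Accumulating gradients over the whole batch before each update is what makes this batch (rather than stochastic) gradient descent; the same stopping rule could equally be applied when training the LSTM.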
Then, on the basis of obtaining the alarm type classification model, the high-dimensional semantic feature vector of the multi-language alarm information to be classified obtained in step S3 is input into the alarm type classification model, and the alarm type of the multi-language alarm information to be classified is obtained according to the output of the alarm type classification model.
When the method for judging the category of the multi-language alarm information is used specifically, firstly, the multi-language alarm information to be classified is converted into the multi-language alarm information to be classifiedThe method comprises the steps that a coding sequence with a fixed length is used for replacing Chinese characters, English words and numbers in multi-language alarm information to be classified by using codes in a word stock constructed in a training stage, the Chinese characters or the English words which do not exist in the word stock are replaced by using codes corresponding to special characters UNK, the codes corresponding to special characters BOS are used as the beginning of the coding sequence, and the codes of the special characters EOS are used for cutting or supplementing the coding sequence into the fixed length. Then, extracting high-dimensional semantic feature vectors of the coding sequence, and when a trained high-dimensional semantic feature extraction model is used, inputting each code of the coding sequence with the semantic features to be extracted into the high-dimensional semantic feature extraction model in sequence, wherein the length of the coding sequence is assumed to beNExtracting the followingNAnd (3) taking the cell state of the high-dimensional semantic feature extraction model obtained at the moment, namely the last moment, as a high-dimensional semantic feature vector of the whole coding sequence. And finally, classifying the high-dimensional semantic feature vector, and after the high-dimensional semantic feature vector is obtained, classifying the high-dimensional semantic feature vector by using a trained alarm type classification model to obtain the category of the multi-language alarm information corresponding to the high-dimensional semantic feature vector.
In summary, the method for determining the category of the multi-language alarm information of the present invention uniformly encodes the Chinese characters in units of Chinese characters and the English words in units of English words and the numbers in units of characters in the multi-language alarm information to be classified, and then projects the codes into the same high-dimensional semantic space by the word embedding matrix according to the semantic information of the codes in training of the word embedding matrix and the long-short time memory network, thereby eliminating the differences in the forms of the Chinese characters, the English words and the numbers, only retaining the semantic relevance of the Chinese characters, the English words and the numbers, and storing the respective semantic information by the high-dimensional semantic feature vectors. And, because a multilingual warning message is composed of one or more of Chinese, English and number in sequence, a high-dimensional semantic feature vector sequence can be obtained after the coding sequence corresponding to the multilingual warning message is input into the word embedding matrix.
The high-dimensional semantic feature extraction model is trained in a self-coding (autoencoding) manner, so the trained model can model an input sequence of high-dimensional semantic feature vectors well and encodes the important semantic features of that sequence into its cell state. Once the vectors of a sequence have been input into the model in order, the semantic information of the sentence they form is encoded in the model's cell state. Because this cell state is itself a high-dimensional semantic feature vector, feeding it into the alarm type classification model yields the category of the corresponding multilingual alarm information to be classified.
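To make "the cell state at the last time step" concrete, the sketch below runs a standard LSTM cell over a sequence of embedding vectors and returns the final cell state. It assumes the common LSTM formulation, with the input, forget, candidate and output gates stacked in that order in one weight matrix `W` applied to the concatenation of the current input and the previous hidden state, and zero initial states. The patent does not specify the gate layout, initial states or sizes, and all names here are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_final_cell_state(inputs, W, b, hidden_size):
    """Run a standard LSTM over `inputs` and return the cell state of the last step.

    W has 4 * hidden_size rows (gates i, f, g, o stacked in that order, an assumed
    layout), each of length len(x_t) + hidden_size; b has 4 * hidden_size entries.
    """
    H = hidden_size
    h = [0.0] * H  # hidden state, zero-initialised (assumption)
    c = [0.0] * H  # cell state, zero-initialised (assumption)
    for x in inputs:
        z = list(x) + h  # concatenation [x_t, h_{t-1}]
        pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
        i = [sigmoid(p) for p in pre[0:H]]            # input gate
        f = [sigmoid(p) for p in pre[H:2 * H]]        # forget gate
        g = [math.tanh(p) for p in pre[2 * H:3 * H]]  # candidate cell values
        o = [sigmoid(p) for p in pre[3 * H:4 * H]]    # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return c  # cell state at time step N, used as the semantic feature vector


# Toy parameters: 1-dimensional inputs, hidden size 1.
W = [[0.0, 0.0] for _ in range(4)]
b = [0.0, 0.0, 1.0, 0.0]
print(lstm_final_cell_state([[1.0], [2.0]], W, b, hidden_size=1))
```

With all weights zero the gates are constant (i = f = 0.5, g = tanh(1)), so the cell state accumulates to 0.75 * tanh(1) after two steps, which shows how information from earlier steps is carried forward in the cell state.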
Overall, the multilingual alarm information category judgment method uses a single high-dimensional semantic feature extraction model to extract semantic features from the multilingual alarm information to be classified, producing a high-dimensional semantic feature vector, and then uses the alarm type classification model to classify that vector and thereby determine the category of the corresponding multilingual alarm information.
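A three-layer feedforward classifier (claim 6) applied to the feature vector might look like the sketch below. It reads "three-layer" as an input layer, one hidden layer and a softmax output layer, uses tanh in the hidden layer, and picks the category with the highest probability. These choices, the function names, the toy weights and the category labels are assumptions for illustration only.

```python
import math


def dense(x, W, b):
    """Affine layer: W @ x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(feature, W1, b1, W2, b2, categories):
    """Input layer -> tanh hidden layer -> softmax output layer; return best category."""
    hidden = [math.tanh(v) for v in dense(feature, W1, b1)]
    probs = softmax(dense(hidden, W2, b2))
    best = max(range(len(probs)), key=lambda k: probs[k])
    return categories[best], probs


# Toy weights and hypothetical category labels.
label, probs = classify([1.0, -1.0],
                        W1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                        W2=[[1.0, 0.0], [0.0, 1.0]], b2=[0.0, 0.0],
                        categories=["equipment fault", "communication fault"])
print(label)  # -> equipment fault
```

The output vector `probs` is the probability vector P whose entry for the true category enters the classification loss of claim 7.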
The following are apparatus embodiments of the present invention, which may be used to perform the method embodiments of the present invention. For details not described in the apparatus embodiments, refer to the method embodiments of the present invention.
In another embodiment of the present invention, a multilingual alarm information category judgment system is provided, which can be used to implement the multilingual alarm information category judgment method described above.
The system comprises an acquisition module, a coding module, a feature extraction module and a category judgment module. The acquisition module is used for acquiring the multilingual alarm information to be classified; the coding module is used for coding the multilingual alarm information to be classified into a coding sequence through a preset word stock; the feature extraction module is used for inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector; and the category judgment module is used for inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multilingual alarm information to be classified.
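The four modules form a straight pipeline. A minimal sketch, assuming each module can be modeled as a plain callable injected into a hypothetical `AlarmCategorySystem` class; the class, its field names and the stub modules are illustrative, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AlarmCategorySystem:
    """Hypothetical wiring of the four modules; each module is an injected callable."""
    acquisition: Callable[[], str]                              # fetch a message to classify
    coding: Callable[[str], List[int]]                          # message -> code sequence
    feature_extraction: Callable[[Sequence[int]], List[float]]  # codes -> feature vector
    category_judgment: Callable[[Sequence[float]], str]         # feature vector -> alarm type

    def run(self) -> str:
        message = self.acquisition()
        codes = self.coding(message)
        feature = self.feature_extraction(codes)
        return self.category_judgment(feature)


# Stub modules stand in for the real word stock and trained models.
system = AlarmCategorySystem(
    acquisition=lambda: "主变 温度 高",
    coding=lambda text: [len(tok) for tok in text.split()],
    feature_extraction=lambda codes: [float(sum(codes))],
    category_judgment=lambda f: "high" if f[0] > 3 else "low",
)
print(system.run())  # -> high
```

Keeping the modules as separate injected components means the word stock or either model can be retrained and swapped without changing the pipeline itself.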
In yet another embodiment of the present invention, a computer device is provided, comprising a processor and a memory, the memory being used to store a computer program comprising program instructions and the processor being used to execute the program instructions stored in the computer storage medium. The processor may be a central processing unit (CPU) or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, a discrete hardware component, or the like. It is the computing core and control core of the terminal and is adapted to load and execute one or more instructions in the computer storage medium to implement the corresponding method flow or function. The processor provided by this embodiment of the invention can be used to perform the multilingual alarm information category judgment method.
In yet another embodiment, the present invention further provides a storage medium, specifically a computer-readable storage medium (memory), which is a memory device in a computer device used to store programs and data. The computer-readable storage medium here may include both a storage medium built into the computer device and an extended storage medium supported by the computer device. The computer-readable storage medium provides a storage space that stores the operating system of the terminal; one or more instructions, which may be one or more computer programs (including program code), are also stored in this storage space and are adapted to be loaded and executed by the processor. The computer-readable storage medium may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The processor may load and execute the one or more instructions stored in the computer-readable storage medium to implement the corresponding steps of the multilingual alarm information category judgment method in the above embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A multilingual alarm information category judgment method is characterized by comprising the following steps:
acquiring multi-language alarm information to be classified;
coding the multilingual alarm information to be classified into a coding sequence through a preset word stock;
inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector;
and inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multi-language alarm information to be classified.
2. The multilingual alarm information category judgment method according to claim 1, wherein the word stock is constructed by:
acquiring historical multilingual alarm information;
counting Chinese characters appearing in the historical multilingual alarm information, the occurrence frequency of each Chinese character, English words appearing and the occurrence frequency of each English word;
taking the Chinese characters whose occurrence frequency is greater than a preset frequency, the English words whose occurrence frequency is greater than the preset frequency, the 10 Arabic numerals, and the special characters UNK, BOS and EOS as word stock elements;
assigning codes to the word stock elements uniformly, numbering consecutively from 1, to obtain the code of each word stock element;
and combining the word stock elements and the codes of the word stock elements to obtain a word stock.
3. The multilingual alarm information category judgment method according to claim 2, wherein coding the multilingual alarm information to be classified into the coding sequence through the preset word stock specifically comprises:
replacing each Chinese character, English word and Arabic numeral in the multilingual alarm information to be classified that is contained in the word stock with the code of the corresponding word stock element;
replacing each Chinese character or English word in the multilingual alarm information to be classified that is not contained in the word stock with the UNK code of the word stock;
and taking the BOS code of the word stock as the beginning of the coding sequence, and cutting or padding the coding sequence to a fixed length with the EOS code of the word stock, to obtain the coding sequence.
4. The multilingual alarm information category judgment method according to claim 1, wherein inputting the coding sequence into the preset high-dimensional semantic feature extraction model to obtain the high-dimensional semantic feature vector specifically comprises:
acquiring the one-hot vector of each code in the coding sequence, multiplying the one-hot vectors of the codes in order by a preset word embedding matrix, and inputting the resulting vectors into the preset high-dimensional semantic feature extraction model; and taking the cell state of the high-dimensional semantic feature extraction model at the last time step as the high-dimensional semantic feature vector.
5. The multilingual alarm information category judgment method according to claim 1, wherein the high-dimensional semantic feature extraction model is constructed by:
acquiring historical multilingual alarm information;
coding the historical multilingual alarm information into historical coding sequences through the preset word stock;
and training a preset long short-term memory network model with the historical coding sequences to obtain the high-dimensional semantic feature extraction model.
6. The multilingual alarm information category judgment method according to claim 5, wherein the alarm type classification model is constructed by:
acquiring historical multilingual alarm information;
coding the historical multilingual alarm information into historical coding sequences through the preset word stock;
inputting the historical coding sequence into a preset high-dimensional semantic feature extraction model to obtain historical high-dimensional semantic feature vectors, and determining the alarm types of historical multi-language alarm information corresponding to the historical high-dimensional semantic feature vectors as labels of the historical high-dimensional semantic feature vectors;
and training a preset three-layer feedforward neural network model through each historical high-dimensional semantic feature vector and the label of each historical high-dimensional semantic feature vector to obtain an alarm type classification model.
7. The multilingual alarm information category judgment method according to claim 6, wherein, when the preset long short-term memory network model is trained with the historical coding sequences, the loss function of the long short-term memory network model is:
$$L(S) = -\sum_{t=1}^{N} \log P_t(S_t)$$
wherein L(S) is the loss function value of the long short-term memory network model for the historical coding sequence S; P_t(S_t) is the probability, in the probability vector P_t output by the long short-term memory network model, of the one-hot vector S_t corresponding to the t-th code s_t of the historical coding sequence S; and N is the number of codes in the historical coding sequence S. The long short-term memory network model is optimized by batch gradient descent with minimization of its loss function value as the optimization target;
when the preset three-layer feedforward neural network model is trained with the historical high-dimensional semantic feature vectors and their labels, the loss function of the three-layer feedforward neural network model is:
$$L(c) = -\log P(c)$$
wherein L(c) is the loss function of the three-layer feedforward neural network model, c is the true category to which the input high-dimensional semantic feature vector belongs, and P(c) is the probability of the true category c in the output probability vector P. The three-layer feedforward neural network model is optimized by batch gradient descent with minimization of its loss function as the optimization target.
8. A multilingual alarm information category judgment system, comprising:
the acquisition module is used for acquiring multi-language alarm information to be classified;
the coding module is used for coding the multilingual alarm information to be classified into a coding sequence through a preset word stock;
the feature extraction module is used for inputting the coding sequence into a preset high-dimensional semantic feature extraction model to obtain a high-dimensional semantic feature vector;
and the category judgment module is used for inputting the high-dimensional semantic feature vector into a preset alarm type classification model to obtain the alarm type of the multi-language alarm information to be classified.
9. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the multilingual alarm information category judgment method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multilingual alarm information category judgment method according to any one of claims 1 to 7.
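The two loss functions of claim 7 can be evaluated as shown below. This is a sketch assuming the sequence loss sums -log P_t(S_t) over the N codes of S (the original formula survives only as an image reference, so this form is an assumption), that each P_t is already aligned with the t-th code, and that the objective minimized by batch gradient descent is the mean per-sample loss over a batch. The gradient step itself is not shown, and the function names are hypothetical.

```python
import math


def sequence_loss(prob_vectors, codes):
    """L(S) = -sum_{t=1}^{N} log P_t(S_t) (assumed form).

    prob_vectors[t] is the model's output probability vector at step t; codes
    start at 1, so the probability of code s_t is read from index s_t - 1.
    """
    return -sum(math.log(p[s - 1]) for p, s in zip(prob_vectors, codes))


def classification_loss(probs, true_class):
    """L(c) = -log P(c); `true_class` is a 0-based index into the output vector P."""
    return -math.log(probs[true_class])


def batch_loss(per_sample_losses):
    """Mean loss over a batch, the quantity batch gradient descent would minimise (assumed)."""
    return sum(per_sample_losses) / len(per_sample_losses)


seq = sequence_loss([[0.5, 0.5], [0.25, 0.75]], codes=[1, 2])
cls = classification_loss([0.2, 0.8], true_class=1)
print(round(seq, 4), round(cls, 4))  # -> 0.9808 0.2231
```

Both losses are negative log-likelihoods: the sequence loss rewards the feature extraction model for assigning high probability to every code of its input, while the classification loss rewards the classifier for assigning high probability to the true alarm category.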
CN202111145028.0A 2021-09-28 2021-09-28 Multilingual alarm information category judgment method, system, equipment and storage medium Active CN113590767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111145028.0A CN113590767B (en) 2021-09-28 2021-09-28 Multilingual alarm information category judgment method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111145028.0A CN113590767B (en) 2021-09-28 2021-09-28 Multilingual alarm information category judgment method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113590767A true CN113590767A (en) 2021-11-02
CN113590767B CN113590767B (en) 2022-01-07

Family

ID=78242453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111145028.0A Active CN113590767B (en) 2021-09-28 2021-09-28 Multilingual alarm information category judgment method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113590767B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114338346A (en) * 2021-12-29 2022-04-12 中国工商银行股份有限公司 Alarm message processing method and device and electronic equipment

Citations (10)

Publication number Priority date Publication date Assignee Title
CN107273426A (en) * 2017-05-18 2017-10-20 四川新网银行股份有限公司 A kind of short text clustering method based on deep semantic route searching
US20180089152A1 (en) * 2016-09-02 2018-03-29 Digital Genius Limited Message text labelling
CN109101010A (en) * 2018-09-30 2018-12-28 深圳市元征科技股份有限公司 A kind of Diagnosis method of automobile faults and relevant device
CN109543764A (en) * 2018-11-28 2019-03-29 安徽省公共气象服务中心 A kind of warning information legitimacy detection method and detection system based on intelligent semantic perception
US20190340242A1 (en) * 2018-05-04 2019-11-07 Dell Products L.P. Linguistic semantic analysis monitoring/alert integration system
CN111859948A (en) * 2019-04-28 2020-10-30 北京嘀嘀无限科技发展有限公司 Language identification, language model training and character prediction method and device
CN112052824A (en) * 2020-09-18 2020-12-08 广州瀚信通信科技股份有限公司 Gas pipeline specific object target detection alarm method, device and system based on YOLOv3 algorithm and storage medium
CN112131390A (en) * 2020-11-24 2020-12-25 江苏电力信息技术有限公司 Electric power early warning information automatic classification method based on deep learning
WO2021042843A1 (en) * 2019-09-06 2021-03-11 平安科技(深圳)有限公司 Alert information decision method and apparatus, computer device and storage medium
CN112612898A (en) * 2021-03-05 2021-04-06 蚂蚁智信(杭州)信息技术有限公司 Text classification method and device

Non-Patent Citations (2)

Title
LI J: "Biderectional LSTM with Hierarchical Attention for Text Classification", 《2019 IEEE 4TH ADVANCED INFORMATION TECHNOLOGY,ELECTRONIC AND AUTOMATION CONTROL CONFERENCE》 *
孙丽娜: "基于深度神经网络的中文评论情感分析研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *

Also Published As

Publication number Publication date
CN113590767B (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN108491817B (en) Event detection model training method and device and event detection method
CN110580292B (en) Text label generation method, device and computer readable storage medium
US20210056266A1 (en) Sentence generation method, sentence generation apparatus, and smart device
CN108664538B (en) Automatic identification method and system for suspected familial defects of power transmission and transformation equipment
CN111738011A (en) Illegal text recognition method and device, storage medium and electronic device
CN114430363A (en) Fault reason positioning method, device, equipment and storage medium
CN109993216B (en) Text classification method and device based on K nearest neighbor KNN
CN113590767B (en) Multilingual alarm information category judgment method, system, equipment and storage medium
CN115758255B (en) Power consumption abnormal behavior analysis method and device under fusion model
CN114528845A (en) Abnormal log analysis method and device and electronic equipment
CN111524503B (en) Audio data processing method and device, audio recognition equipment and storage medium
CN110968689A (en) Training method of criminal name and law bar prediction model and criminal name and law bar prediction method
CN116956896A (en) Text analysis method, system, electronic equipment and medium based on artificial intelligence
CN113535906A (en) Text classification method and related device for hidden danger events in electric power field
CN112989058A (en) Information classification method, test question classification method, device, server and storage medium
CN111639494A (en) Case affair relation determining method and system
CN114997750B (en) Risk information pushing method, system, equipment and medium
CN115774784A (en) Text object identification method and device
CN112685548B (en) Question answering method, electronic device and storage device
CN114357996A (en) Time sequence text feature extraction method and device, electronic equipment and storage medium
CN114328927A (en) Gate control cyclic acquisition method based on label perception
CN115526176A (en) Text recognition method and device, electronic equipment and storage medium
CN117172248B (en) Text data labeling method, system and medium
CN108241749B (en) Method and apparatus for generating information from sensor data
CN111475644A (en) Intelligent operation and maintenance method and device for eliminating mass alarms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant