CN116541202B

CN116541202B - Scientific and technological risk management system and risk early warning device

Info

Publication number: CN116541202B
Application number: CN202310703440.2A
Authority: CN
Inventors: 王奕; 邱雪雄; 赵崇昌
Original assignee: Shenzhen Yishicheng Technology Co ltd
Current assignee: Shenzhen Yishicheng Technology Co ltd
Priority date: 2023-06-14
Filing date: 2023-06-14
Publication date: 2023-10-03
Anticipated expiration: 2043-06-14
Also published as: CN116541202A

Abstract

The application provides a scientific and technological risk management system and a risk early warning device. The scientific and technological risk management system executes a risk early warning method comprising the following steps: collecting target log data of a preset process of a target big data component within a preset time window during a scientific and technological operation activity; preprocessing the target log data to obtain log processing unit data; extracting features from the log processing unit data to obtain corresponding log feature matrices; inputting the log feature matrices into a log state discrimination model to obtain the state corresponding to each log processing unit data; and determining the risk level corresponding to the preset process of the target big data component according to those states. By building a whole-process, proactive technological risk monitoring and management system, and by managing through technological risk big data together with configuration and association relations, the system enables an organization to measure, grade and control technological risk with dynamic monitoring and early warning.

Description

Scientific and technological risk management system and risk early warning device

Technical Field

The application relates to the technical field of scientific and technological risk management, in particular to a scientific and technological risk management system, a risk early warning device, a risk early warning method and a storage medium.

Background

Information technology is developing rapidly. An organization's business operations depend on information technology and information systems, and on the business continuity of those systems. The information systems in a data center are numerous and have complex association relations, so information technology faults and risks keep emerging and are not easy to perceive. When an enterprise or organization carries out scientific and technological operation activities through a science and technology management system, the related business can be processed through components such as HDFS, YARN, DRUID and KAFKA. To ensure that the business is processed normally, the running state of the big data components used by the science and technology management system must be monitored, so as to learn the technological operation risk that the enterprise or organization faces.

A big data component has a plurality of processes, and each process prints its own logs, so the running state of a big data component can be risk-monitored through log analysis. However, in the related art, operation and maintenance personnel check the logs to locate the cause only after an abnormality or fault has occurred in a scientific and technological operation activity in the science and technology management system; an impending risk cannot be estimated and prevented in advance.

Disclosure of Invention

The embodiment of the application provides a scientific and technological risk management system and a risk early warning device, which can monitor the running state of a big data component used in scientific and technological operation activities.

In a first aspect, an embodiment of the present application provides a risk early warning method, including:

collecting target log data of a preset process of a target big data component in a preset time window, wherein the target log data comprises a plurality of logs;

dividing a plurality of logs in the target log data into a plurality of log processing unit data;

respectively extracting features of the plurality of log processing unit data to obtain a plurality of log feature matrixes corresponding to the plurality of log processing unit data;

respectively inputting the plurality of log feature matrixes into a log state discrimination model to obtain states respectively corresponding to the plurality of log processing unit data, wherein the states comprise an abnormal state and a stable state;

and determining the risk level corresponding to the preset process of the target big data component according to the states respectively corresponding to the plurality of log processing unit data.

In a second aspect, an embodiment of the present application provides a scientific and technological risk management system, including a technology management server, a big data component, and a technology risk management server communicatively coupled to each other.

In a third aspect, an embodiment of the present application further provides a risk early warning device, including:

an acquisition unit configured to collect target log data of a preset process of a target big data component in a preset time window, wherein the target log data comprises a plurality of logs;

a dividing unit configured to divide a plurality of logs in the target log data into a plurality of log processing unit data;

the feature extraction unit is used for respectively extracting features of the plurality of log processing unit data to obtain a plurality of log feature matrixes corresponding to the plurality of log processing unit data;

the log state judging unit is used for respectively inputting the plurality of log feature matrixes into a log state judging model to obtain states respectively corresponding to the plurality of log processing unit data, wherein the states comprise an abnormal state and a stable state;

and the determining unit is used for determining the risk level corresponding to the preset process of the target big data component according to the states respectively corresponding to the plurality of log processing unit data.

In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, which when run on a computer causes the computer to perform a risk early warning method as provided in any of the embodiments of the present application.

According to the technical scheme provided by the embodiment of the application, target log data of a preset process of the target big data component in a preset time window is collected, the target log data comprising a plurality of logs. The plurality of logs is divided into a plurality of log processing unit data, and features are extracted from each log processing unit data to obtain a corresponding plurality of log feature matrices. The log feature matrices are respectively input into a log state discrimination model to obtain the state of each log processing unit data, the states comprising an abnormal state and a stable state. The risk level corresponding to the preset process of the target big data component is then determined according to these states, so that the running state of the big data components used in the science and technology management system can be risk-monitored.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of an application scenario of a scientific and technological risk management system according to an embodiment of the present application.

Fig. 2 is a schematic flow chart of a risk early warning method according to an embodiment of the present application.

Fig. 3 is a schematic structural diagram of a log state discrimination model according to an embodiment of the present application.

Fig. 4 is a schematic structural diagram of a log coding module according to an embodiment of the present application.

Fig. 5 is a schematic view of a scenario of collecting a training sample set of log data according to an embodiment of the present application.

Fig. 6 is a schematic structural diagram of a risk early warning device according to an embodiment of the present application.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be within the scope of the present application based on the embodiments of the present application.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

The scientific and technological risk management system provided by the application comprises a technological risk information base, technological risk assessment, technological risk inspection, technological risk monitoring and early warning, and a technological risk cockpit. It implements dynamic monitoring of technological operations and builds a three-in-one collaborative operation system covering operation and maintenance, security and risk, improving the effectiveness and granularity of technological risk management. The main management means of the system are: centralized management, which collects technological operation data to achieve comprehensive, centralized online management of technological risk; active management, which uses the risk early warning method to collect monitoring and early warning risk and alarm information in real time and proactively discover hidden risks in the information system, enabling active intervention and management in advance; and dynamic management, which achieves dynamic monitoring, analysis and early warning of technological risk by configuring key technological risk monitoring indicators, consumption configuration and association relations.

The scientific and technological risk management system provided by the application is developed mainly to meet the needs of information technology risk management. It aims to help enterprises or organizations monitor, give early warning of, and manage information technology in scientific and technological operation activities, so that risk management is carried out scientifically and effectively and the various information technology risk challenges are met. The system also uses big data analysis technology to improve calculation efficiency and decision accuracy. In addition, to ensure information security and business continuity, the system supports backup, restore and fault tolerance mechanisms and complies with relevant regulations and standards. The application mainly describes the function by which the risk management system evaluates, through log analysis, the risk of the big data components in scientific and technological operation activities. This function gives the relevant managers a more complete view and assessment of the risk facing the science and technology management system, so that they can respond quickly when necessary and reduce potential damage.

The Transformer model is a deep learning model that uses the attention mechanism (Attention Mechanism) to increase model training speed. The Transformer model is composed of two parts, an Encoder and a Decoder, wherein the Encoder comprises a plurality of encoding modules (Encoder blocks).

In order to realize risk monitoring of the running state of the big data components used in a science and technology management system, the application introduces the coding module of the Transformer model into a log state discrimination model, and correspondingly provides a scientific and technological risk management system, a risk early warning device, a risk early warning method and a storage medium. The risk early warning method can be executed by the risk early warning device or by electronic equipment that integrates the risk early warning device. The risk early warning device can be implemented in hardware or software. The electronic device may be any device with a processor and processing capability, such as a mobile electronic device (a smart phone, a tablet computer, a palm computer or a notebook computer) or a stationary electronic device (a desktop computer, a server, or the like).

For example, referring to fig. 1, the present application further provides a scientific and technological risk management system. As shown in fig. 1, the system includes a technology management server, a big data component, and a technology risk management server. The technology management server processes technology-management-related services through the big data component, and the technology risk management server performs risk monitoring on the big data component used by the technology management server. The technology risk management server can collect target log data of a preset process of a target big data component used in a scientific and technological operation activity within a preset time window, the target log data comprising a plurality of logs; divide the plurality of logs into a plurality of log processing unit data; extract features from each log processing unit data to obtain a corresponding plurality of log feature matrices; input the log feature matrices respectively into a log state discrimination model to obtain the state of each log processing unit data, the states comprising an abnormal state and a stable state; and determine the risk level corresponding to the preset process of the target big data component according to these states.

It should be noted that, the schematic view of the scenario of the technological risk management system shown in fig. 1 is only an example, and the technological risk management system and scenario described in the embodiment of the present application are for more clearly describing the technical solution of the embodiment of the present application, and do not constitute a limitation on the technical solution provided by the embodiment of the present application, and those skilled in the art can know that, with the evolution of the technological risk management system and the appearance of a new business scenario, the technical solution provided by the embodiment of the present application is equally applicable to similar technical problems.

Referring to fig. 2, fig. 2 is a flow chart of a risk early warning method according to an embodiment of the application. The specific flow of the risk early warning method provided by the embodiment of the application can be as follows:

S110, collecting target log data of a preset process of a target big data component in a preset time window, wherein the target log data comprises a plurality of logs.

The big data component can be HDFS, YARN, DRUID, KAFKA or a similar component. Each component has a plurality of processes, and each process prints its own logs. In a real environment, certain precursors usually appear before an abnormality occurs: although the logs report no obvious error, they already imply a certain risk. To capture and evaluate such forward-looking risks, the application provides a risk early warning method that addresses the above technical problem through log analysis.

The operation and maintenance personnel can select the big data component as the target big data component according to the needs and select the preset process included in the target big data component according to the needs.

The preset time window refers to a certain period of time, for example, the preset time window may be a time window of 10 minutes, and log data generated by a certain process of a certain big data component within the time window of 10 minutes is collected as target log data.
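The window selection described above can be sketched as follows. This is a minimal illustration only: the `LogRecord` layout, its field names and the `collect_target_logs` helper are assumptions introduced here, since the text does not define how a log record is stored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogRecord:
    # Hypothetical record layout; the source does not define one.
    timestamp: datetime
    component: str  # e.g. "KAFKA"
    process: str    # e.g. "KafkaController"
    level: str      # e.g. "INFO"
    message: str


def collect_target_logs(records, component, process, window_end,
                        window=timedelta(minutes=10)):
    """Return the logs of one process of one component inside the time window."""
    window_start = window_end - window
    return [
        r for r in records
        if r.component == component
        and r.process == process
        and window_start <= r.timestamp < window_end
    ]
```

The window is treated as half-open (start included, end excluded) so that consecutive windows do not count the same log twice; that choice is also an assumption.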

In the embodiment of the application, the logs generated by each big data component on the big data service cluster can be collected by a log collection tool. For example, the log collection tool can be Flume, which can write the collected logs into Hive and Kafka respectively: logs written into Hive serve as offline data for training the log state discrimination model, and logs written into Kafka serve as real-time data for real-time risk analysis.

Flume is a distributed, reliable and highly available service for efficiently collecting, aggregating and moving large amounts of log data. Hive is a data warehouse tool based on Hadoop, which can map structured data files into database tables and provide SQL-like query functions. Kafka is a distributed, publish/subscribe based messaging system.

It can be appreciated that the present application can use Flume to collect the target log data of the preset process of the target big data component in the preset time window.

S120, dividing a plurality of logs in the target log data into a plurality of log processing unit data.

In this embodiment, after the target log data is collected, it needs to be analyzed. A log state discrimination model is constructed based on the coding module of the Transformer model and is used to analyze the log data. So that the log state discrimination model can be applied, the application provides a method for preprocessing the log data, which unifies the log data input into the log state discrimination model. Specifically, the data preprocessing method provided by the present application comprises process S120 and process S130, described below.

The log processing unit data refers to log data with unified attribute, and can be used as unit data of a log state discrimination model input data feature, for example, the unified attribute can be the same word number.

In this embodiment, in order to unify the data input to the log state discrimination model, a plurality of logs included in the target log data may be first divided so that each log processing unit data obtained includes a fixed number of words.

In this embodiment, the number of words in the log processing unit data may be set to 128, and each word may be treated as one token. Token is a term from the field of natural language processing (NLP, Natural Language Processing): it is the minimum unit of input to the algorithm, and depending on the task a token can be a phrase, a sentence or a word.

In some embodiments, the process S120 "dividing the plurality of logs in the target log data into a plurality of log processing unit data" may include the following processes:

S1210, sorting a plurality of logs in the target log data according to time to obtain sorted target log data;

For example, in this embodiment, suppose that 10 logs are collected from a certain process of a certain component within a 10-minute time window, and that these 10 logs are the target log data. To illustrate the scheme provided by the application, the 10 logs are numbered 0 to 9 and then sorted in time order, so that the sorted target log data is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

S1220, combining the ordered target log data according to each preset number of logs in continuous time sequence to obtain a plurality of log combinations;

wherein the preset number can be set by the person skilled in the art according to the need.

It should be noted that, when natural language processing is performed with a Transformer, only the weights between tokens within a sentence or a whole passage need to be considered. The processing object in the present application, however, is logs, and consecutive logs generally have a causal relationship. So that the log state discrimination model can capture this relationship information, several logs are bound together in continuous time order and analyzed together, thereby preserving the contextual relationship information.

For example, in this embodiment, the preset number may be 3, and every 3 logs that are consecutive in time are combined, so that a plurality of log combinations is obtained: 012, 123, 234, 345, 456, 567, 678, 789.

For example, 4 consecutive time-sequential logs in the target log data may be as shown in table 1 below:

TABLE 1

In log combining, the first 3 log descriptions may be combined, and then the last 3 log descriptions may be combined.

It can be understood that, in the embodiment of the present application, the log is combined by combining log descriptions corresponding to the log.
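The sorting of S1210 and the combining of S1220 amount to a sliding window of fixed width over the time-ordered logs. A minimal sketch, in which the function name `combine_logs` and its parameters are illustrative rather than taken from the source:

```python
def combine_logs(logs, preset_number=3, key=None):
    """Sort logs by time, then group every `preset_number` consecutive logs."""
    ordered = sorted(logs, key=key)
    return [
        ordered[i:i + preset_number]
        for i in range(len(ordered) - preset_number + 1)
    ]


# Ten logs labelled 0-9, as in the example; here the label doubles as the timestamp.
combos = combine_logs(list(range(10)), preset_number=3)
print(["".join(map(str, c)) for c in combos])
# ['012', '123', '234', '345', '456', '567', '678', '789']
```

For real records, `key` would extract the timestamp, for example `key=lambda r: r.timestamp`. With fewer logs than the preset number the sketch returns no combination; how the method treats that case is not stated in the text.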

S1230, performing splicing processing on the logs in each log combination in the log combinations to obtain a plurality of log processing unit data.

In this embodiment, the log descriptions of the three logs in each of the log combinations 012, 123, 234, 345, 456, 567, 678, 789 are respectively spliced, so as to obtain log processing unit data corresponding to each log combination. The specific splicing process is described in detail below.

In some embodiments, the process S1230 "performs the splicing process on the logs in each log combination of the plurality of log combinations to obtain a plurality of log processing unit data" may include the following processes:

S12310, adding a start symbol, separators and an end symbol to each log combination and combining them to obtain a plurality of spliced logs, wherein the start symbol marks the start position of the logs in a spliced log, the separator separates the different logs in a spliced log, and the end symbol marks the end position of the logs in a spliced log;

S12320, acquiring the word number of each spliced log in the spliced logs;

S12330, if the number of words in a spliced log is larger than a preset threshold, deleting part of the log descriptions of the logs in the spliced log so that the number of words in the spliced log equals the preset threshold;

In this embodiment, the number of words in the log processing unit data may be set to 128, that is, the preset threshold here is 128.

For example, if the number of words in the spliced log is greater than 128, the first half content of each log in the three logs included in the spliced log may be deleted, and the second half content of each log may be kept as much as possible, so that the number of words in the spliced log is 128.

It should be noted that, because the content of the subsequent part of the log description corresponding to the log is important, when the log description of the log in the spliced log is partially deleted, the subsequent part of the content of each log is saved as much as possible, and the previous part of the content is removed.

S12340, if the number of words in the spliced log is smaller than the preset threshold, filling a filler in the spliced log to enable the number of words in the spliced log to be the preset threshold, wherein the filler is used for serving as words to supplement the number of words in the spliced log;

For example, if the number of words in the spliced log is smaller than 128, a filler is added to the spliced log so that the number of words in the spliced log is 128.

In this embodiment, SOP may be set as a start symbol, SEP as a separator, EOP as an end symbol, and PAD as a filler. As an example, the first three logs described in the above table are subjected to splicing processing, and one obtained spliced log is as follows:

[SOP] Starting become controller state transition kafka controller KafkaController
[SEP] Incremented epoch to 1 kafka controller KafkaController
[SEP] Registering IsrChangeNotificationListener kafka controller KafkaController
[EOP] [PAD] [PAD] [PAD]

S12350, using each spliced log as one log processing unit data.

In the embodiment of the application, each spliced log is used as one log processing unit data, and a plurality of log processing unit data are obtained.
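Processes S12310 to S12350 can be sketched as a single function. The symbols SOP, SEP, EOP and PAD and the threshold of 128 come from the text. The truncation policy below (repeatedly dropping the first word of the currently longest log) is an assumption: the text only says that the earlier part of each log is removed and the later part kept as far as possible. The name `splice_logs` is illustrative.

```python
SOP, SEP, EOP, PAD = "[SOP]", "[SEP]", "[EOP]", "[PAD]"


def splice_logs(descriptions, max_words=128):
    """Join log descriptions into one token list of exactly `max_words` tokens."""
    words = [d.split() for d in descriptions]
    # One start symbol, one end symbol, and a separator between adjacent logs.
    n_special = 2 + max(len(words) - 1, 0)
    budget = max_words - n_special
    if budget < 0:
        raise ValueError("max_words is too small for the special symbols")

    # Assumed truncation policy: while over budget, drop the first word of the
    # currently longest log, so that the tail of every log is preserved.
    while sum(len(w) for w in words) > budget:
        longest = max(words, key=len)
        del longest[0]

    tokens = [SOP]
    for i, w in enumerate(words):
        if i > 0:
            tokens.append(SEP)
        tokens.extend(w)
    tokens.append(EOP)
    # Fillers are appended after the end symbol, as in the example above.
    tokens.extend([PAD] * (max_words - len(tokens)))
    return tokens
```

Applied to the three Kafka log descriptions of the example, the function returns 128 tokens: 20 words, 4 special symbols and 104 fillers.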

S130, respectively extracting features of the plurality of log processing unit data to obtain a plurality of log feature matrices corresponding to the plurality of log processing unit data.

It should be noted that the present application constructs the log state discrimination model based on the coding module of the Transformer model. Because the input of the Transformer model is a vector matrix, the log data input into the log state discrimination model must also be converted into vector matrix form. In the embodiment of the present application, therefore, features are extracted from the plurality of log processing unit data, and each log processing unit data is converted into a log feature matrix that is input into the log state discrimination model for log state discrimination.

In some embodiments, the process S130 "respectively performs feature extraction on the plurality of log processing unit data to obtain a plurality of log feature matrices corresponding to the plurality of log processing unit data" may include the following processes:

S1310, digitizing the words in each of the plurality of log processing unit data according to a preset comparison dictionary to obtain a word feature vector corresponding to each log processing unit data;

The preset comparison dictionary is a mapping dictionary for converting words into numbers: each word corresponds to an ID, and conversion simply replaces the word with its ID. The preset comparison dictionary may be set by a person skilled in the art.

For example, the mapping between some of the words in the log processing unit data of the previous example and their numbers in the preset comparison dictionary may be as shown in Table 2 below:

TABLE 2

In the embodiment of the application, the word in each log processing unit data is digitally processed according to a preset comparison dictionary, namely, the word is converted into a digital form, so as to obtain the word feature vector with the shape of 128 x 1 corresponding to each log processing unit data.
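The word-to-ID replacement of S1310 can be sketched as a dictionary lookup. The dictionary contents and IDs below are hypothetical and are not the values of Table 2; the `[UNK]` fallback for words missing from the dictionary is likewise an assumption, since the text does not say how such words are handled.

```python
# Hypothetical comparison dictionary; real IDs are chosen by the implementer.
VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[SOP]": 2, "[SEP]": 3, "[EOP]": 4,
    "kafka": 5, "controller": 6, "KafkaController": 7,
}


def words_to_ids(tokens, vocab=VOCAB):
    """Replace each word with its ID; unseen words fall back to [UNK] (assumed)."""
    unk = vocab["[UNK]"]
    return [vocab.get(t, unk) for t in tokens]
```

Applied to a spliced log of 128 tokens, this yields the 128 x 1 word feature vector described above.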

S1320, performing dimension conversion processing on word feature vectors corresponding to each log processing unit data to obtain a word feature matrix corresponding to each log processing unit data;

In the embodiment of the application, the word feature vector corresponding to each log processing unit data is raised in dimension to obtain a word feature matrix of shape 128 x 128 for each log processing unit data, and the word feature matrix reflects the word coding information (Word Embeddings).
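The text does not state how the dimension is raised. One common way, shown here purely as an assumption, is an embedding lookup: each ID selects one 128-dimensional row from a table whose values would normally be learned during training. The random table below is only a stand-in for such learned weights, and both function names are illustrative.

```python
import random


def make_embedding_table(vocab_size, dim=128, seed=0):
    """Build a vocab_size x dim table of small random values (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids, table):
    """Map a length-128 ID vector to a 128 x dim word feature matrix by row lookup."""
    return [list(table[i]) for i in ids]
```

With 128 IDs and `dim=128`, `embed` returns a 128 x 128 matrix in which identical words share identical rows.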

S1330, generating a corresponding position feature matrix according to the positions of different words in the logs included in each log processing unit data;

It should be noted that the attention mechanism in the coding module of the Transformer model, unlike a conventional recurrent neural network, carries no position information, so position coding information must be added separately. The present application therefore also adds position coding information (Position Embeddings) reflecting the positions of words in the logs. Specifically, a position feature matrix may be generated according to the positions of the different words in the logs included in each log processing unit data, and the position feature matrix reflects the position coding information.

Specifically, when generating the position feature matrix, the position coding information (Position Embeddings) is denoted PE, and the dimension of PE is identical to that of the word coding information (Word Embeddings). PE is obtained with the same calculation formula as in the Transformer:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right)$$

where pos represents the position of the word in the sentence, d represents the dimension of PE (the same as the Word Embeddings), 2i represents an even dimension, and 2i+1 represents an odd dimension (i.e., 2i ≤ d, 2i+1 ≤ d).
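Since the text states that PE is computed as in the Transformer, the position feature matrix can be sketched with the standard sinusoidal encoding: sine on even dimensions and cosine on odd dimensions, with wavelengths scaled by 10000^(2i/d). The function name and default sizes are illustrative; 128 x 128 matches the shape of the word feature matrix.

```python
import math


def positional_encoding(seq_len=128, d=128):
    """Sinusoidal position feature matrix of shape seq_len x d."""
    pe = [[0.0] * d for _ in range(seq_len)]
    for pos in range(seq_len):
        # `even` plays the role of 2i in the formula.
        for even in range(0, d, 2):
            angle = pos / (10000 ** (even / d))
            pe[pos][even] = math.sin(angle)
            if even + 1 < d:
                pe[pos][even + 1] = math.cos(angle)
    return pe
```

Every value lies in [-1, 1], and each row depends only on the token position, so the matrix can be computed once and reused for every log processing unit data.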

S1340, generating an additional feature matrix according to clause codes, log types and process types corresponding to logs included in each log processing unit data, wherein the clause codes are used for marking different log strips in the log processing unit data;

It should be noted that, because the processing object in the present application is logs, which carry more information than ordinary natural language, the embodiment of the present application also adds clause coding information (Segment Embeddings), log type coding information (LogType Embeddings) and process type coding information (Type Embeddings) to retain that information. Specifically, an additional feature matrix may be generated according to the clause codes, log types and process types of the logs included in each log processing unit data, and the additional feature matrix reflects the clause coding information, the log type coding information and the process type coding information.

The clause coding information marks the different logs in the log processing unit data; for example, when the log processing unit data comprises 3 logs, the clause coding information distinguishes the 3 logs. The log type coding information marks the log type of each log in the log processing unit data; for example, the log types may include INFO, ERROR, WARN and DEBUG. The process type coding information marks the process of the source component that produced each log in the log processing unit data; for example, HDFS includes several different types of processes, such as NameNode, DataNode and Secondary NameNode.

In some embodiments, to keep the clause coding information, the log type coding information and the process type coding information from interfering with one another, each of them may be represented using one-hot encoding (One-Hot Encoding), generating an additional feature matrix that includes the clause coding information, the log type coding information and the process type coding information; the additional feature matrix has a shape of 128×128. For example, in the additional feature matrix, columns 0 to 85 are filled with 0, columns 86 to 88 carry the one-hot encoding of the positional order of the three logs (i.e., the three logs in the corresponding log processing unit data), columns 89 to 92 carry the one-hot encoding of the aforementioned 4 log types, and columns 93 to 127 carry the one-hot encoding of the aforementioned 34 processes.
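The column layout in the example above can be sketched in Python as follows. The sketch assumes that each of the 128 rows corresponds to one word of the log processing unit data and that the row is tagged with the log it belongs to; the constant names, the function name, the order of the log types within columns 89 to 92 and the numbering of processes within columns 93 to 127 are all assumptions made for illustration.

```python
LOG_TYPES = ["INFO", "ERROR", "WARN", "DEBUG"]  # column order is an assumption
SEGMENT_OFFSET = 86    # columns 86-88: which of the three logs the word is in
LOG_TYPE_OFFSET = 89   # columns 89-92: log type of that log
PROCESS_OFFSET = 93    # columns 93-127: process type of that log
SIZE = 128


def additional_feature_matrix(word_tags):
    """word_tags holds one (log_index, log_type, process_index) tuple per word.

    Rows without a tag (for example padding rows) stay all-zero.
    """
    if len(word_tags) > SIZE:
        raise ValueError("more words than matrix rows")
    matrix = [[0.0] * SIZE for _ in range(SIZE)]
    for row, (log_index, log_type, process_index) in enumerate(word_tags):
        if not 0 <= log_index < LOG_TYPE_OFFSET - SEGMENT_OFFSET:
            raise ValueError("log_index out of range")
        if not 0 <= process_index < SIZE - PROCESS_OFFSET:
            raise ValueError("process_index out of range")
        matrix[row][SEGMENT_OFFSET + log_index] = 1.0
        matrix[row][LOG_TYPE_OFFSET + LOG_TYPES.index(log_type)] = 1.0
        matrix[row][PROCESS_OFFSET + process_index] = 1.0
    return matrix
```

Because the three one-hot groups occupy disjoint column ranges, setting one group never overwrites another, which is the point of using one-hot encoding here.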

In some embodiments, log analysis may be performed only on log data whose log type is INFO, ERROR, WARN or DEBUG. Because logs of these types are more closely associated with the occurrence of abnormal conditions in a big data component, analyzing only log data of these types allows the present application to reduce the amount of data processed and improve the efficiency of risk analysis.

S1350, obtaining the plurality of log feature matrices corresponding to the plurality of log processing unit data according to the word feature matrix, the position feature matrix and the additional feature matrix corresponding to each log processing unit data.

In the embodiment of the application, the word feature matrix corresponding to each log processing unit data is added element-wise to the aligned position feature matrix, and the sum is normalized so that all values are mapped into the interval [-1, 1], giving a first intermediate processing matrix for each log processing unit data. The first intermediate processing matrix is then added element-wise to the aligned additional feature matrix corresponding to the same log processing unit data, giving the plurality of log feature matrices. The log feature matrix is the minimum basic unit of input to the log state discrimination model.
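The combination step can be sketched in Python as follows. The text does not state which normalization is used, so the sketch assumes scaling by the largest absolute value, which maps every entry into [-1, 1]; the function names are likewise hypothetical.

```python
def add_matrices(a, b):
    """Element-wise sum of two matrices of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale_to_unit_range(matrix):
    """Map all values into [-1, 1] by dividing by the largest magnitude.

    This particular normalization is an assumption of the sketch.
    """
    peak = max(abs(v) for row in matrix for v in row)
    if peak == 0:
        return [list(row) for row in matrix]
    return [[v / peak for v in row] for row in matrix]


def log_feature_matrix(word, position, additional):
    """word + position, normalized, then + additional feature matrix."""
    first_intermediate = scale_to_unit_range(add_matrices(word, position))
    return add_matrices(first_intermediate, additional)
```

Normalizing before the additional feature matrix is added keeps the one-hot entries on the same scale as the word and position information.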

S140, respectively inputting the plurality of log feature matrixes into a log state discrimination model to obtain states respectively corresponding to the plurality of log processing unit data, wherein the states comprise abnormal states and stable states.

The log state discrimination model is configured to perform state discrimination processing on the log feature matrix to obtain the state of log processing unit data corresponding to the log feature matrix.

In the embodiment of the application, each log feature matrix in a plurality of log feature matrices is input into a log state discrimination model for state discrimination processing to obtain a state corresponding to each log feature matrix, namely, a state of log processing unit data corresponding to each log feature matrix.

The abnormal state means that the big data component is blocked or fails when processing a service, so that the service cannot be processed normally; the stable state means that the big data component is neither blocked nor failing when processing a service, and the service is processed normally.

In some embodiments, as shown in fig. 3, the log state discrimination model may include a log coding module, a first fully-connected layer, a second fully-connected layer, a third fully-connected layer, and a fourth fully-connected layer that are sequentially connected, where the log coding module includes six sequentially connected coding modules of a Transformer model, and the process S140 "inputs the multiple log feature matrices into the log state discrimination model respectively to obtain states corresponding to the multiple log processing unit data respectively" may include the following processes:

S1410, respectively inputting a plurality of log feature matrixes into the log coding module to obtain a plurality of first log feature data;

S1420, respectively inputting the plurality of first log feature data into the first fully-connected layer to obtain a plurality of second log feature data;

S1430, respectively inputting the plurality of second log feature data into the second fully-connected layer to obtain a plurality of third log feature data;

S1440, respectively inputting the plurality of third log feature data into the third fully-connected layer to obtain a plurality of fourth log feature data;

S1450, respectively inputting the plurality of fourth log feature data into the fourth fully-connected layer to obtain states respectively corresponding to the plurality of log processing unit data.

In this embodiment, a batch size may be defined, where batch denotes how many log feature matrices are input to the log state discrimination model at a time; in this embodiment, batch is 1. The input of the log coding module has shape [batch, 128, 128], and its output has shape [batch, 128, 128]. The input of the first fully-connected layer has shape [batch, 128×128] and its output has shape [batch, 128]; the first fully-connected layer may use the ReLU function as its activation function. The input of the second fully-connected layer has shape [batch, 128] and its output has shape [batch, 64]; the second fully-connected layer may use the ReLU function as its activation function. The input of the third fully-connected layer has shape [batch, 64] and its output has shape [batch, 12]; the third fully-connected layer may use the ReLU function as its activation function. The input of the fourth fully-connected layer has shape [batch, 12] and its output has shape [batch, 1]; the fourth fully-connected layer may use the Sigmoid function as its activation function, so that it outputs a number between 0 and 1. The state is judged to be abnormal when this number is close to 0 and stable when it is close to 1. The specific abnormal reference value is set according to the actual situation; for example, if the abnormal reference value is set to 0.3, the state is judged to be abnormal when the number output by the model is smaller than 0.3.
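The shapes and the decision rule described above can be checked with a small Python sketch. It does not implement the model itself; it only traces tensor shapes through the stages and applies the abnormal reference value to a model output. The flatten step between the log coding module and the first fully-connected layer is implied by the shapes in the text rather than stated, and all names in the sketch are hypothetical.

```python
# (name, input features, output features, activation) as described in the text
HEAD_LAYERS = [
    ("fc1", 128 * 128, 128, "relu"),
    ("fc2", 128, 64, "relu"),
    ("fc3", 64, 12, "relu"),
    ("fc4", 12, 1, "sigmoid"),
]


def trace_shapes(batch=1):
    """Return the shape after each stage, checking that the layers chain."""
    shapes = [("input", (batch, 128, 128)), ("log_encoder", (batch, 128, 128))]
    features = 128 * 128  # encoder output flattened before fc1 (implied step)
    shapes.append(("flatten", (batch, features)))
    for name, n_in, n_out, _activation in HEAD_LAYERS:
        if n_in != features:
            raise ValueError(f"{name} expects {n_in} features, got {features}")
        features = n_out
        shapes.append((name, (batch, features)))
    return shapes


def classify(score, abnormal_reference=0.3):
    """Map a Sigmoid output in [0, 1] to a state using the reference value."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "abnormal" if score < abnormal_reference else "stable"
```

With the example reference value of 0.3, an output of exactly 0.3 is treated as stable, because the text judges the state abnormal only when the output is smaller than the reference value.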

In some embodiments, as shown in fig. 4, the log coding module may include six sequentially connected coding modules of a Transformer model: a first coding module, a second coding module, a third coding module, a fourth coding module, a fifth coding module and a sixth coding module. The six coding modules are identical in structure, and each coding module specifically comprises a Multi-Head Attention layer, an Add & Norm layer, a Feed Forward layer and a further Add & Norm layer.

And S150, determining the risk level corresponding to the preset process of the target big data component according to the states respectively corresponding to the plurality of log processing unit data.

In the embodiment of the application, the risk level corresponding to the preset process of the target big data component can be determined from the plurality of states obtained by inputting the log feature matrices of the plurality of log processing unit data into the log state discrimination model.

In some embodiments, the process S150 "determining the risk level corresponding to the preset process of the target big data component according to the states corresponding to the plurality of log processing unit data respectively" may include the following processes:

S1510, obtaining the proportion of abnormal states among the states respectively corresponding to the plurality of log processing unit data, and taking this proportion as the log abnormality rate;

In the embodiment of the application, the proportion of abnormal states among the states corresponding to the log processing unit data can be obtained, and this proportion is used as the log abnormality rate.

For example, after certain target log data is subjected to data preprocessing, 10 log processing unit data are obtained, of which 6 are in the abnormal state, so the log abnormality rate is 60%.

S1520, determining a risk level corresponding to a preset process of the target big data component according to the log abnormality rate.

In the embodiment of the application, after the log abnormality rate is obtained, the risk level corresponding to the preset process of the target big data component can be determined according to the log abnormality rate.

For example, the risk levels may comprise four levels: no risk, low risk, medium risk and high risk. A log abnormality rate lower than 10% is determined as no risk; a log abnormality rate higher than 10% and lower than 33% is determined as low risk; a log abnormality rate higher than 33% and lower than 80% is determined as medium risk; and a log abnormality rate higher than 80% and lower than 100% is determined as high risk. The reference values of the log abnormality rate for the different risk levels may be set according to the performance in the actual scenario, and are not specifically limited herein.
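A minimal Python sketch of S1510 and S1520 with the example reference values follows. The text leaves open how a rate exactly equal to a reference value (10%, 33%, 80%) or equal to 100% is classified; the sketch assumes each lower bound is inclusive and treats 100% as high risk. The function names and state labels are hypothetical.

```python
def log_abnormality_rate(states):
    """Proportion of 'abnormal' entries among the per-unit states."""
    if not states:
        raise ValueError("at least one state is required")
    return sum(1 for state in states if state == "abnormal") / len(states)


def risk_level(rate):
    """Map a log abnormality rate in [0, 1] to one of four risk levels.

    Boundary handling (inclusive lower bounds) is an assumption.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if rate < 0.10:
        return "no risk"
    if rate < 0.33:
        return "low risk"
    if rate < 0.80:
        return "medium risk"
    return "high risk"
```

For the example given earlier, 6 abnormal states out of 10 give a rate of 0.6, which this mapping places in the medium risk level.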

In particular, the application is not limited by the order of execution of the steps described, as some of the steps may be performed in other orders or concurrently without conflict.

As can be seen from the foregoing, in the risk early warning method provided by the embodiment of the present application, the collected target log data of the target big data component in the preset process within the preset time window is subjected to data preprocessing, so as to obtain a plurality of log feature matrices, and the plurality of log feature matrices are respectively input into the log state discrimination model, so as to obtain states respectively corresponding to the plurality of log processing unit data, where the states include an abnormal state and a stable state, and the risk level corresponding to the preset process of the target big data component is determined according to the states respectively corresponding to the plurality of log processing unit data, so that the risk monitoring can be performed on the running state of the big data component used in the technology management system.

In some embodiments, the risk early warning method provided by the application further includes the following steps:

collecting log data in a stable state as a positive example log data sample;

collecting log data in an abnormal state as a negative example log data sample;

taking the positive example log data sample and the negative example log data sample as a log data training sample set for training the log state discrimination model.

Referring to fig. 5, fig. 5 is a schematic diagram of a scenario for collecting the log data training sample set provided by an embodiment of the present application. In the embodiment of the present application, when a fault occurs, whether it is discovered manually or through anomaly monitoring, a log collection event is triggered. One window time is traced back from the time point at which the anomaly occurred and is used as the abnormal state log collection window, and the abnormal state logs are collected. After this collection is completed, a further window time is traced back through the historical data and is used as the safety window time interval; beyond the safety window time interval, another window time is traced back and is used as the stable state log collection window, and the stable state log data is collected.

The safety window time interval is set to separate the abnormal state log collection region from the stable state log collection region, so that a possible causal relationship between the logs in the two regions is avoided. The window length of the window time is set manually as needed; for example, the window length of the window time may be 20 minutes.
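The three windows can be computed from the anomaly time point as in the following Python sketch, which uses the 20-minute example window length and assumes, as the description of fig. 5 suggests, that the safety window time interval is also one window time long. The function name and dictionary keys are hypothetical.

```python
from datetime import datetime, timedelta


def collection_windows(anomaly_time, window=timedelta(minutes=20)):
    """Return (start, end) pairs for the three consecutive look-back windows."""
    abnormal = (anomaly_time - window, anomaly_time)
    safety = (abnormal[0] - window, abnormal[0])  # gap, nothing is collected here
    stable = (safety[0] - window, safety[0])
    return {"abnormal": abnormal, "safety": safety, "stable": stable}
```

For an anomaly at 12:00 with a 20-minute window, the abnormal state logs come from 11:40 to 12:00, the interval from 11:20 to 11:40 is skipped, and the stable state logs come from 11:00 to 11:20.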

The collected log data of the abnormal state and the log data of the stable state are respectively stored, the log data of the abnormal state and the log data of the stable state collected in the same batch are endowed with the same batch ID so as to be convenient for inquiring and training, the log data of the stable state is taken as a positive example log data sample, and the log data of the abnormal state is taken as a negative example log data sample.

In the embodiment of the application, this log data collection mode keeps the ratio of positive examples to negative examples in the log data training sample set close to 1:1, which prevents the log state discrimination model from becoming biased toward one class during training.

When the log state discrimination model is trained with the log data training sample set, the log data samples are subjected to data preprocessing to generate the log feature matrices corresponding to the log data samples, so that the log feature matrices meet the input requirement of the log state discrimination model. For the method of generating the corresponding log feature matrix from the log data, refer to the method described above for processing the target log data to generate the log feature matrix corresponding to the target log data.

In an embodiment, a risk early warning device is also provided. Referring to fig. 6, fig. 6 is a schematic structural diagram of a risk early warning device 200 according to an embodiment of the application. The risk early warning device 200 includes a collecting unit 201, a dividing unit 202, a feature extraction unit 203, a log state discrimination unit 204, and a determining unit 205, as follows:

the collecting unit 201 is configured to collect target log data of a preset process of the target big data component within a preset time window, where the target log data includes a plurality of logs;

a dividing unit 202 for dividing a plurality of logs in the target log data into a plurality of log processing unit data;

a feature extraction unit 203, configured to perform feature extraction on the plurality of log processing unit data, to obtain a plurality of log feature matrices corresponding to the plurality of log processing unit data;

a log state discrimination unit 204, configured to input the plurality of log feature matrices into a log state discrimination model, respectively, to obtain states corresponding to the plurality of log processing unit data, where the states include an abnormal state and a stable state;

and the determining unit 205 is configured to determine a risk level corresponding to a preset process of the target big data component according to the states respectively corresponding to the plurality of log processing unit data.

Optionally, in some embodiments, the dividing unit 202 is configured to:

sorting the plurality of logs in the target log data by time to obtain sorted target log data;

combining every preset number of chronologically consecutive logs in the sorted target log data to obtain a plurality of log combinations;

and performing splicing processing on the logs in each of the plurality of log combinations to obtain a plurality of log processing unit data.
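The sorting and grouping performed by the dividing unit 202 can be sketched in Python as follows. The sketch assumes non-overlapping groups, a preset number of 3 (taken from the three-log example used for the additional feature matrix) and that a shorter trailing group is kept; the function name and the (timestamp, text) representation of a log are hypothetical.

```python
def group_logs(logs, group_size=3):
    """Sort (timestamp, text) logs by time and cut them into consecutive groups.

    A trailing group shorter than group_size is kept (an assumption).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    ordered = sorted(logs, key=lambda entry: entry[0])
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]
```

Each returned group corresponds to one log combination, which is then spliced into one log processing unit data.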

Optionally, in some embodiments, the dividing unit 202 is configured to:

adding a start symbol, a separator and an end symbol to each log combination and merging to obtain a plurality of spliced logs, wherein the start symbol is used for marking the start position of the logs in the spliced log, the separator is used for separating the different logs in the spliced log, and the end symbol is used for marking the end position of the logs in the spliced log;

acquiring the word number of each spliced log in the plurality of spliced logs;

if the word number of the spliced log is larger than a preset threshold, partially deleting the log description of a log in the spliced log, so that the word number of the spliced log equals the preset threshold;

if the word number of the spliced log is smaller than the preset threshold, filling the spliced log with a filler so that the word number of the spliced log equals the preset threshold, wherein the filler serves as a word to make up the word number of the spliced log;

each spliced log is used as log processing unit data.
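The splicing, truncation and padding steps can be sketched in Python as follows. The symbol strings, the preset threshold of 128 words (chosen to match the 128×128 matrices) and the truncation policy are assumptions: the text only says that part of a log description is deleted, so the sketch removes words one at a time from the end of whichever log is currently longest.

```python
START, SEP, END, PAD = "[CLS]", "[SEP]", "[END]", "[PAD]"  # hypothetical symbols


def splice_logs(log_group, max_words=128):
    """Splice a group of logs (each a list of words) into exactly max_words tokens."""
    logs = [list(words) for words in log_group]
    if not logs:
        raise ValueError("log_group must contain at least one log")
    overhead = 2 + (len(logs) - 1)  # start symbol, end symbol, separators
    if overhead > max_words:
        raise ValueError("max_words is too small for the marker symbols")
    # Too long: drop one word at a time from the end of the longest log.
    while sum(len(words) for words in logs) + overhead > max_words:
        max(logs, key=len).pop()
    tokens = [START]
    for index, words in enumerate(logs):
        if index > 0:
            tokens.append(SEP)
        tokens.extend(words)
    tokens.append(END)
    # Too short: append fillers until the preset threshold is reached.
    tokens.extend([PAD] * (max_words - len(tokens)))
    return tokens
```

Whatever the input lengths, the result always contains exactly `max_words` tokens, which is what gives every log processing unit data a fixed word number.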

Optionally, in some embodiments, the feature extraction unit 203 is configured to:

performing digital processing on words in each log processing unit data in the plurality of log processing unit data according to a preset comparison dictionary to obtain word feature vectors corresponding to each log processing unit data, wherein the preset comparison dictionary is a mapping relation dictionary for converting words into numbers;

performing dimension conversion processing on word feature vectors corresponding to each log processing unit data to obtain word feature matrixes corresponding to each log processing unit data;

generating a corresponding position feature matrix according to the positions of different words in the log included in each log processing unit data;

generating an additional feature matrix according to the clause codes, log types and process types corresponding to the logs included in each log processing unit data, wherein the clause codes are used for marking the different logs in the log processing unit data;

and obtaining the plurality of log feature matrices corresponding to the plurality of log processing unit data according to the word feature matrix, the position feature matrix and the additional feature matrix corresponding to each log processing unit data.
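The first two steps performed by the feature extraction unit 203 (digitizing words with the preset comparison dictionary, then converting the resulting vector into a matrix) can be sketched in Python as follows. The sketch assumes that the dimension conversion is an embedding-table lookup and fills the table with seeded random values purely for illustration; in a trained model these values would be learned. The function names, the id reserved for unknown words and the table itself are hypothetical.

```python
import random


def digitize(tokens, dictionary, unknown_id=0):
    """Map each word to its number; words not in the dictionary get unknown_id."""
    return [dictionary.get(token, unknown_id) for token in tokens]


def build_embedding_table(vocab_size, dim=128, seed=0):
    """Placeholder table of vocab_size rows with dim values each (random, seeded)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def word_feature_matrix(word_ids, table):
    """Turn a word feature vector of ids into a len(word_ids) x dim matrix."""
    return [list(table[word_id]) for word_id in word_ids]
```

With 128 word ids and `dim=128`, the lookup yields a 128×128 word feature matrix, the same shape as the position feature matrix and the additional feature matrix.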

Optionally, in some embodiments, the log state discrimination model includes a log coding module, a first fully-connected layer, a second fully-connected layer, a third fully-connected layer, and a fourth fully-connected layer connected in sequence, the log coding module includes six sequentially connected coding modules of a Transformer model, and the log state discrimination unit 204 is configured to:

respectively inputting a plurality of log feature matrixes into the log coding module to obtain a plurality of first log feature data;

respectively inputting the plurality of first log feature data into the first full connection layer to obtain a plurality of second log feature data;

respectively inputting a plurality of second log feature data into the second full connection layer to obtain a plurality of third log feature data;

respectively inputting a plurality of third log feature data into the third full connection layer to obtain a plurality of fourth log feature data;

and respectively inputting the plurality of fourth log characteristic data into the fourth full connection layer to obtain states respectively corresponding to the plurality of log processing unit data.

Optionally, in some embodiments, the determining unit 205 is configured to:

acquiring the occupation ratio of the abnormal state in the states corresponding to the log processing unit data respectively, and taking the occupation ratio as the log abnormal rate;

and determining a risk level corresponding to a preset process of the target big data component according to the log abnormality rate.

Optionally, in some embodiments, the risk early warning device further comprises a sample collection unit for:

collecting log data in a stable state as a positive example log data sample;

collecting log data in an abnormal state as a negative example log data sample;

It should be noted that, the risk early-warning device provided by the embodiment of the present application and the risk early-warning method in the foregoing embodiments belong to the same concept, and any method provided in the risk early-warning method embodiment may be implemented by using the risk early-warning device, and detailed implementation processes of the method are shown in the risk early-warning method embodiment and are not repeated herein.

In addition, in order to better implement the risk early warning method in the embodiment of the present application, the present application further provides an electronic device on the basis of the risk early warning method. Referring to fig. 7, fig. 7 shows a schematic structural diagram of an electronic device 300 provided by the present application. As shown in fig. 7, the electronic device 300 includes a processor 301 and a memory 302, where the processor 301 is configured to implement the steps of the risk early warning method in the above embodiment of the present application when executing a computer program stored in the memory 302, such as:

By way of example, a computer program may be partitioned into one or more modules/units that are stored in memory 302 and executed by processor 301 to accomplish an embodiment of the application. One or more of the modules/units may be a series of computer program instruction segments capable of performing particular functions to describe the execution of the computer program in a computer device.

Electronic device 300 may include, but is not limited to, a processor 301, a memory 302. It will be appreciated by those skilled in the art that the illustration is merely an example of the electronic device 300 and is not limiting of the electronic device 300, and may include more or fewer components than shown, or may combine some of the components, or different components, e.g., the electronic device 300 may further include an input-output device, a network access device, a bus, etc., through which the processor 301, the memory 302, the input-output device, the network access device, etc., are connected.

The processor 301 may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the electronic device 300 and connects the various parts of the whole electronic device 300 through various interfaces and lines.

The memory 302 may be used to store computer programs and/or modules, and the processor 301 implements various functions of the computer device by running or executing the computer programs and/or modules stored in the memory 302 and invoking data stored in the memory 302. The memory 302 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, video data, etc.) created according to the use of the electronic device 300, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the risk early-warning device, the electronic apparatus 300 and the corresponding units thereof described above may refer to the description of the risk early-warning method in the above embodiment of the present application, and the detailed description thereof will not be repeated here.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, an embodiment of the present application provides a computer readable storage medium having stored therein a plurality of instructions that can be loaded by a processor to perform the steps in the risk early warning method of the above embodiment of the present application, for example:

The specific operation may refer to the description of the risk early warning method in the above embodiments of the present application, and will not be repeated here.

The computer-readable storage medium may comprise: read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.

Because the instructions stored in the computer readable storage medium can execute the steps in the risk early warning method in the above embodiment of the present application, the beneficial effects that can be achieved by the risk early warning method in the above embodiment of the present application can be achieved, and detailed descriptions are omitted here.

Furthermore, the terms "first," "second," and "third," and the like, herein, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to the particular steps or modules listed and certain embodiments may include additional steps or modules not listed or inherent to such process, method, article, or apparatus.

The above describes in detail a scientific and technological risk management system, risk early warning device, method and storage medium provided by the present application, and specific examples are applied to illustrate the principles and embodiments of the present application, and the above examples are only used to help understand the method and core ideas of the present application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, the present description should not be construed as limiting the present application.

Claims

1. A risk early warning method, comprising:

dividing a plurality of logs in the target log data into a plurality of log processing unit data, including: sequencing a plurality of logs in the target log data according to time to obtain sequenced target log data; combining the ordered target log data according to each preset number of logs in continuous time sequence to obtain a plurality of log combinations; respectively performing splicing processing on the logs in each log combination in the log combinations to obtain a plurality of log processing unit data, wherein each log processing unit data comprises a fixed word number;

Extracting features of the log processing unit data respectively to obtain a plurality of log feature matrixes corresponding to the log processing unit data, wherein the method comprises the steps of digitizing words in each log processing unit data in the log processing unit data according to a preset comparison dictionary to obtain word feature vectors corresponding to each log processing unit data, and the preset comparison dictionary is a mapping relation dictionary for converting words into numbers; performing dimension conversion processing on word feature vectors corresponding to each log processing unit data to obtain word feature matrixes corresponding to each log processing unit data; generating a corresponding position feature matrix according to the positions of different words in the log included in each log processing unit data; generating an additional feature matrix according to a clause code, a log type and a process type corresponding to the log included in each log processing unit data, wherein the clause code is used for marking different log strips in the log processing unit data; obtaining a plurality of log feature matrixes corresponding to the plurality of log processing unit data according to the word feature matrixes, the position feature matrixes and the additional feature matrixes corresponding to each log processing unit data;

2. The risk early warning method according to claim 1, wherein the performing the splicing process on the logs in each of the plurality of log combinations to obtain a plurality of log processing unit data includes:

acquiring the word number of each spliced log in the plurality of spliced logs;

each spliced log is used as log processing unit data.

3. The risk early warning method according to claim 1, wherein the log state discrimination model includes a log coding module, a first fully-connected layer, a second fully-connected layer, a third fully-connected layer, and a fourth fully-connected layer, which are sequentially connected, the log coding module includes six coding modules of a Transformer model, which are sequentially connected, and the step of inputting the plurality of log feature matrices into the log state discrimination model to obtain states corresponding to the plurality of log processing unit data, respectively, includes:

4. The risk early warning method according to claim 1, wherein the determining the risk level corresponding to the preset process of the target big data assembly according to the states corresponding to the log processing unit data respectively includes:

5. The risk warning method of any one of claims 1 to 4, further comprising:

collecting log data in a stable state as a positive example log data sample;

collecting log data in an abnormal state as a counterexample log data sample;

6. A technical risk management system, characterized in that the technical risk management system comprises a technical management server, a big data component and a technical risk management server which are connected in communication, wherein the technical risk management server is used for executing the risk early warning method according to any one of claims 1 to 5.

7. A risk early warning device comprising means for performing the risk early warning method according to any one of claims 1 to 5.

8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when run on a computer, causes the computer to perform the risk early warning method according to any one of claims 1 to 5.