CN116629992A

CN116629992A - Operation risk identification method, training method, device, equipment and medium of model

Info

Publication number: CN116629992A
Application number: CN202310521509.XA
Authority: CN
Inventors: 胡耀荣; 周琪; 乔昱嘉; 杨艳
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-05-10
Filing date: 2023-05-10
Publication date: 2023-08-22

Abstract

The disclosure provides an operation risk identification method, a training method of a model, a device, equipment and a medium, which can be applied to the technical fields of computers and artificial intelligence. The operation risk identification method comprises the following steps: extracting features of operation data in a target operation data set to obtain a feature matrix, wherein the target operation data set comprises a plurality of operation data with the same attribute, each operation data comprises a plurality of operation sub-data of different operation nodes, and the feature matrix comprises a plurality of feature vectors in one-to-one correspondence with the plurality of operation data; determining the data number of the plurality of operation data and the node number of the plurality of operation nodes; determining a target risk recognition model from a plurality of risk recognition models based on the data number and the node number; and inputting the feature matrix into the target risk recognition model, and determining a risk identification result of each operation data.

Description

Operation risk identification method, training method, device, equipment and medium of model

Technical Field

The disclosure relates to the technical field of computers and artificial intelligence, and in particular relates to an operation risk identification method, a training method of a model, a training device of the model, equipment and a medium.

Background

With the rapid development of internet finance, increasingly innovative financial products bring more risks to financial institutions, and at present, the financial institutions need to continuously strengthen methods for identifying and preventing operational risks.

In the related art, operation risk is identified and prevented mainly through manual inspection and data analysis. In manual inspection, the main processes in which operation risk may occur are inspected and confirmed by hand, which is inefficient, and the ability to confirm risk varies from person to person because the judgment is subjective. In data analysis, indexes are established in the main operation flows that carry operation risk, relevant operation data is collected, and a related management and control system is established; data exceeding a threshold defined by the management and control system is recorded, and management and control measures are applied to the authority account concerned. This data analysis approach generally cannot give early warning and has a certain hysteresis. Therefore, it is difficult to accurately reduce the occurrence rate of operation risk in financial institutions.

Disclosure of Invention

In view of the above problems, the present disclosure provides an operation risk identification method, a training method of a model, a device, equipment and a medium, which can accurately identify risk data of an operation risk, so as to solve the above problems of using manual inspection and data analysis to identify the operation risk.

In a first aspect, an embodiment of the present disclosure provides an operation risk identification method, including:

extracting features of operation data in a target operation data set to obtain a feature matrix, wherein the target operation data set comprises a plurality of operation data with the same attribute, each operation data comprises a plurality of operation sub-data of different operation nodes, and the feature matrix comprises a plurality of feature vectors which are in one-to-one correspondence with the plurality of operation data;

determining the data quantity of the plurality of operation data and the node quantity of the plurality of operation nodes;

determining a target risk recognition model from a plurality of risk recognition models based on the number of data and the number of nodes;

and inputting the feature matrix into the target risk identification model, and determining a risk identification result of each operation data.

With reference to the first aspect, in one possible implementation manner, before the extracting features from the operation data in the target operation data set to obtain a feature matrix, the method further includes:

determining attribute data of the operation data;

determining the target operation data set from a plurality of operation data sets based on the attribute data;

and adding the operation data into the target operation data set.

With reference to the first aspect, in a possible implementation manner, the attribute data includes a service type and a frequency-impact type;

the determining the attribute data of the operation data includes:

determining the service type of the operation data from the attribute data;

and determining the frequency-impact type of the operation data based on the service type and the object attribute data of the operation object.

With reference to the first aspect, in one possible implementation manner, determining, from a plurality of risk recognition models, the target risk recognition model based on the data number and the node number includes:

determining the data complexity of the target operation data set based on the number of nodes and the number of data;

the target risk recognition model is determined from the plurality of risk recognition models based on the data complexity.

With reference to the first aspect, in a possible implementation manner, the determining, based on the number of nodes and the number of data, the data complexity of the target operation data set includes:

determining a ratio between the number of nodes and the number of data, and taking the ratio as the data complexity of the target operation data set;

the determining the target risk recognition model from the plurality of risk recognition models based on the data complexity includes:

determining a first risk recognition model from the plurality of risk recognition models as the target risk recognition model under the condition that the ratio is determined to be larger than a preset threshold value, wherein the first risk recognition model is a support vector machine of a complex kernel function;

and determining a second risk recognition model from the plurality of risk recognition models as the target risk recognition model under the condition that the ratio is smaller than or equal to the preset threshold value, wherein the second risk recognition model is a support vector machine of a linear kernel function.
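The selection rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of the string "rbf" as a stand-in for the unspecified "complex kernel function", and the reading of the ratio as node number divided by data number are all assumptions made for the example.

```python
# Hypothetical kernel names: the source only distinguishes a "complex kernel"
# from a "linear kernel"; "rbf" is an assumed stand-in for the complex kernel.
COMPLEX_KERNEL = "rbf"
LINEAR_KERNEL = "linear"


def data_complexity(node_count: int, data_count: int) -> float:
    """Return the ratio of node number to data number (assumed order)."""
    if data_count <= 0:
        raise ValueError("data_count must be positive")
    return node_count / data_count


def select_kernel(node_count: int, data_count: int, threshold: float) -> str:
    """Choose the complex kernel when the ratio exceeds the threshold,
    and the linear kernel when it is less than or equal to the threshold."""
    ratio = data_complexity(node_count, data_count)
    return COMPLEX_KERNEL if ratio > threshold else LINEAR_KERNEL


# Example with an assumed threshold of 1.0.
kernel_a = select_kernel(node_count=3, data_count=2, threshold=1.0)
kernel_b = select_kernel(node_count=2, data_count=2, threshold=1.0)
```

The returned kernel name could then be handed to whichever SVM implementation is in use; the preset threshold itself is not specified in the text and would have to be chosen for the data at hand.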

With reference to the first aspect, in a possible implementation manner, the operation risk identification method further includes:

generating alarm information under the condition that the risk identification result is determined to be used for representing the operation corresponding to the operation data as a risk operation;

and sending the alarm information to the terminal equipment.

In a second aspect, an embodiment of the present disclosure provides a training method of a risk identification model, including:

extracting features of sample operation data in a sample target operation data set to obtain a sample feature matrix, wherein the sample target operation data set comprises a plurality of sample operation data with the same attribute, each sample operation data comprises a plurality of sample operation sub-data of different operation nodes, and the sample feature matrix comprises a plurality of feature vectors in one-to-one correspondence with the plurality of sample operation data;

determining the number of sample data of the plurality of sample operation data and the number of sample nodes of the plurality of operation nodes;

determining a target risk recognition model from a plurality of risk recognition models based on the number of sample data and the number of sample nodes;

and training the target risk recognition model by using the sample feature matrix and the label corresponding to the sample feature matrix.

In a third aspect, an embodiment of the present disclosure provides an operation risk identification apparatus, including:

the feature extraction module is used for carrying out feature extraction on operation data in a target operation data set to obtain a feature matrix, wherein the target operation data set comprises a plurality of operation data with the same attribute, each operation data comprises a plurality of operation sub-data with different operation nodes, and the feature matrix comprises a plurality of feature vectors which are in one-to-one correspondence with the plurality of operation data;

a data determining module, configured to determine the data number of the plurality of operation data and the node number of the plurality of operation nodes;

the model determining module is used for determining a target risk recognition model from a plurality of risk recognition models based on the data quantity and the node quantity;

and the input module is used for inputting the feature matrix into the target risk identification model and determining the risk identification result of each operation data.

With reference to the third aspect, in one possible implementation manner, the apparatus further includes the following modules, which operate before feature extraction is performed on the operation data in the target operation data set to obtain a feature matrix:

an attribute data determining module for determining attribute data of the operation data;

an operation data set determining module configured to determine the target operation data set from a plurality of operation data sets based on the attribute data;

and the operation data adding module is used for adding the operation data into the target operation data set.

With reference to the third aspect, in one possible implementation manner, the attribute data includes a service type and a frequency-impact type;

wherein, the attribute data determining module includes:

a service type determining unit configured to determine a service type of the operation data from the attribute data;

and a frequency-impact type determining unit configured to determine a frequency-impact type of the operation data based on the service type and object attribute data of the operation object.

With reference to the third aspect, in one possible implementation manner, the model determining module includes:

a data complexity determining unit, configured to determine a data complexity of the target operation data set based on the number of nodes and the number of data;

and a risk identification determining unit configured to determine the target risk identification model from the plurality of risk identification models based on the data complexity.

With reference to the third aspect, in one possible implementation manner, wherein,

the data complexity determining unit comprises a ratio determining subunit, configured to determine a ratio between the number of nodes and the number of data, and use the ratio as the data complexity of the target operation data set;

the risk identification determination unit includes:

a first determining subunit, configured to determine, when it is determined that the ratio is greater than a predetermined threshold, a first risk recognition model from the plurality of risk recognition models as the target risk recognition model, where the first risk recognition model is a support vector machine of a complex kernel function;

and the second determining subunit is configured to determine, when it is determined that the ratio is less than or equal to a predetermined threshold, a second risk recognition model from the plurality of risk recognition models as the target risk recognition model, where the second risk recognition model is a support vector machine of a linear kernel function.

With reference to the third aspect, in one possible implementation manner, the operation risk identification device further includes:

the generation module is used for generating alarm information under the condition that the risk identification result is determined to be used for representing the operation corresponding to the operation data as a risk operation;

and the sending module is used for sending the alarm information to the terminal equipment.

In a fourth aspect, an embodiment of the present disclosure provides a training apparatus for a risk identification model, including:

the sample feature extraction module is used for carrying out feature extraction on sample operation data in a sample target operation data set to obtain a sample feature matrix, wherein the sample target operation data set comprises a plurality of sample operation data with the same attribute, each sample operation data comprises a plurality of sample operation sub-data with different operation nodes, and the sample feature matrix comprises a plurality of feature vectors which are in one-to-one correspondence with the plurality of sample operation data;

a sample data determining module, configured to determine a sample data number of the plurality of sample operation data and a sample node number of the plurality of operation nodes;

a risk identification model determination module, configured to determine a target risk identification model from a plurality of risk identification models based on the number of sample data and the number of sample nodes;

and a training module, configured to train the target risk identification model by using the sample feature matrix and the label corresponding to the sample feature matrix.

In a fifth aspect, embodiments of the present disclosure provide an electronic device, including:

one or more processors;

and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.

In a sixth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored thereon executable instructions that when executed by a processor are configured to implement a method as described above.

In a seventh aspect, the disclosed embodiments provide a computer program product comprising a computer program for implementing a method as described above when the computer program is executed by a processor.

In the embodiments of the disclosure, features are extracted from the operation data of a related service to obtain a feature matrix, that is, a plurality of feature vectors in one-to-one correspondence with the operation data. A target risk recognition model is determined from a plurality of risk recognition models according to the data number and the node number of the operation data of the related service, the feature matrix is then input into the target risk recognition model, and the risk recognition result of the operation data is finally determined. Because the operation data is classified and extracted to obtain a data set, and the model used for risk identification is determined based on the data number and the node number in that data set, the method can effectively cope with scenarios with a large number of features and a large number of samples, greatly improve the efficiency and accuracy of risk identification, and at the same time yield a model that runs faster and predicts better.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an exemplary system architecture to which at least one of the operation risk identification method, the training method of the model, and the corresponding apparatuses may be applied, in accordance with an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow diagram of a method of operational risk identification according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow chart of a training method of an operation risk identification model according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a system architecture diagram for operational risk identification according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a block diagram of an operational risk identification apparatus according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a block diagram of a training apparatus of an operation risk identification model in accordance with an embodiment of the present disclosure;

FIG. 7 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where expressions like "at least one of A, B and C, etc." are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).

In the technical solution of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of the data involved (including but not limited to personal information of users) all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.

In the related art, in the operation risk management flow, the management and control measures lag behind the risk operation, and the risk cannot be prevented by using the index data of the past operation risk more fully, which may result in higher operation risk occurrence rate and greater benefit loss. Based on this, the embodiment of the disclosure provides an operation risk identification method, which can identify user operation data, further make risk early warning, has high flexibility and wide application range, and can be applied to various enterprise user service systems in various fields and various industries, such as a bank staff service system, a user service operation analysis system and the like, without limitation. Embodiments of the present disclosure are described in detail below with reference to the attached drawings.

Fig. 1 schematically illustrates an exemplary system architecture to which at least one of the operation risk identification method, the training method of the model, and the corresponding apparatuses may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to assist those skilled in the art in understanding the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.

As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.

The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages etc. Various client applications, such as business operations class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, and/or social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, the third terminal device 103.

The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) providing support for the user to operate the business system with the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the target operation, and feed back the processing result (e.g., a web page, information, or data obtained or generated according to the user operation) to the terminal device.

It should be noted that the operation risk identification method and the training method of the model provided in the embodiments of the present disclosure may generally be performed by the server 105. Accordingly, the operation risk identification apparatus and the training apparatus of the model provided by the embodiments of the present disclosure may generally be provided in the server 105. The operation risk identification method and the training method of the model may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the operation risk identification apparatus and the training apparatus of the model may also be provided in such a server or server cluster. Alternatively, the operation risk identification method and the training method of the model may be performed by the first terminal device 101, the second terminal device 102, or the third terminal device 103, or by other terminal devices different from them. Accordingly, the operation risk identification apparatus and the training apparatus of the model may also be provided in the first terminal device 101, the second terminal device 102, or the third terminal device 103, or in other terminal devices different from them.

For example, risk identification may be performed in real time according to a user operation, where the operation may be performed by a user on any one of the first terminal device 101, the second terminal device 102, and the third terminal device 103 (for example, but not limited to, the first terminal device 101), and then the first terminal device 101 may locally perform the operation risk identification method provided by the embodiment of the present disclosure, or send a risk identification request to another terminal device, server, or server cluster, and perform the operation risk identification method provided by the embodiment of the present disclosure by the other terminal device, server, or server cluster that receives the risk identification request.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Fig. 2 schematically illustrates a flow diagram of a method of operational risk identification according to an embodiment of the present disclosure. The identification method provided by the embodiment of the present disclosure may include the following operations S210 to S240.

In operation S210, feature extraction is performed on the operation data in the target operation data set, so as to obtain a feature matrix.

According to an embodiment of the present disclosure, the target operation data set may include a plurality of operation data having the same attribute, each operation data includes a plurality of operation sub-data of different operation nodes, and the feature matrix includes a plurality of feature vectors in one-to-one correspondence with the plurality of operation data.

The attribute may include the service type, the occurrence frequency, and the degree of impact corresponding to the target operation data. For example, the service type may be personal service or public (corporate) service, and the occurrence frequency and degree of impact may be combined as high frequency with high loss, high frequency with low loss, low frequency with high loss, low frequency with low loss, and the like.

The operation data may record a business operation performed by the user in the business system, for example the related operations, such as user login and password modification, that the user performs when carrying out a deposit business in the personal business system.

The operation node may be a monitoring node at which a risk operation may occur when a user performs a service operation in the service system. For example, when a user handles a settlement service in the public service system and a risk operation occurs in the password input link, the password input link is the operation node. According to the embodiment of the disclosure, monitoring points can be established at key links of the service system where operation risk events are likely to occur, and the risk operation information of each monitoring point is collected. It should be noted that the established monitoring points all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.

In operation S220, the data number of the plurality of operation data and the node number of the plurality of operation nodes are determined.

According to embodiments of the present disclosure, the data number may be the size of the data volume, or may refer to the number of pieces of operation data.

In operation S230, a target risk recognition model is determined from the plurality of risk recognition models based on the number of data and the number of nodes.

According to embodiments of the present disclosure, the risk recognition models may include a support vector machine with a complex kernel function and a support vector machine with a linear kernel function. A support vector machine (SVM) is a binary classifier that classifies the operation data according to the margin between classes in the feature space. In the embodiments of the present disclosure, the feature matrix refers to the set of all feature vectors obtained by performing feature extraction on the operation data collected by the different monitoring nodes in a given service flow; the feature matrix may also be referred to as the feature space.

In operation S240, the feature matrix is input into the target risk recognition model, and a risk recognition result of each operation data is determined.

According to embodiments of the present disclosure, the risk identification result may include different tag values, e.g., tag values of 1 and -1, where 1 represents the occurrence of an operational risk event and -1 represents the absence of an operational risk event.
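As an illustration of how operation S240 could yield the tag values 1 and -1, the following sketch applies the decision function of a linear-kernel support vector machine to each row of a feature matrix. It assumes the model has already been trained; the weights, the bias, the function name, and the convention that a score of exactly zero maps to 1 are hypothetical choices for the example and do not come from the source.

```python
from typing import List, Sequence

RISK = 1       # tag value: an operational risk event occurred
NO_RISK = -1   # tag value: no operational risk event


def linear_svm_predict(feature_matrix: Sequence[Sequence[float]],
                       weights: Sequence[float],
                       bias: float) -> List[int]:
    """Label each feature vector by the sign of w . x + b."""
    labels = []
    for vector in feature_matrix:
        score = sum(w * x for w, x in zip(weights, vector)) + bias
        # Assumed convention: a score of exactly zero is treated as risk.
        labels.append(RISK if score >= 0 else NO_RISK)
    return labels


# Hypothetical trained parameters and a two-row feature matrix.
results = linear_svm_predict([[1, 0], [0, 1]], weights=[1.0, -1.0], bias=0.0)
```

Each entry of `results` corresponds to one piece of operation data, in the same order as the rows of the feature matrix.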

In some possible embodiments, in the above operation S210, feature extraction is performed on all service operation data of all users in the service system. Each piece of operation data, which records an operation behavior of a user, may be obtained from a user operation database in the user service system. The user operation database may be a user operation data set defined by the user service system, or may be a storage space connected to the user service system for storing the operation data generated by enterprise users, such as cloud storage or a mobile hard disk, which is not limited in this application. The types, sizes, formats, and the like of the operation data of the respective users in the user operation database are likewise not limited herein. It will be appreciated that the user service system may include any of a personal service system, a public service system, and the like. The features of the operation data may be the risk operations that a user triggers at the monitoring nodes while using the different service systems; the number of monitoring nodes is determined by the service type, and different service types correspond to different numbers of monitoring nodes.

In some possible embodiments, in the above operation S210, the feature matrix includes a plurality of feature vectors in one-to-one correspondence with the plurality of operation data, where each feature vector is obtained by encoding the corresponding operation data. The encoding method may include one-hot encoding, in which N states are encoded using an N-bit state register: each state has its own register bit, and only one bit is valid at any time. In this disclosure, for example, the public services [financing service, bill service] are converted into the values [10, 01]; that is, a feature with 2 possible values becomes 2 binary features after one-hot encoding, only one of which is activated at a time. Furthermore, in a specific implementation, the encoding method for converting categorical features into numerical features may also be PolynomialEncoder, OrdinalEncoder, CountEncoder, SumEncoder, etc.; the specific encoding method is not limited herein.
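The one-hot encoding of the [financing service, bill service] example can be sketched in a few lines. The helper name `one_hot_encode` and the choice to pass the category list explicitly are assumptions for the illustration; a production system would more likely rely on an existing encoder library.

```python
from typing import Dict, List, Sequence


def one_hot_encode(values: Sequence[str],
                   categories: Sequence[str]) -> List[List[int]]:
    """Encode each value as a vector with exactly one active bit."""
    index: Dict[str, int] = {c: i for i, c in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)  # one bit per possible state
        row[index[value]] = 1        # activate only the matching state
        encoded.append(row)
    return encoded


categories = ["financing service", "bill service"]
vectors = one_hot_encode(["financing service", "bill service"], categories)
```

With two categories, each value becomes a 2-bit vector, which matches the [10, 01] example in the text.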

In some possible embodiments, in operation S220, the data quantity of the operation data may be the number of services corresponding to the user's risk operation data. For example, if within the last two years risk operations occurred while the user operated the deposit service and the electronic banking service in the personal service system, the data quantity corresponding to these two services is 2. The node quantity may be the number of monitoring nodes at which the user's risk operations occurred. For example, if within the last two years the user performed risk operations at three nodes (account number input, password input, and amount input) while operating the deposit service in the personal service system, the node quantity of the user's risk operation data is 3.

Through the above operations of the operation risk identification method, the operation data of services executed by users is classified and its features are extracted to obtain a feature data set, and the corresponding model is determined, based on the data quantity and the node quantity of the data set, to identify risk operations. This improves the precision and processing efficiency of risk identification and, compared with traditional manual methods and data analysis methods, enables early warning and management and control.

According to the operation risk identification method, before extracting the operation data in the target operation data set to obtain the feature matrix, the method further includes: determining attribute data of the operation data; determining the target operation data set from a plurality of operation data sets based on the attribute data; and adding the operation data into the target operation data set.

In some possible embodiments, the attribute data represents the service type, occurrence frequency, and degree of influence corresponding to the operation data, where the service type may include a personal service, a public service, and the like. The personal service may include: savings, consumer loans, foreign exchange, intermediate business, electronic banking, etc.; the public service may include: public deposit, financing, bill service, international service, settlement service, enterprise electronic banking, institutional financial service, asset hosting service, investment banking service, etc. The target operation data set may be the operation data set corresponding to the monitoring nodes that the user triggers while using the different business systems.

In some possible embodiments, before obtaining the feature data, the method further includes preprocessing the operation data, where the preprocessing method may include: missing value processing, outlier processing, duplicate value deletion, date format conversion, conversion of non-numeric data into numeric data, and the like.
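The preprocessing steps listed above could look roughly as follows. This is a sketch under stated assumptions: the record fields (`user_id`, `node`, `date`, `amount`), the input date format, and the default fill value are all hypothetical and do not come from the disclosure.

```python
from datetime import datetime


def preprocess(records, default_amount=0.0):
    """Deduplicate records, fill missing amounts, and normalise dates."""
    seen = set()
    cleaned = []
    for record in records:
        key = (record["user_id"], record["node"], record["date"])
        if key in seen:  # duplicate value deletion
            continue
        seen.add(key)
        amount = record.get("amount")
        if amount is None:  # missing value processing
            amount = default_amount
        # date format conversion, e.g. "2023/05/10" -> "2023-05-10"
        date = datetime.strptime(record["date"], "%Y/%m/%d").strftime("%Y-%m-%d")
        cleaned.append({
            "user_id": record["user_id"],
            "node": record["node"],
            "date": date,
            "amount": float(amount),  # non-numeric text converted to a number
        })
    return cleaned


records = [
    {"user_id": "u1", "node": "password input", "date": "2023/05/10", "amount": None},
    {"user_id": "u1", "node": "password input", "date": "2023/05/10", "amount": None},
    {"user_id": "u2", "node": "amount input", "date": "2023/04/01", "amount": "120.5"},
]
cleaned = preprocess(records)
print(len(cleaned))  # 2
```

Outlier processing is omitted here because the disclosure does not specify a rule for it.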

The operation risk identification method, wherein the attribute data comprises a service type and a frequency-influence type; wherein the determining the attribute data of the operation data includes: and determining a frequency-influence type of the operation data based on the service type and object attribute data of the operation object.

In some possible embodiments, the operation object may be a user of a service system, and the object attribute data may be information data of the user, including: basic data of the user, such as education level, affiliated departments, posts, etc.; user's work data, such as work content, work behavior, responsibility rights, and work logs; and user funds transaction data such as user deposit accounts, user investment accounts, and investment times. It should be noted that, the processes of collection, storage, use, processing, transmission, application and the like of the related data all conform to the regulations of the related laws and regulations, necessary security measures are adopted, and the public order is not violated.

The frequency-influence type of the operation data is determined based on the object attribute data of the operation object. The frequency-influence type may describe the occurrence frequency and degree of influence of operation risks when a user operates the service system, and includes types such as multi-frequency and multi-loss, and few-frequency and few-loss. In the present disclosure, Table 1 is used as an example; Table 1 shows the frequency-influence types of operation data determined based on the service type and the object attributes according to an embodiment of the present disclosure.

TABLE 1

(Table 1 is provided as an image in the original publication and is not reproduced here.)

By analogy, the frequency-influence type of the operation data corresponding to services such as savings, consumer loans, foreign exchange, intermediate business, and electronic banking in the personal service can be determined. For example, the savings service in the personal service corresponds to data sets 1 to 4 and the consumer loan service corresponds to data sets 5 to 8; as can be seen from Table 1, data set 1 of the savings service and data set 5 of the consumer loan service are of the multi-frequency, multi-loss type. Likewise, the frequency-influence type of the operation data corresponding to services such as public deposit, financing, bill service, international service, settlement service, enterprise electronic banking, institutional financial service, asset hosting service, and investment banking service in the public service can be determined. For example, public deposit in the public service corresponds to data sets n to n+3 and the financing service corresponds to data sets n+4 to n+7; as can be seen from Table 1, data set n+2 of public deposit is of the few-frequency, multi-loss type, and data set n+3 of public deposit is of the few-frequency, few-loss type.

According to the operation risk identification method, the determining a target risk identification model from a plurality of risk identification models based on the data quantity and the node quantity includes: determining a data complexity of the target operation data set based on the number of nodes and the number of data, and determining the target risk identification model from the plurality of risk identification models based on the data complexity.

In some possible embodiments, the data complexity comprises the ratio between the number of nodes and the number of data. When the ratio is less than or equal to a predetermined threshold, a linear kernel function SVM is used; when the ratio is greater than the predetermined threshold, a complex kernel function SVM is used.

According to the operation risk identification method, determining the data complexity of the target operation data set based on the number of nodes and the number of data includes: determining a ratio between the number of nodes and the number of data, and taking the ratio as the data complexity of the target operation data set. Determining the target risk recognition model from the plurality of risk recognition models based on the data complexity includes: determining a first risk recognition model from the plurality of risk recognition models as the target risk recognition model in the case that the ratio is determined to be greater than a predetermined threshold; and determining a second risk recognition model from the plurality of risk recognition models as the target risk recognition model in the case that the ratio is determined to be less than or equal to the predetermined threshold.

According to an embodiment of the present disclosure, the first risk identification model may be a Support Vector Machine (SVM) of a complex kernel function; the second risk identification model may be a Support Vector Machine (SVM) of a linear kernel function.

In some possible embodiments, the predetermined threshold may be 1/1000. In the support vector machine (SVM) algorithm, the kernel function is the key factor in mapping a problem from the input space to a high-dimensional space; different kernel functions yield different support vector machine algorithms, and the form and parameters of the kernel function determine the type and complexity of the classifier. In practice, a support vector machine that does not use a kernel function can only solve linear classification problems, so the present disclosure adopts a linear kernel function and a complex kernel function to handle the nonlinearity of risk operation classification. When the ratio of the number of nodes to the number of data is greater than 1/1000, a support vector machine with a complex kernel function is used as the target risk identification model; when the ratio is less than or equal to 1/1000, a support vector machine with a linear kernel function is used as the target risk identification model.
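The threshold rule above can be expressed as a small selection function. The following is a minimal sketch, assuming the model choice is represented by a string; the names `select_kernel` and `PREDETERMINED_THRESHOLD` are hypothetical. Exact fractions are used so that the boundary case of exactly 1/1000 is not affected by floating-point rounding.

```python
from fractions import Fraction

# Threshold value taken from the embodiment above.
PREDETERMINED_THRESHOLD = Fraction(1, 1000)


def select_kernel(node_count, data_count, threshold=PREDETERMINED_THRESHOLD):
    """Return which SVM variant to use for a data set."""
    if data_count <= 0:
        raise ValueError("data_count must be positive")
    # Data complexity is the ratio of node quantity to data quantity.
    complexity = Fraction(node_count, data_count)
    return "complex" if complexity > threshold else "linear"


print(select_kernel(3, 2))     # complex (3/2 is greater than 1/1000)
print(select_kernel(3, 3000))  # linear (exactly 1/1000)
print(select_kernel(3, 5000))  # linear
```

A ratio equal to the threshold falls on the linear side, matching the "less than or equal to" condition in the text.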

The operation risk identification method further comprises the step of generating alarm information under the condition that the risk identification result is determined to be used for representing that an operation corresponding to the operation data is a risk operation; and sending the alarm information to the terminal equipment.

In some possible embodiments, a risk identification result labeled 1 indicates that a risk operation event may occur. After a result is labeled 1, alarm information is generated and sent to the risk alarm information platform, and the platform may be instructed to display information such as the user type, user information, and user operation behavior contained in the alarm information. Optionally, the relevant recipients of the alarm information may be notified, and corresponding measures may be taken for the risk user, based on the user type and user information in the alarm information. Risk alarm notification channels include, but are not limited to, short messages, e-mails, telephone calls, mobile app push messages, WeChat official accounts, and the like; the specific risk alarm channel may be determined based on the alarm content or the actual application scenario and is not specifically limited herein.
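As an illustration of the alarm step, the sketch below builds one alarm record for each result labeled 1. The record fields and the function name `build_alarms` are assumptions made for this example, and delivery to a platform or terminal device is left out.

```python
def build_alarms(results):
    """Create one alarm record for every result labeled 1 (risk operation)."""
    alarms = []
    for result in results:
        if result["label"] != 1:
            continue  # only label 1 indicates a possible risk operation
        alarms.append({
            "user_type": result["user_type"],
            "user_info": result["user_info"],
            "operation": result["operation"],
            "message": "Possible risk operation detected",
        })
    return alarms


results = [
    {"label": 1, "user_type": "personal", "user_info": "u1", "operation": "password input"},
    {"label": -1, "user_type": "public", "user_info": "u2", "operation": "amount input"},
]
alarms = build_alarms(results)
print(len(alarms))  # 1
```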

Fig. 3 schematically illustrates a flow diagram of a training method of an operation risk identification model according to an embodiment of the present disclosure. The training method provided by the embodiment of the present disclosure may include the following operations S310 to S340.

In operation S310, feature extraction is performed on the sample operation data in the sample target operation data set, so as to obtain a sample feature matrix.

According to an embodiment of the present disclosure, the sample target operation data set may include a plurality of sample operation data having the same attribute, each of the sample operation data includes a plurality of sample operation sub-data having different operation nodes, and the sample feature matrix includes a plurality of feature vectors corresponding to the plurality of sample operation data one to one.

In operation S320, the number of sample data of the plurality of sample operation data and the number of sample nodes of the plurality of operation nodes are determined.

In operation S330, a target risk recognition model is determined from the plurality of risk recognition models based on the number of sample data and the number of sample nodes.

In operation S340, the target risk recognition model is trained using the feature matrix and the label corresponding to the feature matrix.

In some possible embodiments, in operation S310, the method for selecting sample operation data may include the Swift sample selection algorithm, that is, randomly selecting k items from a set of N items. The algorithm divides the array into two regions: the first region contains the selected items and the second region contains all remaining items. Suppose the array is [a, b, c, d, e, f, g] and 3 items are to be randomly selected, so k=3. In the loop, i is initially 0, so it points to "a". A random number between i and the size of the array is calculated; assuming this random number is 4, "a" is exchanged with "e" at index 4, and i moves forward. The operation is then repeated: a random number between i and the size of the array is calculated, and because i has moved forward, this random number is never smaller than 1, so "e" is not exchanged again. Assuming the random number is 6, "b" is exchanged with "g". Another random number is then drawn, say 4, and "c" is exchanged with "a", which now sits at index 4. The final selected items are [e, g, a], and the random selection ends. In addition, the Swift sample selection algorithm may also include methods such as reservoir sampling, and the method for selecting the samples is not specifically limited herein.
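The selection procedure walked through above is a partial shuffle: each position i in the first k slots is swapped with a randomly chosen position at or after i. A minimal Python sketch follows; the function name `select_k` and the injectable `randint` parameter are assumptions added so that the worked example can be replayed deterministically.

```python
import random


def select_k(items, k, randint=random.randint):
    """Return k items chosen at random without replacement."""
    if not 0 <= k <= len(items):
        raise ValueError("k must be between 0 and len(items)")
    pool = list(items)  # work on a copy so the caller's data is untouched
    for i in range(k):
        # Pick an index in [i, len(pool) - 1]; slots before i are already selected.
        j = randint(i, len(pool) - 1)
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]


# Replay the worked example with the random draws 4, 6, 4.
draws = iter([4, 6, 4])
example = select_k(list("abcdefg"), 3, randint=lambda low, high: next(draws))
print(example)  # ['e', 'g', 'a']
```

With the default `random.randint`, every call returns a fresh random selection of k distinct positions.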

In some possible embodiments, the sample operation data may include data of the user's risk operations in the service system within the last two years or within a specific time period; the specific time period is not limited, and its range may be set according to the specific service system. The sample node number may be the number of features of the user's risk operation data, including the risk operations in which the user triggered monitoring nodes in the business system over the last two years. The sample target operation data set includes a plurality of sample operation data with the same attribute, where the attribute may be the same service type and the same frequency-influence type, for example, the deposit service and/or consumer loan service in the personal service having the same frequency-influence type, or the public deposit and/or financing service in the public service having the same frequency-influence type. Each sample operation data comprises a plurality of sample operation sub-data with different operation nodes, where the operation nodes are the different monitoring nodes at which risk operation data may occur within the same service; for example, when a user operates the electronic banking service in the personal service, risk operations may occur at both the password input node and the amount input node.

In some possible embodiments, if the sample operation data has many dimensions, several dimensions may carry the same information and the computation time also increases; feature extraction and merging of identical information reduce the dimensionality. Thus, the present disclosure may perform feature extraction on the sample operation data, that is, select M sub-features from N features such that the criterion function is optimal over the M sub-features. Taking the deposit service and the foreign exchange service in the personal service as examples: if, within the last two years, a user triggered a monitoring point at the password input step when operating the deposit service or the foreign exchange service in the service system, and the frequency-influence type in both cases is multi-frequency and multi-loss, then the features of the sample operation data corresponding to password input in the deposit service and the foreign exchange service can be extracted and merged. Other service operation data can be handled similarly to generate the sample feature matrix.

In some possible embodiments, operation S320 may specifically determine the data quantity of the user's risk operation data and the node quantity of the risk operation data for each service type within a certain past time period; the specific time period may be chosen according to the service type and actual service characteristics and is not specifically limited herein. For example, it is possible to determine the number of samples in which a user's consumer loan operations in the personal service system were at risk over the last two years and the number of features of the key nodes of those risk operations, or the number of samples in which a user's credit business operations in the public service system were at risk over the last two years and the number of features of the key nodes of those risk operations.

In some possible embodiments, operation S330 may specifically determine the target risk identification model based on the data complexity of the user's risk operation data for each service type within a certain fixed past time period. For example, based on the number of key nodes at which risk operations occurred when a user performed account input operations in the personal service system over the last two years, and the number of data of the services corresponding to those risk operations, the ratio of the number of nodes to the number of data determines which risk identification model is used: if the ratio is greater than a predetermined threshold, the risk identification model may be a support vector machine with a complex kernel function, and if the ratio is less than or equal to the predetermined threshold, the risk identification model may be a support vector machine with a linear kernel function. The method further includes setting the predetermined threshold, which may be 1/1000.

In some possible embodiments, operation S340 may specifically train the linear kernel function SVM1 or the complex kernel function SVM2 based on the feature matrix and the label corresponding to the matrix, where the labels corresponding to the matrix may be data set 1, data set 2, data set 3, and so on. For example, the complex kernel function SVM2 is trained when the ratio of the number of nodes to the number of data for data set 1 is greater than 1/1000, and the linear kernel function SVM1 is trained when the ratio of the number of nodes to the number of data for data set 2 is less than or equal to 1/1000.

According to an embodiment of the present disclosure, a training data set on the feature matrix is input: $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i$ is the $i$-th feature vector, $y_i\in\{+1,-1\}$ is the class label, and $i=1,2,\dots,N$. The specific operation is as shown in formula (1):

where [symbol missing] is the optimal solution for solving the hyperplane and $\omega^*$ is the normal vector. The displacement term $b^*$ is then calculated; the specific operation is shown in formula (2):

where $j=1,\dots,n+1$. The separating hyperplane can then be calculated; the specific operation is as shown in formula (3):

$\omega^* \cdot x + b^* = 0 \quad (3)$

Meanwhile, a classification decision function can be obtained; the specific operation is shown in formula (4):

$f(x)=\operatorname{sign}(\omega^* \cdot x + b^*) \quad (4)$

Thus, the maximum-margin separating hyperplane that divides the feature matrix into two parts is obtained, together with the two resulting parts.
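Formulas (1) and (2) referenced above do not appear in this text. As a hedged sketch only, and assuming the embodiment follows the standard dual formulation of the linearly separable support vector machine, they would take the textbook form below. The symbol $\alpha^*=(\alpha_1^*,\dots,\alpha_N^*)$ for the optimal Lagrange multipliers is introduced here for illustration and is not taken from the original.

```latex
\omega^* = \sum_{i=1}^{N} \alpha_i^* \, y_i \, x_i
\qquad \text{(cf. formula (1))}

b^* = y_j - \sum_{i=1}^{N} \alpha_i^* \, y_i \, (x_i \cdot x_j)
\qquad \text{(cf. formula (2))}
```

In this textbook form, $j$ indexes any training sample with $\alpha_j^* > 0$. Substituting $\omega^*$ and $b^*$ gives the hyperplane of formula (3) and the decision function of formula (4).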

In the process of implementing the present disclosure, it was found that a support vector machine that does not use a kernel function can only solve linear classification problems. Embodiments of the present disclosure therefore employ a linear kernel function and a complex kernel function to classify the nonlinear problems of user risk operations. The complex kernel function is a Gaussian kernel, as shown in formula (5):

where $\delta > 0$ is the bandwidth of the Gaussian kernel.
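Formula (5) is not legible in this text. Assuming the Gaussian kernel takes its commonly used form with bandwidth $\delta$, it can be written as below; the factor 2 in the denominator is an assumption, since some texts fold it into the bandwidth.

```latex
K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\delta^{2}} \right),
\qquad \delta > 0
\qquad \text{(cf. formula (5))}
```

A smaller $\delta$ makes the kernel more local, so the resulting classifier can fit more complex decision boundaries.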

The linear kernel function is shown in formula (6):

$K(x_i,x_j)=x_i^{T}x_j \quad (6)$

According to the calculation result, the corresponding model is selected for each type of data set based on the ratio of the number of nodes to the number of data.
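The two kernels and the decision function of formula (4) can be sketched in plain Python as follows. This is an illustrative sketch, not the disclosed implementation: the Gaussian kernel assumes the common form with $2\delta^2$ in the denominator, the function names are hypothetical, and a score of exactly zero is mapped to +1 by convention.

```python
import math


def linear_kernel(x_i, x_j):
    """Inner product of two feature vectors, as in formula (6)."""
    return sum(a * b for a, b in zip(x_i, x_j))


def gaussian_kernel(x_i, x_j, delta=1.0):
    """Gaussian kernel with bandwidth delta (assumed 2*delta^2 denominator)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2 * delta ** 2))


def decide(x, omega, b):
    """Classification decision sign(omega . x + b), as in formula (4)."""
    score = linear_kernel(omega, x) + b
    return 1 if score >= 0 else -1  # assumption: a zero score maps to +1


print(linear_kernel([1.0, 0.0], [0.0, 1.0]))    # 0.0
print(gaussian_kernel([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(decide([2.0, 1.0], [1.0, -1.0], -0.5))    # 1
```

Identical vectors give a Gaussian kernel value of 1, and the value decays toward 0 as the vectors move apart.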

Fig. 4 schematically illustrates a system architecture diagram for operational risk identification according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, as shown in fig. 4, a plurality of monitoring points (1, 2, 3, … N) are established at the key links of the business module 410 where risk operations may occur, so as to collect a risk operation data set of a user. The data set module 420 classifies the collected risk operation data set according to service type, occurrence frequency, and degree of influence. Based on the ratio between the number of nodes and the number of data in the data set, compared against a predetermined threshold, the model selection module 430 determines whether the risk identification model 440 adopts a support vector machine with a complex kernel function or a support vector machine with a linear kernel function. The risk identification model then outputs a risk identification result for the corresponding data set; specifically, when the risk identification result is 1, the alarm module 450 raises an alarm and generates alarm information.

Based on the operation risk identification method, the disclosure further provides an operation risk identification device.

Fig. 5 schematically illustrates a block diagram of a construction of an operation risk recognition apparatus according to an embodiment of the present disclosure.

As shown in fig. 5, an operation risk identification apparatus 500 provided in an embodiment of the present disclosure includes: a feature extraction module 510, a data determination module 520, a model determination module 530, an input module 540.

The feature extraction module 510 is configured to perform feature extraction on operation data in a target operation data set to obtain a feature matrix, where the target operation data set includes a plurality of operation data with the same attribute, each of the operation data includes a plurality of operation sub-data with different operation nodes, and the feature matrix includes a plurality of feature vectors corresponding to the plurality of operation data one to one.

The data determining module 520 is configured to determine the data amounts of the plurality of operation data and the node amounts of the plurality of operation nodes.

The model determining module 530 is configured to determine a target risk recognition model from the multiple risk recognition models based on the data number and the node number.

An input module 540, configured to input the feature matrix into the target risk recognition model, and determine a risk recognition result of each operation data.

Based on the operation risk identification device, before extracting the characteristics of the operation data in the target operation data set to obtain the characteristic matrix, the operation risk identification device further comprises: an attribute data determining module for determining attribute data of the operation data; an operation data set determining module configured to determine the target operation data set from a plurality of operation data sets based on the attribute data; and the operation data adding module is used for adding the operation data into the target operation data set.

Based on the operation risk identification means, the attribute data includes a traffic type and a frequency-impact type.

According to an embodiment of the present disclosure, the attribute data determining module may include: a service type determining unit configured to determine a service type of the operation data from the attribute data; and a frequency-influence type determining unit configured to determine a frequency-influence type of the operation data based on the service type and object attribute data of the operation object.

Based on the operational risk recognition apparatus described above, the model determination module 530 may include: a data complexity determining unit, configured to determine a data complexity of the target operation data set based on the number of nodes and the number of data; and a risk identification determining unit configured to determine the target risk identification model from the plurality of risk identification models based on the data complexity.

Based on the operation risk identification device, the data complexity determination unit may include a ratio determination subunit configured to determine a ratio between the number of nodes and the number of data, and use the ratio as the data complexity of the target operation data set. The risk identification determination unit may include: a first determination subunit configured to determine a first risk identification model from the plurality of risk identification models as the target risk identification model in the case where it is determined that the ratio is greater than a predetermined threshold; and a second determination subunit configured to determine a second risk identification model from the plurality of risk identification models as the target risk identification model in the case where it is determined that the ratio is less than or equal to the predetermined threshold.

According to an embodiment of the disclosure, the first risk identification model may be a support vector machine of a complex kernel function; the second risk identification model may be a support vector machine of a linear kernel function.

Based on the operation risk recognition device, the operation risk recognition device 500 may further include a generation module configured to generate alarm information when it is determined that the risk recognition result is used to characterize an operation corresponding to the operation data as a risk operation; and the sending module is used for sending the alarm information to the terminal equipment.

Fig. 6 schematically illustrates a block diagram of a training apparatus for an operation risk identification model according to an embodiment of the present disclosure.

As shown in fig. 6, a training apparatus 600 for an operation risk identification model according to an embodiment of the present disclosure includes: a sample feature extraction module 610, a sample data determination module 620, a risk identification model determination module 630, and a training module 640.

The sample feature extraction module 610 is configured to perform feature extraction on sample operation data in the sample target operation data set, so as to obtain a sample feature matrix.

The sample data determining module 620 is configured to determine the number of sample data of the plurality of sample operation data and the number of sample nodes of the plurality of operation nodes.

The risk recognition model determining module 630 is configured to determine a target risk recognition model from the plurality of risk recognition models based on the number of sample data and the number of sample nodes.

And a training module 640, configured to train the target risk recognition model using the feature matrix and the label corresponding to the feature matrix.

Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement an operation risk identification method according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed in this text.

As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the electronic device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Various components in the electronic device 700 are connected to an input/output (I/O) interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as the operation risk identification method. For example, in some embodiments, the operation risk identification method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more operations of the operation risk identification method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the operation risk identification method by any other suitable means (e.g., by means of firmware).

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that operations may be reordered, added, or deleted using the various forms of the flows shown above. For example, the operations described in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

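1. An operational risk identification method, comprising:

performing feature extraction on operation data in a target operation data set to obtain a feature matrix, wherein the target operation data set comprises a plurality of operation data with the same attribute, each operation data comprises a plurality of operation sub-data with different operation nodes, and the feature matrix comprises a plurality of feature vectors corresponding to the plurality of operation data one by one;

determining the data quantity of the plurality of operation data and the node quantity of a plurality of operation nodes;

determining a target risk recognition model from a plurality of risk recognition models based on the number of data and the number of nodes; and

inputting the feature matrix into the target risk recognition model, and determining a risk identification result of each operation data.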

2. The method of claim 1, further comprising, prior to the performing feature extraction on the operation data in the target operation data set to obtain a feature matrix:

determining attribute data of the operation data;

determining the target operational data set from a plurality of operational data sets based on the attribute data; and

adding the operation data to the target operation data set.
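The routing step of claim 2 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the attribute key (a pair of service type and frequency-impact type), the field names `service_type` and `frequency_impact_type`, and the sample values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def attribute_key(operation):
    """Return the attribute data used to pick a data set.

    Assumption: the attribute is a (service type, frequency-impact type) pair
    stored under hypothetical field names.
    """
    return (operation["service_type"], operation["frequency_impact_type"])


def add_to_target_set(data_sets, operation):
    """Determine the target data set from the attribute and add the operation to it."""
    key = attribute_key(operation)
    data_sets[key].append(operation)
    return key


# One data set per distinct attribute; created on first use.
data_sets = defaultdict(list)
operations = [
    {"id": 1, "service_type": "loan", "frequency_impact_type": "high-frequency/low-impact"},
    {"id": 2, "service_type": "loan", "frequency_impact_type": "high-frequency/low-impact"},
    {"id": 3, "service_type": "settlement", "frequency_impact_type": "low-frequency/high-impact"},
]
for op in operations:
    add_to_target_set(data_sets, op)
```

With this grouping, every data set holds only operation data with the same attribute, which is the precondition the feature extraction step relies on.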

3. The method of claim 2, wherein the attribute data includes a service type and a frequency-impact type;

the determining the attribute data of the operation data includes:

determining the service type of the operation data from the attribute data; and

4. A method according to any one of claims 1 to 3, wherein the determining a target risk identification model from a plurality of risk identification models based on the number of data and the number of nodes comprises:

determining a data complexity of the target operational dataset based on the number of nodes and the number of data; and

determining the target risk identification model from the plurality of risk identification models based on the data complexity.

5. The method of claim 4, wherein,

the determining the data complexity of the target operation data set based on the number of nodes and the number of data includes:

the determining the target risk recognition model from the plurality of risk recognition models based on the data complexity comprises:

determining a first risk recognition model from the plurality of risk recognition models as the target risk recognition model in a case where the ratio is determined to be greater than a preset threshold, wherein the first risk recognition model is a support vector machine with a complex kernel function; and

determining a second risk recognition model from the plurality of risk recognition models as the target risk recognition model in a case where the ratio is less than or equal to the preset threshold, wherein the second risk recognition model is a support vector machine with a linear kernel function.
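The threshold rule of claim 5 can be sketched in Python as follows. Two points are assumptions rather than statements of the claim: the text above does not say how the ratio is formed, so the sketch takes it as the number of nodes divided by the number of data, and the name `"rbf"` merely stands in for the unspecified "complex kernel function". The function name `select_kernel` is hypothetical.

```python
def select_kernel(num_nodes, num_data, threshold):
    """Return the kernel name of the target risk recognition model.

    Assumption: ratio = num_nodes / num_data (the claim text does not define it).
    """
    if num_data <= 0:
        raise ValueError("num_data must be positive")
    ratio = num_nodes / num_data
    # Ratio above the preset threshold: first model (complex kernel, here "rbf"
    # as a stand-in). Otherwise: second model (linear kernel).
    return "rbf" if ratio > threshold else "linear"


kernel_high = select_kernel(num_nodes=8, num_data=10, threshold=0.5)
kernel_equal = select_kernel(num_nodes=5, num_data=10, threshold=0.5)
```

A ratio exactly equal to the threshold selects the linear kernel, matching the "less than or equal to" branch of the claim. The returned name could, for instance, be passed as the `kernel` argument of a support vector classifier such as scikit-learn's `SVC`.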

6. A method according to any one of claims 1 to 3, further comprising:

generating alarm information in a case where the risk identification result indicates that the operation corresponding to the operation data is a risk operation; and

sending the alarm information to a terminal device.
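As a minimal sketch of the alarm step in claim 6, the Python below builds one alarm per operation flagged as a risk operation and hands it to a caller-supplied transport. The result format (a mapping from operation ID to a boolean), the message text, and the function names are hypothetical; the list `outbox` stands in for the terminal device.

```python
def generate_alarms(results):
    """Build one alarm per operation whose risk identification result is True."""
    return [
        {"operation_id": op_id, "message": f"Risk operation detected: {op_id}"}
        for op_id, is_risky in results.items()
        if is_risky
    ]


def send_alarms(alarms, send):
    """Send each alarm through the supplied callable and return the number sent."""
    for alarm in alarms:
        send(alarm)
    return len(alarms)


# Hypothetical risk identification results keyed by operation ID.
results = {"op-1": False, "op-2": True, "op-3": True}
outbox = []  # stand-in for the terminal device
sent = send_alarms(generate_alarms(results), outbox.append)
```

Passing the transport as a callable keeps alarm generation independent of how the terminal device is actually reached.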

7. A method of training a risk identification model, comprising:

performing feature extraction on sample operation data in a sample target operation data set to obtain a sample feature matrix, wherein the sample target operation data set comprises a plurality of sample operation data with the same attribute, each sample operation data comprises a plurality of sample operation sub-data with different operation nodes, and the sample feature matrix comprises a plurality of feature vectors corresponding to the plurality of sample operation data one by one;

determining a sample data number of the plurality of sample operation data and a sample node number of the plurality of operation nodes;

determining a target risk recognition model from a plurality of risk recognition models based on the number of sample data and the number of sample nodes; and

training the target risk identification model by using the sample feature matrix and the label corresponding to the sample feature matrix.
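The data preparation that precedes training in claim 7 can be sketched in Python as follows. The feature choice is an assumption made only for illustration: each sample operation sub-data contributes a single hypothetical numeric field, `duration_s`, for its operation node, and a missing node is filled with 0.0. The claim does not specify which features are extracted.

```python
def build_feature_matrix(samples, node_order):
    """Build one feature vector per sample, with one entry per operation node."""
    matrix = []
    for sample in samples:
        by_node = {sub["node"]: sub["duration_s"] for sub in sample["sub_data"]}
        # Nodes a sample did not pass through are filled with 0.0.
        matrix.append([float(by_node.get(node, 0.0)) for node in node_order])
    return matrix


# Hypothetical labelled samples; label 1 marks a risk operation.
samples = [
    {"label": 0, "sub_data": [{"node": "entry", "duration_s": 12},
                              {"node": "review", "duration_s": 30}]},
    {"label": 1, "sub_data": [{"node": "entry", "duration_s": 2},
                              {"node": "review", "duration_s": 1},
                              {"node": "authorize", "duration_s": 1}]},
]
node_order = sorted({sub["node"] for s in samples for sub in s["sub_data"]})
features = build_feature_matrix(samples, node_order)
labels = [s["label"] for s in samples]

# Sample data number and sample node number used for model selection.
num_samples, num_nodes = len(features), len(node_order)
```

The resulting `features` and `labels` are the inputs that the selected model would be trained on, and `num_samples` and `num_nodes` are the two counts on which the model selection step depends.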

8. A risk identification device comprising:

the feature extraction module is used for carrying out feature extraction on operation data in a target operation data set to obtain a feature matrix, wherein the target operation data set comprises a plurality of operation data with the same attribute, each operation data comprises a plurality of operation sub-data with different operation nodes, and the feature matrix comprises a plurality of feature vectors corresponding to the plurality of operation data one by one;

the data determining module is used for determining the data quantity of the plurality of operation data and the node quantity of the plurality of operation nodes;

and the input module is used for inputting the feature matrix into the target risk identification model and determining a risk identification result of each operation data.

9. A training device of a risk identification model, comprising:

a risk recognition model determination module for determining a target risk recognition model from a plurality of risk recognition models based on the number of sample data and the number of sample nodes; and

a training module for training the target risk identification model by using the sample feature matrix and the label corresponding to the sample feature matrix.

10. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.

11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-7.

12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.