CN112053021A

CN112053021A - Feature coding method and device for enterprise operation management risk identification

Info

Publication number: CN112053021A
Application number: CN201910489215.7A
Authority: CN
Inventors: 王铖骅; 黄晶
Original assignee: State Grid Corp of China SGCC; State Grid Information and Telecommunication Co Ltd; State Grid Zhejiang Electric Power Co Ltd; 4Paradigm Beijing Technology Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Information and Telecommunication Co Ltd; State Grid Zhejiang Electric Power Co Ltd; 4Paradigm Beijing Technology Co Ltd
Priority date: 2019-06-05
Filing date: 2019-06-05
Publication date: 2020-12-08

Abstract

The embodiment of the invention discloses a feature coding method and a feature coding device for enterprise operation management risk identification, relates to the technical field of computers, and can effectively improve the accuracy of model prediction. The method comprises the following steps: acquiring target data from an enterprise risk data source, wherein the target data comprises character type data and/or knowledge graph type data; coding the target data according to the adjacent relation between the target data to obtain a target feature code, wherein the target feature code is used for inputting a first risk identification neural network model; and the target feature code represents each target data by using a multi-dimensional vector, and the dimension number of the multi-dimensional vector is less than a preset threshold value. The method is suitable for enterprise risk identification and prediction.

Description

Feature coding method and device for enterprise operation management risk identification

Technical Field

The invention relates to the technical field of computers, in particular to a feature coding method and device for enterprise operation management risk identification.

Background

The business management risk of the enterprise is contained in various data related to the enterprise, and how to identify the risk which the enterprise may face from the huge data has great significance to the enterprise itself and entities related to the enterprise.

However, because the data related to the enterprise risk has the characteristics of complex data source and diversified data format, when the enterprise risk identification is performed by using a mathematical model, the features extracted from the data source are often difficult to be directly used for model training and prediction.

Disclosure of Invention

In view of this, embodiments of the present invention provide a feature coding method and apparatus for enterprise operation management risk identification, which can greatly reduce workload and blindness of feature extraction, and effectively improve accuracy of model prediction.

In a first aspect, an embodiment of the present invention provides a feature coding method for enterprise operation management risk identification, including: acquiring target data from an enterprise risk data source, wherein the target data comprises character type data and/or knowledge graph type data; coding the target data according to the adjacent relation between the target data to obtain a target feature code, wherein the target feature code is used for inputting a first risk identification neural network model; and the target feature code represents each target data by using a multi-dimensional vector, and the dimension number of the multi-dimensional vector is less than a preset threshold value.

Optionally, the acquiring target data from the enterprise risk data source includes: performing word segmentation and key information extraction on unstructured data in an enterprise risk data source to enable each key information to form character type data; and/or converting structured data in the enterprise risk data source into knowledge graph type data.

Optionally, the target data includes character-type data; the encoding the target data according to the adjacent relation between the target data to obtain the target feature code comprises: constructing the adjacent relation among the character type data according to a preset rule; and vectorizing the character type data according to the adjacent relation to form the target feature code.

Optionally, the constructing the adjacent relationship between the character-type data according to the preset rule includes: constructing the adjacent relation among people of each type of position in each enterprise according to a preset rule; the vectorizing the character-type data according to the adjacent relationship to form the target feature code includes: vectorizing the identity of the person in each type of position and averaging to obtain the average identity characteristic code of the position; and splicing the average identity characteristic codes of all types of positions to form the target characteristic code.

Optionally, the target data comprises knowledge graph type data; the encoding the target data according to the adjacent relation between the target data to obtain the target feature code comprises: determining the influence characteristic code of each enterprise by using a pagerank algorithm based on the relation between the enterprises in the knowledge graph type data; determining the risk characteristic code of each enterprise by using a label propagation algorithm based on the relation between the enterprises in the knowledge graph type data; determining characteristic codes related to people in each enterprise by using a deepwalk algorithm based on the relation between the enterprises and people in the knowledge graph type data; and splicing the influence characteristic code, the risk characteristic code, and the person-related characteristic code to form the target feature code.

Optionally, the target data includes character-type data and knowledge graph-type data; the encoding the target data according to the adjacent relation between the target data to obtain the target feature code comprises: vectorizing the character type data and the knowledge graph type data respectively according to the adjacent relation between the target data to form character feature codes and graph feature codes; and splicing the character feature codes and the graph feature codes to form the target feature codes.

Optionally, after the target data is encoded according to the adjacent relationship between the target data to obtain the target feature code, the method further includes: according to the generation time of the target data, carrying out time sequence grouping on the target feature codes, and training a time evolution model among all groups of target feature codes so as to obtain time sequence feature codes of the target data, wherein the features of the time sequence feature codes are used for inputting a second risk identification neural network model; and the coefficient matrix between the input layer node of the time evolution model and the first layer node of the hidden layer is the time sequence characteristic code.

Optionally, the performing time sequence grouping on the target feature codes according to the generation time of the target data, and training a time evolution model between each group of target feature codes includes: dividing target feature codes corresponding to the target data generated in the same preset time period into a group; and taking N-1 groups of target feature codes corresponding to the previous N-1 preset time periods as input, taking a target feature code corresponding to the Nth preset time period as output, and training the time evolution model, wherein N is an integer greater than 1.
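The grouping and input/output arrangement described above can be sketched as follows. This is a minimal illustration only: the period keys (ISO-style month strings that sort chronologically), the helper names `group_by_period` and `training_pair`, and the sample values are assumptions, not part of the disclosed method, and the time evolution model itself is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Code = List[float]  # one target feature code


def group_by_period(records: List[Tuple[str, Code]]) -> Dict[str, List[Code]]:
    """Put codes whose data was generated in the same period into one group."""
    groups: Dict[str, List[Code]] = defaultdict(list)
    for period, code in records:
        groups[period].append(code)
    return dict(groups)


def training_pair(groups: Dict[str, List[Code]], n: int) -> Tuple[List[List[Code]], List[Code]]:
    """Use the first N-1 groups as input and the N-th group as output."""
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    periods = sorted(groups)  # assumes the keys sort in chronological order
    if len(periods) < n:
        raise ValueError("fewer than N periods available")
    window = periods[:n]
    inputs = [groups[p] for p in window[:-1]]
    output = groups[window[-1]]
    return inputs, output


# Hypothetical records: (period of generation, target feature code).
records = [
    ("2019-01", [0.1, 0.2]),
    ("2019-01", [0.3, 0.1]),
    ("2019-02", [0.2, 0.2]),
    ("2019-03", [0.4, 0.0]),
]
groups = group_by_period(records)
inputs, output = training_pair(groups, n=3)
```

With N = 3, the two earlier groups form the input and the third group forms the output that the time evolution model would be trained to predict.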

In a second aspect, an embodiment of the present invention provides a feature coding apparatus for enterprise operation management risk identification, including: an acquisition unit, configured to acquire target data from an enterprise risk data source, wherein the target data comprises character type data and/or knowledge graph type data; and an encoding unit, configured to encode the target data according to the adjacent relation between the target data to obtain a target feature code, wherein the target feature code is used for inputting a first risk identification neural network model; the target feature code represents each target data by using a multi-dimensional vector, and the dimension number of the multi-dimensional vector is less than a preset threshold value.

Optionally, the acquisition unit includes: a first acquisition module, configured to perform word segmentation and key information extraction on unstructured data in an enterprise risk data source so that each piece of key information forms character type data; and/or a second acquisition module, configured to convert structured data in the enterprise risk data source into knowledge graph type data.

Optionally, the target data includes character-type data; the encoding unit includes: the construction module is used for constructing the adjacent relation among the character type data according to a preset rule; and the vectorization module is used for vectorizing the character type data according to the adjacent relation so as to form the target feature code.

Optionally, the constructing module is specifically configured to construct, according to a preset rule, an adjacent relationship between people of each category of positions in each enterprise; the vectorization module is specifically configured to: vectorizing the identity of the person in each type of position and averaging to obtain the average identity characteristic code of the position; and splicing the average identity characteristic codes of all types of positions to form the target characteristic code.

Optionally, the target data comprises knowledge graph type data; the encoding unit includes: a first coding module, configured to determine the influence characteristic code of each enterprise by using a pagerank algorithm based on the relation between the enterprises in the knowledge graph type data; a second coding module, configured to determine the risk characteristic code of each enterprise by using a label propagation algorithm based on the relation between the enterprises in the knowledge graph type data; a third coding module, configured to determine characteristic codes related to people in each enterprise by using a deepwalk algorithm based on the relation between the enterprises and people in the knowledge graph type data; and a splicing module, configured to splice the influence characteristic code, the risk characteristic code and the person-related characteristic code to form the target feature code.

Optionally, the target data includes character-type data and knowledge graph-type data; the encoding unit is specifically configured to: vectorize the character type data and the knowledge graph type data respectively according to the adjacent relation between the target data to form character feature codes and graph feature codes; and splice the character feature codes and the graph feature codes to form the target feature codes.

Optionally, the apparatus further comprises: a training unit, configured to, after the target data is encoded according to the adjacent relation between the target data to obtain the target feature codes, perform time sequence grouping on the target feature codes according to the generation time of the target data, and train a time evolution model between the groups of target feature codes to obtain the time sequence feature codes of the target data, wherein the time sequence feature codes are used for inputting a second risk identification neural network model; and the coefficient matrix between the input layer nodes of the time evolution model and the first-layer nodes of the hidden layer is the time sequence feature code.

Optionally, the training unit is specifically configured to: dividing target feature codes corresponding to the target data generated in the same preset time period into a group; and taking N-1 groups of target feature codes corresponding to the previous N-1 preset time periods as input, taking a target feature code corresponding to the Nth preset time period as output, and training the time evolution model, wherein N is an integer greater than 1.

In a third aspect, an embodiment of the present invention further provides an electronic device, including: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space enclosed by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to each circuit or device of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, and is used for executing any feature coding method for enterprise operation management risk identification provided by the embodiment of the invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement any of the feature coding methods for enterprise operation management risk identification provided by the embodiments of the present invention.

The feature coding method and device for enterprise operation management risk identification provided by the embodiment of the invention can acquire target data from an enterprise risk data source, and code the target data according to the adjacent relation between the target data to obtain the target feature code. Because the target feature coding can represent each target data by one multi-dimensional vector, and the dimension number of the multi-dimensional vector is smaller than the preset threshold value, the character type data and/or the knowledge graph type data can be uniformly represented by a group of dense multi-dimensional vectors, so that the feature extraction and the model training are facilitated, and the accuracy of model prediction is effectively improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a feature encoding method for enterprise operation management risk identification according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of feature splicing in a feature encoding method for identifying risk of enterprise operation management according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating an encoding method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another feature splicing in the feature coding method for enterprise operation management risk identification according to the embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a modeling apparatus for enterprise operation management risk identification according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, the feature encoding method for enterprise operation management risk identification provided in the embodiment of the present invention may specifically include:

S11, acquiring target data from an enterprise risk data source, wherein the target data comprises character type data and/or knowledge graph type data;

The enterprise risk data source may include any information related to an enterprise, such as basic industrial and commercial registration information of the enterprise, financial information of listed enterprises, information on the operating conditions of small and medium enterprises in the country, court judgment information, tax rating information, court announcement information, court execution information, trademark data, patent information, software/work copyright information, enterprise bidding and tendering information, and media report information related to the enterprise. The carrier of such information may include various databases, bulletin documents, periodicals, etc., and the format of the information may vary greatly. In this step, the required information can be integrated from these enterprise risk data sources to obtain the target data.

Optionally, in an embodiment of the present invention, the data type of the target data may include character type data, or knowledge graph type data, or a combination of both. When the data in the enterprise risk data source is not character type data or knowledge graph type data, the enterprise risk source data can be converted into one of the two types of data and then further processed.

The character type data may include, for example, ID (identification) data. ID data is one kind of categorical data; compared with traditional categorical data, ID features have a huge value space, for example enterprise IDs, enterprise legal-representative IDs, and the IDs of words obtained by word segmentation. Optionally, the value space of the enterprise ID may be equal to the number of enterprises, and the value space of the legal-representative ID may cover all people; after a piece of news is segmented into words, the space formed by the word IDs is larger still. Knowledge graph type data may refer to data that reflects the internal structure of an enterprise or its external relationships through a graph built from the triple structure of a knowledge graph. Since the character type data and the knowledge graph type data are generally large and complex, in order to facilitate feature extraction and model training using the extracted features, the embodiment of the present invention performs the following further processing on the two types of data.

S12, encoding the target data according to the adjacent relation between the target data to obtain a target feature code, wherein the target feature code is used for inputting a first risk identification neural network model; and the target feature code represents each target data by using a multi-dimensional vector, and the dimension number of the multi-dimensional vector is less than a preset threshold value.

In this step, the target data may be encoded according to an adjacent relationship between the target data to obtain a target feature code, and the target feature code may be used to input the first risk identification neural network model. Through the coding, specific features in the target data can be changed into features expressed by vectors, so that model training and prediction are facilitated. In an embodiment of the present invention, the dimension of the vector may be smaller than a preset threshold, for example, smaller than 200 or smaller than 500, so that the features are represented more densely, which effectively improves the efficiency of the model. How each target data is specifically represented by a vector can be learned through training as required.

The feature coding method for enterprise operation management risk identification provided by the embodiment of the invention can acquire target data from an enterprise risk data source, and code the target data according to the adjacent relation between the target data to obtain the target feature code. Because the target feature coding can represent each target data by one multi-dimensional vector, and the dimension number of the multi-dimensional vector is smaller than the preset threshold value, the character type data and/or the knowledge graph type data can be uniformly represented by a group of dense multi-dimensional vectors, so that the feature extraction and the model training are facilitated, and the accuracy of model prediction is effectively improved.

Optionally, in step S11, the manner of obtaining the target data from the enterprise risk data source may differ according to the data type. For example, word segmentation and key information extraction can be performed on unstructured data in the enterprise risk data source, so that each piece of key information forms character type data; structured data in the enterprise risk data source can be converted into knowledge graph type data.

For example, for unstructured data such as a passage from a media report, the passage may be segmented into words and the key information in it extracted, with each piece of key information forming one item of character type data, such as "garden festival" and "opening". As another example, structured data such as the high-school entrance examination results of one class may be converted into knowledge graph type data according to student number, name, subject, score, and the like.
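As a rough illustration of the structured-data case, the sketch below turns tabular records into (head, relation, tail) triples, which is one common way to hold knowledge graph type data. The function name `rows_to_triples`, the field names, and the sample records are hypothetical; the source does not prescribe a particular conversion.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def rows_to_triples(rows: List[Dict[str, str]], key_field: str) -> List[Triple]:
    """Turn each structured record into (key, field name, value) triples."""
    triples: List[Triple] = []
    for row in rows:
        head = row[key_field]
        for field, value in row.items():
            if field != key_field:
                triples.append((head, field, value))
    return triples


# Hypothetical records modelled on the examination-results example in the text.
rows = [
    {"student_id": "S001", "name": "Zhang", "subject": "math", "score": "95"},
    {"student_id": "S002", "name": "Li", "subject": "math", "score": "88"},
]
triples = rows_to_triples(rows, key_field="student_id")
```

Each record contributes one triple per non-key field, so the key (here the student number) becomes a node and every other field becomes a labelled edge.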

After the target data of the character type or the knowledge graph type is obtained, the target data may be encoded according to the adjacent relationship between the target data in step S12. The following describes encoding of character type data and encoding of knowledge graph type data.

Optionally, in one embodiment of the invention, the target data obtained from the enterprise risk data source includes character-type data; encoding the target data according to the neighboring relationship between the target data in step S12 to obtain the target feature code may include:

constructing the adjacent relation among the character type data according to a preset rule;

and vectorizing the character type data according to the adjacent relation to form the target feature code.

For example, in one embodiment of the invention, there are two IDs, A and B; if A and B have a certain relationship, A and B may be connected by a directed edge, so that A and B are adjacent. For example, assuming that A and B have both been the legal representative of an enterprise X, and A held the post before B, the relationship between A and B is "A and B have both been the legal representative of enterprise X, and A preceded B". This relationship can then be used to decide whether an edge exists between other IDs, and if there is an edge between two IDs, the two IDs are adjacent. Of course, in other embodiments of the present invention, the adjacent relationship between the character-type data may be defined by other relations, which is not limited by the embodiments of the present invention.
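The example rule above can be sketched as a small adjacency builder. The input layout (each enterprise mapped to its legal representatives in chronological order), the function name `build_adjacency`, and the sample data are assumptions made for illustration; other preset rules would produce other edges.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_adjacency(succession: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Add a directed edge A -> B when A and B served the same enterprise
    as legal representative and A preceded B."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for people in succession.values():
        for i, earlier in enumerate(people):
            for later in people[i + 1:]:
                if later != earlier:
                    edges[earlier].add(later)
    return dict(edges)


# Hypothetical data: enterprise -> legal representatives, earliest first.
succession = {"X": ["A", "B", "C"], "Y": ["B", "D"]}
adjacency = build_adjacency(succession)
```

Two IDs are adjacent exactly when one appears in the other's edge set; IDs that never precede anyone (such as C and D here) have no outgoing edges.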

After the adjacent relationship between the character data is constructed, the character data can be represented by a group of vectors according to the adjacent relationship. By representing the character type data by using the vector, the data volume of the character type data can be greatly reduced, and the model efficiency is effectively improved.

For example, in one embodiment of the present invention, taking each enterprise as a unit, one "paragraph" $P_i = \{C_{i1}, C_{i2}, C_{i3}, \ldots, C_{in_i}\}$ may be constructed according to the senior-executive relationships.

Here $P_i$ denotes the "paragraph" constructed for the i-th enterprise, and $C_{ij}$ denotes the 1-hot code of the j-th senior executive of the i-th enterprise. Each senior executive is regarded as a word: the 1-hot codes of the preceding n senior executives serve as n input vectors, and the 1-hot code of the next senior executive serves as the output vector. A deep learning model is constructed on this basis and trained with all the "paragraph" data formed by the $P_i$. The coefficient matrix between the input layer of the trained model and the first-layer nodes of the hidden layer is taken as the embedding parameter, by which the 1-hot code of each $C_{ij}$ is embedded into a dense vector space.
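A minimal sketch of this training scheme is given below, using only the standard library. It is an illustration under stated assumptions rather than the model of the embodiment: the single hidden layer, the averaging of the n input rows (a CBOW-style choice), the softmax output, plain SGD, and all names and hyperparameters (`train_embeddings`, `dim`, `window`, `epochs`, `lr`) are assumptions.

```python
import math
import random
from typing import Dict, List


def train_embeddings(paragraphs: List[List[str]], dim: int = 4, window: int = 2,
                     epochs: int = 200, lr: float = 0.1, seed: int = 0) -> Dict[str, List[float]]:
    rng = random.Random(seed)
    vocab = sorted({c for p in paragraphs for c in p})
    index = {c: i for i, c in enumerate(vocab)}
    size = len(vocab)
    # W: input-to-hidden coefficients (size x dim); row i is the embedding of ID i.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    # U: hidden-to-output coefficients (dim x size).
    U = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(dim)]
    # Each sample: the `window` preceding executives -> the next executive.
    samples = [([index[c] for c in p[t - window:t]], index[p[t]])
               for p in paragraphs for t in range(window, len(p))]
    for _ in range(epochs):
        for context, target in samples:
            # A 1-hot input times W selects one row, so the hidden layer
            # is the mean of the rows selected by the context IDs.
            h = [sum(W[i][k] for i in context) / len(context) for k in range(dim)]
            scores = [sum(h[k] * U[k][o] for k in range(dim)) for o in range(size)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            err = [e / total for e in exps]
            err[target] -= 1.0  # gradient of cross-entropy w.r.t. the scores
            grad_h = [sum(U[k][o] * err[o] for o in range(size)) for k in range(dim)]
            for k in range(dim):
                for o in range(size):
                    U[k][o] -= lr * h[k] * err[o]
            for i in context:
                for k in range(dim):
                    W[i][k] -= lr * grad_h[k] / len(context)
    return {c: W[index[c]] for c in vocab}


# Hypothetical "paragraphs": one ordered list of executive IDs per enterprise.
paragraphs = [["e1", "e2", "e3", "e4"], ["e2", "e3", "e5"]]
embeddings = train_embeddings(paragraphs, dim=3)
```

After training, the rows of `W` play the role of the coefficient matrix between the input layer and the first hidden layer: each executive ID is mapped to a dense vector of length `dim` instead of a 1-hot vector as long as the vocabulary.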

Further, in the above embodiment, the adjacent relation constructed based on $P_i$ only represents the person-related ID features within an enterprise. Since enterprise risk is assessed in units of enterprises, in one embodiment of the invention, the person-related ID-class features can also be spliced into enterprise-dimension features.

Specifically, in one embodiment of the present invention, constructing the adjacent relationship between the character-type data according to the preset rule may include:

constructing the adjacent relation among people of each type of position in each enterprise according to a preset rule;

the vectorizing the character-type data according to the adjacent relationship to form the target feature code includes: vectorizing the identity of the person in each type of position and averaging to obtain the average identity characteristic code of the position;

and splicing the average identity characteristic codes of all types of positions to form the target characteristic code.

For example, for any class of positions, i.e., dimension k (e.g., senior executive, middle management, or general employee), the J persons who fall into that dimension are found, and the feature code of that class of positions is recorded as the average of the vectorized identities of those J persons.

For each dimension k, a feature vector of fixed length is constructed; the lengths of the feature vectors of different dimensions may be the same or different. As shown in FIG. 2, by splicing these feature vectors, the ID-class feature of the enterprise can be obtained.
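The averaging and splicing steps can be sketched as follows. The position names, the function names `average` and `enterprise_feature`, the sample vectors, and the zero-filling of a position with no staff are assumptions for illustration; the text does not specify how an empty position is handled.

```python
from typing import Dict, List


def average(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def enterprise_feature(people_by_position: Dict[str, List[str]],
                       embedding: Dict[str, List[float]],
                       positions: List[str], dim: int) -> List[float]:
    """Average the identity vectors per position, then splice the averages."""
    feature: List[float] = []
    for position in positions:  # a fixed order keeps the layout stable
        vectors = [embedding[p] for p in people_by_position.get(position, [])]
        # Assumption: a position with no staff contributes a zero vector.
        feature.extend(average(vectors) if vectors else [0.0] * dim)
    return feature


# Hypothetical identity vectors and staff lists.
embedding = {"p1": [1.0, 0.0], "p2": [3.0, 2.0], "p3": [0.5, 0.5]}
staff = {"executive": ["p1", "p2"], "middle": ["p3"]}
code = enterprise_feature(staff, embedding, ["executive", "middle", "general"], dim=2)
# code == [2.0, 1.0, 0.5, 0.5, 0.0, 0.0]
```

Because every position contributes a vector of fixed length, enterprises with different head counts still receive feature codes of the same length.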

For knowledge graph type data, in an embodiment of the present invention, the target data obtained from the enterprise risk data source includes knowledge graph type data, and encoding the target data according to the neighboring relationship between the target data to obtain the target feature code may specifically include:

determining the influence characteristic code of each enterprise by using a pagerank algorithm based on the relation between the enterprises in the knowledge graph type data;

determining risk characteristic codes of each enterprise by using a label propagation algorithm based on the relation between the enterprises in the knowledge graph type data;

determining characteristic codes related to people in each enterprise by using a deepwalk algorithm based on the relation between the enterprises and people in the knowledge graph type data;

splicing the influence characteristic code, the risk characteristic code, and the person-related characteristic code to form the target feature code.

In particular, in knowledge graph type data, relationships between enterprises can be modeled as relationships between nodes in a knowledge graph. The influence of an enterprise may be determined by the number and/or quality of the nodes connected to the node representing the enterprise.

In the embodiment of the invention, the PageRank algorithm models each enterprise, for example enterprise A, as a node in the knowledge graph type data; an "out-link" of the node is regarded as a directed edge pointing from the node to another node, and an "in-link" is a directed edge pointing from another node to the node. The entire knowledge graph thus forms a directed graph.

Alternatively, the evaluation of enterprise impact may follow the following two principles:

The quantity assumption: the greater the in-degree (i.e., the number of in-links) of a node, the higher the influence of the enterprise represented by the node. For example, if the in-degree of node A1 is 3 and the in-degree of node A2 is 6, then the influence of the enterprise represented by node A2 is greater than the influence of the enterprise represented by node A1.

The quality assumption: the greater the influence of the source nodes of a node's in-links, the greater the influence of the enterprise represented by the node. For example, if the source node B1 of an in-link of node A3 has greater influence than the source node B2 of an in-link of node A4, then the influence of the enterprise represented by node A3 is greater than that of the enterprise represented by node A4. Optionally, in an embodiment of the present invention, the influence of the source node of an in-link may also be represented by the in-degree of that source node; on this basis, the influence of node B1 may be represented by the in-degree of node B1.

The influence of each enterprise can be obtained by walking in the directed graph using the random walk model from the theory of stochastic processes. In the random walk model, the current state of an object depends only on its immediately preceding state and not on any earlier state. In the initial state, the walker (the "user" in the original web-page setting of PageRank) is assumed to be at each node (page) with equal probability; at each step, the walker follows one of the out-links of the current node, chosen with equal probability, to the node it points to.
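The random walk described above can be approximated by power iteration, as in the sketch below. Two ingredients are assumptions added for numerical robustness and are not stated in the text: the damping factor of 0.85 (standard in PageRank) and the even redistribution of probability held by nodes without out-links. The function name `pagerank` and the sample graph are likewise hypothetical.

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power iteration over the random walk on a directed graph."""
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # equal probability in the initial state
    for _ in range(iterations):
        # Assumption: probability held by nodes without out-links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            for t in targets:
                # Each out-link is followed with equal probability.
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical enterprise graph: B1, B2 and B3 point to A; B3 also points to C.
graph = {"B1": ["A"], "B2": ["A"], "B3": ["A", "C"], "A": [], "C": []}
influence = pagerank(graph)
```

In this small graph, A has three in-links and C has one, so A ends up with the highest score, which matches the quantity assumption; the scores sum to 1 and can be used directly as a one-dimensional influence code.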

After determining the influence of each enterprise, in one embodiment of the present invention, the risk characteristic code of each enterprise may be further determined using the label propagation algorithm. In this way, the greater the influence of an enterprise, the greater the probability that the enterprise is at risk when a risk feature appears, and the smaller the influence of the enterprise, the smaller that probability.

Specifically, Label propagation is a semi-supervised learning algorithm, and its core idea is that similar data should have the same Label. It can be considered that label propagation tends to take the label with the largest number in labels of neighbor nodes of a node as the label of the node itself. Alternatively, the labels may be labels that characterize different types of risks, or labels that characterize different degrees of risk. Based on this, if most of all neighboring nodes of a business are labeled with risk label R1, the business will also be labeled with risk label R1.
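The majority-vote idea can be sketched as below. This is a simplified illustration, not the algorithm of the embodiment: keeping the known (seed) labels fixed, updating nodes in sorted order, breaking ties by the smallest label, and the names `propagate_labels`, `neighbors` and `seeds` are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional


def propagate_labels(neighbors: Dict[str, List[str]], seeds: Dict[str, str],
                     max_rounds: int = 20) -> Dict[str, Optional[str]]:
    """Give each unlabeled node the most frequent label among its neighbors."""
    labels: Dict[str, Optional[str]] = {v: seeds.get(v) for v in neighbors}
    for _ in range(max_rounds):
        changed = False
        for v in sorted(neighbors):
            if v in seeds:  # known labels stay fixed
                continue
            votes = Counter(labels[u] for u in neighbors[v]
                            if labels.get(u) is not None)
            if not votes:
                continue
            top = max(votes.values())
            best = min(l for l, c in votes.items() if c == top)  # deterministic tie-break
            if labels[v] != best:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical enterprise graph with three enterprises whose risk labels are known.
neighbors = {"E1": ["E2", "E3", "E4"], "E2": ["E1"], "E3": ["E1"],
             "E4": ["E1"], "E5": ["E4"]}
seeds = {"E2": "R1", "E3": "R1", "E4": "R2"}
risk = propagate_labels(neighbors, seeds)
```

Here two of the three neighbors of E1 carry R1, so E1 is labeled R1, while E5 inherits R2 from its only neighbor E4. The resulting label can then be encoded numerically as the risk characteristic code.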

In the knowledge graph of enterprise relations, the equity investment relation and the guarantee relation among enterprises are especially likely to give rise to enterprise risks. In one embodiment of the invention, risk characteristic coding may be performed for enterprises having these two types of relationships. For example, if enterprise A provides a guarantee for enterprise B, a directed edge is connected between enterprise A and enterprise B. After the relations are determined, the risk coding value of each enterprise in the knowledge graph can be obtained by adopting pagerank and label propagation.

After the influence of the relation between the enterprises on the enterprise risk is determined through the pagerank and label propagation algorithms, in the embodiment of the invention, the characteristic codes related to people in the enterprises can be determined according to the relation between the enterprises and people in the knowledge graph.

Determining characteristic codes related to people in each enterprise by using a deepwalk algorithm based on the relation between enterprises and people in knowledge graph type data

In the relation between enterprises and people, the enterprise risk can be described through the relations between people, and feature coding is carried out following the idea of DeepWalk. The DeepWalk algorithm is similar to the word2vec method: it samples network nodes uniformly on the graph and generates random walk sequences of fixed length, which are treated like sequences in a natural language, and a feature coding is then obtained by using a DNN model.

As shown in FIG. 3, in one embodiment of the invention, corporate and high-management relationships are employed to build corporate and human network relationships. Six nodes 1-6 in the left graph of fig. 3 may represent businesses or high-level governments, with the connecting lines between the nodes representing the connections between the nodes. Several sequences can be formed according to the concept of Deepwalk, randomly walking between nodes along these links (see the middle graph of fig. 3). The sequence relation is analogized to the sequence in the natural language, and a model is obtained by giving a node and predicting the nearby node (see the right graph of fig. 3). And the coefficient matrix between the first layer nodes of the input layer and the hidden layer of the model is the characteristic code of the enterprise in the relationship between the enterprise and the person.
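The random-walk and node-prediction steps can be sketched as follows. The six-node adjacency list is a hypothetical graph and does not reproduce FIG. 3; the walk length, the number of walks per node, and the context window are likewise assumed values. Training the prediction model itself is omitted; the sketch only produces the (node, nearby node) pairs on which such a model would be trained.

```python
import random

def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Generate fixed-length random walks, starting from every node in turn."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        starts = list(adjacency)
        rng.shuffle(starts)
        for start in starts:
            walk = [start]
            while len(walk) < walk_length and adjacency[walk[-1]]:
                walk.append(rng.choice(adjacency[walk[-1]]))
            walks.append(walk)
    return walks

def skipgram_pairs(walks, window=2):
    """Turn walks into (given node, nearby node) training pairs."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Hypothetical graph of six nodes (enterprises or senior executives).
adjacency = {
    1: [2, 3],
    2: [1, 3, 4],
    3: [1, 2, 5],
    4: [2, 6],
    5: [3, 6],
    6: [4, 5],
}
walks = random_walks(adjacency)
pairs = skipgram_pairs(walks)
```

A model trained on `pairs` to predict the nearby node from the given node would hold one row per node in its input-to-hidden weight matrix, and that row would play the role of the node's feature code.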

After the feature codes are obtained, as shown in fig. 4, different feature codes obtained based on pagerank, label propagation and Deepwalk can be "spliced" together to jointly form a feature code representing an enterprise knowledge graph.

The above embodiments have described in detail the feature encoding of the character type data and the knowledge graph type data, respectively, but the present invention is not limited thereto. In an embodiment of the present invention, the target data may include character-type data and knowledge graph-type data, and the encoding the target data according to the adjacent relationship between the target data in step S12 to obtain the target feature code may specifically include:

vectorizing the character type data and the knowledge graph type data respectively according to the adjacent relation between the target data to form character feature codes and graph feature codes;

and splicing the character feature codes and the map feature codes to form the target feature codes.

That is, after the feature codes of the character type data and the feature codes of the knowledge graph type data are obtained, the two feature codes can be spliced together to form the target feature code.
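As a minimal sketch of the splicing step, assuming each feature code is a plain list of numbers, splicing amounts to concatenation. All values below, the dimensions of the individual codes, and the order in which they are joined are hypothetical.

```python
def splice(*codes):
    """Concatenate several feature codes (lists of numbers) into one vector."""
    target = []
    for code in codes:
        target.extend(code)
    return target

# Hypothetical codes for a single enterprise.
pagerank_code = [0.31]               # influence code
label_code = [1.0, 0.0]              # e.g. one-hot over two risk labels
deepwalk_code = [0.12, -0.40, 0.88]  # person-related code
graph_code = splice(pagerank_code, label_code, deepwalk_code)

character_code = [0.05, 0.67]
target_code = splice(character_code, graph_code)
```

The dimension of the resulting target feature code is the sum of the dimensions of its parts, which keeps it easy to check against the preset dimension threshold.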

Further, the target feature code is an enterprise risk feature code for the same time node, and when the influence of time factors on risk features is considered, the target feature code can be further processed. Specifically, the target feature codes may be subjected to time sequence grouping according to the generation time of the target data, and a time evolution model between each group of target feature codes is trained, so as to obtain the time sequence feature codes of the target data, where the features of the time sequence feature codes are used for inputting a second risk identification neural network model, and a coefficient matrix between an input layer node of the time evolution model and a first layer node of a hidden layer is the time sequence feature codes.

Optionally, in an embodiment of the present invention, target feature codes corresponding to the target data generated within the same preset time period may be grouped into one group; and taking N-1 groups of target feature codes corresponding to the previous N-1 preset time periods as input, taking a target feature code corresponding to the Nth preset time period as output, and training the time evolution model, wherein N is an integer greater than 1.

For example, in one embodiment of the present invention, the feature code of enterprise i at time j is $x_{i,j}$. A time interval n is taken, where n is a hyper-parameter; in view of how enterprise risk data change, a period of 1 week is used, which ensures that the enterprise data change to some degree between periods while allowing more training samples to be created than a period of one month would.

The time series of enterprise i, of length m, is cut by n into m-n+1 samples:

$(x_{i,1}, x_{i,2}, \ldots, x_{i,n-1}) \rightarrow x_{i,n}$

$(x_{i,2}, x_{i,3}, \ldots, x_{i,n}) \rightarrow x_{i,n+1}$

…

$(x_{i,m-n+1}, x_{i,m-n+2}, \ldots, x_{i,m-1}) \rightarrow x_{i,m}$

The samples of all enterprises are then combined to form the training set, and a DNN model is obtained by training on it.
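The slicing of one enterprise's time series into training samples can be sketched as follows. The function name and the use of strings in place of real feature vectors are assumptions for illustration.

```python
def make_samples(series, n):
    """Cut a series x_1..x_m into m - n + 1 samples.

    Each sample pairs n - 1 consecutive feature codes (input)
    with the feature code that follows them (output).
    """
    m = len(series)
    if n < 2 or n > m:
        raise ValueError("n must satisfy 2 <= n <= len(series)")
    return [(series[k:k + n - 1], series[k + n - 1]) for k in range(m - n + 1)]

# Hypothetical weekly feature codes of one enterprise (m = 5), cut with n = 3.
weekly_codes = ["x1", "x2", "x3", "x4", "x5"]
samples = make_samples(weekly_codes, n=3)
```

With m = 5 and n = 3 this yields 5 - 3 + 1 = 3 samples: (x1, x2) to x3, (x2, x3) to x4, and (x3, x4) to x5. Repeating the call for every enterprise and joining the lists gives the combined training set.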

In a second aspect, an embodiment of the present invention further provides a feature processing device for enterprise operation management risk identification, which can perform feature coding on features related to enterprise risks and perform model training by using the coded features, so that workload and blindness of feature extraction are greatly reduced, and accuracy of model prediction is effectively improved.

As shown in fig. 5, the feature encoding apparatus for enterprise operation management risk identification provided in the embodiment of the present invention may specifically include:

an obtaining unit 31, configured to obtain target data from an enterprise risk data source, where the target data includes character-type data and/or knowledge graph-type data;

the encoding unit 32 is configured to encode the target data according to an adjacent relationship between the target data to obtain a target feature code, where the target feature code is used to input a first risk identification neural network model; and the target feature code represents each target data by using a multi-dimensional vector, and the dimension number of the multi-dimensional vector is less than a preset threshold value.

The feature coding device for enterprise operation management risk identification provided by the embodiment of the invention can acquire target data from an enterprise risk data source, and code the target data according to the adjacent relation between the target data to obtain the target feature code. Because the target feature coding can represent each target data by one multi-dimensional vector, and the dimension number of the multi-dimensional vector is smaller than the preset threshold value, the character type data and/or the knowledge graph type data can be uniformly represented by a group of dense multi-dimensional vectors, so that the feature extraction and the model training are facilitated, and the accuracy of model prediction is effectively improved.

Optionally, the obtaining unit 31 may include: a first acquisition module, configured to perform word segmentation and key information extraction on unstructured data in the enterprise risk data source, so that each piece of key information forms character type data; and/or a second acquisition module, configured to convert structured data in the enterprise risk data source into knowledge graph type data.

Optionally, the target data may include character-type data; the encoding unit 32 may include: the construction module is used for constructing the adjacent relation among the character type data according to a preset rule; and the vectorization module is used for vectorizing the character type data according to the adjacent relation so as to form the target feature code.

Optionally, the constructing module is specifically configured to construct, according to a preset rule, an adjacent relationship between people of each category of positions in each enterprise; the vectorization module may be specifically configured to: vectorizing the identity of the person in each type of position and averaging to obtain the average identity characteristic code of the position; and splicing the average identity characteristic codes of all types of positions to form the target characteristic code.
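The averaging and splicing performed by the vectorization module can be sketched as follows. The position names, the two-dimensional identity vectors, and the zero-filling of vacant positions are assumptions for illustration and are not specified in this embodiment.

```python
def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def encode_positions(people_by_position, identity_vectors, position_order, dim):
    """Average the identity vectors per position, then splice in a fixed order."""
    code = []
    for position in position_order:
        people = people_by_position.get(position, [])
        if people:
            code.extend(mean_vector([identity_vectors[p] for p in people]))
        else:
            code.extend([0.0] * dim)  # assumption: vacant positions are zero-filled
    return code

# Hypothetical identity vectors and position assignments for one enterprise.
identity_vectors = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.5, 0.5]}
people_by_position = {"director": ["p1", "p2"], "supervisor": ["p3"]}
code = encode_positions(
    people_by_position,
    identity_vectors,
    ["director", "supervisor", "manager"],
    dim=2,
)
```

Using a fixed position order keeps the resulting code the same length for every enterprise, regardless of how many people hold each position.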

Optionally, the target data comprises knowledge graph type data; the encoding unit 32 may include: the first coding module is used for determining the influence characteristic code of each enterprise by using a pagerank algorithm based on the relation between the enterprises in the knowledge graph type data; the second coding module is used for determining the risk characteristic code of each enterprise by using a label propagation algorithm based on the relation between the enterprises in the knowledge graph type data; the third coding module is used for determining characteristic codes related to people in each enterprise by using a deepwalk algorithm based on the relation between the enterprises and people in the knowledge graph type data; a splicing module for splicing the influence signature code, the risk signature code and the person-related signature code to form the target signature code.

Optionally, the target data includes character-type data and knowledge graph-type data; the encoding unit 32 may specifically be configured to: vectorizing the character type data and the knowledge graph type data respectively according to the adjacent relation between the target data to form character feature codes and graph feature codes; and splicing the character feature codes and the map feature codes to form the target feature codes.

Optionally, the feature encoding apparatus for enterprise operation management risk identification provided in the embodiment of the present invention may further include: the training unit is used for coding the target data according to the adjacent relation between the target data to obtain target feature codes, then carrying out time sequence grouping on the target feature codes according to the generation time of the target data, and training a time evolution model between each group of target feature codes to obtain the time sequence feature codes of the target data, wherein the characteristics of the time sequence feature codes are used for inputting a second risk recognition neural network model; and the coefficient matrix between the input layer node of the time evolution model and the first layer node of the hidden layer is the time sequence characteristic code.

Optionally, the training unit may be specifically configured to: dividing target feature codes corresponding to the target data generated in the same preset time period into a group; and taking N-1 groups of target feature codes corresponding to the previous N-1 preset time periods as input, taking a target feature code corresponding to the Nth preset time period as output, and training the time evolution model, wherein N is an integer greater than 1.

As shown in FIG. 6, an electronic device provided in an embodiment of the present invention may include: a housing 51, a processor 52, a memory 53, a circuit board 54 and a power supply circuit 55, wherein the circuit board 54 is arranged inside a space enclosed by the housing 51, and the processor 52 and the memory 53 are arranged on the circuit board 54; the power supply circuit 55 is configured to supply power to each circuit or device of the electronic device; the memory 53 is configured to store executable program code; and the processor 52 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 53, so as to execute the feature coding method for enterprise operation management risk identification provided in any of the foregoing embodiments.

For specific execution processes of the above steps by the processor 52 and further steps executed by the processor 52 by running the executable program code, reference may be made to the description of the foregoing embodiments, and details are not described herein again.

The above electronic devices exist in a variety of forms, including but not limited to:

(1) A mobile communication device: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones, among others.

(2) An ultra-mobile personal computer device: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) A portable entertainment device: such devices can display and play multimedia content. This type of device includes audio and video players (e.g., the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.

(4) A server: a device that provides computing services and comprises a processor, a hard disk, a memory, a system bus and the like. A server is similar in architecture to a general-purpose computer, but has higher requirements on processing capacity, stability, reliability, security, scalability, manageability and the like, because it needs to provide highly reliable services.

(5) Other electronic devices with data interaction functions.

Accordingly, an embodiment of the present invention further provides a computer-readable storage medium, where one or more programs are stored in the computer-readable storage medium, and the one or more programs can be executed by one or more processors to implement any one of the feature coding methods for enterprise operation management risk identification provided in the foregoing embodiments, so that the corresponding technical effects can also be achieved. These have been described in detail above and are not described herein again.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.

In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

For convenience of description, the above devices are described separately in terms of functional division into various units/modules. Of course, when the invention is implemented, the functions of the units/modules may be implemented in one or more pieces of software and/or hardware.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A feature coding method for enterprise operation management risk identification is characterized by comprising the following steps:

acquiring target data from an enterprise risk data source, wherein the target data comprises character type data and/or knowledge graph type data;

coding the target data according to the adjacent relation between the target data to obtain a target feature code, wherein the target feature code is used for inputting a first risk identification neural network model; and the target feature code represents each target data by using a multi-dimensional vector, and the dimension number of the multi-dimensional vector is less than a preset threshold value.

2. The method of claim 1, wherein obtaining target data from an enterprise risk data source comprises:

performing word segmentation and key information extraction on unstructured data in an enterprise risk data source to enable each key information to form character type data;

and/or

converting the structured data in the enterprise risk data source into knowledge graph type data.

3. The method of claim 1, wherein the target data comprises character-type data;

the encoding the target data according to the adjacent relation between the target data to obtain the target feature code comprises:

4. The method according to claim 3, wherein the constructing the adjacent relationship between the character-type data according to the preset rule comprises:

the vectorizing the character-type data according to the adjacent relationship to form the target feature code includes:

vectorizing the identity of the person in each type of position and averaging to obtain the average identity characteristic code of the position;

5. The method of claim 1, wherein the target data comprises knowledge graph type data;

6. The method of claim 1, wherein the target data comprises character-type data and knowledge graph-type data;

7. The method according to any one of claims 1 to 6, wherein after the target data are encoded according to the adjacent relation between the target data to obtain the target feature code, the method further comprises:

according to the generation time of the target data, carrying out time sequence grouping on the target feature codes, and training a time evolution model among all groups of target feature codes so as to obtain time sequence feature codes of the target data, wherein the features of the time sequence feature codes are used for inputting a second risk identification neural network model; and the coefficient matrix between the input layer node of the time evolution model and the first layer node of the hidden layer is the time sequence characteristic code.

8. A feature coding device for enterprise operation management risk identification, comprising:

an acquisition unit, configured to acquire target data from an enterprise risk data source, wherein the target data comprises character type data and/or knowledge graph type data;

the encoding unit is used for encoding the target data according to the adjacent relation between the target data to obtain a target feature code, and the target feature code is used for inputting a first risk identification neural network model; and the target feature code represents each target data by using a multi-dimensional vector, and the dimension number of the multi-dimensional vector is less than a preset threshold value.

9. An electronic device, characterized in that the electronic device comprises: a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged in a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to execute the feature coding method for enterprise operation management risk identification as claimed in any one of the preceding claims 1-7.

10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which are executable by one or more processors to implement the feature coding method for enterprise operation management risk identification of any one of the preceding claims 1 to 7.