CN112418652B - Risk identification method and related device

Info

Publication number: CN112418652B
Application number: CN202011302019.3A
Authority: CN
Inventors: 涂昶; 张镇潮; 施建生; 钱力扬; 陈鹏飞; 祁海洋
Original assignee: Servyou Software Group Co ltd
Current assignee: Servyou Software Group Co ltd
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2024-01-30
Anticipated expiration: 2040-11-19
Also published as: CN112418652A

Abstract

The application discloses a risk identification method, comprising: performing statistical processing on acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations; clustering the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result; and determining, from the purchase and sale mode clustering result, the organizations deviating from a preset proportion as risk organizations. Clustering the compiled purchase and sale commodity matrixes determines the corresponding purchase and sale modes, and the risk organizations that deviate from the normal purchase and sale modes are then identified on that basis, which improves the accuracy of risk identification. The application also discloses a risk identification device, a server and a computer readable storage medium, which have the same beneficial effects.

Description

Risk identification method and related device

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a risk identification method, a risk identification device, a server, and a computer readable storage medium.

Background

With the continuous development of data processing technology, more and more operations are performed on data. Data is a representation of facts, concepts, or instructions that can be processed by manual or automated means. Once data is interpreted and given meaning, it becomes information. Data processing covers the collection, storage, retrieval, processing, transformation, and transmission of data. Its basic purpose is to extract and derive, from large volumes of possibly disorganized and hard-to-understand data, data that is valuable and meaningful to particular users. Data processing is a fundamental part of systems engineering and automatic control. It runs through every field of social production and social life and has greatly improved many aspects of production. For example, enterprise operation data is typically analyzed with data processing techniques to determine the risks present in different enterprises.

In the related art, existing schemes focus mainly on risk diagnosis for taxpayers that are commercial enterprises. A commercial enterprise has a relatively simple purchase and sale mode: the main types and amounts of the commodities it purchases and sells differ only by the gross profit margin, and the enterprise can be identified as a risk enterprise if that difference is too large. However, for entities with other operation modes, such as industrial enterprises and other non-commercial enterprises, which involve complex input-output relationships, the industry has no effective risk identification algorithm based on the commodities that taxpayers purchase and sell. That is, effective risk identification cannot be performed on the data of different types of enterprises, which reduces the accuracy and precision of risk identification.

Therefore, how to improve the accuracy of identifying risks is a major concern for those skilled in the art.

Disclosure of Invention

The purpose of the present application is to provide a risk identification method, a risk identification device, a server and a computer readable storage medium, which are used for determining a corresponding purchase and sale mode by clustering a counted purchase and sale commodity matrix, and further determining a risk organization deviating from the purchase and sale mode on the basis of a normal purchase and sale mode.

In order to solve the above technical problems, the present application provides a risk identification method, including:

carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations;

clustering the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result;

and determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result.

Optionally, performing data statistics processing on the obtained commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations, including:

classifying the acquired commodity data of the organizations according to the commodity types to obtain commodity classification data corresponding to each organization;

carrying out industry deviation rectifying treatment on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization;

and carrying out data statistics processing on the industry deviation rectifying commodity data corresponding to each organization to obtain the purchase and sale commodity matrix of all organizations.

Optionally, classifying the obtained commodity data of the plurality of organizations according to the commodity type to obtain commodity classification data corresponding to each organization, including:

and classifying the acquired commodity data of the organizations by adopting a natural language processing model to obtain commodity classification data corresponding to each organization.

Optionally, performing industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization, including:

and carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the XGBoost algorithm to obtain the industry deviation rectifying commodity data corresponding to each organization.

Optionally, clustering the purchase and sale commodity matrix of all organizations to obtain a purchase and sale mode clustering result, including:

and clustering the purchase and sale commodity matrixes of all organizations by adopting mean shift clustering to obtain the purchase and sale mode clustering result.

Optionally, determining the organization deviating from the preset proportion as the risk organization from the purchase and sale mode clustering result includes:

determining a normal purchase and sale mode organization from the purchase and sale mode clustering result according to the preset proportion;

and taking the organization outside the normal purchase and sale mode organization in the plurality of organizations as the risk organization.

The application also provides a risk identification device, comprising:

the commodity data statistics module is used for carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all the organizations;

the commodity matrix clustering module is used for clustering the purchase and sale commodity matrixes of all organizations to obtain purchase and sale mode clustering results;

and the risk organization determining module is used for determining the organization deviating from the preset proportion as the risk organization from the purchase and sale mode clustering result.

Optionally, the commodity data statistics module includes:

the commodity classification unit is used for classifying the acquired commodity data of the plurality of organizations according to the commodity types to obtain commodity classification data corresponding to each organization;

the industry deviation rectifying unit is used for carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization;

and the data statistics unit is used for carrying out data statistics processing on the industry deviation rectifying commodity data corresponding to each organization to obtain the purchase and sale commodity matrix of all organizations.

The application also provides a server comprising:

a memory for storing a computer program;

a processor for implementing the steps of the risk identification method as described above when executing the computer program.

The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the risk identification method as described above.

The risk identification method provided by the application comprises the following steps: carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations; clustering the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result; and determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result.

The acquired commodity data of a plurality of organizations is statistically processed to determine the purchase and sale commodity matrixes of all organizations; the purchase and sale commodity matrixes are then clustered to determine a purchase and sale mode clustering result; and finally the organizations deviating from the normal purchase and sale mode are determined from the clustering result as risk organizations. Risks are thereby identified from the complex purchase and sale behaviors of organizations, rather than only from simple purchase and sale of similar commodities, which improves the accuracy of risk identification.

The application further provides a risk identification device, a server and a computer readable storage medium, which have the above beneficial effects; these are not repeated here.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.

Fig. 1 is a flowchart of a risk identification method according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a risk identification device according to an embodiment of the present application.

Detailed Description

The core of the application is to provide a risk identification method, a risk identification device, a server and a computer readable storage medium, corresponding purchase and sale modes are determined by clustering the counted purchase and sale commodity matrixes, and further risk organization deviating from the purchase and sale modes is determined on the basis of the normal purchase and sale modes, so that the accuracy of risk identification is improved.

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

Accordingly, the present application provides a risk identification method in which the acquired commodity data of a plurality of organizations is statistically processed to determine the purchase and sale commodity matrixes of all organizations, the purchase and sale commodity matrixes are clustered to determine a purchase and sale mode clustering result, and the organizations deviating from the normal purchase and sale mode are determined from the clustering result as risk organizations. Risks are thereby identified from the complex purchase and sale behaviors of organizations, rather than only from simple purchase and sale of similar commodities, which improves the accuracy of risk identification.

A risk identification method provided in the present application is described below by way of an embodiment.

Referring to fig. 1, fig. 1 is a flowchart of a risk identification method according to an embodiment of the present application.

In this embodiment, the method may include:

S101, carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations;

This step aims to perform statistical processing on the acquired commodity data of the different organizations to obtain the purchase and sale commodity matrix corresponding to each organization, that is, the purchase and sale commodity matrixes of all organizations. The commodity data includes purchased commodity data and sold commodity data.

The commodity data may be obtained from combined operation data, from invoice data generated by the organizations, or from the organizations' purchase and sale data. The manner of acquiring the commodity data is not specifically limited in this step. Whatever its source, however, the commodity data may have been entered manually and may therefore contain input errors, such as non-standard commodity names or inaccurate commodity classifications. Further, because the categories of commodities purchased and sold differ greatly between industries, organizations need to be grouped by industry, so that the conventional mode within an industry can be analyzed from the commodity data of that industry and the organizations deviating from the conventional mode can be determined.

Therefore, to improve the accuracy of the statistics on the commodity data, this step may perform deviation rectifying processing on the commodity names in the commodity data and on the industry data of the organizations, so as to improve the accuracy and precision of the acquired commodity data.

Finally, for ease of presentation and subsequent processing, the commodity data is represented in matrix form in this step. A purchase and sale commodity matrix generally refers to the purchase and sale commodity matrix of all organizations in the same industry.

Further, in order to improve accuracy of data statistics, the step may include:

Step 1, classifying the acquired commodity data of a plurality of organizations according to commodity types to obtain commodity classification data corresponding to each organization;

Step 2, carrying out industry deviation rectifying treatment on the commodity classification data corresponding to each organization according to industry standards to obtain industry deviation rectifying commodity data corresponding to each organization;

Step 3, carrying out data statistics processing on the industry deviation rectifying commodity data corresponding to each organization to obtain purchase and sale commodity matrixes of all organizations.

It can be seen that this alternative mainly describes how to obtain the purchase and sale commodity matrix. First, the acquired commodity data of the plurality of organizations is classified according to commodity types to obtain the commodity classification data corresponding to each organization. Then, industry deviation rectifying processing is performed on the commodity classification data corresponding to each organization according to industry standards to obtain the industry deviation rectifying commodity data corresponding to each organization. These two steps thus reclassify the commodity data and correct the industry to which each organization belongs. To improve the accuracy of commodity classification, the commodity names may first be corrected before the commodity data is classified. Finally, statistical processing is performed on the industry deviation rectifying commodity data corresponding to each organization to obtain the purchase and sale commodity matrixes of all organizations. That is, the statistics are computed after the commodity names, commodity classifications and industry information in the commodity data have been corrected. The rows of the purchase and sale commodity matrix represent the organizations or the number of organizations, and the columns represent the purchased or sold commodities.

Further, in order to improve accuracy of commodity classification, step 1 in the above alternative may include:

classifying the acquired commodity data of the plurality of organizations by adopting a natural language processing model to obtain commodity classification data corresponding to each organization.

It can be seen that this alternative mainly describes how to classify the commodities. To improve the accuracy of commodity classification, a natural language processing model is adopted to classify the acquired commodity data of the plurality of organizations and obtain the commodity classification data corresponding to each organization. The natural language processing model improves the accuracy with which commodity names are recognized, so that deviating names can be corrected. For example, N-gram strong rule matching, a core word extraction algorithm, word segmentation specialized for the tax domain, and the BERT (Bidirectional Encoder Representations from Transformers) deep learning framework for natural language processing may be adopted.
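As a rough illustration of name-based commodity classification, the following minimal sketch assigns a free-text commodity name to the category whose reference description shares the most character bigrams with it. It is a simplified stand-in and not the N-gram strong rule matching, core word extraction or BERT pipeline named above; the category table `CATEGORIES`, the codes in it, the function names and the `min_score` cutoff are all hypothetical.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of a lowercased string with spaces removed."""
    s = "".join(text.lower().split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


# Hypothetical category table: category code -> reference description.
CATEGORIES = {
    "C01": "copper wire",
    "C02": "copper rod",
    "P01": "pvc insulation granules",
}


def classify(name, categories=CATEGORIES, n=2, min_score=0.3):
    """Return (code, score) for the best Jaccard match on character n-grams.

    The code is None when no category reaches min_score, so that doubtful
    names can be routed to a stronger model or to manual review.
    """
    grams = char_ngrams(name, n)
    best_code, best_score = None, 0.0
    for code, reference in categories.items():
        ref_grams = char_ngrams(reference, n)
        union = grams | ref_grams
        score = len(grams & ref_grams) / len(union) if union else 0.0
        if score > best_score:
            best_code, best_score = code, score
    if best_score < min_score:
        return None, best_score
    return best_code, best_score
```

With this table, a name such as "Copper  Wire 2.5mm" maps to the hypothetical code C01, while an unrelated name such as "office chair" falls below the cutoff and is left unclassified.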

Further, in order to improve accuracy of industry deviation correction, step 2 in the above alternative may include:

and carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the XGBoost algorithm to obtain industry deviation rectifying commodity data corresponding to each organization.

It can be seen that this alternative mainly explains how to perform industry deviation rectification. Industry deviation rectifying processing is carried out on the commodity classification data corresponding to each organization according to the XGBoost algorithm, so as to obtain the industry deviation rectifying commodity data corresponding to each organization. XGBoost (eXtreme Gradient Boosting) is derived from the gradient boosting framework but is more efficient: the algorithm can compute in parallel, build trees approximately, handle sparse data efficiently, and optimize memory usage, which is reported to make XGBoost at least 10 times faster than existing gradient boosting implementations.

S102, clustering purchase and sale commodity matrixes of all organizations to obtain purchase and sale mode clustering results;

On the basis of S101, this step aims to cluster the purchase and sale commodity matrixes of all organizations and obtain the purchase and sale mode clustering result. That is, the data in the purchase and sale commodity matrix is clustered so that, for a given purchased commodity, the number of organizations selling each commodity can be counted. For example, clustering under a given purchased commodity may show that 3000 organizations sell commodity x, 1250 organizations sell commodity y, and 20 organizations sell commodity z.

Further, in order to improve the clustering effect in this step, this step may include:

and clustering purchase and sale commodity matrixes of all organizations by adopting mean shift clustering to obtain purchase and sale mode clustering results.

In this alternative, the mean shift clustering algorithm is adopted to cluster the purchase and sale commodity matrixes. Mean shift clustering is a sliding-window-based algorithm that attempts to find dense areas of data points. It is a centroid-based algorithm: its goal is to locate the center point of each group or class, which it does by repeatedly updating each candidate center point to the mean of the points within its sliding window. The candidate windows are then filtered in a post-processing stage to eliminate near-duplicates, forming the final set of center points and their corresponding groups.
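The sliding-window procedure described above can be sketched in a few lines. This is a minimal flat-kernel mean shift written against the Python standard library, under the assumption that each organization is a row of commodity amount proportions; the function name, the `bandwidth` value and the merge radius of half a bandwidth are illustrative choices, not parameters taken from the application.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift. Returns (centers, labels), one label per point."""

    def shift(p):
        # Move the window center to the mean of the points inside the window
        # until it stops moving.
        for _ in range(max_iter):
            window = [q for q in points if math.dist(p, q) <= bandwidth]
            if not window:
                return p
            new_p = tuple(sum(c) / len(window) for c in zip(*window))
            if math.dist(p, new_p) < tol:
                return new_p
            p = new_p
        return p

    modes = [shift(tuple(p)) for p in points]

    # Post-processing: merge converged windows that are near-duplicates.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Each row: share of purchase amount spent on two hypothetical commodity codes.
rows = [(0.9, 0.1), (0.85, 0.15), (0.88, 0.12), (0.1, 0.9), (0.15, 0.85), (0.5, 0.5)]
centers, labels = mean_shift(rows, bandwidth=0.2)
```

The number of clusters is not passed in; it falls out of the data and the bandwidth, which is the property the application relies on when it says the number of clusters need not be set manually.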

And S103, determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result.

On the basis of S102, this step aims to determine, from the purchase and sale mode clustering result, the organizations deviating from the preset proportion as risk organizations.

That is, on the basis of the purchase and sale mode clustering result, the normal purchase and sale mode followed by most organizations can be determined according to the preset proportion. For example, suppose 80% of organizations purchase commodity B and sell commodity h. Purchasing commodity B then corresponds to selling commodity h, this purchase and sale mode can be regarded as a normal purchase and sale mode, and an organization outside this mode may be marked as a risk organization.

Further, to illustrate the operation of this step, this step may include:

Step 1, determining a normal purchase and sale mode organization from a purchase and sale mode clustering result according to a preset proportion;

Step 2, taking the organization which is outside the normal purchase and sale mode organization in the plurality of organizations as a risk organization.

In this alternative, the normal purchase and sale mode organizations are first determined from the purchase and sale mode clustering result according to the preset proportion. Then, the organizations among the plurality of organizations that fall outside the normal purchase and sale mode organizations are taken as risk organizations. That is, an organization outside the normal purchase and sale mode is treated as a risk organization.
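One way to read the two steps above is as a cumulative coverage rule over cluster sizes: the largest clusters are accepted as normal until they cover the preset proportion of organizations, and everything else is flagged. The sketch below follows that reading, which is an assumption; the function names and the 80% default are illustrative.

```python
from collections import Counter


def normal_clusters(labels, preset_ratio=0.8):
    """Return the largest clusters that together first cover preset_ratio of all labels."""
    counts = Counter(labels)
    total = len(labels)
    normal, covered = set(), 0
    for label, count in counts.most_common():
        if covered / total >= preset_ratio:
            break
        normal.add(label)
        covered += count
    return normal


def risk_organizations(org_labels, preset_ratio=0.8):
    """org_labels maps organization id -> cluster label; returns the ids outside the normal clusters."""
    normal = normal_clusters(list(org_labels.values()), preset_ratio)
    return sorted(org for org, label in org_labels.items() if label not in normal)
```

Applied to the earlier figures of 3000, 1250 and 20 organizations selling commodities x, y and z, the first two clusters together cover more than 80% of the organizations, so the 20 organizations in the third cluster would be flagged.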

In summary, in this embodiment, the acquired commodity data of multiple organizations are subjected to data statistics so as to determine purchase and sale commodity matrixes of all the organizations, then the purchase and sale commodity matrixes are clustered, a purchase and sale mode clustering result is determined, and finally the organization deviating from a normal purchase and sale mode is determined from the purchase and sale mode clustering result to serve as a risk organization, so that risks are identified from complex purchase and sale behaviors of the organization, rather than only identifying simple similar purchase and sale operations, and accuracy of identifying risks is improved.

A risk identification method provided in the present application is described below by way of a specific embodiment.

In this embodiment, taking an example of analyzing invoice data of a manufacturing enterprise, the method may include:

Step 1, screening the invoice data of the enterprise organizations according to the analysis period, and selecting the invoice data whose invoicing time falls between t0 (the start of the analysis period) and t1 (the end of the analysis period).

Step 2, classifying the goods names on the invoices. The classification standard divides goods names into categories according to the goods and services tax classification coding table issued by the tax administration, which covers more than 4000 categories of goods and services in total. In this step, the commodity category of each goods name is corrected by adopting N-gram strong rule matching, a core word extraction algorithm, word segmentation specialized for the tax domain, and the BERT deep learning framework for natural language processing.

Step 3, converting the data structure according to the deviation rectifying result of Step 2 to construct an enterprise organization purchase and sale matrix $A_{mn}$, where $m$ represents the number of enterprise organizations in the sample, $n$ represents the number of commodity code categories purchased and sold, and $a_{ij}$ represents the ratio of the amount of commodity code $j$ purchased or sold by the $i$-th enterprise organization to its total purchase or sale amount. The industry registered by each enterprise organization is manually corrected where appropriate to ensure its accuracy. The XGBoost algorithm is then used to learn the relationship between $A_{mn}$ and the (manually corrected) registered industry of the enterprise organizations, yielding an industry deviation rectifying algorithm, which is used to predict the industry of the enterprise organizations that have input and output invoices in the analysis period.
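The construction of the proportion matrix in Step 3 can be sketched as follows for one direction (purchases or sales). The record layout `(org_id, commodity_code, amount)` and the function name are assumptions made for illustration; the industry prediction with XGBoost is not shown.

```python
from collections import defaultdict


def build_share_matrix(records):
    """Build the matrix of amount proportions from invoice line records.

    records: iterable of (org_id, commodity_code, amount).
    Returns (orgs, codes, matrix), where matrix[i][j] is the share of
    organization orgs[i]'s total amount that falls on commodity code codes[j].
    """
    totals = defaultdict(float)
    cells = defaultdict(float)
    for org, code, amount in records:
        totals[org] += amount
        cells[(org, code)] += amount

    orgs = sorted(totals)
    codes = sorted({code for _, code in cells})
    matrix = [
        [cells.get((org, code), 0.0) / totals[org] if totals[org] else 0.0
         for code in codes]
        for org in orgs
    ]
    return orgs, codes, matrix


# Hypothetical purchase records for two enterprise organizations.
purchases = [
    ("E1", "copper_wire", 600.0),
    ("E1", "pvc", 400.0),
    ("E2", "copper_wire", 250.0),
    ("E2", "copper_rod", 750.0),
]
orgs, codes, matrix = build_share_matrix(purchases)
```

Each row sums to 1 whenever the organization has a non-zero total, so organizations of very different size become comparable before clustering.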

Step 4, counting the main commodities purchased and sold in each industry based on the corrected commodity codes and industries. The purchased commodities are sorted by amount from largest to smallest, the purchase proportion of each commodity in the industry is calculated, and the sorted proportions are accumulated; when the accumulated value reaches the preset threshold of 90%, the commodity codes accumulated so far are output as the main purchased commodity codes of the industry. The main sold commodity codes are obtained in the same manner.
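The cumulative selection in Step 4 amounts to a short loop. The sketch below assumes the industry totals are already aggregated into a mapping from commodity code to amount; the function name and the example figures are hypothetical.

```python
def main_codes(amount_by_code, threshold=0.9):
    """Return the commodity codes, largest first, whose cumulative share first reaches threshold."""
    total = sum(amount_by_code.values())
    if total <= 0:
        return []
    selected, cumulative = [], 0
    ranked = sorted(amount_by_code.items(), key=lambda kv: kv[1], reverse=True)
    for code, amount in ranked:
        selected.append(code)
        cumulative += amount
        # Compare amounts rather than accumulated ratios to limit rounding drift.
        if cumulative >= threshold * total:
            break
    return selected


# Hypothetical purchase totals for one industry.
industry_purchases = {"copper_wire": 500, "copper_rod": 300, "pvc": 150, "packaging": 50}
main_purchase_codes = main_codes(industry_purchases)
```

Here the first two codes cover 80% of the amount and the third brings the total to 95%, so three codes are kept and the long tail is dropped.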

Step 5, selecting the enterprise organizations whose registered industry after deviation rectification is Q, obtaining their summarized amount proportion data (aggregated by enterprise organization and commodity code), and combining it with the main purchase and sale commodity codes of industry Q obtained in Step 4. The data structure is converted to obtain the purchased commodity matrix $A_{buy}$ and the sold commodity matrix $A_{sell}$ of the enterprise organizations in industry Q, where the number of rows of $A_{buy}$ and $A_{sell}$ equals the number of enterprise organizations in the industry. The columns of $A_{buy}$ represent the main purchased commodities, and the elements of $A_{buy}$ represent the proportion of the purchase amount spent on each main commodity; the same applies to $A_{sell}$.

Step 6, clustering the purchased and sold commodity matrixes of the enterprise organizations in industry Q by mean shift clustering, in order to identify the diversity of purchased and sold products caused by different production processes within the same industry. For example, in the wire and cable manufacturing industry, the main purchased commodity categories are copper wires, copper wire rods and the like, and multiple modes may exist in the purchased raw materials because different enterprises fill in invoice categories incorrectly or use different production processes. Mean shift clustering can automatically mine the different modes within each industry, and the number of clusters does not need to be determined manually. The clustering algorithm yields a buy label a and a sell label b for each enterprise organization.

Step 7, mining the association between buy labels and sell labels by a statistical method. Starting from the sold commodities, the sales categories are sorted in descending order of the number of enterprise organizations they contain; when the accumulated proportion reaches 80%, the categories accumulated so far are determined to be the main sales modes of the enterprise organizations. The purchase modes corresponding to each established sales mode are then mined against a threshold: a buy-sell mode is said to hold when the overlap between the enterprise organizations under buy label c and those under sell label d reaches a certain ratio k = 80%.

Step 8, selecting the enterprise organizations in the industry that do not conform to the buy-sell modes constructed in Step 7 as risk enterprises.
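Steps 7 and 8 can be sketched together under one possible reading: the main sales categories are those covering 80% of enterprise organizations, and for each of them the accepted purchase categories are those covering a ratio k of the organizations in that sales category. The function names, the input layout `{org: (buy_label, sell_label)}` and this interpretation of the overlap rule are assumptions.

```python
from collections import Counter


def top_by_coverage(counter, ratio):
    """Return the most frequent keys whose cumulative count first reaches ratio of the total."""
    total = sum(counter.values())
    chosen, covered = [], 0
    for key, count in counter.most_common():
        if covered >= ratio * total:
            break
        chosen.append(key)
        covered += count
    return chosen


def mine_modes(labels, sell_ratio=0.8, k=0.8):
    """labels: {org: (buy_label, sell_label)}. Returns {sell_label: set of accepted buy_labels}."""
    sell_counts = Counter(sell for _, sell in labels.values())
    modes = {}
    for sell in top_by_coverage(sell_counts, sell_ratio):
        buy_counts = Counter(b for b, s in labels.values() if s == sell)
        modes[sell] = set(top_by_coverage(buy_counts, k))
    return modes


def risk_enterprises(labels, modes):
    """Return the organizations whose (buy, sell) pair is not an accepted buy-sell mode."""
    return sorted(
        org for org, (buy, sell) in labels.items()
        if buy not in modes.get(sell, ())
    )
```

An organization is flagged either because its sales category is not a main sales mode or because its purchase category is unusual for that sales mode.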

For example, the number of enterprise organizations in the different purchase and sale modes of the wire and cable industry may be analyzed. A sample of 5000 enterprise organizations is selected, and the commodity codes they purchase and sell are clustered; the clustering result is as follows:

It can be seen that, using the mean shift method, the commodities purchased by enterprise organizations in the wire and cable industry are clustered into 5 categories and the commodities sold are clustered into 7 categories. From the sales perspective, most enterprise organizations fall into categories 0, 1 and 2, with an accumulated proportion of 83.9%. Categories 0, 1 and 2 are therefore determined to be the main sales modes of the wire and cable industry. Starting from the sales categories, when the sales category is 0, the corresponding purchase category is 0 (reaching 80%); when the sales category is 1, the corresponding purchase categories are 0, 1 and 2 (accumulated value reaching 80%); and when the sales category is 2, the corresponding purchase categories are 0, 1 and 2 (accumulated value reaching 80%). Enterprise organizations outside the above modes can therefore be determined to carry a certain risk.

It can be seen that, in this embodiment, the acquired commodity data of a plurality of organizations is statistically processed to determine the purchase and sale commodity matrixes of all organizations, the purchase and sale commodity matrixes are then clustered to determine the purchase and sale mode clustering result, and finally the organizations deviating from the normal purchase and sale mode are determined from the clustering result as risk organizations. Risks are thereby identified from the complex purchase and sale behaviors of organizations, rather than only from simple purchase and sale of similar commodities, which improves the accuracy of risk identification.

The risk identification device provided in the embodiments of the present application is described below, and the risk identification device described below and the risk identification method described above may be referred to correspondingly.

Referring to fig. 2, fig. 2 is a schematic structural diagram of a risk identification device according to an embodiment of the present application.

In this embodiment, the apparatus may include:

the commodity data statistics module 100 is configured to perform data statistics processing on the obtained commodity data of multiple organizations according to commodity types and industry standards, so as to obtain purchase and sale commodity matrices of all organizations;

the commodity matrix clustering module 200 is configured to cluster the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result;

the risk organization determining module 300 is configured to determine, from the purchase and sale mode clustering result, an organization deviating from a preset proportion as a risk organization.
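The division of the device into three modules can be sketched as a simple composition of three stages. The class name, method name and callable signatures below are hypothetical; the sketch only shows how the output of each module feeds the next and does not prescribe how any module is implemented.

```python
from typing import Any, Callable, Dict, List

class RiskIdentificationDevice:
    """Illustrative composition of modules 100, 200 and 300."""

    def __init__(
        self,
        statistics: Callable[[Dict[str, Any]], Dict[str, List[float]]],
        cluster: Callable[[Dict[str, List[float]]], Dict[str, int]],
        determine_risk: Callable[[Dict[str, int]], List[str]],
    ):
        self.statistics = statistics          # module 100: commodity data -> matrixes
        self.cluster = cluster                # module 200: matrixes -> cluster labels
        self.determine_risk = determine_risk  # module 300: labels -> risk organizations

    def identify(self, commodity_data: Dict[str, Any]) -> List[str]:
        matrixes = self.statistics(commodity_data)
        labels = self.cluster(matrixes)
        return self.determine_risk(labels)
```

Passing the stages in as callables keeps the sketch independent of any particular statistics, clustering or thresholding method.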

Optionally, the commodity data statistics module 100 may include:

the commodity classification unit is used for classifying the acquired commodity data of the plurality of organizations according to commodity types to obtain commodity classification data corresponding to each organization;

the industry deviation rectifying unit is used for carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to industry standards to obtain industry deviation rectifying commodity data corresponding to each organization;

and the data statistics unit is used for carrying out data statistics processing on the industry deviation correction commodity data corresponding to each organization to obtain purchase and sale commodity matrixes of all organizations.

The embodiment of the application also provides a server, which comprises:

a memory for storing a computer program;

a processor for implementing the steps of the risk identification method as described in the above embodiments when executing the computer program.

The present application also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the risk identification method as described in the above embodiments.

In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.

Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the various illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The risk identification method, risk identification device, server and computer readable storage medium provided in the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the description of the above embodiments is only intended to assist in understanding the method of the present application and its core idea. It should be noted that those skilled in the art can make various improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims

1. A risk identification method, comprising:

determining the organization deviating from a preset proportion as a risk organization from the purchase and sale mode clustering result;

wherein performing data statistics processing on the acquired commodity data of the plurality of organizations according to commodity types and industry standards to obtain the purchase and sale commodity matrixes of all organizations comprises:

2. The risk identification method according to claim 1, wherein classifying the acquired commodity data of the plurality of organizations according to the commodity category to obtain commodity classification data corresponding to each organization, comprises:

3. The risk identification method according to claim 1, wherein performing an industry deviation rectifying process on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization includes:

4. The risk identification method according to claim 1, wherein clustering the purchase and sale commodity matrix of all organizations to obtain purchase and sale pattern clustering results comprises:

5. The risk identification method of claim 1, wherein determining an organization deviating from a preset proportion from the purchase-sale pattern clustering result as a risk organization includes:

6. A risk identification device, comprising:

the risk organization determining module is used for determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result;

wherein the commodity data statistics module comprises:

7. A server, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the risk identification method according to any of claims 1 to 5 when executing said computer program.

8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the risk identification method according to any of claims 1 to 5.