CN112418652B - Risk identification method and related device - Google Patents


Info

Publication number
CN112418652B
Authority
CN
China
Prior art keywords
commodity
purchase
organization
sale
organizations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011302019.3A
Other languages
Chinese (zh)
Other versions
CN112418652A (en)
Inventor
涂昶
张镇潮
施建生
钱力扬
陈鹏飞
祁海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Servyou Software Group Co ltd
Original Assignee
Servyou Software Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Servyou Software Group Co ltd filed Critical Servyou Software Group Co ltd
Priority to CN202011302019.3A priority Critical patent/CN112418652B/en
Publication of CN112418652A publication Critical patent/CN112418652A/en
Application granted granted Critical
Publication of CN112418652B publication Critical patent/CN112418652B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Strategic Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a risk identification method, which comprises the following steps: carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations; clustering the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result; and determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result. The corresponding purchase and sale mode is determined by clustering the counted purchase and sale commodity matrixes, so that the risk organization deviating from the purchase and sale mode is determined on the basis of the normal purchase and sale mode, and the accuracy of risk identification is improved. The application also discloses a risk identification device, a server and a computer readable storage medium, which have the beneficial effects.

Description

Risk identification method and related device
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a risk identification method, a risk identification device, a server, and a computer readable storage medium.
Background
With the continuous development of data processing technology, data processing is applied to more and more kinds of data. Data is a representation of facts, concepts, or instructions that may be processed by manual or automated means. Once data is interpreted and given a certain meaning, it becomes information. Data processing is the collection, storage, retrieval, processing, transformation, and transmission of data. Its basic purpose is to extract and derive data that is valuable and meaningful to particular people from large volumes of data that may be unorganized and hard to understand. Data processing is a fundamental link in system engineering and automatic control. It runs through every field of social production and social life, and it has greatly improved many aspects of production. For example, enterprise operation data is typically analyzed using data processing techniques to determine the risks present in different enterprises.
In the related art, existing schemes focus mainly on risk diagnosis of taxpayers that are commercial enterprises. Commercial enterprises have a relatively uniform purchase and sale mode: the main types and amounts of the commodities they purchase and sell differ only by the gross profit margin, so a commercial enterprise can be identified as a risk enterprise if the difference is too large. However, for entities with other operation modes, such as industrial enterprises and other non-commercial enterprises with complex input-output relationships, the industry has no effective risk identification algorithm based on the commodities purchased and sold by taxpayers. That is, effective risk identification cannot be performed on data from different types of enterprises, which reduces the accuracy and precision of risk identification.
Therefore, how to improve the accuracy of identifying risks is a major concern for those skilled in the art.
Disclosure of Invention
The purpose of the present application is to provide a risk identification method, a risk identification device, a server and a computer readable storage medium, which are used for determining a corresponding purchase and sale mode by clustering a counted purchase and sale commodity matrix, and further determining a risk organization deviating from the purchase and sale mode on the basis of a normal purchase and sale mode.
In order to solve the above technical problems, the present application provides a risk identification method, including:
carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations;
clustering the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result;
and determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result.
Optionally, performing data statistics processing on the obtained commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations, including:
classifying the acquired commodity data of the organizations according to the commodity types to obtain commodity classification data corresponding to each organization;
carrying out industry deviation rectifying treatment on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization;
and carrying out data statistics processing on the industry deviation rectifying commodity data corresponding to each organization to obtain the purchase and sale commodity matrix of all organizations.
Optionally, classifying the obtained commodity data of the plurality of organizations according to the commodity type to obtain commodity classification data corresponding to each organization, including:
and classifying the acquired commodity data of the organizations by adopting a natural language processing model to obtain commodity classification data corresponding to each organization.
Optionally, performing industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization, including:
and carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the XGBoost algorithm to obtain the industry deviation rectifying commodity data corresponding to each organization.
Optionally, clustering the purchase and sale commodity matrix of all organizations to obtain a purchase and sale mode clustering result, including:
and clustering the purchase and sale commodity matrixes of all organizations by adopting mean shift clustering to obtain the purchase and sale mode clustering result.
Optionally, determining the organization deviating from the preset proportion as the risk organization from the purchase and sale mode clustering result includes:
determining a normal purchase and sale mode organization from the purchase and sale mode clustering result according to the preset proportion;
and taking the organization outside the normal purchase and sale mode organization in the plurality of organizations as the risk organization.
The application also provides a risk identification device, comprising:
the commodity data statistics module is used for carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all the organizations;
the commodity matrix clustering module is used for clustering the purchase and sale commodity matrixes of all organizations to obtain purchase and sale mode clustering results;
and the risk organization determining module is used for determining the organization deviating from the preset proportion as the risk organization from the purchase and sale mode clustering result.
Optionally, the commodity data statistics module includes:
the commodity classification unit is used for classifying the acquired commodity data of the plurality of organizations according to the commodity types to obtain commodity classification data corresponding to each organization;
the industry deviation rectifying unit is used for carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization;
and the data statistics unit is used for carrying out data statistics processing on the industry deviation rectifying commodity data corresponding to each organization to obtain the purchase and sale commodity matrix of all organizations.
The application also provides a server comprising:
a memory for storing a computer program;
a processor for implementing the steps of the risk identification method as described above when executing the computer program.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the risk identification method as described above.
The risk identification method provided by the application comprises the following steps: carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations; clustering the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result; and determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result.
Data statistics are performed on the acquired commodity data of a plurality of organizations to determine the purchase and sale commodity matrixes of all organizations; the purchase and sale commodity matrixes are then clustered to obtain a purchase and sale mode clustering result; finally, the organizations deviating from the normal purchase and sale mode are determined from the clustering result and taken as risk organizations. Risks can thus be identified from the complex purchase and sale behaviors of organizations, rather than only from simple purchases and sales of similar commodities, which improves the accuracy of risk identification.
The application further provides a risk identification device, a server and a computer readable storage medium, which have the above beneficial effects and are not described herein.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
Fig. 1 is a flowchart of a risk identification method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a risk identification device according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a risk identification method, a risk identification device, a server and a computer readable storage medium, corresponding purchase and sale modes are determined by clustering the counted purchase and sale commodity matrixes, and further risk organization deviating from the purchase and sale modes is determined on the basis of the normal purchase and sale modes, so that the accuracy of risk identification is improved.
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the related art, existing schemes focus mainly on risk diagnosis of taxpayers that are commercial enterprises. Commercial enterprises have a relatively uniform purchase and sale mode: the main types and amounts of the commodities they purchase and sell differ only by the gross profit margin, so a commercial enterprise can be identified as a risk enterprise if the difference is too large. However, for entities with other operation modes, such as industrial enterprises and other non-commercial enterprises with complex input-output relationships, the industry has no effective risk identification algorithm based on the commodities purchased and sold by taxpayers. That is, effective risk identification cannot be performed on data from different types of enterprises, which reduces the accuracy and precision of risk identification.
Therefore, the present application provides a risk identification method in which data statistics are performed on the acquired commodity data of a plurality of organizations to determine the purchase and sale commodity matrixes of all organizations, the purchase and sale commodity matrixes are clustered to obtain a purchase and sale mode clustering result, and the organizations deviating from the normal purchase and sale mode are determined from the clustering result and taken as risk organizations. Risks can thus be identified from the complex purchase and sale behaviors of organizations, rather than only from simple purchases and sales of similar commodities, which improves the accuracy of risk identification.
A risk identification method provided in the present application is described below by way of an embodiment.
Referring to fig. 1, fig. 1 is a flowchart of a risk identification method according to an embodiment of the present application.
In this embodiment, the method may include:
S101, carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations;
This step performs data statistics processing on the acquired commodity data of different organizations to obtain the purchase and sale commodity matrix corresponding to each organization, that is, the purchase and sale commodity matrixes of all organizations. The commodity data includes purchase commodity data and sales commodity data.
The commodity data can be obtained from combined operation data, from invoice data generated by the organizations, or from the organizations' purchase and sales data. The manner of acquiring the commodity data is not specifically limited in this step. However, regardless of its source, the commodity data may have been entered manually, so it may contain input errors. For example, commodity names may be non-standard and commodity classifications may be inaccurate. Further, because the commodity categories purchased and sold differ greatly between industries, organizations need to be grouped by industry so that the conventional mode within an industry can be analyzed from commodity data of the same industry and the organizations deviating from that convention can be determined.
Therefore, to improve the accuracy of the statistics on commodity data, this step may perform deviation rectifying processing on the commodity names in the commodity data and on the industry data of the organizations, so as to improve the accuracy and precision of the acquired commodity data.
Finally, in order to improve the convenience of data display, commodity data are displayed in a matrix form in the step, so that corresponding processing is conveniently carried out on the data. Wherein, a purchase and sale commodity matrix generally refers to the purchase and sale commodity matrix of all organizations in the same industry.
Further, in order to improve accuracy of data statistics, the step may include:
step 1, classifying the acquired commodity data of a plurality of organizations according to commodity types to obtain commodity classification data corresponding to each organization;
step 2, carrying out industry deviation rectifying treatment on the commodity classification data corresponding to each organization according to industry standards to obtain industry deviation rectifying commodity data corresponding to each organization;
and step 3, carrying out data statistics processing on the industry deviation rectifying commodity data corresponding to each organization to obtain purchase and sale commodity matrixes of all organizations.
It can be seen that this alternative mainly describes how to obtain the purchase and sale commodity matrix. In this alternative, the acquired commodity data of a plurality of organizations are first classified according to commodity types to obtain commodity classification data corresponding to each organization; industry deviation rectifying processing is then performed on the commodity classification data corresponding to each organization according to industry standards to obtain industry deviation rectifying commodity data corresponding to each organization. In other words, these two steps reclassify the commodity data and correct the industry information. To improve the accuracy of commodity classification, deviation correction may first be applied to the commodity names before the commodity data are classified. Finally, data statistics processing is performed on the industry deviation rectifying commodity data corresponding to each organization to obtain the purchase and sale commodity matrixes of all organizations. That is, the commodity data are aggregated only after the commodity names, commodity classifications, and industry information in them have been corrected. The rows of the purchase and sale commodity matrix represent organizations (or organization counts), and the columns represent purchased or sold commodities.
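As an illustration of the matrix described above, the following minimal sketch builds a matrix with one row per organization and one column per commodity code. It assumes each entry is the share of the organization's total amount spent on that code, which is the definition used in the specific embodiment later in this description. The record layout `(org_id, commodity_code, amount)`, the function name `build_share_matrix`, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_share_matrix(records):
    """Build an organization-by-commodity share matrix.

    records: iterable of (org_id, commodity_code, amount) tuples (hypothetical layout).
    Returns (orgs, codes, matrix), where matrix[i][j] is the share of
    organization i's total amount that falls on commodity code j.
    """
    totals = defaultdict(float)   # total amount per organization
    amounts = defaultdict(float)  # amount per (organization, commodity code)
    for org, code, amount in records:
        totals[org] += amount
        amounts[(org, code)] += amount

    orgs = sorted(totals)
    codes = sorted({code for _, code in amounts})
    matrix = [
        [
            amounts.get((org, code), 0.0) / totals[org] if totals[org] else 0.0
            for code in codes
        ]
        for org in orgs
    ]
    return orgs, codes, matrix


# Hypothetical purchase records for two organizations.
purchases = [
    ("org_a", "copper_wire", 60.0),
    ("org_a", "pvc_resin", 40.0),
    ("org_b", "copper_wire", 50.0),
]
orgs, codes, a_buy = build_share_matrix(purchases)
for org, row in zip(orgs, a_buy):
    print(org, dict(zip(codes, row)))
```

The same function could be applied to sales records to obtain the corresponding sales matrix; using shares rather than raw amounts keeps organizations of different sizes comparable.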
Further, in order to improve accuracy of commodity classification, step 1 in the above alternative may include:
and classifying the acquired commodity data of the organizations by adopting a natural language processing model to obtain commodity classification data corresponding to each organization.
It can be seen that this alternative mainly describes how to classify the commodities. In this alternative, to improve the accuracy of commodity classification, a natural language processing model is used to classify the acquired commodity data of a plurality of organizations, yielding commodity classification data corresponding to each organization. The natural language processing model improves the accuracy with which commodity names are recognized and allows deviating names to be corrected. For example, N-gram strong rule matching, a core word extraction algorithm, word segmentation specific to the tax field, and the BERT (Bidirectional Encoder Representations from Transformers) natural language processing deep learning framework may be adopted.
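The description does not spell out how the N-gram matching works, so the following is only a simplified stand-in for that one component: it assigns a free-text commodity name to the category whose reference name shares the most character bigrams with it (Jaccard similarity). It does not reproduce the core word extraction, tax-field word segmentation, or BERT components. The category names, reference examples, and the `min_score` cutoff are all hypothetical.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of a name, ignoring case and spaces."""
    text = text.lower().replace(" ", "")
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def classify_name(name, category_examples, n=2, min_score=0.3):
    """Pick the category whose reference name is most similar to `name`.

    Similarity is the Jaccard overlap of character n-gram sets.
    Returns None when no reference name reaches `min_score`.
    """
    grams = char_ngrams(name, n)
    best_category, best_score = None, 0.0
    for category, examples in category_examples.items():
        for example in examples:
            ref = char_ngrams(example, n)
            union = grams | ref
            score = len(grams & ref) / len(union) if union else 0.0
            if score > best_score:
                best_category, best_score = category, score
    return best_category if best_score >= min_score else None


# Hypothetical reference names per commodity category.
CATEGORY_EXAMPLES = {
    "copper wire": ["copper wire", "bare copper wire"],
    "plastic resin": ["pvc resin", "polyethylene resin"],
}

print(classify_name("Copper  Wire 2.5mm", CATEGORY_EXAMPLES))
print(classify_name("office chair", CATEGORY_EXAMPLES))
```

Returning `None` for names that match nothing well leaves room for a fallback, such as a learned classifier, to handle the names that simple rules cannot place.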
Further, in order to improve accuracy of industry deviation correction, step 2 in the above alternative may include:
and carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the XGBoost algorithm to obtain industry deviation rectifying commodity data corresponding to each organization.
It can be seen that this alternative mainly explains how to perform industry deviation correction. In this alternative, industry deviation rectifying processing is performed on the commodity classification data corresponding to each organization according to the XGBoost algorithm to obtain the industry deviation rectifying commodity data corresponding to each organization. XGBoost (eXtreme Gradient Boosting) is derived from the gradient boosting framework but is more efficient: the algorithm can compute in parallel, build trees approximately, handle sparse data efficiently, and optimize memory usage, which makes XGBoost at least 10 times faster than existing gradient boosting implementations.
S102, clustering purchase and sale commodity matrixes of all organizations to obtain purchase and sale mode clustering results;
On the basis of S101, this step clusters the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result. That is, the data in the purchase and sale commodity matrix are clustered, for example by grouping, for the same purchased commodity, the organizations according to the commodity they sell. For example, clustering on a given purchased commodity might show 3000 organizations selling commodity x, 1250 organizations selling commodity y, and 20 organizations selling commodity z.
Further, in order to improve the clustering effect in this step, this step may include:
and clustering purchase and sale commodity matrixes of all organizations by adopting mean shift clustering to obtain purchase and sale mode clustering results.
Therefore, in the alternative scheme, the mean shift clustering algorithm is mainly adopted to cluster the purchase and sale commodity matrix. Wherein the mean shift clustering algorithm is a sliding window based algorithm that attempts to find dense areas of data points. This is a centroid-based algorithm, meaning that the goal of the algorithm is to locate the center point of each group/class, by updating the candidate points for the center point to the mean of the points within the sliding window. These candidate windows are then filtered in a post-processing stage to eliminate approximate duplicates, forming the final set of centerpoints and their corresponding groups.
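The sliding-window procedure just described can be sketched in a few lines. This is a minimal flat-kernel version written for illustration only, not the implementation used in the application; in practice a library implementation such as scikit-learn's `MeanShift` would normally be preferred. The `bandwidth` value and the sample share vectors are hypothetical.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift. Returns (centers, labels), one label per point."""

    def shift(p):
        # Move the window center to the mean of the points inside the window
        # until it stops moving.
        for _ in range(max_iter):
            window = [q for q in points if math.dist(p, q) <= bandwidth]
            if not window:
                return p
            new_p = tuple(sum(c) / len(window) for c in zip(*window))
            if math.dist(p, new_p) < tol:
                return new_p
            p = new_p
        return p

    modes = [shift(tuple(p)) for p in points]

    # Post-processing: merge near-duplicate modes into a final set of centers.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical purchase-share vectors (share of two main commodities per organization).
points = [
    (0.90, 0.10), (0.88, 0.12), (0.92, 0.08),
    (0.10, 0.90), (0.12, 0.88),
    (0.50, 0.50),
]
centers, labels = mean_shift(points, bandwidth=0.2)
print(labels)
```

Because the number of clusters falls out of the data and the bandwidth, no cluster count has to be chosen in advance, which is the property the description relies on.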
And S103, determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result.
On the basis of S102, this step aims at determining, as a risk organization, organization data deviating from a preset proportion from the purchase-sale pattern clustering result.
That is, on the basis of the purchase and sale mode clustering result, the normal purchase and sale mode followed by most organizations can be determined according to a preset proportion. For example, suppose 80% of organizations purchase commodity B and sell commodity h. It can then be said that purchasing commodity B corresponds to selling commodity h, this purchase and sale mode may be treated as a normal purchase and sale mode, and organizations outside this mode may be marked as risk organizations.
Further, to illustrate the operation of this step, this step may include:
step 1, determining a normal purchase and sale mode organization from a purchase and sale mode clustering result according to a preset proportion;
Step 2, taking the organizations, among the plurality of organizations, that are outside the normal purchase and sale mode organizations as risk organizations.
In this alternative, the normal purchase and sale mode organizations are first determined from the purchase and sale mode clustering result according to the preset proportion. The organizations among the plurality of organizations other than the normal purchase and sale mode organizations are then taken as risk organizations. That is, any organization outside the normal purchase and sale mode is treated as a risk organization.
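The two steps above can be sketched as follows. The description does not fix exactly how the preset proportion is applied, so this sketch assumes one possible reading: clusters are taken from largest to smallest until together they cover at least the preset proportion of organizations, and every organization in a remaining cluster is flagged. The function name, the default proportion of 0.8, and the sample labels are hypothetical.

```python
from collections import Counter


def flag_risk_organizations(labels_by_org, preset_ratio=0.8):
    """Split organizations into normal-mode clusters and risk organizations.

    labels_by_org: {org_id: cluster_label} from the clustering step.
    Returns (normal_labels, risk_orgs).
    """
    counts = Counter(labels_by_org.values())
    total = sum(counts.values())

    normal, covered = set(), 0
    # Take clusters from largest to smallest until the preset proportion is covered.
    for label, count in counts.most_common():
        if covered / total >= preset_ratio:
            break
        normal.add(label)
        covered += count

    risk = sorted(org for org, label in labels_by_org.items() if label not in normal)
    return normal, risk


# Hypothetical clustering result: 6 organizations in mode A, 3 in mode B, 1 in mode C.
labels_by_org = {f"org_{i}": "A" for i in range(6)}
labels_by_org.update({f"org_{i}": "B" for i in range(6, 9)})
labels_by_org["org_9"] = "C"

normal, risk = flag_risk_organizations(labels_by_org, preset_ratio=0.8)
print(normal, risk)
```

Under this reading, modes A and B together cover 90% of the sample and count as normal, so only the single organization in mode C is flagged.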
In summary, in this embodiment, the acquired commodity data of multiple organizations are subjected to data statistics so as to determine purchase and sale commodity matrixes of all the organizations, then the purchase and sale commodity matrixes are clustered, a purchase and sale mode clustering result is determined, and finally the organization deviating from a normal purchase and sale mode is determined from the purchase and sale mode clustering result to serve as a risk organization, so that risks are identified from complex purchase and sale behaviors of the organization, rather than only identifying simple similar purchase and sale operations, and accuracy of identifying risks is improved.
A risk identification method provided in the present application is described below by way of a specific embodiment.
In this embodiment, taking an example of analyzing invoice data of a manufacturing enterprise, the method may include:
Step 1, screening the invoice data of the enterprise organizations according to the analysis period, and selecting the invoice data whose invoicing time falls between t0 (the start of the analysis period) and t1 (the end of the analysis period).
Step 2, classifying the goods names on the invoices. The classification standard divides goods names into categories according to the goods and services tax classification coding table issued by the tax administration, which covers more than 4,000 categories of goods and services in total. In this step, commodity category deviation correction is performed on the goods names using N-gram strong rule matching, a core word extraction algorithm, word segmentation specific to the tax field, and the BERT natural language processing deep learning framework.
Step 3, converting the data structure according to the deviation-corrected result of step 2 to construct the enterprise organization purchase and sale matrix $A_{m \times n}$, where $m$ represents the number of enterprise organizations in the sample, $n$ represents the number of purchase and sales commodity code categories, and $a_{ij}$ represents the ratio of the amount of commodity code $j$ purchased or sold by the $i$-th enterprise organization to its total purchase or sales amount. The industry registered by each enterprise organization is manually corrected where appropriate to ensure its accuracy, and the XGBoost algorithm is used to learn the relationship between $A_{m \times n}$ and the (manually corrected) registered industry of the enterprise organizations, yielding an industry deviation correction algorithm. Based on this algorithm, industry prediction is performed for the enterprise organizations that have input and output invoices within the analysis period.
Step 4, counting the main commodities purchased and sold in each industry based on the deviation-corrected commodity codes and industries. The purchased commodities are sorted from largest to smallest, the purchase proportion of each commodity within the industry is calculated, and the sorted purchase proportions are accumulated; when the accumulated value reaches the preset threshold of 90%, the commodity codes accumulated so far are output as the main purchased commodity codes of the industry. The main sales commodity codes are obtained in the same manner.
Step 5, screening the enterprise organizations whose registered industry after deviation correction is Q, obtaining their summarized amount proportion data (counted by enterprise organization and commodity code), and combining this with the main purchase and sales commodity codes of industry Q obtained in step 4. The data structure is converted to obtain the purchase commodity matrix $A_{buy}$ and the sales commodity matrix $A_{sell}$ of the enterprise organizations in industry Q, where the number of rows of $A_{buy}$ and of $A_{sell}$ equals the number of enterprise organizations in the industry. The columns of $A_{buy}$ represent the main purchased commodities, and the elements of $A_{buy}$ represent the proportion of the purchase amount spent on each main commodity; $A_{sell}$ is defined in the same way.
Step 6: cluster the purchased commodity matrix of the enterprise organizations in industry Q. Mean shift clustering is adopted to capture the diversity of purchased and sold products caused by different production processes within the same industry. For example, in the wire and cable manufacturing industry, the main purchased commodity categories are copper wires, copper wire rods and the like; because different enterprises may fill in invoice categories incorrectly or use different production processes, several modes may exist in the purchased raw materials. Mean shift clustering can automatically mine the different modes in different industries, and the number of clusters does not need to be determined manually. The clustering algorithm yields a buying label a and a selling label b for each enterprise.
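To make the clustering step concrete, the following is a minimal flat-kernel mean shift written in plain Python. It is a simplified sketch, not the implementation used in the application: the bandwidth value, the mode-merging rule (modes closer than half the bandwidth are merged) and the sample rows are all assumptions. In practice a library implementation such as scikit-learn's `MeanShift` would more likely be used.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift. Returns (labels, centers)."""

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def shift(p):
        # Move p to the mean of all data points within `bandwidth` of it.
        neighbours = [q for q in points if dist(p, q) <= bandwidth]
        if not neighbours:  # defensive guard against rounding edge cases
            return p
        return tuple(
            sum(q[k] for q in neighbours) / len(neighbours) for k in range(len(p))
        )

    # Shift every point until it stops moving; the end point is its mode.
    modes = []
    for p in points:
        current = tuple(p)
        for _ in range(max_iter):
            nxt = shift(current)
            moved = dist(current, nxt)
            current = nxt
            if moved < tol:
                break
        modes.append(current)

    # Merge modes that lie close together into a single cluster center.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if dist(m, c) <= bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


# Hypothetical rows of a purchase matrix (shares of two main commodities)
rows = [(0.9, 0.1), (0.85, 0.15), (0.88, 0.12), (0.1, 0.9), (0.15, 0.85)]
buy_labels, buy_centers = mean_shift(rows, bandwidth=0.3)
```

The number of clusters is not passed in; it falls out of the bandwidth and the data, which is the property the text relies on.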
Step 7: mine the association between buying labels and selling labels by a statistical method. Starting from the sold commodities, the sales categories are sorted by number of enterprise organizations in descending order; when the cumulative share of enterprise organizations reaches 80%, the categories accumulated so far are determined to be the main sales modes of the enterprise organizations. Then, for each established sales mode, the corresponding purchase modes are mined according to a threshold: a buy-sell mode is said to hold when the enterprise organizations under buying label c and selling label d overlap by a certain ratio k = 80%.
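One possible reading of this step is sketched below: the main sales labels are those whose cumulative organization share reaches 80%, and for each of them the purchase labels are accumulated in the same way. The function names, the interpretation of the overlap ratio as a cumulative share, and the sample labels are assumptions.

```python
from collections import Counter, defaultdict


def top_by_cumulative_share(counter, threshold):
    """Labels, most frequent first, until their cumulative share reaches threshold."""
    total = sum(counter.values())
    selected, cumulative = [], 0
    for label, count in counter.most_common():
        selected.append(label)
        cumulative += count
        if cumulative >= threshold * total:
            break
    return selected


def mine_buy_sell_patterns(labels, sell_threshold=0.8, buy_threshold=0.8):
    """labels: list of (buy_label, sell_label) pairs, one per organization.

    Returns {main sell label: set of buy labels regarded as normal for it}.
    """
    sell_counts = Counter(sell for _, sell in labels)
    main_sell = top_by_cumulative_share(sell_counts, sell_threshold)

    buys_by_sell = defaultdict(Counter)
    for buy, sell in labels:
        buys_by_sell[sell][buy] += 1

    return {
        sell: set(top_by_cumulative_share(buys_by_sell[sell], buy_threshold))
        for sell in main_sell
    }


# Hypothetical (buy_label, sell_label) pairs for ten organizations
org_pairs = [(0, 0)] * 5 + [(1, 0)] + [(0, 1)] * 2 + [(1, 1)] + [(3, 2)]
patterns = mine_buy_sell_patterns(org_pairs)
```

In this toy sample, sales labels 0 and 1 cover 9 of 10 organizations and become the main sales modes; sales label 2 is left out.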
Step 8: according to the buy-sell modes constructed in step 7, select the enterprise organizations in the industry that do not conform to these buy-sell modes as risk enterprises.
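The final selection can be sketched as a simple membership check. The function name `flag_risk_organizations` and the organization identifiers are hypothetical; the pattern table reuses the sales-to-purchase correspondences reported for the wire and cable example in this description (sales 0 with purchase 0; sales 1 and 2 with purchase 0, 1, 2).

```python
def flag_risk_organizations(org_labels, patterns):
    """Return the organizations whose (buy, sell) labels match no normal pattern.

    org_labels: {org_id: (buy_label, sell_label)}
    patterns: {sell_label: set of buy labels regarded as normal}
    An organization whose sell label is not a main sales mode is also flagged.
    """
    return sorted(
        org
        for org, (buy, sell) in org_labels.items()
        if buy not in patterns.get(sell, set())
    )


# Normal buy-sell modes taken from the wire and cable example
normal_patterns = {0: {0}, 1: {0, 1, 2}, 2: {0, 1, 2}}

# Hypothetical organizations with their (buy_label, sell_label)
org_labels = {
    "org_a": (0, 0),  # conforms
    "org_b": (2, 1),  # conforms
    "org_c": (3, 0),  # unusual purchases for sales mode 0
    "org_d": (0, 5),  # sales mode outside the main modes
}
risk_orgs = flag_risk_organizations(org_labels, normal_patterns)
```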
For example, the number of enterprise organizations in the different purchase and sale modes of the wire and cable industry may be analyzed. A sample of 5000 enterprise organizations is selected, and their purchased and sold commodity codes are clustered; the clustering result is as follows:
It can be seen that, using the mean shift method, the commodities purchased by enterprise organizations in the wire and cable industry are clustered into 5 categories, and the commodities sold are clustered into 7 categories. From the sales perspective, most enterprise organizations fall into categories 0, 1 and 2, with a cumulative share of 83.9%. Categories 0, 1 and 2 are therefore determined to be the main sales modes of the wire and cable industry. Starting from the sales categories: when the sales category is 0, the corresponding purchase category is 0 (cumulative share reaching 80%); when it is 1, the corresponding purchase categories are 0, 1 and 2 (cumulative share reaching 80%); when it is 2, the corresponding purchase categories are 0, 1 and 2 (cumulative share reaching 80%). It can therefore be determined that the enterprise organizations outside these modes carry a certain risk.
It can be seen that, in this embodiment, data statistics are performed on the acquired commodity data of a plurality of organizations to determine the purchase and sale commodity matrices of all organizations; the purchase and sale commodity matrices are then clustered to obtain a purchase and sale mode clustering result; and finally the organizations deviating from the normal purchase and sale modes are determined from the clustering result as risk organizations. Risks are thus identified from the complex purchase and sale behaviors of the organizations rather than only from simple purchases and sales of similar commodities, which improves the accuracy of risk identification.
The risk identification device provided in the embodiments of the present application is described below, and the risk identification device described below and the risk identification method described above may be referred to correspondingly.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a risk identification device according to an embodiment of the present application.
In this embodiment, the apparatus may include:
the commodity data statistics module 100 is configured to perform data statistics processing on the obtained commodity data of multiple organizations according to commodity types and industry standards, so as to obtain purchase and sale commodity matrices of all organizations;
the commodity matrix clustering module 200 is used for clustering purchase and sale commodity matrixes of all organizations to obtain purchase and sale mode clustering results;
the risk organization determining module 300 is configured to determine an organization deviating from a preset proportion as a risk organization from the purchase-sale pattern clustering result.
Optionally, the commodity data statistics module 100 may include:
the commodity classification unit is used for classifying the acquired commodity data of the plurality of organizations according to commodity types to obtain commodity classification data corresponding to each organization;
the industry deviation rectifying unit is used for carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to industry standards to obtain industry deviation rectifying commodity data corresponding to each organization;
and the data statistics unit is used for carrying out data statistics processing on the industry deviation correction commodity data corresponding to each organization to obtain purchase and sale commodity matrixes of all organizations.
The embodiment of the application also provides a server, which comprises:
a memory for storing a computer program;
a processor for implementing the steps of the risk identification method as described in the above embodiments when executing the computer program.
The present application also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the risk identification method as described in the above embodiments.
In the description, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
Those skilled in the art will further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the illustrative units and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above describes in detail a risk identification method, a risk identification device, a server and a computer readable storage medium provided in the present application. Specific examples are set forth herein to illustrate the principles and embodiments of the present application, and the description of the examples above is only intended to assist in understanding the methods of the present application and their core ideas. It should be noted that it would be obvious to those skilled in the art that various improvements and modifications can be made to the present application without departing from the principles of the present application, and such improvements and modifications fall within the scope of the claims of the present application.

Claims (8)

1. A risk identification method, comprising:
carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all organizations;
clustering the purchase and sale commodity matrixes of all organizations to obtain a purchase and sale mode clustering result;
determining the organization deviating from a preset proportion as a risk organization from the purchase and sale mode clustering result;
wherein carrying out data statistics processing on the acquired commodity data of the plurality of organizations according to commodity types and industry standards to obtain the purchase and sale commodity matrixes of all organizations comprises:
classifying the acquired commodity data of the organizations according to the commodity types to obtain commodity classification data corresponding to each organization;
carrying out industry deviation rectifying treatment on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization;
and carrying out data statistics processing on the industry deviation rectifying commodity data corresponding to each organization to obtain the purchase and sale commodity matrix of all organizations.
2. The risk identification method according to claim 1, wherein classifying the acquired commodity data of the plurality of organizations according to the commodity category to obtain commodity classification data corresponding to each organization, comprises:
and classifying the acquired commodity data of the organizations by adopting a natural language processing model to obtain commodity classification data corresponding to each organization.
3. The risk identification method according to claim 1, wherein performing an industry deviation rectifying process on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization includes:
and carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the XGBoost algorithm to obtain the industry deviation rectifying commodity data corresponding to each organization.
4. The risk identification method according to claim 1, wherein clustering the purchase and sale commodity matrix of all organizations to obtain purchase and sale pattern clustering results comprises:
and clustering the purchase and sale commodity matrixes of all organizations by adopting mean shift clustering to obtain the purchase and sale mode clustering result.
5. The risk identification method of claim 1, wherein determining an organization deviating from a preset proportion from the purchase-sale pattern clustering result as a risk organization includes:
determining a normal purchase and sale mode organization from the purchase and sale mode clustering result according to the preset proportion;
and taking the organization outside the normal purchase and sale mode organization in the plurality of organizations as the risk organization.
6. A risk identification device, comprising:
the commodity data statistics module is used for carrying out data statistics processing on the acquired commodity data of a plurality of organizations according to commodity types and industry standards to obtain purchase and sale commodity matrixes of all the organizations;
the commodity matrix clustering module is used for clustering the purchase and sale commodity matrixes of all organizations to obtain purchase and sale mode clustering results;
the risk organization determining module is used for determining the organization deviating from the preset proportion as a risk organization from the purchase and sale mode clustering result;
wherein the commodity data statistics module comprises:
the commodity classification unit is used for classifying the acquired commodity data of the plurality of organizations according to the commodity types to obtain commodity classification data corresponding to each organization;
the industry deviation rectifying unit is used for carrying out industry deviation rectifying processing on the commodity classification data corresponding to each organization according to the industry standard to obtain industry deviation rectifying commodity data corresponding to each organization;
and the data statistics unit is used for carrying out data statistics processing on the industry deviation rectifying commodity data corresponding to each organization to obtain the purchase and sale commodity matrix of all organizations.
7. A server, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the risk identification method according to any of claims 1 to 5 when executing said computer program.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the risk identification method according to any of claims 1 to 5.
CN202011302019.3A 2020-11-19 2020-11-19 Risk identification method and related device Active CN112418652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011302019.3A CN112418652B (en) 2020-11-19 2020-11-19 Risk identification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011302019.3A CN112418652B (en) 2020-11-19 2020-11-19 Risk identification method and related device

Publications (2)

Publication Number Publication Date
CN112418652A CN112418652A (en) 2021-02-26
CN112418652B true CN112418652B (en) 2024-01-30

Family

ID=74774146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011302019.3A Active CN112418652B (en) 2020-11-19 2020-11-19 Risk identification method and related device

Country Status (1)

Country Link
CN (1) CN112418652B (en)

Also Published As

Publication number Publication date
CN112418652A (en) 2021-02-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant