CN116385163A - Data unit abnormality identification method, device, equipment, storage medium and product - Google Patents

Info

Publication number
CN116385163A
Authority
CN
China
Prior art keywords
data
unit
external
data unit
fluctuation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310357992.2A
Other languages
Chinese (zh)
Inventor
张胜坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310357992.2A priority Critical patent/CN116385163A/en
Publication of CN116385163A publication Critical patent/CN116385163A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks

Abstract

The application relates to the technical field of big data, in particular to financial technology or other related fields, and provides a data unit abnormality identification method, device, equipment, storage medium and product. The method comprises the following steps: collecting external parameters of a data unit through a web crawler and/or a database, and calculating fluctuation information of the data unit according to the external parameters, wherein the fluctuation information represents the fluctuation of the external performance of the data unit; determining at least one abnormal unit from the unit set according to the fluctuation information; and generating change data from the attribute information of the abnormal unit through an attribute analysis model, wherein the change data represents the degree of change of the attribute information of the abnormal unit within the target period. The cause of an abnormality can thus be grasped promptly, which reduces the investment of human resources, saves labor cost, improves the accuracy with which abnormal units are identified and analyzed, and improves the efficiency of handling abnormal data units.

Description

Data unit abnormality identification method, device, equipment, storage medium and product
Technical Field
The present disclosure relates to the field of big data technologies, in particular to financial technology or other related fields, and specifically to a method, apparatus, device, storage medium, and product for identifying anomalies in a data unit.
Background
For business or regulatory reasons, financial institutions need to observe fluctuations in the external performance of each data unit, for example the daily profit and loss changes of individual bonds or stocks.
A large financial institution holds a large number of bonds or stocks across many product types, trades them in large volumes, and computes their profit and loss with complex logic. The profit and loss of some bonds or stocks often fluctuates sharply, and as many as hundreds of bonds or stocks may show abnormal profit and loss fluctuations. At present, financial institutions usually review profit and loss manually to determine the abnormal data units (abnormal bonds or stocks), so abnormal data units are identified with low efficiency and accuracy and at high labor cost.
Disclosure of Invention
The application provides a data unit abnormality identification method, device, equipment, storage medium and product, which address the problem that financial institutions currently review profit and loss manually to determine abnormal data units, so that abnormal data units are identified with low efficiency and accuracy and at high labor cost.
In a first aspect, the present application provides a method for identifying anomalies in a data unit, including:
collecting external parameters of a data unit through a preset web crawler and/or a database, and calculating fluctuation information of the data unit according to the external parameters, wherein the external parameters are used for representing external manifestations of the data unit, and the fluctuation information is used for representing fluctuation conditions of the external manifestations of the data unit;
determining at least one abnormal unit from a preset unit set according to the fluctuation information, wherein the unit set is provided with at least one data unit, and the abnormal unit is one data unit in the unit set;
and generating change data according to the attribute information of the abnormal unit through a preset attribute analysis model, wherein the attribute information is used for representing the internal attribute characteristics of the data unit, and the change data is used for representing the change degree of the attribute information of the abnormal unit in a preset target period.
In the above scheme, the collecting, by a preset web crawler and/or a database, an external parameter of a data unit includes:
configuring external configuration information of one data unit in the web crawler, and calling the web crawler to acquire external parameters corresponding to the external configuration information from a preset data system; and/or
Setting a database connected with the data system, and calling the database to acquire external data from the data system, wherein the database is provided with external configuration information of the data unit, and is configured to receive the external data sent by the data system and associate the external data with the data unit.
In the above solution, the calculating the fluctuation information of the data unit according to the external parameter includes:
identifying a fluctuation day of the data unit and a stable day, wherein the fluctuation day refers to a current date, and the stable day is a day before the current date or a first transaction day of the data unit;
extracting a first external feature of the data unit on the fluctuation day and a second external feature of the data unit on the stable day, wherein the first external feature is used for reflecting the external appearance condition of the data unit on the fluctuation day, and the second external feature is used for reflecting the external appearance condition of the data unit on the stable day;
and obtaining fluctuation information of the data unit according to the first external characteristic and the second external characteristic.
In the above aspect, the obtaining the fluctuation information of the data unit according to the first external feature and the second external feature includes:
Calculating a difference between the first external feature and the second external feature to obtain an external difference value, and/or calculating a ratio between the first external feature and the second external feature to obtain an external ratio, and/or calculating a change percentage between the first external feature and the second external feature to obtain an external percentage;
and summarizing the external difference value and/or the external ratio and/or the external percentage to obtain the fluctuation information.
In the above scheme, the determining at least one abnormal unit from the preset unit set according to the fluctuation information includes:
extracting a fluctuation threshold of a target data unit, and determining the target data unit as an abnormal unit if the fluctuation information of the target data unit exceeds the fluctuation threshold; wherein the target data unit is one data unit in the set of units; and/or
Invoking a preset support vector machine to carry out classification operation on fluctuation information of all data units in the unit set to obtain a normal category and an abnormal category; if the fluctuation information of the target data unit is confirmed to belong to the abnormal category, determining that the target data unit is an abnormal unit; wherein the target data unit is one data unit in the set of units; and/or
Invoking a preset neural network model to operate the fluctuation information of the target data unit to obtain fluctuation category information, and determining the target data unit as an abnormal unit if the fluctuation category information is confirmed to be an abnormal category; wherein the target data unit is one data unit in the set of units.
In the above solution, the generating, by a preset attribute analysis model, change data according to attribute information of the abnormal unit includes:
identifying a fluctuation day of the abnormal unit and a stable day, wherein the fluctuation day refers to a current date, and the stable day is a day before the current date or a first transaction day of the data unit;
extracting a first internal feature of the abnormal unit on the fluctuation day and a second internal feature of the data unit on the stable day through the attribute analysis model, wherein the first internal feature is used for reflecting the internal attribute feature of the data unit on the fluctuation day under one attribute type, and the second internal feature is used for reflecting the internal attribute feature of the data unit on the stable day under the attribute type;
obtaining variation sub-data of the abnormal unit under the attribute type according to the first internal feature and the second internal feature through the attribute analysis model, wherein the variation sub-data characterizes the variation degree of the abnormal unit in a target period from a stable day to a fluctuation day;
And summarizing the change sub-data corresponding to at least one attribute type to obtain the change data.
In the above solution, the obtaining, by the attribute analysis model according to the first internal feature and the second internal feature, change sub-data of the abnormal unit under the attribute type includes:
calculating a difference between the first internal feature and the second internal feature to obtain an external difference value, and/or calculating a ratio between the first internal feature and the second internal feature to obtain an external ratio, and/or calculating a change percentage between the first internal feature and the second internal feature to obtain an external percentage;
and summarizing the external difference value and/or the external ratio and/or the external percentage to obtain the change sub-data under the attribute type.
In the above solution, after generating the change data according to the attribute information of the abnormal unit, the method further includes:
setting the change sub-data with the change degree exceeding a preset change threshold value in the change data as concerned sub-data, and recording the concerned sub-data and/or the change sub-data in the change data into a preset analysis template to obtain an analysis report.
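The attribute comparison and the reporting step described above can be sketched as follows. This is only an illustration under stated assumptions, not the applicant's implementation: the function names, the dictionary layout, the restriction to numeric attributes, and the use of the percentage change as the "change degree" are all hypothetical choices made for the example.

```python
from typing import Dict

def change_sub_data(first_internal: float, second_internal: float) -> Dict[str, float]:
    """Compare one numeric attribute on the fluctuation day (first)
    with the same attribute on the stable day (second)."""
    difference = first_internal - second_internal
    result = {"difference": difference}
    if second_internal != 0:
        # Ratio and percentage are undefined when the stable-day value is zero.
        result["ratio"] = first_internal / second_internal
        result["percentage"] = abs(difference) / abs(second_internal) * 100.0
    return result

def build_change_data(fluct_day: Dict[str, float],
                      stable_day: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Summarize the change sub-data of every attribute type present on both days."""
    return {attr: change_sub_data(fluct_day[attr], stable_day[attr])
            for attr in fluct_day if attr in stable_day}

def attention_sub_data(change_data: Dict[str, Dict[str, float]],
                       change_threshold_pct: float) -> Dict[str, Dict[str, float]]:
    """Keep the sub-data whose percentage change exceeds the (assumed) change threshold."""
    return {attr: sub for attr, sub in change_data.items()
            if sub.get("percentage", 0.0) > change_threshold_pct}

def analysis_report(unit_id: str,
                    change_data: Dict[str, Dict[str, float]],
                    attention: Dict[str, Dict[str, float]]) -> str:
    """Fill a very small text template with the change data."""
    lines = [f"Analysis report for {unit_id}"]
    for attr, sub in change_data.items():
        flag = " [attention]" if attr in attention else ""
        lines.append(f"- {attr}: difference={sub['difference']:.4f}{flag}")
    return "\n".join(lines)
```

For example, with hypothetical attribute values `{"interest_rate": 3.3, "term_years": 5.0}` on the fluctuation day and `{"interest_rate": 3.0, "term_years": 5.0}` on the stable day, a 5% change threshold marks only `interest_rate` as attention sub-data.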
In a second aspect, the present application provides a data unit anomaly identification device, including:
the input module is used for collecting external parameters of a data unit through a preset web crawler and/or a database, and calculating fluctuation information of the data unit according to the external parameters, wherein the external parameters are used for representing external performance of the data unit, and the fluctuation information is used for representing fluctuation conditions of the external performance of the data unit.
And the processing module is used for determining at least one abnormal unit from a preset unit set according to the fluctuation information, wherein the unit set is provided with at least one data unit, and the abnormal unit is one data unit in the unit set.
The operation module is used for generating change data according to the attribute information of the abnormal unit through a preset attribute analysis model, wherein the attribute information is used for representing the internal attribute characteristics of the data unit, and the change data is used for representing the change degree of the attribute information of the abnormal unit in a preset target period.
In a third aspect, the present application provides a computer device comprising: a processor and a memory communicatively coupled to the processor;
The memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the data unit anomaly identification method of the claims.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, are configured to implement the above-described data unit anomaly identification method.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the data unit anomaly identification method described above.
According to the data unit anomaly identification method, device, equipment, storage medium and product, the external parameters of a data unit are collected through the web crawler and/or the database, and the fluctuation information of the data unit is calculated according to the external parameters, so that external parameters are collected over the widest possible range and their comprehensiveness is ensured.
By determining at least one abnormal unit from a preset unit set according to the fluctuation information, data units that may be abnormal are picked out from a large number of data units, so that abnormal data units are identified automatically.
According to the method, the change data are generated according to the attribute information of the abnormal units through a preset attribute analysis model, and then the change degree of each abnormal unit in the target period is obtained through the change data, wherein the change degree is used for representing the reason of the fluctuation abnormality of the abnormal unit, so that the technical effect of timely and quickly mastering the abnormality reason is realized, the manpower resource investment is reduced, the manpower cost is saved, the identification and analysis accuracy of the abnormal units is improved, and the abnormal processing efficiency of the data units is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
Fig. 2 is a flowchart of embodiment 1 of a method for identifying data unit anomalies according to an embodiment of the present application;
Fig. 3 is a flowchart of embodiment 2 of a method for identifying data unit anomalies according to an embodiment of the present application;
Fig. 4 is a block diagram of embodiment 3 of a data unit anomaly identification device according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a hardware structure of the computer device according to the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
Referring to fig. 1, the specific application scenario in the present application is:
The server 11 running the data unit anomaly identification method is connected with the crawler component 12 provided with the web crawler, the database 13 and the attribute analysis model 14; the server 11 collects external parameters of a data unit through the crawler component 12 provided with a preset web crawler and/or through the database 13, and calculates fluctuation information of the data unit according to the external parameters, wherein the external parameters are used for representing external manifestations of the data unit, and the fluctuation information is used for representing fluctuation conditions of the external manifestations of the data unit; the server 11 determines at least one abnormal unit from a preset unit set according to the fluctuation information, wherein the unit set has at least one data unit, and the abnormal unit is one data unit in the unit set; the server 11 generates change data according to the attribute information of the abnormal unit through a preset attribute analysis model 14, wherein the attribute information is used for representing the internal attribute characteristics of the data unit, and the change data is used for representing the change degree of the attribute information of the abnormal unit in a preset target period.
It should be noted that the data unit abnormality identification method, device, equipment, storage medium and product can be used in the financial field, and can also be used in any field other than the financial field. Their application field is not limited.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the prior art problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example 1:
referring to fig. 2, the present application provides a method for identifying anomalies in data units, including:
S201: external parameters of a data unit are collected through a preset web crawler and/or a database, and fluctuation information of the data unit is calculated according to the external parameters, wherein the external parameters are used for representing the external appearance of the data unit, and the fluctuation information is used for representing the fluctuation condition of the external appearance of the data unit.
In this step, the external parameters of a data unit are collected through the web crawler and/or the database, and the fluctuation information of the data unit is calculated according to the external parameters, so that external parameters are collected over the widest possible range and their comprehensiveness is ensured.
In a preferred embodiment, the collection of external parameters of a data unit by a preset web crawler and/or database comprises:
configuring external configuration information of one data unit in a web crawler, and calling the web crawler to acquire external parameters corresponding to the external configuration information from a preset data system; and/or
Setting a database connected with the data system, and calling the database to acquire external data from the data system, wherein the database is provided with external configuration information of a data unit, and is configured to receive the external data sent by the data system and associate the external data with the data unit.
A web crawler is, for example, a program or script that automatically crawls web information according to certain rules.
NoSQL databases such as Redis, MongoDB and HBase are adopted as the databases for collecting external parameters. NoSQL databases are non-relational databases that are easy to scale; they come in many varieties, but share the common characteristic of dropping the relational features of relational databases. Because the data items are not bound by relations, such a database is very easy to scale out.
The external configuration information is metadata corresponding to the external parameters, the metadata describing attributes and properties of the external parameters.
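How external configuration information might associate incoming external data with a data unit can be sketched as below. The record layout (`key`, `date`, and a value field), the `ExternalConfig` class and the function name are assumptions made for illustration; the application does not specify them, and no real crawler or database client is used here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

@dataclass(frozen=True)
class ExternalConfig:
    """Hypothetical external configuration information for one data unit."""
    unit_id: str      # identifier of the data unit, e.g. a bond code
    source_key: str   # key under which the data system reports this unit
    field_name: str   # external parameter to collect, e.g. "price"

def map_external_data(configs: Iterable[ExternalConfig],
                      records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Attach each record sent by the data system to the data unit it belongs to.

    Records whose key is not configured, or which lack the configured
    field, are skipped.
    """
    by_key = {c.source_key: c for c in configs}
    collected: Dict[str, List[dict]] = {}
    for record in records:
        cfg = by_key.get(record.get("key"))
        if cfg is None or cfg.field_name not in record:
            continue
        collected.setdefault(cfg.unit_id, []).append(
            {"date": record["date"], cfg.field_name: record[cfg.field_name]}
        )
    return collected
```

In a real deployment the records would come from the web crawler or from the database connected with the data system; here they are plain dictionaries so that the mapping step can be read in isolation.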
In a preferred embodiment, calculating fluctuation information of the data unit based on the external parameters comprises:
identifying the fluctuation day and the stable day of the data unit, wherein the fluctuation day refers to the current date, and the stable day is the previous day of the current date or the first transaction day of the data unit;
extracting a first external characteristic of the data unit on the fluctuation day and a second external characteristic of the data unit on the stable day, wherein the first external characteristic is used for reflecting the external performance condition of the data unit on the fluctuation day, and the second external characteristic is used for reflecting the external performance condition of the data unit on the stable day;
and obtaining fluctuation information of the data unit according to the first external characteristic and the second external characteristic.
By setting the stable day to the day before the current date, the fluctuation of the external performance on the current day relative to the previous day is reflected, which achieves the technical effect of dynamically monitoring the fluctuation of the data unit.
By setting the stable day to the first transaction day on which the data unit was obtained, the fluctuation of the external performance from the date the data unit was obtained to the current date is reflected, which achieves the technical effect of continuously monitoring the fluctuation of the data unit.
In one exemplary embodiment, the fluctuation day and the stable day of a bond are identified, the fluctuation day being the current date and the stable day being the day before the current date or the first transaction day on which the bond was obtained; a first price of the bond on the fluctuation day and a second price of the bond on the stable day are extracted; and fluctuation information of the bond is obtained according to the first price and the second price.
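The choice of stable day in the bond example can be sketched as follows. The price history is a hypothetical dictionary keyed by date, and "the previous day" is interpreted here as the most recent earlier day that has a recorded price (for instance the previous trading day); that interpretation, like the function names, is an assumption rather than something the application states.

```python
from datetime import date
from typing import Dict, Tuple

def pick_days(prices: Dict[date, float], current: date,
              mode: str = "previous") -> Tuple[date, date]:
    """Return (fluctuation_day, stable_day) for one data unit.

    mode="previous": stable day is the latest recorded day before `current`.
    mode="first":    stable day is the earliest recorded day (first transaction day).
    """
    if current not in prices:
        raise KeyError(f"no price recorded for {current}")
    earlier = sorted(d for d in prices if d < current)
    if not earlier:
        raise ValueError("no earlier day to use as the stable day")
    if mode == "previous":
        stable = earlier[-1]
    elif mode == "first":
        stable = earlier[0]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return current, stable

def external_features(prices: Dict[date, float], current: date,
                      mode: str = "previous") -> Tuple[float, float]:
    """Return (first external feature, second external feature), i.e. the
    price on the fluctuation day and the price on the stable day."""
    fluct_day, stable_day = pick_days(prices, current, mode)
    return prices[fluct_day], prices[stable_day]
```

With `mode="previous"` the pair supports day-over-day monitoring; with `mode="first"` it supports monitoring since the unit was first obtained.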
Further, obtaining fluctuation information of the data unit according to the first external feature and the second external feature includes:
calculating the difference between the first external feature and the second external feature to obtain an external difference value, and/or calculating the ratio between the first external feature and the second external feature to obtain an external ratio, and/or calculating the change percentage between the first external feature and the second external feature to obtain an external percentage;
and summarizing the external difference value and/or the external ratio and/or the external percentage to obtain fluctuation information.
In this example, the external difference value between the first external feature and the second external feature reflects the absolute size of the difference between them;
the external ratio reflects the proportional relationship between the first external feature and the second external feature;
the external percentage reflects the proportion by which the first external feature has changed compared with the second external feature;
therefore, the external difference value, the external ratio and the external percentage broaden the ways in which the change of the first external feature relative to the second external feature can be expressed, which further widens the range of application.
Wherein the external percentage is the absolute value of the percentage increase or the absolute value of the percentage decrease.
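The three measures described above can be written out as a short sketch. The function name and the dictionary keys are hypothetical, the difference is kept signed here as an illustrative choice, and returning `None` for the ratio and percentage when the stable-day value is zero is an assumption the application does not address.

```python
from typing import Dict, Optional

def fluctuation_info(first_external: float,
                     second_external: float) -> Dict[str, Optional[float]]:
    """Summarize difference, ratio and percentage change between the
    fluctuation-day feature (first) and the stable-day feature (second)."""
    difference = first_external - second_external
    if second_external == 0:
        # Ratio and percentage are undefined for a zero baseline.
        ratio = None
        percentage = None
    else:
        ratio = first_external / second_external
        # Absolute value of the percentage increase or decrease.
        percentage = abs(difference / second_external) * 100.0
    return {"difference": difference, "ratio": ratio, "percentage": percentage}
```

For a bond priced at 98.0 on the fluctuation day and 104.0 on the stable day, the difference is -6.0, the ratio is 98/104, and the external percentage is 600/104, roughly 5.77.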
S202: at least one abnormal unit is determined from a preset unit set according to the fluctuation information, wherein the unit set is provided with at least one data unit, and the abnormal unit is one data unit in the unit set.
In this step, at least one abnormal unit is determined from a preset unit set according to the fluctuation information, so that data units that may be abnormal are picked out from a large number of data units and abnormal data units are identified automatically.
In a preferred embodiment, determining at least one abnormal cell from the preset set of cells based on the fluctuation information comprises:
extracting a fluctuation threshold of the target data unit, and determining the target data unit as an abnormal unit if the fluctuation information of the target data unit exceeds the fluctuation threshold; wherein the target data unit is one data unit in the set of units; and/or
Invoking a preset support vector machine to carry out classification operation on fluctuation information of all data units in the unit set to obtain a normal category and an abnormal category; if the fluctuation information of the target data unit is confirmed to belong to the abnormal category, determining that the target data unit is an abnormal unit; wherein the target data unit is one data unit in the set of units; and/or
Invoking a preset neural network model to operate the fluctuation information of the target data unit to obtain fluctuation category information, and determining the target data unit as an abnormal unit if the fluctuation category information is confirmed to be an abnormal category; wherein the target data unit is one data unit of a set of units.
Illustratively, a fluctuation threshold of the target data unit is extracted, wherein the fluctuation threshold comprises: a fluctuation difference threshold and/or a fluctuation proportion threshold and/or a fluctuation percentage threshold.
If the external difference value in the fluctuation information exceeds the fluctuation difference threshold, and/or the external ratio in the fluctuation information exceeds the fluctuation proportion threshold, and/or the external percentage in the fluctuation information exceeds the fluctuation percentage threshold, the fluctuation information of the target data unit is confirmed to exceed the fluctuation threshold. The degree of fluctuation is thereby evaluated from several angles, so that fluctuating target data units are identified accurately.
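A threshold check along these lines might look as follows. The sketch assumes per-unit thresholds stored in dictionaries, treats a unit as abnormal when any one configured measure exceeds its threshold, compares the difference by magnitude, and compares the ratio by its distance from 1; the last two conventions in particular are assumptions, since the application only says the value "exceeds" the threshold.

```python
from typing import Dict, List, Optional

Measures = Dict[str, Optional[float]]

def exceeds_threshold(info: Measures, thresholds: Measures) -> bool:
    """True if any configured measure of `info` exceeds its threshold."""
    for key in ("difference", "ratio", "percentage"):
        limit = thresholds.get(key)
        value = info.get(key)
        if limit is None or value is None:
            continue  # measure not configured or not computable
        # Assumption: a ratio is judged by how far it departs from 1.
        magnitude = abs(value - 1.0) if key == "ratio" else abs(value)
        if magnitude > limit:
            return True
    return False

def find_abnormal_units(fluctuations: Dict[str, Measures],
                        thresholds: Dict[str, Measures]) -> List[str]:
    """Return the ids of the data units whose fluctuation exceeds their own threshold."""
    return [unit_id for unit_id, info in fluctuations.items()
            if exceeds_threshold(info, thresholds.get(unit_id, {}))]
```

Keeping one threshold set per data unit matches the phrase "a fluctuation threshold of the target data unit"; a single shared threshold set would be a simpler variant.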
The support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the largest margin in feature space, and the maximum margin is what distinguishes it from the perceptron; with the kernel trick, the SVM becomes in effect a nonlinear classifier. The learning strategy of the SVM is margin maximization, which can be formulated as a convex quadratic programming problem and is equivalent to minimizing a regularized hinge loss. The learning algorithm of the SVM is therefore an optimization algorithm for solving convex quadratic programs. In this embodiment, the support vector machine is invoked to process the fluctuation information of the data units in the unit set; the data units whose fluctuation information ranks in the top N% by fluctuation intensity are assigned to the abnormal category, and the other data units in the unit set are assigned to the normal category. N is a number greater than zero.
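The "top N%" labeling rule in this embodiment can be illustrated on its own. The sketch below is not a support vector machine; it only shows how units could be split into an abnormal and a normal category once each unit has a single fluctuation-intensity score. Using one score per unit (for example the external percentage) and rounding the cut-off up are assumptions.

```python
import math
from typing import Dict

def label_top_n_percent(intensity: Dict[str, float],
                        n_percent: float) -> Dict[str, str]:
    """Label the units with the strongest fluctuation as 'abnormal'.

    `intensity` maps a unit id to its fluctuation intensity; the top
    n_percent of units (rounded up to a whole unit) become 'abnormal'.
    """
    if not 0 < n_percent <= 100:
        raise ValueError("n_percent must be in (0, 100]")
    ranked = sorted(intensity, key=lambda unit: intensity[unit], reverse=True)
    cutoff = math.ceil(len(ranked) * n_percent / 100.0)
    abnormal = set(ranked[:cutoff])
    return {unit: ("abnormal" if unit in abnormal else "normal") for unit in ranked}
```

Labels produced this way could, for instance, serve as the two categories that a separately trained classifier is asked to reproduce.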
A pre-trained RNN model or a fully connected neural network model is employed as the neural network model. The neural network model is obtained by training the RNN model or the fully connected network on at least one training sample marked with a normal category label or an abnormal category label.
An RNN (Recurrent Neural Network) is used for problems in which the training input is a continuous sequence and the sequences differ in length, such as time-series problems. A fully connected neural network (DNN) is the plainest kind of neural network; it has the most network parameters and the greatest computational cost. The DNN structure is not fixed. A typical neural network comprises an input layer, hidden layers and an output layer; a DNN has exactly one input layer and one output layer, and every layer between them is a hidden layer. Each layer has several neurons (blue circles in the lower diagram); neurons in adjacent layers are connected to each other, neurons within the same layer are not connected, and each neuron in a layer is connected to all neurons in the preceding layer.
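The layer structure described above can be made concrete with a tiny fully connected forward pass. This is a sketch only: the weights are hand-picked for illustration rather than trained, the two input features (absolute difference and external percentage) are an assumed encoding of the fluctuation information, and the ReLU and sigmoid activations are common choices that the application does not name.

```python
import math
from typing import List, Sequence

def dense(inputs: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One fully connected layer: each output neuron is connected to every input."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]

def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))

def classify(features: Sequence[float],
             hidden_w: Sequence[Sequence[float]], hidden_b: Sequence[float],
             out_w: Sequence[float], out_b: float,
             cutoff: float = 0.5) -> str:
    """Input layer -> one hidden layer -> one output neuron -> category."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    score = sigmoid(dense(hidden, [out_w], [out_b])[0])
    return "abnormal" if score >= cutoff else "normal"

# Hand-picked (untrained) weights, for illustration only.
HIDDEN_W = [[0.0, 1.0], [1.0, 0.0]]
HIDDEN_B = [-5.0, -10.0]
OUT_W = [1.0, 1.0]
OUT_B = -0.5
```

In practice the weights would come from training on samples labeled with the normal and abnormal categories, as the embodiment describes.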
S203: change data is generated according to the attribute information of the abnormal unit through a preset attribute analysis model, wherein the attribute information is used for representing the internal attribute characteristics of the data unit, and the change data is used for representing the change degree of the attribute information of the abnormal unit in a preset target period.
In the step, change data are generated according to the attribute information of the abnormal units through a preset attribute analysis model, and then the change degree of each abnormal unit in the target period is obtained through the change data, wherein the change degree is used for representing the reason of the fluctuation abnormality of the abnormal unit, so that the technical effect of timely and quickly mastering the abnormality reason is realized, the manpower resource investment is reduced, the manpower cost is saved, the identification and analysis accuracy of the abnormal units is improved, and the abnormal processing efficiency of the data units is improved.
In this embodiment, the attribute analysis model obtains the attribute information of the abnormal unit according to preset attribute metadata, and classifies and statistically analyzes that information, for example to view the trend in the number of users by registration time or the distribution of users across provinces. The attribute metadata includes: interest rate, repayment mode, whether registered (named), whether redeemable, term, credit rating, guarantor, etc. The attribute metadata may further include: the name, age, family, marital status, gender and highest level of education associated with the data unit. The attribute metadata may also include product-related attributes: the province in which the data unit is located, the data unit level, the source of the data unit's first access channel, and the like.
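As a rough illustration of the classification and statistics mentioned above, the following sketch computes the share of users per province. The record layout, the field names `province` and `term`, and the function name `distribution` are assumptions made for the example and do not come from the application.

```python
from collections import Counter


def distribution(records, attribute):
    """Return the share of records per value of one attribute (records lacking it are skipped)."""
    counts = Counter(r[attribute] for r in records if attribute in r)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()} if total else {}


# Hypothetical attribute information of one abnormal unit.
users = [
    {"province": "Guangdong", "term": 12},
    {"province": "Zhejiang", "term": 6},
    {"province": "Guangdong", "term": 6},
    {"province": "Sichuan", "term": 12},
]
shares = distribution(users, "province")
```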
In a preferred embodiment, generating the change data from the attribute information of the abnormal unit by a preset attribute analysis model includes:
identifying the fluctuation day and the stable day of the abnormal unit, wherein the fluctuation day refers to the current date, and the stable day is the previous day of the current date or the first transaction day of the data unit;
extracting a first internal feature of the abnormal unit on the fluctuation day and a second internal feature of the data unit on the stable day through an attribute analysis model, wherein the first internal feature is used for reflecting an internal attribute feature of the data unit on the fluctuation day under one attribute type, and the second internal feature is used for reflecting an internal attribute feature of the data unit on the stable day under the attribute type;
Obtaining change sub-data of the abnormal unit under the attribute type according to the first internal feature and the second internal feature through the attribute analysis model, wherein the change sub-data represents the change degree of the abnormal unit in a target period from a stable day to a fluctuation day;
and summarizing the change sub-data corresponding to at least one attribute type to obtain change data.
Setting the stable day to the day before the current date makes it possible to monitor the fluctuation of the data unit dynamically, reflecting how the internal attribute features on the current date have changed compared with the previous day.
Setting the stable day to the first transaction day of the data unit captures how the internal attribute features have changed from that day to the current date, which makes it possible to monitor the fluctuation of the data unit continuously.
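The two choices of stable day, and the pairing of first and second internal features per attribute type, can be sketched as follows. This is an assumption-laden illustration: the `history` layout (date to attribute values), the mode names and both function names are hypothetical.

```python
from datetime import date, timedelta


def stable_day(fluctuation_day, first_transaction_day, mode="previous_day"):
    """Pick the stable day: the day before the fluctuation day, or the first transaction day."""
    if mode == "previous_day":
        return fluctuation_day - timedelta(days=1)
    if mode == "first_transaction_day":
        return first_transaction_day
    raise ValueError(f"unknown mode: {mode}")


def feature_pairs(history, fluctuation_day, stable):
    """Pair each attribute's value on the fluctuation day with its value on the stable day."""
    current, baseline = history[fluctuation_day], history[stable]
    return {attr: (current[attr], baseline[attr]) for attr in current if attr in baseline}


# Hypothetical internal attribute features of one abnormal unit, keyed by date.
history = {
    date(2023, 4, 1): {"interest_rate": 3.0, "registered_users": 1000},
    date(2023, 4, 3): {"interest_rate": 3.1, "registered_users": 1180},
    date(2023, 4, 4): {"interest_rate": 3.1, "registered_users": 1500},
}
today = date(2023, 4, 4)
previous = stable_day(today, date(2023, 4, 1))
pairs = feature_pairs(history, today, previous)
```

Each pair is (first internal feature, second internal feature) for one attribute type; the change sub-data would be computed from these pairs.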
Further, obtaining, by the attribute analysis model, change sub-data of the abnormal unit under the attribute type according to the first internal feature and the second internal feature, including:
calculating the difference between the first internal feature and the second internal feature to obtain an attribute difference value, and/or calculating the ratio between the first internal feature and the second internal feature to obtain an attribute ratio, and/or calculating the change percentage between the first internal feature and the second internal feature to obtain an attribute percentage;
and summarizing the attribute difference value and/or the attribute ratio and/or the attribute percentage to obtain the change sub-data under the attribute type.
In this example, the attribute difference value reflects the absolute difference between the first internal feature and the second internal feature; the attribute ratio reflects their proportional change; and the attribute percentage reflects the proportion by which the first internal feature has changed relative to the second. Together, the attribute difference value, the attribute ratio and the attribute percentage broaden the ways in which the change of the first internal feature relative to the second can be expressed, and thereby broaden the range of application. The attribute percentage is the absolute value of the percentage increase or of the percentage decrease.
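A minimal sketch of the difference, ratio and percentage for one attribute type is given below. The function name, the dictionary keys and the handling of a zero baseline (returning `None`) are assumptions; the text does not say how a zero second feature is treated.

```python
def change_sub_data(first, second):
    """Compare the feature on the fluctuation day (first) with the one on the stable day (second)."""
    difference = first - second
    # Assumption: ratio and percentage are undefined when the stable-day value is zero.
    ratio = first / second if second != 0 else None
    # Absolute value of the percentage increase or decrease, as stated in the text.
    percentage = abs(difference) / abs(second) * 100 if second != 0 else None
    return {"difference": difference, "ratio": ratio, "percentage": percentage}


sub_data = change_sub_data(first=1500, second=1200)
```

For these inputs the difference is 300, the ratio is 1.25 and the percentage is 25.0.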
Example 2:
Referring to FIG. 3, the present application provides a method for identifying anomalies in data units, including:
S301: Collect external parameters of one data unit through a preset web crawler and/or a database, and calculate fluctuation information of the data unit according to the external parameters, wherein the external parameters are used for representing the external appearance of the data unit, and the fluctuation information is used for representing the fluctuation condition of the external appearance of the data unit.
This step corresponds to S201 in embodiment 1, and thus will not be described here.
S302: at least one abnormal unit is determined from a preset unit set according to the fluctuation information, wherein the unit set is provided with at least one data unit, and the abnormal unit is one data unit in the unit set.
This step corresponds to S202 in embodiment 1, and thus will not be described here.
S303: Generate change data from the attribute information of the abnormal unit through a preset attribute analysis model, wherein the attribute information is used for representing the internal attribute characteristics of the data unit, and the change data is used for representing the change degree of the attribute information of the abnormal unit in a preset target period.
This step corresponds to S203 in embodiment 1, and thus will not be described here.
S304: Set the change sub-data whose change degree exceeds a preset change threshold in the change data as concerned sub-data, and record the concerned sub-data and/or the change sub-data in the change data into a preset analysis template to obtain an analysis report.
In this step, the change sub-data whose change degree exceeds the preset change threshold is set as concerned sub-data, so that a user can find the change sub-data of interest more quickly and intuitively, which improves the efficiency of information interaction. The analysis report is obtained by entering the concerned sub-data and/or the change sub-data of the change data into a preset analysis template, which makes the change data convenient for the user to view and improves the efficiency of information transfer.
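Filling a preset analysis template could look like the sketch below, which uses Python's standard `string.Template`. The template wording, the field names and the representation of change sub-data as signed percentages are assumptions made for illustration only.

```python
from string import Template

# Hypothetical analysis template; the application does not specify its layout.
REPORT_TEMPLATE = Template(
    "Analysis report for $unit ($stable_day to $fluctuation_day)\n"
    "Concerned sub-data:\n$concerned\n"
    "All change sub-data:\n$all_changes"
)


def render_report(unit, stable_day, fluctuation_day, changes, concerned):
    """changes: attribute type -> percentage change; concerned: attribute types to highlight."""
    def fmt(names):
        return "\n".join(f"  {name}: {changes[name]:+.2f}%" for name in names) or "  (none)"

    return REPORT_TEMPLATE.substitute(
        unit=unit,
        stable_day=stable_day,
        fluctuation_day=fluctuation_day,
        concerned=fmt(concerned),
        all_changes=fmt(sorted(changes)),
    )


report = render_report(
    "unit-001", "2023-04-03", "2023-04-04",
    {"registered_users": 25.0, "interest_rate": 0.0},
    ["registered_users"],
)
```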
For example, one item of change sub-data is extracted from the change data of the abnormal unit, together with the change threshold corresponding to that change sub-data; if the change sub-data is confirmed to exceed the change threshold, it is determined to be concerned sub-data.
The change sub-data comprises a change difference and/or a change ratio and/or a change percentage.
The change threshold includes: a change difference threshold and/or a change ratio threshold and/or a change percentage threshold.
If the change difference in the change sub-data exceeds the change difference threshold, and/or the change ratio exceeds the change ratio threshold, and/or the change percentage exceeds the change percentage threshold, the change sub-data of the abnormal unit is confirmed to exceed the change threshold. The degree of change is thus evaluated from several angles, so that concerned sub-data with larger changes is identified accurately.
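The and/or threshold check can be sketched as below: an attribute type becomes concerned sub-data as soon as any one of its configured metrics exceeds its threshold. The data layout and names are hypothetical, and comparing magnitudes with `abs` is an assumption that suits differences and percentages but may not suit ratios.

```python
def concerned_sub_data(change_data, thresholds):
    """Return, per attribute type, the metrics whose magnitude exceeds the configured threshold."""
    concerned = {}
    for attr, metrics in change_data.items():
        limits = thresholds.get(attr, {})
        exceeded = [
            name
            for name, value in metrics.items()
            # Assumption: metrics without a threshold, or with no value, are ignored.
            if name in limits and value is not None and abs(value) > limits[name]
        ]
        if exceeded:
            concerned[attr] = exceeded
    return concerned


change_data = {
    "interest_rate": {"difference": 0.1, "percentage": 3.3},
    "registered_users": {"difference": 300, "percentage": 25.0},
}
thresholds = {
    "interest_rate": {"percentage": 5.0},
    "registered_users": {"difference": 200, "percentage": 20.0},
}
flagged = concerned_sub_data(change_data, thresholds)
```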
Example 3:
Referring to FIG. 4, the present application provides a data unit anomaly identification device 4, including:
The input module 41 is configured to collect external parameters of a data unit through a preset web crawler and/or a database, and calculate fluctuation information of the data unit according to the external parameters, where the external parameters are used to represent the external appearance of the data unit, and the fluctuation information is used to represent the fluctuation of the external appearance of the data unit.
The processing module 42 is configured to determine at least one abnormal unit from a preset unit set according to the fluctuation information, where the unit set has at least one data unit, and the abnormal unit is one data unit in the unit set.
The operation module 43 is configured to generate change data from the attribute information of the abnormal unit through a preset attribute analysis model, where the attribute information is used to characterize an internal attribute feature of the data unit, and the change data is used to characterize a degree of change of the attribute information of the abnormal unit during a preset target period.
Optionally, the data unit anomaly identification device 4 further includes:
The output module 44 is configured to set the change sub-data whose change degree exceeds a preset change threshold in the change data as concerned sub-data, and record the concerned sub-data and/or the change sub-data in the change data into a preset analysis template to obtain an analysis report.
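How the four modules hand data to one another can be sketched as a small pipeline. This is an illustrative wiring only: the class name, the callable signatures and the lambda stand-ins are assumptions, and the real modules would wrap the crawler or database, the classifier, the attribute analysis model and the analysis template.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AnomalyIdentificationDevice:
    input_module: Callable[[str], float]                      # unit id -> fluctuation information
    processing_module: Callable[[Dict[str, float]], List[str]]  # fluctuation info -> abnormal units
    operation_module: Callable[[str], Dict[str, float]]        # abnormal unit -> change data
    output_module: Optional[Callable[[str, Dict[str, float]], str]] = None  # optional report step

    def run(self, unit_set):
        fluctuation = {unit: self.input_module(unit) for unit in unit_set}
        abnormal = self.processing_module(fluctuation)
        results = {unit: self.operation_module(unit) for unit in abnormal}
        if self.output_module is not None:
            return {unit: self.output_module(unit, data) for unit, data in results.items()}
        return results


# Hypothetical stand-ins for the real modules.
device = AnomalyIdentificationDevice(
    input_module=lambda unit: {"A": 0.02, "B": 0.4}[unit],
    processing_module=lambda info: [u for u, v in info.items() if v > 0.1],
    operation_module=lambda unit: {"registered_users": 25.0},
)
result = device.run(["A", "B"])
```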
Example 4:
To achieve the above object, the present application further provides a computer device 5, including: a processor and a memory communicatively coupled to the processor; the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the data unit anomaly identification method described above. The components of the data unit anomaly identification device may be distributed across different computer devices, and the computer device 5 may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server, or a server cluster formed by multiple application servers) that executes programs, and so on. The computer device of the present embodiment includes at least, but is not limited to: a memory 51 and a processor 52, which may be communicatively coupled to each other via a system bus, as shown in FIG. 5. It should be noted that FIG. 5 shows only a computer device having these components; it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead. In the present embodiment, the memory 51 (i.e., readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 51 may be an internal storage unit of the computer device, such as a hard disk or memory of the computer device. In other embodiments, the memory 51 may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card) or the like. Of course, the memory 51 may also include both internal storage units of the computer device and external storage devices.
In this embodiment, the memory 51 is generally used to store the operating system and the various types of application software installed on the computer device, such as the program code of the data unit anomaly identification device of Example 3. The memory 51 may also be used to temporarily store various types of data that have been output or are to be output. In some embodiments, the processor 52 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 52 is typically used to control the overall operation of the computer device. In this embodiment, the processor 52 is configured to run the program code stored in the memory 51 or to process data, for example to run the data unit anomaly identification device, so as to implement the data unit anomaly identification method of the above embodiments.
The integrated modules, when implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods of the various embodiments of the present application. It should be appreciated that the processor may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor. The memory may comprise a high-speed RAM, and may further comprise a non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, etc.
To achieve the above object, the present application further provides a computer readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an application store, etc., on which computer-executable instructions are stored that perform the corresponding functions when executed by the processor 52. The computer readable storage medium of the present embodiment stores computer-executable instructions which, when executed by the processor 52, implement the data unit anomaly identification method of the above embodiments.
The storage medium may be implemented by any type of volatile or nonvolatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (ASIC). It is also possible that the processor and the storage medium reside as discrete components in an electronic device or a master device.
The application provides a computer program product, comprising a computer program, which realizes the data unit abnormality identification method when being executed by a processor.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (12)

1. A method for identifying anomalies in a data unit, comprising:
collecting external parameters of a data unit through a preset web crawler and/or a database, and calculating fluctuation information of the data unit according to the external parameters, wherein the external parameters are used for representing external manifestations of the data unit, and the fluctuation information is used for representing fluctuation conditions of the external manifestations of the data unit;
Determining at least one abnormal unit from a preset unit set according to the fluctuation information, wherein the unit set is provided with at least one data unit, and the abnormal unit is one data unit in the unit set;
and generating change data according to the attribute information of the abnormal unit through a preset attribute analysis model, wherein the attribute information is used for representing the internal attribute characteristics of the data unit, and the change data is used for representing the change degree of the attribute information of the abnormal unit in a preset target period.
2. The method for identifying abnormal data according to claim 1, wherein the collecting the external parameters of a data unit through a preset web crawler and/or database comprises:
configuring external configuration information of one data unit in the web crawler, and calling the web crawler to acquire external parameters corresponding to the external configuration information from a preset data system; and/or
Setting a database connected with the data system, and calling the database to obtain external data from the data system, wherein the database is provided with external configuration information of the data unit, and is configured to receive the external data sent by the data system and associate the external data with the data unit.
3. The data anomaly identification method of claim 1, wherein the calculating fluctuation information of the data unit from the external parameter comprises:
identifying a fluctuation day of the data unit and a stable day, wherein the fluctuation day refers to a current date, and the stable day is a day before the current date or a first transaction day of the data unit;
extracting a first external feature of the data unit on the fluctuation day and a second external feature of the data unit on the stable day, wherein the first external feature is used for reflecting the external appearance condition of the data unit on the fluctuation day, and the second external feature is used for reflecting the external appearance condition of the data unit on the stable day;
and obtaining fluctuation information of the data unit according to the first external characteristic and the second external characteristic.
4. A data anomaly identification method according to claim 3, wherein the deriving fluctuation information of the data unit from the first external feature and the second external feature comprises:
calculating a difference between the first external feature and the second external feature to obtain an external difference value, and/or calculating a ratio between the first external feature and the second external feature to obtain an external ratio, and/or calculating a change percentage between the first external feature and the second external feature to obtain an external percentage;
And summarizing the external difference value and/or the external ratio and/or the external percentage to obtain the fluctuation information.
5. The method of claim 1, wherein the determining at least one abnormal unit from a preset unit set according to the fluctuation information comprises:
extracting a fluctuation threshold of a target data unit, and determining the target data unit as an abnormal unit if the fluctuation information of the target data unit exceeds the fluctuation threshold; wherein the target data unit is one data unit in the set of units; and/or
Invoking a preset support vector machine to carry out classification operation on fluctuation information of all data units in the unit set to obtain a normal category and an abnormal category; if the fluctuation information of the target data unit is confirmed to belong to the abnormal category, determining that the target data unit is an abnormal unit; wherein the target data unit is one data unit in the set of units; and/or
Invoking a preset neural network model to operate the fluctuation information of the target data unit to obtain fluctuation category information, and determining the target data unit as an abnormal unit if the fluctuation category information is confirmed to be an abnormal category; wherein the target data unit is one data unit in the set of units.
6. The data anomaly identification method according to claim 1, wherein the generating of the change data from the attribute information of the anomaly unit by a preset attribute analysis model includes:
identifying a fluctuation day of the abnormal unit and a stable day, wherein the fluctuation day refers to a current date, and the stable day is a day before the current date or a first transaction day of the data unit;
extracting a first internal feature of the abnormal unit on the fluctuation day and a second internal feature of the data unit on the stable day through the attribute analysis model, wherein the first internal feature is used for reflecting the internal attribute feature of the data unit on the fluctuation day under one attribute type, and the second internal feature is used for reflecting the internal attribute feature of the data unit on the stable day under the attribute type;
obtaining variation sub-data of the abnormal unit under the attribute type according to the first internal feature and the second internal feature through the attribute analysis model, wherein the variation sub-data characterizes the variation degree of the abnormal unit in a target period from a stable day to a fluctuation day;
and summarizing the change sub-data corresponding to at least one attribute type to obtain the change data.
7. The method for identifying data anomalies according to claim 6, wherein said obtaining, by the attribute analysis model, change sub-data of the anomaly unit under the attribute type according to the first internal feature and the second internal feature, includes:
calculating a difference between the first internal feature and the second internal feature to obtain an external difference value, and/or calculating a ratio between the first internal feature and the second internal feature to obtain an external ratio, and/or calculating a change percentage between the first internal feature and the second internal feature to obtain an external percentage;
and summarizing the external difference value and/or the external ratio and/or the external percentage to obtain the change sub-data under the attribute type.
8. The method according to any one of claims 1 to 7, characterized in that after the generating of the change data from the attribute information of the abnormal unit, the method further comprises:
setting the change sub-data with the change degree exceeding a preset change threshold value in the change data as concerned sub-data, and recording the concerned sub-data and/or the change sub-data in the change data into a preset analysis template to obtain an analysis report.
9. A data unit anomaly identification device, comprising:
the input module is used for collecting external parameters of a data unit through a preset web crawler and/or a database, and calculating fluctuation information of the data unit according to the external parameters, wherein the external parameters are used for representing external performance of the data unit, and the fluctuation information is used for representing fluctuation conditions of the external performance of the data unit;
a processing module, configured to determine at least one abnormal unit from a preset unit set according to the fluctuation information, where the unit set has at least one data unit, and the abnormal unit is one data unit in the unit set;
the operation module is used for generating change data according to the attribute information of the abnormal unit through a preset attribute analysis model, wherein the attribute information is used for representing the internal attribute characteristics of the data unit, and the change data is used for representing the change degree of the attribute information of the abnormal unit in a preset target period.
10. A computer device, comprising: a processor and a memory communicatively coupled to the processor;
The memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the data unit anomaly identification method of any one of claims 1 to 8.
11. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to implement the data unit anomaly identification method of any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the data unit anomaly identification method of any one of claims 1 to 8.
CN202310357992.2A 2023-04-04 2023-04-04 Data unit abnormality identification method, device, equipment, storage medium and product Pending CN116385163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310357992.2A CN116385163A (en) 2023-04-04 2023-04-04 Data unit abnormality identification method, device, equipment, storage medium and product

Publications (1)

Publication Number Publication Date
CN116385163A true CN116385163A (en) 2023-07-04

Family

ID=86978331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310357992.2A Pending CN116385163A (en) 2023-04-04 2023-04-04 Data unit abnormality identification method, device, equipment, storage medium and product

Country Status (1)

Country Link
CN (1) CN116385163A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination