CN112597209A - Data verification method, device and system and computer readable storage medium - Google Patents

Publication number
CN112597209A
Authority
CN
China
Prior art keywords
data
degree
determining
abnormal
unsupervised
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011480066.7A
Other languages
Chinese (zh)
Inventor
朱晨鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202011480066.7A
Publication of CN112597209A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The application relates to the technical field of financial technology and discloses a data verification method, device, system, and computer-readable storage medium. The method comprises the following steps: responding to a user instruction, and determining corresponding abnormal degree data based on the user instruction and an unsupervised anomaly identification algorithm; determining business target data, and constructing a corresponding data verification model based on the abnormal degree data and the business target data; and verifying the validity of the unsupervised anomaly identification algorithm based on the data verification model. Because the abnormal degree data is determined from the user instruction together with the unsupervised anomaly identification algorithm, the accuracy of the abnormal degree data is ensured. Because the data verification model is constructed from the abnormal degree data and the business target data, and the validity of the unsupervised anomaly identification algorithm is then verified through that model, the relevance between the data verification model and the business target data is ensured.

Description

Data verification method, device and system and computer readable storage medium
Technical Field
The present application relates to the field of financial technology (Fintech) data processing technologies, and in particular, to a method, an apparatus, a system, and a computer-readable storage medium for data verification.
Background
With the development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually changing to financial technology (Fintech), but higher requirements are also put forward on the data verification technology due to the requirements of security and real-time performance of the financial industry.
Existing data verification methods for the risk management of small and micro enterprises mainly comprise expert experience verification and unsupervised algorithm verification. Expert experience verification relies on subjective judgment rather than on objective calculation based on statistical analysis or a model algorithm; for example, feature importance is judged, and variables are weighted, according to the expert's experience. Unsupervised algorithm verification is typically unsupervised anomaly detection, which looks for differences between samples. However, expert experience verification requires extensive accumulated industry experience and its conclusions are not backed by persuasive data, while in unsupervised algorithm verification the model learned from the data does not necessarily have a direct link to the real business goal.
The above is only for the purpose of assisting understanding of the technical solutions of the present application, and does not represent an admission that the above is prior art.
Disclosure of Invention
The application mainly aims to provide a data verification method, a data verification device, a data verification system and a computer-readable storage medium, and aims to ensure the relevance between a data model and business target data.
In order to achieve the above object, the present application provides a data verification method, which includes the steps of:
responding to a user instruction, and determining corresponding abnormal degree data based on the user instruction and an unsupervised abnormal recognition algorithm;
determining business target data, and constructing a corresponding data verification model based on the abnormal degree data and the business target data;
verifying the validity of the unsupervised anomaly identification algorithm based on the data verification model.
Optionally, the step of verifying the validity of the unsupervised anomaly identification algorithm based on the data verification model comprises:
and determining the degree of correlation between the abnormal degree data and the business target data in the data verification model, and verifying the effectiveness of the unsupervised abnormal recognition algorithm based on the degree of correlation.
Optionally, the step of verifying the effectiveness of the unsupervised anomaly identification algorithm based on the degree of relevance comprises:
determining whether the degree of correlation is greater than or equal to a preset degree of correlation;
if the relevance degree is larger than or equal to the preset relevance degree, determining that the unsupervised abnormal recognition algorithm is effective;
and if the relevance degree is smaller than the preset relevance degree, determining that the unsupervised abnormal recognition algorithm is invalid.
Optionally, the step of determining business target data, and constructing a corresponding data verification model based on the abnormality degree data and the business target data includes:
determining the abnormal degree data as independent variable data, and detecting whether a service data label exists;
if the business data label is detected to exist, the business target data is determined based on the business data label, and the business target data is determined as target variable data;
and constructing the data verification model based on the independent variable data and the target variable data.
Optionally, after the step of detecting whether there is an identifiable service data tag, the method further includes:
and if no business data label is detected, determining the business target data based on the user instruction, and determining the business target data as the target variable data.
Optionally, the step of determining corresponding abnormality degree data based on the user instruction and an unsupervised abnormality recognition algorithm includes:
determining corresponding user data and data characteristics corresponding to the user data based on the user instruction;
performing data analysis on the user data based on the unsupervised anomaly identification algorithm and the data characteristics to determine the data of the anomaly points of the user data;
determining corresponding anomaly data based on the anomaly point data.
Optionally, the step of determining corresponding user data based on the user instruction includes:
determining database information, data content information, data distribution information and data magnitude information in the user instruction;
and determining user data corresponding to the user instruction based on the database information, the data content information, the data distribution information and the data magnitude information.
The embodiment of the present application further provides a data verification apparatus, where the data verification apparatus includes:
the determining module is used for responding to a user instruction and determining corresponding abnormal degree data based on the user instruction and an unsupervised abnormal recognition algorithm;
the construction module is used for determining the business target data and constructing a corresponding data verification model based on the abnormal degree data and the business target data;
and the verification module is used for verifying the effectiveness of the unsupervised anomaly identification algorithm based on the data verification model.
The embodiment of the present application further provides a data verification system, where the data verification system includes a memory, a processor, and a data verification program stored in the memory and running on the processor, and when executed by the processor, the data verification program implements the steps of the data verification method described above.
Further, to achieve the above object, the present application also provides a computer-readable storage medium having stored thereon a data verification program that, when executed by a processor, implements the steps of the data verification method as described above.
The embodiment of the application provides a data verification method, a device and a system and a computer readable storage medium, wherein corresponding abnormal degree data is determined based on a user instruction and an unsupervised abnormal recognition algorithm by responding to the user instruction; determining service target data, and constructing a corresponding data verification model based on the abnormal degree data and the service target data; and verifying the effectiveness of the unsupervised anomaly identification algorithm based on the data verification model. Therefore, in the data verification process, the abnormality degree data is determined through the user instruction and the unsupervised abnormality identification algorithm, and the accuracy of the abnormality degree data is guaranteed. And then, a data verification model is constructed through the abnormal degree data and the service target data, and the effectiveness of the unsupervised abnormal recognition algorithm is verified through the data verification model, so that the relevance between the data verification model and the service target data is ensured.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a first embodiment of a method for verifying data of the present application;
FIG. 3 is a schematic flow chart of the data verification method of the present application;
FIG. 4 is a schematic structural diagram of a preferred data verification device of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in fig. 1, fig. 1 is a schematic system structure diagram of a hardware operating environment according to an embodiment of the present application. The data verification system may include: a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the data verification system architecture shown in FIG. 1 does not constitute a limitation on the data verification system, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer-readable storage medium, may include therein an operating system, a network communication module, a user interface module, and a verification program of data.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client and performing data communication with the client; and the processor 1001 may be configured to invoke a verification procedure for the data stored in the memory 1005 and perform the following operations:
responding to a user instruction, and determining corresponding abnormal degree data based on the user instruction and an unsupervised abnormal recognition algorithm;
determining business target data, and constructing a corresponding data verification model based on the abnormal degree data and the business target data;
verifying the validity of the unsupervised anomaly identification algorithm based on the data verification model.
Further, the processor 1001 may call a verification program of the data stored in the memory 1005, and also perform the following operations:
and determining the degree of correlation between the abnormal degree data and the business target data in the data verification model, and verifying the effectiveness of the unsupervised abnormal recognition algorithm based on the degree of correlation.
Further, the processor 1001 may call a verification program of the data stored in the memory 1005, and also perform the following operations:
determining whether the degree of correlation is greater than or equal to a preset degree of correlation;
if the relevance degree is larger than or equal to the preset relevance degree, determining that the unsupervised abnormal recognition algorithm is effective;
and if the relevance degree is smaller than the preset relevance degree, determining that the unsupervised abnormal recognition algorithm is invalid.
Further, the processor 1001 may call a verification program of the data stored in the memory 1005, and also perform the following operations:
determining the abnormal degree data as independent variable data, and detecting whether a service data label exists;
if the business data label is detected to exist, the business target data is determined based on the business data label, and the business target data is determined as target variable data;
and constructing the data verification model based on the independent variable data and the target variable data.
Further, the processor 1001 may call a verification program of the data stored in the memory 1005, and also perform the following operations:
and if no business data label is detected, determining the business target data based on the user instruction, and determining the business target data as the target variable data.
Further, the processor 1001 may call a verification program of the data stored in the memory 1005, and also perform the following operations:
determining corresponding user data and data characteristics corresponding to the user data based on the user instruction;
performing data analysis on the user data based on the unsupervised anomaly identification algorithm and the data characteristics to determine the data of the anomaly points of the user data;
determining corresponding anomaly data based on the anomaly point data.
Further, the processor 1001 may call a verification program of the data stored in the memory 1005, and also perform the following operations:
determining database information, data content information, data distribution information and data magnitude information in the user instruction;
and determining user data corresponding to the user instruction based on the database information, the data content information, the data distribution information and the data magnitude information.
The present application provides a data verification method, and referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of the data verification method of the present application.
This embodiment of the present application provides an embodiment of the data verification method. It should be noted that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that shown or described herein.
The embodiment of the application takes a data verification system as an execution subject for illustration, and the data verification method comprises the following steps:
and step S10, responding to the user instruction, and determining corresponding abnormal degree data based on the user instruction and the unsupervised abnormal recognition algorithm.
The user enters a user instruction in an input interface of the data verification system according to user experience. The user instruction indicates the corresponding user data source, the data characteristics of that data source, business data information, and the like. Users include, but are not limited to, individual users, small and micro enterprises, medium-sized enterprises, and large enterprises; it should be noted that the embodiments of the present application are mainly directed at small and micro enterprises.
When the data verification system detects that a user instruction has been entered in the input interface, it responds to the user instruction, analyzes the user data source with an unsupervised anomaly identification algorithm in combination with the data characteristics of the user data source, and identifies isolated points in the user data source; the identified isolated points are determined to be abnormal data points. The unsupervised anomaly identification algorithm includes, but is not limited to, the local outlier factor (LOF) algorithm, the DBSCAN clustering algorithm, the one-class SVM algorithm, and the isolation forest algorithm, and this embodiment is not limited in this respect.
In the application scenario of the unsupervised anomaly identification algorithm, only a ranking of the abnormal data points is needed: the most abnormal portion of the abnormal data points is determined to be abnormal data. The output is an abnormal value for the abnormal data, and this abnormal value is the abnormal degree data corresponding to the abnormal data.
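The scoring-and-ranking idea can be illustrated with a minimal sketch. It does not implement any of the algorithms named in this embodiment; as a stand-in, it scores each point by its mean distance to its k nearest neighbours, which is an assumption chosen only because it needs nothing beyond the standard library. The function names, the value of k, and the sample points are all hypothetical.

```python
import math

def knn_anomaly_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours.

    This is a simple stand-in for an unsupervised anomaly score; a larger
    score means the point is more isolated from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def most_abnormal(points, top_n=1, k=2):
    """Rank points by score and return (index, score) for the top_n most abnormal."""
    scores = knn_anomaly_scores(points, k)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

# Hypothetical two-feature user data: four clustered points and one isolated point.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
top = most_abnormal(points, top_n=1)
```

Only the ranking matters here: the isolated point at index 4 receives the highest score, and its score plays the role of the abnormal value that is output for it.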
Further, the step S10 includes:
step S101, determining corresponding user data and data characteristics corresponding to the user data based on the user instruction;
step S102, performing data analysis on the user data based on the unsupervised anomaly identification algorithm and the data characteristics, and determining the data of the anomaly points of the user data;
step S103, determining corresponding abnormal degree data based on the abnormal point data.
Specifically, the data verification system determines the user data information contained in the user instruction and, according to that information, determines the user data corresponding to the user instruction and the data characteristics corresponding to the user data. Using the unsupervised anomaly identification algorithm and the data characteristics, it then analyzes the user data for normal and abnormal data to identify abnormal point data that deviates from the normal point data. The data verification system then outputs an abnormal value corresponding to the abnormal point data, generally expressed as an anomaly score; subtracts the preset standard value from the abnormal value to obtain an abnormal difference value; determines the preset difference value stage in which the abnormal difference value falls; and determines the abnormal degree data corresponding to the abnormal point data from that stage. The preset standard value and the preset difference value stages are set according to actual conditions, and this embodiment is not limited in this respect.
In this embodiment, for example, the preset standard value is 0.5, and the preset difference value stages are 0 to 0.1 for the first stage, 0.11 to 0.3 for the second stage, 0.31 to 0.6 for the third stage, and greater than 0.6 for the fourth stage. If the abnormal value corresponding to the abnormal point data is 0.95, the data verification system determines that the abnormal difference value is 0.95 - 0.5 = 0.45; because 0.45 is greater than 0.31 and less than 0.6, the abnormal degree data corresponding to the abnormal point data is determined to be the third stage.
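The stage lookup in this example can be sketched as follows. The standard value 0.5 and the stage boundaries come from the example; treating each boundary as an inclusive upper bound (so that a difference between 0.1 and 0.11 falls into the second stage) and rejecting negative differences are assumptions, since the text does not cover those cases. The names are hypothetical.

```python
PRESET_STANDARD = 0.5

# (inclusive upper bound of the abnormal difference value, stage number)
STAGE_UPPER_BOUNDS = [(0.1, 1), (0.3, 2), (0.6, 3)]

def abnormality_stage(abnormal_value, standard=PRESET_STANDARD):
    """Map an abnormal value to its preset difference value stage (1 to 4)."""
    # Round to suppress floating-point noise such as 0.95 - 0.5 = 0.44999...
    diff = round(abnormal_value - standard, 10)
    if diff < 0:
        # Assumption: the example only covers values at or above the standard.
        raise ValueError("abnormal value is below the preset standard value")
    for upper, stage in STAGE_UPPER_BOUNDS:
        if diff <= upper:
            return stage
    return 4  # difference greater than 0.6

stage = abnormality_stage(0.95)  # difference 0.45 falls in the third stage
```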
Further, in step S101, the step of determining corresponding user data based on the user instruction includes:
step S1011, determining database information, data content information, data distribution information and data magnitude information in the user instruction;
step S1012, determining user data corresponding to the user instruction based on the database information, the data content information, the data distribution information, and the data magnitude information.
Specifically, the data verification system parses the user instruction and determines the database information, data content information, data distribution information, and data magnitude information carried in it. The data verification system then determines, according to the database information (such as the name or address of the database), the database from which the corresponding data needs to be acquired. Next, it determines what data is to be acquired according to the data content information, determines the intrinsic distribution of the data according to the data distribution information, and determines the magnitude of the data according to the data magnitude information. Finally, the data verification system determines the user data to be acquired from the database according to the data content information, the data distribution information, and the data magnitude information.
In this embodiment, for example, the address of the database is "101.1.12.1.0", the data content information is "withdrawal data", the data distribution information is "January 2020 to November 2020", and the data magnitude information is "10000 pieces". The data verification system then obtains 10000 pieces of withdrawal data from January 2020 to November 2020 from the "101.1.12.1.0" database. If the withdrawal data for that period amounts to 10000 pieces or more, the data verification system obtains 10000 pieces; if it amounts to fewer than 10000 pieces, the data verification system obtains all the withdrawal data currently available.
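The selection rule in this example (filter by content and period, then cap at the requested magnitude) can be sketched as below. The record layout, field names, and function name are hypothetical, and an in-memory list stands in for the database, which the text identifies only by its address.

```python
from datetime import date

def select_user_data(records, content, start, end, magnitude):
    """Return at most `magnitude` records of the given content within [start, end].

    If fewer matching records exist, all of them are returned.
    """
    matching = [
        r for r in records
        if r["content"] == content and start <= r["date"] <= end
    ]
    return matching[:magnitude]

# Hypothetical records standing in for rows fetched from the database.
records = [
    {"content": "withdrawal", "date": date(2020, 1, 15), "amount": 100},
    {"content": "withdrawal", "date": date(2020, 6, 1), "amount": 250},
    {"content": "repayment", "date": date(2020, 6, 2), "amount": 80},
    {"content": "withdrawal", "date": date(2020, 12, 5), "amount": 40},
]

period = (date(2020, 1, 1), date(2020, 11, 30))
capped = select_user_data(records, "withdrawal", *period, magnitude=1)
all_available = select_user_data(records, "withdrawal", *period, magnitude=10000)
```

With a magnitude of 1 the result is capped; with a magnitude of 10000 only the two withdrawal records inside the period are available, so both are returned.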
And step S20, determining service target data, and constructing a corresponding data verification model based on the abnormal degree data and the service target data.
The data verification system determines whether the database contains a service data tag that can be identified in the corresponding scenario. If such a tag exists, the data verification system determines the corresponding service target data according to the service data tag; if no such tag exists, it determines the service target data according to the data information carried in the user instruction. The scenario defines the normal and abnormal operations of a user, and the service target data may be, for example, whether the user defaults or whether the user withdraws money.
And after the data verification system determines the service target data, determining the abnormal degree data and the service target data as corresponding model variables, and constructing a corresponding data verification model according to the model variables of the abnormal degree data and the model variables of the service target data.
Further, the step S20 includes:
step S201, determining the abnormal degree data as independent variable data, and detecting whether a service data label exists;
step S202, if a business data label is detected to exist, determining the business target data based on the business data label, and determining the business target data as target variable data;
step S203, if no business data label is detected, determining the business target data based on the user instruction, and determining the business target data as the target variable data;
step S204, the data verification model is constructed based on the independent variable data and the target variable data.
Specifically, the data verification system determines the abnormal degree data as independent variable data for constructing a data verification model, detects whether a business data label exists in a database, if the data verification system detects that the business data label exists in the database, the data verification system identifies the business data label to obtain a corresponding identification result, determines the identification result as business target data corresponding to the business data label, and determines the business target data as target variable data for constructing the data verification model. If the data verification system detects that no service data label exists in the database, the data verification system determines service data information carried in the user instruction, determines service target data corresponding to the user instruction according to the service data information, and determines the service target data as target variable data for constructing a data verification model.
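The label check with its fallback to the user instruction can be sketched as follows. How a label is actually detected is not described, so this sketch assumes a label is simply a database column whose name appears in a list of known labels; the column names, the `business_target` key, and the function name are hypothetical.

```python
# Hypothetical names of business data labels that the system can identify.
KNOWN_LABELS = ("default", "withdrawal")

def resolve_target_variable(db_columns, instruction, known_labels=KNOWN_LABELS):
    """Pick the target variable for the data verification model.

    Returns (target_name, source), where source records whether the target
    came from a detected label or from the user instruction.
    """
    for label in known_labels:
        if label in db_columns:
            return label, "label"
    # No identifiable label: fall back to what the user instruction specifies.
    return instruction["business_target"], "instruction"

instruction = {"business_target": "withdrawal"}
from_label = resolve_target_variable({"amount", "default"}, instruction)
from_instruction = resolve_target_variable({"amount"}, instruction)
```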
After determining the independent variable data and the target variable data, the data verification system uses them to construct a regression model, which serves as the corresponding data verification model.
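As an illustration only, the regression step might look like the sketch below. The type of regression is not specified in the text; an ordinary least-squares line with a single independent variable is assumed here, and the sample values, the 0/1 encoding of the target, and all names are hypothetical.

```python
def fit_verification_model(anomaly_degree, target):
    """Fit target = intercept + slope * anomaly_degree by least squares.

    Returns the coefficients and R squared as one possible measure of how
    strongly the abnormal degree data is associated with the business target.
    """
    n = len(anomaly_degree)
    if n != len(target) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(anomaly_degree) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in anomaly_degree)
    syy = sum((y - mean_y) ** 2 for y in target)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(anomaly_degree, target))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable carries no information to fit")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return {"slope": slope, "intercept": intercept, "r_squared": r_squared}

# Hypothetical data: abnormal degree stage per user, and whether the user defaulted.
stages = [1, 2, 3, 4]
defaulted = [0, 0, 1, 1]
model = fit_verification_model(stages, defaulted)
```

For a binary target such as default or no default, a logistic regression might be a more natural choice; the linear fit is used here only to keep the sketch dependency-free.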
And step S30, verifying the effectiveness of the unsupervised anomaly identification algorithm based on the data verification model.
And after the data verification system constructs a data verification model, determining the relevance between the abnormal degree data and the service target data in the data verification model, and determining whether the unsupervised abnormal recognition algorithm is valid or invalid according to the relevance between the abnormal degree data and the service target data in the data verification model.
Further, as shown in fig. 3, fig. 3 is a schematic flow chart of the data verification method of the present application. First, feature screening based on expert experience (user experience) selects a user data source suitable for the risk model of small and micro enterprises (users) and a feature derivation manner (data characteristics) for that data source, yielding an expert experience instruction (user instruction). Second, anomaly identification based on the unsupervised algorithm calculates the abnormal degree data of the user data source through the unsupervised anomaly identification algorithm and the feature derivation manner. Third, multi-scenario regression testing based on the business target uses the abnormal degree data calculated by the unsupervised anomaly identification algorithm as the independent variable and the multi-scenario business target data as the target variable to construct a regression model (data verification model), and the validity of the unsupervised anomaly identification algorithm is verified through the regression model.
The method comprises the steps of responding to a user instruction, and determining corresponding abnormal degree data based on the user instruction and an unsupervised abnormal recognition algorithm; determining service target data, and constructing a corresponding data verification model based on the abnormal degree data and the service target data; and verifying the effectiveness of the unsupervised anomaly identification algorithm based on the data verification model. Therefore, in the data verification process, the abnormality degree data is determined through the user instruction and the unsupervised abnormality identification algorithm, so that the accuracy of the abnormality degree data is ensured. And then, a data verification model is constructed through the abnormal degree data and the service target data, and the effectiveness of the unsupervised abnormal recognition algorithm is verified through the data verification model, so that the relevance between the data verification model and the service target data is ensured.
Further, the verification method of the data of the present application provides another embodiment, and the step S30 includes:
step S301, determining the degree of correlation between the abnormal degree data and the service target data in the data verification model, and verifying the effectiveness of the unsupervised abnormal recognition algorithm based on the degree of correlation.
Specifically, the data verification system determines a degree of association between the abnormal degree data and the business target data in the data verification model, compares the degree of association between the abnormal degree data and the business target data in the data verification model with a preset degree of association, and determines whether the unsupervised abnormality identification algorithm is valid or invalid according to a comparison result of the degree of association and the preset degree of association, where the preset degree of association is set by a technician, and this embodiment is not limited. The expression form of the degree of association and the preset degree of association includes, but is not limited to, a numerical expression form and a hierarchical expression form.
Further, step S301 includes:
step S3011, determining whether the degree of association is greater than or equal to a preset degree of association;
step S3012, if the degree of association is greater than or equal to the preset degree of association, determining that the unsupervised anomaly identification algorithm is valid;
step S3013, if the degree of association is less than the preset degree of association, determining that the unsupervised anomaly identification algorithm is invalid.
Specifically, the data verification system determines the value or grade corresponding to the degree of association and the value or grade corresponding to the preset degree of association, and compares the two. If the value or grade of the degree of association is greater than or equal to that of the preset degree of association, the data verification system determines that the association between the abnormality degree data and the business target data in the data verification model meets the preset requirement, that is, the unsupervised anomaly identification algorithm is determined to be valid. If the value or grade of the degree of association is less than that of the preset degree of association, the data verification system determines that the association does not meet the preset requirement, that is, the unsupervised anomaly identification algorithm is determined to be invalid.
In this embodiment, for example, the degree of association and the preset degree of association are expressed as numerical values, and the preset degree of association is 75. If the data verification system determines that the degree of association between the abnormality degree data and the business target data is 82, which is greater than 75, it determines that the unsupervised anomaly identification algorithm is valid. If it determines that the degree of association is 67, which is less than 75, it determines that the unsupervised anomaly identification algorithm is invalid.
This embodiment determines the degree of association between the abnormality degree data and the business target data in the data verification model, and verifies the validity of the unsupervised anomaly identification algorithm based on that degree of association. The validity of the algorithm is thus verified by comparing the degree of association between the abnormality degree data and the business target data in the data verification model with the preset degree of association, which ensures that the data verification model is tied to the business target data.
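Steps S3011 to S3013 can be sketched as a single comparison. The sketch below is illustrative only: the `Grade` levels and the function name `is_algorithm_valid` are assumptions, and the rule that both values must use the same expression form is a choice made here, not something the embodiment specifies.

```python
from enum import IntEnum
from typing import Union


class Grade(IntEnum):
    """Hypothetical graded expression form; the level names are assumptions."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


Degree = Union[float, Grade]


def is_algorithm_valid(degree: Degree, preset: Degree) -> bool:
    """S3011: compare; S3012: valid if degree >= preset; S3013: otherwise invalid."""
    # Refuse to compare a numerical value against a grade (assumed rule).
    if isinstance(degree, Grade) != isinstance(preset, Grade):
        raise TypeError("degree and preset must use the same expression form")
    return degree >= preset
```

With the numerical example of this embodiment, `is_algorithm_valid(82, 75)` returns True and `is_algorithm_valid(67, 75)` returns False.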
In addition, the present application further provides a data verification apparatus. Referring to FIG. 4, which is a schematic structural diagram of the data verification apparatus of the present application, the data verification apparatus includes:
a determining module 10, configured to respond to a user instruction and determine corresponding abnormality degree data based on the user instruction and an unsupervised anomaly identification algorithm;
a construction module 20, configured to determine business target data and construct a corresponding data verification model based on the abnormality degree data and the business target data;
a verification module 30, configured to verify the validity of the unsupervised anomaly identification algorithm based on the data verification model.
Further, the determining module 10 is further configured to determine the degree of association between the abnormality degree data and the business target data in the data verification model;
the verification module 30 is further configured to verify the validity of the unsupervised anomaly identification algorithm based on the degree of association;
the determining module 10 is further configured to determine whether the degree of association is greater than or equal to a preset degree of association;
the determining module 10 is further configured to determine that the unsupervised anomaly identification algorithm is valid if the degree of association is greater than or equal to the preset degree of association;
the determining module 10 is further configured to determine that the unsupervised anomaly identification algorithm is invalid if the degree of association is less than the preset degree of association;
the determining module 10 is further configured to determine the abnormality degree data as independent variable data.
Further, the determining module 10 further includes:
a detection unit, configured to detect whether a business data tag exists.
Further, the determining module 10 is further configured to, if a business data tag is detected, determine the business target data based on the business data tag and determine the business target data as target variable data;
the construction module 20 is further configured to construct the data verification model based on the independent variable data and the target variable data;
the determining module 10 is further configured to, if no business data tag is detected, determine the business target data based on the user instruction and determine the business target data as the target variable data;
the determining module 10 is further configured to determine corresponding user data and data characteristics corresponding to the user data based on the user instruction;
the determining module 10 is further configured to perform data analysis on the user data based on the unsupervised anomaly identification algorithm and the data features, and determine anomaly point data of the user data;
the determining module 10 is further configured to determine corresponding abnormality degree data based on the abnormality point data;
the determining module 10 is further configured to determine database information, data content information, data distribution information, and data magnitude information in the user instruction;
the determining module 10 is further configured to determine user data corresponding to the user instruction based on the database information, the data content information, the data distribution information, and the data magnitude information.
The specific implementation of the data verification apparatus of the present application is substantially the same as that of the embodiments of the data verification method, and is not described here again.
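The division of work among the determining module 10, the construction module 20, and the verification module 30 can be sketched as three small classes. This is only an illustration under stated assumptions: the application names no specific unsupervised algorithm or association measure, so `zscore_scorer` (an absolute z-score) and `flagged_hit_rate` (the share of flagged records whose business target is 1) are hypothetical stand-ins, as are all class and method names.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[float]], List[float]]
Association = Callable[[Sequence[float], Sequence[float]], float]


def zscore_scorer(values: Sequence[float]) -> List[float]:
    """Stand-in unsupervised scorer: absolute z-score of each value."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [abs(v - mu) / sigma for v in values]


def flagged_hit_rate(degrees: Sequence[float], targets: Sequence[float],
                     cutoff: float = 1.5) -> float:
    """Stand-in association: percent of flagged records with target 1."""
    flagged = [t for d, t in zip(degrees, targets) if d >= cutoff]
    return 100.0 * sum(flagged) / len(flagged) if flagged else 0.0


@dataclass
class VerificationModel:
    independent: List[float]  # abnormality degree data (independent variable)
    target: List[float]       # business target data (target variable)


class DeterminingModule:  # corresponds to module 10
    def __init__(self, scorer: Scorer) -> None:
        self.scorer = scorer

    def abnormality_degrees(self, user_data: Sequence[float]) -> List[float]:
        return self.scorer(user_data)


class ConstructionModule:  # corresponds to module 20
    def build(self, degrees: Sequence[float],
              targets: Sequence[float]) -> VerificationModel:
        if len(degrees) != len(targets):
            raise ValueError("degrees and targets must have the same length")
        return VerificationModel(list(degrees), list(targets))


class VerificationModule:  # corresponds to module 30
    def __init__(self, association: Association, preset: float) -> None:
        self.association = association
        self.preset = preset

    def verify(self, model: VerificationModel) -> bool:
        # Valid when the degree of association reaches the preset value.
        return self.association(model.independent, model.target) >= self.preset
```

Passing the scorer and the association measure in as parameters keeps the sketch independent of any particular algorithm, which matches the fact that the application leaves both open.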
In addition, an embodiment of the present application also provides a computer-readable storage medium, where a data verification program is stored on the computer-readable storage medium, and when executed by a processor, the data verification program implements the steps of the data verification method described above.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as the embodiments of the data verification method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although the former is the preferable implementation in many cases. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product that is stored in a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for causing a data verification system to execute the methods according to the embodiments of the present application.

Claims (10)

1. A method for verifying data, the method comprising the steps of:
responding to a user instruction, and determining corresponding abnormality degree data based on the user instruction and an unsupervised anomaly identification algorithm;
determining business target data, and constructing a corresponding data verification model based on the abnormality degree data and the business target data;
verifying the validity of the unsupervised anomaly identification algorithm based on the data verification model.
2. The method for verifying data as claimed in claim 1, wherein the step of verifying the validity of the unsupervised anomaly identification algorithm based on the data verification model comprises:
determining the degree of association between the abnormality degree data and the business target data in the data verification model, and verifying the validity of the unsupervised anomaly identification algorithm based on the degree of association.
3. The method for verifying data as claimed in claim 2, wherein the step of verifying the validity of the unsupervised anomaly identification algorithm based on the degree of association comprises:
determining whether the degree of association is greater than or equal to a preset degree of association;
if the degree of association is greater than or equal to the preset degree of association, determining that the unsupervised anomaly identification algorithm is valid;
if the degree of association is less than the preset degree of association, determining that the unsupervised anomaly identification algorithm is invalid.
4. The method for verifying data as claimed in claim 1, wherein the step of determining business target data, and constructing a corresponding data verification model based on the abnormality degree data and the business target data comprises:
determining the abnormality degree data as independent variable data, and detecting whether a business data tag exists;
if a business data tag is detected, determining the business target data based on the business data tag, and determining the business target data as target variable data;
constructing the data verification model based on the independent variable data and the target variable data.
5. The method for verifying data as claimed in claim 4, wherein the step of detecting whether an identifiable business data tag exists is followed by the step of:
if no business data tag is detected, determining the business target data based on the user instruction, and determining the business target data as the target variable data.
6. The method for verifying data as claimed in any one of claims 1 to 5, wherein the step of determining corresponding abnormality degree data based on the user instruction and an unsupervised anomaly identification algorithm comprises:
determining corresponding user data and data characteristics corresponding to the user data based on the user instruction;
performing data analysis on the user data based on the unsupervised anomaly identification algorithm and the data characteristics to determine anomaly point data of the user data;
determining corresponding abnormality degree data based on the anomaly point data.
7. The method for verifying data as claimed in claim 6, wherein the step of determining the corresponding user data based on the user instruction comprises:
determining database information, data content information, data distribution information and data magnitude information in the user instruction;
determining user data corresponding to the user instruction based on the database information, the data content information, the data distribution information and the data magnitude information.
8. An apparatus for verifying data, comprising:
a determining module, configured to respond to a user instruction and determine corresponding abnormality degree data based on the user instruction and an unsupervised anomaly identification algorithm;
a construction module, configured to determine business target data and construct a corresponding data verification model based on the abnormality degree data and the business target data;
a verification module, configured to verify the validity of the unsupervised anomaly identification algorithm based on the data verification model.
9. A system for the verification of data, characterized in that it comprises a memory, a processor and a program for the verification of data stored on said memory and running on said processor, said program for the verification of data implementing the steps of the method for the verification of data according to any one of claims 1 to 7 when executed by said processor.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a verification program of data, which when executed by a processor implements the steps of the verification method of data according to any one of claims 1 to 7.
CN202011480066.7A 2020-12-15 2020-12-15 Data verification method, device and system and computer readable storage medium Pending CN112597209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011480066.7A CN112597209A (en) 2020-12-15 2020-12-15 Data verification method, device and system and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112597209A true CN112597209A (en) 2021-04-02

Family

ID=75196216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011480066.7A Pending CN112597209A (en) 2020-12-15 2020-12-15 Data verification method, device and system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112597209A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139610A (en) * 2021-04-29 2021-07-20 国网河北省电力有限公司电力科学研究院 Abnormity detection method and device for transformer monitoring data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886809A (en) * 2019-01-24 2019-06-14 平安科技(深圳)有限公司 Transaction data intelligent analysis method, electronic device and computer readable storage medium
CN109902721A (en) * 2019-01-28 2019-06-18 平安科技(深圳)有限公司 Outlier detection model verification method, device, computer equipment and storage medium
CN110009359A (en) * 2019-01-22 2019-07-12 阿里巴巴集团控股有限公司 Training method, update method and the device of unsupervised risk prevention system model
CN111507376A (en) * 2020-03-20 2020-08-07 厦门大学 Single index abnormality detection method based on fusion of multiple unsupervised methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination