CN116881787A - Data sample classification method and device, processor and electronic equipment - Google Patents

Data sample classification method and device, processor and electronic equipment

Info

Publication number
CN116881787A
Authority
CN
China
Prior art keywords
data
samples
data samples
classifying
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310848166.8A
Other languages
Chinese (zh)
Inventor
王樱
钟玉兴
林浩
李旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310848166.8A
Publication of CN116881787A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00 Data types
    • G06F2123/02 Data types in the time domain, e.g. time-series data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data sample classification method and device, a processor and electronic equipment, and relates to the field of artificial intelligence. The method comprises the following steps: classifying historical report data according to M data attributes to obtain N first data samples, wherein M and N are positive integers; establishing N data prediction models according to the N first data samples, respectively; classifying target report data according to the M data attributes to obtain N second data samples; and classifying the N second data samples through the N data prediction models, respectively, to determine K third data samples belonging to a first category in the target report data, wherein K is a positive integer less than or equal to N, and the N data prediction models and the N second data samples are each in one-to-one correspondence with the N first data samples.

Description

Data sample classification method and device, processor and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and apparatus for classifying data samples, a processor, and an electronic device.
Background
In the related art, the supervision information reporting system of a bank collects cross-border business data from each upstream business system and reports the data to the supervision organization in a unified manner, at the different frequencies the supervision organization requires. However, the upstream business systems are numerous and cover businesses such as international trade balance, foreign exchange account settlement and sale, institutional foreign currency cash deposits and withdrawals, overseas bank card transactions, and the bank's own capital and foreign debt. As new technologies develop, these business systems are upgraded and modified frequently, and the supervision and reporting system lacks effective monitoring and prediction of abnormal fluctuations in upstream data, so that the data reported to the supervision department contains errors and omissions.
No effective solution has yet been proposed for the problem that the supervision and reporting system in the related art lacks effective monitoring and prediction of abnormal fluctuations in upstream data, which leads to errors and omissions in the data reported to the supervision department.
Disclosure of Invention
The main purpose of the application is to provide a data sample classification method and device, a processor and electronic equipment, so as to solve the problem that the supervision and reporting system in the related art lacks effective monitoring and prediction of abnormal fluctuations in upstream data, which leads to errors and omissions in the data reported to the supervision department.
In order to achieve the above object, according to one aspect of the present application, there is provided a classification method of data samples. The method comprises the following steps: classifying historical report data according to M data attributes to obtain N first data samples, wherein M and N are positive integers; establishing N data prediction models according to the N first data samples, respectively; classifying target report data according to the M data attributes to obtain N second data samples; and classifying the N second data samples through the N data prediction models, respectively, to determine K third data samples belonging to a first category in the target report data, wherein K is a positive integer less than or equal to N, and the N data prediction models and the N second data samples are each in one-to-one correspondence with the N first data samples.
Optionally, classifying the historical report data according to M data attributes to obtain N first data samples, including: determining first numbers of categories of data sub-attributes contained in each data attribute in the M data attributes to obtain M first numbers; determining N data categories contained in the M data attributes, wherein each data category of the N data categories contains one data sub-attribute of each data attribute of the M data attributes, and N is the product of the M first numbers; and classifying the historical report data according to the N data categories to obtain the N first data samples.
Optionally, building a data prediction model according to the first data sample includes: performing a stationarity check on the first data sample; a determining step: determining a seasonal period of the first data sample if the first data sample does not pass the stationarity check; an operation step: performing a differencing operation on the first data sample according to the seasonal period to obtain a second data sample; executing the determining step and the operation step in a loop until the finally obtained target second data sample passes the stationarity check; and establishing an autocorrelation plot and a partial autocorrelation plot of the target second data sample, and establishing the data prediction model according to the autocorrelation plot and the partial autocorrelation plot.
Optionally, after the data prediction model is established according to the autocorrelation plot and the partial autocorrelation plot, the method further includes: determining a first parameter of the data prediction model through a preset estimation method; a checking step: performing an accuracy check on the data prediction model according to the first parameter; an adjustment step: if the data prediction model fails the accuracy check, re-determining the first parameter by adjusting an order of the data prediction model, wherein the order of the data prediction model is determined from the autocorrelation plot and the partial autocorrelation plot; and executing the checking step and the adjustment step in a loop until the data prediction model passes the accuracy check.
Optionally, classifying the N second data samples by the N data prediction models includes: predicting N fourth data samples through the N data prediction models, wherein the N fourth data samples are predicted values of the N second data samples; classifying the N second data samples according to the N fourth data samples.
Optionally, classifying the N second data samples according to the N fourth data samples includes: calculating the fluctuation value of each second data sample according to the N fourth data samples to obtain N fluctuation values; acquiring fluctuation early warning values of the N second data samples; classifying the corresponding second data sample into a second class under the condition that the fluctuation value is smaller than or equal to the fluctuation early-warning value, wherein the second class is used for indicating that the second data sample is normal; and classifying the corresponding second data sample into the first category under the condition that the fluctuation value is larger than the fluctuation early-warning value, wherein the first category is used for indicating that the second data sample is abnormal.
Optionally, after determining K third data samples belonging to the first category in the target report data, the method further includes: and modifying the data states of the K third data samples into target states, and sending prompt information to the first object, wherein the target states are used for indicating that the third data samples are forbidden to be sent to the second object.
To achieve the above object, according to another aspect of the present application, there is provided a classification apparatus for data samples. The apparatus comprises: a first classification module, configured to classify historical report data according to M data attributes to obtain N first data samples, wherein M and N are positive integers; an establishing module, configured to establish N data prediction models according to the N first data samples, respectively; a second classification module, configured to classify target report data according to the M data attributes to obtain N second data samples; and a determining module, configured to classify the N second data samples according to the N data prediction models, so as to determine K third data samples belonging to a first category in the target report data, where K is a positive integer less than or equal to N, and the N data prediction models and the N second data samples are each in one-to-one correspondence with the N first data samples.
According to the application, the following steps are adopted: classifying historical report data according to M data attributes to obtain N first data samples; establishing N data prediction models according to the N first data samples, respectively; classifying the target report data according to the M data attributes to obtain N second data samples; and classifying the N second data samples through the N data prediction models, respectively, so as to determine K third data samples belonging to the first category in the target report data, wherein K is less than or equal to N, and the N data prediction models and the N second data samples are each in one-to-one correspondence with the N first data samples. With this scheme, the fluctuation value of the data quantity acquired on the current day is checked before the supervision data is reported, so that abnormal data is identified and effectively prevented from being reported to the supervision department. This solves the problem that the supervision and reporting system in the related art lacks effective monitoring and prediction of abnormal fluctuations in upstream data, which leads to errors and omissions in the data reported to the supervision department.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flowchart of a method for classifying data samples according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for classifying historical report data according to an embodiment of the application;
FIG. 3 is a flowchart of a method for creating a data prediction model according to an embodiment of the present application;
FIG. 4 is a second flowchart of a method for classifying data samples according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an abnormal data monitoring apparatus provided according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a supervision report according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a classification apparatus for data samples according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The present application will be described with reference to preferred implementation steps, and fig. 1 is a flowchart of a method for classifying data samples according to an embodiment of the present application, as shown in fig. 1, and the method includes the following steps:
step S101, classifying historical report data according to M data attributes to obtain N first data samples, wherein M and N are positive integers;
step S102, respectively establishing N data prediction models according to the N first data samples;
step S103, classifying the target report data according to the M data attributes to obtain N second data samples;
step S104, classifying the N second data samples through the N data prediction models respectively to determine K third data samples belonging to the first category in the target report data, wherein K is a positive integer, K is smaller than or equal to N, and the N data prediction models and the N second data samples are in one-to-one correspondence with the N first data samples.
According to the data sample classification method provided by the embodiment of the application, the historical report data is classified according to M data attributes to obtain N first data samples; N data prediction models are then established according to the N first data samples, respectively; the target report data is classified according to the M data attributes to obtain N second data samples; and the N second data samples are classified through the N data prediction models, respectively, so as to determine K third data samples belonging to the first category in the target report data, wherein K is less than or equal to N, and the N data prediction models and the N second data samples are each in one-to-one correspondence with the N first data samples. With this scheme, the fluctuation value of the data quantity acquired on the current day is checked before the supervision data is reported, so that abnormal data is identified and effectively prevented from being reported to the supervision department. This solves the problem that the supervision and reporting system in the related art lacks effective monitoring and prediction of abnormal fluctuations in upstream data, which leads to errors and omissions in the data reported to the supervision department.
Optionally, the above step S101 is performed: classifying the historical report data according to M data attributes to obtain N first data samples, wherein the method comprises the following steps: determining first numbers of categories of data sub-attributes contained in each data attribute in the M data attributes to obtain M first numbers; determining N data categories contained in the M data attributes, wherein each data category of the N data categories contains one data sub-attribute of each data attribute of the M data attributes, and N is the product of the M first numbers; and classifying the historical report data according to the N data categories to obtain the N first data samples.
The historical report data is subdivided as follows. For example, the data attributes comprise four types: business variety, transaction system, transaction channel and client type. The business variety comprises three data sub-attributes a, b and c; the transaction system comprises two data sub-attributes d and e; the transaction channel comprises three data sub-attributes f, g and h; and the client type comprises four data sub-attributes i, j, k and l. The data attributes therefore yield 3×2×3×4=72 (N) data categories, each a different combination of one data sub-attribute from each of the four data attributes. The historical report data is classified according to these 72 data categories, so that the data attributes within each resulting data sample are more homogeneous and errors are reduced as far as possible.
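The category count in this example can be reproduced by enumerating the Cartesian product of the sub-attributes. The following is a minimal sketch; the dictionary keys are illustrative labels chosen here, and only the sub-attribute letters a to l come from the example.

```python
from itertools import product

# Sub-attributes of each data attribute, taken from the example above.
# The dictionary keys are illustrative labels, not names from the application.
data_attributes = {
    "business_variety": ["a", "b", "c"],
    "transaction_system": ["d", "e"],
    "transaction_channel": ["f", "g", "h"],
    "client_type": ["i", "j", "k", "l"],
}

# Each data category takes exactly one sub-attribute from every attribute,
# so the categories are the Cartesian product of the sub-attribute lists.
data_categories = list(product(*data_attributes.values()))

# N equals the product of the M "first numbers" (3 * 2 * 3 * 4).
n_categories = len(data_categories)
print(n_categories)        # 72
print(data_categories[0])  # ('a', 'd', 'f', 'i')
```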
The process of subdividing the historical report data is shown in fig. 2, and fig. 2 is a flowchart of an alternative method for classifying the historical report data according to an embodiment of the present application, which specifically includes the following steps:
step S201: acquiring historical report data from an initial date to t-1 days, wherein t is the date of the current day;
step S202: the history report data is subdivided into a plurality of samples (corresponding to the N first data samples) according to the data attribute.
It should be noted that, the data attribute may include: business categories, transaction systems, transaction channels, customer types, etc., and may also include other data attributes, the present application is used as an example only and not as a limitation.
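One way to picture step S202 is to group the historical records by their attribute combination and count the records per day, giving one daily series per data category. This is only a sketch under assumptions: the field names, the record layout and the use of a daily record count as the sample value are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical attribute fields used as the grouping key.
ATTRIBUTE_FIELDS = ("business_variety", "transaction_system",
                    "transaction_channel", "client_type")


def split_into_samples(records):
    """Group report records by data category and count records per day.

    Returns {category: {date: record_count}}, one daily series per category.
    """
    samples = defaultdict(lambda: defaultdict(int))
    for record in records:
        category = tuple(record[field] for field in ATTRIBUTE_FIELDS)
        samples[category][record["date"]] += 1
    # Convert to plain dicts so that missing keys raise instead of being created.
    return {category: dict(series) for category, series in samples.items()}


# Toy historical report data (invented for this example).
rows = [
    ("2023-07-01", "a", "d", "f", "i"),
    ("2023-07-01", "a", "d", "f", "i"),
    ("2023-07-02", "a", "d", "f", "i"),
    ("2023-07-01", "b", "e", "g", "j"),
]
history = [dict(zip(("date",) + ATTRIBUTE_FIELDS, row)) for row in rows]

first_samples = split_into_samples(history)
print(first_samples[("a", "d", "f", "i")])  # {'2023-07-01': 2, '2023-07-02': 1}
```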
Alternatively, the data prediction model may be built according to the first data sample by the following method, specifically including: performing a stationarity check on the first data sample; a determining step: determining a seasonal period of the first data sample if the first data sample does not pass the stationarity check; an operation step: performing a differencing operation on the first data sample according to the seasonal period to obtain a second data sample; executing the determining step and the operation step in a loop until the finally obtained target second data sample passes the stationarity check; and establishing an autocorrelation plot and a partial autocorrelation plot of the target second data sample, and establishing the data prediction model according to the autocorrelation plot and the partial autocorrelation plot.
The data prediction model is established as follows. First, a stationarity check is performed on the first data sample. The following process is then executed in a loop until the check is passed: if the first data sample does not pass the stationarity check, the seasonal period of the first data sample is determined, and a differencing operation is performed on the first data sample according to the seasonal period to obtain a second data sample. When the loop ends, the finally obtained target second data sample passes the stationarity check. An autocorrelation plot and a partial autocorrelation plot of the target second data sample are then established, and the data prediction model is established according to them.
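The determine-and-difference loop can be sketched in a few lines. The sketch below makes two assumptions that are not taken from the application: the seasonal period is estimated as the lag with the largest sample autocorrelation, and the stationarity check is a toy variance threshold passed in by the caller. In practice a unit-root test such as the augmented Dickey-Fuller test would typically take its place.

```python
from statistics import fmean, pvariance


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    mean = fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, len(series)))
    return num / denom


def estimate_seasonal_period(series, max_lag):
    """Assumed heuristic: the lag in 2..max_lag with the largest autocorrelation."""
    return max(range(2, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))


def seasonal_difference(series, period):
    """Subtract from each value the value one seasonal period earlier."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def difference_until_stationary(series, is_stationary, max_lag, max_rounds=3):
    """Repeat the determining and operation steps until the check passes."""
    rounds = 0
    while not is_stationary(series) and rounds < max_rounds:
        period = estimate_seasonal_period(series, max_lag)   # determining step
        series = seasonal_difference(series, period)         # operation step
        rounds += 1
    return series, rounds


def toy_is_stationary(series):
    """Toy stand-in for a real stationarity test (an assumption of this sketch)."""
    return pvariance(series) < 1.0


weekly_pattern = [10, 12, 14, 13, 11, 5, 4]
first_sample = weekly_pattern * 8  # eight identical weeks of invented data

target_sample, rounds = difference_until_stationary(
    first_sample, toy_is_stationary, max_lag=14)
print(estimate_seasonal_period(first_sample, 14))  # 7
print(rounds, len(target_sample))                  # 1 49
```

The `max_rounds` guard is a design choice of the sketch: it keeps the loop from running indefinitely on a series that the chosen check never accepts.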
It should be noted that the data prediction model may be a SARIMA (Seasonal Autoregressive Integrated Moving Average) model, which is one of the methods of time series prediction analysis. The four components of a time series, namely long-term trend, seasonal variation, cyclic fluctuation and random disturbance, often interact in complex ways, so that the evolution of the series is difficult to fit with an ARIMA model, and a SARIMA model therefore needs to be built.
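For reference, a SARIMA model of orders (p, d, q)(P, D, Q) with seasonal period s is conventionally written in the following textbook form. The notation is not taken from the application and is given only as orientation:

$$\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,x_t = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t$$

Here $B$ is the backshift operator ($B x_t = x_{t-1}$), $d$ and $D$ are the orders of ordinary and seasonal differencing, $\phi_p$ and $\theta_q$ are the non-seasonal autoregressive and moving-average polynomials of orders $p$ and $q$, $\Phi_P$ and $\Theta_Q$ are their seasonal counterparts of orders $P$ and $Q$, and $\varepsilon_t$ is white noise. Setting $P = D = Q = 0$ recovers the ordinary ARIMA(p, d, q) model.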
Based on the above steps, after the data prediction model is established according to the autocorrelation plot and the partial autocorrelation plot, the method further includes: determining a first parameter of the data prediction model through a preset estimation method; a checking step: performing an accuracy check on the data prediction model according to the first parameter; an adjustment step: if the data prediction model fails the accuracy check, re-determining the first parameter by adjusting an order of the data prediction model, wherein the order of the data prediction model is determined from the autocorrelation plot and the partial autocorrelation plot; and executing the checking step and the adjustment step in a loop until the data prediction model passes the accuracy check.
After the data prediction model is established, a first parameter of the data prediction model is determined by a preset estimation method. The following process is then executed in a loop until the model passes the check: an accuracy check is performed on the data prediction model according to the first parameter, and if the model fails the accuracy check, the first parameter is re-determined by adjusting the order of the data prediction model, where the order can be determined from the autocorrelation plot and the partial autocorrelation plot. When the loop ends, the data prediction model passes the accuracy check.
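The control flow of this check-and-adjust loop can be sketched independently of any particular model. In the sketch below, `estimate` and `passes_check` are hypothetical callables supplied by the caller, and the list of candidate orders is assumed to have been read off the autocorrelation and partial autocorrelation plots beforehand. The toy predictor used in the demonstration is a simple average of recent values, not a SARIMA model.

```python
def fit_until_accurate(sample, candidate_orders, estimate, passes_check):
    """Try candidate orders in turn until a fitted model passes the check.

    `estimate(sample, order)` returns the fitted parameter(s) and
    `passes_check(sample, order, params)` returns True or False.
    Both are hypothetical callables, not part of the application.
    """
    for order in candidate_orders:
        params = estimate(sample, order)          # parameter estimation
        if passes_check(sample, order, params):   # checking step
            return order, params
        # adjustment step: move on to the next candidate order
    raise ValueError("no candidate order passed the accuracy check")


def estimate_mean_of_last(sample, order):
    """Toy estimator: average of the last `order` training values."""
    train = sample[:-1]  # hold out the final value for checking
    return sum(train[-order:]) / order


def within_tolerance(sample, order, params, tolerance=1.5):
    """Toy accuracy check: compare the prediction with the held-out value."""
    return abs(sample[-1] - params) <= tolerance


sample = [2, 4, 2, 4, 2, 4, 2, 4]  # invented data
order, params = fit_until_accurate(
    sample, [1, 2, 3], estimate_mean_of_last, within_tolerance)
print(order, params)  # 2 3.0
```

Raising an error when every candidate fails is a choice made for the sketch; the source only describes looping until the check passes.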
Optionally, an embodiment of the present application provides another method for establishing a data prediction model, where specific steps are shown in fig. 3, and the method includes the following steps:
step S301: acquiring a time series;
step S302: performing a stationarity test; if the test fails, executing step S303; if it passes, executing step S304;
Modeling with a time series model requires checking the stationarity of the data first. If the test shows that the series is stationary, model identification is performed directly; if the series is not stationary, it is differenced until it becomes stationary.
Step S303: performing differencing on the time series;
For a non-stationary time series that contains only a trend, the trend is removed by differencing of an appropriate order, and an ARMA(p, q) model is then built on the resulting stationary series to extract the correlation of the time series. If the original time series contains both trend and seasonal variation, the seasonal effect is itself correlated, so the seasonal correlation can be extracted with an ARMA(P, Q) model whose step is the period length.
Step S304: performing model identification on the time series;
step S305: estimating the model parameters;
step S306: performing model verification; if the verification passes, executing step S307; if it does not pass, executing step S304;
step S307: finalizing the model.
Optionally, the above step S104 is performed: classifying the N second data samples by the N data prediction models, respectively, includes: predicting N fourth data samples through the N data prediction models, wherein the N fourth data samples are predicted values of the N second data samples; classifying the N second data samples according to the N fourth data samples.
The process of classifying the second data sample by the data prediction model includes: predicting predicted values of the N second data samples, namely N fourth data samples, through N data prediction models; the second data samples are then classified according to the predicted values.
Optionally, in this embodiment, the steps described above are performed: classifying the N second data samples according to the N fourth data samples, comprising: calculating the fluctuation value of each second data sample according to the N fourth data samples to obtain N fluctuation values; acquiring fluctuation early warning values of the N second data samples; classifying the corresponding second data sample into a second class under the condition that the fluctuation value is smaller than or equal to the fluctuation early-warning value, wherein the second class is used for indicating that the second data sample is normal; and classifying the corresponding second data sample into the first category under the condition that the fluctuation value is larger than the fluctuation early-warning value, wherein the first category is used for indicating that the second data sample is abnormal.
Wherein the classifying the N second data samples according to the N fourth data samples comprises: firstly, calculating a fluctuation value of each second data sample according to the difference value of the N fourth data samples and the N second data samples; then obtaining fluctuation early warning values corresponding to the N second data samples; if the fluctuation value is smaller than or equal to the fluctuation early warning value, classifying the corresponding second data sample into a second category, wherein the second category is a normal data sample category; if the fluctuation value is larger than the fluctuation early warning value, classifying the corresponding second data sample into a first class, wherein the first class is an abnormal data sample class.
It should be noted that the fluctuation early-warning values corresponding to the second data samples may be the same or different, which is not limited in the present application.
Based on the above steps, in this embodiment, after determining K third data samples belonging to the first category in the target report data, the method further includes: and modifying the data states of the K third data samples into target states, and sending prompt information to the first object, wherein the target states are used for indicating that the third data samples are forbidden to be sent to the second object.
After determining the K third data samples of the first class (i.e., the data fluctuation is abnormal), modifying the data states of the K third data samples into a target state, and sending prompt information to the first object to inform the first object of confirmation, wherein the target state is used for indicating to prohibit the third data samples from being sent to the second object, so that abnormal data is effectively prevented from being sent to the supervision department.
Optionally, an embodiment of the present application provides an optional method for classifying data samples, as shown in fig. 4, specifically including the following steps:
step S401: reading data collected on the same day (target report data);
Step S402: subdividing the data acquired on the same day according to the data attribute;
step S403: for each acquired data, the following steps are performed, taking acquired data 1 as an example;
step S404: calculating the actual acquired data quantity Z t
Reading data collected on the same day, and calculating the actual collected data quantity Z of each subdivision dimension according to a business organization, a business generation organization, a channel source, a client type and a transaction type t
Step S405: obtaining the obtainedTaking the predicted data amount X of SARIMA model (data prediction model) t
Acquiring the report data quantity X predicted by SARIMA model of each subdivision dimension of the day t
Step S406: calculating the fluctuation value F of the actual acquired data t
Report data volume X predicted by SARIMA model t Based on the calculation of the fluctuation value F of the actual acquired data amount of a certain subdivision dimension (acquired data 1) on the same day t
F t =abs((Z t -X t )/X t );
Step S407: acquiring a fluctuation early warning value Y of each subdivision dimension preset by a supervision and reporting system;
Step S408: comparing the fluctuation value F_t with the fluctuation early-warning value Y; if the fluctuation value is less than or equal to the fluctuation early-warning value, executing step S409; if the fluctuation value is greater than the fluctuation early-warning value, executing step S410;
Step S409: determining that the fluctuation is normal, and updating the data state to "to be reported";
When F_t ≤ Y, that is, when the fluctuation value is less than or equal to the early-warning value, the data fluctuation is considered normal, and the state of the corresponding collected data is updated to "to be reported", so that the data waits to be read by the reporting module.
Step S410: determining that the fluctuation is abnormal, and updating the data state to "to be confirmed" (corresponding to the above-mentioned target state).
When F_t > Y, that is, when the fluctuation value is greater than the early-warning value, the data fluctuation is considered abnormal, and the state of the corresponding collected data is updated to "to be confirmed", so that the collected data is prevented from being read by the reporting module; at the same time, a mail sending interface is called to notify the relevant personnel (the first object).
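The calculation in steps S404 to S406 can be illustrated with a minimal Python sketch. The field names, the sample records and the predicted amount of 100 are assumptions made only for illustration; the application does not specify a data layout, and the prediction would in practice come from the SARIMA model of the corresponding subdivision dimension.

```python
from collections import Counter

# Hypothetical field names for the five subdivision attributes named in the text.
DIMENSION_FIELDS = (
    "business_org", "origin_org", "channel", "client_type", "transaction_type",
)


def actual_counts(records):
    """Count the same-day records per subdivision dimension (Z_t)."""
    return Counter(tuple(r[f] for f in DIMENSION_FIELDS) for r in records)


def fluctuation(actual, predicted):
    """Return F_t = abs((Z_t - X_t) / X_t); X_t must be non-zero."""
    if predicted == 0:
        raise ValueError("predicted amount X_t must be non-zero")
    return abs((actual - predicted) / predicted)


# Invented sample: 120 identical records falling into one subdivision dimension.
record = {
    "business_org": "A", "origin_org": "A1", "channel": "web",
    "client_type": "corp", "transaction_type": "pay",
}
counts = actual_counts([record] * 120)
dimension = ("A", "A1", "web", "corp", "pay")

# X_t = 100 is an invented prediction standing in for the SARIMA output.
f_t = fluctuation(counts[dimension], 100)
```

With these invented numbers, Z_t is 120 and X_t is 100, so F_t is 0.2; whether that value is normal depends on the early-warning value Y configured for the dimension.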
Based on the above method, the embodiment of the present application provides an abnormal data monitoring method, which is applied to an abnormal data monitoring apparatus as shown in fig. 5, where the abnormal data monitoring apparatus is configured to perform the following steps:
Step S51: subdividing and preprocessing the historical report data;
Step S52: establishing a SARIMA model for each subdivided sample;
Step S53: comparing the fluctuation value of the data of the same day with the early-warning value (the fluctuation early-warning value), and updating the data state.
By the abnormal data monitoring device, abnormal data detection is carried out on daily reported data, and different data states are marked on the detected data, so that the abnormal data is prevented from being reported.
Optionally, the abnormal data monitoring device is applied to the supervision and reporting system of fig. 6, and fig. 6 is an optional supervision and reporting flow schematic diagram provided by an embodiment of the present application, including:
Each upstream business system (a document settlement system, an overseas bandwidth system, a cross-border payment system, a financial market transaction system and the like) sends report data to the supervisory reporting system for summarization. The supervisory reporting system collects the report data through an acquisition module and inputs it into the abnormal data monitoring device, which performs abnormal data detection and updates the data state of the report data. The normal data is then reported to the supervision department through the reporting module, and the supervision department receives the normal report data through a receiving front-end processor.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment of the application also provides a data sample classification device, and the data sample classification device can be used for executing the data sample classification method provided by the embodiment of the application. The following describes a classification device for data samples according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a classification apparatus for data samples according to an embodiment of the application. As shown in fig. 7, the apparatus includes:
a first classification module 72, configured to classify the historical report data according to M data attributes to obtain N first data samples, where M and N are positive integers;
an establishing module 74, configured to establish N data prediction models according to the N first data samples, respectively;
a second classification module 76, configured to classify the target report data according to the M data attributes, to obtain N second data samples;
a determining module 78, configured to classify the N second data samples through the N data prediction models respectively, so as to determine K third data samples belonging to the first category in the target report data, where K is a positive integer, K is less than or equal to N, and the N data prediction models and the N second data samples are all in one-to-one correspondence with the N first data samples.
According to the data sample classifying device provided by the embodiment of the application, the historical report data is classified according to M data attributes to obtain N first data samples; N data prediction models are then established according to the N first data samples respectively; the target report data is classified according to the M data attributes to obtain N second data samples; and the N second data samples are classified through the N data prediction models respectively, so as to determine K third data samples belonging to the first category in the target report data, wherein K is less than or equal to N, and the N data prediction models and the N second data samples are in one-to-one correspondence with the N first data samples. With this scheme, the fluctuation value of the data amount collected on the current day is checked before the supervision data is reported, so that abnormal data is identified and effectively prevented from being reported to the supervision department. This solves the problem in the related art that the supervisory reporting system lacks effective monitoring and prediction of abnormal fluctuations in upstream data, which causes errors and omissions in the data reported to the supervision department.
Optionally, the first classification module 72 is further configured to determine a first number of categories of data sub-attributes included in each of the M data attributes, to obtain M first numbers; determining N data categories contained in the M data attributes, wherein each data category of the N data categories contains one data sub-attribute of each data attribute of the M data attributes, and N is the product of the M first numbers; and classifying the historical report data according to the N data categories to obtain the N first data samples.
The historical report data is subdivided as follows. For example, the data attributes comprise four attributes: business variety, transaction system, transaction channel and client type. The business variety comprises three data sub-attributes a, b and c; the transaction system comprises two data sub-attributes d and e; the transaction channel comprises three data sub-attributes f, g and h; and the client type comprises four data sub-attributes i, j, k and l. The data can therefore be divided into 3×2×3×4=72 (N) data categories according to the data attributes, each of the 72 data categories being a different combination of one data sub-attribute from each of the four data attributes. The historical report data is classified according to the 72 data categories, so that the data attributes within each resulting data sample are more homogeneous and errors are reduced as far as possible.
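The subdivision in this example amounts to taking the Cartesian product of the sub-attribute lists. The following Python sketch illustrates this under assumptions: the dictionary keys and the record layout are hypothetical, and only the sub-attribute letters a to l and the count of 72 come from the text.

```python
from itertools import product

# Sub-attribute values from the example; the attribute key names are hypothetical.
attributes = {
    "business_variety": ["a", "b", "c"],
    "transaction_system": ["d", "e"],
    "transaction_channel": ["f", "g", "h"],
    "client_type": ["i", "j", "k", "l"],
}

# Each data category takes exactly one sub-attribute from every attribute.
categories = list(product(*attributes.values()))


def classify(records):
    """Split records into one data sample (list) per data category."""
    samples = {category: [] for category in categories}
    for record in records:
        key = tuple(record[name] for name in attributes)
        samples[key].append(record)
    return samples


# Two invented records that fall into the same data category.
history = [
    {"business_variety": "a", "transaction_system": "d",
     "transaction_channel": "f", "client_type": "i", "amount": 10},
    {"business_variety": "a", "transaction_system": "d",
     "transaction_channel": "f", "client_type": "i", "amount": 12},
]
first_samples = classify(history)
```

Here `len(categories)` equals 3×2×3×4, and each key of `first_samples` corresponds to one of the N first data samples, some of which may be empty for a given history.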
Optionally, the establishing module 74 is further configured to: perform a stationarity check on the first data sample; a determining step: determine a seasonal period of the first data sample if the first data sample does not pass the stationarity check; an operation step: perform a differencing operation on the first data sample according to the seasonal period to obtain a second data sample; execute the determining step and the operation step in a loop until the finally obtained target second data sample passes the stationarity check; and establish an autocorrelation diagram and a partial autocorrelation diagram of the target second data sample, and establish the data prediction model according to the autocorrelation diagram and the partial autocorrelation diagram.
The method for establishing the data prediction model comprises the following steps. First, a stationarity check is performed on the first data sample. The following process is then executed in a loop until the stationarity check is passed: if the first data sample does not pass the stationarity check, the seasonal period of the first data sample is determined, and a differencing operation is performed on the first data sample according to the seasonal period to obtain a second data sample. When the loop ends, the finally obtained target second data sample passes the stationarity check. An autocorrelation diagram and a partial autocorrelation diagram of the target second data sample are then established, and the data prediction model is established according to the autocorrelation diagram and the partial autocorrelation diagram.
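The differencing loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the seasonal period is passed in rather than detected, and the stationarity check is a crude stand-in that compares the means of the two halves of the series. The application does not name a specific test; a statistical test such as a unit-root test would normally be used instead.

```python
def seasonal_difference(series, period):
    """Subtract from each value the value observed one seasonal period earlier."""
    return [series[i] - series[i - period] for i in range(period, len(series))]


def difference_until_stationary(series, period, is_stationary, max_rounds=3):
    """Apply seasonal differencing repeatedly until is_stationary(series) holds."""
    current = list(series)
    rounds = 0
    while not is_stationary(current):
        if rounds == max_rounds or len(current) <= period:
            raise ValueError("series did not pass the stationarity check")
        current = seasonal_difference(current, period)
        rounds += 1
    return current, rounds


def halves_have_similar_mean(series, tolerance=1e-9):
    """Toy stand-in for a stationarity test (an assumption, not the patent's check)."""
    half = len(series) // 2
    if half == 0:
        return True
    first, second = series[:half], series[half:]
    return abs(sum(first) / len(first) - sum(second) / len(second)) <= tolerance


# Invented daily series: a linear trend plus a bump every seventh day.
raw = [t + (5 if t % 7 == 0 else 0) for t in range(28)]
target, rounds = difference_until_stationary(raw, 7, halves_have_similar_mean)
```

In this invented series, one round of differencing with period 7 removes both the weekly bump and the trend, leaving a constant sequence that the toy check accepts.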
It should be noted that the data prediction model may be a SARIMA (Seasonal Autoregressive Integrated Moving Average) model, which is one of the methods of time series prediction analysis. The four components of a time series, namely long-term trend, seasonal variation, cyclical fluctuation and random disturbance, often interact in complex ways, so that the development of the series is difficult to fit with an ARIMA model, and a SARIMA model therefore needs to be built.
Based on the above steps, the establishing module 74 is further configured to: determine a first parameter of the data prediction model by a preset estimation method; a checking step: perform an accuracy check on the data prediction model according to the first parameter; an adjustment step: if the data prediction model fails the accuracy check, re-determine the first parameter by adjusting the order of the data prediction model, wherein the order of the data prediction model is determined by the autocorrelation diagram and the partial autocorrelation diagram; and execute the checking step and the adjustment step in a loop until the data prediction model passes the accuracy check.
After the data prediction model is established, a first parameter of the data prediction model is determined by a preset estimation method. The following process is then executed in a loop until the accuracy check is passed: an accuracy check is performed on the data prediction model according to the first parameter, and if the data prediction model fails the accuracy check, the first parameter is re-determined by adjusting the order of the data prediction model, wherein the order of the data prediction model can be determined through the autocorrelation diagram and the partial autocorrelation diagram. When the loop ends, the data prediction model passes the accuracy check.
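The check-and-adjust loop can be outlined as below. This is only a sketch under assumptions: the estimation method and the accuracy check are not specified in the application, so they are passed in as hypothetical callables, the candidate orders are assumed to have been read from the autocorrelation and partial autocorrelation diagrams beforehand, and the error figures are invented.

```python
def fit_until_accurate(candidate_orders, estimate, passes_check):
    """Try candidate model orders in turn until one passes the accuracy check.

    estimate(order) returns the fitted parameters (the "first parameter");
    passes_check(order, params) returns True when the model is accurate enough.
    Both callables are placeholders supplied by the caller.
    """
    for order in candidate_orders:
        params = estimate(order)
        if passes_check(order, params):
            return order, params
    raise RuntimeError("no candidate order passed the accuracy check")


# Invented validation errors keyed by a hypothetical (p, q) order.
fake_error = {(1, 0): 0.30, (1, 1): 0.12, (2, 1): 0.04}

order, params = fit_until_accurate(
    candidate_orders=list(fake_error),
    estimate=lambda o: {"error": fake_error[o]},
    passes_check=lambda o, p: p["error"] < 0.10,
)
```

The sketch stops at the first order whose invented error falls below the chosen limit; a real implementation would fit a SARIMA model at each step and apply whatever accuracy criterion the system defines.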
Optionally, the determining module 78 is further configured to predict N fourth data samples through the N data prediction models, where the N fourth data samples are predicted values of the N second data samples; classifying the N second data samples according to the N fourth data samples.
The process of classifying the second data sample by the data prediction model includes: predicting predicted values of the N second data samples, namely N fourth data samples, through N data prediction models; the second data samples are then classified according to the predicted values.
Optionally, the determining module 78 is further configured to calculate a fluctuation value of each second data sample according to the N fourth data samples, to obtain N fluctuation values; acquiring fluctuation early warning values of the N second data samples; classifying the corresponding second data sample into a second class under the condition that the fluctuation value is smaller than or equal to the fluctuation early-warning value, wherein the second class is used for indicating that the second data sample is normal; and classifying the corresponding second data sample into the first category under the condition that the fluctuation value is larger than the fluctuation early-warning value, wherein the first category is used for indicating that the second data sample is abnormal.
Wherein the classifying the N second data samples according to the N fourth data samples comprises: firstly, calculating a fluctuation value of each second data sample according to the difference value of the N fourth data samples and the N second data samples; then obtaining fluctuation early warning values corresponding to the N second data samples; if the fluctuation value is smaller than or equal to the fluctuation early warning value, classifying the corresponding second data sample into a second category, wherein the second category is a normal data sample category; if the fluctuation value is larger than the fluctuation early warning value, classifying the corresponding second data sample into a first class, wherein the first class is an abnormal data sample class.
It should be noted that the fluctuation early-warning values corresponding to the second data samples may be the same or different, which is not limited in the present application.
Optionally, the determining module 78 is further configured to modify a data state of the K third data samples to a target state, and send a prompt message to the first object, where the target state is used to indicate that the third data samples are prohibited from being sent to the second object.
After determining the K third data samples of the first class (i.e., the data fluctuation is abnormal), modifying the data states of the K third data samples into a target state, and sending prompt information to the first object to inform the first object of confirmation, wherein the target state is used for indicating to prohibit the third data samples from being sent to the second object, so that abnormal data is effectively prevented from being sent to the supervision department.
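The state update and the blocking of abnormal samples can be sketched in Python as follows. The sample keys, the numeric values and the `notify` callback are hypothetical; the callback stands in for the prompt sent to the first object (elsewhere the description mentions a mail sending interface), and only the two state names come from the description.

```python
TO_BE_REPORTED = "to be reported"    # normal: may be read by the reporting module
TO_BE_CONFIRMED = "to be confirmed"  # target state: held back until confirmed


def update_states(fluctuations, thresholds, notify):
    """Assign a data state to every second data sample.

    fluctuations and thresholds map a sample key to its fluctuation value and
    its own early-warning value; notify(key, value) is a hypothetical callback.
    """
    states = {}
    for key, value in fluctuations.items():
        if value > thresholds[key]:
            states[key] = TO_BE_CONFIRMED
            notify(key, value)
        else:
            states[key] = TO_BE_REPORTED
    return states


def readable_by_reporting_module(states):
    """Return only the samples that the reporting module is allowed to read."""
    return [key for key, state in states.items() if state == TO_BE_REPORTED]


# Invented values: each sample has its own early-warning value.
fluctuations = {"sample_1": 0.05, "sample_2": 0.40, "sample_3": 0.25}
thresholds = {"sample_1": 0.10, "sample_2": 0.25, "sample_3": 0.25}

notified = []
states = update_states(fluctuations, thresholds, lambda k, v: notified.append(k))
reportable = readable_by_reporting_module(states)
```

In this invented example only `sample_2` exceeds its early-warning value, so K is 1; `sample_3` equals its early-warning value and is therefore treated as normal.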
The classification device for the data samples comprises a processor and a memory, wherein the first classification module, the establishment module, the second classification module, the determination module and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and by adjusting kernel parameters, the problem in the related art that the supervisory reporting system lacks effective monitoring and prediction of abnormal fluctuations in upstream data, which causes errors and omissions in the data reported to the supervision department, is solved.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the method for classifying data samples.
An embodiment of the present invention provides a processor for running a program, wherein the program, when run, executes the method for classifying data samples.
As shown in fig. 8, an embodiment of the present invention provides an electronic device, where the device includes a processor, a memory, and a program stored in the memory and executable on the processor, and when the processor executes the program, the following steps are implemented: classifying historical report data according to M data attributes to obtain N first data samples, wherein M, N is a positive integer; respectively establishing N data prediction models according to the N first data samples; classifying the target report data according to the M data attributes to obtain N second data samples; and classifying the N second data samples through the N data prediction models respectively to determine K third data samples belonging to the first category in the target report data, wherein K is a positive integer, K is smaller than or equal to N, and the N data prediction models and the N second data samples are in one-to-one correspondence with the N first data samples.
Optionally, classifying the historical report data according to M data attributes to obtain N first data samples, including: determining first numbers of categories of data sub-attributes contained in each data attribute in the M data attributes to obtain M first numbers; determining N data categories contained in the M data attributes, wherein each data category of the N data categories contains one data sub-attribute of each data attribute of the M data attributes, and N is the product of the M first numbers; and classifying the historical report data according to the N data categories to obtain the N first data samples.
Optionally, building a data prediction model according to the first data sample includes: performing a stationarity check on the first data sample; a determining step: determining a seasonal period of the first data sample if the first data sample does not pass the stationarity check; an operation step: performing a differencing operation on the first data sample according to the seasonal period to obtain a second data sample; executing the determining step and the operation step in a loop until the finally obtained target second data sample passes the stationarity check; and establishing an autocorrelation diagram and a partial autocorrelation diagram of the target second data sample, and establishing the data prediction model according to the autocorrelation diagram and the partial autocorrelation diagram.
Optionally, after the data prediction model is built according to the autocorrelation diagram and the partial autocorrelation diagram, the method further includes: determining a first parameter of the data prediction model through a preset estimation method; a checking step: performing an accuracy check on the data prediction model according to the first parameter; an adjustment step: if the data prediction model fails the accuracy check, re-determining the first parameter by adjusting the order of the data prediction model, wherein the order of the data prediction model is determined by the autocorrelation diagram and the partial autocorrelation diagram; and executing the checking step and the adjustment step in a loop until the data prediction model passes the accuracy check.
Optionally, classifying the N second data samples by the N data prediction models includes: predicting N fourth data samples through the N data prediction models, wherein the N fourth data samples are predicted values of the N second data samples; classifying the N second data samples according to the N fourth data samples.
Optionally, classifying the N second data samples according to the N fourth data samples includes: calculating the fluctuation value of each second data sample according to the N fourth data samples to obtain N fluctuation values; acquiring fluctuation early warning values of the N second data samples; classifying the corresponding second data sample into a second class under the condition that the fluctuation value is smaller than or equal to the fluctuation early-warning value, wherein the second class is used for indicating that the second data sample is normal; and classifying the corresponding second data sample into the first category under the condition that the fluctuation value is larger than the fluctuation early-warning value, wherein the first category is used for indicating that the second data sample is abnormal.
Optionally, after determining K third data samples belonging to the first category in the target report data, the method further includes: and modifying the data states of the K third data samples into target states, and sending prompt information to the first object, wherein the target states are used for indicating that the third data samples are forbidden to be sent to the second object.
The device herein may be a server, PC, PAD, cell phone, etc.
The application also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of: classifying historical report data according to M data attributes to obtain N first data samples, wherein M, N is a positive integer; respectively establishing N data prediction models according to the N first data samples; classifying the target report data according to the M data attributes to obtain N second data samples; and classifying the N second data samples through the N data prediction models respectively to determine K third data samples belonging to the first category in the target report data, wherein K is a positive integer, K is smaller than or equal to N, and the N data prediction models and the N second data samples are in one-to-one correspondence with the N first data samples.
Optionally, classifying the historical report data according to M data attributes to obtain N first data samples, including: determining first numbers of categories of data sub-attributes contained in each data attribute in the M data attributes to obtain M first numbers; determining N data categories contained in the M data attributes, wherein each data category of the N data categories contains one data sub-attribute of each data attribute of the M data attributes, and N is the product of the M first numbers; and classifying the historical report data according to the N data categories to obtain the N first data samples.
Optionally, building a data prediction model according to the first data sample includes: performing a stationarity check on the first data sample; a determining step: determining a seasonal period of the first data sample if the first data sample does not pass the stationarity check; an operation step: performing a differencing operation on the first data sample according to the seasonal period to obtain a second data sample; executing the determining step and the operation step in a loop until the finally obtained target second data sample passes the stationarity check; and establishing an autocorrelation diagram and a partial autocorrelation diagram of the target second data sample, and establishing the data prediction model according to the autocorrelation diagram and the partial autocorrelation diagram.
Optionally, after the data prediction model is built according to the autocorrelation diagram and the partial autocorrelation diagram, the method further includes: determining a first parameter of the data prediction model through a preset estimation method; a checking step: performing an accuracy check on the data prediction model according to the first parameter; an adjustment step: if the data prediction model fails the accuracy check, re-determining the first parameter by adjusting the order of the data prediction model, wherein the order of the data prediction model is determined by the autocorrelation diagram and the partial autocorrelation diagram; and executing the checking step and the adjustment step in a loop until the data prediction model passes the accuracy check.
Optionally, classifying the N second data samples by the N data prediction models includes: predicting N fourth data samples through the N data prediction models, wherein the N fourth data samples are predicted values of the N second data samples; classifying the N second data samples according to the N fourth data samples.
Optionally, classifying the N second data samples according to the N fourth data samples includes: calculating the fluctuation value of each second data sample according to the N fourth data samples to obtain N fluctuation values; acquiring fluctuation early warning values of the N second data samples; classifying the corresponding second data sample into a second class under the condition that the fluctuation value is smaller than or equal to the fluctuation early-warning value, wherein the second class is used for indicating that the second data sample is normal; and classifying the corresponding second data sample into the first category under the condition that the fluctuation value is larger than the fluctuation early-warning value, wherein the first category is used for indicating that the second data sample is abnormal.
Optionally, after determining K third data samples belonging to the first category in the target report data, the method further includes: and modifying the data states of the K third data samples into target states, and sending prompt information to the first object, wherein the target states are used for indicating that the third data samples are forbidden to be sent to the second object.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A method of classifying a data sample, comprising:
classifying historical report data according to M data attributes to obtain N first data samples, wherein M and N are positive integers;
respectively establishing N data prediction models according to the N first data samples;
classifying the target report data according to the M data attributes to obtain N second data samples;
and classifying the N second data samples through the N data prediction models respectively to determine K third data samples belonging to the first category in the target report data, wherein K is a positive integer, K is smaller than or equal to N, and the N data prediction models and the N second data samples are in one-to-one correspondence with the N first data samples.
2. The method of claim 1, wherein classifying the historical report data according to M data attributes to obtain N first data samples comprises:
determining first numbers of categories of data sub-attributes contained in each data attribute in the M data attributes to obtain M first numbers;
determining N data categories contained in the M data attributes, wherein each data category of the N data categories contains one data sub-attribute of each data attribute of the M data attributes, and N is the product of the M first numbers;
and classifying the historical report data according to the N data categories to obtain the N first data samples.
3. The method of claim 1, wherein establishing a data prediction model according to a first data sample comprises:
performing a stationarity check on the first data sample;
a determining step: determining a seasonal period of the first data sample if the first data sample does not pass the stationarity check;
an operation step: performing a differencing operation on the first data sample according to the seasonal period to obtain a second data sample;
repeating the determining step and the operation step until the finally obtained target second data sample passes the stationarity check;
and establishing an autocorrelation plot and a partial autocorrelation plot of the target second data sample, and establishing the data prediction model according to the autocorrelation plot and the partial autocorrelation plot.
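The differencing loop of claim 3 can be sketched in plain Python as follows. The claim does not name a particular stationarity test or a way of finding the seasonal period, so both are replaced here by crude stand-ins based on the sample autocorrelation; a production system would more likely use an established unit-root test. All function names and thresholds are assumptions, and the sketch stops before the plotting and model-building step.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag (0.0 for a constant series)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / var

def passes_stationarity_check(x, threshold=0.5):
    """Stand-in check (assumption): no lag >= 2 shows strong autocorrelation."""
    return all(abs(autocorrelation(x, k)) < threshold
               for k in range(2, len(x) // 2 + 1))

def seasonal_period(x):
    """Heuristic (assumption): the lag >= 2 with the largest autocorrelation."""
    return max(range(2, len(x) // 2 + 1), key=lambda k: autocorrelation(x, k))

def seasonal_difference(x, period):
    """Subtract from each value the value one seasonal period earlier."""
    return [x[t] - x[t - period] for t in range(period, len(x))]

def make_stationary(x, max_rounds=5):
    """Repeat the determining and operation steps until the check passes.

    Gives up after max_rounds and returns the series as it then stands.
    """
    periods = []
    for _ in range(max_rounds):
        if passes_stationarity_check(x):
            break
        period = seasonal_period(x)          # determining step
        x = seasonal_difference(x, period)   # operation step
        periods.append(period)
    return x, periods

# Made-up monthly pattern repeated for 8 years.
pattern = [5, 7, 9, 12, 15, 18, 20, 18, 15, 12, 9, 7]
series = pattern * 8
stationary, periods = make_stationary(series)
print(periods, len(stationary))
```

The list of periods records how many differencing rounds were needed, which is the information a seasonal model would later need to undo the differencing when producing forecasts.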
4. The method of claim 3, wherein after establishing the data prediction model according to the autocorrelation plot and the partial autocorrelation plot, the method further comprises:
determining a first parameter of the data prediction model through a preset estimation method;
a checking step: performing an accuracy check on the data prediction model according to the first parameter;
an adjusting step: if the data prediction model fails the accuracy check, re-determining the first parameter by adjusting an order of the data prediction model, wherein the order of the data prediction model is determined by the autocorrelation plot and the partial autocorrelation plot;
and repeating the checking step and the adjusting step until the data prediction model passes the accuracy check.
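One way to picture the estimate, check, and adjust loop of claim 4 is the following Python sketch for a pure autoregressive model. The claim leaves the estimation method and the accuracy check open, so this sketch assumes Yule-Walker estimation (via the Levinson-Durbin recursion) and an in-sample mean absolute error threshold; the starting order is assumed to have been read from the autocorrelation and partial autocorrelation plots, and raising the order by one is only one possible adjustment.

```python
import math

def autocovariance(x, lag):
    """Biased sample autocovariance of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n

def estimate_ar(x, order):
    """Yule-Walker AR coefficients computed with the Levinson-Durbin recursion."""
    r = [autocovariance(x, k) for k in range(order + 1)]
    phi, err = [], r[0]
    for k in range(1, order + 1):
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return phi

def mean_abs_error(x, phi):
    """In-sample one-step-ahead mean absolute error of the AR model."""
    mean = sum(x) / len(x)
    p = len(phi)
    errors = [abs((x[t] - mean)
                  - sum(phi[j] * (x[t - 1 - j] - mean) for j in range(p)))
              for t in range(p, len(x))]
    return sum(errors) / len(errors)

def fit_with_order_adjustment(x, start_order, max_order=6, tolerance=0.1):
    """Re-estimate with a higher order until the accuracy check passes."""
    order = start_order
    while True:
        phi = estimate_ar(x, order)                  # determine the first parameter
        if mean_abs_error(x, phi) <= tolerance:      # checking step
            return order, phi
        if order >= max_order:
            raise ValueError("no order up to max_order passed the accuracy check")
        order += 1                                   # adjusting step

# Made-up series: a pure 12-step cycle, which an AR(2) model can follow closely.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
order, phi = fit_with_order_adjustment(series, start_order=1)
print(order, [round(c, 2) for c in phi])
```

The upper bound on the order is an added safeguard so that the loop terminates even when no candidate model passes the check.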
5. The method of claim 1, wherein classifying the N second data samples by the N data prediction models, respectively, comprises:
predicting N fourth data samples through the N data prediction models, wherein the N fourth data samples are predicted values of the N second data samples;
classifying the N second data samples according to the N fourth data samples.
6. The method of claim 5, wherein classifying the N second data samples according to the N fourth data samples comprises:
calculating the fluctuation value of each second data sample according to the N fourth data samples to obtain N fluctuation values;
acquiring fluctuation early-warning values of the N second data samples;
classifying the corresponding second data sample into a second category under the condition that the fluctuation value is smaller than or equal to the fluctuation early-warning value, wherein the second category is used for indicating that the second data sample is normal;
and classifying the corresponding second data sample into the first category under the condition that the fluctuation value is larger than the fluctuation early-warning value, wherein the first category is used for indicating that the second data sample is abnormal.
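The threshold rule of claims 5 and 6 can be sketched in Python as follows. The claims do not define how the fluctuation value is computed, so this sketch assumes a relative deviation of the actual value from the predicted value; the sample keys, values, and early-warning values are made up for illustration.

```python
def fluctuation(actual, predicted):
    """Relative deviation of the actual value from the prediction (assumed definition)."""
    if predicted == 0:
        return 0.0 if actual == 0 else float("inf")
    return abs(actual - predicted) / abs(predicted)

def classify(actuals, predictions, warning_values):
    """Label each second data sample as normal or abnormal."""
    labels = {}
    for key, actual in actuals.items():
        value = fluctuation(actual, predictions[key])    # fourth sample = prediction
        if value <= warning_values[key]:
            labels[key] = "normal"       # second category
        else:
            labels[key] = "abnormal"     # first category
    return labels

# Hypothetical values for two samples.
actuals = {"deposits": 105.0, "loans": 150.0}
predictions = {"deposits": 100.0, "loans": 100.0}
warning_values = {"deposits": 0.2, "loans": 0.2}
labels = classify(actuals, predictions, warning_values)
print(labels)
```

Keeping a separate early-warning value per sample allows volatile categories to tolerate larger deviations than stable ones.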
7. The method of claim 1, wherein after determining the K third data samples belonging to the first category in the target report data, the method further comprises:
modifying the data states of the K third data samples to a target state, and sending prompt information to a first object, wherein the target state is used for indicating that the third data samples are prohibited from being sent to a second object.
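A minimal Python sketch of the follow-up handling in claim 7 is given here. The state names, the Sample class, and the notify callback are hypothetical; the claim only requires that flagged samples enter a state that prevents sending and that a prompt is issued.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    key: str
    state: str = "READY"

def block_abnormal(samples, abnormal_keys, notify):
    """Mark flagged samples as blocked, notify, and return the sendable rest."""
    for sample in samples:
        if sample.key in abnormal_keys:
            sample.state = "BLOCKED"     # target state: must not be sent onward
            notify(f"Sample {sample.key} is abnormal and was held back")
    return [s for s in samples if s.state != "BLOCKED"]

# Prompt messages are collected in a list instead of being sent to a real recipient.
messages = []
samples = [Sample("deposits"), Sample("loans")]
sendable = block_abnormal(samples, {"loans"}, messages.append)
print([s.key for s in sendable], messages)
```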
8. A data sample classification device, comprising:
a first classification module configured to classify historical report data according to M data attributes to obtain N first data samples, wherein M and N are positive integers;
an establishing module configured to respectively establish N data prediction models according to the N first data samples;
a second classification module configured to classify target report data according to the M data attributes to obtain N second data samples;
and a determining module configured to classify the N second data samples through the N data prediction models respectively, so as to determine K third data samples belonging to a first category in the target report data, wherein K is a positive integer, K is less than or equal to N, and the N data prediction models and the N second data samples are each in one-to-one correspondence with the N first data samples.
9. A processor for running a program, wherein the program when run performs the method of any one of claims 1 to 7.
10. An electronic device comprising one or more processors and memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
CN202310848166.8A 2023-07-11 2023-07-11 Data sample classification method and device, processor and electronic equipment Pending CN116881787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310848166.8A CN116881787A (en) 2023-07-11 2023-07-11 Data sample classification method and device, processor and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310848166.8A CN116881787A (en) 2023-07-11 2023-07-11 Data sample classification method and device, processor and electronic equipment

Publications (1)

Publication Number Publication Date
CN116881787A (en) 2023-10-13

Family

ID=88256269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310848166.8A Pending CN116881787A (en) 2023-07-11 2023-07-11 Data sample classification method and device, processor and electronic equipment

Country Status (1)

Country Link
CN (1) CN116881787A (en)

Similar Documents

Publication Title
Řezáč et al. How to measure the quality of credit scoring models
WO2003096237A2 (en) Electronic data processing system and method of using an electronic data processing system for automatically determining a risk indicator value
CN111242793B (en) Medical insurance data abnormality detection method and device
CN112801801A (en) Model training method, risk identification method, model, device, equipment and medium
CN116126843A (en) Data quality evaluation method and device, electronic equipment and storage medium
CN113554228B (en) Training method of repayment rate prediction model and repayment rate prediction method
CN114584601A (en) User loss identification and intervention method, system, terminal and medium
CN110910241B (en) Cash flow evaluation method, apparatus, server device and storage medium
CN110458581B (en) Method and device for identifying business turnover abnormality of commercial tenant
CN116977063A (en) Loan risk monitoring device, method, equipment and storage medium
CN116881787A (en) Data sample classification method and device, processor and electronic equipment
CN116228312A (en) Processing method and device for large-amount point exchange behavior
US20220156666A1 (en) Systems and methods for confidence interval transaction settlement range predictions
CN111311086A (en) Capacity monitoring method and device and computer readable storage medium
CN115423588A (en) Method and device for determining account hanging type, storage medium and electronic device
CN116051262A (en) Training method of risk prediction model, risk prediction method and device
CN114596681B (en) Method and device for processing exception of circulator
CN113837863B (en) Business prediction model creation method and device and computer readable storage medium
CN112396513B (en) Data processing method and device
CN116881349A (en) Service state detection method and device of service system and electronic equipment
CN113723710A (en) Customer loss prediction method, system, storage medium and electronic equipment
CN112580840A (en) Data analysis method and device
CN115953040A (en) Assessment plan processing method, assessment plan processing device, assessment plan processor and computer readable storage medium
CN117033726A (en) Report access duration prediction method and device, processor and electronic equipment
CN117171652A (en) Management body data quality monitoring method, system, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination